ESP8266: The JSON Streaming Parser

You might not know it but the most important puzzle piece for all my recent ESP8266 projects is a thing called a streaming parser. Keep reading if you want to know what that is, how it works and why it is so important for my ESP8266 projects.

What is a streaming parser?

What is a parser anyway? You are almost certainly using parsers every day. A parser is a piece of code that analyses an input (text, document) by reading its content according to a set of rules. To do that the parser has knowledge about the structure of the text, sometimes called a syntax. The syntax is like the grammar of your natural language. The web browser you are using to read this text uses an HTML parser to understand the tags that are downloaded from my webserver and then turns them into a visualization with formatted text, pictures and links.

So now that we roughly understand what a parser is, the next question is: what is a streaming parser? With a modern smartphone or desktop computer we often don’t need streaming parsers anymore; we use document object model (DOM) parsers instead. A DOM parser creates a tree-like structure of the document it parses, keeps this structure in memory and makes it available for the code that does something meaningful with it. DOM parsers are very easy to use, fast and convenient. But this convenience comes at the price of memory requirements. The DOM parser needs a lot of memory, since it keeps the whole document in memory until it is no longer used. If you have a lot of RAM and your documents are not that big this is perfectly fine. But if the documents are big compared to the available (heap) memory you might run into a serious problem.

Imagine the parser to be something like a water meter, and let’s compare two different types of meters. A water meter which works like a DOM parser needs a bucket and measures the amount of water by filling the bucket and then weighing the water in it. If there is a lot of water then the bucket must be big. A water meter which works like a streaming parser measures the water while it flows through and doesn’t care what happens to the water afterwards. The bucket in this analogy is the heap or working memory of your microcontroller, the water is the stream of bits and bytes that you receive, either from the file system or from a remote server. And the parser does not just measure the amount of bits and bytes but also tries to understand the content. The streaming parser doesn’t care how big the document (or the amount of water) is, it just takes out what it needs from the stream. Streaming parsers are also referred to as event-based parsers since they react to certain events in the data stream. DOM parsers are referred to as tree-based parsers since they build a full representation of the document in a tree-like structure. In an HTML tree the <html> element would be the root of the tree, the <body> tag a fork in that tree.
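
To make the water meter idea concrete, here is a minimal sketch in plain C++ (it is not the parser library itself). The hypothetical function maxNestingDepth reads a JSON text one character at a time and reports how deeply it is nested. It only keeps a few counters, so its memory use stays the same no matter how long the stream is.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reports the deepest nesting level of a JSON text
// while reading it one character at a time. Nothing is buffered.
int maxNestingDepth(std::istream& in) {
  int depth = 0;
  int deepest = 0;
  bool inString = false;  // brackets inside strings must not count
  bool escaped = false;   // previous character was a backslash
  char c;
  while (in.get(c)) {
    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        inString = false;
      }
      continue;
    }
    if (c == '"') {
      inString = true;
    } else if (c == '{' || c == '[') {
      ++depth;
      deepest = std::max(deepest, depth);
    } else if (c == '}' || c == ']') {
      --depth;
    }
  }
  return deepest;
}

// Convenience wrapper for trying the function on a string.
int maxNestingDepth(const std::string& text) {
  std::istringstream in(text);
  return maxNestingDepth(in);
}
```

For the text {"a": [1, 2]} the function returns 2: one level for the object and one for the array inside it. A DOM-style solution would first have to build the whole tree just to answer the same question.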

The grammar

The following image describes the grammar of a valid JSON object in a very concise way. It means that a JSON object knows three basic types:

  • object
  • array and
  • value

 

Figure: JSON grammar

Objects always start and end with curly brackets. They can be either empty (line to the top) or contain string/value pairs, separated by a colon. These pairs can be repeated by adding a comma between them.
Arrays start and end with square brackets. They can be either empty or contain a value. At this point we don’t know yet what a value is. Values in an array can be repeated and must be separated by a comma.
Values were already used for the two previous definitions, and here lies the power of this kind of grammar: a value can be a simple text, a number, an object (yes, the object we defined before!), an array (also defined before), a boolean or a null value.

This is so powerful because we are reusing the definitions and we are nesting them within each other: an object can contain a value, a value can contain an array or an object. And finally an array can contain a value, repeatedly! Isn’t this beautiful?
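
The nesting described above maps quite directly onto code: each grammar rule becomes a function, and the functions call each other. The following is a simplified sketch with hypothetical names (Recognizer, isJson). It only answers whether a text fits the grammar, it checks numbers and string escapes loosely, and it is not meant to show how the library works internally.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified recognizer: one function per grammar rule.
class Recognizer {
 public:
  explicit Recognizer(const std::string& text) : text_(text) {}

  // A document is one value followed by nothing but whitespace.
  bool document() {
    if (!value()) return false;
    skipSpace();
    return pos_ == text_.size();
  }

 private:
  bool value() {
    skipSpace();
    if (peek() == '{') return object();  // a value can be an object ...
    if (peek() == '[') return array();   // ... or an array ...
    if (peek() == '"') return stringLiteral();  // ... or a simple type
    if (keyword("true") || keyword("false") || keyword("null")) return true;
    return number();
  }

  bool object() {
    ++pos_;  // consume '{'
    skipSpace();
    if (accept('}')) return true;  // empty object
    do {
      skipSpace();
      if (!stringLiteral()) return false;
      skipSpace();
      if (!accept(':')) return false;
      if (!value()) return false;  // an object contains values
      skipSpace();
    } while (accept(','));
    return accept('}');
  }

  bool array() {
    ++pos_;  // consume '['
    skipSpace();
    if (accept(']')) return true;  // empty array
    do {
      if (!value()) return false;  // an array contains values
      skipSpace();
    } while (accept(','));
    return accept(']');
  }

  bool stringLiteral() {
    if (!accept('"')) return false;
    while (pos_ < text_.size() && text_[pos_] != '"') {
      pos_ += (text_[pos_] == '\\') ? 2 : 1;  // skip escaped characters
    }
    return accept('"');
  }

  // Loose number check: optional minus, digits, optional fraction.
  bool number() {
    if (peek() == '-') ++pos_;
    if (!digits()) return false;
    if (accept('.') && !digits()) return false;
    return true;
  }

  bool digits() {
    std::size_t start = pos_;
    while (std::isdigit(static_cast<unsigned char>(peek()))) ++pos_;
    return pos_ > start;
  }

  bool keyword(const std::string& word) {
    if (text_.compare(pos_, word.size(), word) != 0) return false;
    pos_ += word.size();
    return true;
  }

  void skipSpace() {
    while (std::isspace(static_cast<unsigned char>(peek()))) ++pos_;
  }

  char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

  bool accept(char c) {
    if (peek() != c) return false;
    ++pos_;
    return true;
  }

  std::string text_;
  std::size_t pos_ = 0;
};

bool isJson(const std::string& text) { return Recognizer(text).document(); }
```

Note how value() calls object() and array(), and both of them call value() again. That is the same circle as in the grammar.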

The JSON Streaming Parser Library

Why would we want to use a streaming parser on the ESP8266? Embedded devices usually have very limited resources available. One scarce resource is the heap memory. Many of the REST APIs I am using in my projects provide big response objects, but we are usually just interested in a small fraction of them. As mentioned earlier, a tree-based parser would load the whole document into memory and make it available once the document stream has ended. This would crash the ESP8266 pretty quickly: it does not have the resources to keep 200 kB on the heap.

This made me port a PHP JSON parser over to C++ and make it available as a library, mostly to be used in my own projects. Let’s have a look at the header file of the JsonListener:
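
As a rough sketch, a listener interface of this kind has one pure virtual method per parser event. The methods startDocument, startObject, key, value, endObject and endDocument correspond to the events discussed in this post. The whitespace and array callbacks, the virtual destructor and the use of std::string in place of the Arduino String type are assumptions made so that the sketch compiles on a desktop; please check the library repository for the exact header.

```cpp
#include <string>

// Sketch of a listener interface with one callback per parser event.
// std::string stands in for the Arduino String type (assumption).
class JsonListener {
 public:
  virtual ~JsonListener() = default;

  virtual void whitespace(char c) = 0;

  virtual void startDocument() = 0;
  virtual void endDocument() = 0;

  virtual void startObject() = 0;
  virtual void endObject() = 0;

  virtual void startArray() = 0;
  virtual void endArray() = 0;

  virtual void key(std::string key) = 0;
  virtual void value(std::string value) = 0;
};
```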

The methods here are callback methods which get invoked when the respective event happens while parsing the document. Let’s start with an example. For the JSON object {"name": "Eichhorn"} we get the following invocations:

  1. startDocument(): we start receiving a JSON document
  2. startObject(): the JSON object starts with "{"
  3. key("name"): the parser detected a key which contains "name"
  4. value("Eichhorn"): the parser detected a value containing "Eichhorn"
  5. endObject(): the object ends with "}"
  6. endDocument(): the stream of data ends and so does the document

I often just implement (AKA “write code” for) the key and the value methods. In the key method I store the value of the key parameter. Then later in the value method I check what the last key I had seen was and store the value in the appropriate variable. For the example from before this works as follows.

In the stream of the object {"name": "Eichhorn"} we will first get a call to the method key with the value "name", which we store in currentKey_. Next the parser will detect a value and call our value method with the value "Eichhorn". Our listener can now make the connection (or create a context): after the key name, the value Eichhorn should be stored in the member variable name_.
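
A minimal sketch of such a listener could look like this. The names currentKey_ and name_ are taken from the description above; the class name ExampleListener, the name() accessor and the replayExample helper are hypothetical. To keep the sketch runnable on a desktop it does not derive from the library’s JsonListener and it uses std::string instead of the Arduino String type. In a real sketch the remaining callbacks would also have to be implemented, usually as empty methods.

```cpp
#include <string>

// Remembers the last key and uses it to decide where a value belongs.
class ExampleListener {
 public:
  void key(const std::string& key) {
    currentKey_ = key;  // remember which key we saw last
  }

  void value(const std::string& value) {
    if (currentKey_ == "name") {
      name_ = value;  // the only field we are interested in
    }
    // values that belong to any other key are simply dropped
  }

  const std::string& name() const { return name_; }

 private:
  std::string currentKey_;
  std::string name_;
};

// Replays by hand the key and value callbacks the parser would issue
// for {"name": "Eichhorn"} and returns what the listener stored.
std::string replayExample() {
  ExampleListener listener;
  listener.key("name");
  listener.value("Eichhorn");
  return listener.name();
}
```

The listener needs memory for the current key and for the one value it keeps, regardless of how many other keys and values pass by in the stream.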

If this example was too simple then have a look here: https://github.com/squix78/esp8266-weather-station/blob/master/WundergroundClient.cpp This is the code which parses the responses from Wunderground for my WeatherStation.

Conclusion

For a document or object of the size we had in the example a streaming parser is usually extreme overkill. It is complicated to use, requires you to write a lot of code and is memory-wise probably even worse than a tree parser. It is only recommended to use a streaming parser if you have big objects or if you just don’t know how big your object might be. In those cases a streaming parser will be a good friend, since it only requires memory for the objects you actually want to use from the whole big document. You can find my library here: https://github.com/squix78/json-streaming-parser

 

Posted by squix78

13 comments

  1. Cool stuff.
    I took a very quick look at the GitHub repository. Two things caught my eye:

    1) C++! I didn’t know you could insert C++ into the nodemcu-firmware compilation. Are there other pieces that are written in C++? Was this a difficult thing to set up, toolchain-wise?

    2) Lots of commented out throws. My guess is that the limited C++ environment in nodemcu-firmware and others doesn’t have, among other features, exception handling baked in. That’s not surprising. I will say that it makes me nervous to see it. Any malformed JSON or other problem means you get undocumented behavior. Can you elaborate on this? There are, after all, other ways of error handling in C++.

    Thanks!

  2. Thanks for your great work on that! Today I am struggling with getting a specific value from the EasyIoT server (e.g. http://www.laart.eu/esp8266/2). I tried the bblanchon solution, however I was able to get the value I am interested in only if there is a small amount of data. I assume that I exceed the ESP memory limit. I hope that I will be able to use your work.
    Thanks for sharing!

  3. Hello! Thanks a lot for the library! I successfully parse json data with it (and ArduinoJson can’t handle such data). But I don’t know how to set big json variables (for example, 50Kb or more). ESP8266 seems to crash every time I try to parse for example Openweathermap forecast data (current weather json is ok).

  4. I’m trying to parse weather data from weatherunlocked.com which uses an array of days and within each day there is an array of timeframes. I’m having trouble figuring out how to use the JSONListener class to access things like Day[2]Timeframe[3]. Does anyone have any example code for how to process arrays?

    • Hi

      The parser doesn’t care how it receives data. This is the responsibility of the implementer. You’ll have to open a stream with the WiFiClientSecure or similar and feed the stream of chars into the parser. I have done this and it works without problems.

      Daniel

      • Yes, I can see that makes sense. I’ll look into making a secure connection with WiFiClientSecure. Thank you

      • WiFiClientSecure client seems to [still] have a memory leak bug so it’ll crash any program that requires multiple https calls .. so, I’m looking for an alternative https client I can use on an esp8266 … have you used any other libraries that you can recommend?

        • Hi
          Don’t mix up a memory leak with a lot of memory consumption. I believe that the TLS stack just consumes a lot of memory (~20kb) and when you are opening more than one connection you are allocating n times this amount, which might be too much for your sketch. I solved this problem in my sketches by downloading one resource first and storing it on the flash memory. Then I parse this JSON and download the resources referenced in the initial JSON file. This way I only ever have one SSL connection open, which works fine. Also the allocated memory is freed after the connection closes, which clearly shows that there is no memory leak…
          Daniel

          • That’s a good point. I was really surprised by the amount of overhead required in a secure connection. In my app I only have one SSL connection at a time, and each SSL call is separated by at least 5-20 minutes. While debugging, I print the amount of heap available quite often, and although the values vary a lot, there was an overall downward trend after each secure call. Watching the heap, you can see less and less available. When making non-SSL calls I don’t see the downward trend in the amount of available heap.

            Googling tells me that this problem is known, and although it looks like it might have been addressed, I see the effects being described in the problem reports. And I have updated my libraries in the hope of benefiting from any fixes.

  5. With embarrassment I need to apologize –the problem was my code and not the libraries. I was not handling the json data correctly resulting in using more ram every time I made a call (it still hurts to think of it). I appreciate your taking the time to respond, even though the problem was my poor programming skills.

  6. Hello there, I am using the OpenWeatherMapCurrentDemo example code. The tempMax, windSpeed and windDeg values do not seem to work. It spits out a large number every time… do you also see the problem? Please see the serial output below. Thank you.

    [HTTP] GET…
    [HTTP] GET… code: 200
    start document
    ————————————
    lon: -79.389999
    lat: 43.650002
    weatherId: 721
    main: Haze
    description: haze
    icon: 50n
    iconMeteoCon: M
    temp: 24.139999
    pressure: 1020
    humidity: 78
    tempMin: 22.000000
    tempMax: -159465800836912022475758179315756302336.000000
    windSpeed: -159465800836912022475758179315756302336.000000
    windDeg: -159465800836912022475758179315756302336.000000
    clouds: 254
    observationTime: -17829890, full date: Sun Jun 8 15:15:10 1969
    country:
    sunrise: -17829890, full date: Sun Jun 8 15:15:10 1969
    sunset: -17829890, full date: Sun Jun 8 15:15:10 1969
    cityName:

    —————————————————/
