ESP8266: The JSON Streaming Parser

Daniel Eichhorn on 27. January 2017

You might not know it but the most important puzzle piece for all my recent ESP8266 projects is a thing called a streaming parser. Keep reading if you want to know what that is, how it works and why it is so important for my ESP8266 projects.

What is a streaming parser?

What is a parser anyway? You are almost certainly using parsers every day. A parser is a piece of code that analyses an input (text, document) by reading in its content and interpreting it. To do that the parser has knowledge about the structure of the text, sometimes called a syntax. The syntax is like the grammar of your natural language. The web browser you are using to read this text uses an HTML parser to understand the tags that are downloaded from my webserver and then turns them into a visualization with formatted text, pictures and links.

So now that we roughly understand what a parser is, the next question is: what is a streaming parser? With a modern smartphone or desktop computer we often don’t need streaming parsers anymore; we use document object model (DOM) parsers instead. A DOM parser creates a tree-like structure of the document it parses, keeps this structure in memory and makes it available for the code that does something meaningful with it. DOM parsers are very easy to use, fast and convenient. But this convenience comes at the price of memory requirements. The DOM parser needs a lot of memory, since it keeps the whole document in memory until it is no longer used. If you have a lot of RAM and your documents are not that big this is perfectly fine. But if the documents are big compared to the available (heap) memory you might run into a serious problem.

Imagine the parser to be something like a water meter, and let us compare two different types of meters. A water meter which works like a DOM parser needs a bucket and measures the amount of water by filling the bucket and then measuring the weight of the water in the bucket. If there is a lot of water then the bucket must be big. A water meter which works like a streaming parser measures the water while it flows through and doesn’t care what happens to the water afterwards. The bucket in this analogy is the heap or working memory of your microcontroller, the water is the stream of bits and bytes that you receive, either from the file system or from a remote server. And the parser does not just measure the amount of bits and bytes but also tries to understand the content. The streaming parser doesn’t care how big the document (or the amount of water) is, it just takes out what it needs from the stream. Streaming parsers are also referred to as event-based parsers since they react to certain events in the data stream. DOM parsers are referred to as tree-based parsers since they build a full representation of the document in a tree-like structure. In an HTML tree the <html> element would be the root of the tree, the <body> tag a fork in that tree.
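To make the "water meter" idea a bit more concrete, here is a minimal sketch of a streaming consumer. DepthMeter and measure are hypothetical names made up for this illustration and are not part of any library. The class looks at one character at a time and keeps only a few counters, so its memory use stays the same no matter how long the input is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only: consumes JSON text one
// character at a time and keeps just a handful of counters.
class DepthMeter {
 public:
  // Feed the next character of the stream.
  void feed(char c) {
    ++bytesSeen_;
    if (inString_) {
      if (escaped_) {
        escaped_ = false;      // the character after a backslash is literal
      } else if (c == '\\') {
        escaped_ = true;
      } else if (c == '"') {
        inString_ = false;     // closing quote
      }
      return;                  // brackets inside strings do not count
    }
    if (c == '"') {
      inString_ = true;
    } else if (c == '{' || c == '[') {
      ++depth_;
      if (depth_ > maxDepth_) maxDepth_ = depth_;
    } else if (c == '}' || c == ']') {
      if (depth_ > 0) --depth_;
    }
  }

  std::size_t bytesSeen() const { return bytesSeen_; }
  int maxDepth() const { return maxDepth_; }

 private:
  std::size_t bytesSeen_ = 0;
  int depth_ = 0;
  int maxDepth_ = 0;
  bool inString_ = false;
  bool escaped_ = false;
};

// Convenience for trying it out on a string. On a device the characters
// would come from a network or file stream instead.
DepthMeter measure(const std::string& text) {
  DepthMeter meter;
  for (char c : text) meter.feed(c);
  return meter;
}
```

Calling measure on a document returns a meter whose maxDepth() reports the deepest nesting of objects and arrays. A DOM-style approach would have to hold the whole document in memory to answer the same question.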

The grammar

The following image describes the grammar of a valid JSON object in a very concise way. It means that a JSON object knows three basic types:

object
array and
value

Objects always start and end with curly brackets. They can be either empty (line to the top) or contain string/value pairs, separated by a colon. These pairs can be repeated by adding a comma between them.
Arrays start and end with square brackets. They can be either empty or contain a value. At this point we don’t know yet what a value is. Values in an array can be repeated and must be separated by a comma.
Values were already used for the two previous definitions, and here lies the power of this kind of grammar: a value can be a simple text, a number, an object (yes, the object we defined before!), an array (also defined before), a boolean or null.

This is so powerful because we are reusing the definitions and we are nesting them within each other: an object can contain a value, a value can contain an array or an object. And finally an array can contain a value, repeatedly! Isn’t this beautiful?
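The way the three definitions refer to each other can be sketched as a small recursive-descent check, where each grammar rule becomes one function and the functions call each other. This is only an illustration of the grammar, not a streaming parser and not the library's code. GrammarChecker and isValidJson are hypothetical names, and the sketch simplifies on purpose: numbers are plain integers and strings have no escape handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical checker for illustration: one function per grammar rule.
class GrammarChecker {
 public:
  explicit GrammarChecker(const std::string& text) : text_(text) {}

  // True if the whole text is exactly one value.
  bool valid() {
    pos_ = 0;
    if (!parseValue()) return false;
    skipSpace();
    return pos_ == text_.size();
  }

 private:
  void skipSpace() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
  }

  // Consume character c (after optional whitespace) if it comes next.
  bool eat(char c) {
    skipSpace();
    if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
    return false;
  }

  bool literal(const std::string& word) {
    skipSpace();
    if (text_.compare(pos_, word.size(), word) != 0) return false;
    pos_ += word.size();
    return true;
  }

  // Simplified: no escape sequences.
  bool parseString() {
    if (!eat('"')) return false;
    while (pos_ < text_.size() && text_[pos_] != '"') ++pos_;
    return eat('"');
  }

  // Simplified: optional '-' followed by at least one digit.
  bool parseNumber() {
    skipSpace();
    if (pos_ < text_.size() && text_[pos_] == '-') ++pos_;
    const std::size_t firstDigit = pos_;
    while (pos_ < text_.size() &&
           std::isdigit(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    return pos_ > firstDigit;
  }

  // object: '{' [ string ':' value { ',' string ':' value } ] '}'
  bool parseObject() {
    if (!eat('{')) return false;
    if (eat('}')) return true;
    do {
      if (!parseString() || !eat(':') || !parseValue()) return false;
    } while (eat(','));
    return eat('}');
  }

  // array: '[' [ value { ',' value } ] ']'
  bool parseArray() {
    if (!eat('[')) return false;
    if (eat(']')) return true;
    do {
      if (!parseValue()) return false;
    } while (eat(','));
    return eat(']');
  }

  // value: string | number | object | array | true | false | null
  bool parseValue() {
    skipSpace();
    if (pos_ >= text_.size()) return false;
    switch (text_[pos_]) {
      case '"': return parseString();
      case '{': return parseObject();   // a value can be an object...
      case '[': return parseArray();    // ...or an array, which holds values
      case 't': return literal("true");
      case 'f': return literal("false");
      case 'n': return literal("null");
      default:  return parseNumber();
    }
  }

  std::string text_;
  std::size_t pos_ = 0;
};

bool isValidJson(const std::string& text) {
  return GrammarChecker(text).valid();
}
```

Note how parseObject and parseArray both call parseValue, and parseValue calls them back. That mutual recursion is the nesting described above.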

The JSON Streaming Parser Library

Why would we want to use a streaming parser on the ESP8266? Embedded devices usually have very limited resources available. One scarce resource is the heap memory. Many of the REST APIs I am using in my projects provide big response objects, but we are usually just interested in a small fraction of them. As mentioned earlier, a tree-based parser would load the whole document into memory and make it available once the document stream has ended. This would crash the ESP8266 pretty quickly, since it does not have the resources to keep 200 kB on the heap.

This made me port a PHP JSON parser over to C++ and make it available as a library, mostly to be used in my own projects. Let’s have a look at the header file of the JsonListener:
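The header listing is not reproduced in this text. As a rough stand-in, the sketch below shows what such a listener interface can look like, modeled on the callbacks named in this article. Treat the details as assumptions: std::string replaces the Arduino String type so the code compiles off-device, the array callbacks are added by analogy with the object callbacks, and NullListener is a hypothetical convenience class. The library's real header is the authority on the exact signatures.

```cpp
#include <cassert>
#include <string>

// Sketch of an event listener interface (assumption: modeled on the
// callbacks described in the article, not copied from the library).
class JsonListener {
 public:
  virtual ~JsonListener() = default;
  virtual void startDocument() = 0;
  virtual void startObject() = 0;
  virtual void startArray() = 0;            // assumed by analogy
  virtual void key(std::string key) = 0;
  virtual void value(std::string value) = 0;
  virtual void endArray() = 0;              // assumed by analogy
  virtual void endObject() = 0;
  virtual void endDocument() = 0;
};

// Hypothetical convenience base with empty bodies, so a concrete listener
// only has to override the callbacks it actually cares about.
class NullListener : public JsonListener {
 public:
  void startDocument() override {}
  void startObject() override {}
  void startArray() override {}
  void key(std::string) override {}
  void value(std::string) override {}
  void endArray() override {}
  void endObject() override {}
  void endDocument() override {}
};
```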

The methods here are callback methods which will get invoked if the respective event happens while parsing the document. Let’s start with an example. For the JSON object {"name": "Eichhorn"} we get the following invocations:

startDocument(): we start receiving a JSON document
startObject(): the JSON object starts with "{"
key("name"): the parser detected a key containing "name"
value("Eichhorn"): the parser detected a value containing "Eichhorn"
endObject(): the object ends with "}"
endDocument(): the stream of data ends and so does the document

I often implement (AKA “write code for”) just the key and the value methods. In the key method I store the value of the key parameter. Then later in the value method I check what the last key I had seen was, and then I store the value in the appropriate variable. For the example from before I would do:
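The original listing is missing from this text, so here is a minimal sketch of the pattern under a few assumptions. NameListener and replayExample are hypothetical names, std::string stands in for the Arduino String type, and the class is kept free-standing so it compiles on its own. In a real sketch it would derive from the library's JsonListener and provide empty bodies for the remaining callbacks.

```cpp
#include <cassert>
#include <string>

// Sketch of the key/value pattern (assumption: simplified, free-standing).
class NameListener {
 public:
  // Remember which key we saw last.
  void key(std::string key) { currentKey_ = key; }

  // Use the remembered key to decide where the value belongs.
  void value(std::string value) {
    if (currentKey_ == "name") {
      name_ = value;
    }
  }

  const std::string& name() const { return name_; }

 private:
  std::string currentKey_;
  std::string name_;
};

// Replays by hand the two calls a parser would make for
// {"name": "Eichhorn"} and returns what the listener stored.
std::string replayExample() {
  NameListener listener;
  listener.key("name");
  listener.value("Eichhorn");
  return listener.name();
}
```

Values that arrive after any other key are simply ignored, which is why only the fields you care about end up occupying memory.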

In the stream of the object {"name": "Eichhorn"} we will first get a call to the method key with the value "name", which we store in currentKey_. Next the parser will detect a value and call our value method with the value "Eichhorn". Our listener can now make the connection (or create a context): because the last key was name, the value Eichhorn should be stored in the member variable name_.

If this example was too simple then have a look here: https://github.com/squix78/esp8266-weather-station/blob/master/WundergroundClient.cpp This is the code which parses the responses from Wunderground for my WeatherStation.

Conclusion

For a document or object of the size we had in the example, a streaming parser is usually extreme overkill. It is complicated to use, requires you to write a lot of code and is memory-wise probably even worse than a tree parser. A streaming parser is only worth implementing if you have big objects or if you just don’t know how big your object might be. In those cases a streaming parser will be a good friend, since it only requires memory for the objects you actually want to use from the whole big document. You can find my library here: https://github.com/squix78/json-streaming-parser

Posted by Daniel Eichhorn

Daniel Eichhorn is a software engineer and an enthusiastic maker. He loves working on projects related to the Internet of Things, electronics, and embedded software. He owns two 3D printers: a Creality Ender 3 V2 and an Elegoo Mars 3. In 2018, he co-founded ThingPulse along with Marcel Stör. Together, they develop IoT hardware and distribute it to various locations around the world.


14 comments

kirsch says:
28. January 2017 at 4:30

Cool stuff.
I took a very quick look at the GitHub repository. Two things caught my eye:

1) C++! I didn’t know you could insert C++ into the nodemcu-firmware compilation. Are there other pieces that are written in C++? Was this a difficult thing to set up, toolchain-wise?

2) Lots of commented-out throws. My guess is that the limited C++ environment in nodemcu-firmware and others doesn’t have, among other features, exception handling baked in. That’s not surprising. I will say that it makes me nervous to see it. Any malformed JSON or other problem means you get undocumented behavior. Can you elaborate on this? There are, after all, other ways of error handling in C++.

Thanks!
Artur says:
28. January 2017 at 14:15

Thanks for your great work on that! At the moment I am struggling with getting a specific value from an EasyIoT server (e.g. http://www.laart.eu/esp8266/2). I tried bblanchon’s solution, but I was only able to get the value I am interested in when there is a small amount of data. I assume that I exceed the ESP memory limit. I hope that I will be able to use your work.
Thanks for sharing!
MaxPower3000 says:
1. November 2017 at 16:05

Hello! Thanks a lot for the library! I successfully parse JSON data with it (and ArduinoJson can’t handle such data). But I don’t know how to handle big JSON documents (for example, 50 kB or more). The ESP8266 seems to crash every time I try to parse, for example, OpenWeatherMap forecast data (the current weather JSON is OK).
Craig Lindley says:
18. February 2018 at 0:12

I’m trying to parse weather data from weatherunlocked.com which uses an array of days and within each day there is an array of timeframes. I’m having trouble figuring out how to use the JSONListener class to access things like Day[2]Timeframe[3]. Does anyone have any example code for how to process arrays?
alang says:
22. June 2018 at 9:01

I have a data source that requires a connection over https. Does the parser handle https requests?
- squix78 says:
  22. June 2018 at 9:15
  
  Hi
  
  The parser doesn’t care how it receives data. This is the responsibility of the implementer. You’ll have to open a stream with WiFiClientSecure or similar and feed the stream of chars into the parser. I have done this and it works without problems.
  
  Daniel
  - alang says:
    22. June 2018 at 9:25
    
    Yes, I can see that makes sense. I’ll look into making a secure connection with WiFiClientSecure. Thank you
    - alang says:
      22. June 2018 at 19:33
      
      Secure https achieved using WiFiClientSecure. Thank you for the tip
  - alang says:
    27. June 2018 at 2:52
    
    WiFiClientSecure client seems to [still] have a memory leak bug so it’ll crash any program that requires multiple https calls .. so, I’m looking for an alternative https client I can use on an esp8266 … have you used any other libraries that you can recommend?
    - squix78 says:
      27. June 2018 at 8:42
      
      Hi
      Don’t confuse a memory leak with high memory consumption. I believe that the TLS stack just consumes a lot of memory (~20 kB), and when you are opening more than one connection you are allocating n times this amount, which might be too much for your sketch. I solved this problem in my sketches by downloading one resource first and storing it in flash memory. Then I parse this JSON and download the resources referenced in the initial JSON file. This way I only ever have one SSL connection open, which works fine. Also, the allocated memory is freed after the connection closes, which clearly shows that there is no memory leak…
      Daniel
      - alang says:
        27. June 2018 at 9:07
        
        That’s a good point. I was really surprised by the amount of overhead required for a secure connection. In my app I only have one SSL connection at a time, and each SSL call is separated by at least 5-20 minutes. While debugging, I print the amount of heap available quite often, and although the values vary a lot, there was an overall downward trend after each secure call. Watching the heap, you can see less and less available. When making non-SSL calls I don’t see the downward trend in the amount of available heap.
        
        Googling tells me that this problem is known, and although it looks like it might have been addressed, I see the effects described in the problem reports. And I have updated my libraries in the hope of benefiting from any fixes.
alang says:
27. June 2018 at 21:15

With embarrassment I need to apologize: the problem was my code and not the libraries. I was not handling the JSON data correctly, resulting in more RAM being used every time I made a call (it still hurts to think of it). I appreciate your taking the time to respond, even though the problem was my poor programming skills.
Yaseen Khan says:
5. August 2018 at 20:56

Hello there, I am using the OpenWeatherMapCurrentDemo example code. The tempMax, windSpeed and windDeg values do not seem to work. It spits out a large number every time. Do you also see the problem? Please see the response below from the serial output. Thank you.

[HTTP] GET…
[HTTP] GET… code: 200
start document
————————————
lon: -79.389999
lat: 43.650002
weatherId: 721
main: Haze
description: haze
icon: 50n
iconMeteoCon: M
temp: 24.139999
pressure: 1020
humidity: 78
tempMin: 22.000000
tempMax: -159465800836912022475758179315756302336.000000
windSpeed: -159465800836912022475758179315756302336.000000
windDeg: -159465800836912022475758179315756302336.000000
clouds: 254
observationTime: -17829890, full date: Sun Jun 8 15:15:10 1969
country:
sunrise: -17829890, full date: Sun Jun 8 15:15:10 1969
sunset: -17829890, full date: Sun Jun 8 15:15:10 1969
cityName:

—————————————————/
- Aristide says:
  9. August 2018 at 8:55
  
  Hello,
  actually I have a similar problem. I bought the 2.9″ ESPAPER KIT which is great. I updated it to the openweather version, which is even greater with the temperature plot over the next few days. Bravo!
  Regarding the parser, I have some trouble with the current temperature update, which is sometimes incorrect. I believe this is due to the parser.
  I tried to use the parser to print my own sensor temperature on the epaper. Unfortunately the parser is not reliably able to retrieve the temperature and humidity from the JSON of my own domoticz website. Here is a copy of the JSON I get from the website:
  {
  "ActTime" : 1533748571,
  "AstrTwilightEnd" : "23:33",
  "AstrTwilightStart" : "04:21",
  "CivTwilightEnd" : "21:54",
  "CivTwilightStart" : "05:60",
  "DayLength" : "14:43",
  "NautTwilightEnd" : "22:39",
  "NautTwilightStart" : "05:15",
  "ServerTime" : "2018-08-08 19:16:11",
  "SunAtSouth" : "13:05",
  "Sunrise" : "06:36",
  "Sunset" : "21:18",
  "app_version" : "4.9796",
  "result" : [
  {
  "AddjMulti" : 1.0,
  "AddjMulti2" : 1.0,
  "AddjValue" : 0.0,
  "AddjValue2" : 0.0,
  "BatteryLevel" : 255,
  "CustomImage" : 0,
  "Data" : "27.4 C, 46 %",
  "Description" : "",
  "DewPoint" : "14.76",
  "Favorite" : 1,
  "HardwareID" : 4,
  "HardwareName" : "RFLink",
  "HardwareType" : "RFLink Gateway USB",
  "HardwareTypeVal" : 46,
  "HaveTimeout" : false,
  "Humidity" : 46,
  "HumidityStatus" : "Comfortable",
  "ID" : "0100",
  "LastUpdate" : "2018-08-08 19:14:08",
  "Name" : "Exterieur",
  "Notifications" : "false",
  "PlanID" : "0",
  "PlanIDs" : [ 0 ],
  "Protected" : false,
  "ShowNotifications" : true,
  "SignalLevel" : "-",
  "SubType" : "LaCrosse TX3",
  "Temp" : 27.399999999999999,
  "Timers" : "false",
  "Type" : "Temp + Humidity",
  "TypeImg" : "temperature",
  "Unit" : 2,
  "Used" : 1,
  "XOffset" : "0",
  "YOffset" : "0",
  "idx" : "222"
  }
  ],
  "status" : "OK",
  "title" : "Devices"
  }
  
  Once in a while it works, but most of the time it stops at the "app_version" key. Any chance that both issues are related to the JSON streaming parser?
  I have played around for quite some time now and have no clue about what is going on. Help would be greatly appreciated!
  Congratulations on your device and programs again
  
  Aristide