ESP32/ ESP-EYE: Browser Based Spectrum Analyzer

In this write-up I’m showing you how to visualize the frequency bands recorded by an ESP32 with an I2S microphone in your browser. This is hardly new, but it allows me to test-run some concepts I want to use in a later project, and it might help you to get deeper into these concepts as well: the Fast Fourier Transform, the ESP32 as a web server, Chart.js for the display and web sockets to deliver the data to the browser.

A while back we received an ESP-EYE development board from Espressif. To demonstrate the capabilities of the board, Espressif had an impressive (pun intended) application installed on the module: voice-activated face detection and recognition.

After receiving the board I played with this firmware and worked to understand how the firmware was written. I never thought that the ESP32 was powerful enough to run these tasks in real-time but the clever engineers at Espressif pulled it off!

Accessing the microphone

Now I wanted to investigate what it takes to use the onboard microphone. It turns out that this is really easy and can be done with just a few lines of code. The following snippet initializes the microphone:

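#include <driver/i2s.h>

// Assumption: the sketch uses I2S peripheral 0; adjust if yours differs.
const i2s_port_t I2S_PORT = I2S_NUM_0;

void setupMic() {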
  Serial.println("Configuring I2S...");
  esp_err_t err;

  // The I2S config as per the example
  const i2s_config_t i2s_config = {
      .mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX), // Receive, not transmit
      .sample_rate = 40000,                         // samples per second
      .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT, // could only get it to work with 32bits
      .channel_format = I2S_CHANNEL_FMT_ONLY_RIGHT, // although the SEL config should be left, it seems to transmit on right
      .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
      .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,     // Interrupt level 1
      .dma_buf_count = 4,                           // number of buffers
      .dma_buf_len = 512                            // samples per buffer
  };

  // The pin config as per the setup
  i2s_pin_config_t pin_config = {
      .bck_io_num = 26,   // IIS_SCLK
      .ws_io_num = 32,    // IIS_LCLK
      .data_out_num = -1, // IIS_DSIN (-1 = not used)
      .data_in_num = 33   // IIS_DOUT
  };

  // Configuring the I2S driver and pins.
  // This function must be called before any I2S driver read/write operations.
  err = i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
  if (err != ESP_OK) {
    Serial.printf("Failed installing driver: %d\n", err);
    while (true);
  }
  err = i2s_set_pin(I2S_PORT, &pin_config);
  if (err != ESP_OK) {
    Serial.printf("Failed setting pin: %d\n", err);
    while (true);
  }
  Serial.println("I2S driver installed.");
}

To sample data from the microphone you need a buffer to store the audio information:

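int32_t samples[BLOCK_SIZE];
// The size argument and the return value are both in bytes, not elements.
// Note: newer ESP-IDF versions replace i2s_read_bytes() with i2s_read().
int num_bytes_read = i2s_read_bytes(
         I2S_PORT,
         (char *)samples,
         sizeof(samples), // buffer size in bytes
         portMAX_DELAY);  // no timeout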

After that the samples array will contain the data.

Fast Fourier Transform

A spectrum analyzer visualizes which frequencies are present in a recording. Our microphone records a voltage level representing the air pressure on the microphone membrane at a given moment. Many single frequencies might be summed up in this voltage. So how do we get from this information to a data structure which shows how strongly each frequency is currently present in the recording?

The answer is called the Fourier transform. It analyzes the data and converts the signal into the frequency domain:

Fourier Transform from Time/Amplitude space to Frequency/Amplitude space. Source: https://en.wikipedia.org/wiki/Fast_Fourier_transform

The generic version of this algorithm, the direct discrete Fourier transform, requires a lot of computing power: on the order of N² operations for N samples. That makes it a poor fit for computers in general and microcontrollers in particular. But don’t worry, there is help: the Fast Fourier Transform (FFT). It is (as the name suggests) much faster than the generic algorithm, needing roughly N·log N operations, and it computes the same result apart from floating-point rounding. The common implementations, including the library used here, require the number of input samples to be a power of two: 64, 128, 256, 512… samples.

For most applications the input data size limitation is not a problem, and for our little project this works perfectly fine. An audio spectrum analyzer usually groups several frequencies together in one bin and displays the results as colorful bars. Hence we are giving up frequency resolution in the grouping process anyway.
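To make the relationship between the two algorithms concrete, here is a minimal desktop C++ sketch (not ESP32 code) that runs a direct transform and a recursive radix-2 FFT on the same input. All function names are made up for this illustration, and the test signal of two sine waves is an assumption chosen so that the expected peaks are easy to predict.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Direct discrete Fourier transform: two nested loops, O(N^2) operations.
std::vector<Complex> naiveDft(const std::vector<Complex>& x) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  std::vector<Complex> out(n);
  for (std::size_t k = 0; k < n; k++) {
    Complex sum(0.0, 0.0);
    for (std::size_t t = 0; t < n; t++) {
      double angle = -2.0 * pi * double(k) * double(t) / double(n);
      sum += x[t] * Complex(std::cos(angle), std::sin(angle));
    }
    out[k] = sum;
  }
  return out;
}

// Recursive radix-2 FFT: O(N log N), x.size() must be a power of two.
std::vector<Complex> fft(const std::vector<Complex>& x) {
  const std::size_t n = x.size();
  if (n == 1) return x;
  std::vector<Complex> even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; i++) {
    even[i] = x[2 * i];
    odd[i] = x[2 * i + 1];
  }
  const std::vector<Complex> e = fft(even);
  const std::vector<Complex> o = fft(odd);
  const double pi = std::acos(-1.0);
  std::vector<Complex> out(n);
  for (std::size_t k = 0; k < n / 2; k++) {
    double angle = -2.0 * pi * double(k) / double(n);
    Complex twiddle = Complex(std::cos(angle), std::sin(angle)) * o[k];
    out[k] = e[k] + twiddle;
    out[k + n / 2] = e[k] - twiddle;
  }
  return out;
}

// Test signal: two sine waves completing 5 and 12 cycles within the block.
std::vector<Complex> makeTestSignal(std::size_t n) {
  const double pi = std::acos(-1.0);
  std::vector<Complex> x(n);
  for (std::size_t t = 0; t < n; t++) {
    double phase = 2.0 * pi * double(t) / double(n);
    x[t] = Complex(std::sin(5.0 * phase) + 0.5 * std::sin(12.0 * phase), 0.0);
  }
  return x;
}

// Largest absolute difference between two spectra of equal length.
double maxDifference(const std::vector<Complex>& a, const std::vector<Complex>& b) {
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); i++) {
    worst = std::max(worst, std::abs(a[i] - b[i]));
  }
  return worst;
}
```

Calling maxDifference(naiveDft(x), fft(x)) on makeTestSignal(64) gives a value in the range of floating-point rounding, and the magnitudes peak at bins 5 and 12, the two frequencies that were mixed into the signal.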

Since we don’t want to implement the FFT algorithm ourselves we need a library. Install the Arduino FFT library into your Arduino IDE, either by downloading the zip file or by using the library manager.

To do a Fourier transform on the recorded data we need to convert the samples into the data structure which the Arduino FFT library understands. Then we can call the FFT functions:

for (uint16_t i = 0; i < BLOCK_SIZE; i++) {
  vReal[i] = samples[i] << 8; // Adjust input range
  vImag[i] = 0.0; //Imaginary part must be zeroed
}

FFT.Windowing(vReal, BLOCK_SIZE, FFT_WIN_TYP_HAMMING, FFT_FORWARD);
FFT.Compute(vReal, vImag, BLOCK_SIZE, FFT_FORWARD);
FFT.ComplexToMagnitude(vReal, vImag, BLOCK_SIZE);
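After the magnitude step, each array index stands for a frequency: bin i corresponds to i × sample rate / block size, and for real-valued input only the first half of the bins carries independent information. The following sketch shows one way to map bins to frequencies and to average them into a handful of display bands. The function names and the equal-width grouping are assumptions made for this illustration, not taken from the project code.

```cpp
#include <cstddef>
#include <vector>

// Center frequency in Hz of FFT bin `bin`.
double binFrequency(std::size_t bin, double sampleRate, std::size_t blockSize) {
  return double(bin) * sampleRate / double(blockSize);
}

// Averages the magnitudes of bins 1 .. blockSize/2 - 1 into `bandCount`
// bands of equal width. Bin 0 (the DC offset) is skipped, and the upper
// half of the spectrum is ignored because it mirrors the lower half.
std::vector<double> groupIntoBands(const double* magnitudes,
                                   std::size_t blockSize,
                                   std::size_t bandCount) {
  const std::size_t usable = blockSize / 2;
  std::vector<double> bands(bandCount, 0.0);
  std::vector<std::size_t> counts(bandCount, 0);
  for (std::size_t bin = 1; bin < usable; bin++) {
    std::size_t band = bin * bandCount / usable;
    bands[band] += magnitudes[bin];
    counts[band]++;
  }
  for (std::size_t b = 0; b < bandCount; b++) {
    if (counts[b] > 0) bands[b] /= double(counts[b]);
  }
  return bands;
}
```

With the 40000 Hz sample rate from the microphone setup and an assumed block size of 512, binFrequency(1, 40000.0, 512) is 78.125 Hz, so neighbouring bins are about 78 Hz apart and the usable range ends at 20 kHz. Equal-width bands keep the example short; a real analyzer often uses wider bands at higher frequencies.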

Web Sockets

The last component in this proof of concept is the web socket communication. What are web sockets? Here is the definition from the Mozilla documentation:

The WebSocket API is an advanced technology that makes it possible to open a two-way interactive communication session between the user’s browser and a server. With this API, you can send messages to a server and receive event-driven responses without having to poll the server for a reply.

The keywords here are “two-way communication” and “without polling”. In regular HTTP communication the client (your browser) requests resources from the server whenever it needs them. With web sockets the server can push new data to the client whenever an update is available.

In our setup we want to push updated values of the analyzed spectrum to the client(s) to display them in a chart. As an alternative the client could request updates at regular intervals (AKA polling), but this would never be quite right: either the client requests an update before the data has changed on the server, or it misses a couple of new data sets because the polling interval is too slow. In addition to that, polling consumes a lot of resources on both client and server and should be avoided.

In order to run a web socket server on an ESP32 we need a library. Install the arduinoWebSockets library by Markus Sattler, either by downloading the zip file or by using the Arduino IDE library manager.

A minimalistic web socket server would look something like this:

#include <WebSocketsServer.h>

// Adding a websocket to the server
WebSocketsServer webSocket = WebSocketsServer(81);

void webSocketEvent(uint8_t num, WStype_t type, uint8_t * payload, size_t length){
  // Do something with the data from the client
  if(type == WStype_TEXT){

  }
}

void setup() {
  webSocket.begin();
  webSocket.onEvent(webSocketEvent);
}

void loop() {
   webSocket.loop();
   String message = "Hello";
   webSocket.broadcastTXT(message.c_str(), message.length());
}

The webSocketEvent callback is triggered for web socket events, for example when a client sends data back to the server. The server can detect different message types and react to them with specific code.

The broadcastTXT method sends a message to all connected clients. Call it whenever an update is available.
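For the spectrum analyzer the message has to carry one value per frequency band. A simple option is a JSON array, which the browser can parse with JSON.parse and hand to the chart. The helper below is a hypothetical sketch of that idea; the function name and the message format are assumptions, so check the project code for the format it really uses.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Serializes band values as a JSON array of rounded integers,
// e.g. the values 1.4, 20.6 and 0.0 become "[1,21,0]".
std::string bandsToJson(const std::vector<double>& bands) {
  std::string json = "[";
  for (std::size_t i = 0; i < bands.size(); i++) {
    if (i > 0) json += ",";
    json += std::to_string(std::lround(bands[i]));
  }
  json += "]";
  return json;
}
```

On the ESP32 the resulting string could be sent in the same way as the "Hello" message: webSocket.broadcastTXT(json.c_str(), json.length()). Rounding to integers keeps the messages short, which matters when they are sent many times per second.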

Running the Code

Now we just have to put everything together. Get the code from GitHub: https://github.com/squix78/esp32-mic-fft

Change the WiFi credentials on lines 32 and 33 and flash the firmware to your ESP32. After restarting, the ESP will print the acquired IP address to the serial console. Open the browser at http://<ip from console>.

If everything worked you should see something like this: