Supercharge Your LVGL Render on ESP32 Using SPI DMA | Fast and Efficient

LVGL render performance is something every ESP32 UI developer eventually runs into. If you’ve built a graphical interface with LVGL (Light and Versatile Graphics Library) on ESP32, you have probably seen slow screen refreshes, input lag, or visible flicker.

The problem usually doesn’t lie with LVGL itself; it comes from how the underlying display driver moves pixel data to the screen. This article walks through how to make TFT LCD rendering on ESP32 much more efficient by using SPI DMA (Direct Memory Access). The result is a fast, solid foundation for later LVGL integration.

What is LVGL Render?

The core task of LVGL rendering is to draw the UI into a pixel buffer (which may hold a whole frame or only part of one) and then hand those pixels to the display through a flush function. This process is called rendering.

On ESP32, the flush usually goes over the SPI interface, writing LVGL’s pixel data into the display RAM of the TFT LCD controller.

LVGL itself doesn’t care how you send the data: you can send it pixel by pixel or use DMA. With DMA, an entire block of the screen goes out in a single transfer, which gives a large performance boost.
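The size of one flush follows directly from the rectangle being updated. The sketch below is a minimal host-side illustration, assuming inclusive corner coordinates (the same convention tft_set_window() uses later in this article) and 16-bit RGB565 pixels. The type flush_area_t and both function names are hypothetical and are not part of LVGL's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rectangle a flush function receives.
 * Corner coordinates are inclusive. */
typedef struct {
    int16_t x1, y1, x2, y2;
} flush_area_t;

/* Number of pixels covered by an inclusive rectangle. */
size_t flush_area_pixels(const flush_area_t *a) {
    size_t w = (size_t)(a->x2 - a->x1 + 1);
    size_t h = (size_t)(a->y2 - a->y1 + 1);
    return w * h;
}

/* Bytes to transmit for that rectangle at 16 bits (RGB565) per pixel. */
size_t flush_area_bytes(const flush_area_t *a) {
    return flush_area_pixels(a) * 2u;
}
```

For the full 240×320 screen used in this article, the area is (0, 0) to (239, 319), so flush_area_bytes() returns 153,600 bytes. That is the amount of data a single DMA transfer has to move.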

Brief Introduction to ESP32 SPI DMA Principle

DMA (Direct Memory Access) is a mechanism allowing hardware to transfer data directly between memory and peripherals (like SPI) without CPU intervention. For SPI:

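Without DMA: the CPU feeds the SPI peripheral itself, a few bytes at a time, and stays busy for the whole transfer.
With DMA: the CPU sets up the transfer once, then the DMA controller streams the entire frame to SPI on its own.

The ESP-IDF SPI master driver supports DMA and offers spi_device_queue_trans() to submit a large block of screen data asynchronously, with spi_device_get_trans_result() to collect the result afterwards. For a 240×320 TFT screen at 16 bits per pixel, a full frame is 153,600 bytes, so this optimization is essential.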

With vs Without DMA in LVGL Render

Item	Without DMA	With DMA
Data Transmission	CPU sends pixel by pixel	DMA transfers entire block
CPU Usage	High	Low
Screen Refresh Speed	Slow (noticeable lag)	Fast (close to hardware limit)
LVGL User Experience	Laggy, frame drops	Smooth
Code Complexity	Low	Slightly higher (DMA setup needed)

Development Environment

Before you start coding, make sure you have the following:

ESP-IDF (version 4.4 or higher): ESP-IDF is the official development framework for the ESP32, and it runs on Windows, macOS, and Linux.
ESP32 development board: any ESP32 board with the SPI pins broken out.
SPI TFT LCD: a display with an SPI interface (e.g., one based on the ST7789 controller).

ESP32 LVGL Project Structure

Create a clean ESP-IDF project for LVGL Render with DMA like this:

tft_dma_lvgl_demo/
├── CMakeLists.txt
├── sdkconfig
└── main/
    ├── main.c         # Main program
    └── tft_driver.c   # TFT driver (including DMA implementation)

To keep the demo simple, this article puts all the SPI DMA logic in main.c. Later, you can move the display functions such as tft_fill_color_dma() into the separate driver module, tft_driver.c.

Code Comparison of Two Approaches

Traditional Version (Without DMA):

void tft_fill_color(uint16_t color) {
    uint8_t color_data[2] = {color >> 8, color & 0xFF};  // Convert 16-bit color to two 8-bit bytes
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        tft_send_data(color_data, 2);  // Send 2 bytes per pixel
    }
}

DMA Accelerated Version:

void tft_fill_color_dma(uint16_t color) {
    // frame_buffer must be DMA-capable memory (see the allocation in the complete code)
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        frame_buffer[i * 2]     = color >> 8;       // High byte of color
        frame_buffer[i * 2 + 1] = color & 0xFF;     // Low byte of color
    }

    // Window setup and the RAMWR command are omitted here; see the complete code
    spi_transaction_t t = {
        .length = TFT_WIDTH * TFT_HEIGHT * 16,      // Total length in bits (16 bits per pixel)
        .tx_buffer = frame_buffer,
        .user = (void*)1                            // DC = 1: pixel data
    };
    ESP_ERROR_CHECK(spi_device_queue_trans(spi, &t, portMAX_DELAY));         // Queue the DMA transfer
    spi_transaction_t *ret;
    ESP_ERROR_CHECK(spi_device_get_trans_result(spi, &ret, portMAX_DELAY));  // Block until it completes
}
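Two details of the DMA version are easy to get wrong: the byte order in the buffer (high byte first) and the fact that the transaction length is given in bits, not bytes. The following is a minimal sketch that isolates both so they can be checked on a PC without any ESP-IDF headers. The function names fill_rgb565_be() and rgb565_length_bits() are illustrative and do not appear in the article's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `pixels` copies of an RGB565 color into buf, high byte first.
 * buf must hold at least pixels * 2 bytes. */
void fill_rgb565_be(uint8_t *buf, size_t pixels, uint16_t color) {
    const uint8_t hi = (uint8_t)(color >> 8);
    const uint8_t lo = (uint8_t)(color & 0xFF);
    for (size_t i = 0; i < pixels; i++) {
        buf[i * 2]     = hi;
        buf[i * 2 + 1] = lo;
    }
}

/* Length value for an SPI transaction carrying `pixels` RGB565 pixels.
 * The ESP-IDF spi_transaction_t.length field is expressed in bits. */
size_t rgb565_length_bits(size_t pixels) {
    return pixels * 16u;
}
```

For the 240×320 screen, rgb565_length_bits(240 * 320) gives 1,228,800 bits, which matches the TFT_WIDTH * TFT_HEIGHT * 16 expression in the listing above.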

Practical Results (Same Full-Screen Fill Task):

Test Item	Traditional Way	DMA Way
Full-screen fill time	~1 second	Less than 1 second
CPU Idle Time	Nearly 0	Allows other tasks
Smoothness	Very choppy	Smooth
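To put the measured times in context, it helps to know the best case the bus allows. The sketch below computes how long the raw bits take to cross a one-bit-wide SPI bus at a given clock, ignoring all driver and protocol overhead. It is a back-of-the-envelope bound, not a measurement, and spi_wire_time_us() is a hypothetical helper name.

```c
#include <stdint.h>

/* Microseconds needed to clock `bytes` bytes over a 1-bit-wide SPI bus
 * running at `clock_hz`, ignoring all software and protocol overhead. */
uint64_t spi_wire_time_us(uint64_t bytes, uint64_t clock_hz) {
    return (bytes * 8u * 1000000u) / clock_hz;
}
```

With the 40 MHz clock configured in the complete code, a full 153,600-byte frame needs at least 30,720 µs on the wire, so roughly 32 full-screen fills per second is the ceiling for this setup. Real timings will be somewhat higher because of buffer filling and transaction setup.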

Complete Code

#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_log.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"

#define TAG "TFT_DMA"

// Hardware pin definitions
#define PIN_NUM_MISO  -1
#define PIN_NUM_MOSI  13
#define PIN_NUM_CLK   14
#define PIN_NUM_CS    4
#define PIN_NUM_DC    15
#define PIN_NUM_RST   2
#define PIN_NUM_LED   27

// Screen resolution
#define TFT_WIDTH     240
#define TFT_HEIGHT    320

// ST7789 command definitions
#define ST7789_SWRESET 0x01
#define ST7789_SLPOUT  0x11
#define ST7789_DISPON  0x29
#define ST7789_CASET   0x2A
#define ST7789_RASET   0x2B
#define ST7789_RAMWR   0x2C

spi_device_handle_t spi;
uint8_t *frame_buffer = NULL;

/* SPI pre-transfer callback: sets DC pin */
static void IRAM_ATTR spi_pre_transfer_callback(spi_transaction_t *t) {
    gpio_set_level(PIN_NUM_DC, (int)t->user);
}

/* Send command to TFT */
void tft_send_cmd(uint8_t cmd) {
    spi_transaction_t t = {
        .length = 8,
        .tx_buffer = &cmd,
        .user = (void*)0,
    };
    spi_device_polling_transmit(spi, &t);
}

/* Send data to TFT */
void tft_send_data(uint8_t *data, uint16_t len) {
    spi_transaction_t t = {
        .length = len * 8,
        .tx_buffer = data,
        .user = (void*)1,
    };
    spi_device_polling_transmit(spi, &t);
}

/* Set display window area */
void tft_set_window(uint16_t x1, uint16_t y1, uint16_t x2, uint16_t y2) {
    uint8_t buf[4];

    buf[0] = x1 >> 8; buf[1] = x1 & 0xFF;
    buf[2] = x2 >> 8; buf[3] = x2 & 0xFF;
    tft_send_cmd(ST7789_CASET);
    tft_send_data(buf, 4);

    buf[0] = y1 >> 8; buf[1] = y1 & 0xFF;
    buf[2] = y2 >> 8; buf[3] = y2 & 0xFF;
    tft_send_cmd(ST7789_RASET);
    tft_send_data(buf, 4);
}

/* Fill the entire screen with a color using DMA */
void tft_fill_color_dma(uint16_t color) {
    if (!frame_buffer) {
        frame_buffer = (uint8_t *)heap_caps_malloc(TFT_WIDTH * TFT_HEIGHT * 2, MALLOC_CAP_DMA);
        if (!frame_buffer) {
            ESP_LOGE(TAG, "Frame buffer allocation failed");
            return;
        }
    }

    // Fill DMA buffer with color (RGB565 format)
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        frame_buffer[i * 2]     = color >> 8;
        frame_buffer[i * 2 + 1] = color & 0xFF;
    }

    tft_set_window(0, 0, TFT_WIDTH - 1, TFT_HEIGHT - 1);
    tft_send_cmd(ST7789_RAMWR);

    // DMA SPI transfer for full frame
    spi_transaction_t t = {
        .length = TFT_WIDTH * TFT_HEIGHT * 16,
        .tx_buffer = frame_buffer,
        .user = (void*)1
    };

    ESP_ERROR_CHECK(spi_device_queue_trans(spi, &t, portMAX_DELAY));
    spi_transaction_t *ret;
    ESP_ERROR_CHECK(spi_device_get_trans_result(spi, &ret, portMAX_DELAY));
}

/* Initialize TFT display */
void tft_init(void) {
    // Configure GPIO
    gpio_config_t io_conf = {
        .pin_bit_mask = (1ULL << PIN_NUM_LED) | (1ULL << PIN_NUM_DC) | (1ULL << PIN_NUM_RST),
        .mode = GPIO_MODE_OUTPUT,
    };
    gpio_config(&io_conf);

    gpio_set_level(PIN_NUM_RST, 0);
    vTaskDelay(pdMS_TO_TICKS(100));
    gpio_set_level(PIN_NUM_RST, 1);
    vTaskDelay(pdMS_TO_TICKS(120));

    // Initialize SPI bus
    spi_bus_config_t buscfg = {
        .mosi_io_num = PIN_NUM_MOSI,
        .miso_io_num = PIN_NUM_MISO,
        .sclk_io_num = PIN_NUM_CLK,
        .quadwp_io_num = -1,
        .quadhd_io_num = -1,
        .max_transfer_sz = TFT_WIDTH * TFT_HEIGHT * 2,
    };
    ESP_ERROR_CHECK(spi_bus_initialize(SPI2_HOST, &buscfg, SPI_DMA_CH_AUTO));

    spi_device_interface_config_t devcfg = {
        .clock_speed_hz = 40 * 1000 * 1000,
        .mode = 0,
        .spics_io_num = PIN_NUM_CS,
        .queue_size = 7,
        .pre_cb = spi_pre_transfer_callback,
    };
    ESP_ERROR_CHECK(spi_bus_add_device(SPI2_HOST, &devcfg, &spi));

    // Initialize TFT controller
    tft_send_cmd(ST7789_SWRESET);
    vTaskDelay(pdMS_TO_TICKS(150));

    tft_send_cmd(ST7789_SLPOUT);
    vTaskDelay(pdMS_TO_TICKS(120));

    uint8_t colmod_cmd[] = {0x3A, 0x55}; // Set color mode to RGB565
    tft_send_cmd(colmod_cmd[0]);
    tft_send_data(&colmod_cmd[1], 1);

    tft_send_cmd(ST7789_DISPON);
    vTaskDelay(pdMS_TO_TICKS(120));

    gpio_set_level(PIN_NUM_LED, 1);
}

/* Main function */
void app_main(void) {
    tft_init();
    ESP_LOGI(TAG, "TFT initialized with DMA");

    while (1) {
        tft_fill_color_dma(0xF800); // Red
        vTaskDelay(pdMS_TO_TICKS(1000));

        tft_fill_color_dma(0x07E0); // Green
        vTaskDelay(pdMS_TO_TICKS(1000));

        tft_fill_color_dma(0x001F); // Blue
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

This code shows how to use SPI DMA transfers on the ESP32 to speed up a TFT LCD driven by an ST7789 controller. It lays the groundwork for fast screen refreshes and smooth rendering once LVGL is added on top.
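A full-frame buffer takes 153,600 bytes of DMA-capable internal RAM, which may not be available as one contiguous block once other components are running. One possible alternative, not used in the listing above, is to send the screen in bands of whole rows, setting the window for each band with tft_set_window() before its transfer. The sketch below only covers the bookkeeping for such a plan. The names chunk_count() and chunk_size() are hypothetical.

```c
#include <stddef.h>

/* Number of transfers needed to send total_bytes using a buffer of
 * chunk_bytes (ceiling division). chunk_bytes must be non-zero. */
size_t chunk_count(size_t total_bytes, size_t chunk_bytes) {
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

/* Size of the chunk at zero-based `index`; only the last chunk may be
 * shorter. index must be less than chunk_count(total_bytes, chunk_bytes). */
size_t chunk_size(size_t total_bytes, size_t chunk_bytes, size_t index) {
    size_t offset = index * chunk_bytes;
    size_t remaining = total_bytes - offset;
    return remaining < chunk_bytes ? remaining : chunk_bytes;
}
```

For example, a band of 20 rows is 240 × 20 × 2 = 9,600 bytes, and chunk_count(153600, 9600) gives 16 transfers per frame. For a solid-color fill the same small buffer can be reused for every band, since each band carries identical bytes.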

Compile and Flash

After writing the code, you can use the ESP-IDF tools to build, flash, and monitor:

In the ESP-IDF toolbar at the lower left of VS Code (or with idf.py build, idf.py flash, and idf.py monitor from a terminal):

Click Build project
Click Flash device
Click Monitor device

When the program starts, the TFT display will continuously cycle through three solid color screens, switching every second:

Red screen (0xF800): Red in RGB565 format, displayed for 1 second
Green screen (0x07E0): Displayed for 1 second
Blue screen (0x001F): Displayed for 1 second

Then the cycle repeats from the beginning.
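The three constants above follow from the RGB565 layout: 5 bits of red in the top of the 16-bit word, 6 bits of green in the middle, and 5 bits of blue at the bottom. As a small illustration, the helper below packs ordinary 8-bit-per-channel values into that layout. The function name rgb565() is an assumption for this sketch and is not part of the article's code.

```c
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into RGB565: 5 bits red, 6 green, 5 blue.
 * The low bits of each channel are discarded. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

Calling rgb565(255, 0, 0) yields 0xF800, rgb565(0, 255, 0) yields 0x07E0, and rgb565(0, 0, 255) yields 0x001F, the same values passed to tft_fill_color_dma() in app_main(). Any other color can be produced the same way.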

This means the entire screen is fully refreshed once per second, with the contents of the frame buffer being transferred directly to the LCD via DMA. This significantly reduces CPU intervention and minimizes potential stuttering.

Conclusion

In embedded systems that run LVGL, the CPU is a precious resource. CPU-driven SPI transfers are not only slow but can also cause noticeable system lag and sluggish UI responses. With DMA (Direct Memory Access), the entire screen transfer can be offloaded with minimal CPU involvement, which improves performance dramatically.

Although this article does not yet integrate LVGL directly, the low-level architecture presented here forms the performance-critical foundation for that integration. This DMA-powered backend is what will ultimately enable smooth and responsive rendering once LVGL is added.