Supercharge Your LVGL Render on ESP32 Using SPI DMA | Fast and Efficient
Sooner or later, every ESP32 UI developer runs into LVGL Render performance. If you have built a graphical interface with LVGL (Light and Versatile Graphics Library) on an ESP32, chances are you have seen slow screen refreshes, input lag, or visible flickering.
In many cases the problem is not LVGL itself but the way the underlying display driver transfers data to the screen. This article walks you through dramatically improving TFT LCD rendering efficiency on the ESP32 with SPI DMA (Direct Memory Access), laying a solid, high-performance foundation for later LVGL integration.

What is LVGL Render?
The core task of LVGL Render (Light and Versatile Graphics Library rendering) is to draw the user interface into pixel data in a draw buffer (a full or partial frame buffer), then hand those pixels to the display through a flush callback. This process is called rendering.
On the ESP32, this flush usually writes the pixel data over the SPI interface into the TFT LCD controller's RAM.
LVGL Render itself doesn't care how the flush sends the data: you can send it pixel by pixel or use DMA. With DMA, an entire block of the screen goes out in one transfer, which gives a large performance boost.
Brief Introduction to ESP32 SPI DMA Principle
DMA (Direct Memory Access) is a mechanism allowing hardware to transfer data directly between memory and peripherals (like SPI) without CPU intervention. For SPI:
- Without DMA: the CPU feeds the data into the SPI peripheral itself, chunk by chunk, and stays busy for the whole transfer.
- With DMA: the CPU configures the transfer once, then the DMA controller moves the entire frame buffer to the SPI peripheral on its own.
ESP-IDF's SPI master driver supports DMA and provides APIs such as spi_device_queue_trans() for submitting large blocks of screen data asynchronously. For a 240×320 TFT screen, this optimization is essential.
With vs Without DMA in LVGL Render
Item | Without DMA | With DMA |
---|---|---|
Data Transmission | CPU sends pixel by pixel | DMA transfers entire block |
CPU Usage | High | Low |
Screen Refresh Speed | Slow (noticeable lag) | Fast (close to hardware limit) |
LVGL User Experience | Laggy, frame drops | Smooth |
Code Complexity | Low | Slightly higher (DMA setup needed) |
Development Environment
Before you start programming, complete the following preparations:
- Install ESP-IDF (version 4.4 or higher): ESP-IDF is the official development framework for programming the ESP32, and it supports multiple operating systems such as Windows, macOS, and Linux.
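- ESP32 Development Board: any ESP32 board with the GPIOs used in the code below available.
- SPI TFT LCD: a 240×320 display with an SPI interface and an ST7789 controller, which is what the code below targets.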
ESP32 LVGL Project Structure
Create a clean ESP-IDF project for LVGL Render with DMA like this:
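tft_dma_lvgl_demo/
├── CMakeLists.txt
├── sdkconfig
└── main/
    ├── CMakeLists.txt   # Component registration (idf_component_register)
    ├── main.c           # Main program (all demo code in this article)
    └── tft_driver.c     # Optional: TFT driver module for a later refactor
This article keeps all of the SPI DMA demo logic in main.c. Later, you can move tft_fill_color_dma() and the other TFT helpers into tft_driver.c as a separate driver module.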
Code Comparison of Two Approaches
Traditional Version (Without DMA):
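void tft_fill_color(uint16_t color) {
    uint8_t color_data[2] = {color >> 8, color & 0xFF};  // Split 16-bit RGB565 color into high and low bytes
    tft_set_window(0, 0, TFT_WIDTH - 1, TFT_HEIGHT - 1);  // Target the whole screen
    tft_send_cmd(ST7789_RAMWR);                            // Start writing to display RAM
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        tft_send_data(color_data, 2);                      // One 2-byte SPI transaction per pixel
    }
}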
DMA Accelerated Version:
void tft_fill_color_dma(uint16_t color) {
    // frame_buffer: TFT_WIDTH * TFT_HEIGHT * 2 bytes allocated with MALLOC_CAP_DMA (see the complete code)
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        frame_buffer[i * 2] = color >> 8;        // High byte of color
        frame_buffer[i * 2 + 1] = color & 0xFF;  // Low byte of color
    }
    tft_set_window(0, 0, TFT_WIDTH - 1, TFT_HEIGHT - 1);
    tft_send_cmd(ST7789_RAMWR);
    spi_transaction_t t = {
        .length = TFT_WIDTH * TFT_HEIGHT * 16,   // Total length in bits (16 bits per pixel)
        .tx_buffer = frame_buffer,
        .user = (void*)1                          // DC high: pixel data
    };
    spi_device_queue_trans(spi, &t, portMAX_DELAY);          // Queue the DMA transfer
    spi_transaction_t *ret;
    spi_device_get_trans_result(spi, &ret, portMAX_DELAY);   // Wait for the transfer to complete
}
Practical Results (Same Full-Screen Fill Task):
Test Item | Traditional Way | DMA Way |
---|---|---|
Full-screen fill time | ~1 second | Less than 1 second |
CPU Idle Time | Nearly 0 | CPU free for other tasks during the transfer |
Smoothness | Very choppy | Smooth |
Complete Code
#include "driver/spi_master.h"
#include "driver/gpio.h"
#include "esp_err.h"
#include "esp_log.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"

#define TAG "TFT_DMA"

// Hardware pin definitions
#define PIN_NUM_MISO -1   // Not connected: the display is only written to
#define PIN_NUM_MOSI 13
#define PIN_NUM_CLK  14
#define PIN_NUM_CS   4
#define PIN_NUM_DC   15   // Data/command select
#define PIN_NUM_RST  2    // Hardware reset
#define PIN_NUM_LED  27   // Backlight

// Screen resolution
#define TFT_WIDTH  240
#define TFT_HEIGHT 320

// ST7789 command definitions
#define ST7789_SWRESET 0x01
#define ST7789_SLPOUT  0x11
#define ST7789_DISPON  0x29
#define ST7789_CASET   0x2A
#define ST7789_RASET   0x2B
#define ST7789_RAMWR   0x2C
#define ST7789_COLMOD  0x3A

spi_device_handle_t spi;
uint8_t *frame_buffer = NULL;   // Full-frame RGB565 buffer, allocated on first use

/* SPI pre-transfer callback: sets DC from t->user (0 = command, 1 = data) */
static void IRAM_ATTR spi_pre_transfer_callback(spi_transaction_t *t) {
    gpio_set_level(PIN_NUM_DC, (int)t->user);
}

/* Send a single command byte to the TFT (DC low) */
void tft_send_cmd(uint8_t cmd) {
    spi_transaction_t t = {
        .length = 8,                 // Length in bits
        .tx_buffer = &cmd,
        .user = (void*)0,
    };
    spi_device_polling_transmit(spi, &t);
}

/* Send parameter or pixel bytes to the TFT (DC high) */
void tft_send_data(uint8_t *data, uint16_t len) {
    spi_transaction_t t = {
        .length = len * 8,           // Length in bits
        .tx_buffer = data,
        .user = (void*)1,
    };
    spi_device_polling_transmit(spi, &t);
}

/* Set the display window (inclusive coordinates) for the next RAMWR */
void tft_set_window(uint16_t x1, uint16_t y1, uint16_t x2, uint16_t y2) {
    uint8_t buf[4];
    buf[0] = x1 >> 8; buf[1] = x1 & 0xFF;
    buf[2] = x2 >> 8; buf[3] = x2 & 0xFF;
    tft_send_cmd(ST7789_CASET);
    tft_send_data(buf, 4);
    buf[0] = y1 >> 8; buf[1] = y1 & 0xFF;
    buf[2] = y2 >> 8; buf[3] = y2 & 0xFF;
    tft_send_cmd(ST7789_RASET);
    tft_send_data(buf, 4);
}

/* Fill the entire screen with one color using a single DMA transfer */
void tft_fill_color_dma(uint16_t color) {
    if (!frame_buffer) {
        // 240 * 320 * 2 = 153,600 bytes of DMA-capable memory. A contiguous
        // block this large may not be available in every memory configuration.
        frame_buffer = (uint8_t *)heap_caps_malloc(TFT_WIDTH * TFT_HEIGHT * 2, MALLOC_CAP_DMA);
        if (!frame_buffer) {
            ESP_LOGE(TAG, "Frame buffer allocation failed");
            return;
        }
    }
    // Fill DMA buffer with color (RGB565, high byte first)
    for (int i = 0; i < TFT_WIDTH * TFT_HEIGHT; i++) {
        frame_buffer[i * 2] = color >> 8;
        frame_buffer[i * 2 + 1] = color & 0xFF;
    }
    tft_set_window(0, 0, TFT_WIDTH - 1, TFT_HEIGHT - 1);
    tft_send_cmd(ST7789_RAMWR);
    // DMA SPI transfer for the full frame
    spi_transaction_t t = {
        .length = TFT_WIDTH * TFT_HEIGHT * 16,   // Length in bits
        .tx_buffer = frame_buffer,
        .user = (void*)1                          // DC high: pixel data
    };
    ESP_ERROR_CHECK(spi_device_queue_trans(spi, &t, portMAX_DELAY));
    // Wait for completion; t lives on this stack frame, so it must not outlive this call
    spi_transaction_t *ret;
    ESP_ERROR_CHECK(spi_device_get_trans_result(spi, &ret, portMAX_DELAY));
}

/* Initialize the SPI bus and the ST7789 display */
void tft_init(void) {
    // Configure backlight, DC and RST pins as outputs
    gpio_config_t io_conf = {
        .pin_bit_mask = (1ULL << PIN_NUM_LED) | (1ULL << PIN_NUM_DC) | (1ULL << PIN_NUM_RST),
        .mode = GPIO_MODE_OUTPUT,
    };
    gpio_config(&io_conf);
    // Hardware reset pulse
    gpio_set_level(PIN_NUM_RST, 0);
    vTaskDelay(pdMS_TO_TICKS(100));
    gpio_set_level(PIN_NUM_RST, 1);
    vTaskDelay(pdMS_TO_TICKS(120));

    // Initialize SPI bus with DMA; max_transfer_sz must cover a full frame
    spi_bus_config_t buscfg = {
        .mosi_io_num = PIN_NUM_MOSI,
        .miso_io_num = PIN_NUM_MISO,
        .sclk_io_num = PIN_NUM_CLK,
        .quadwp_io_num = -1,
        .quadhd_io_num = -1,
        .max_transfer_sz = TFT_WIDTH * TFT_HEIGHT * 2,
    };
    ESP_ERROR_CHECK(spi_bus_initialize(SPI2_HOST, &buscfg, SPI_DMA_CH_AUTO));

    spi_device_interface_config_t devcfg = {
        .clock_speed_hz = 40 * 1000 * 1000,   // 40 MHz SPI clock
        .mode = 0,                             // SPI mode 0
        .spics_io_num = PIN_NUM_CS,
        .queue_size = 7,                       // Max queued transactions
        .pre_cb = spi_pre_transfer_callback,   // Drives DC before each transaction
    };
    ESP_ERROR_CHECK(spi_bus_add_device(SPI2_HOST, &devcfg, &spi));

    // Initialize TFT controller
    tft_send_cmd(ST7789_SWRESET);
    vTaskDelay(pdMS_TO_TICKS(150));
    tft_send_cmd(ST7789_SLPOUT);
    vTaskDelay(pdMS_TO_TICKS(120));
    uint8_t colmod = 0x55;                     // Color mode: 16 bits per pixel (RGB565)
    tft_send_cmd(ST7789_COLMOD);
    tft_send_data(&colmod, 1);
    tft_send_cmd(ST7789_DISPON);
    vTaskDelay(pdMS_TO_TICKS(120));
    gpio_set_level(PIN_NUM_LED, 1);            // Backlight on
}

/* Main function: cycle red, green, blue once per second */
void app_main(void) {
    tft_init();
    ESP_LOGI(TAG, "TFT initialized with DMA");
    while (1) {
        tft_fill_color_dma(0xF800);   // Red
        vTaskDelay(pdMS_TO_TICKS(1000));
        tft_fill_color_dma(0x07E0);   // Green
        vTaskDelay(pdMS_TO_TICKS(1000));
        tft_fill_color_dma(0x001F);   // Blue
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}
This code shows how to use SPI DMA transfers on the ESP32 to speed up a TFT LCD with an ST7789 controller, laying the groundwork for the fast, smooth screen refreshes that LVGL rendering depends on.
Compile and Flash
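After writing the code, build, flash, and monitor it with the ESP-IDF tools (from a terminal, idf.py -p <your serial port> flash monitor builds if needed, flashes, and opens the monitor).
In VS Code, use the ESP-IDF toolbar in the lower-left corner: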
- Click Build project
- Click Flash device
- Click Monitor device
When the program starts, the TFT display will continuously cycle through three solid color screens, switching every second:
- Red screen (0xF800): Red in RGB565 format, displayed for 1 second
- Green screen (0x07E0): Displayed for 1 second
- Blue screen (0x001F): Displayed for 1 second
Then the cycle repeats from the beginning.
In other words, the whole screen is redrawn once per second, and each time the frame buffer is sent to the LCD in a single DMA transaction instead of thousands of small ones. This greatly reduces CPU involvement and helps avoid stuttering.
Conclusion
In embedded systems running LVGL Render, the CPU is a precious resource. Traditional SPI data transfers are time-consuming and can cause noticeable system lag and sluggish UI responses. With DMA (Direct Memory Access), the entire screen transfer can be offloaded with minimal CPU involvement, dramatically improving performance.
Although this article does not yet integrate LVGL Render directly, the low-level architecture presented here forms the performance-critical foundation for future integration. It’s this DMA-powered backend that will ultimately enable smooth and responsive rendering when LVGL Render is fully implemented.