Recovery and restoration service:frontea online,corp.

Deep Abyss Audio (5) Mounting the UAC Engine: Building a Pro‑Audio‑Grade USB DAC with the ESP32‑S3 Core

2025/12/30

Hello again.

Welcome to Episode 5.

Today we’re finally installing the USB (UAC) engine on the DDC input side.

But before diving in, let me give you a quick update on the current completion level of this project. 😊


Project Status Update

As of now, this Pro‑Audio‑spec USB DAC has reached 100% completion on Windows hosts.

(iPhone sits at 99%.)

And when I say Pro‑Audio spec, I mean the level used in recording and editing studios—

and 100% means “surpassing the top tier of professional DAC equipment.”

Yes, you read that correctly.



At this moment, I can confidently say:

“This device has already reached the very top of professional audio gear.”


If you’re thinking:

  • “No way.”
  • “Impossible.”
  • “If something like that existed, I’d want one.”
  • “Just show me how it’s built.”
  • “I’m not even interested.”

—no matter which group you belong to,

as long as you have any interest in music,

bear with me and follow the article for a little while.

I promise I can take you to a place

where “the boundary of time becomes visible.”

(Yes, I’m sweating too.)

From here on, I’ll refer to this entire effort as the PopoDAC Project.


Revisiting the I2S Tachometer

Before we mount the UAC engine,

I made a small revision to the I2S monitoring module. 😅

The mounting method itself had no issues.

However, I decided to swap the timer type I was using.

This is a conversion from ESP Timer → GPTimer.


A small drama

“Coach, I can still go! I can still do this! Let me run!”

—ESPTimer

“I’m sorry, ESPT… you’re just not the one.”

—Me


Why switch timers if the specs are similar?

On paper, both ESPTimer and GPTimer have similar specs and practical accuracy.

Under normal circumstances, there would be no need to convert.

But this project is different.

To achieve real‑time behavior,

I need to run four engines across the two CPU cores:

  • UAC engine
  • I2S engine
  • Control engine
  • Monitoring engine

And here lies the issue:

Using ESPTimer introduces a small but critical disadvantage in CPU core placement.


This makes it harder to run the essential functions

in a stress‑free, reliable configuration.

GPTimer solves that.


GPTimer Begins

Here is the updated init_monitor() code after switching from ESPTimer to GPTimer:

void init_monitor(void)

{

    esp_err_t ret;

    ESP_LOGI(TAG, "init_monitor enter");


#if (TRANSFER_MODEL & TRANSFER_SYNCLK_PCNT)

(... same as previous version ...)

#endif


    // GPTimer configuration

    gptimer_config_t config = {

        .clk_src = GPTIMER_CLK_SRC_APB,

        .direction = GPTIMER_COUNT_UP,

        .resolution_hz = 1000000, // 1MHz = 1 tick = 1us

        .intr_priority = TASK_GPTIMER_INTR_PRIORITY, // 0 = let the driver pick a default level; 1..3 = explicit, 3 = highest

        .flags = {

            .intr_shared = 0,

            .allow_pd = 0,

            .backup_before_sleep = 0,

        }

    };

    ret = gptimer_new_timer(&config, &g_gptimer);

    ESP_LOGI(TAG, "%s to create %s", ret == ESP_OK?"Succeeded":"Failed", "gptimer");


    gptimer_alarm_config_t alarm_config = {

        .reload_count = 0,

        .alarm_count = 1000, // 1000us = 1ms

        .flags.auto_reload_on_alarm = true,

    };

    ret = gptimer_set_alarm_action(g_gptimer, &alarm_config);

    ESP_LOGI(TAG, "%s to set alarm time=%" PRIu64 "us", ret == ESP_OK?"Succeeded":"Failed", alarm_config.alarm_count); // alarm_count is uint64_t (PRIu64 from <inttypes.h>)

    

    gptimer_event_callbacks_t cbs = {

        .on_alarm = monitor_task_gpt_isr,

    };

    ret = gptimer_register_event_callbacks(g_gptimer, &cbs, NULL);

    ESP_LOGI(TAG, "%s to register callback %s", ret == ESP_OK?"Succeeded":"Failed", "monitor_task_gpt_isr");


    ret = gptimer_enable(g_gptimer);

    ESP_LOGI(TAG, "%s to enable %s", ret == ESP_OK?"Succeeded":"Failed", "gptimer");

    

    ret = gptimer_start(g_gptimer);

    ESP_LOGI(TAG, "%s to start %s", ret == ESP_OK?"Succeeded":"Failed", "gptimer");

}
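
The alarm arithmetic in the configuration above (1 MHz resolution, alarm at 1000 ticks = 1 ms) can be checked on a host machine. This is only a minimal sketch: `alarm_ticks_for_period_us()` is a hypothetical helper of my own naming, not an ESP-IDF API.

```c
#include <stdint.h>

/* Hypothetical helper (not part of ESP-IDF): convert a desired alarm period
 * in microseconds into a tick count for a timer running at resolution_hz. */
uint64_t alarm_ticks_for_period_us(uint32_t resolution_hz, uint32_t period_us)
{
    /* ticks = period [s] * resolution [Hz]; widen to 64 bits before multiplying
     * so the intermediate product cannot overflow. */
    return ((uint64_t)resolution_hz * period_us) / 1000000u;
}
```

With `resolution_hz = 1000000` and `period_us = 1000`, this gives the same `alarm_count` of 1000 used in `init_monitor()`.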


Switching from ESPTimer to GPTimer is not difficult.

However, there are important considerations.

GPTimer is a hardware timer (the ESP32-S3 has four of them), and its alarm callback runs in ISR context.

This means:

The callback must not touch the FPU, so no floating-point math inside it.


So the amount of work you can safely do inside the ISR must be strictly limited.
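
One generic way to keep a timer callback this small is to let it update integers and raise a flag, and to leave everything else to a normal task. The sketch below is plain C11 so it can be tried on a host; the names `monitor_isr_body()` and `monitor_task_poll()` are hypothetical, and real firmware would more likely use a FreeRTOS notification instead of a polled flag. It is not the PopoDAC strategy itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* State shared between the timer callback and a normal task. */
static atomic_uint_fast32_t g_tick_ms;      /* 1 ms alarms seen so far      */
static atomic_bool          g_work_pending; /* set by ISR, cleared by task  */

/* What the body of an alarm callback could be limited to:
 * integer updates and a flag. No floating point, no logging, no blocking. */
bool monitor_isr_body(void)
{
    atomic_fetch_add_explicit(&g_tick_ms, 1u, memory_order_relaxed);
    atomic_store_explicit(&g_work_pending, true, memory_order_release);
    return false; /* a GPTimer callback returns whether a higher-priority task was woken */
}

/* Task side: heavier work (float math, logging) belongs here, outside the ISR.
 * Returns true and stores the tick count if the ISR has fired since the last poll. */
bool monitor_task_poll(uint32_t *ticks_out)
{
    if (!atomic_exchange_explicit(&g_work_pending, false, memory_order_acquire)) {
        return false;
    }
    *ticks_out = (uint32_t)atomic_load_explicit(&g_tick_ms, memory_order_relaxed);
    return true;
}
```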

Additionally, unlike the old TimerGroup era (two timers),

you cannot manually pin GPTimer to a specific CPU core.

For this project, we overcome both constraints simultaneously with a particular strategy—

which I’ll explain shortly.


Selecting the USB (UAC) Driver

Now, today’s main topic: choosing the USB driver.

For USB, you can choose:

  • an existing driver
  • a custom implementation
  • or something in between

But since we’re on Espressif/IDF, the answer is simple:

TinyUSB. No hesitation needed.


TinyUSB is extremely reliable,

and unless you have very special requirements,

there’s no need to modify it or write your own UAC driver.

However…

Remember the development environment we chose for this project?

  • Microsoft Visual Studio Code
  • PlatformIO
  • NOT Espressif‑IDE

And yes—

this is where several traps are hidden.

We must understand them clearly and avoid stepping on them.


Espressif vs PlatformIO — The Problem

Just like Microsoft vs Apple,

here we go again: another faction war. 😅

(How many of these do we need…)

The ESP-IDF you get inside a Visual Studio Code PlatformIO project

is effectively a slightly cut-down version of the real thing.

Even if the framework version numbers look similar,

they are not the same.

For this project, we proceed with the “discount version.”

TinyUSB is also affected by this divide.

You cannot use the TinyUSB bundled with Espressif’s distribution.


Instead, we use:

The official upstream TinyUSB.


These two are similar but not identical,

so keep that in mind.

(And honestly, if the Hardware I/O implementation is solid,

the upstream TinyUSB is the better choice—

clean, unpolluted, and delicious.)


USB (UAC) Engine — Standby

Alright, let’s jump straight into mounting and starting the USB engine.


usb_init()

void usb_init(void)

{

    // PHY configuration

    usb_phy_config_t phy_conf = {

        .controller = USB_PHY_CTRL_OTG,

        .otg_mode = USB_OTG_MODE_DEVICE,

        .target = USB_PHY_TARGET_INT,

#if CONFIG_TINYUSB_RHPORT_HS

        .otg_speed = USB_PHY_SPEED_HIGH,

#endif

    };


    usb_phy_handle_t phy_hdl;

    esp_err_t ret = usb_new_phy(&phy_conf, &phy_hdl);

    if (ret != ESP_OK) {

        ESP_LOGE(TAG, "USB PHY init failed: %s", esp_err_to_name(ret));

        return;

    }


    tusb_init();


    #if (TRANSFER_MODEL&TRANSFER_FEEDBACK_SOFISR)

    // Enabling SOF callback must be done after usb_init()

    //tud_sof_cb_enable(true);

    #endif

}


Initialization here follows the standard, straightforward bring‑up procedure.


The Importance of Start‑Of‑Frame (SOF)

But there is one extremely important point we need to talk about.

Notice that:

tud_sof_cb_enable(true);


is commented out — meaning we do not (and cannot) use it.

This is a big deal.

SOF is the single most important factor for correctly controlling UAC.

And we cannot use it directly.

Yes, I cried too.


This ties directly into the philosophy of upstream TinyUSB.

In TinyUSB, when USB Audio Class is active (that is, once the host has read the Configuration Descriptor and configured the device),

TinyUSB internally delegates SOF to the Feedback ISR.


Some overseas users try to “free” SOF by modifying the stack,

but there is no need to fight the design.

Just follow the philosophy:

Choose the UAC mode that uses Asynchronous Feedback.

That’s all.


usb_task()

void usb_task(void *param)

{

    // Exclude this task from the Task Watchdog

    esp_task_wdt_delete(NULL);


    ESP_LOGI(TAG, "usb_task enter");

 

    while (1) {

        tud_task();   // TinyUSB device task

    }

}


There isn’t much to explain here.

I added a small charm spell to escape the Task Watchdog. 😅

Inside the while loop, the rule is absolute:

Never insert vTaskDelay(), never add extra processing.


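tud_task() wants to run strictly in sync with SOF.

With TinyUSB's FreeRTOS port it blocks internally until the next USB event arrives, so the loop needs no delay of its own; it only has to be ready to run again immediately.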

“Add nothing.”

That is the truth.


tud_audio_feedback_params_cb() — Today’s Most Important Function

Here is the key function of the day:

void tud_audio_feedback_params_cb(uint8_t func_id, uint8_t alt_itf, audio_feedback_params_t* feedback_param)

{

    (void)func_id;

    (void)alt_itf;


#if (TRANSFER_MODEL&TRANSFER_FEEDBACK_SOFFIFO) == TRANSFER_FEEDBACK_SOFFIFO

    // Set feedback method to fifo counting

    // TinyUSB handles the response automatically; no further work needed on the application side

    feedback_param->method = AUDIO_FEEDBACK_METHOD_FIFO_COUNT;

    feedback_param->sample_freq = g_ddc.uac_quality.sample_rate;

#else

    // Set feedback method to MCLK pulse counting, driven by the device's own clock

    feedback_param->method = AUDIO_FEEDBACK_METHOD_FREQUENCY_FIXED;

    feedback_param->sample_freq = g_ddc.uac_quality.sample_rate;

    feedback_param->frequency.mclk_freq = g_cnt_ideal.mclk;

#endif


    ESP_LOGI(TAG, "tud_audio_feedback_params_cb %d, sample freq: %" PRIu32, feedback_param->method, feedback_param->sample_freq);

}


Why This Callback Matters

tud_audio_feedback_params_cb() is the only hook TinyUSB provides

to influence how the Audio SOF ISR behaves.

This is where you declare your feedback strategy.

And generally, you have two choices:


1. AUDIO_FEEDBACK_METHOD_FIFO_COUNT

This method reports the sample rate via FIFO depth.

In practice, this means:

“TinyUSB, please handle feedback for me.”


You hand over responsibility to the stack.

Simple, safe, and perfectly fine for consumer‑grade devices.
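
To make "feedback from FIFO depth" concrete, here is a small host-side sketch of the idea: nudge the nominal samples-per-frame value up when the FIFO runs low and down when it runs high. This is an illustrative proportional controller only, NOT TinyUSB's internal algorithm; the function name, the `gain` parameter and the clamp of one sample per frame are all my own assumptions.

```c
#include <stdint.h>

/* Illustrative proportional controller (not TinyUSB's implementation).
 * nominal_fb : samples per frame in 16.16 fixed point (96 kHz, 1 ms -> 96 << 16)
 * fifo_level : samples currently buffered
 * fifo_target: desired fill level in samples
 * gain       : 16.16 correction per sample of error (hypothetical tuning value) */
uint32_t fifo_feedback_16_16(uint32_t nominal_fb, int32_t fifo_level,
                             int32_t fifo_target, int32_t gain)
{
    const int64_t limit = 1 << 16; /* clamp to +/- one sample per frame */
    int64_t corr = (int64_t)(fifo_target - fifo_level) * gain;

    if (corr > limit)  corr = limit;
    if (corr < -limit) corr = -limit;

    /* FIFO too full -> negative correction -> host sends slightly less. */
    return (uint32_t)((int64_t)nominal_fb + corr);
}
```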


2. AUDIO_FEEDBACK_METHOD_FREQUENCY_FIXED

This method uses a precise MCLK pulse counter

and returns feedback values generated by your own hardware.

This is effectively saying:

“Call me via tud_audio_feedback_interval_isr(),

I will provide the exact feedback myself.”


This is the path for people who want:

  • Pro‑Audio behavior
  • Sample‑accurate drift control
  • True asynchronous feedback
  • Studio‑grade stability

In other words:

If you want a Pro‑Audio USB DAC,

this is the only correct choice.


And yes — that’s the one we choose.
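
The arithmetic behind MCLK pulse counting can be sketched on a host. The example assumes MCLK = 256 × fs (an assumed ratio for illustration; use whatever your clock tree provides) and 1 ms USB frames, and the function names are hypothetical. It shows how a cycle count becomes a 16.16 samples-per-frame value, and how that maps to the 3-byte 10.14 format a full-speed feedback endpoint carries.

```c
#include <stdint.h>

/* 16.16 samples-per-frame from an MCLK cycle count.
 * mclk_cycles     : MCLK pulses counted during the measurement window
 * mclk_per_sample : MCLK / fs ratio (256 assumed in the example below)
 * frames          : length of the window in 1 ms USB frames */
uint32_t feedback_16_16_from_mclk(uint32_t mclk_cycles, uint32_t mclk_per_sample,
                                  uint32_t frames)
{
    uint64_t scaled = (uint64_t)mclk_cycles << 16; /* widen before shifting */
    return (uint32_t)(scaled / ((uint64_t)mclk_per_sample * frames));
}

/* Full-speed feedback endpoints carry 3 bytes in 10.14 format:
 * drop the two lowest fractional bits of the 16.16 value. */
uint32_t feedback_to_10_14(uint32_t fb_16_16)
{
    return fb_16_16 >> 2;
}
```

At 96 kHz, 24576 MCLK cycles in one frame yields exactly 96 samples per frame; a single extra cycle shifts the 16.16 value by 256, which is the resolution this kind of counter gives you.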


A Third Option Exists (XMOS Style)

Interestingly, there is another concept used by XMOS:

AUDIO_FEEDBACK_METHOD_FREQUENCY_FLOAT

In upstream TinyUSB, this is the floating-point sibling of FREQUENCY_FIXED.

The device still supplies MCLK cycle counts,

but the stack converts them into the feedback value with float arithmetic instead of fixed-point.

It’s a floating-point feedback model.

Fascinating approach, right?


Configuration Descriptor — The Part Everyone Wants to See

For many readers—especially those who already know USB Audio Class—

this is probably the part you’ve been waiting for.

I understand that feeling very well. 😄


Here is the function:

void MakeConfigurationDescriptor(int32_t samplerate, int8_t channels, int8_t bits, int8_t samplebytes) 

{

#if CONFIG_UAC_VERSION==CONFIG_UAC_VERSION10

  // ---------------- Configuration Descriptor ----------------

  uint8_t desc_configuration[] =

  {

    // Configuration Descriptor

    0x09, TUSB_DESC_CONFIGURATION,              // bLength, bDescriptorType (CONFIGURATION)

#if CFG_TUD_AUDIO_ENABLE_FEEDBACK_EP == 1

    U16_BYTES(0x6D + 9),          // wTotalLength (109 + 9 = 118, with feedback endpoint)

#else

    U16_BYTES(0x6D),              // wTotalLength(109)

#endif


    0x02,                    // bNumInterfaces (AC + AS)

    0x01,                    // bConfigurationValue

    0x00,                    // iConfiguration

    0x80,                    // bmAttributes (Bus Powered)

    0x32,                    // MaxPower (100mA)


    // ===== Interface 0: Audio Control =====

    0x09, 0x04,              // bLength, bDescriptorType (INTERFACE)

    0x00,                    // bInterfaceNumber

    0x00,                    // bAlternateSetting

    0x00,                    // bNumEndpoints

    0x01,                    // bInterfaceClass (Audio)

    0x01,                    // bInterfaceSubClass (Audio Control)

    0x00,                    // bInterfaceProtocol

    0x00,                    // iInterface


    // Audio Control Header

    0x09, TUSB_DESC_CS_INTERFACE, 0x01,        // bLength, bDescriptorType=CS_INTERFACE, HEADER

    U16_BYTES(CONFIG_UAC_VERSION),              // bcdADC = 1.00

    U16_BYTES(0x27),              // wTotalLength = 39 bytes

    0x01,                    // bInCollection

    0x01,                    // baInterfaceNr[1] = 1


    // Input Terminal (USB Streaming)

    0x0C, TUSB_DESC_CS_INTERFACE, 0x02,        // bLength, CS_INTERFACE, INPUT_TERMINAL

    UAC_ENTITY_INPUT_TERMINAL,                    // bTerminalID

    U16_BYTES(AUDIO_TERM_TYPE_USB_STREAMING),              // wTerminalType = USB Streaming

    0x00,                    // bAssocTerminal

    channels,                    // bNrChannels = 2

    0x03, 0x00,              // wChannelConfig = Left + Right

    0x00,                    // iChannelNames

    0x00,                    // iTerminal


    // Feature Unit (Mute + Volume)

    0x09, TUSB_DESC_CS_INTERFACE, 0x06,        // bLength, CS_INTERFACE, FEATURE_UNIT

    UAC_ENTITY_FEATURE_UNIT,                    // bUnitID

    0x01,                    // bSourceID (Input Terminal)

    0x01,                    // bControlSize

    0x03,                    // bmaControls[master] (Mute + Volume)

    0x00,                    // bmaControls[channel 0]

    0x00,                    // bmaControls[channel 1]


    // Output Terminal (Speaker)

    0x09, TUSB_DESC_CS_INTERFACE, 0x03,        // bLength, CS_INTERFACE, OUTPUT_TERMINAL

    UAC_ENTITY_OUTPUT_TERMINAL,                    // bTerminalID

    U16_BYTES(AUDIO_TERM_TYPE_OUT_GENERIC_SPEAKER),              // wTerminalType = Speaker

    0x00,                    // bAssocTerminal

    0x02,                    // bSourceID (Feature Unit)

    0x04,                    // iTerminal → "PopoDAC Speaker"


    // ===== Interface 1: Audio Streaming =====

    // Alt0 (no endpoints)

    0x09, TUSB_DESC_INTERFACE,              // bLength, INTERFACE

    0x01,                    // bInterfaceNumber

    0x00,                    // bAlternateSetting

    0x00,                    // bNumEndpoints

    0x01,                    // bInterfaceClass (Audio)

    AUDIO_SUBCLASS_STREAMING,                    // bInterfaceSubClass (Audio Streaming)

    0x00,                    // bInterfaceProtocol

    0x00,                    // iInterface


    // Alt1 (with one OUT endpoint)

    0x09, TUSB_DESC_INTERFACE,              // bLength, INTERFACE

    0x01,                    // bInterfaceNumber

    0x01,                    // bAlternateSetting

    (CFG_TUD_AUDIO_ENABLE_FEEDBACK_EP==1?2:1), // bNumEndpoints (OUT only, or OUT + FB)

    0x01,                    // bInterfaceClass (Audio)

    AUDIO_SUBCLASS_STREAMING,                    // bInterfaceSubClass (Audio Streaming)

    0x00,                    // bInterfaceProtocol

    0x00,                    // iInterface


    // AS General

    0x07, TUSB_DESC_CS_INTERFACE, 0x01,               // bLength, CS_INTERFACE, AS_GENERAL

    UAC_ENTITY_INPUT_TERMINAL,    // bTerminalLink (Input Terminal ID=1)

    0x01,                           // bDelay

    0x01, 0x00,                     // wFormatTag = PCM


    // Format Type

    0x0B, TUSB_DESC_CS_INTERFACE, 0x02,        // bLength, CS_INTERFACE, FORMAT_TYPE

    AUDIO20_FORMAT_TYPE_I,   // bFormatType = FORMAT_TYPE_I

    channels,                    // bNrChannels = 2

    samplebytes,                    // bSubframeSize = 3 bytes

    bits,                    // bBitResolution = 24

    0x01,                    // bSamFreqType = 1 (Discrete)

    U24_TO_U8S_LE(samplerate),        // tSamFreq[1] = 96000 Hz


    // Endpoint Descriptor (Isochronous OUT)

    0x09, TUSB_DESC_ENDPOINT,                       // bLength, ENDPOINT

    CFG_TUD_AUDIO_FUNC_1_EP_OUT,      // bEndpointAddress = EP1 OUT

    UAC_EP_OUT_ATTRIBUTE_USE, // bmAttributes = Isochronous, Data; sync type should be Asynchronous when the feedback EP is enabled

    U16_BYTES(((samplerate/1000 + 1)*channels*samplebytes)), // wMaxPacketSize: one 1 ms frame plus one sample of headroom (582 bytes at 96 kHz / 2 ch / 3 bytes)

    0x01,                    // bInterval = 1

    0x00,                    // bRefresh

#if CFG_TUD_AUDIO_ENABLE_FEEDBACK_EP == 1 // bSynchAddress

    CFG_TUD_AUDIO_FUNC_1_EP_FB,

#else

    0,

#endif


#if CFG_TUD_AUDIO_ENABLE_FEEDBACK_EP == 1

    // Feedback Endpoint Descriptor

    0x09,                   // bLength

    TUSB_DESC_ENDPOINT,                   // bDescriptorType = ENDPOINT

    CFG_TUD_AUDIO_FUNC_1_EP_FB,  // bEndpointAddress = IN, EP2

    UAC_EP_OUT_ATTRIBUTE_FEEEDBACK, // bmAttributes = Isochronous, No Synchronization, Usage = Feedback

    0x03, 0x00,             // wMaxPacketSize = 3 bytes (feedback reports are 3 bytes)

    UAC_EXPLICIT_FB_INTERVAL/*0x01*/,                   // bInterval = 1 (1ms)

    0x00,                   // bRefresh

    0x00,                   // bSynchAddress = 0

#endif


    // Class-Specific Audio Data Endpoint

    0x07, TUSB_DESC_CS_ENDPOINT, 0x01,        // bLength, CS_ENDPOINT, EP_GENERAL

#if CFG_TUD_AUDIO_ENABLE_FEEDBACK_EP == 1

    AUDIO10_CS_AS_ISO_DATA_EP_ATT_NON_MAX_PACKETS_OK,                    // bmAttributes

    AUDIO10_CS_AS_ISO_DATA_EP_LOCK_DELAY_UNIT_MILLISEC,                    // bLockDelayUnits

    U16_BYTES(0)             // wLockDelay

#else

    0x00,

    0x00,

    U16_BYTES(0)             // wLockDelay

#endif

};

#else

(...UAC2 descriptor)

#endif


  if (g_desc_configuration != NULL)

    free(g_desc_configuration);


  g_desc_configuration = malloc(sizeof(desc_configuration));

  if (g_desc_configuration != NULL)  // copy only if the allocation succeeded

    memcpy(g_desc_configuration, desc_configuration, sizeof(desc_configuration));

}
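
Because the descriptor is assembled by hand, the length fields are easy to get wrong (wTotalLength is 109 bytes without the feedback endpoint, 118 with it). A small host-side checker can catch that. The sketch below is my own illustration, not part of TinyUSB or PopoDAC: `config_desc_is_consistent()` is a hypothetical name, and it only verifies that the chained bLength fields add up to the buffer size and to wTotalLength.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a configuration descriptor blob and check that the chained bLength
 * fields add up to both the buffer size and the wTotalLength field
 * (bytes 2..3, little-endian). Returns 1 when consistent, 0 otherwise. */
int config_desc_is_consistent(const uint8_t *desc, size_t size)
{
    if (size < 9 || desc[0] != 9 || desc[1] != 0x02) {
        return 0; /* not a configuration descriptor header */
    }

    size_t total = (size_t)desc[2] | ((size_t)desc[3] << 8);
    size_t pos = 0;

    while (pos < size) {
        uint8_t len = desc[pos];
        if (len < 2 || pos + len > size) {
            return 0; /* malformed or truncated sub-descriptor */
        }
        pos += len;
    }

    return pos == size && total == size;
}
```

Running it over the freshly built buffer, with the same size that was passed to malloc(), is a cheap sanity check before handing the descriptor to the USB stack.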


But… copying this won’t get you anywhere near Pro‑Audio quality

You can imitate this descriptor all you want,

but that alone will never get you to Pro‑Audio behavior.

Why?

Because the real reason will only become clear

when you eventually collide with the PopoDAC philosophy

later in this series.

(That will be fun.)


What’s Actually Different Here

The descriptor layout itself is standard.

You’ve probably seen similar structures in many USB Audio examples.

The key difference is:

PopoDAC generates the descriptor dynamically inside MakeConfigurationDescriptor().


And this exposes the only real weakness of TinyUSB’s Audio Class implementation.

Not a bug — a design philosophy.

TinyUSB assumes:

  • Audio descriptors are static
  • Audio format does not change at runtime
  • Sample rate is fixed at enumeration

But PopoDAC does not follow that assumption.

PopoDAC supports dynamic Audio Quality Presets.

Therefore the descriptor must be rebuilt dynamically.
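
To illustrate what "rebuilt dynamically" implies for the numbers inside the descriptor, here is a host-side sketch of a preset record and the endpoint size derived from it. The struct and its field names are hypothetical (they are not PopoDAC's actual types), and the one-sample headroom is a common convention for asynchronous endpoints rather than something taken from this project.

```c
#include <stdint.h>

/* Hypothetical preset record; field names are illustrative only. */
typedef struct {
    uint32_t sample_rate;  /* Hz */
    uint8_t  channels;
    uint8_t  sample_bytes; /* bSubframeSize */
} audio_preset_t;

/* Largest isochronous OUT packet for one 1 ms frame, with one extra sample
 * of headroom so the host can send more when feedback asks for it. */
uint16_t preset_max_packet(const audio_preset_t *p)
{
    uint32_t samples = p->sample_rate / 1000u + 1u;
    return (uint16_t)(samples * p->channels * p->sample_bytes);
}
```

Note how 44.1 kHz is the awkward case: 44100 / 1000 truncates to 44, yet every tenth frame must carry 45 samples, so the extra sample is needed even before feedback comes into play.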


This is where TinyUSB’s philosophy and PopoDAC’s philosophy diverge.

And this divergence is exactly what leads to the deeper

Pro‑Audio specification philosophy

that we will soon explore.


We’re almost ready to explain the PopoDAC Pro‑Audio Philosophy

Just a few more component choices remain.

After that, we can finally talk about the core philosophy

that makes PopoDAC fundamentally different from typical USB DACs.