Recovery and restoration service:frontea online,corp.

Deep Abyss Audio (2) DDC Architecture and MCLK Design: Building a Pro‑Audio USB DAC with an ESP32‑S3 Core (DIY)

2025/12/24

Hello again.

In Part 2, we will design the USB‑DDC (the “copy‑MOS” section, which I also call the “Pakuri‑MOS”) and the MCLK architecture on the I2S side.


"Wait… you design the MCLK yourself? Unless you use an external clock, doesn’t the ESP32‑S3 framework simply divide down its internal clock source?"


You might think so.


The answer is both YES and NO.


But since we are aiming for Pro Audio‑grade performance, we will not leave this to the framework. We will design it ourselves—down to this level.


Core Architecture of a USB‑DDC

Before diving into the implementation, let’s review the essential requirements of a USB‑DAC‑oriented DDC and establish the fundamental architecture of the “copy‑MOS” block.


At a high level, there are two promising approaches:

  • Adaptive mode
  • Feedback (Async) mode

TinyUSB supports both.


We also have the choice of UAC1.0 vs UAC2.0, but for this project we will implement UAC1.0. (The reasons will be explained another time.)

Adaptive‑Type USB DAC

In short, what is a Digital‑to‑Digital Converter (DDC)?


Ultimately, it is nothing more than a buffer-to-buffer copy.

In the diagram, the functions that must be implemented inside the ESP32‑S3 are highlighted in pink.



If each processing block behaves ideally—and if the host allows it—this architecture can deliver extremely clear, highly reproducible, luxurious sound quality.


In this mode, the final audio quality is determined by the SRC (Sample Rate Conversion) inside the AdjustRate block.


Characteristics:

  • Behavior: Device finely adjusts to the host’s data rate
  • Host affinity: Works well with Windows, poorly with iPhone
  • Quality preservation: SRC required on the DDC side; DAC must follow the USB host clock unconditionally
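The AdjustRate block itself is not shown in this part. To make “finely adjusts to the host’s data rate” concrete, here is a minimal sketch, assuming the correction is driven by the fill level of the FIFO between the USB side and the I2S side. The type adjust_rate_t, its gain and its clamp are hypothetical, not the author’s implementation; a real controller would usually smooth the fill level before using it.

```c
#include <assert.h>

/* Hypothetical AdjustRate controller (an illustration, not the author's code).
 * It maps the FIFO fill level to an SRC ratio close to 1.0, where the ratio is
 * output samples produced per input sample. */
typedef struct {
    double target_fill;  /* desired FIFO level in samples, e.g. half the buffer */
    double kp;           /* proportional gain: ratio change per sample of error */
    double max_ppm;      /* largest correction allowed, in ppm */
} adjust_rate_t;

static double adjust_rate_ratio(const adjust_rate_t *ar, double fill)
{
    assert(ar->max_ppm >= 0.0);
    /* FIFO above target: the host is ahead, so produce fewer output samples
     * per input sample (ratio < 1), which drains the FIFO; below target, the opposite. */
    double corr = ar->kp * (ar->target_fill - fill);
    double lim = ar->max_ppm * 1e-6;
    if (corr > lim)
        corr = lim;
    if (corr < -lim)
        corr = -lim;
    return 1.0 + corr;
}
```

With target_fill = 512, kp = 1e-6 and max_ppm = 200, a fill of 612 samples gives a ratio of 0.9999 (-100 ppm), and any larger excess is clamped at -200 ppm. The clamp keeps a momentary burst from turning into an audible pitch change.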


Feedback‑Type USB DAC

Compared to Adaptive mode, the architecture differs slightly.

In this mode, an additional endpoint is used in the UAC protocol:

  • Feedback Endpoint (Feedback EP)

Its direction is opposite to that of the Output EP: it carries data from the device to the host (an IN endpoint).

With the Feedback EP, the copy‑MOS implementation changes: the device can now report its DDC clock status back to the host.



In other words:

“Hey brother, match your pace with mine.” “Got it, I’ll adjust on my side too.”


Characteristics:

  • Behavior: Device becomes the clock source; host follows the feedback
  • Host affinity: Poor with Windows, excellent with iPhone
  • Quality preservation: Accurate real‑time measurements must be fed back; SRC still required on the DDC side
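For reference, on a full-speed device such as the ESP32‑S3, the USB 2.0 specification expresses the feedback value as samples per 1 ms frame in 10.14 fixed point, sent as 3 bytes. The sketch below derives it from a count of samples consumed by the I2S side over a measurement interval. The function names are hypothetical, and how the value is handed to TinyUSB is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Full-speed feedback value in 10.14 fixed point (samples per 1 ms frame),
 * computed from the samples the I2S side consumed during interval_ms. */
static uint32_t uac_fb_10_14(uint64_t samples, uint32_t interval_ms)
{
    assert(interval_ms > 0);
    /* (samples / interval_ms) * 2^14, rounded to nearest */
    return (uint32_t)(((samples << 14) + interval_ms / 2) / interval_ms);
}

/* The 10.14 value travels as 3 bytes, least significant byte first. */
static void uac_fb_pack3(uint32_t fb, uint8_t out[3])
{
    out[0] = (uint8_t)(fb & 0xFF);
    out[1] = (uint8_t)((fb >> 8) & 0xFF);
    out[2] = (uint8_t)((fb >> 16) & 0xFF);
}
```

Counting 96,000 samples over 1,000 ms gives 0x180000 (exactly 96.0 samples per frame); counting 96,010 gives 0x1800A4, which asks the host to send slightly more data. The longer the measurement interval, the finer the resolution of the reported rate.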


Which Should You Choose?

Looking at the core of UAC1.0 DDC design, a major Pro Audio challenge becomes clear:


Host affinity: Microsoft vs Apple


A battle that has lasted a quarter century.

The answer is simple:

  • For the highest playback quality on iPhone → Feedback mode
  • For Windows‑only systems → Adaptive mode (simpler and easier to integrate)

Why does this happen?

It comes from the design philosophy of each OS.


Windows

  • Does not obsess over exact data rate
  • Flexible toward clock drift
  • “If it’s roughly correct, let’s keep going.”

Apple (iPhone)

  • Absolute trust in its own internal clock
  • Does not tolerate drift
  • Adjusts data transmission immediately when deviation is detected

The difference in tolerance is enormous: on the order of hundreds of ppm within 1 ms.


Example: Building an Adaptive‑Type DDC

Suppose we implement:


“Read TinyUSB buffer using the I2S clock, then immediately write to I2S.”


This seems like perfect UAC‑I2S synchronization. You may feel there is no data loss at all.


However, in real operation:

  • USB‑side DMA
  • I2S‑side DMA

These two buffering systems inevitably produce tiny underflows/overflows and cycle drift.
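To put a number on this drift: if the USB side and the I2S side disagree by d ppm, a FIFO cushion of M samples is used up (or overrun) after M / (fs × d × 10⁻⁶) seconds. A small illustrative helper, with example numbers rather than measurements from this project:

```c
#include <assert.h>

/* Illustrative helper: seconds until a FIFO cushion of margin_samples is
 * used up (underflow) or filled (overflow) when the producer and consumer
 * clocks differ by ppm parts per million. */
static double seconds_until_xrun(double margin_samples, double fs_hz, double ppm)
{
    assert(fs_hz > 0.0 && ppm > 0.0);
    double drift_per_second = fs_hz * ppm * 1e-6;  /* samples gained or lost per second */
    return margin_samples / drift_per_second;
}
```

At 96 kHz, a 1 ms cushion (96 samples) with a 100 ppm mismatch lasts only about 10 seconds, and even a 10 ms cushion lasts about 100 seconds. No fixed buffer size hides the mismatch; something has to absorb it.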


On Windows

Even if the host clock fluctuates:

  • Windows tolerates it
  • Does not change its data transmission behavior
  • Result: maybe tiny clicks, but overall acceptable playback

On iPhone

  • Extremely strict
  • Even slight deviation triggers host‑side rate adjustment
  • DDC receives drifting buffers every cycle
  • Sending them directly to I2S produces many audible artifacts

The audible result is:


“Brrip… pssst…” — click noise.


Therefore, SRC must be inserted as shown in the diagram.
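The SRC can take many forms. As the simplest possible illustration, the sketch below resamples by linear interpolation with a fractional read position, so a step slightly above or below 1.0 shrinks or stretches the stream. This is an assumption made for brevity, not the author’s SRC: linear interpolation is exactly the kind of re-processing that softens the sound, and a Pro Audio SRC would use a properly band-limited filter (for example a polyphase FIR).

```c
#include <assert.h>
#include <stddef.h>

/* Linear-interpolation resampler state: phase is the fractional read
 * position relative to in[0] at the start of the next call. */
typedef struct {
    double phase;
} linsrc_t;

/* step = input samples advanced per output sample (1/ratio). Produces at most
 * out_max samples; *consumed receives how many input samples the caller may
 * drop. in[*consumed..in_len-1] must be kept and prepended to the next block. */
static size_t linsrc_run(linsrc_t *s, double step, const float *in, size_t in_len,
                         float *out, size_t out_max, size_t *consumed)
{
    assert(step > 0.0 && step < 2.0);  /* small corrections only; keeps *consumed <= in_len */
    size_t n = 0;
    double pos = s->phase;
    while (n < out_max && (size_t)pos + 1 < in_len) {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        out[n++] = (float)(in[i] + (in[i + 1] - in[i]) * frac);  /* interpolate between neighbours */
        pos += step;
    }
    *consumed = (size_t)pos;
    s->phase = pos - (double)*consumed;
    return n;
}
```

In normal operation the step stays within a few hundred ppm of 1.0; the more often and the further it moves away from 1.0, the more of the output is interpolated rather than original PCM.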


Even with SRC working well, on iPhone:

  • SRC activation frequency increases
  • The magnitude of each correction increases

In other words, you are no longer playing the original PCM but re‑processed PCM. This inevitably degrades sound quality (the sound becomes blurred and softened).


And with the ESP32‑S3—being a multifunction device rather than an audio‑dedicated chip—this problem becomes even harder to avoid.


Thus:

For high‑quality playback on iPhone, Feedback mode is superior.


DDC Design Philosophy

As we proceed with the architecture, there is one extremely important principle:


Eliminate all locking mechanisms.


Do not use:

  • Critical sections
  • Semaphores
  • Mutexes

For real‑time performance and stable quality, the UAC‑to‑DAC pipeline must remain completely lock‑free at all times.

Maintaining consistency without locks is essential.
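One common way to meet this requirement is a single-producer/single-consumer ring buffer built on C11 atomics: the USB side only writes head, the I2S side only writes tail, and acquire/release ordering takes the place of the lock. A minimal sketch, assuming exactly one producer task and one consumer task; the names and the 256-sample size are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 256u  /* capacity in samples (hypothetical); must be a power of two */
_Static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0, "RB_SIZE must be a power of two");

/* head and tail run freely and are masked on access, so head - tail is
 * always the fill level, even after the counters wrap around. */
typedef struct {
    int32_t buf[RB_SIZE];
    atomic_size_t head;  /* written only by the producer (USB side) */
    atomic_size_t tail;  /* written only by the consumer (I2S side) */
} spsc_rb_t;

static void rb_init(spsc_rb_t *rb)
{
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer: copies up to n samples in, returns how many fit. Never blocks. */
static size_t rb_write(spsc_rb_t *rb, const int32_t *src, size_t n)
{
    assert(src != NULL || n == 0);
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    size_t space = RB_SIZE - (head - tail);
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++)
        rb->buf[(head + i) & (RB_SIZE - 1u)] = src[i];
    /* release: the samples above are visible before the new head is */
    atomic_store_explicit(&rb->head, head + n, memory_order_release);
    return n;
}

/* Consumer: copies up to n samples out, returns how many were available. */
static size_t rb_read(spsc_rb_t *rb, int32_t *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    size_t avail = head - tail;
    if (n > avail)
        n = avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->buf[(tail + i) & (RB_SIZE - 1u)];
    /* release: slots are handed back only after they have been read */
    atomic_store_explicit(&rb->tail, tail + n, memory_order_release);
    return n;
}

/* Current fill level; usable from either side as a monitoring value. */
static size_t rb_fill(spsc_rb_t *rb)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    return head - tail;
}
```

Neither side ever waits on the other: a full buffer makes rb_write() return a short count and an empty one makes rb_read() return 0, and it is up to the caller to decide what that means (drop, pad with silence, adjust the rate).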


DAC Specification

The very first thing that must maintain strict coherence with the DAC is the MCLK on the I²S side.

(Phew… after mentioning this in the introduction, it took quite a journey to finally get here.)

Since we are using the PCM5102A, let’s begin by reviewing its specifications (datasheet).



Looking at the specification table, the sampling rates we want to support this time are 48 kHz to 192 kHz, which means 192fs or 256fs are the appropriate system‑clock multipliers.

(Hats off to TI’s excellent technical documentation.)

For example, with 96 kHz at 256fs, the required System Clock (MCLK generated by the DDC) is:

24.576 MHz

This means we simply need to generate 24.576 MHz from the ESP32‑S3’s internal 160 MHz PLL.

Let’s calculate it:

Divider value:

160000000 / 24576000 = 6.510416666666…
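The repeating decimal hides the fact that this divider is an exact fraction. Reducing the remainder with a GCD shows it; the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* src / mclk written as n + b/a with b/a in lowest terms (hypothetical helper). */
typedef struct {
    uint32_t n;  /* integer part (DIV_NUM candidate) */
    uint32_t b;  /* fraction numerator */
    uint32_t a;  /* fraction denominator */
} frac_div_t;

static uint32_t gcd_u32(uint32_t x, uint32_t y)
{
    while (y != 0) {
        uint32_t t = x % y;
        x = y;
        y = t;
    }
    return x;
}

static frac_div_t exact_divider(uint32_t src_hz, uint32_t mclk_hz)
{
    assert(mclk_hz != 0);
    frac_div_t d;
    uint32_t rem = src_hz % mclk_hz;
    uint32_t g = gcd_u32(rem, mclk_hz);  /* gcd(0, m) == m, giving b = 0, a = 1 */
    d.n = src_hz / mclk_hz;
    d.b = rem / g;
    d.a = mclk_hz / g;
    return d;
}
```

exact_divider(160000000, 24576000) yields 6 + 49/96. At 256fs, 48 kHz gives 13 + 1/48 and 192 kHz gives 3 + 49/192. Whether such a fraction becomes a 0 ppm MCLK depends on whether the hardware divider can represent it, which is what the rest of this article is about.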

At this point you might think:

- “Internal fractional division can’t do this, lol”

- “Just use an external clock, lol”

- “Close enough is fine, right?”

- “No, I want a perfect 24.576 MHz!”

Don’t worry.

Yes — you can generate it.

Rather than arguing, let’s look at the actual “Pakuri‑MOS” operation log:


I (265) main_task: Started on CPU0

I (288) main_task: Calling app_main()

I (298) PopoDAC21: load:uac rate=96000, ch=2, bits(bytes)=24(3)

I (299) PopoDAC21: uac rate=96000, ch=2, bits(bytes)=24(3), frame=576

I (299) PopoDAC21: i2s dma count=16, size=192, priority=3

I (304) SSD1306: New i2c driver is used

I (358) SSD1306: OLED configured successfully

I (414) PopoDAC21: Succeeded to create monitor_task_esp

I (414) PopoDAC21: Succeeded to start monitor_task_esp per=1000us

I (414) PopoDAC21: i2s slotcfg data=24, ws=32, slot=32, mode=2

I (419) PopoDAC21: i2s request mclk=24576000, mclk_multi=256 blck=6144000, blck_div=4.0

I (428) PopoDAC21: i2s(1) sel=2 src=160000000 mlck=24576000.00 num=6 x=1 y=2 z=47 yn1=1

I (435) PopoDAC21: i2s(1) en=0, act=0 bits=23 bck_div=3

I (440) PopoDAC21: i2s(1) latency 0us/1ms

I (444) PopoDAC21: usb_task enter


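As you can see, the log reports mlck=24576000.00 for num=6, x=1, y=2, z=47, yn1=1. Decoded with the register rules explained below, that is a divider of exactly 6 + 49/96 (= 160 / 24.576), so the generated MCLK has 0.0 ppm error.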
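The other numbers in the log follow from the same arithmetic: MCLK = fs × 256, BCLK = fs × slot width × channels, and one 1 ms full-speed USB frame carries fs/1000 samples per channel. The register values bits=23 and bck_div=3 appear to be stored as the value minus one (24 bits, divider 4). A quick consistency check with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Clock and packet sizes derived from the stream format (hypothetical helper). */
typedef struct {
    uint32_t mclk_hz;          /* fs * mclk_multiple */
    uint32_t bclk_hz;          /* fs * slot_bits * channels */
    uint32_t bclk_div;         /* mclk / bclk */
    uint32_t usb_frame_bytes;  /* bytes per 1 ms full-speed frame */
} clock_plan_t;

static clock_plan_t clock_plan(uint32_t fs, uint32_t mclk_multiple, uint32_t slot_bits,
                               uint32_t channels, uint32_t bytes_per_sample)
{
    clock_plan_t p;
    p.mclk_hz = fs * mclk_multiple;
    p.bclk_hz = fs * slot_bits * channels;
    assert(p.mclk_hz % p.bclk_hz == 0);  /* expect an integer BCLK divider */
    p.bclk_div = p.mclk_hz / p.bclk_hz;
    /* assumes fs is a multiple of 1000 (the 48 kHz family) */
    p.usb_frame_bytes = fs / 1000 * channels * bytes_per_sample;
    return p;
}
```

clock_plan(96000, 256, 32, 2, 3) gives MCLK 24576000, BCLK 6144000, a BCLK divider of 4 and 576 bytes per frame, matching the “mclk=24576000”, “blck=6144000, blck_div=4.0” and “frame=576” lines above.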


MCLK Generation (PLL Divider Module)

Now for the second theme: MCLK design.

The ESP32‑S3 introduces a new fractional divider register set (ESP‑IDF v5 series), which can be used directly for I²S MCLK computation.

In fact, I²S already incorporates this new divider logic internally, so you don’t even need to implement your own algorithm.

However, for those thinking:

  • “What exactly changed?”
  • “Is it really generating 0 ppm?”
  • “I want to understand this myself.”

I prepared an explanation.


Explanation of the New Divider Registers

The header soc/i2s_reg.h contains detailed comments. Here are the key parts:


// Integral I2S TX clock divider value. f_I2S_CLK = f_I2S_CLK_S/(N+b/a).
// There will be (a-b) * n-div and b * (n+1)-div.
// So the average combination will be: for b <= a/2, z * [x * n-div + (n+1)-div] + y * n-div. For b > a/2, z * [n-div + x * (n+1)-div] + y * (n+1)-div.
int div_num;

// For b <= a/2, the value of I2S_TX_CLKM_DIV_X is (a/b) - 1. For b > a/2, the value of I2S_TX_CLKM_DIV_X is (a/(a-b)) - 1.
int div_x;

// For b <= a/2, the value of I2S_TX_CLKM_DIV_Y is (a%b). For b > a/2, the value of I2S_TX_CLKM_DIV_Y is (a%(a-b)).
int div_y;

// For b <= a/2, the value of I2S_TX_CLKM_DIV_Z is b. For b > a/2, the value of I2S_TX_CLKM_DIV_Z is (a-b).
int div_z;

// For b <= a/2, the value of I2S_TX_CLKM_DIV_YN1 is 0. For b > a/2, the value of I2S_TX_CLKM_DIV_YN1 is 1.
int div_yn1;


In short:

The fractional divider changed from the old AB‑style (ESP‑IDF v4) to the new high‑resolution XYZ‑style (ESP‑IDF v5).
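To see the XYZ scheme without touching any registers, here is a host-side sketch that applies the header’s rules to a fraction b/a, reverses them, and counts how many n-div and (n+1)-div cycles one period contains. The names are illustrative, not ESP-IDF API.

```c
#include <assert.h>

/* Host-side mirror of the X/Y/Z/YN1 rules from soc/i2s_reg.h (illustrative names). */
typedef struct {
    int x, y, z, yn1;
} frac_xyz_t;

static frac_xyz_t xyz_encode(int a, int b)
{
    assert(a > 1 && b > 0 && b < a);
    frac_xyz_t r;
    if (b <= a / 2) {            /* for b <= a/2 */
        r.yn1 = 0;
        r.x = a / b - 1;
        r.y = a % b;
        r.z = b;
    } else {                     /* for b > a/2 */
        r.yn1 = 1;
        r.x = a / (a - b) - 1;
        r.y = a % (a - b);
        r.z = a - b;
    }
    return r;
}

/* Inverse: recover a and b from the register fields. */
static void xyz_decode(frac_xyz_t r, int *a, int *b)
{
    *a = r.z * (r.x + 1) + r.y;
    *b = r.yn1 ? *a - r.z : r.z;
}

/* Per period of a cycles: (a-b) cycles divide by n, b cycles divide by n+1,
 * so the average divider is n + b/a. */
static void xyz_cycle_counts(frac_xyz_t r, int *n_div, int *n1_div)
{
    int many = r.z * r.x + r.y;
    *n_div = r.yn1 ? r.z : many;
    *n1_div = r.yn1 ? many : r.z;
}
```

Encoding 49/96 reproduces x=1, y=2, z=47, yn1=1 from the log, and the cycle counts show that each period of 96 cycles mixes 47 divide-by-6 cycles with 49 divide-by-7 cycles, which averages to exactly 6 + 49/96.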

Below is the function that extracts the fractional components (X, Y, Z, YN1) from the I²S registers according to the new scheme.


Then, for those still unsatisfied, I also provide:

  • a function to re‑scan for better fractional values
  • a function to re‑apply the new divider settings

I2S: Get Current Clock 

#include "soc/soc.h"        // REG_GET_FIELD / REG_SET_FIELD
#include "soc/i2s_reg.h"    // I2S_TX_* registers and fields
#include "hal/i2s_types.h"  // i2s_port_t (ESP-IDF v5)

// clock_info_t is not defined in this article; this layout is reconstructed
// from the fields used by the three functions in this post.
typedef struct {
    int clk_enable;
    int clk_active;
    int clk_sel;
    int bits_mod;
    int bck_div;
    int src_clk;      // source clock in Hz, 0 if unknown or external
    int div_num;
    int div_x;
    int div_y;
    int div_z;
    int div_yn1;
    double divider;   // div_num + b/a
    double mclk;      // src_clk / divider, in Hz
} clock_info_t;

void i2s_current_clock(i2s_port_t port, clock_info_t* clk)
{
    clk->clk_enable = REG_GET_FIELD(I2S_TX_CLKM_CONF_REG(port), I2S_CLK_EN);
    clk->clk_active = REG_GET_FIELD(I2S_TX_CLKM_CONF_REG(port), I2S_TX_CLK_ACTIVE);

    // Select I2S Tx module source clock. 0: XTAL clock. 1: APLL. 2: CLK160. 3: I2S_MCL
    clk->clk_sel  = REG_GET_FIELD(I2S_TX_CLKM_CONF_REG(port), I2S_TX_CLK_SEL);
    clk->bits_mod = REG_GET_FIELD(I2S_TX_CONF1_REG(port), I2S_TX_BITS_MOD);
    clk->bck_div  = REG_GET_FIELD(I2S_TX_CONF1_REG(port), I2S_TX_BCK_DIV_NUM);

    switch (clk->clk_sel) {
        case 0: clk->src_clk = 40000000;   break; // XTAL 40 MHz
        case 1: clk->src_clk = 491520000;  break; // "APLL" per the header comment (unverified: the ESP32-S3 has no APLL)
        case 2: clk->src_clk = 160000000;  break; // PLL 160 MHz
        case 3: clk->src_clk = 0;          break; // external MCLK
        default: clk->src_clk = 0;         break;
    }

    clk->div_num = REG_GET_FIELD(I2S_TX_CLKM_CONF_REG(port), I2S_TX_CLKM_DIV_NUM);
    clk->div_x   = REG_GET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_X);
    clk->div_y   = REG_GET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_Y);
    clk->div_z   = REG_GET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_Z);
    clk->div_yn1 = REG_GET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_YN1);

    // Rebuild the fraction b/a from X/Y/Z/YN1 (inverse of the rules in soc/i2s_reg.h)
    double frac = 0.0;
    if (clk->div_x && clk->div_z) {
        int a, b;
        if (clk->div_yn1 == 0) {
            // b <= a/2: z = b
            b = clk->div_z;
            a = clk->div_z * (clk->div_x + 1) + clk->div_y;
        } else {
            // b > a/2: z = a - b
            int a_minus_b = clk->div_z;
            a = a_minus_b * (clk->div_x + 1) + clk->div_y;
            b = a - a_minus_b;
        }
        frac = (double)b / (double)a;
    }

    clk->divider = (double)clk->div_num + frac;
    clk->mclk = (clk->src_clk && clk->divider) ? ((double)clk->src_clk / clk->divider) : 0.0;
}


I2S: Searching for Optimal High‑Resolution Values for CLKM_DIV_X / Y / Z / YN1

#include <math.h>
#include <stdbool.h>

// Scan b = 1 .. div_a-1 for the fraction b/div_a closest to the fractional part
// of src_clk/target_mclk, then encode it into X/Y/Z/YN1 per soc/i2s_reg.h.
bool i2s_scan_div(double src_clk, double target_mclk, int div_num, int div_a, clock_info_t* clki)
{
    // fractional part the divider still has to cover after div_num
    double target_sub = fabs(src_clk / target_mclk - (double)div_num);
    int match_div_b = 0;

    // scanning: candidates b/div_a are computed on the fly
    for (int div_b = 1; div_b < div_a; div_b++) {
        double cand = (double)div_b / (double)div_a;
        double best = (double)match_div_b / (double)div_a;
        if (fabs(target_sub - cand) < fabs(target_sub - best)) {
            match_div_b = div_b;
        }
    }

    if (match_div_b == 0)
        return false;   // no non-zero fraction is closer than b = 0

    if (match_div_b <= div_a/2) {
        // For b <= a/2
        clki->div_yn1 = 0;
        clki->div_x = (div_a/match_div_b) - 1;
        clki->div_y = (div_a%match_div_b);
        clki->div_z = match_div_b;
    } else {
        // For b > a/2
        clki->div_yn1 = 1;
        clki->div_x = (div_a/(div_a-match_div_b)) - 1;
        clki->div_y = (div_a%(div_a-match_div_b));
        clki->div_z = (div_a-match_div_b);
    }

    double matched = (double)match_div_b / (double)div_a;

    clki->div_num = div_num;
    clki->divider = (double)div_num + matched;
    clki->mclk = src_clk / clki->divider;

    // Return value (assumed; the original expression is cut off in the source):
    // true when b/div_a reproduces the target fraction exactly.
    return fabs(target_sub - matched) < 1e-9;
}


I2S: Reconfiguring CLKM_DIV_X / Y / Z / YN1 

void i2s_set_clock_div(i2s_port_t port, clock_info_t* clk)
{
    // param setting (clkm, bck, mclk...)
    REG_SET_FIELD(I2S_TX_CLKM_CONF_REG(port), I2S_TX_CLKM_DIV_NUM, clk->div_num);
    REG_SET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_X, clk->div_x);
    REG_SET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_Y, clk->div_y);
    REG_SET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_Z, clk->div_z);
    REG_SET_FIELD(I2S_TX_CLKM_DIV_CONF_REG(port), I2S_TX_CLKM_DIV_YN1, clk->div_yn1);

    // apply
    REG_SET_FIELD(I2S_TX_CONF_REG(port), I2S_TX_UPDATE, 1);
}


Conclusion

With all this work, we confirmed that every sampling rate we intend to support can achieve 0 ppm MCLK error using the ESP32‑S3’s internal PLL and the new I²S fractional divider.

Thus, the MCLK design is complete.

In the end, we didn’t have to do anything special — we simply discovered that the internal PLL is surprisingly capable.


From next time, we can finally move on to the main topic:

the internal design and implementation of the Pakuri‑MOS.