BitBanging I2S with SPI on CH32V203

I don't care, show me the math :D | I don't care, show me the code :D | How does it work

Background

The CH32V203 is an 80-cent uController with a good amount of IO and hardware capabilities. It features 2 groups of 12-bit ADCs, 2 groups of OPAs and comparators, DMA, RTC, low power/sleep modes, a 28KB bootloader, 64KB SRAM, and 224KB flash. With its low power mode, 12-bit ADC, and OPAs, it's almost good enough to create a portable synth (future project). ALMOST good enough: it does not have a hardware DAC.

Solutions

There are several solutions to this problem: fast PWM with a filter, R2R ladder, just use square waves, I2S, use a different uController.

Goals

Because the end goal is to make a portable synthesizer, I have some goals to set before we get into the solutions individually. These goals are ranked from highest priority (1) to lowest priority (N).

1. Can generate signals properly up to 10kHz (tested with a sine wave)
2. Low noise output
3. Higher bit depth for a lower noise floor
4. Fewer active components is better

Fast PWM with a Low Pass Filter (LPF)

This solution worked pretty well. Getting the proper frequency also wasn't too bad. However, even with a "state of the art" LPF (a resistor and a capacitor), there was a lot of noise coming through. I could filter it more with a unity gain op-amp and more LPFs, but that adds active components and increases power consumption. Not to mention that at frequencies higher than 5kHz, this method could no longer produce a signal resembling a sine wave.
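To make the resolution versus frequency trade-off concrete, here is a minimal sketch of how the PWM carrier frequency falls as bit depth goes up. The 144MHz timer clock is an assumption (it is not stated above), and `pwm_carrier_hz` is a hypothetical helper, not part of the project code.

```c
#include <stdint.h>

/*
 * Carrier frequency of a PWM whose counter wraps every 2^bits timer ticks.
 * timer_clock_hz is an assumed input, e.g. 144000000 for a 144MHz clock.
 */
static uint32_t pwm_carrier_hz(uint32_t timer_clock_hz, unsigned bits) {
    return timer_clock_hz >> bits;
}
```

Under that assumption, 8 bits gives a 562.5kHz carrier, while 16 bits drops the carrier to about 2.2kHz, which is already below the 10kHz target. The low-pass filter has to pass the audio band and still suppress the carrier, so more bits makes filtering harder.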

R2R ladder

This solution was promising at 8-bit depth and easily controllable with simple port manipulation. (Not 16-bit because the ports are 8-bits wide). This solution uses a series of resistors to create an R2R Ladder. The component cost is quite low: 9x N-Ohm resistors and 8x 2N-Ohm resistors. With the DMA on the CH32V203, which can be clocked stably at 4MHz, it could create a signal up to 2MHz (1 on cycle and 1 off cycle). However, this is not a clean sine-wave. Obviously, I don't really need a 2MHz sine-wave. So what about a 10kHz sine-wave? At 10kHz we have 400 sample points per sine-wave period (4MHz / 10kHz), and honestly, that's good enough.

*Correction: after already finishing this project, I realized that GPIOx->OUTDR can control all 16 pins of a port simultaneously.
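The arithmetic behind the R2R numbers can be sketched as follows. Both helpers are hypothetical, the ladder is treated as ideal and unloaded, and the 3.3V reference in the usage note is an assumption.

```c
#include <stdint.h>

/* Samples available per output period when the ladder is updated at update_hz. */
static uint32_t samples_per_period(uint32_t update_hz, uint32_t signal_hz) {
    return update_hz / signal_hz;
}

/* Ideal R-2R ladder output in millivolts: Vout = Vref * code / 2^bits. */
static uint32_t r2r_output_mv(uint32_t code, unsigned bits, uint32_t vref_mv) {
    return (uint32_t)(((uint64_t)code * vref_mv) >> bits);
}
```

With a 4MHz update rate, `samples_per_period(4000000, 10000)` gives the 400 points mentioned above. With an assumed 3300mV reference, an 8-bit code of 128 lands at mid-scale (1650mV), and each 8-bit step is roughly 13mV.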

Just use a square wave

No, too many harmonics. Doesn't really sound good either.

I2S

WHAT A GREAT IDEA! Use something that already exists to make my life easier. Psych! The CH32V203 doesn't support I2S protocol. However, if I could get it to work, that would give me access to 16/24/32-bit DAC at some arbitrary sampling rate. This would by far have the lowest noise floor of all the proposed solutions.
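As a rough guide to why bit depth matters for the noise floor, the ideal quantization SNR of a full-scale sine is about 6.02*N + 1.76 dB for N bits. The sketch below evaluates that textbook formula in integer math; `ideal_snr_centidb` is a hypothetical helper, and real DACs fall short of these ideal figures.

```c
/*
 * Ideal quantization SNR of a full-scale sine wave, in hundredths of a dB.
 * Uses the textbook approximation SNR = 6.02 * bits + 1.76 dB.
 */
static unsigned ideal_snr_centidb(unsigned bits) {
    return 602u * bits + 176u;
}
```

By this estimate, 8 bits tops out near 49.9 dB and 16 bits near 98.1 dB, so going from an 8-bit ladder to a 16-bit I2S DAC is worth roughly 48 dB on paper.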

Use a different uController

I like to make projects with what I have on hand. That just so happened to be the CH32V203 and PCM5102A. I could opt to use the CH32V30X, which all have 12-bit DACs, but that wouldn't be interesting.

What I Chose and Why

I chose to implement the I2S solution because it seemed promising at the time, with the lowest noise floor due to 16/24/32-bit support. The protocol itself also didn't seem that bad to implement at the time (I was wrong).

The R2R ladder seemed promising, but I2S could achieve a lower noise floor (this was while I still had the misconception that GPIOx->OUTDR could only control 8 bits of output). I also had issues designing a way to change to the frequencies and wave shapes that I wanted.

Note after finishing I2S bitbanging

I realized I ran into more issues with I2S, including some of the same issues from the R2R ladder, and ended up solving those. It may be worthwhile going back and implementing a DMA+R2R 16-bit DAC.

How Does it Work?

I2S

I2S has three important signals: BCLK, LRCK/WS, and DIN. BCLK is the bit clock, which determines how fast we send bits; a 2kHz BCLK would be 2000 bits/s. LRCK/WS is the left/right clock or word select, which tells the I2S device whether the bits currently being sent belong to the left or the right channel. It toggles once per channel word, so LRCK = BCLK/(2*N), where N is the bit depth of the audio signal, and the LRCK frequency is the audio sample rate. Lastly, DIN is data in, which carries the sample values that determine the amplitude of the output. DIN is sent most significant bit (MSB) first and is signed (2's complement). You can see the specifications in section 9.3.2.2, PCM Audio Data Formats, of the PCM5102A datasheet.
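To illustrate what "MSB first, 2's complement" means on the wire, here is a small host-side sketch that prints the bit order for one 16-bit sample. `sample_to_bits` is a hypothetical helper for illustration only; on the chip, the SPI shift register does this work.

```c
#include <stdint.h>

/*
 * Write the 16 bits of a signed sample as '0'/'1' characters in the order
 * they would appear on DIN (most significant bit first).
 * out must have room for 16 characters plus the terminating NUL.
 */
static void sample_to_bits(int16_t sample, char out[17]) {
    uint16_t raw = (uint16_t)sample;  /* two's-complement bit pattern */
    for (int i = 0; i < 16; i++) {
        out[i] = (raw & (0x8000u >> i)) ? '1' : '0';
    }
    out[16] = '\0';
}
```

For example, a sample of 1 goes out as fifteen 0s followed by a 1, while -2 goes out as fifteen 1s followed by a 0, because negative values have the sign bit (sent first) set.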

Framework/Platform

For this project, I am using the CH32V203 and the CH32FUN library by CNlohr.

BitBanging

I2S is a serial protocol which sends MSB first, so we need something that shifts out data: a shift register. Luckily, the CH32V203 has an SPI shift register which can be used for this purpose. The SPI hardware on this device even provides us with a clock, SCK, whose frequency is determined by (frequency of CPU)/(prescaler). So that's already 2 of our signals, BCLK and DIN.
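As a quick sanity check on the clock arithmetic, the sketch below computes SCK from the prescaler and the audio sample rate that follows from it. The helper names are hypothetical, and the 144MHz input clock in the usage note is an assumption (the write-up does not state the clock the SPI is fed from); the prescaler of 64 is the one used in the init code.

```c
#include <stdint.h>

/* SPI clock (used as BCLK): input clock divided by the SPI prescaler. */
static uint32_t spi_sck_hz(uint32_t input_clock_hz, uint32_t prescaler) {
    return input_clock_hz / prescaler;
}

/*
 * Stereo frame rate (the LRCK frequency) for a given bit clock.
 * One frame is a left word plus a right word, so 2 * bits_per_channel bits.
 */
static uint32_t i2s_frame_rate_hz(uint32_t bclk_hz, uint32_t bits_per_channel) {
    return bclk_hz / (2u * bits_per_channel);
}
```

Assuming a 144MHz input clock, a prescaler of 64 gives a 2.25MHz BCLK, and 16 bits per channel gives a sample rate of about 70.3kHz. Because the prescaler only comes in powers of two, the achievable sample rates are fairly coarse.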

/*
 * Initializes SPI and DMA
 * SPI Pins
 *  - MOSI PA7
 *  - MISO PA6
 *  - SCK PA5
 *  - NSS PA4
 */
void SPI1_DMA_Init(void) {
    RCC->APB2PCENR |= RCC_APB2Periph_SPI1;
    RCC->AHBPCENR  |= RCC_AHBPeriph_DMA1;

    // Turn off SPI before setting up
    SPI1->CTLR1 &= ~SPI_CTLR1_SPE;
    SPI1->CTLR1 = SPI_CTLR1_MSTR |
                  SPI_CTLR1_CPOL | 
                  SPI_CTLR1_CPHA |
                  SPI_DataSize_16b |
                  SPI_CTLR1_SSM |
                  SPI_CTLR1_SSI |
                  SPI_BaudRatePrescaler_64 | 
                  SPI_CTLR1_SPE;

    // Enable SPI DMA
    SPI1->CTLR2 = SPI_CTLR2_TXDMAEN;

    // DMA1 Channel3 (SPI1_TX)
    DMA1_Channel3->CFGR = DMA_CFGR3_MINC | 
                          DMA_CFGR3_DIR |
                          DMA_CFGR3_CIRC |
                          DMA_CFGR3_PSIZE_0 | 
                          DMA_CFGR3_MSIZE_0;
    DMA1_Channel3->PADDR = (uint32_t)&SPI1->DATAR;
    DMA1_Channel3->MADDR = (uint32_t)audio_buffer;
    DMA1_Channel3->CNTR  = SAMPLES;

    // Enable DMA
    DMA1_Channel3->CFGR |= DMA_CFGR3_EN;
}

The code above also sets up DMA to copy whatever is in audio_buffer into SPI1 Data Register. The audio_buffer array is empty for now.

Lastly, I need to generate LRCK/WS. The CH32V203 can achieve this with its hardware timers. I used TIMER2 clocked from its external trigger input (set to trigger when SCK is falling), with the auto-reload set to 31 and a 50% duty cycle: 16 cycles on and 16 cycles off (16 bits each for the left and right channels).

/*
 * Set up TIM2 to trigger based on SPI SCK so we can have "synced" clocks. The
 * clocks aren't synced perfectly, but good enough to bitbang I2S. So it's
 * functionally good enough. Kind of ugly solution, must connect PA5 (SCK+BCLK)
 * to PA0 (TIM2_ETR). The output for WS/LRCK is PA1.
 */
void TIM2_Init_SPI_Trigger(void) {
	// Enable TIM2 clock
	RCC->APB1PCENR |= RCC_APB1Periph_TIM2;

	// Reset TIM2
	RCC->APB1PRSTR |= RCC_APB1Periph_TIM2;
	RCC->APB1PRSTR &= ~RCC_APB1Periph_TIM2;

	// Set auto-reload (ARR) for PWM frequency (divide by 32) 16 Left + 16 Right
	TIM2->ATRLR = (BITS_PER_SAMPLE<<1)-1;
	TIM2->CH2CVR = (TIM2->ATRLR+1) >> 1;

	// Configure CH2 as PWM1
	TIM2->CHCTLR1 &= ~0xFF00;
	TIM2->CHCTLR1 |= TIM_OC2M_1 | TIM_OC2M_2;
	TIM2->CCER |= TIM_CC2E;

	// External trigger source = ETRF
	TIM2->SMCFGR = TIM_TS_ETRF | TIM_MSM | TIM_ETP;

	// External clock mode
	TIM2->SMCFGR &= ~TIM_SMS;
	TIM2->SMCFGR |= TIM_SMS;

	// Start timer
	TIM2->CTLR1 |= TIM_CEN;
}
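The divide-by-32 behaviour of the timer can be modelled on a host machine. This is a sketch under two assumptions: `BITS_PER_SAMPLE` is 16, and the channel behaves like a standard upcounting PWM mode 1 output that is active while the counter is below the compare value. `lrck_level` is a hypothetical helper, not project code.

```c
#include <stdint.h>

#define BITS_PER_SAMPLE 16u  /* assumed value of the project's macro */

/*
 * Model of the LRCK output after a given number of counted BCLK edges.
 * Mirrors the register setup: ATRLR = 2 * bits - 1, CH2CVR = (ATRLR + 1) / 2.
 */
static int lrck_level(uint32_t bclk_edges) {
    const uint32_t arr = (BITS_PER_SAMPLE << 1) - 1;  /* 31 */
    const uint32_t ccr = (arr + 1) >> 1;              /* 16 */
    uint32_t cnt = bclk_edges % (arr + 1);            /* counter wraps at 32 */
    return cnt < ccr;                                 /* PWM mode 1: high while CNT < CCR */
}
```

In this model LRCK is high for counts 0 to 15 and low for counts 16 to 31, then repeats, which is the 16-on, 16-off pattern described above. Which channel the DAC treats as "left" depends on the LRCK polarity it expects, so that is worth checking against the PCM5102A datasheet.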

Now all the clock signals are generated, but no audio is coming out yet. This is because we have not yet populated the audio buffer with 16-bit audio. This can be done with a simple for loop and math. BUT WAIT A MINUTE! The CH32V203 doesn't have a floating point unit! So instead of calculating sine with software floats, we can use a sine look-up-table (LUT). There's a bit of whacky math which determines the best audio_buffer size, but it's really not that interesting. You can check out the math here
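To show the look-up-table idea without any floating point, here is a deliberately tiny sketch: a 16-step sine built from a 5-entry quarter-wave table using symmetry. This is an illustration only and not how the project's `sine_lut` is built (that table is indexed modulo 4096 in the code below); `quarter_sine` and `sine16` are hypothetical names.

```c
#include <stdint.h>

/* sin(0), sin(22.5), sin(45), sin(67.5), sin(90) degrees, scaled to 32767. */
static const int16_t quarter_sine[5] = { 0, 12539, 23170, 30273, 32767 };

/*
 * Signed 16-bit sine for a phase given in 1/16ths of a full cycle.
 * Only one quarter of the wave is stored; the rest comes from mirroring
 * (second quadrant) and negating (third and fourth quadrants).
 */
static int16_t sine16(uint32_t step) {
    uint32_t quadrant = (step >> 2) & 3u;
    uint32_t offset = step & 3u;
    switch (quadrant) {
    case 0:  return quarter_sine[offset];
    case 1:  return quarter_sine[4u - offset];
    case 2:  return (int16_t)-quarter_sine[offset];
    default: return (int16_t)-quarter_sine[4u - offset];
    }
}
```

The table values would normally be generated once on a PC and pasted in as constants, so the microcontroller only ever does integer indexing.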

/*
 * Integer based rounding
 * Not fast, but good enough
 */
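uint32_t iround(uint32_t num, uint32_t den) {
    // Round half up. den - den/2 is ceil(den/2), so with an odd denominator
    // a remainder just under one half no longer rounds up (e.g. 1/3 -> 0)
    return num/den + (num % den >= den - den/2 ? 1 : 0);
}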

volatile uint16_t freq = 9500;
/*
 * Put the sounds into the buffer thing
 */
void update_dma_buffer(void) {

    // Get the decimal to be smaller by multiplying it by this value
    // the "factored" portion
    // See math here https://www.desmos.com/calculator/kenuknb777
    // Better solution: Find the number that minimizes the decimal instead of arbitrarily multiplying SAMPLES
    // Also better: rounding > floor
    uint32_t factored = iround(freq*3000, (RATE<<1));

    // Create factored_buffer size
    uint32_t factored_buf = iround((RATE<<1) * factored, freq);

    // Debug
    printf("%lu\n", factored);
    printf("%lu\n", factored_buf);

    // Fill Buffer until factored_buffer size
    for(uint32_t i=0;i<factored_buf;i++) {
        // TODO: Fix issue misalignment part 2
        // NOTE: This shouldn't happen according to calculations. Will figure out later issue occurring at 9500Hz
        // NOTE: Issue happens at beginning and end of buffer and each time it loops
        // HACK: Issue doesn't seem to appear if using "max" buffer size of 3000 
        // SAMPLES: 2*PI essentially
        // factored: skip this much through the LUT
        // factored_buf: size of buffer factored
        audio_buffer[i] = sine_lut[iround(i*factored*SAMPLES,factored_buf)%4096];
    }

    DMA1_Channel3->CFGR &= ~DMA_CFGR3_EN;
    DMA1_Channel3->CNTR = factored_buf;
    DMA1_Channel3->CFGR |= DMA_CFGR3_EN;
}
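For comparison, a different and widely used way to step through a sine LUT is a phase accumulator (often called DDS). This is not the method used above, just a sketch of the alternative: every name here is hypothetical, the 12-bit index matches the `%4096` seen in the code, and the output is mono for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

#define LUT_BITS 12u  /* 4096-entry table, matching the %4096 above */

/* Phase step per sample as a 32-bit fraction of one cycle: freq / rate * 2^32. */
static uint32_t phase_increment(uint32_t freq_hz, uint32_t sample_rate_hz) {
    return (uint32_t)(((uint64_t)freq_hz << 32) / sample_rate_hz);
}

/*
 * Fill out[0..n-1] from lut, advancing *phase by inc per sample.
 * The top LUT_BITS bits of the phase select the table entry, and the
 * 32-bit phase wraps around naturally at the end of each cycle.
 */
static void fill_buffer(int16_t *out, size_t n, const int16_t *lut,
                        uint32_t *phase, uint32_t inc) {
    for (size_t i = 0; i < n; i++) {
        out[i] = lut[*phase >> (32u - LUT_BITS)];
        *phase += inc;
    }
}
```

Because the phase is carried from one call to the next, the buffer length no longer has to hold a whole number of cycles. The trade-off is that a circular DMA buffer would need to be refilled continuously (for example half a buffer at a time) instead of being computed once per frequency change, and the 64-bit division is slow on this core, so it is best done only when the frequency changes.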

Why?

I want to make a portable synth, inspired by the HiChord portable synth. However, I want to make it out of affordable parts because SURELY it doesn't cost $300 (the cost of a HiChord) to make. Going back, I might retry implementing the R2R method if I have the right resistors on hand.