Veronica | Blondihacks

Veronica – GPU Interface

December 19, 2012

It’s been a long time coming.

It’s been quite an odyssey getting to this point. One of the tougher challenges I’ve encountered so far on this project is setting up an interface between the 6502 CPU and the ATMega324pa GPU. Both processors are quite good at their little jobs in their respective worlds, but getting them to talk to each other nicely has been educational indeed. I’ve spent many an hour over the past few months with this test setup on my bench:

Veronica’s guts are splayed out on the left, my VGA Dev Board is in the vice, and on the breadboard is the interface circuit in-progress. Many different approaches have been tried here!

The reason this is so challenging is that this is where everything I’ve done so far on Veronica meets in the middle. This is where I find out if my bus architecture is any good, if my ROM code is executing properly on the 6502, if my memory-map address decodes are right, if my GPU code is talking to the VRAM properly, if the VGA signals are all being generated properly, and a thousand little details I’m glossing over here. The point is that, while I tested all these systems in isolation, you find all manner of new problems when you start connecting them to each other.

To recap, my overall design for this is that the GPU acts pretty independently, and handles all the pushing of pixels. The CPU only sends small commands about what things to render in what locations. In 8-bit tech terms, this is closer to a Nintendo than an Apple //. Old game consoles (and some of the better 8-bit computers) used custom chips for generating and rendering video, since it’s difficult for a slow CPU to do and still have any time left to do anything else. The ATMega in Veronica is basically playing that role. I can’t fabricate custom silicon, but I can fake it with software on a fast microcontroller.

My initial plan for CPU/GPU communication was to use the extra memory I have at the top of my frame buffer as shared space for both processors. Since the frame buffer is only about 60k in size, I have 4k at the top that isn’t in use. I figured I would use that space for (among other things) receiving commands from the CPU for what to render.

Well, after many trials and tribulations, I ended up settling on something quite a bit simpler than sharing the upper 4k of VRAM. Getting the two processors to share that memory without disrupting the VGA signal generation proved very difficult, and more trouble than it was worth. The AVR isn’t really designed to use external memory, and it really isn’t designed to share external memory with someone else. As a result, it just doesn’t have the right set of interrupts and control signals that would be needed to do this well. Furthermore, I simply ran out of pins! Using external 16-bit parallel-access RAM eats up a great many pins, and the VGA generator chews up the rest. I needed to accomplish this communication channel with the CPU using very few control signals.

What I ended up doing was setting up a sort of “virtual register” on the ATMega. This acts as an 8-bit buffer that can receive a command from the CPU asynchronously. This register is memory-mapped into the 6502’s memory space. Processing the command in this register is done on an interrupt on the ATMega, so that the CPU won’t have to wait. However, the interrupt structure on the AVR isn’t set up quite right for this. The VGA signal generator is using a timer interrupt, but this virtual register needs to trigger an external interrupt. External interrupts on the AVR are always higher priority than internal ones, so the CPU will unavoidably be able to interrupt the VGA signal generator.

To compensate for that, the CPU has to play nice and only send commands during the vertical blank (VBL). That means it needs to know when the vertical blank is! So, I make this same “virtual register” bi-directional, and it does double-duty as a status register for the GPU. The GPU sets a bit to indicate the VBL is underway, and it’s safe to send rendering commands.

The virtual register maps to memory address $EFFF, which is an area reserved in Veronica’s memory map for communicating with peripherals and such. The VGA board has some address decode logic that detects attempts to read or write to that address, and reads or writes the virtual register in response. The address decode logic is a pair 0f 74HC688 8-bit comparators, which are great chips for decoding a single address without a ton of boolean logic. To save even more chips, both the “reading” and “writing” version of this communication register are at the same address.

The trick here is that this “virtual register” is really two different chips that are selected by the memory R/W signal from the CPU. When the CPU attempts to read, a 75HC541 buffer is activated, which drives the data bus. The inputs on the ‘541 are connected to PORTB on the ATMega, which handles all the control signals (including the VBL). When the CPU attempts to write to $EFFF, the ‘541 is switched off, and a 74HC573 8-bit latch is activated. This needs a little more explanation.

You see, when the CPU sends a rendering command, the GPU can’t process it right away. It may be busy doing other things (such as other rendering tasks). So, the virtual register interrupt handler needs to send out a signal when it’s ready and the GPU’s data bus is free for use by the CPU. There’s a small delay there, because the GPU needs to handle the interrupt and set up some states to receive the data. During that delay, the CPU will have already moved on and given up writing its command to the data bus. Here’s what that looks like on the oscilloscope:

What’s 100 nanoseconds between friends? Sadly, “close” only counts in horseshoes and hand grenades, kids.

The green pulses are the “address decoded” and “write” signals (active high) from the interface logic, indicating that the CPU is writing a command to the virtual register right the hell now. The red pulse (active low) is the GPU saying, “okay, I’ve cleared all other traffic off my data bus, go ahead and write to it”. However, see that little gap between when the green pulses end and the red one begins? By the time the GPU says “go ahead”, the CPU has already given up.

My solution is to latch the data into a bank of flip-flops, so that the GPU can get to it when it’s ready, and the CPU can keep calm and carry on. That’s the magic of the 74HC573.

So let’s cut to the chase. What does all this look like? Well, brace yourself. Here’s what I believe will be version A of Veronica’s VGA display board.

Apologies for the messiness. All the power pins and unused gates are invoked and the decoupling caps are shown, because this will be used to layout and etch a PCB.

It’s easily the most complex Veronica board to date, but it’s really not as gnarly as it looks once you understand the basic workings of it (which I hopefully explained well enough above).

Here’s the prototype, finally working after many many long nights. My hands still bear the imprint of the logic probe (incidentally, one of the more underrated tools in hacking circles, in my opinion).

The interface is quite simple, in the end. That’s usually a good sign. You should have seen some of the design monstrosities that didn’t make the cut.

There’s some critical code needed to drive this logic on the GPU side. When commands come in over the system data bus and are decoded, an IRQ is sent to the AVR. That’s handled like so:

// Configure INT2 (interrupt 2, on line PB2) to be a falling // edge interrupt that we'll get from the CPU // See ATmega324PA datasheet p68 cbi EIMSK,INT2_BIT ldi accum,EICRA ori accum,_BV(ISC21) | _BV(ISC20) sts EICRA,accum cbi EIFR,INTF2 sbi EIMSK,INT2_BIT ... ///////////////// // CPU Interrupt Service Routine // INT2_vect: // Save registers push accum in accum,SREG push accum in accum,PORTD push accum in accum,DDRD push accum // Enable the CPU's bus access cbi PORTB,cpuBusAllow // Read the data from the CPU ldi accum,0x00 out DDRD,accum ldi accum,0xff out PORTD,accum nop in commandBuffer,PIND // Disable the CPU's bus access sbi PORTB,cpuBusAllow // Restore registers pop accum out DDRD,accum pop accum out PORTD,accum pop accum out SREG,accum pop accum reti

(apologies for the clunky formatting- the WordPress gist embedder is not cooperating these days)

That interrupt handler needs to be as simple and quick as possible, so as to not disrupt the GPU’s other rendering that may be executing. Note that even this short little handler is enough to disrupt the VGA sync pulses (and cause the monitor to lose signal lock), which is why it’s important that the CPU only act during the vertical blank. This handler simply copies the commands byte to a buffer register for later processing on the GPU’s main code loop.

Speaking of the GPU’s main code loop, here it is:

mainloop: // Wait for VBL to process commands sbrs VBL,0 rjmp mainloop // Switch on command buffer value cpi commandBuffer,0x01 breq redirectTR1 cpi commandBuffer,0x02 breq redirectTR2 cpi commandBuffer,0x03 breq redirectTR3 cpi commandBuffer,0x04 breq redirectTR4 noCommand: rjmp mainloop redirectTR1: clr commandBuffer rcall testRender1 rjmp mainloop redirectTR2: clr commandBuffer rcall testRender2 rjmp mainloop redirectTR3: clr commandBuffer rcall testRender3 rjmp mainloop redirectTR4: clr commandBuffer rcall testRender4 rjmp mainloop

This is just a temporary mockup, but you can see that ultimately it will be a simple jump table to the various rendering routines provided by the GPU.

Now let’s see it in action. For that, the CPU needs to actually request a command. For a very long time, Veronica’s CPU has frankly been doing squat besides sitting there looking all “retro”. Sure, I built a fancy on-board EEPROM programmer and everything, but it’s been waiting for some way to interact with the outside world. Now, fingers crossed, we have that. Here’s some test code in 6502 assembly that I put in ROM to run at startup:

ldy #$ff OUTER: ldx $ff INNER: nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop nop dex bne INNER dey bne OUTER WVBLLO: lda $efff cmp #$00 bne WVBLLO WVBLHI: lda $efff cmp #$ff bne WVBLHI lda #$04 sta #efff lda #$03 sta $efff HALT: jmp HALT

That code waits 4 seconds (roughly, assuming 1Mhz clock speed), then starts polling the GPU status register by reading from $EFFF. It first waits for the VBL to go low, then waits for it to go high again. This is to prevent attempting some rendering right at the end of the blanking. If we start rendering, and the VBL is already underway, we don’t know how much time we have. By waiting for the VBL signal to go low, then high, we ensure we’re catching the start of the blanking period. Note that we can attempt rendering at other times if we want, but it will cause visual glitches on the display, and can potentially cause the monitor to lose sync for a moment if one of the critical control pulses is interrupted. This is not so different from 1980s computers, except they were immune to losing the video signal lock because they generated their critical sync signals in hardware.

Once the CPU has found the VBL, it sends two commands by writing values to $EFFF. The first one ($04) is the command to clear the screen. The second command ($03) renders a test sprite which is an 8×8 pixel red box.

Incidentally, that code was written and assembled with this nice little online 6502 assembler that google turned up. If I was being truly purist, I’d be hand-assembling all code at this point, since it’s technically cheating to use modern tools to get the old-timey code in there. In the old days, until the computer was up enough to use it to write its own code, hand-assembly was the only way! Well, I’ve done my time (back in the day) in that regard, so I’m willing to skip that part of the retro experience now. If pressed, I could probably still rattle off the bit patterns for all the 68000’s addressing modes from memory. Good times.

So, with that code uploaded to Veronica’s main ROM, does it work? Let’s find out. In this video, I power up Veronica, and you see the usual splash screen being rendered by the GPU (that’s hardcoded, and has nothing to do with the CPU). Then I flip the switch to start the CPU, which runs the code shown above.

Doesn’t seem like much, clearing the screen and drawing a little red box by the CPU’s request. But a whole, whole, whole lot of things have to work in perfect harmony to make that happen. It’s been a good day.

Veronica

Blondihacks

Category: Veronica

Veronica – GPU Interface

It’s been a long time coming.