
Veronica – GPU Recap

By popular demand.

 

Housekeeping Note: I’ve added a new column on the left that groups Veronica posts together. If you want to follow this entire project, they’re all there, with the oldest at the bottom.

 

So, now that the graphics board seems to be working (a good engineer never trusts anything completely), I can finally clean up the schematics and code a bit. People have been asking to see both for quite a while now, but you would not have wanted to see the pile of elephant vomit that was my project folder for the past 4 months. It’s still not pristine, but I think it’s at least constructive to share at this point.

First things first, the updated schematic for the VGA board (and the Eagle file):

[Image: updated VGA board schematic]

Still cluttered, but you get what you pay for.

 

The primary difference from the earlier version of this board is the replacement of the 74HC573 latch with the 7200L FIFO. I do not have an updated PCB layout for this board, as I haven’t gone in and rerouted all the traces for this new version. I don’t plan to do that unless I need to re-etch this board for some reason. If you want the old BRD file as a starting point, it’s still available here.

 

Now, on to the firmware code for the GPU. It’s a single AVR assembly file, with a few parts broken out into .h files which are included for clarity. A makefile is included, and it should work with your IDE of choice. The build relies on GNU’s m4, so be aware of that if you plan to build this code: it will not build as-is under AVR Studio, or other non-GNU AVR environments. If you’re on any flavor of un*x system, you’ll most likely have m4 already, and assuming you have the GNU AVR toolchain installed, it should just build and go.

Here’s the complete code package. I’m not going to cover every line in gory detail; I refer you to the comments for that. I’ll just hit a few highlights here. Apologies for the funky indenting in the code samples below. I’ve yet to deduce the voodoo to get GitHub gists to indent the way I want.

The meat of the firmware is, of course, the VGA signal generator. That’s been covered in gory detail before, so I’m not going to go over that again. I will mention that I’ve changed my approach slightly. The VGA signals are still bit-banged out of PORTB, but instead of computing the various states, I now read them from a look-up table. The table looks like this:

// End-of-frame flag, VSync, VBL, VRAMH (high byte of the line's VRAM address)
scanLines:

.byte 0,1,7,0x00    	// 00
.byte 0,1,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00		// 20
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,5,0x00

...

.byte 0,0,0,0xEF
.byte 0,0,0,0xEF
.byte 0,0,7,0x00    	// 517
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00
.byte 0,0,7,0x00		// 524
.byte 1,0,0,0x00		// End of frame marker. This does not count as a scanline

As the scanline interrupts fire, the Z register is used to walk through that table line by line. Each line provides all the state needed for that pass: the state of the vertical blanking, the VRAM address of the line, and so on. This saves a ton of registers and computation, and makes the code quite tidy. Rather than saving and restoring the Z register every time the interrupt fires, I just globally reserve it. No other code may use it. I do the same for a number of other registers. The ATMega has enough registers to get away with this, and it saves a lot of cycles in the interrupt service routine (which it sorely needs). Note that I have to use the Z register here, because I’m indexing into program memory (where the scanline table lives), and the LPM instruction that reads program memory only takes its address from Z. If this becomes a major problem, I can copy the scanline table into SRAM (using the .data segment). I currently do this with the font data, so that I can indirect into it with the X register for rendering.
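If assembly isn't your thing, here is a rough Python model of the table walk described above. It is only a sketch: the table is a shortened stand-in for the real one, the class and method names are made up for illustration, and resetting the pointer when the end-of-frame marker is reached is my assumption about how the marker is used.

```python
# Sketch of walking a scanline table with one reserved pointer.
# Row layout follows the table in the post:
#   (end_of_frame, vsync, vbl, vram_high)
# The rows below are a tiny stand-in, not the real 525-line table.
SCANLINES = [
    (0, 1, 7, 0x00),  # vsync asserted, blanking
    (0, 0, 7, 0x00),  # blanking
    (0, 0, 0, 0x00),  # visible line, VRAM row 0x00
    (0, 0, 0, 0x01),  # visible line, VRAM row 0x01
    (1, 0, 0, 0x00),  # end-of-frame marker, not a real scanline
]


class ScanlineWalker:
    """Hypothetical helper; `index` plays the role of the reserved Z register."""

    def __init__(self, table):
        self.table = table
        self.index = 0

    def next_line(self):
        """Called once per scanline interrupt; returns (vsync, vbl, vram_high)."""
        end_of_frame, vsync, vbl, vram_high = self.table[self.index]
        if end_of_frame:
            # Assumed behaviour: the marker rewinds the pointer to line 0.
            self.index = 0
            end_of_frame, vsync, vbl, vram_high = self.table[self.index]
        self.index += 1
        return vsync, vbl, vram_high
```

The point of the design is that the interrupt handler does no arithmetic to work out where it is in the frame. It just reads the next row and advances the pointer.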

The second core component to the firmware is the command buffer handling. When the 6502 issues rendering commands (by writing to system memory address $EFFF), the VGA board intercepts those bytes and pushes them into the hardware FIFO. During vertical blanking, when the GPU isn’t doing much else, it pulls commands out of this FIFO and processes them. Here’s the code:

mainloop:

    // Wait for VBL to process commands
	sbrs	VBL,2		// Bit 2 of VBL register is our rendering window
	rjmp	mainloop

	// Poll the FIFO to see if there's a command pending
	in		accum,PINB
	sbrs	accum,fifoReady
	rjmp	mainloop

	// Set up Port D as an input
	ldi		accum,0x00
	out		DDRD,accum		// All Port D pins become inputs
	clr		accum			// (The original had a dead 'ldi accum,0xff' before this clr)
	out		PORTD,accum		// Pull-ups off
	nop						// Make sure PORTD has changed direction

	// Read from the FIFO
	sbi		PORTB,vramOE	// Kick VRAM off the bus so we can read the FIFO
	cbi		PORTB,fifoRead
	nop						// Allow for FIFO data setup time
	sbi		PORTB,fifoRead
	in		regS,PIND
	cbi		PORTB,vramOE	// Give bus back to VRAM

	// See if this is a new command, or completing a previous two-byte packet
	lds		accum,CMDBUFFER
	cpi		accum,0
	brne	mainHaveParamByte
	sts		CMDBUFFER,regS

	rjmp	mainloop

mainHaveParamByte:
	sts		CMDPARAM,regS
	lds		regT,CMDBUFFER
	clr		accum
	sts		CMDBUFFER,accum

	cpi		regT,NUM_COMMANDS
	brsh	mainloop		// Illegal command, ignore it (unsigned compare, so 0x80-0xFF are rejected too)

	// Bounce off a jump table based on command value
	lsl		regT
	ldi		XL,lo8(pm(renderJumpTable))
	ldi		XH,hi8(pm(renderJumpTable))
	clr		accum
	add		XL,regT
	adc		XH,accum

	push	XL		// Using ret, trick AVR into jumping to an indirect address
	push	XH
	ret				// Note that we're not using ijmp, because we can't use Z

renderJumpTable:
	nop
	rjmp	mainloop
	rcall	renderFillScreen
	rjmp	mainloop
	rcall	renderPlotChar
	rjmp	mainloop
	rcall	renderPlotStr
	rjmp	mainloop
	rcall	renderFontColor
	rjmp	mainloop

In a nutshell, it waits for the VBL, polls the inverted Empty Flag on the FIFO (“not empty” means at least one byte is in there) and springs into action if needed. To read the command byte, it takes control of the VRAM bus (which is shared with the FIFO), reads the byte by twiddling the control lines of the FIFO, and stores the byte in internal SRAM. When two bytes have been read (forming a two-byte render command packet), the command is processed.
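The two-byte packet handling boils down to a tiny state machine. Here is a minimal Python sketch of it, assuming (as the listing suggests) that a stored value of 0 means "no command byte waiting". The class name is hypothetical and only there for illustration.

```python
class CommandAssembler:
    """Sketch of the two-byte packet logic; `pending` stands in for CMDBUFFER."""

    def __init__(self):
        self.pending = 0  # 0 means "no command byte waiting"

    def feed(self, byte):
        """Take one byte from the FIFO.

        Returns None after the first byte of a packet, or a
        (command, parameter) tuple once the second byte arrives.
        """
        if self.pending == 0:
            self.pending = byte
            return None
        packet = (self.pending, byte)
        self.pending = 0
        return packet
```

One consequence worth noticing: because 0 doubles as the "empty" value, a command byte of 0 is effectively swallowed, and the byte after it is treated as the start of a new packet.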

The first byte of the command packet is an index into a jump table containing jumps to the various render routines. Each table entry is two instructions long, so the index is doubled with an 8-bit shift before being added to the table address. That shift is why I’m ultimately limited to 127 different render commands (the index has to stay below 128, and command 0 is a no-op). Surely that will be enough. As you can see, there are currently only four: filling the screen, plotting a character, changing the font color, and plotting a “string” (which just means plotting a character and advancing the cursor). One trick to note here: the ATMega has the opcode ‘ijmp’, intended for things like jump tables. However, it requires the Z register, which is reserved by the interrupt handler. Instead, I’m using a very old school trick to jump indirectly. I calculate the address I want to jump to, push it on the stack, then ‘ret’ (return from subroutine). The ‘ret’ instruction pulls a two-byte address off the stack and jumps to it. You can’t modify the program counter directly on AVRs, but you can easily fool it into modifying itself the way you want.
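To make the pointer math concrete, here is a small Python sketch of the address calculation and the push order. The table address 0x0100 used in the usage lines is an arbitrary example, not the real location of renderJumpTable, and the function names are invented for illustration.

```python
def jump_target(table_word_addr, command):
    """Word address the 'ret' ends up jumping to for a given command index."""
    offset = (command << 1) & 0xFF  # 'lsl' on an 8-bit register drops bit 7
    return (table_word_addr + offset) & 0xFFFF


def ret_push_order(target):
    """Bytes in the order the listing pushes them: low byte, then high byte."""
    return [target & 0xFF, (target >> 8) & 0xFF]


# Example with a made-up table address:
BASE = 0x0100
in_range = jump_target(BASE, 2)    # third table entry
wrapped = jump_target(BASE, 128)   # shift overflows, lands back on entry 0
```

The second usage line shows where the 127-command ceiling comes from: once the index reaches 128, the doubled value no longer fits in eight bits and the offset wraps around to the start of the table.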

The rest of the code is pretty boilerplate stuff, I think. I’ll show one rendering command, just so you can see what those look like. Here’s plotting a character:

/////////////////
// Plot a single character
//
// Parameter: The ASCII value to plot
//
renderPlotChar:

    // Take control of VRAM
	EnableVRAMWrite

	lds		accum,CMDPARAM
	subi	accum,0x41			// Offset from 'A' (0x41), the first glyph indexed in the font
	brcc	plotCharASCII		// A character we can render?
	rjmp	plotCharDone

plotCharASCII:

	// Compute pointer to desired character
	ldi		XH,FONTADDR_H
	ldi		XL,FONTADDR_L

	ldi		regS,4				// Offset by 4 bytes per char
	mul		regS,accum
	add		XL,r0
	adc		XH,r1

	// Compute high VRAM address for Y cursor position
	lds		regT,CURSORY
	lsl		regT
	lsl		regT
	lsl		regT

	out		PORTC,regT
	ldi		regT,CHARBYTES

plotCharOuter:

	// Compute low VRAM address of X cursor position
	lds		regS,CURSORX
	lsl		regS
	lsl		regS

	// Load two rows of character from font
	ld		regU,X+

	// 0 - Render odd row, leftmost pixel
	out		PORTA,regS
	clr		accum
	sbrc	regU,7
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 1
	out		PORTA,regS
	clr		accum
	sbrc	regU,6
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 2
	out		PORTA,regS
	clr		accum
	sbrc	regU,5
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 3
	out		PORTA,regS
	clr		accum
	sbrc	regU,4
	lds		accum,FONTCOLOR

	out	PORTD,accum
	PulseVRAMWrite

	inc	regS

	// Next line
	in		accum,PORTC
	inc		accum
	out		PORTC,accum
	subi	regS,0x4

	// 0 - Render even row, leftmost pixel
	out		PORTA,regS
	clr		accum
	sbrc	regU,3
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 1
	out		PORTA,regS
	clr		accum
	sbrc	regU,2
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 2
	out		PORTA,regS
	clr		accum
	sbrc	regU,1
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	inc		regS

	// 3
	out		PORTA,regS
	clr		accum
	sbrc	regU,0
	lds		accum,FONTCOLOR

	out		PORTD,accum
	PulseVRAMWrite

	// Next pair of lines
	in		accum,PORTC
	inc		accum
	out		PORTC,accum
	dec		regT
	breq	plotCharDone
	rjmp	plotCharOuter

plotCharDone:

	// Clean up and we're done
	DisableVRAMWrite
	ret

That’s just an unrolled loop to iterate through the bits in a character’s bitmap, and store the “font color” byte into VRAM for every set bit. This version forces a black background, but I also have a transparent version which just skips rendering for 0 bits in the character. The latter is slower, since it has a lot more branching in it. I may add the ability to set a text background color instead.
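For reference, the same idea in a few lines of Python. This is a sketch of what one pass of the unrolled loop does with a single font byte, plus the cursor-to-VRAM address math from the listing. The function names are hypothetical, and the `background` parameter only illustrates the background color idea mentioned above (the firmware shown here always writes 0).

```python
def render_font_byte(font_byte, color, background=0):
    """Expand one font byte into two rows of four pixels.

    Bits 7..4 form the first row and bits 3..0 the second,
    matching the order of the sbrc tests in the listing.
    """
    bits = [(font_byte >> b) & 1 for b in range(7, -1, -1)]
    pixels = [color if bit else background for bit in bits]
    return pixels[:4], pixels[4:]


def vram_address(cursor_x, cursor_y, row, column):
    """(high, low) VRAM address bytes for a pixel inside a character cell.

    The shifts follow the listing: the Y cursor is multiplied by 8
    and the X cursor by 4. Results are kept to 8 bits like the registers.
    """
    high = ((cursor_y << 3) + row) & 0xFF
    low = ((cursor_x << 2) + column) & 0xFF
    return high, low
```

Calling `render_font_byte(0b10010110, 0x1F)` gives the two pixel rows for that byte, with 0x1F standing in for whatever FONTCOLOR currently holds.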

That’s all there is to it! It’s really not a lot of code. It will be interesting to see if this system holds up to real world use as I start to build out the other areas of Veronica, and start to write real software.
