Compy FPGA – Palettes and line buffers

A long time has passed since I started implementing Compy in FPGA. I’ve made a lot of mistakes, but I’ve also learned a lot. I knew a lot less about computer architecture and design than I thought I did! After being stuck on basic issues for a long time, in these last weeks I’ve been able to advance much faster. More importantly, I now understand better why things break and how to fix them.

I started writing about all this as a single post, but it grew a bit, so I split it into three posts:

Compy FPGA – Palettes and line buffers (this one)
Compy FPGA – VRAM and Cornet CPU
Compy FPGAs and how I got here

I’ll start with some changes and improvements to Compy’s design that were driven by the FPGA implementation. I’ll finish by talking about the main issues I had to fight against and how I got past them.

The global palette and output modes

Chroni uses up to 256 colors from a 16-bit RGB palette (RGB565). At first I thought about placing the palette anywhere in VRAM. Then I realized that reading both pixel data and color data from the same memory would leave too little time to add sprites later. There is only a limited number of cycles between outputting one pixel and the next, especially in the high-resolution video modes. If you need more cycles than are available between pixels, your screen will simply glitch.
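RGB565 packs one color into 16 bits: 5 bits of red, 6 of green and 5 of blue. This Python sketch is a software model, not Chroni's hardware. It splits the value $29AC from the example code in this post into its components and expands them to 8 bits per channel. Expanding by replicating the top bits into the low bits is one common convention and is an assumption here. How Compy's VGA output actually converts the channels isn't covered in this post.

```python
def pack_rgb565(r5: int, g6: int, b5: int) -> int:
    """Pack 5-bit red, 6-bit green and 5-bit blue into one 16-bit value."""
    if not (0 <= r5 < 32 and 0 <= g6 < 64 and 0 <= b5 < 32):
        raise ValueError("component out of range")
    return (r5 << 11) | (g6 << 5) | b5


def unpack_rgb565(color: int) -> tuple[int, int, int]:
    """Split a 16-bit RGB565 value into its 5/6/5-bit components."""
    return (color >> 11) & 0x1F, (color >> 5) & 0x3F, color & 0x1F


def rgb565_to_rgb888(color: int) -> tuple[int, int, int]:
    """Expand to 8 bits per channel (assumed convention: replicate the top bits)."""
    r5, g6, b5 = unpack_rgb565(color)
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)


print(unpack_rgb565(0x29AC))     # (5, 13, 12)
print(rgb565_to_rgb888(0x29AC))  # (41, 52, 99), a dark blue
```

Each 8-bit pixel value only selects one of 256 such 16-bit entries, which is why the palette has to be looked up once per output pixel.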

How did older machines solve this problem? Many of them had fixed palettes, so they didn’t need to load data from memory to know which RGB values to use for a pixel. Computers like the Atari800, C64 and ZX Spectrum all have fixed, well-known palettes. IBM PC graphics adapters such as the VGA took a different approach: the RGB values are stored not in memory but in registers. I went this route. You write a palette index and then write two values to store the RGB565 color for that index. Chroni later reads these registers to find the RGB565 color for a given 8-bit pixel value.

As you will see, this approach opened the door to other interesting features!

Here is example code that sets the RGB565 color for palette index #9:

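BG_COLOR = $29AC    ; RGB565 color to store

LDA #9              ; palette index to modify
STA $9004           ; palette index register

LDA #<BG_COLOR      ; low byte of the color ($AC)
STA $9005           ; palette data register, first write

LDA #>BG_COLOR      ; high byte of the color ($29)
STA $9005           ; palette data register, second write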
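Here is a hypothetical Python model of how the hardware side could handle the three register writes. The low-then-high byte order comes from the 6502 code. Two details are assumptions, not confirmed Chroni behavior: the internal byte toggle, and committing the entry only on the second data write. The class and method names are also made up for illustration.

```python
PALETTE_INDEX_REG = 0x9004  # register addresses taken from the 6502 example
PALETTE_DATA_REG = 0x9005


class PaletteRegisters:
    """Hypothetical model of an index/data palette interface (not Chroni's HDL)."""

    def __init__(self) -> None:
        self.entries = [0] * 256  # one RGB565 value per 8-bit pixel value
        self.index = 0
        self.low_byte = None      # pending low byte; None until the first data write

    def cpu_write(self, address: int, value: int) -> None:
        value &= 0xFF
        if address == PALETTE_INDEX_REG:
            self.index = value
            self.low_byte = None  # assumption: selecting an index restarts the byte pair
        elif address == PALETTE_DATA_REG:
            if self.low_byte is None:
                self.low_byte = value  # first write: low byte
            else:
                # second write: high byte, commit the full 16-bit entry
                self.entries[self.index] = (value << 8) | self.low_byte
                self.low_byte = None

    def lookup(self, pixel: int) -> int:
        """Video side: map an 8-bit pixel value to its RGB565 color."""
        return self.entries[pixel & 0xFF]


regs = PaletteRegisters()
for address, value in [(0x9004, 9), (0x9005, 0xAC), (0x9005, 0x29)]:
    regs.cpu_write(address, value)
print(hex(regs.lookup(9)))  # 0x29ac
```

The video side only ever calls `lookup`, so it never competes with pixel reads from VRAM. That is the point of keeping the palette in registers.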

Thanks to a suggestion from the ATLAS group on Telegram, I ended up using dual-port RAM blocks to store the palette. One advantage of this approach is that you can read and write values using different clocks. I decided that the system (CPU, main memory, and other components) will run at a fixed frequency based on 100 MHz. The output (VGA) will run at the frequency required by the output resolution, from 25 MHz to 150 MHz. This separation allows Compy to use different output modes without affecting the computer’s speed.

Older computers, by contrast, were strictly tied to the screen output. PAL and NTSC versions ran at slightly different clock rates because their clocks were derived from the video timing; the NTSC color clock, for example, is around 3.58 MHz. With the number of video options available today, I think tying the computer’s speed to the output frequency of a particular video mode is far too restrictive.
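A bit of arithmetic shows why the clocks need to be decoupled. This sketch takes 100 MHz as the system clock and the 25 MHz to 150 MHz range above for the pixel clock. It ignores blanking intervals and assumes exact frequencies. The number of system cycles that elapse during one output pixel varies by a factor of six, and at the top of the range it drops below one.

```python
SYSTEM_CLOCK_HZ = 100_000_000  # fixed system clock, as described in the text


def system_cycles_per_pixel(pixel_clock_hz: float) -> float:
    """System clock cycles that elapse while one pixel is being output."""
    return SYSTEM_CLOCK_HZ / pixel_clock_hz


for mhz in (25, 50, 150):
    cycles = system_cycles_per_pixel(mhz * 1_000_000)
    print(f"{mhz} MHz pixel clock: {cycles:.2f} system cycles per pixel")
# prints 4.00, 2.00 and 0.67 system cycles per pixel
```

If the system produced pixels in lock-step with the output, the work possible per pixel would change with every video mode. Fixing the system clock avoids that.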

Line buffers

Older computers also read one pixel at a time while drawing to the screen. This limits the number of operations you can do for each pixel, and it is also one of the reasons the computer’s speed is tied to the output frequency. Since I separated the main system from the video output, I needed a way to render pixels ahead of the moment they go out to the screen. So I first render a full line of video into a line buffer. The output logic then simply reads that buffer when needed, at the required output frequency.
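With a line buffer, the renderer's budget becomes a whole line period instead of a single pixel. It can spend cycles unevenly, for example more on a sprite-heavy span, and it can also use the horizontal blanking time. This sketch uses the standard VGA 640x480 at 60 Hz timing: a 25.175 MHz pixel clock and 800 clocks per line, of which 640 are visible. That timing is an illustrative assumption, not necessarily the exact mode Compy uses. The sketch also assumes the renderer has one full line period per line.

```python
SYSTEM_CLOCK_HZ = 100_000_000  # fixed system clock, as described in the text


def line_budget(pixel_clock_hz: int, total_pixels_per_line: int, visible_pixels: int) -> tuple[int, float]:
    """System cycles available to render one line, and the average per visible pixel."""
    # line period = total_pixels_per_line / pixel_clock_hz; integer math avoids rounding noise
    cycles = total_pixels_per_line * SYSTEM_CLOCK_HZ // pixel_clock_hz
    return cycles, cycles / visible_pixels


# Standard VGA 640x480 @ 60 Hz: 800 clocks per line (640 visible + 160 blanking)
cycles, per_pixel = line_budget(25_175_000, 800, 640)
print(cycles, round(per_pixel, 2))  # 3177 4.96
```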

This method has several advantages. It’s easier to:

Use any scaling method on the output
Define the final pixels, for example when overlaying sprites
Output each pixel, since that is just a matter of reading one pixel value and then its palette entry
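Here is a small Python model of the output side that illustrates the first and last points. For each output pixel, the scanout reads one value from the line buffer and then one palette entry. Nearest-neighbor scaling is only one possible scaling method, chosen here for simplicity. The function and its parameters are hypothetical.

```python
def scan_out_line(line_buffer: list[int], palette: list[int], output_width: int) -> list[int]:
    """Produce one output line of RGB565 values from a rendered line buffer.

    Each output pixel is mapped back to a buffer position (nearest-neighbor
    scaling), then the 8-bit pixel value is resolved through the palette.
    """
    source_width = len(line_buffer)
    out = []
    for x in range(output_width):
        pixel = line_buffer[x * source_width // output_width]  # read one pixel value...
        out.append(palette[pixel])                             # ...then its palette entry
    return out


palette = [0] * 256
palette[1] = 0xF800  # red in RGB565
palette[2] = 0x001F  # blue in RGB565
print([hex(c) for c in scan_out_line([1, 2], palette, 4)])
# ['0xf800', '0xf800', '0x1f', '0x1f']
```

Because the line is already in the buffer, changing the scaling only changes how `x` maps to a buffer position. The rendering logic is untouched.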

Computers like the Atari800 had this kind of separation, but at the pixel level. The ANTIC chip was responsible for reading memory and ended up outputting just a bit-encoded color index. This bitstream was then fed into the GTIA, which turned the color code into the signal required by the television encoding (the final pixel).

Of course, line buffers are also implemented as dual-port RAM. Pixels are calculated and written through one port, and the final pixel values are simply read through the other. With double buffering, one line buffer is filled with the next scan line while the other is being sent to the video output.
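Double buffering can be sketched as two buffers that swap roles at the end of each scan line. This hypothetical Python model is sequential; in hardware, rendering and output happen at the same time on the two RAM ports. Swapping exactly once per line, at the end of the line, is an assumption about the timing.

```python
class DoubleLineBuffer:
    """Two line buffers used ping-pong style (hypothetical model, not Chroni's HDL)."""

    def __init__(self, width: int) -> None:
        self.buffers = [[0] * width, [0] * width]
        self.front = 0  # buffer currently read by the video output

    @property
    def back(self) -> int:
        return 1 - self.front  # buffer currently being rendered into

    def render(self, pixels: list[int]) -> None:
        """Renderer side: fill the back buffer with the next scan line."""
        self.buffers[self.back][:] = pixels

    def read(self, x: int) -> int:
        """Output side: read one pixel of the line being displayed."""
        return self.buffers[self.front][x]

    def swap(self) -> None:
        """End of scan line: the freshly rendered buffer becomes the displayed one."""
        self.front = self.back


lines = [[y] * 4 for y in range(3)]  # dummy frame: every pixel of line y has value y
db = DoubleLineBuffer(4)
db.render(lines[0])
db.swap()
for y in range(1, len(lines)):
    db.render(lines[y])                     # render line y...
    shown = [db.read(x) for x in range(4)]  # ...while line y - 1 is being output
    print(shown)
    db.swap()
# prints [0, 0, 0, 0] then [1, 1, 1, 1]
```

The output side never reads a half-written line. Each buffer is only displayed after the renderer has finished it and the swap has happened.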

Continue reading: Compy FPGA – VRAM and Cornet CPU

Updates