For the last few weeks I've been playing around with MiSTer, which is a project to develop, port, and package recreations of historical computers, games consoles, and arcade logic boards for use on an FPGA development board designed by Terasic, called the DE10-Nano.
I'm sure I'll have more specific things to say about MiSTer in the coming weeks as I assemble the SDRAM addon board and otherwise prepare it to take up residence near my TV, but I currently have just the DE10-Nano board and so my options for which cores to use are limited: most of them require the SDRAM addon in order to preserve accurate memory access timing.
One core that seems to work just fine without the addon board is the Sega Genesis core. First, to get it out of the way: I grew up in the UK and so I knew this console as the Sega Mega Drive, but because I now live in the US and because the MiSTer distribution refers to it as the Genesis I'm going to use that name here.
During the heyday of the Sega Genesis my own daily driver was an Amiga 500 Plus, and indeed that's the machine that played host to most of my early programming exploration (after some earlier time spent with the Commodore VIC-20 and Commodore 64).
At the time, I had a friend who owned a Sega Mega Drive and I was aware that it shared the same Motorola 68000 CPU as my Amiga, but in those days there was not a generally-available resource on the details of the other parts of the hardware, except for official Sega licensees and companies with enough resources to reverse-engineer parts themselves. Given that some of the game library was common between Amiga and Genesis, though, I'd assumed at the time that the other hardware was likely similar in design too, and had no obvious way to know for sure.
Having an FPGA recreation of the Sega Genesis hardware on my workbench inspired me to finally delve in and learn more about it. These days there are many resources online describing the architecture and functionality of the individual custom chips, including copies of the official Genesis Development Manual that Sega had shared with official licensees.
I'm not going to delve into all of the details here because that would just duplicate information readily available elsewhere. Instead, I'm going to talk about my brief adventure in learning enough about it just to program the graphics chip — officially known as the Visual Display Processor or VDP — to say "Hello", and talk a little about how it works along the way.
I referred to a number of different online resources along the way here and sadly did not make good notes about them all, but some key parts of my program that we'll be exploring below are derived from the example by Matt Philips in his article Awaking the Beast, and that blog includes a number of other articles exploring topics such as the VDP's sprite capabilities, the sound hardware, etc. By coincidence, only a few days after I was working on this (after I initially wrote this article) the YouTube channel Computerphile also published a video Sega Megadrive Hello World which covers the same content and is presented by the very same Matt Philips.
Sega Genesis Testing Environment
During the main period of software development for the Sega Genesis, Sega would sell licensees specialized equipment to enable their development. It seems to have come in several forms over the years, but the common idea was to bundle the stock Sega Genesis hardware with some additional boards that substituted SDRAM for the cartridge ROM (to avoid constantly writing ROM chips during development) and that provided a debugger interface controlled by a development PC used alongside.
Here's a photo showing one such development kit, presumably a later one due to the inclusion also of the Sega CD (Mega CD) addon:
The one above seems to be the top part of the main Genesis case, the bottom part of the case from the Sega CD, and then some extra custom stuff in between. You can see a very similar unit, along with the PC software that interacts with it, in the Computerphile video I linked above with Matt Philips. Apparently some studios such as Electronic Arts created their own development kits in-house by reverse engineering, too.
(Unfortunately I wasn't able to figure out the original source of this photo because it seems to have been re-published all over, but I hope the original photographer doesn't mind me including this just for exposition.)
Fortunately, modern Sega Genesis homebrew dabblers like myself have the luxury of using software emulators and FPGA reimplementations for testing, and so I did all of what I'm describing below on my main PC and just launched the resulting ROM image in the emulator dgen, just because it was readily available in Ubuntu's official repository.
Writing Code for a Sega Genesis
Most official Sega Genesis game releases had all or most of their code written in Motorola 68000 assembly language. I know from my experiences on Amiga that there were C compilers available at that time, but it seems that the games known to be written in C (such as Sonic Spinball) tended to suffer from poor framerates.
I expect modern C compilers can do a better job, but I'm ultimately going to be interacting directly with the Genesis hardware here so I decided to just go with the flow and write assembly language. As mentioned above, the Amiga 500 Plus had the same CPU as the Genesis and so I'd written 68000 assembly a bit back in the early 90s, but I'm definitely out of practice!
I used vasm as my assembler, because it was easy to compile and 68000 assembler seems to be its most well-supported assembly language. I'm sure I could've got similar results with the GNU assembler targeting m68k, but the GNU toolchain is a lot heavier to get built and up and running.
I built vasm using CPU=m68k SYNTAX=mot to select the Motorola assembly language syntax conventions, because that syntax is what I was using on my Amiga back in the day and I was hoping that I'd be able to find some muscle memory for it! (I'm not sure it made a lot of difference in practice.) Other examples online seem to use a different assembler syntax, which might be what GNU's assembler accepts by default. I'm not sure, but the differences seem mostly just cosmetic.
As a slight workflow optimization, I wrote myself a trivial little Makefile mainly so I wouldn't have to remember the vasm command line usage:
    main.bin: main.asm
    	vasmm68k_mot -Fbin $< -o $@

    run: main.bin
    	dgen -S 4 $<

With vasm built (as vasmm68k_mot in the above) and the Makefile working, it was time to get stuck in!
The ROM Header
A fun thing about working in assembler is that we can just directly generate the bytes that would normally get written into the ROM chip on a Genesis cartridge, whereas if we were using C we'd likely end up with a little assembler stub and a custom linker script to build it all together.
A Sega Genesis ROM is mapped into the CPU address space at address zero, which is convenient because that is where the 68000 CPU expects to find its initial stack pointer address, initial program counter address, and the interrupt vector table. So our ROM image will start with those very things, ready for the CPU to read directly:
    rom_header:
        dc.l   $00FFFFFE        ; Initial stack pointer value
        dc.l   init             ; Initial program counter value
        dc.l   ignore_handler   ; Bus error
        dc.l   ignore_handler   ; Address error
        dc.l   ignore_handler   ; Illegal instruction
        dc.l   ignore_handler   ; Division by zero
        dc.l   ignore_handler   ; CHK exception
        dc.l   ignore_handler   ; TRAPV exception
        dc.l   ignore_handler   ; Privilege violation
        dc.l   ignore_handler   ; TRACE exception
        dc.l   ignore_handler   ; Line-A emulator
        dc.l   ignore_handler   ; Line-F emulator
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Spurious exception
        dc.l   ignore_handler   ; IRQ level 1
        dc.l   ignore_handler   ; IRQ level 2
        dc.l   ignore_handler   ; IRQ level 3
        dc.l   ignore_handler   ; IRQ level 4 (horiz. retrace int.)
        dc.l   ignore_handler   ; IRQ level 5
        dc.l   ignore_handler   ; IRQ level 6 (vert. retrace int.)
        dc.l   ignore_handler   ; IRQ level 7
        dc.l   ignore_handler   ; TRAP #00 exception
        dc.l   ignore_handler   ; TRAP #01 exception
        dc.l   ignore_handler   ; TRAP #02 exception
        dc.l   ignore_handler   ; TRAP #03 exception
        dc.l   ignore_handler   ; TRAP #04 exception
        dc.l   ignore_handler   ; TRAP #05 exception
        dc.l   ignore_handler   ; TRAP #06 exception
        dc.l   ignore_handler   ; TRAP #07 exception
        dc.l   ignore_handler   ; TRAP #08 exception
        dc.l   ignore_handler   ; TRAP #09 exception
        dc.l   ignore_handler   ; TRAP #10 exception
        dc.l   ignore_handler   ; TRAP #11 exception
        dc.l   ignore_handler   ; TRAP #12 exception
        dc.l   ignore_handler   ; TRAP #13 exception
        dc.l   ignore_handler   ; TRAP #14 exception
        dc.l   ignore_handler   ; TRAP #15 exception
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
        dc.l   ignore_handler   ; Unused (reserved)
The stack pointer value $00FFFFFE is pointing into the main system RAM, and we'll see what code is at the address init in a later section.
When the CPU is first powered on (or reset) it begins its work by first
fetching both of these values, populating the stack pointer register and
then beginning execution at the initial program counter.
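To make the reset sequence concrete, here is a small Python sketch (not part of the original ROM source) that reads the two reset values back out of a ROM image the way the CPU would. The function name read_reset_vectors and the entry point $00000200 in the example are made up for illustration; only the stack pointer value comes from the listing above.

```python
import struct


def read_reset_vectors(rom: bytes):
    """Return (initial_sp, initial_pc) from the first eight bytes of a ROM image."""
    # The 68000 is big-endian, so ">II" reads two unsigned 32-bit big-endian values.
    initial_sp, initial_pc = struct.unpack_from(">II", rom, 0)
    return initial_sp, initial_pc


# Hypothetical ROM prefix: stack pointer $00FFFFFE, then an assumed entry point.
example_rom = bytes.fromhex("00FFFFFE" "00000200")
sp, pc = read_reset_vectors(example_rom)
print(f"SP=${sp:08X} PC=${pc:08X}")
```

Running the same function over a real main.bin should report the address the assembler chose for init.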
The interrupt vector table here is pretty boring because for this simple program I disabled the interrupts anyway! But to make it at least valid they all refer to the same ignore_handler address, which contains the simplest possible interrupt handler:
    ignore_handler:
        rte     ; continue from where the interrupt was triggered
After the vector table the CPU consumes, there's a Genesis-specific header. I must confess I'm not sure exactly what reads this and at what point when running on real hardware, but I know that emulators often use this header to automatically select a suitable regional variant to emulate, and otherwise adapt their behavior to the needs of a particular cartridge:
        dc.b "SEGA GENESIS "   ; Console name
        dc.b "(C) MA "         ; Copyright holder and release date
        dc.b "BOOT LOGO "      ; Domest. name
        dc.b "BOOT LOGO "      ; Intern. name
        dc.b "2019-01-20 "     ; Version number
        dc.w $0000             ; Checksum
        dc.b "J "              ; I/O support
        dc.l $00000000         ; Start address of ROM
        dc.l __end             ; End address of ROM
        dc.l $00FF0000         ; Start address of RAM
        dc.l $00FFFFFF         ; End address of RAM
        dc.l $00000000         ; SRAM enabled
        dc.l $00000000         ; Unused
        dc.l $00000000         ; Start address of SRAM
        dc.l $00000000         ; End address of SRAM
        dc.l $00000000         ; Unused
        dc.l $00000000         ; Unused
        dc.b " "               ; Notes
        dc.b "JUE "            ; Country codes (Japan, USA, Europe)
The dc.l, dc.w, and dc.b keywords in the examples above are directives to the assembler to just insert a literal value into the output, rather than generating opcodes for the CPU. The naming of these includes suffixes we'll also see on real instructions later, each of which describes the size of value we're working with (or, in this case, generating):
- .b is for "byte", or eight bits.
- .w is for "word", which on this architecture is two bytes, or 16 bits. Although the 68000 CPU has 32-bit registers, its memory bus is only 16 bits wide and so it's common to use word-sized values for data.
- .l is for "long word", which is four bytes, or 32 bits. This is the value size used for memory addresses, as we can see in the vector table and some of the header fields.
We can also see in the above that these dc. directives can take a sequence of values, which is particularly useful when providing a string literal to dc.b to concisely insert a sequence of ASCII characters into the ROM for the console name, copyright, title names, etc.
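As a rough mental model of what these directives emit, the following Python sketch mimics them with struct. The helper names dc_b, dc_w, and dc_l are my own stand-ins rather than anything vasm provides, and the sketch ignores assembler details such as alignment and expression evaluation.

```python
import struct


def dc_b(*values) -> bytes:
    """Emit bytes; string arguments become their ASCII characters."""
    out = bytearray()
    for value in values:
        if isinstance(value, str):
            out += value.encode("ascii")
        else:
            out.append(value & 0xFF)
    return bytes(out)


def dc_w(*values) -> bytes:
    """Emit 16-bit words in big-endian order, as the 68000 expects."""
    return b"".join(struct.pack(">H", value & 0xFFFF) for value in values)


def dc_l(*values) -> bytes:
    """Emit 32-bit long words in big-endian order."""
    return b"".join(struct.pack(">I", value & 0xFFFFFFFF) for value in values)


print(dc_l(0x00FFFFFE).hex())  # the initial stack pointer entry
print(dc_b("JUE"))             # a string literal becomes one byte per character
```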
Everything after this point in the ROM is entirely up to the creator of the program, as long as the addresses in the header above refer to sensible data or instructions. There's 4 MiB of address space in the CPU memory map for the ROM, so that's the physical limit for the size of a Genesis game unless it employs extra hardware on the cartridge, such as bank switching logic. Since I'm just writing a flat ROM, that address space limit is also the limit on the size of my generated ROM file, but that's more than enough to show a simple message on-screen.
Trademark Security System
Back when I was writing 68000 assembler on my Amiga, I had the luxury of the system already having been booted and initialized by the Amiga's Kickstart ROM and other OS features, and so I could just get down to business poking at the system. It would be quite some years later when I worked with microcontrollers for the first time that I'd write code to bring a system up from scratch.
Because the initial program counter for a Genesis ROM points directly at code in the ROM image, the cartridge ROM is responsible for all of the system initialization tasks. I suppose re-shipping the same (small) initialization code with every game helped to keep the cost down vs. including at least a minimal OS kernel in the system as was expected for computer systems.
The very first thing my ROM does is disable interrupts. In retrospect I think this is redundant, because this is the very first instruction the CPU executes and the 68000 already comes out of reset with its interrupt mask set to the highest level:
    init:                       ; Main entry point
        move    #$2700,sr       ; disable interrupts
With a later hardware revision of the Genesis, Sega introduced a mechanism to try to prevent unlicensed games on their platform called the Trademark Security System, or TMSS. By modern DRM standards this is comically simple, but any Genesis ROM must do this early in boot or else the graphics hardware will shut off and blank the screen:
        ; "Trademark Security System" (TMSS) handshake
        move.b  $00A10001,d0        ; Move Megadrive hardware ver. to d0
        andi.b  #$0F,d0             ; Version is stored in last four bits
                                    ; so mask it with 0F
        beq     @Skip               ; If version = 0, skip TMSS signature
        move.l  #'SEGA',$00A14000   ; Move string "SEGA" to $A14000
    @Skip:
The TMSS isn't present in first-generation (version zero) hardware, so the Genesis software development guide tells us to first read the memory-mapped register at $00A10001 to determine the hardware revision number and skip over the TMSS response if it returns zero. Otherwise, the TMSS simply requires that we write the 32-bit number representing the string "SEGA" to the register at $00A14000. That's it!
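The "32-bit number representing the string" is just the four ASCII bytes read as one big-endian value. Here is a short Python sketch of the same check and signature. The function name needs_tmss_handshake and the sample register readings $A0 and $A1 are assumptions for illustration, not values taken from Sega's documentation.

```python
# 'S' = $53, 'E' = $45, 'G' = $47, 'A' = $41, packed most significant byte first.
TMSS_SIGNATURE = int.from_bytes(b"SEGA", "big")


def needs_tmss_handshake(version_register: int) -> bool:
    """Mirror the andi.b #$0F / beq test: a zero revision means no TMSS."""
    return (version_register & 0x0F) != 0


print(f"${TMSS_SIGNATURE:08X}")
print(needs_tmss_handshake(0xA0))  # hypothetical reading with revision 0
print(needs_tmss_handshake(0xA1))  # hypothetical reading with revision 1
```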
I'm not 100% sure of the TMSS behavior on real Genesis hardware because information online seems conflicted, but it seems that on hardware with TMSS the TMSS actually ends up running first, before the cartridge ROM code, and checks the header I described in the previous section to verify that it contains the string (C) SEGA. If so, it displays an on-screen message saying that the game is "produced by or under licence from" Sega.
I gather this was intended to be primarily a legal rather than a technical mechanism, where Sega would sue infringers for infringing their Sega trademark, but it was ultimately tested and invalidated in a lawsuit with publisher Accolade, where the court found that Sega's own system was responsible for the claimed trademark infringement, not the software written by Accolade.
Emulators don't tend to enforce the TMSS checks, so I expect the above isn't really needed in my case but I included it anyway because the official documentation told me to!
Z80 Initialization
I've mentioned several times already that the CPU in the Genesis is a Motorola 68000, but the system actually has two CPUs. Alongside the 68000 there is a Zilog Z80 CPU which in most cases is used as a coprocessor to control the audio hardware, though it also serves as the main CPU when the system is running older software designed for the system's predecessor, the Sega Master System.
I don't intend to use the Z80 at all here, so my main goal is just to put it into a well-defined state where I know it won't get up to any mischief while I'm working with the video hardware.
Interacting with the Z80 processor requires interacting with some memory-mapped registers in the I/O area of the memory map. For readability I defined some symbols for the addresses of those registers:
    z80_bus_req   = $00A11100
    z80_bus_grant = $00A11101
    z80_reset     = $00A11200
    z80_ram       = $00A00000

Through one of the I/O chips, the software running on the 68000 CPU can take ownership of the Z80's memory bus using the Z80 features that might normally be used for direct memory access (DMA) in a more conventional system. We can then write into the Z80's work RAM to control the instructions it will see once it resumes execution:
        ; Initialize Z80
        move.w  #$0100,z80_bus_req   ; Request access to the Z80 bus
        move.w  #$0100,z80_reset     ; Hold the Z80 in a reset state
    @Wait:
        btst    #$0,z80_bus_grant    ; Check if we have access to the Z80 bus yet
        bne     @Wait                ; (loop while bit zero is still set; it reads as
                                     ; zero once bus access is granted)
        move.l  #z80_ram,a1          ; Copy Z80 RAM address to a1
        move.l  #$00C30000,(a1)      ; Copy some instructions to the start of Z80 RAM: nop, jp 0x0000
        move.w  #$0000,z80_reset     ; Release reset state
        move.w  #$0000,z80_bus_req   ; Release control of bus
After taking control of the Z80 memory bus, we activate its reset signal to cancel whatever it might've been doing already, and then write to the start of its RAM a 32-bit value that corresponds to two Z80 machine instructions. If we were writing that in Z80 assembly language then this trivial program might look like this:
        nop          ; do nothing
        jp 0x0000    ; jump back to address zero
In other words, we're just asking the Z80 to spin forever doing nothing, so that we can forget about it and concentrate on programming the VDP with the 68000 CPU.
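To see how the single 32-bit constant splits into those two instructions, here is a small Python sketch that decodes it. It only understands the two opcodes involved ($00 for nop and $C3 for jp with a 16-bit little-endian operand); the helper name describe_stub is made up for this example.

```python
# The 68000 stores the long word most significant byte first, so the Z80 sees
# the bytes 00 C3 00 00 at the start of its RAM.
Z80_STUB = (0x00C30000).to_bytes(4, "big")


def describe_stub(code: bytes):
    """Decode the 4-byte stub; raises ValueError for anything else."""
    if len(code) != 4 or code[0] != 0x00 or code[1] != 0xC3:
        raise ValueError("not the nop / jp stub")
    # Z80 jump targets are stored low byte first.
    target = int.from_bytes(code[2:4], "little")
    return ["nop", f"jp 0x{target:04x}"]


print(Z80_STUB.hex(" "))
print(describe_stub(Z80_STUB))
```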
Finally, we take the Z80 back out of reset mode, causing it to behave as if it was just powered on, and release its memory bus so it can begin executing the useless program we wrote above.
Clearing the main RAM
So far we've not interacted with the main RAM at all. Although emulators tend to use a fresh block of clean memory to emulate the RAM, on real hardware the RAM is likely to be full of garbage until we write some specific values into it. For good measure, I wanted to initialize the RAM with all zeroes.
        ; Clear RAM (top 64k of memory space)
        move.l  #$00000000,d0   ; We're going to write zeroes over the whole of RAM, 4 bytes at a time
        move.l  #$00000000,a0   ; Starting from address 0x0, clearing backwards
        move.l  #$00003FFF,d1   ; Clear 64k, 4 bytes at a time. That's 16384 writes (counter starts at 16383)
    @ClearRAM:
        move.l  d0,-(a0)        ; Decrement address by 4 bytes and then copy our zero to that address
        dbra    d1,@ClearRAM    ; Decrement loop counter d1, exiting once it passes zero and reaches -1
The 64k of main work RAM in the Sega Genesis appears right at the top of the memory map, so the code above clears it from the highest address to the lowest address. Register a0 tracks the current address, which starts off at zero because we use the pre-decrement mode -(a0) to decrement a0 as a side-effect of the move.l instruction that writes to that address. The first decrement wraps around from zero to $FFFFFFFC, and because the 68000 only drives 24 address lines that lands on $FFFFFC, the last long word of RAM.
The 68000 instruction set is very dense and concise, so we can do a lot with relatively few instructions compared to a RISC instruction set like ARM:
- move.l d0,-(a0) both decrements a0 and then writes the value from d0 to the resulting address in a single instruction.
- dbra d1,@ClearRAM both decrements d1 (our loop counter) and tests whether it has gone past zero to -1, jumping to @ClearRAM if not.
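The loop bounds are easy to get wrong by one, so here is a Python sketch that walks through the same sequence of addresses. It is a simplified model under two assumptions of mine: addresses are truncated to the 24 bits the 68000 actually puts on the bus, and dbra is modelled as a 16-bit decrement that stops at -1 ($FFFF).

```python
def clear_ram_addresses(loop_counter: int = 0x3FFF):
    """Return the bus addresses written by the move.l d0,-(a0) / dbra loop."""
    a0 = 0x00000000
    d1 = loop_counter
    addresses = []
    while True:
        a0 = (a0 - 4) & 0xFFFFFFFF        # pre-decrement by the operand size
        addresses.append(a0 & 0xFFFFFF)   # only 24 address bits reach the bus
        d1 = (d1 - 1) & 0xFFFF            # dbra decrements the low word...
        if d1 == 0xFFFF:                  # ...and falls through at -1
            break
    return addresses


addresses = clear_ram_addresses()
print(len(addresses), hex(addresses[0]), hex(addresses[-1]))
```

With the counter loaded with $3FFF the model performs 16384 long-word writes, starting at $FFFFFC and finishing at $FF0000, which is exactly the 64k work RAM region.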
Initializing the VDP
For me, this was the main fun part: getting ready to put something on the screen! But there's still a little general book-keeping to do first. The VDP is a memory-mapped device in the 68000 memory space, and we need to write some values into its registers to get it into a predictable state before we begin.
For this part, I really just copied the very-reasonable-looking default register values from Matt Philips' article, loading them directly into the VDP registers:
    vdp_control = $C00004
    vdp_data    = $C00000

        ; Initialise video (VDP)
        move.l  #VDPRegisters,a0    ; Load address of register table
        move.l  #$17,d0             ; 24 registers to write (dbra loops d0+1 times)
        move.l  #$00008000,d1       ; 'Set register 0' command
                                    ; (and clear the rest of d1 ready)
    @CopyVDP:
        move.b  (a0)+,d1            ; Copy register value to d1
        move.w  d1,vdp_control      ; Write command and value to
                                    ; VDP control port
        add.w   #$0100,d1           ; Increment register #
        dbra    d0,@CopyVDP

        ; (subsequent code continues here...)

    VDPRegisters:
    VDPReg0:   dc.b $14 ; 0: H interrupt on, palettes on
    VDPReg1:   dc.b $74 ; 1: V interrupt on, display on, DMA on,
                        ;    Genesis mode on
    VDPReg2:   dc.b $30 ; 2: Pattern table for Scroll Plane A
                        ;    at VRAM $C000
                        ;    (bits 3-5 = bits 13-15)
    VDPReg3:   dc.b $00 ; 3: Pattern table for Window Plane
                        ;    at VRAM $0000
                        ;    (disabled) (bits 1-5 = bits 11-15)
    VDPReg4:   dc.b $07 ; 4: Pattern table for Scroll Plane B
                        ;    at VRAM $E000
                        ;    (bits 0-2 = bits 11-15)
    VDPReg5:   dc.b $78 ; 5: Sprite table at VRAM $F000
                        ;    (bits 0-6 = bits 9-15)
    VDPReg6:   dc.b $00 ; 6: Unused
    VDPReg7:   dc.b $00 ; 7: Background colour - bit 0-3 = colour,
                        ;    bits 4-5 = palette
    VDPReg8:   dc.b $00 ; 8: Unused
    VDPReg9:   dc.b $00 ; 9: Unused
    VDPRegA:   dc.b $FF ; 10: Frequency of Horiz. interrupt in
                        ;     Rasters (number of lines travelled by
                        ;     the beam)
    VDPRegB:   dc.b $00 ; 11: External interrupts off,
                        ;     V scroll fullscreen,
                        ;     H scroll fullscreen
    VDPRegC:   dc.b $81 ; 12: Shadows and highlights off,
                        ;     interlace off,
                        ;     H40 mode (320 x 224 screen res)
    VDPRegD:   dc.b $3F ; 13: Horiz. scroll table at VRAM $FC00
                        ;     (bits 0-5)
    VDPRegE:   dc.b $00 ; 14: Unused
    VDPRegF:   dc.b $02 ; 15: Autoincrement 2 bytes
    VDPReg10:  dc.b $01 ; 16: Vert. scroll 32, Horiz. scroll 64
    VDPReg11:  dc.b $00 ; 17: Window Plane X pos 0 left
                        ;     (pos in bits 0-4, left/right in bit 7)
    VDPReg12:  dc.b $00 ; 18: Window Plane Y pos 0 up
                        ;     (pos in bits 0-4, up/down in bit 7)
    VDPReg13:  dc.b $FF ; 19: DMA length lo byte
    VDPReg14:  dc.b $FF ; 20: DMA length hi byte
    VDPReg15:  dc.b $00 ; 21: DMA source address lo byte
    VDPReg16:  dc.b $00 ; 22: DMA source address mid byte
    VDPReg17:  dc.b $80 ; 23: DMA source address hi byte,
                        ;     memory-to-VRAM mode (bits 6-7)
We'll get into more detail on what some of these registers do in later sections, but the main idea here is to just get the VDP in a reasonable default state we can build from.
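To show what actually reaches the control port during that loop, here is a Python sketch that reproduces the sequence of 16-bit command words. The register values are copied from the table above; the function name vdp_register_words is my own, and the model simply follows the move.b / move.w / add.w steps of the loop.

```python
VDP_REGISTER_VALUES = [
    0x14, 0x74, 0x30, 0x00, 0x07, 0x78, 0x00, 0x00,
    0x00, 0x00, 0xFF, 0x00, 0x81, 0x3F, 0x00, 0x02,
    0x01, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x80,
]


def vdp_register_words(values):
    """Return the command words written to vdp_control, one per register."""
    d1 = 0x8000                                  # 'set register 0' command
    words = []
    for value in values:
        d1 = (d1 & 0xFF00) | (value & 0xFF)      # move.b (a0)+,d1
        words.append(d1)                         # move.w d1,vdp_control
        d1 = (d1 + 0x0100) & 0xFFFF              # add.w #$0100,d1
    return words


words = vdp_register_words(VDP_REGISTER_VALUES)
print(" ".join(f"{word:04X}" for word in words))
```

Each word carries the register number in its high byte (offset by $80) and the new value in its low byte, so register 15 set to 2 appears as $8F02.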
The VDP has its own RAM, independent of the CPU work RAM, and so the addresses we see in the above are VRAM rather than CPU RAM addresses. The video RAM is not directly accessible from the CPU, so we can only interact with it indirectly through the VDP registers, which we'll see in a moment.
Loading a Color Palette
One aspect of the VDP design that is familiar to me from Amiga graphics hardware (and, for that matter, most graphics hardware of the era) is that it uses indexed-color graphics modes where graphics are built from a small number of colors chosen from a larger colorspace.
In the case of the Amiga, the color space is 4 bits per channel (4096 distinct colors) of which up to 32 can be active at once. The Genesis capabilities are more modest: 3 bits per channel (512 distinct colors) of which up to 16 can be active at once in any one palette. Both systems have some special modes which can push those boundaries with some caveats, but those are the capabilities in terms of color palettes.
For the Genesis VDP, the palettes are kept in a special portion of the video RAM called "color RAM", which is addressed separately from the main video RAM. The color RAM is literally just an array of color palettes: 16 colors each, over four distinct palettes. I get the impression from the technical docs that the color RAM is separated because it's arranged into 64 9-bit words, allowing the 9-bit palette entries to be stored as densely as possible. However, when we access the color RAM from our program it gets translated into a 16-bit value by padding with zeroes, as we'll see.
The color RAM is used in conjunction with patterns in the main video RAM, which are 8x8-pixel blocks of graphics with each pixel referring to one of the 16 colors in a palette from the color RAM. We therefore need a palette and some patterns in video RAM to display anything, and we'll start with the palette in this section.
As I hinted in the previous section, we can't access the VDP RAM areas directly from the CPU. Instead, we use the VDP's control and data ports to ask the VDP to manipulate its RAM on our behalf. We already saw the vdp_control register in the previous snippet where I was using it to write to the VDP's registers; that port can also be used along with vdp_data to set up word-by-word memory transfers for the three video RAM regions:
vdp_control Bit Pattern | Meaning |
10?RRRRR DDDDDDDD | Set register RRRRR to value DDDDDDDD . |
00AAAAAA AAAAAAAA ???????? 0000??AA | Prepare to read from VRAM address AAAAAAAA AAAAAAAA . |
01AAAAAA AAAAAAAA ???????? 0000??AA | Prepare to write to VRAM address AAAAAAAA AAAAAAAA . |
00AAAAAA AAAAAAAA ???????? 0010??AA | Prepare to read from CRAM address AAAAAAAA AAAAAAAA . |
11AAAAAA AAAAAAAA ???????? 0000??AA | Prepare to write to CRAM address AAAAAAAA AAAAAAAA . |
00AAAAAA AAAAAAAA ???????? 0001??AA | Prepare to read from VSRAM address AAAAAAAA AAAAAAAA . |
01AAAAAA AAAAAAAA ???????? 0001??AA | Prepare to write to VSRAM address AAAAAAAA AAAAAAAA . |
VRAM and CRAM are the main video RAM and the color RAM respectively. VSRAM is another video RAM area, vertical scroll RAM, which contains values that control vertical scrolling. I'm not using VSRAM at all in this simple program.
In the above table, the symbol ? means "don't care", but in practice these bits need to be set to something so I just left them set to zero.
Also, the packing of the AAAA... portions of the patterns above requires a little more scrutiny. The fourteen address bits in the first word of the value are address bits thirteen through zero, while the remaining two address bits, fifteen and fourteen, are packed into the two A bits at the end of the second word.
Because all of these operations require some rather specific and counter-intuitive packing, I decided to improve the readability of my code by defining two macros, which allow generating instructions somewhat-programmatically during assembly:
    VDP_SET_REG MACRO ; REGISTER, VALUE
        move.w  #((($80|\1)<<8)|\2),vdp_control
        ENDM

    VDP_VRAM_WRITE  = %000001
    VDP_CRAM_WRITE  = %000011
    VDP_VSRAM_WRITE = %000101
    VDP_VRAM_READ   = %000000
    VDP_CRAM_READ   = %001000
    VDP_VSRAM_READ  = %000100

    VDP_REG_AUTOINC = 15 ; RAM access address auto-increment

    VDP_SET_ADDR MACRO ; OP_TYPE, ADDR
        ; Configuring a RAM access requires some odd argument packing:
        ; byte 0: CD1 CD0 A13 A12 A11 A10 A9  A8
        ; byte 1: A7  A6  A5  A4  A3  A2  A1  A0
        ; byte 2: 0   0   0   0   0   0   0   0
        ; byte 3: CD5 CD4 CD3 CD2 0   0   A15 A14
        ; The bit manipulation below is dealing with that packing so that
        ; calls to this macro can look more straightforward.
        move.l  #((\1&%11)<<30|(\2&%11111111111111)<<16|(\1&%11100)<<2|(\2>>14)),vdp_control
        ENDM

    VDP_WRITE_DATA MACRO ; DATA
        move.w  \1,vdp_data
        ENDM

The noisy punctuation sequences in each of the MACRO blocks above are bitmasking and bitshifting the numbers given in macro arguments to conform to the packing scheme expected for the vdp_control port, and then writing the result to that port. This is the same principle as a preprocessor macro in C: the assembler replaces any mention of these macros later in the input with the body of the macro, substituting the argument values where we see the markers \1 and \2 (representing unnamed positional arguments).
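As a cross-check of the bit manipulation in VDP_SET_ADDR, here is the same expression transcribed into Python, using the same operation codes as the constants above. The function name vdp_addr_command is made up for this sketch; the masks and shifts are copied from the macro as written, including its three-bit mask for the upper code bits.

```python
VDP_VRAM_WRITE = 0b000001
VDP_CRAM_WRITE = 0b000011
VDP_VSRAM_WRITE = 0b000101
VDP_VRAM_READ = 0b000000
VDP_CRAM_READ = 0b001000
VDP_VSRAM_READ = 0b000100


def vdp_addr_command(op_type: int, addr: int) -> int:
    """Pack an operation code and a 16-bit address into the 32-bit control value."""
    return (
        (op_type & 0b11) << 30        # CD1, CD0 at the top of the first word
        | (addr & 0x3FFF) << 16       # A13..A0 fill the rest of the first word
        | (op_type & 0b11100) << 2    # upper code bits, placed in the last byte
        | (addr >> 14)                # A15, A14 in the lowest two bits
    )


for name, op, addr in [
    ("CRAM write at $0000", VDP_CRAM_WRITE, 0x0000),
    ("VRAM write at $C000", VDP_VRAM_WRITE, 0xC000),
    ("CRAM read at $0000", VDP_CRAM_READ, 0x0000),
]:
    print(f"{name}: ${vdp_addr_command(op, addr):08X}")
```

The VRAM write at $C000 is a handy test case because both halves of the split address are non-zero: the low fourteen bits are all zero in the first word, and the top two bits show up as 3 at the very end.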
With those macros in place, we can load some colors into the first palette table in the CRAM like this:
        ; Load palette into VDP
        VDP_SET_ADDR VDP_CRAM_WRITE,$0000   ; Prepare to write to CRAM address $000
        VDP_SET_REG VDP_REG_AUTOINC,2       ; Increment 2 bytes per CRAM write
        lea     Palette,a0                  ; a0 = address of first palette entry
        move.l  #(((EndPalette-Palette)>>2)-1),d0 ; d0 = palette length in longwords minus one (loop counter)
    @PaletteLoop:
        move.l  (a0)+,vdp_data              ; Write current longword to vdp_data, which is really
                                            ; two vdp_data writes. Post-increment a0.
        dbra    d0,@PaletteLoop             ; Decrement d0 until zero

        ; (subsequent code continues here...)

    Palette:
        dc.w $000 ; 0 Black (transparent)
        dc.w $E00 ; 1 Blue
        dc.w $0E0 ; 2 Green
        dc.w $EE0 ; 3 Cyan
        dc.w $00E ; 4 Red
        dc.w $E0E ; 5 Purple
        dc.w $0EE ; 6 Yellow
        dc.w $EEE ; 7 White
        dc.w $EEE ; 8 White
        dc.w $EEE ; 9 White
        dc.w $EEE ; A White
        dc.w $EEE ; B White
        dc.w $EEE ; C White
        dc.w $EEE ; D White
        dc.w $EEE ; E White
        dc.w $EEE ; F White
    EndPalette:
There's another subtle trick at play here which I've not mentioned yet. By default, after we prepare a memory access by writing to vdp_control we would then just read or write a single value to vdp_data to complete that operation and then perhaps write something else to vdp_control for the next operation.
However, because copying long blocks of data into video RAM is a pretty common operation in practice, the VDP has an auto-increment feature controlled by VDP register 15. If that register has a non-zero value then each write to vdp_data will have the side-effect of post-incrementing the configured memory address by the auto-increment amount.
In the above, I've set the auto-increment register to two because I'm writing palette data in word-sized portions. This code gives the impression of writing the data one longword at a time, but as I noted earlier the memory bus of the CPU is actually only 16 bits wide, so a move.l to memory appears to memory-mapped devices as two separate writes, which in the case of the VDP means auto-incrementing twice and thus moving four bytes later in CRAM for each move.l.
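The interaction between long-word writes and auto-increment can be modelled in a few lines of Python. This is a toy model, not a description of the real chip: the class name ColorRamPort is made up, and it assumes the high word of a long-word write arrives first.

```python
class ColorRamPort:
    """Toy model of vdp_data writes into CRAM with auto-increment."""

    def __init__(self, auto_increment: int = 2):
        self.address = 0
        self.auto_increment = auto_increment
        self.words = {}

    def write_word(self, value: int) -> None:
        self.words[self.address] = value & 0xFFFF
        self.address += self.auto_increment   # post-increment after every word

    def write_long(self, value: int) -> None:
        # A move.l reaches the device as two 16-bit bus writes.
        self.write_word(value >> 16)
        self.write_word(value)


port = ColorRamPort(auto_increment=2)
port.write_long(0x00000E00)   # palette entries 0 (black) and 1 (blue)
port.write_long(0x00E00EE0)   # palette entries 2 (green) and 3 (cyan)
print(port.address, {hex(a): hex(w) for a, w in port.words.items()})
```

After two long-word writes the address has advanced by eight bytes and four consecutive palette entries have been filled.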
Although the physical CRAM is 64 9-bit words, the VDP memory access port maps that to a 16-bit word with additional "don't care" bits to ensure that each of the color channels sits in a separate 4-bit portion, and so the color values are expressed in hex rather than octal:
???? BBB? GGG? RRR?
The color value $EE0 above, then, is 0000 1110 1110 0000 in binary, and thus 100% blue and green but 0% red, giving cyan.
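The same layout can be expressed as a pair of small Python helpers, which makes it easier to work out palette words for other colors. The names genesis_color and split_color are my own; each channel is a 3-bit level from 0 to 7, shifted so that it sits in the upper three bits of its 4-bit group.

```python
def genesis_color(red: int, green: int, blue: int) -> int:
    """Pack three 3-bit channel levels (0-7) into a ???? BBB? GGG? RRR? word."""
    for level in (red, green, blue):
        if not 0 <= level <= 7:
            raise ValueError("channel levels must be in the range 0-7")
    return (blue << 9) | (green << 5) | (red << 1)


def split_color(word: int):
    """Return (red, green, blue) levels from a palette word."""
    return ((word >> 1) & 0x7, (word >> 5) & 0x7, (word >> 9) & 0x7)


print(f"cyan  = ${genesis_color(0, 7, 7):03X}")
print(f"red   = ${genesis_color(7, 0, 0):03X}")
print(split_color(0xE0E))
```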
Loading Patterns
The concept of patterns is the first area where my experience with my Amiga graphics hardware didn't apply to the Genesis VDP at all. The Amiga graphics architecture uses full bitmap graphics, where pixels can be placed individually at any point on the screen. By contrast, the Genesis display has an extra level of indirection, in the form of patterns.
A pattern is a re-usable 8x8-pixel graphics block, where each pixel is populated with one of the colors from a selected color palette. The overall screen is then built from a further 2D grid of pattern selections, each potentially selecting a different palette to read colors from.
This memory organization was actually common on computers of the 8-bit era, such as the Commodore 64. In the case of the Commodore 64 we called them "characters" rather than "patterns" because by default the patterns were read from a built-in ROM containing letters, numbers, and symbols to show text on-screen. The Genesis has no such built-in character ROM, but we could potentially recreate it in the VDP main video memory if we wanted, perhaps recreating the iconic Commodore 64 boot screen. (But that's a diversion for another day!)
The upshot of this is that any imagery we want to display onscreen must be composed from these 8x8 patterns. Typical platform games would build the on-screen world from a variety of these pattern tiles, and in many cases the artists would carefully design the tiles and the levels such that the grid-based display is not so obvious, like in the Sonic games:
Because Sonic The Hedgehog 3 uses parallax scrolling and sprites there are multiple grids on-screen that don't necessarily align, but once you're aware of the grid you can spot patterns that appear multiple times on the screen with identical content. This was a common display architecture for games on the Amiga too, but there the tile patterns were rendered in software rather than being an innate constraint of the hardware.
The patterns for my simple "Hello" program are nowhere near as sophisticated, though:
Characters:
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22222220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $00000000

    dc.l $22222220
    dc.l $22000000
    dc.l $22000000
    dc.l $22222000
    dc.l $22000000
    dc.l $22000000
    dc.l $22222220
    dc.l $00000000

    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22222220
    dc.l $00000000

    dc.l $02222200
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $02222200
    dc.l $00000000
EndCharacters:
Even this very modest pattern set shows the benefit of building the display from patterns: we only need to store the shape of the "L" once, even though it appears twice in the word "HELLO".
The 2 and 0 in the above are 4-bit references to colors from a 16-color palette. We only have one palette loaded for this program, so 2 here refers to index two from that palette, which was green. Color references of zero are special in patterns: rather than referring directly to palette index zero, a zero specifies that the pixel is transparent. However, in our VDP initialization earlier we set register 7, "backdrop color", to specify color zero from palette zero, so the transparent pixels still end up showing our black from the palette.
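To make the "one hex digit per pixel" idea concrete, here's a small Python sketch (a desktop-side model, not Genesis code) that unpacks the eight longwords of the "H" pattern above into their 4-bit color references and draws them as text. The helper names row_pixels and render are hypothetical, made up just for this illustration:

```python
# The eight rows of the "H" pattern, copied from the Characters table above.
H_ROWS = [
    0x22000220, 0x22000220, 0x22000220, 0x22222220,
    0x22000220, 0x22000220, 0x22000220, 0x00000000,
]


def row_pixels(longword):
    """Return the eight 4-bit color references in one row, leftmost pixel first."""
    return [(longword >> shift) & 0xF for shift in range(28, -4, -4)]


def render(rows):
    """Draw non-zero (opaque) pixels as '#' and zero (transparent) pixels as '.'."""
    return ["".join("#" if p else "." for p in row_pixels(r)) for r in rows]


for line in render(H_ROWS):
    print(line)
```

Each row is one longword, so a whole 8x8 pattern takes eight longwords, or 32 bytes.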
We can load the character patterns into VRAM by the same principle as we loaded the colors into CRAM. The pattern data starts at the beginning of the VDP's VRAM and each pattern occupies 32 ($20) bytes, so we'll start writing our definition of "H" at VRAM offset $020 so it will appear as pattern index 1, leaving index 0 to represent a totally transparent pattern for the parts of the screen we won't be using:
    ; Load patterns into VDP
    VDP_SET_ADDR VDP_VRAM_WRITE,$0020 ; Prepare to write to VRAM address $020, which is character pattern 1
    VDP_SET_REG VDP_REG_AUTOINC,2     ; Increment 2 bytes per VRAM write
    lea     Characters,a0             ; a0 = address of first byte of first character
    move.l  #(((EndCharacters-Characters)>>2)-1),d0 ; d0 = buffer length in longwords minus one (loop counter)
@CharacterLoop:
    move.l  (a0)+,vdp_data            ; Write current longword to vdp_data, which is really two vdp_data writes. Post-increment a0.
    dbra    d0,@CharacterLoop         ; Decrement d0 and loop again until it passes zero
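As a sanity check on the address arithmetic, here's a Python sketch (again a host-side model, not something that runs on the console) of where a given pattern index lives in VRAM and of the longword that the VDP_SET_ADDR macro from the full source listing sends to the control port. The function names pattern_address and vdp_command are my own; the bit packing simply mirrors the expression in that macro:

```python
PATTERN_BYTES = 8 * 8 // 2   # 64 pixels at 4 bits each = 32 bytes per pattern

# Operation codes, as defined in the full source listing.
VDP_VRAM_WRITE = 0b000001
VDP_CRAM_WRITE = 0b000011


def pattern_address(index):
    """VRAM byte offset of a pattern, assuming pattern data starts at VRAM address 0."""
    return index * PATTERN_BYTES


def vdp_command(op, addr):
    """Pack an operation code and a 16-bit address the same way VDP_SET_ADDR does."""
    return (
        ((op & 0b11) << 30)         # CD1-CD0 in the top two bits
        | ((addr & 0x3FFF) << 16)   # A13-A0
        | ((op & 0b11100) << 2)     # CD4-CD2 (the macro's mask leaves CD5 clear)
        | (addr >> 14)              # A15-A14 in the bottom two bits
    )


print(hex(pattern_address(1)))
print(hex(vdp_command(VDP_VRAM_WRITE, pattern_address(1))))
print(hex(vdp_command(VDP_CRAM_WRITE, 0x0000)))
```

Pattern 1 lands at $020, matching the address used in the assembly above, and the two upper address bits end up at the opposite end of the longword from the rest of the address.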
Displaying the Characters
We're almost there! The VDP now has one palette loaded into its color RAM and five patterns (including the blank one) loaded into its main RAM, and our only remaining task is to define which tile to show in each position in the pattern grid:
    ; Select patterns on Plane A, whose pattern table was placed at $C000 by the VDP setup code
    VDP_SET_ADDR VDP_VRAM_WRITE,$C000 ; Prepare to write to VRAM offset $C000
    VDP_SET_REG VDP_REG_AUTOINC,2     ; Increment 2 bytes per VRAM write
    VDP_WRITE_DATA #$0001             ; Low plane, palette 0, no flipping, tile ID 1 (H)
    VDP_WRITE_DATA #$0002             ; Low plane, palette 0, no flipping, tile ID 2 (E)
    VDP_WRITE_DATA #$0003             ; Low plane, palette 0, no flipping, tile ID 3 (L)
    VDP_WRITE_DATA #$0003             ; Low plane, palette 0, no flipping, tile ID 3 (L)
    VDP_WRITE_DATA #$0004             ; Low plane, palette 0, no flipping, tile ID 4 (O)
The address $C000 was defined as the pattern table for "plane A" when we set up the registers earlier. The VDP builds the display out of two tile planes (A and B), each of which can have independent tile selections and scrolling, but for this very simple program we're only using plane A, which is the foreground plane.
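The 16-bit words written into the pattern table pack more than just a tile ID, which is what the "low plane, palette 0, no flipping" comments are hinting at. Here's a Python sketch of that packing as I understand it (priority in bit 15, palette in bits 14-13, vertical and horizontal flip in bits 12 and 11, tile ID in bits 10-0). The function names are hypothetical, and the plane width of 64 cells is taken from the comment on register 16 in the full listing:

```python
PLANE_A_BASE = 0xC000   # set via VDP register 2 in the setup code
PLANE_WIDTH = 64        # cells per row, per the register 16 comment ("Horiz. scroll 64")


def tile_entry(tile, palette=0, hflip=False, vflip=False, priority=False):
    """Pack one pattern-table word: priority, palette, flip flags, and tile ID."""
    return (
        (int(priority) << 15)
        | ((palette & 0b11) << 13)
        | (int(vflip) << 12)
        | (int(hflip) << 11)
        | (tile & 0x7FF)
    )


def cell_address(col, row):
    """VRAM address of the pattern-table word for a cell (two bytes per cell)."""
    return PLANE_A_BASE + (row * PLANE_WIDTH + col) * 2


hello = [tile_entry(t) for t in (1, 2, 3, 3, 4)]   # H, E, L, L, O
print([hex(word) for word in hello])
print(hex(cell_address(0, 0)), hex(cell_address(4, 0)))
```

With everything left at its default, the entry is just the tile ID, which is why the program can write plain $0001 to $0004. The five words go to consecutive addresses from $C000 to $C008, courtesy of the 2-byte auto-increment.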
The tile IDs we wrote into the pattern table above are indexes into the pattern data we wrote in the previous section. After all of this setup we should finally see something on screen, though we'll just quickly append a busy loop at the end here so that the CPU doesn't start trying to execute all of the data that we're placing in the rest of the ROM:
loop:
    jmp loop ; restart main loop
And here we go...
Whoops! The left side of the H is hanging off the screen. That's what I get for not bothering to deal with the scroll offsets in the VSRAM, I suppose. Still, that was satisfying enough for me!
Where from here?
I don't have any aspirations as a Sega Genesis homebrew developer, so I'm satisfied just having developed a basic understanding of how the Sega Genesis VDP works and thus what must've gone into the various games I remember playing many years ago.
The VDP has many more capabilities that I've not made use of here, including:
Hardware sprites
A DMA controller for more efficiently loading data from work RAM into video RAM, and for populating new data in video RAM without direct CPU involvement.
Multiple tile planes, including an unscrolled portion of plane A for what we might these days call a "HUD".
Highlight and shadow modes, for additional on-screen colors derived from those in the palettes.
Flipping tiles and sprites along both the X and Y axes automatically.
Raster scan interrupts to allow modifying the VDP registers as the image is being assembled, to exceed the static capabilities of the chip.
I'm personally satisfied just knowing that these features exist and what their capabilities are, but if you want to dig in further and learn the details you can find various technical information online, including copies of the aforementioned official Genesis Development Manual.
Full Source Code
Here's the whole assembly program that the snippets above came from, in case you'd like to use it as a basis for your own Sega Genesis experiments:
rom_header:
    dc.l   $00FFFFFE        ; Initial stack pointer value
    dc.l   init             ; Initial program counter value
    dc.l   ignore_handler   ; Bus error
    dc.l   ignore_handler   ; Address error
    dc.l   ignore_handler   ; Illegal instruction
    dc.l   ignore_handler   ; Division by zero
    dc.l   ignore_handler   ; CHK exception
    dc.l   ignore_handler   ; TRAPV exception
    dc.l   ignore_handler   ; Privilege violation
    dc.l   ignore_handler   ; TRACE exception
    dc.l   ignore_handler   ; Line-A emulator
    dc.l   ignore_handler   ; Line-F emulator
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Spurious exception
    dc.l   ignore_handler   ; IRQ level 1
    dc.l   ignore_handler   ; IRQ level 2
    dc.l   ignore_handler   ; IRQ level 3
    dc.l   ignore_handler   ; IRQ level 4 (horiz. retrace int.)
    dc.l   ignore_handler   ; IRQ level 5
    dc.l   ignore_handler   ; IRQ level 6 (vert. retrace int.)
    dc.l   ignore_handler   ; IRQ level 7
    dc.l   ignore_handler   ; TRAP #00 exception
    dc.l   ignore_handler   ; TRAP #01 exception
    dc.l   ignore_handler   ; TRAP #02 exception
    dc.l   ignore_handler   ; TRAP #03 exception
    dc.l   ignore_handler   ; TRAP #04 exception
    dc.l   ignore_handler   ; TRAP #05 exception
    dc.l   ignore_handler   ; TRAP #06 exception
    dc.l   ignore_handler   ; TRAP #07 exception
    dc.l   ignore_handler   ; TRAP #08 exception
    dc.l   ignore_handler   ; TRAP #09 exception
    dc.l   ignore_handler   ; TRAP #10 exception
    dc.l   ignore_handler   ; TRAP #11 exception
    dc.l   ignore_handler   ; TRAP #12 exception
    dc.l   ignore_handler   ; TRAP #13 exception
    dc.l   ignore_handler   ; TRAP #14 exception
    dc.l   ignore_handler   ; TRAP #15 exception
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)
    dc.l   ignore_handler   ; Unused (reserved)

    ; The text fields below are fixed-width and padded with spaces
    dc.b "SEGA GENESIS    "                                 ; Console name (16 bytes)
    dc.b "(C) MA          "                                 ; Copyright holder and release date (16 bytes)
    dc.b "BOOT LOGO                                       " ; Domest. name (48 bytes)
    dc.b "BOOT LOGO                                       " ; Intern. name (48 bytes)
    dc.b "2019-01-20    "                                   ; Version number (14 bytes)
    dc.w $0000                                              ; Checksum
    dc.b "J               "                                 ; I/O support (16 bytes)
    dc.l $00000000                                          ; Start address of ROM
    dc.l __end                                              ; End address of ROM
    dc.l $00FF0000                                          ; Start address of RAM
    dc.l $00FFFFFF                                          ; End address of RAM
    dc.l $00000000                                          ; SRAM enabled
    dc.l $00000000                                          ; Unused
    dc.l $00000000                                          ; Start address of SRAM
    dc.l $00000000                                          ; End address of SRAM
    dc.l $00000000                                          ; Unused
    dc.l $00000000                                          ; Unused
    dc.b "                                        "         ; Notes (40 bytes)
    dc.b "JUE             "                                 ; Country codes (16 bytes)

; Memory mapped I/O base addresses
; -------------------------------------------------------------
vdp_control     = $C00004
vdp_data        = $C00000
z80_bus_req     = $00A11100
z80_bus_grant   = $00A11101
z80_reset       = $00A11200
z80_ram         = $00A00000
hw_revision     = $00A10001
tmss_write      = $00A14000
jp1ctrl         = $00A10009
jp2ctrl         = $00A1000B
extctrl         = $00A1000D

; VDP RAM operations and Registers
VDP_VRAM_WRITE  = %000001
VDP_CRAM_WRITE  = %000011
VDP_VSRAM_WRITE = %000101
VDP_VRAM_READ   = %000000
VDP_CRAM_READ   = %001000
VDP_VSRAM_READ  = %000100
VDP_REG_AUTOINC = 15 ; RAM access address auto-increment

init: ; Main entry point
    move    #$2700,sr               ; disable interrupts

    ; "Trademark Security System" (TMSS) handshake
    move.b  hw_revision,d0          ; Move Megadrive hardware ver. to d0
    andi.b  #$0F,d0                 ; Version is stored in last four bits
                                    ; so mask it with 0F
    beq     @Skip                   ; If version = 0, skip TMSS signature
    move.l  #'SEGA',tmss_write      ; Move string "SEGA" to $A14000
@Skip:

    ; Initialize Z80
    move.w  #$0100,z80_bus_req      ; Request access to the Z80 bus
    move.w  #$0100,z80_reset        ; Hold the Z80 in a reset state
@Wait:
    btst    #$0,z80_bus_grant       ; Check if we have access to the Z80 bus yet
    bne     @Wait                   ; (keep waiting while bit zero is set; it clears once bus access is granted)
    move.l  #z80_ram,a1             ; Copy Z80 RAM address to a1
    move.l  #$00C30000,(a1)         ; Copy some instructions to the start of Z80 RAM: nop, jp 0x0000
    move.w  #$0000,z80_reset        ; Release reset state
    move.w  #$0000,z80_bus_req      ; Release control of bus

    ; Clear RAM (top 64k of memory space)
    move.l  #$00000000,d0           ; We're going to write zeroes over the whole of RAM, 4 bytes at a time
    move.l  #$00000000,a0           ; Starting from address 0x0, clearing backwards
    move.l  #$00003FFF,d1           ; Clear 64k, 4 bytes at a time. That's 16384 writes (dbra loops count+1 times)
@ClearRAM:
    move.l  d0,-(a0)                ; Decrement address by 4 bytes and then copy our zero to that address
    dbra    d1,@ClearRAM            ; Decrement loop counter d1, exiting once it passes zero

    ; Initialise video (VDP)
    move.l  #VDPRegisters,a0        ; Load address of register table
    move.l  #$17,d0                 ; 24 registers to write, minus one because dbra loops count+1 times
    move.l  #$00008000,d1           ; 'Set register 0' command
                                    ; (and clear the rest of d1 ready)
@CopyVDP:
    move.b  (a0)+,d1                ; Copy register value to d1
    move.w  d1,vdp_control          ; Write command and value to
                                    ; VDP control port
    add.w   #$0100,d1               ; Increment register #
    dbra    d0,@CopyVDP

    ; Set registers to a predictable state
    ; -------------------------------------------------------------
    move.l  #$00FF0000,a0           ; Point a0 at the start of RAM, which we just filled with zeroes
    movem.l (a0),d0-d7/a1-a7        ; Multiple move 0 to all registers
    move    #$2700,sr               ; no trace, A7 is Interrupt Stack Pointer, no interrupts, clear condition code bits

VDP_SET_REG MACRO ; REGISTER, VALUE
    move.w #((($80|\1)<<8)|\2),vdp_control
    ENDM

VDP_SET_ADDR MACRO ; OP_TYPE, ADDR
    ; Configuring a RAM access requires some odd argument packing:
    ;   byte 0: CD1 CD0 A13 A12 A11 A10 A9  A8
    ;   byte 1: A7  A6  A5  A4  A3  A2  A1  A0
    ;   byte 2: 0   0   0   0   0   0   0   0
    ;   byte 3: CD5 CD4 CD3 CD2 0   0   A15 A14
    ; The bit manipulation below is dealing with that packing so that
    ; calls to this macro can look more straightforward.
    move.l #((\1&%11)<<30|(\2&%11111111111111)<<16|(\1&%11100)<<2|(\2>>14)),vdp_control
    ENDM

VDP_WRITE_DATA MACRO ; DATA
    move.w \1,vdp_data
    ENDM

; Main Program
; -------------------------------------------------------------
__main:
    ; Load palette into VDP
    VDP_SET_ADDR VDP_CRAM_WRITE,$0000 ; Prepare to write to CRAM address $000
    VDP_SET_REG VDP_REG_AUTOINC,2     ; Increment 2 bytes per CRAM write
    lea     Palette,a0                ; a0 = address of first palette entry
    move.l  #(((EndPalette-Palette)>>2)-1),d0 ; d0 = palette length in longwords minus one (loop counter)
@PaletteLoop:
    move.l  (a0)+,vdp_data            ; Write current longword to vdp_data, which is really two vdp_data writes. Post-increment a0.
    dbra    d0,@PaletteLoop           ; Decrement d0 and loop again until it passes zero

    ; Load patterns into VDP
    VDP_SET_ADDR VDP_VRAM_WRITE,$0020 ; Prepare to write to VRAM address $020, which is character pattern 1
    VDP_SET_REG VDP_REG_AUTOINC,2     ; Increment 2 bytes per VRAM write
    lea     Characters,a0             ; a0 = address of first byte of first character
    move.l  #(((EndCharacters-Characters)>>2)-1),d0 ; d0 = buffer length in longwords minus one (loop counter)
@CharacterLoop:
    move.l  (a0)+,vdp_data            ; Write current longword to vdp_data, which is really two vdp_data writes. Post-increment a0.
    dbra    d0,@CharacterLoop         ; Decrement d0 and loop again until it passes zero

    ; Select patterns on Plane A, whose pattern table was placed at $C000 by the VDP setup code
    VDP_SET_ADDR VDP_VRAM_WRITE,$C000 ; Prepare to write to VRAM offset $C000
    VDP_SET_REG VDP_REG_AUTOINC,2     ; Increment 2 bytes per VRAM write
    VDP_WRITE_DATA #$0001             ; Low plane, palette 0, no flipping, tile ID 1 (H)
    VDP_WRITE_DATA #$0002             ; Low plane, palette 0, no flipping, tile ID 2 (E)
    VDP_WRITE_DATA #$0003             ; Low plane, palette 0, no flipping, tile ID 3 (L)
    VDP_WRITE_DATA #$0003             ; Low plane, palette 0, no flipping, tile ID 3 (L)
    VDP_WRITE_DATA #$0004             ; Low plane, palette 0, no flipping, tile ID 4 (O)

loop:
    ; VDP_WRITE_DATA d0 ; write color in d0 to palette index zero (background color)
    ; add.w #1,d0 ; increment d0 to select another color
    ; move.w #100,d1 ; set up delay loop counter d1
;.wait:
    ; dbra d1,.wait ; decrement d1 until it reaches zero
    jmp loop ; restart main loop

; Interrupt handling
; -------------------------------------------------------------
    align 2 ; word-align code
ignore_handler:
    rte ; continue from where the interrupt was triggered

; VDP Register initialization
; -------------------------------------------------------------
; By Matt Phillips:
; https://blog.bigevilcorporation.co.uk/2012/03/09/sega-megadrive-3-awaking-the-beast/
    align 2 ; word-align code
VDPRegisters:
VDPReg0:   dc.b $14 ;  0: H interrupt on, palettes on
VDPReg1:   dc.b $74 ;  1: V interrupt on, display on, DMA on,
                    ;     Genesis mode on
VDPReg2:   dc.b $30 ;  2: Pattern table for Scroll Plane A
                    ;     at VRAM $C000
                    ;     (bits 3-5 = bits 13-15)
VDPReg3:   dc.b $00 ;  3: Pattern table for Window Plane
                    ;     at VRAM $0000
                    ;     (disabled) (bits 1-5 = bits 11-15)
VDPReg4:   dc.b $07 ;  4: Pattern table for Scroll Plane B
                    ;     at VRAM $E000
                    ;     (bits 0-2 = bits 13-15)
VDPReg5:   dc.b $78 ;  5: Sprite table at VRAM $F000
                    ;     (bits 0-6 = bits 9-15)
VDPReg6:   dc.b $00 ;  6: Unused
VDPReg7:   dc.b $00 ;  7: Background colour - bit 0-3 = colour,
                    ;     bits 4-5 = palette
VDPReg8:   dc.b $00 ;  8: Unused
VDPReg9:   dc.b $00 ;  9: Unused
VDPRegA:   dc.b $FF ; 10: Frequency of Horiz. interrupt in
                    ;     Rasters (number of lines travelled by
                    ;     the beam)
VDPRegB:   dc.b $00 ; 11: External interrupts off,
                    ;     V scroll fullscreen,
                    ;     H scroll fullscreen
VDPRegC:   dc.b $81 ; 12: Shadows and highlights off,
                    ;     interlace off,
                    ;     H40 mode (320 x 224 screen res)
VDPRegD:   dc.b $3F ; 13: Horiz. scroll table at VRAM $FC00
                    ;     (bits 0-5)
VDPRegE:   dc.b $00 ; 14: Unused
VDPRegF:   dc.b $02 ; 15: Autoincrement 2 bytes
VDPReg10:  dc.b $01 ; 16: Vert. scroll 32, Horiz. scroll 64
VDPReg11:  dc.b $00 ; 17: Window Plane X pos 0 left
                    ;     (pos in bits 0-4, left/right in bit 7)
VDPReg12:  dc.b $00 ; 18: Window Plane Y pos 0 up
                    ;     (pos in bits 0-4, up/down in bit 7)
VDPReg13:  dc.b $FF ; 19: DMA length lo byte
VDPReg14:  dc.b $FF ; 20: DMA length hi byte
VDPReg15:  dc.b $00 ; 21: DMA source address lo byte
VDPReg16:  dc.b $00 ; 22: DMA source address mid byte
VDPReg17:  dc.b $80 ; 23: DMA source address hi byte,
                    ;     memory-to-VRAM mode (bits 6-7)

Characters:
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22222220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $00000000

    dc.l $22222220
    dc.l $22000000
    dc.l $22000000
    dc.l $22222000
    dc.l $22000000
    dc.l $22000000
    dc.l $22222220
    dc.l $00000000

    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22000000
    dc.l $22222220
    dc.l $00000000

    dc.l $02222200
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $22000220
    dc.l $02222200
    dc.l $00000000
EndCharacters:

Palette:
    dc.w $000 ; 0 Black (transparent)
    dc.w $E00 ; 1 Blue
    dc.w $0E0 ; 2 Green
    dc.w $EE0 ; 3 Cyan
    dc.w $00E ; 4 Red
    dc.w $E0E ; 5 Purple
    dc.w $0EE ; 6 Yellow
    dc.w $EEE ; 7 White
    dc.w $EEE ; 8 White
    dc.w $EEE ; 9 White
    dc.w $EEE ; A White
    dc.w $EEE ; B White
    dc.w $EEE ; C White
    dc.w $EEE ; D White
    dc.w $EEE ; E White
    dc.w $EEE ; F White
EndPalette:

__end: