01234567890123456789012345678901234567890123456789012345678901234567890123456789

Videobrain Unwrapped
--------------------

061013

V0.05

Written by: Kevin Horton


Description:
------------

The Videobrain Family Computer was an ill-received home computer and videogame
playing machine that was way ahead of its time.  The system was designed in
1977 and only sold for about one year somewhere in the 1977-1979 time frame.
There were only a few games and application programs released for the system
before it died a premature death.  With more advertising and programs, it
might've gotten more popular.  A lack of programming and advertising doomed it.

---

Reproduction:
-------------

I got interested in the VB because I got a hold of a bunch of the ASICs for
the system by chance.  Videobrain systems (from now on called "VB") are very
hard to come by.  The last one on ebay was $120 or so and broken, and before
that, $400 for a complete system.

Since the systems are so rare and expensive, and I had some ASICs, I decided
to BUILD my own system!  I had most of the chips and parts already to build it
so I bought some perf board, 2102 RAM chips, and proceeded to build it.  It
took me about 4 days to complete it.  I followed the schematic I found on the
internet.

After assembly, I went through and checked every connection with the
multimeter.  I found 3 or 4 minor flubs which were fixed, and I tried running
it.

I fired it up and the CPU was getting stuck somewhere.  Before the CPU got
wedged, the address lines to the RAM chips were toggling as the CPU tried to
clear RAM out every reset.  After the RAM clearing, it'd hang up and not
restart.

I managed to find the datasheet for the 2102 RAM chip (this was quite
difficult) and figured out that the schematic had Din and Dout flipped.
Great.  Moving 24 wires later, things looked better:  RAM was only being
cleared once now and it was possible to see where the CPU was reading the
signature byte to skip the clear routine.
At this point, the CPU was still stopping. I hooked my logic analyzer up (an Agilent 16700B with three 16750A acquisition cards). Within about 5 minutes I could watch the CPU executing code, and see it was dying when it tried to write to an ASIC register. RAM reading and writing were working fine, halting the CPU and such, but when the ASIC was written to it died and the CPU never woke up. It took me a bit longer to figure out why THAT was happening. After playing around a bit and checking out the spreadsheet that Sean Riddle produced which had the circuit connections in it, it became obvious that the schematic had yet another error. Pins 35 and 37 were shown swapped on the schematic. I flipped these around and now the CPU was running without stopping. I dropped a wire onto some ASIC pins and fed it into my monitor and some stuff that looked like video was appearing on the screen. After a little dorking around, I made an RGB video mod for the system. --- Conventions in this document: I will define several terms first which will hopefully make it easier to explain what is going on. MCLK - Master Clock. This is a 28.6363MHz clock from which all timing is derived. CPUCLK - CPU Clock. This is a 2.04MHz clock which runs the CPU. BRCLK - BR Clock. Not sure what BR stands for. It runs at 3.579MHz. buffered bus This is the buffered address/data bus connected to the cartridge, SRAM, and UV201. All timing is related to one of these three clocks. To make testing and reverse engineering easier, I hacked my Videobrain "prototype" up so that both the CPU clock and the UV202 clocks were derived off the same source. The UV202 uses a 14.3181MHz crystal, and the CPU runs off a 4MHz crystal divided by two via the wait state generator JK flipflop. Originally, the UV202 generated a 14.3181MHz / 7 or 2.04MHz signal, but it stopped 1 cycle too late and this caused issues with wait the CPU. A bunch of "fixup" logic was provided externally to work around the issue. 
Instead of the 2.04MHz signal, they went with a 2.0MHz clock derived from a separate oscillator. My hack was to start with a 28.6363MHz oscillator and divide it by 2, and feed this 14.318181MHz signal into the UV202, and then divide it by 7 to produce a 4.09MHz signal which was fed into the wait state circuit in place of its 4.0MHz crystal. This hack then ties both the CPU and UV202 timing together under a common clock source. Hopefully this will make sussing out timing easier. It also gives my logic analyzer a perfect clock to suckle off of which will keep the sample count an exact multiple of the CPU and UV202 clocks. --- Chip descriptions: Videobrain ASIC UV202: ---------------------- The UV202 is the timing generator and bus transceiver that links the data bus from the CPU with the buffered data bus. All of the video timing information is generated by the UV202, and it also handles the CPU wait state stuff, but this was broken somewhat in the ASIC so they did some external circuitry to fix it. I investigated it, and the UV202's clock that would've run the 3850 CPU runs a cycle or so too long vs. the external hardware solution. I guess this was too much and it'd hit some kind of timing glitch during a wait state. It seems to accurately track the external clock otherwise. This clock is approximately 2.045MHz, which is 14.318181MHz / 7. This somewhat surprised me that they'd divide by 7 and not 8. Too bad that it is buggy. The duty cycle was 50% so they are checking both edges of the input clock. This is somewhat similar to divide by 3 on the Atari 2600 TIA that divides the 3.579MHz clock down to approx. 1.19MHz. The 3850 is now run off an external 2MHz clock generated by a JK flipflop running off a 4MHz crystal oscillator. 
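
The divider chain of the clock hack can be checked with a few lines of
arithmetic.  This is only a sketch using the nominal frequencies quoted above.
The function and key names are made up for illustration, the divide by 7 is
taken to act on the 28.6363MHz oscillator (that is what gives 4.09MHz), and
the divide by 4 that yields BRCLK is an assumption based on the 3.579MHz
figure.

```python
from fractions import Fraction

# Nominal master oscillator frequency quoted in the text (28.6363MHz).
MCLK_HZ = Fraction(28_636_300)


def clock_tree(mclk_hz=MCLK_HZ):
    """Return the derived clocks of the hacked prototype as exact fractions."""
    uv202_xin = mclk_hz / 2   # 14.318MHz fed into the UV202 in place of its crystal
    wait_in = mclk_hz / 7     # 4.09MHz fed into the wait state circuit
    cpuclk = wait_in / 2      # the JK flipflop halves it to roughly 2.045MHz
    brclk = uv202_xin / 4     # assumed divide by 4 inside the UV202: 3.579MHz
    return {
        "UV202_XIN": uv202_xin,
        "WAIT_IN": wait_in,
        "CPUCLK": cpuclk,
        "BRCLK": brclk,
    }


if __name__ == "__main__":
    for name, hz in clock_tree().items():
        # Print each clock in MHz along with its integer divisor from MCLK.
        print(f"{name:10s} {float(hz) / 1e6:9.4f} MHz   MCLK/{MCLK_HZ / hz}")
```

Under these assumptions every derived clock is an integer division of MCLK
(2, 7, 14 and 8), which is why a logic analyzer clocked from MCLK takes a
whole number of samples per CPUCLK and BRCLK cycle.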
UV-202 pinout:

 1 - UMIREQ1
 2 - UMIREQ0
 3 - CPUREQ0
 4 - Xin
 5 - Xout
 6 - DMAREQ0
 7 - CPU CLK
 8 - WACK
 9 - D0
10 - BD0
11 - D1
12 - BD1
13 - D2
14 - BD2
15 - D3
16 - BD3
17 - Hblank
18 - VBLANK
19 - Burst
20 - Csynch
21 - +5V
22 - +12V
23 - Scanline
24 - Field
25 - BD4
26 - D4
27 - BD5
28 - D5
29 - BD6
30 - D6
31 - BD7
32 - D7
33 - BRCLK
34 - COLCLK
35 - DMAREQ1
36 - RST
37 - /800-BFF
38 - GND
39 - CPUREQ1
40 - BISTROBE

1 - UMIREQ1

High when the secondary UV201 wants to DMA data from buffered bus for display.
It has low priority for display with respect to CPU accesses.

2 - UMIREQ0

High when the primary UV201 wants to DMA data from buffered bus for display.
It has low priority for accessing the bus with respect to CPU accesses.

3 - CPUREQ0

See the wait states section for information about this pin.

4 - Xin
5 - Xout

Normally connected to a 14.318181MHz crystal.  This sets timing for all video
rendering and wait state generation.  To drive this pin from an external
clock, you must connect an inverter from Xin to Xout, and then drive the Xin
pin.  A 74HC04 is ideal.

6 - DMAREQ0

Pulses high to tell the primary UV201 that it is now OK to perform one DMA
from buffered bus.

7 - CPUCLK

Broken CPU clock output.  This pin toggles at 2.04MHz, which is 14.3181MHz
divided by 7.  It is a 50% duty cycle waveform.  It does not quit soon enough
in the wait state, so it was fixed with external circuitry.

8 - WACK

Wait Acknowledge.  Goes high to acknowledge the wait state.  It will not cause
the CPU clock to continue until the falling edge of this signal.  It is used
to gate the ASIC and SRAM writes when high.

 9 - D0
11 - D1
13 - D2
15 - D3
26 - D4
28 - D5
30 - D6
32 - D7

These are the data lines that connect to a 2K ROM and the CPU and 3853 SRAM
interface.

10 - BD0
12 - BD1
14 - BD2
16 - BD3
25 - BD4
27 - BD5
29 - BD6
31 - BD7

These are the buffered data lines.  They connect to the SRAM, UV201 data
lines, a 2K ROM, and the cartridge.

17 - Hblank

Goes high in horizontal blank.
18 - VBLANK Goes high for 21 scanlines every 262 or 263 scanlines depending on if this is the odd or even field. It is offset 1/2 scanline during odd fields. This signal is used by the UV201 to synchronize rendering to the scan. 19 - Burst Goes high every scanline immediately after the synch pulse, except for the first 9 scanlines VBLANK is high. 20 - Csynch Composite synch line that goes high at the beginning of each scanline, except in the Vblank region (see below). 21 - 5V Connects to +5V supply 22 - 12V Connects to +12V supply 23 - Scanline This line is not used on the VB, but it toggles at the start of each scanline. It has been useful during debugging. 24 - Field This line toggles every field and reflects if the odd or even field is being displayed. low = odd field, high = even field. 33 - BRCLK This clock is a 3.579MHz clock that is generated by dividing the 14.3181MHz clock down by 4. It clocks the UV201 and provides all timing information to it. 34 - COLCLK This clock appears to be an exact clone of BRCLK. It's the same phase and frequency as BRCLK. It is used by the LM1889 NTSC encoder / TV modulator. 35 - DMAREQ1 Pulses high to tell the secondary UV201 that it is now OK to perform one DMA from buffered bus. 36 - RST Taking this low resets the chip. It is normally connected to a flipflop so that reset goes high at the start of the frame. I think this was done simply so that the reset button was a "reset once when pressed" signal instead of continuous while the button is held down. This would be required for the weird "top loading VCR" style cartridge slot, which resets the CPU when a cartridge is inserted . 37 - /800-BFF input. The external logic must pull this line low when the CPU attempts to access 800-BFF, which is the UV201 address range. 38 - ground Connects to ground. 39 - CPUREQ1 See the wait states section for more information about this pin. 40 - BISTROBE Pulses low to perform a read or write operation. 
---

Videobrain ASIC UV201:
----------------------

The UV201 is the actual video rendering portion of the system.  It fetches
data via DMA and then squirts it out its 4 video pins.  It relies on the UV202
to generate frame timing information for it.  This chip holds ALL of the ASIC
registers.

7 - UV201-4 (D0)
8 - UV201-2 (D1)
9 - UV201-5 (D2)
10- UV201-3 (D3)

UV201 pinout:
-------------

 1 - GND
 2 - video D1
 3 - video D3
 4 - video D0
 5 - video D2
 6 - BA0
 7 - BA1
 8 - BA2
 9 - BA3
10 - BA4
11 - BA5
12 - BA6
13 - BA7
14 - BA8
15 - BA9
16 - BA10
17 - BA11
18 - BA12
19 - +5V
20 - +12V
21 - BD0
22 - BD1
23 - BD2
24 - BD3
25 - BD4
26 - BD5
27 - BD6
28 - BD7
29 - Field
30 - EXT INT
31 - keypad column 8
32 - R/W
33 - VBLANK
34 - HBLANK
35 - BRCLK
36 - UMIREQ0
37 - BISTROBE
38 - RESET
39 - /CE
40 - DMAREQ

1 - GND

System ground

4 - video D0
2 - video D1
5 - video D2
3 - video D3

These 4 pins emit the colour.  A table is provided below.

 6 - BA0
 7 - BA1
 8 - BA2
 9 - BA3
10 - BA4
11 - BA5
12 - BA6
13 - BA7
14 - BA8
15 - BA9
16 - BA10
17 - BA11
18 - BA12

These 13 pins are the buffered address bus.  They are used to decode the write
address when the CPU writes to the UV201, and they also drive the buffered
address bus when performing a DMA read.

19 - +5V

5V power supply

20 - +12V

12V power supply

21 - BD0
22 - BD1
23 - BD2
24 - BD3
25 - BD4
26 - BD5
27 - BD6
28 - BD7

These 8 pins are the buffered data bus.

29 - Field

Connects to pin 24 of the UV202 for even/odd field determination.

30 - EXT INT

Drives the 3853 SRAM interface chip to perform interrupts.

31 - keypad column 8

Drives column 8 of the keypad (lol)

32 - R/W
37 - BISTROBE
39 - /CE

These three pins determine what is occurring to the UV201.  When /CE is pulled
low, a CPU read or write is in progress.  The R/W line will determine if this
is a read (low) or write (high).  It is an inverted version of CPUREQ1.
BISTROBE pulses low when the UV202 is finishing its buffered bus cycle, and
lets the UV201 and SRAM know that a write should be taking place.

33 - VBLANK
34 - HBLANK

These two signals are generated by the UV202 and inform the UV201 where the
raster is.

35 - BRCLK

3.579MHz clock that runs the show.

36 - UMIREQ0

Pulled high by UV201 when it wishes to DMA data from the buffered bus.

38 - RESET

Low resets the chip.

40 - DMAREQ

The UV202 pulses this line high when the UV201 should perform its DMA cycle.

---

Frame timing:
-------------

Video on the system is indeed interlaced.  The timing looks very good.  It's
not exactly 100% NTSC timing compliant, but for 1978 it probably would've been
plenty good enough to be used in a broadcast environment.  They seem to have
gone all out to make sure their signal is standards-compliant.  I suspect they
were hoping it might be used at a TV or cable station for showing info screens
and such.

The way timing is internally generated is mostly done via a scanline counter
that counts by 262 or 263 depending on if the odd or even field is being
displayed.

I have used my logic analyzer (an HP 16700B) to suss out the exact timing, to
the cycle, of the VB's frame.

The system alternates between even and odd fields approximately 60 times a
second, and displays approximately 30 frames per second.

Exact cycle timings:

              Odd     Even
scanlines     263     262
clocks        59964   59736   (BRCLK's)

fields/sec    59.81
frames/sec    29.90

Scanline      228 (BRCLK's)

To perform the interlace, the hardware will render 262 scanlines for one
field, then 263 for the other, and then shift the vsynch period back or forth
1/2 scanline to align it at the 1/2 scanline point each field.  This is common
and how most interlaced video is generated at the hardware level.

The exact number of clocks for the video is as follows:

Each scanline is 228 BRCLKs long, which incidentally is the same as the Atari
2600.  It's a tad longer than it "should" be, but it works well enough.
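
The field and frame rates in the table follow directly from the scanline
counts.  The short sketch below redoes that arithmetic, taking BRCLK as the
14.318181MHz crystal divided by 4 and using the 228 BRCLK scanline; the
function and constant names are just illustrative.

```python
# Figures quoted in the text; BRCLK is the 14.318181MHz crystal divided by 4.
BRCLK_HZ = 14_318_181 / 4
BRCLKS_PER_SCANLINE = 228
SCANLINES = {"odd": 263, "even": 262}


def field_clocks(field):
    """Number of BRCLKs in one field ('odd' or 'even')."""
    return SCANLINES[field] * BRCLKS_PER_SCANLINE


def frame_rate_hz():
    """One frame is an odd field followed by an even field."""
    frame_clocks = field_clocks("odd") + field_clocks("even")
    return BRCLK_HZ / frame_clocks


def field_rate_hz():
    """Average field rate: two fields are shown per frame."""
    return 2 * frame_rate_hz()


if __name__ == "__main__":
    print("odd field clocks :", field_clocks("odd"))
    print("even field clocks:", field_clocks("even"))
    print(f"fields/sec       : {field_rate_hz():.2f}")
    print(f"frames/sec       : {frame_rate_hz():.2f}")
```

Note that 59.81 is an average: the odd and even fields differ by one scanline,
so the individual field periods are slightly different.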
During a normal scanline: CSYNCH goes high during cycles 0-17 (18 total). BURST goes high during cycles 21-29 (9 total). HBLANK goes high during cycles 222-227 and 0-32 (39 total). HBLANK spans the end of the previous line to the beginning of the next line. This makes the total visible area on the screen 189 BRCLKs long. During the vertical blanking interval, timing is a bit different. Basically, it is composed of the following: --- (odd field) 3 scanlines of vsynch 3 scanlines of equalization pulses 244 normal scanlines 2.5 scanlines of equalization pulses --- (even field) 3 scanlines of vsynch 3 scanlines of equalization pulses 243.5 normal scanlines 3 scanlines of equalization pulses These two alternate back and forth every field (approx. 60 times a second) to perform the interlacing. The TV synchronizes off of Vsynch, so if you count scanlines starting with it, each field will contain 252.2 scanlines exactly. CSYNCH goes high TWICE during scanlines which contain equalization pulses: It goes high on cycles 0-8, and again at cycles 114-122. These pulses are only 9 clocks wide instead of 18 on a normal scanline. Vsynch is different yet again: CSYNCH is high for the majority of the scanline and pulses low twice. Unlike the equalization scanlines, its low pulses are 18 clocks wide (same as on a normal scanline, but inverted). It sends these pulses out on cycles 105-113 and 219-227. NOTE: even though video is interlaced, as far as the rendering is concerned, it is progressive: each field shows the same data, so there are only around 262 addressable scanlines instead of the full 525. --- Wait States ----------- The UV202 performs all the wait stating. When the CPU wishes to access the buffered bus, the CPU is stopped and waits are inserted. The quantity of wait states inserted varies greatly depending on what peripheral the CPU is accessing. For this timing, I replaced the 28.63MHz oscillator I was using with a 1MHz one. 
The reason for this is I was being buffaloed by propagation delays in the
chips and connections, which made sussing out the exact timing very difficult.

The following is what happens inside the UV202 when no propagation delays are
taken into account.  This will be cleared up later.  All of the following
timing is relative to the state of the pins on the UV202, vs. the inputs of
the D-flops that synchronize the signals (UMIREQ0, CPUREQ0 and 1).

---

The pins related to wait states are:

UMIREQ1  (pin 1)  - High when secondary UV201 wants to DMA data from buffered
                    bus.
UMIREQ0  (pin 2)  - High when the UV201 wants to DMA data from buffered bus.
CPUREQ0  (pin 3)  - High when the CPU writes anywhere in 800-1FFF (ASIC, RAM,
                    cart), or reads from C00-1FFF (RAM, cart).
DMAREQ0  (pin 6)  - The UV202 pulls this high to let the primary UV201 perform
                    DMA.
WACK     (pin 8)  - Wait Ack.  Goes high when the UV202 is performing a CPU
                    based read or write.  The falling edge restarts the CPU
                    clock.
DMAREQ1  (pin 35) - The UV202 pulls this high to let secondary UV201 perform
                    DMA.
/800-BFF (pin 37) - Low when the CPU is accessing 800-BFFh, high otherwise.
CPUREQ1  (pin 39) - High when the CPU reads from 800-1FFFh. (ASIC, RAM, cart)
BISTROBE (pin 40) - Pulses high in the middle of every buffered bus access

This truth table should make it a bit clearer what happens on each pin during
a specific kind of access:

event:        UMIREQ0  CPUREQ0  CPUREQ1  /800-BFF
-------------------------------------------------
UV201 Write      x        1        0        0
RAM Write        x        1        0        1
UV201 Read       x        0        1        0
RAM Read         x        1        1        1
Cart Read        x        1        1        1
DMA Read         1        0        0        0

x = don't care

---

Let's start with the basics.  The CPU is executing code out of the cartridge,
and the UV201 is performing periodic DMA requests.

CPU reads (cart/RAM) and DMA reads:

UV201 DMA requests come either singly or in bursts.  A burst is simply N bytes
fetched in a row by the UV201.
The CPU can interrupt a burst at any time, and data is read fast enough even
with these interruptions to produce a glitch free display.  Interestingly, the
UV202 seems to be set up to handle TWO UV201's at the same time, as described
in the patent.  If both UV201's request data at the same time, the UV202 will
alternate between servicing them.

CPU read requests only come every now and again, only once every 3 DMA
requests maximum.

If a CPU read and a DMA request both occur simultaneously, the CPU read is
serviced first followed by the DMA.

At the start of any bus access (CPU read or DMA fetch), a 1 BRCLK penalty
occurs.  If a DMA read follows a CPU read, a 1 BRCLK penalty is inserted.
Interestingly, if a CPU read follows a DMA read, no penalty is inserted.

In the event a CPU read and a DMA fetch both occur at the same time, a double
setup penalty occurs:

(all cycles are 1 BRCLK each)

1 penalty
3 cycle CPU read
1 penalty
3 cycle DMA read
3 cycle DMA read
3 cycle DMA read

Otherwise, the following type of thing occurs:

1 penalty
3 cycle DMA read
3 cycle DMA read
3 cycle CPU read   // this CPU read appears to incur no penalty
1 penalty
3 cycle DMA read   // but the following DMA does
3 cycle DMA read
3 cycle CPU read
1 penalty
3 cycle DMA read

The CPU read above appears to incur no setup penalty, but it IS there.  The
DMA access will only be replaced by a CPU read if CPUREQ went high on the
rising edge of BRCLK leading into the last cycle of the DMA request.

This means that during DMA fetching, the CPU will see a 4, 5, or 6 BRCLK wait
while the DMA finishes.  Depending on video and CPU synchronization, one of
these may be favored over another.

---

UV201 register reads:

Reads from the UV201 vary a bit.  These take 6, 7, 8, or 9 BRCLKs with 1 setup
BRCLK.  There appears to be a modulo 4 counter that continuously runs, clocked
off BRCLK.  The read sequence is delayed after it starts 1, 2, or 3 extra
BRCLKs depending on the value of this counter.
I have confirmed this behaviour by recording the start cycle of many UV201 reads and comparing them. When I wrote down the start MCLK cycle of each read and divided it by 8 (to get BRCLKs) then modulo'ed by 4, the results are clear: 0: 9 BRCLKs 1: 8 BRCLKs 2: 7 BRCLKs 3: 6 BRCLKs This is fairly definitive proof of a divide by 4 effect going on. When HBLANK goes high, the divide by 4 contains 3. Most likely what happens is the read sequence will delay at some point while it waits for this divide by 4 to contain 00b or 11b. The usual conditions of the read during a DMA are in effect as before. The only difference is the read is stretched out from 3 BRCLKs to 4, 5, 6, or 7. Total number of BRCLKs will be 5, 6, 7, 8 (no DMA) and 9 or 10 is possible with DMA. --- UV201 register writes: These follow the exact same conditions as UV201 reads, above. --- RAM writes: These follow the exact same conditions as cart/RAM reads, above. --- So the above is the "ideal". What happens on the real console, however is slightly different because of propagation delays. When running the system 28x slower than normal speed, the CPU actually is halted less (even though the CPU clock is also derived from this same 1MHz clock!) The reason is due to propagation delays. The amount of time the CPU spends halted during normal operation at normal operating speed is as follows: Total "sunk cost" for my particular Videobrain are as follows: CPUCLKs UV201R/W during DMA 4.0 (4 BRCLKs) 4.5 X X (5 BRCLKs) 5.0 X X (5 BRCLKs) mine alternated between 4.5 and 5.0 5.5 X X (6 BRCLKs) 6.0 X X (7 BRCLKs) 6.5 X X (8 BRCLKs) 7.0 X (9 or 10 BRCLKs) (the MCLK values were from falling edge of the last CPUCLK to falling edge of the first after the halt, so values shown are minus 1 CPUCLK.) The above is approximately correct. 
The amounts vary by 1/2 clock depending on exact synchronization, and they will change plus or minus 1/2 clock on a normal system due to the difference in frequency of the 2.0MHz CPUCLK and the 14.318MHz UV202 clock. This means cycle counted code is not possible on the Videobrain, due to the random nature of the possible wait states. Every frame, the amount of waiting will be different due to alignment of the two clocks and code relative to these clocks. Temperature and voltage variances, phase of the moon, etc. will conspire to make sure that these timings will never, ever perfectly line up from one run of a program to the next. Total "sunk cost" for RAM/cart reads and writes is 4.0 CPUCLKs, and during DMA, the total increases in line with the above chart. This pretty much sums up Videobrain wait states to my satisfaction. There's not much more to it. --- Address Space: -------------- Now that the UV202 is adequately described, it's probably a good idea to look at the address space. This will make things easier to grasp when it comes to graphics rendering. Because there are TWO busses, I will describe both in turn, starting with the CPU bus. CPU Bus: -------- This is what the CPU sees. It can access any peripheral in the system with 0 to 10 BRCLK waits (0 or 4.0 to 7.0 CPUCLKs) for each access. 0000 - 07FF (2048) RES1 ROM. The BIOS ROM maps here. 0800 - 08FF (256) UV201. The UV201's registers are visible here. 0900 - 0BFF (768) Cartridge mapped (see below) 0C00 - 0FFF (1024) System RAM (8x 2102 SRAM chips) 1000 - 1FFF (4096) Cartridge ROM is mapped here, either 2K or 4K worth. 2000 - 27FF (2048) RES2 ROM. The second BIOS ROM maps here. 
2800 - 28FF  (256)    ASIC mirror
2900 - 2BFF  (768)    Cartridge mapped mirror
2C00 - 2FFF  (1024)   System RAM mirror
3000 - 3FFF  (4096)   Cartridge ROM mirror

4000 - 7FFF  (16384)  Mirrors of 0000-3FFF
8000 - BFFF  (16384)  Mirrors of 0000-3FFF
C000 - FFFF  (16384)  Mirrors of 0000-3FFF

The upper 2 address lines from the CPU are not used, so it is not surprising
that 4000-FFFF "mirror" the mapping of 0000-3FFF.  "Mirroring" is sometimes
called "aliasing", and basically means the same peripheral or device is
visible in multiple places in the address space due to incomplete decoding.

I found it highly interesting how the RES2 ROM is mapped in.  Also, 8K games
could've been made for the system if BA13 were used as an upper address line
to select between the two 4K pages.

The amount of penalty cycles per read or write of the CPU space is as follows:

0000 - 07FF : 0 BRCLKs     (RES1 ROM)
0800 - 0BFF : 6-10 BRCLKs  (UV201 and external device)
0C00 - 1FFF : 4-6 BRCLKs   (SRAM and cart ROM)
2000 - 27FF : 4-6 BRCLKs   (RES2 ROM)
2800 - 2BFF : 6-10 BRCLKs  (UV201 and external device)
2C00 - 3FFF : 4-6 BRCLKs   (SRAM and cart ROM)

4000 - FFFF mirror the above sequence 3 more times as in the normal address
decoding.

* Cartridge mapped area:

The area at 0900-0BFF (and again at 2900-2BFF) is interesting, because it is
decoded with the UV201, and can be used to place more SRAM on the system or
even map a small ROM here.  It incurs the full wrath of the wait state
generator, however, with 6-10 BRCLKs.  But it is easily possible to map SRAM
here.  I wonder if the APL cartridge does this?  It'd be useless without more
RAM otherwise I'd think.

Pin 29 on the cartridge will go low when this range is being accessed.  Pin 40
is then usable as R/W.  Little more than 2 NAND gates can be used to hook up
some more SRAM on the bus, in a similar fashion to the existing 2102's.

Buffered Bus:
-------------

Only the UV201 can see this bus completely; the CPU gets a modified view of
it.
The buffered bus is only 8K deep, and only uses BA0-BA12. BA13 *is* generated but only is connected to the cartridge connector. It could be used to make 8K sized cartridges. 0000 - 07FF (2048) RES2 ROM. The second BIOS ROM maps here. 0800 - 08FF (256) open bus * 0900 - 0BFF (768) open bus * 0C00 - 0FFF (1024) System RAM. 1000 - 1FFF (4096) Cartridge ROM. * Open bus - nothing is mapped here, and typically the last thing on the bus (or something else) will be present if the UV201 DMAs from here. On my perf VB, I have pulldown resistors (this is to help the logic analyzer) so I get 00h mapped in here. On a real unit, the contents will be very random depending on last access, bus noise, phase of the moon and other variations. This means it is possible for a cartridge to map another 1K of graphics here, theoretically. Only things on this bus can be used by the UV201 during graphics rendering, and the UV201 cannot pull graphics out of say, the RES1 ROM directly. But it CAN pull graphics directly out of the cartridge ROM. The game Gladiator was caught pulling graphics directly out of the cartridge ROM when I had it on the logic analyzer. The UV201 CANNOT read anything at 0800-0BFF, even though the UV201 and the cartridge enable are connected here. The WACK line on the UV202 does not go high when the DMA is in progress, and this line is used to enable the UV201 for register access and the cartridge device at 900-BFF. Thus, these two things are not visible during graphics rendering. --- UV201 Specifics: ---------------- First off, the colour table. 
When the "color" test screen is used, the following is the order (related to the 4 video pins above) that they are displayed: F - black 8 - light grey 9 - dark yellow (orange) A - dark magenta B - dark red C - dark cyan D - dark green E - dark blue 7 - dark grey 0 - white 1 - yellow 2 - magenta 3 - red 4 - cyan 5 - green 6 - light blue 8 F E D C B A 9 0 7 6 5 4 3 2 1 I have compared this with the only screen shot that exists, and it appears to follow this pattern, but dark yellow seems to actually be orange. The circuit also inverts the 4 video lines before using them, so this is why black is the all 1's condition- because then all 4 outputs of the inverters go low. Same with white, it's the all 0's condition, which gets inverted to make all white video. After reading the patent, I know why this was done. Two or more(!) UV201's can be used in parallel if you wish to have more than 16 objects on the screen at once! The patent describes having another UV201 in a cartridge, and now a bunch of the connections on the cartridge make sense! When the primary UV201 is showing black pixels, the video bus is all 1's, and is open collector, so a secondary UV201 can then pull down these pins and substitute some new pixel data in place of the all 1's black pixels. These registers do not correspond to any particular displayed object- any of these objects can be selected via registers 0870-088F! 
---- 0800-080F : Object pointer LSB 0810-081F : Object pointer MSB and colour 0820-082F : Object X size, intensity and Xcopy 0830-083F : Object Y size 0840-084F : Object X position These registers are the true 16 objects- these objects reference the above 5 banks of register, via the "X order" bits (see register descriptions below) ---- 0850-085F : Object Y position LSB for list A 0860-086F : Object Y position LSB for list B 0870-087F : Object Y position MSB and X order for list A 0880-088F : Object Y position MSB and X order for list B Control and status registers ---- 08F0 : Y interrupt register 08F2 : Final modifier 08F5 : Background register 08F7 : Command register 08F8 : X-freeze register 08F9 : Y-freeze LSB 08FA : Y freeze MSB and odd/even 08FB : Current Y LSB 0800-080F Object pointer LSB 0810-081F Object pointer MSB and colour ---------------------------------------- These two registers together form a 13 bit pointer and 3 bits for colour. 0810 0800 <- (register number example) 7 0 7 0 --------- --------- CCCA AAAA AAAA AAAA C: Colour These three bits select one of 8 colours for the object. A: Address The 13 address bits form the 13 bit address where data will be fetched from. This is the 8K buffered bus. NOTE: the addresses will be modified during rendering of the screen. They must be reset before the start of the next frame. They will be bumped by the number of bytes read from memory during the frame. NOTE: only the cartridge ROM, system RAM, and RES2 ROM can be addressed. The RES1 ROM cannot be read since it is not on the buffered bus. --- 0820-082F : object X size, intensity and Xcopy ---------------------------------------------- 0820 7 0 --------- XI.W WWWW X: Xcopy Setting this bit will prevent the address pointer from being modified during the scanline (registers 080x and 081x). The net effect is the same 8 bits will be pulled for every 8 pixels of the object displayed. I: Intensity This bit sets the intensity of the colour for this object. 
1 = bright, 0 = dim. W: Width of object in bytes. The width of the object is in multiples of 8 pixels. To get width of the object, multiply the value in the W bits by 8. i.e. if W is set to 5, the object will be 40 pixels wide (5*8). Each scanline the UV201 will fetch 5 bytes of data and display them. After each byte of data is read, the address in registers 080x and 081x will be incremented by 1. NOTE: If the X copy bit is set, the UV201 will still (in our above example) pull 5 bytes of data, but it will only increment the address once on the last byte of data read. This allows large objects such as checkerboards or lines to be rendered more efficiently- you do not have to store the entire line of pixels, but only 8 pixels worth that get repeated. The columns and arrow graphics at the top of the screen on Gladiator is a good example of this in use. --- 0830-083F : Object Y size ------------------------- 0830 7 0 --------- ..SS SSSS S: Y size Only 6 bits of this register are used to contain the height of the object, in pixels. The video hardware decrements this register for each scanline of an object that is shown. Only heights from 0-3F are allowed. A height of 1 will render the object for 1 scanline, 2 for 2 scanlines and so on. A height of 0 will render it for 64 scanlines, however. The video hardware will continue to show the object until this register reaches 0. It is decremented at the end of rendering the object on any particular scanline. If it is not reloaded before the next frame, the maximum height of 64 pixels is rendered. --- 0840-084F : Object X position ----------------------------- All 8 bits of this register select the X starting position. 
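
The bit layouts of the object registers described so far can be summarized as
a small decoder.  This is a sketch based only on the field diagrams above
(CCCA AAAA / AAAA AAAA, XI.W WWWW and ..SS SSSS); the function names and the
dictionary keys are invented for illustration.

```python
def decode_object_pointer(lsb, msb):
    """Split registers 080x (LSB) and 081x (MSB) into (address, colour)."""
    address = ((msb & 0x1F) << 8) | (lsb & 0xFF)  # 13 bit buffered bus address
    colour = (msb >> 5) & 0x07                    # top 3 bits pick 1 of 8 colours
    return address, colour


def decode_x_size(value):
    """Split register 082x (XI.W WWWW) into its fields."""
    width_bytes = value & 0x1F
    return {
        "xcopy": bool(value & 0x80),      # X: repeat the same byte across the line
        "bright": bool(value & 0x40),     # I: 1 = bright, 0 = dim
        "width_bytes": width_bytes,       # W: bytes fetched per scanline
        "width_pixels": width_bytes * 8,  # each byte is 8 pixels
    }


def scanlines_rendered(y_size):
    """Scanlines shown for a value written to 083x; a height of 0 gives 64."""
    height = y_size & 0x3F                # only 6 bits are used
    return height if height else 64


if __name__ == "__main__":
    address, colour = decode_object_pointer(0x34, 0xB2)
    print(f"address={address:04X} colour={colour}")
    print(decode_x_size(0xC5))
    print(scanlines_rendered(0))
```

For example, an MSB of B2h with an LSB of 34h decodes to address 1234h with
colour 5, and writing C5h to 082x selects Xcopy, bright, and a 5 byte (40
pixel) wide object.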
--- 0850-085F : Object Y position LSB for list A 0860-086F : Object Y position LSB for list B 0870-087F : Object Y position MSB and X order for list A 0880-088F : Object Y position MSB and X order for list B --------------------------------------------------------- The Object Y position stuff is very odd, there are no less than two lists for some reason. These registers are the true objects, because they control which data is pulled from 0800-084F. The list in use can be selected using bit 6 of 08F7. Both lists are otherwise identical. NOTE: 0850-085F and 0860-086F are functionally identical, and which is in use is selected by bit 6 of 08F7. This goes for 0870-087F and 0880-088F. 0870 0850 <- (register number example) 7 0 7 0 --------- --------- Y... OOOO YYYY YYYY Y: Y Position These 9 bits select which scanline object will be displayed on (Y position). O: Object # Which of the 16 register banks above (0800-080F, through 0840-084F) will be used for this object. NOTE: You must sort the X order entries, so if multiple objects occupy a scanline the entries should be adjacent. This is because the hardware is not fast enough to sort the list itself in real time. I.e. if object 1 is first on the scanline followed by object 2, your software must sort them this way so they occupy adjacent entries in the list. I suspect what happens is after rendering the first object, the next object in the list is immediately checked to see if it needs to be rendered. If so, it is rendered. Later on I will discuss how this works exactly. 08F0 : Y interrupt register 08F2 : Final modifier 08F5 : Background register 08F7 : Command register 08F8 : X-freeze register 08F9 : Y-freeze LSB 08FA : Y freeze MSB and odd/even 08FB : Current Y LSB ----------------------- Before delving into how the rendering engine works, first I will outline the sets of tests I performed, along with the data I gathered. As I progressed building my theory of operation, I made more tests. 
I will not muddy the waters by explaining how parts work until after all the
tests are presented. This hopefully will make it clearer.

Hardware tests:
---------------

I constructed a set of "interesting" tests. These involve coming up with a
pattern of activity to try and suss out how fetching operates.

All tests were performed with an add-on board I made. This add-on board is
made with a PIC18F452 and a divide by 16 counter (74HC163). The PIC is on a
smaller piece of perf board and it replaces the F8 CPU board and user
interface. It lets me perform some hardware tests that would be more
difficult on the real hardware, if not impossible, and gives me better
control over what happens.

This PIC also lets me have some level of interactivity with the hardware,
letting me construct a series of tests to perform where only 1 variable at a
time is changed.

An example of such a series of tests is displaying 8 (or more) 8 pixel wide
sprites on a scanline, with 0 pixel spacing. I.e. sprite 0 is rendered at
X = 0, sprite 1 is rendered at X = 08h, and so on.

In this hypothetical test, I will remove one of the sprites, leaving the
others to be rendered, i.e. remove sprite 0 but render 1-7, without changing
the X coordinates of the sprites. Sprite 1 will still render at X = 08h.

The next test will be removing sprite 1, then sprite 2, and so on down the
line.

An example of such a test:

18 18 20 20 20 20 20 20 20 20   no sprites removed
18 18 20 20 20 20 20 22 18 20   sprite 7 removed
18 18 20 20 20 20 22 18 20 20   sprite 6 removed
18 18 20 20 20 22 18 20 20 20   sprite 5 removed
18 18 20 20 22 18 20 20 20 20   sprite 4 removed
18 18 20 22 18 20 20 20 20 20   sprite 3 removed
18 18 22 18 20 20 20 20 20 20   sprite 2 removed
18 20 18 20 20 20 20 20 20 20   sprite 1 removed
22 18 18 20 20 20 20 20 20 20   sprite 0 removed

The numbers are the cycle counts of BRCLKs. The first number (18 in the "no
sprites removed" line) is how many cycles from the falling edge of HBLANK to
the first DMA.
The second and subsequent numbers are the number of cycles from the start of
the DMA to the start of the NEXT DMA. I.e. the following occurs on the
scanline, left to right:

HBLANK ENDS | SPRITE INIT OCCURS | DMA | SPRITE CALC | DMA | SPRITE CALC | DMA |
|           |         18         |        18         |        20         |

The diagram above illustrates an example number of cycles for each step, and
how the cycles are counted.

There are 228 cycles in a scanline. I call cycle 0 the cycle where HBLANK
falls (it falls on the falling edge of BRCLK, and its internal state is
latched on the rising edge of BRCLK). The cycle number changes on the rising
edge of BRCLK.

Because each cycle count in the tables is an aggregate quantity, some
creative methods were employed to figure out the exact timing. I seem to
have figured most of it out. After the data below I will explain how the
hardware appears to work, giving an example that works for all cases.

There was little indication of what was occurring inside the hardware, and
recording cycle counts based on DMA starting positions was the easiest and
least error prone way to do it.

The UV201 is a scanline renderer, and as such only 1 scanline of each sprite
activity was recorded. This is fine because each subsequent scanline
rendered will be an exact clone of the previous. Y coordinates bear little
significance on what happens on each scanline, other than whether a sprite
will be rendered or not.

----

Tests 1,2,3:

In these tests, sprites were placed touching each other. I.e. the 8 pixel
wide sprites had X coordinates of 0, 08h, 10h, 18h, etc. The 16 pixel ones
had spacings of 00h, 10h, 20h, and the 24 pixel ones had spacings of 00h,
18h, 30h, etc.

In each test, a single sprite was removed. Its Y coordinate was changed to
push it farther down the screen, eliminating it from display with the
others. The other sprites were NOT touched.

(8 wide)

18 18 18 20 20 20 20 20 20 20   no sprites removed
18 18 18 20 20 20 20 22 18 20   sprite 7 removed
18 18 18 20 20 20 22 18 20 20   sprite 6 removed
18 18 18 20 20 22 18 20 20 20   sprite 5 removed
18 18 18 20 22 18 20 20 20 20   sprite 4 removed
18 18 18 22 18 20 20 20 20 20   sprite 3 removed
18 18 20 18 20 20 20 20 20 20   sprite 2 removed
18 20 20 20 20 20 20 20 20 20   sprite 1 removed
20 18 18 20 20 20 20 20 20 20   sprite 0 removed

(16 wide)

18 22 24 24 24 24 24 24   no sprites removed
18 22 24 24 24 24 24 26   sprite 7 removed
18 22 24 24 24 24 26 22   sprite 6 removed
18 22 24 24 24 26 22 24   sprite 5 removed
18 22 24 24 26 22 24 24   sprite 4 removed
18 22 24 26 22 24 24 24   sprite 3 removed
18 22 26 22 24 24 24 24   sprite 2 removed
18 24 22 24 24 24 24 24   sprite 1 removed
20 22 22 24 24 24 24 24   sprite 0 removed

(24 wide)

18 24 26 24 24 24 24 24   no sprites removed
18 24 26 24 24 24 24 26   sprite 7 removed
18 24 26 24 24 24 26 26   sprite 6 removed
18 24 26 24 24 26 26 24   sprite 5 removed
18 24 26 24 26 26 24 24   sprite 4 removed
18 24 26 26 26 24 24 24   sprite 3 removed
18 24 28 26 24 24 24 24   sprite 2 removed
18 26 24 24 24 24 24 24   sprite 1 removed
20 24 24 24 24 24 24 24   sprite 0 removed

----

Tests 4,5,6:

In these tests, there is now a space in front of the first sprite, and
spaces between sprites. I.e. X coordinates of the 8 pixel wide sprites are:
01h, 0ah, 12h, etc.

As before, a single sprite was eliminated from rendering in each test.
The cycle counts were recorded:

8 pixels wide

20 20 22 20 20 20 20 20 20 20 20   no sprites removed
20 20 22 20 20 20 20 20 20 20      sprite 9 removed
20 20 22 20 20 20 20 20 20 20      sprite 8 removed
20 20 22 20 20 20 20 20 20 20      sprite 7 removed
20 20 22 20 20 20 20 20 20 20      sprite 6 removed
20 20 22 20 20 20 20 20 20 20      sprite 5 removed
20 20 22 20 20 20 20 20 20 20      sprite 4 removed
20 20 22 20 20 20 20 20 20 20      sprite 3 removed
20 20 22 20 20 20 20 20 20 20      sprite 2 removed
20 20 22 20 20 20 20 20 20 20      sprite 1 removed
20 20 22 20 20 20 20 20 20 20      sprite 0 removed

16 pixels wide

20 24 26 24 24 24 24 24 24   no sprites removed
20 24 26 24 24 24 24 24      sprite 7 removed
20 24 26 24 24 24 24 24      sprite 6 removed
20 24 26 24 24 24 24 24      sprite 5 removed
20 24 26 24 24 24 24 24      sprite 4 removed
20 24 26 24 24 24 24 24      sprite 3 removed
20 24 26 24 24 24 24 24      sprite 2 removed
20 24 26 24 24 24 24 24      sprite 1 removed
20 24 26 24 24 24 24 24      sprite 0 removed

24 pixels wide

20 26 28 28 28 28 28 28 28   no sprites removed
20 26 28 28 28 28 28 28      sprite 7 removed
20 26 28 28 28 28 28 28      sprite 6 removed
20 26 28 28 28 28 28 28      sprite 5 removed
20 26 28 28 28 28 28 28      sprite 4 removed
20 26 28 28 28 28 28 28      sprite 3 removed
20 26 28 28 28 28 28 28      sprite 2 removed
20 26 28 28 28 28 28 28      sprite 1 removed
20 26 28 28 28 28 28 28      sprite 0 removed

----

Test 7: sprite touch test.

All sprites are 8 pixels wide, with 1 space in front of each. One by one,
the space between a pair of sprites is eliminated, so that the two touch
(i.e. they have adjacent X coordinates such as 12h and 1ah).

20 20 22 20 20 20 20 20 20 20 20   no sprites touching
20 20 22 20 20 20 20 20.18 22 20   sprites 7/8 touching
20 20 22 20 20 20 20.18 22 20 20   sprites 6/7 touching
20 20 22 20 20 20.18 22 20 20 20   sprites 5/6 touching
20 20 22 20 20.18 22 20 20 20 20   sprites 4/5 touching
20 20 22 20.18 22 20 20 20 20 20   sprites 3/4 touching
20 20 22.18 22 20 20 20 20 20 20   sprites 2/3 touching
20 20.20 22 20 20 20 20 20 20 20   sprites 1/2 touching
20.18 20 20 20 20 20 20 20 20 20   sprites 0/1 touching

----

Test 8: sprite touch test 2. Three sprites touching.

Same as before, except now three touch each other.

20 20 22 20 20 20 20 20 20 20 20   no sprites touching
20 20 22 20 20 20 20 20.18.20 22   sprites 7/8/9 touching
20 20 22 20 20 20 20.18.20 22 20   sprites 6/7/8 touching
20 20 22 20 20 20.18.20 22 20 20   sprites 5/6/7 touching
20 20 22 20 20.18.20 22 20 20 20   sprites 4/5/6 touching
20 20 22 20.18.20 22 20 20 20 20   sprites 3/4/5 touching
20 20 22.18.20 22 20 20 20 20 20   sprites 2/3/4 touching
20 20.20.20 22 20 20 20 20 20 20   sprites 1/2/3 touching
20.18.18 22 20 20 20 20 20 20 20   sprites 0/1/2 touching

----

Test 9:

More, varied sprite touch tests using two sets of sprites touching, located
at different intervals.

20 20 22 20 20 20 20 20 20 20 20   no sprites touching
20.18 20.18 22 20 20 20 20 20 20   sprites 0/1 and 2/3 touching
20.18 20 20.18 22 20 20 20 20 20   sprites 0/1 and 3/4 touching
20.18 20 20 20.18 22 20 20 20 20   sprites 0/1 and 4/5 touching

20 20 22 20 20 20 20 20 20 20 20   no sprites touching
20 20.20 22.18 22 20 20 20 20 20   sprites 1/2 and 3/4 touching
20 20.20 22 20.18 22 20 20 20 20   sprites 1/2 and 4/5 touching
20 20.20 22 20 20.18 22 20 20 20   sprites 1/2 and 5/6 touching

20 20 22 20 20 20 20 20 20 20 20   no sprites touching
20 20 22.18 22.18 22 20 20 20 20   sprites 2/3 and 4/5 touching
20 20 22.18 22 20.18 22 20 20 20   sprites 2/3 and 5/6 touching
20 20 22.18 22 20 20.18 22 20 20   sprites 2/3 and 6/7 touching

----

Tests 10,11,12:

In these tests, all sprites were made 8 pixels wide, except for one of the
sprites which is then made different widths (16 pixels, 24 pixels, etc).

All sprites "touch" each other, and sprite 0 is at X coordinate 00h. I.e. if
sprite 0 is set to 24 pixels wide, sprite 1's X coord will be 18h.

Test results     sprite sizes
----------------------------

18 18 18 20 20   8,8,8,8,8    Test 10: sprite 0 size varied
18 22 20 20 20   16,8,8,8,8
18 24 18 20 20   24,8,8,8,8
18 28 20 20 20   32,8,8,8,8
18 30 20 20 20   40,8,8,8,8
18 34 20 20 20   48,8,8,8,8
18 38 20 20 20   56,8,8,8,8
18 42 20 20 20   64,8,8,8,8
18 42 20 20 20   72,8,8,8,8
18 46 20 20 20   80,8,8,8,8
18 50 20 20 20   88,8,8,8,8

18 18 18 20 20   8, 8,8,8,8   Test 11: sprite 1 size varied
18 18 22 20 20   8,16,8,8,8
18 18 26 20 20   8,24,8,8,8
18 18 30 20 20   8,32,8,8,8
18 18 30 20 20   8,40,8,8,8
18 18 34 20 20   8,48,8,8,8
18 18 38 20 20   8,56,8,8,8
18 18 42 20 20   8,64,8,8,8
18 18 42 20 20   8,72,8,8,8
18 18 46 20 20   8,80,8,8,8
18 18 50 20 20   8,88,8,8,8

18 18 18 20 20   8,8, 8,8,8   Test 12: sprite 2 size varied
18 18 18 24 20   8,8,16,8,8
18 18 18 24 20   8,8,24,8,8
18 18 18 28 20   8,8,32,8,8
18 18 18 32 20   8,8,40,8,8
18 18 18 36 20   8,8,48,8,8
18 18 18 36 20   8,8,56,8,8
18 18 18 40 20   8,8,64,8,8
18 18 18 44 20   8,8,72,8,8
18 18 18 48 20   8,8,80,8,8
18 18 18 48 20   8,8,88,8,8

--

Tests 13, 14:

These two are the same as tests 10 and 11 above, except now there is a space
in front of the first sprite. This means sprite 0's X coord is now 01h. All
other settings are the same as before.

20 18 18 20 20   8,8,8,8,8    Test 13: same as test 10, now with space in front
20 22 18 20 20   16,8,8,8,8
20 24 20 20 20   24,8,8,8,8
20 28 20 20 20   32,8,8,8,8
20 32 20 20 20   40,8,8,8,8
20 36 20 20 20   48,8,8,8,8
20 36 20 20 20   56,8,8,8,8
20 40 20 20 20   64,8,8,8,8
20 44 20 20 20   72,8,8,8,8
20 48 20 20 20   80,8,8,8,8
20 48 20 20 20   88,8,8,8,8

20 18 18 20 20   8, 8,8,8,8   Test 14: same as test 11, now with space in front
20 18 22 20 20   8,16,8,8,8
20 18 26 20 20   8,24,8,8,8
20 18 30 20 20   8,32,8,8,8
20 18 30 20 20   8,40,8,8,8
20 18 34 20 20   8,48,8,8,8
20 18 38 20 20   8,56,8,8,8
20 18 42 20 20   8,64,8,8,8
20 18 42 20 20   8,72,8,8,8
20 18 46 20 20   8,80,8,8,8
20 18 50 20 20   8,88,8,8,8

----

Test 15:

Like tests 10 and 13, except the space has been placed between sprites 0 and
1. No other changes. i.e.
X coord of sprite 0 is 0, and X coord of sprite 1 is 09h (for an 8 pixel
wide sprite 0).

18 20 20 20 20   8,8,8,8,8
18 24 18 20 20   16,8,8,8,8
18 26 20 20 20   24,8,8,8,8
18 30 18 20 20   32,8,8,8,8
18 32 18 20 20   40,8,8,8,8
18 36 18 20 20   48,8,8,8,8
18 40 18 20 20   56,8,8,8,8
18 44 18 20 20   64,8,8,8,8
18 44 18 20 20   72,8,8,8,8
18 48 18 20 20   80,8,8,8,8
18 52 18 20 20   88,8,8,8,8

----

At this point, my test hardware gets an upgrade! I have added a 74HC163
counter and a 16 position binary rotary switch that outputs a 4 bit binary
word between 0000b and 1111b, and made a few connections to the unused gates
on the Videobrain main perf unit.

The upgrade lets me insert "wait states" into the DMAs from the UV201. This
hardware lets me insert 0 through 15 extra wait states. 0 extra wait states
results in normal behaviour.

The real hardware can insert wait states into the UV201 if the CPU is in the
process of reading something from RAM or ROM or UV201 regs when the UV201
requests a DMA. The circuit I made lets me simulate various timings of this.

----

Test 16:

I have placed 5 sprites on the screen, with their X coordinates "touching"
each other, that is sprite 0's X coord is 00h, sprite 1's is 08h, etc. All
sprites are 8 pixels wide.

The number of extra wait states (BRCLKs) being inserted is noted below,
along with the effect it has on timing.

18 18 18 20 20   no extra BRCLKs
18 20 22 20 20   1 extra
18 20 22 20 20   2 extra
18 22 24 24 24   3 extra
18 22 24 24 24   4 extra
18 24 26 24 24   5 extra
18 24 26 24 24   6 extra
18 26 28 28 28   7 extra
18 26 28 28 28   8 extra
18 28 28 28 28   9 extra
18 28 28 28 28   10 extra
18 30 32 32 32   11 extra
18 30 32 32 32   12 extra
18 34 32 32 32   13 extra
18 34 32 32 32   14 extra
18 34 36 36 36   15 extra

----

Test 17:

Same as above, but now all the sprites are shifted right 1 pixel, so that
the X coords of the first 3 sprites would be 01h, 09h, 11h, etc. Sprites are
still 8 pixels wide.

20 18 18 20 20   no extra BRCLKs
20 20 20 20 20   1 extra
20 20 20 20 20   2 extra
20 22 22 24 24   3 extra
20 22 22 24 24   4 extra
20 24 24 24 24   5 extra
20 24 24 24 24   6 extra
20 26 28 28 28   7 extra
20 26 28 28 28   8 extra
20 28 28 28 28   9 extra
20 28 28 28 28   10 extra
20 32 32 32 32   11 extra
20 32 32 32 32   12 extra
20 32 32 32 32   13 extra
20 32 32 32 32   14 extra
20 36 36 36 36   15 extra

----

Test 18:

Same as test 16, except this time we eliminate sprite 0. Sprites 1, 2, 3,
etc are pushed left so sprite 1's X coord is 00h, sprite 2's is 08h, etc.
Again all sprites are 8 pixels wide.

NOTE: timing is identical to test 16.

18 18 18 20 20   no extra BRCLKs
18 20 22 20 20   1 extra
18 20 22 20 20   2 extra
18 22 24 24 24   3 extra
18 22 24 24 24   4 extra
18 24 26 24 24   5 extra
18 24 26 24 24   6 extra
18 26 28 28 28   7 extra
18 26 28 28 28   8 extra
18 28 28 28 28   9 extra
18 28 28 28 28   10 extra
18 30 32 32 32   11 extra
18 30 32 32 32   12 extra
18 34 32 32 32   13 extra
18 34 32 32 32   14 extra
18 34 36 36 36   15 extra

----

Test 19:

Same as 18, except now we move sprites right 1 pixel and put a space in
front. I.e. X coords of the first few sprites are 01h, 09h, 11h, etc. We
also are now skipping the first TWO sprites, and starting on sprite 2.

22 18 20 20 20   no extra BRCLKs
22 20 22 20 20   1 extra
22 20 22 20 20   2 extra
22 22 24 24 24   3 extra
22 22 24 24 24   4 extra
22 24 24 24 24   5 extra
22 24 24 24 24   6 extra
22 26 28 28 28   7 extra
22 26 28 28 28   8 extra
22 30 28 28 28   9 extra
22 30 28 28 28   10 extra
22 30 32 32 32   11 extra
22 30 32 32 32   12 extra
22 34 32 32 32   13 extra
22 34 32 32 32   14 extra
22 34 36 36 36   15 extra

----

Test 20:

Same as test 19, except now we skip 4 sprites and sprite 4 is at X coord =
00h.

24 18 18 20 20   no extra BRCLKs
24 20 20 20 20   1 extra
24 20 20 20 20   2 extra
24 22 24 24 24   3 extra
24 22 24 24 24   4 extra
24 24 24 24 24   5 extra
24 24 24 24 24   6 extra
24 28 28 28 28   7 extra
24 28 28 28 28   8 extra
24 28 28 28 28   9 extra
24 28 28 28 28   10 extra
24 32 32 32 32   11 extra
24 32 32 32 32   12 extra
24 32 32 32 32   13 extra
24 32 32 32 32   14 extra
24 36 36 36 36   15 extra

----

Test 21:

Same as above, only skip the first 6 sprites.

26 18 20 20 20   no extra BRCLKs
26 20 20 20 20   1 extra
26 20 20 20 20   2 extra
26 22 24 24 24   3 extra
26 22 24 24 24   4 extra
26 26 24 24 24   5 extra
26 26 24 24 24   6 extra
26 26 28 28 28   7 extra
26 26 28 28 28   8 extra
26 30 28 28 28   9 extra
26 30 28 28 28   10 extra
26 30 32 32 32   11 extra
26 30 32 32 32   12 extra
26 34 32 32 32   13 extra
26 34 32 32 32   14 extra
26 34 36 36 36   15 extra

----

Test 22:

This time we will use a 16 pixel wide sprite for sprite 0, and re-run test
16.

18 22 20 20 20   no extra BRCLKs
18 22 20 20 20   1 extra
18 24 22 20 20   2 extra
18 24 22 24 24   3 extra
18 26 24 24 24   4 extra
18 26 24 24 24   5 extra
18 28 24 24 24   6 extra
18 28 28 28 28   7 extra
18 30 28 28 28   8 extra
18 30 28 28 28   9 extra
18 34 28 28 28   10 extra
18 34 32 32 32   11 extra
18 34 32 32 32   12 extra
18 34 32 32 32   13 extra
18 38 32 32 32   14 extra
18 38 36 36 36   15 extra

----

Test 23:

Sprite 0 is increased to 24 pixels wide. Tests repeated.

18 24 18 20 20   no extra BRCLKs
18 26 20 20 20   1 extra
18 26 20 20 20   2 extra
18 28 24 24 24   3 extra
18 28 24 24 24   4 extra
18 30 24 24 24   5 extra
18 30 24 24 24   6 extra
18 34 28 28 28   7 extra
18 34 28 28 28   8 extra
18 34 28 28 28   9 extra
18 34 28 28 28   10 extra
18 38 32 32 32   11 extra
18 38 32 32 32   12 extra
18 38 32 32 32   13 extra
18 38 32 32 32   14 extra
18 42 36 36 36   15 extra

----

Test 24:

Sprite 1 is now 16 pixels wide (sprites 0, 3, 4, etc are 8 pixels wide)

Interesting note: the count goes DOWN and then back up, e.g. at 3 extra and
7 extra.

18 18 22 20 20   no extra BRCLKs
18 20 22 20 20   1 extra
18 20 26 20 20   2 extra
18 22 24 24 24   3 extra
18 22 28 24 24   4 extra
18 24 26 24 24   5 extra
18 24 30 24 24   6 extra
18 26 28 28 28   7 extra
18 26 32 28 28   8 extra
18 28 32 28 28   9 extra
18 28 32 28 28   10 extra
18 30 32 32 32   11 extra
18 30 36 32 32   12 extra
18 34 36 32 32   13 extra
18 34 36 32 32   14 extra
18 34 36 36 36   15 extra

----

Test 25:

Sprite 1 is now 24 pixels wide.

18 18 26 20 20   no extra BRCLKs
18 20 26 20 20   1 extra
18 20 26 20 20   2 extra
18 22 28 24 24   3 extra
18 22 28 24 24   4 extra
18 24 30 24 24   5 extra
18 24 30 24 24   6 extra
18 26 32 28 28   7 extra
18 26 32 28 28   8 extra
18 28 36 28 28   9 extra
18 28 36 28 28   10 extra
18 30 36 32 32   11 extra
18 30 36 32 32   12 extra
18 34 40 32 32   13 extra
18 34 40 32 32   14 extra
18 34 40 36 36   15 extra

----

The basics of UV201 rendering
-----------------------------

The UV201 has two major pieces of hardware that are used to render the
screen. The first is the actual hardware that performs rendering, which is
fed via a 10 deep FIFO. Each FIFO entry is either a count of background
pixels, or 8 sprite pixels + sprite colour to emit. When the FIFO is empty,
the rendering hardware simply sends out background pixels until something
appears in the FIFO.

The second piece of hardware is the fetcher. It fetches data from the RAM or
cartridge or BIOS ROM and stuffs it into the FIFO, along with calculating
how many blank pixels exist between sprites, if any.

---

The renderer:

When the FIFO's read port has data present, the rendering hardware fetches
it and either serializes the pixel data and outputs it, or emits N pixels of
background colour.

* The FIFO is full when 10 entries are written to it.

* The FIFO is writable when it has 8 or fewer entries, and becomes
  unwritable when it has 10 entries. This means that when the FIFO is full
  and data is being removed from it, 2 entries must be removed before any
  more data can be stuffed into it.

* When HBLANK rises, the FIFO is instantly cleared. All 10 entries are
  zeroed out and the FIFO will immediately start accepting bytes again.

---

The fetcher:

Most of the magic happens here. First, I will describe what happens during
normal operation. "Normal operation" is defined as not having any sprites
that would be rendered past the end of the scanline, spilling into the next.
These edge cases will be discussed later.

The number of BRCLKs before the first DMA occurs is calculated like so:

On entry: usesprite[16] indicates whether the sprite's Y coordinate is in
range or not, for all 16 sprites.

// xdelta register we use to calculate background pixel count
xdelta = 0;

// start checking on sprite 0
spritenumber = 0;

// find first used sprite.
while((spritenumber < 16) && (usesprite[spritenumber] == 0))
    spritenumber++;

cycles = (spritenumber & 7) + 17;   // calculate initial delay value
if (cycles & 1) cycles++;           // even numbers only, round up

usegap = 0;

// if we load the fifo with bkg pixels
if (spritex[spritenumber] > xdelta) {
    cycles = cycles + 2;            // inserting pix count costs 2 cycles
    writefifo(spritex[spritenumber] - xdelta);  // write bkg pixel count
    xdelta = spritex[spritenumber];
    usegap = 1;
}

totalcycles = cycles;

// if sprite width is 8 pixels, and there's no blank pixels in front,
// and no waitstates, then add 2 to totalcycles
if ( (spritewidth[spritenumber] == 1) && (usegap == 0) && (waitstates == 0) )
    totalcycles += 2;

delay(cycles);      // delay the right number of cycles

----

What happens is the cycle count ("cycles") will be 18-26 after the execution
of this code (and the FIFO might have a background pixel count in it if
xdelta was smaller than the X coordinate of the first visible sprite).

The totalcycles variable is used to determine where the UV201 changes from
dividing by 2 to dividing by 4 on the DMAs. I am not sure why it does this,
but it sure does. Maybe it uses the cycles to calculate the next set of
visible sprites on the next scanline?

I am not sure why that fixup is needed when those three conditions are true,
but it is. It might be that something inside is ready preferentially, and is
no longer ready if the DMA is not as short as possible.

After this, fetching is more regular and it occurs via a loop that checks
the rest of the sprites for visibility.

----

for (;spritenumber < 16; spritenumber++) {

    if (spritex[spritenumber] == xdelta) {
        for (i = spritewidth[spritenumber]; i > 0; i--) {
            writefifo(spritedata);  // write fetched data
            xdelta += 8;            // update xdelta too
        }
    }

    cycles = (3 * spritewidth[spritenumber]) + waitstates + 15;
    if (cycles & 1) cycles++;       // round up

    totalcycles += cycles;          // add to our total
    if (totalcycles == 46) totalcycles += 2;    // breakover point

    // at breakover point, round cycles up by 4 instead of 2.
    offset = 0;
    if (totalcycles > 48) offset = (totalcycles & 2);
    totalcycles += offset;
    cycles += offset;

    // go to next visible sprite
    while((spritenumber < 16) && (usesprite[spritenumber] == 0))
        spritenumber++;

    // if there's a gap, process it
    if (spritex[spritenumber] > xdelta) {
        cycles = cycles + 2;        // inserting pix count costs 2 cycles
        totalcycles += offset;
        writefifo(spritex[spritenumber] - xdelta);  // write bkg pixel count
        xdelta = spritex[spritenumber];
        usegap = 1;
    }

    delay(cycles);  // delay the right number of cycles
}

----

Hopefully I can explain a bit more about what's happening. The amount of
time it takes to perform the DMA is calculated via the following formula:

dmacycles = (3 * spritewidth) + 1

I rolled this into the calculation above by adding 14 to this value, because
there's a 14 cycle delay imposed by some internal operations of the UV201.

The UV201 seems to process stuff every 2 BRCLKs, until cycle 48 is reached.
At this point, it starts processing stuff only every 4 BRCLKs. This causes
oddities in the cycle counts.

If the UV201 must insert blank background pixels between sprites, this costs
2 cycles, which are added on AFTER calculating the delay of a multiple of 4
cycles (if totalcycles > 48).
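
To make the two base calculations above easier to check against the tables,
here is a minimal C sketch of just those parts. The function names are made
up for this example. It deliberately leaves out the breakover adjustment at
cycle 48, the gap cost between sprites, and FIFO stalls, so it only matches
the recorded counts early in a scanline.

```c
/* Round an odd cycle count up to the next even number. */
static unsigned round_up_even(unsigned cycles)
{
    return cycles + (cycles & 1u);
}

/* Hypothetical helper: BRCLKs from HBLANK falling to the first DMA.
   first_sprite is the index of the first in-range sprite; has_gap is
   nonzero when background pixels precede it (costing 2 extra cycles). */
unsigned initial_delay(unsigned first_sprite, int has_gap)
{
    unsigned cycles = round_up_even((first_sprite & 7u) + 17u);
    if (has_gap)
        cycles += 2u;
    return cycles;
}

/* Hypothetical helper: base DMA-to-DMA spacing for one sprite, before any
   breakover or gap adjustment. width_bytes is the W value (8 pixels each). */
unsigned base_sprite_cycles(unsigned width_bytes, unsigned waitstates)
{
    unsigned dmacycles = 3u * width_bytes + 1u;   /* DMA itself        */
    return round_up_even(dmacycles + 14u + waitstates); /* + internal delay */
}
```

For example, base_sprite_cycles(1, 0) gives 18 and base_sprite_cycles(2, 0)
gives 22, which agree with the second column of the first two rows of test
10.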

----

I have written a QB64 (an open source QBasic replacement that runs on modern
PCs) implementation that reproduces all of the quirks seen in these tests.
It performs around 1800 tests with the above data. I will make this
available to those who wish to dork around with it.

I have left out lots of finer details above, like the FIFO stalling the
fetcher if it is full. These need to be emulated. The basics of this aren't
too hard, however. The FIFO is deemed full when it has 10 entries, and the
fetching engine stops until the FIFO has 8 or fewer entries in it, at which
point it starts running again, filling it to a maximum level of 10 entries.

If rendering "spills over" from one scanline to the next, what happens is
the FIFO is instantly cleared on the rising edge of HBLANK, and xdelta is
reset to 0. The rendering engine continues from this point, wherever it left
off.

----

Double X width:
---------------

This is purely a rendering phenomenon. The pixel clock to the rendering end
of the FIFO is halved. The result is that the screen is now only effectively
114 pixels wide instead of 228. The number of displayable pixels is less
than this, however.

What this means is that every pixel will be duplicated. The background
pixels are duplicated also. This means that if you position several 8 pixel
wide sprites like so (X coords):

00h, 10h, 18h

They will be rendered at pixels 00h, 20h, and 30h. The first sprite will
occupy X coords 0-fh, there will be 10h blank pixels, the second sprite will
occupy pixels 20-2fh, and the third will occupy 30-3fh.

The data fetching end has no clue this is going on and cannot "see" this
happening, other than the FIFO filling up a bit faster since it is not being
emptied as quickly.

----

Double Y width:
---------------

This is weird. I *think* I figured out what is going on here. It appears
that the UV201 uses a 1 bit register to help determine when it has to double
the height of a sprite.

The way this seems to work is as follows:

When a sprite is marked in range, an "in range" bit is set, and the state of
the scanline's lowest bit is saved into a second "height" bit.

After the sprite is rendered and the last byte is fetched, the "height" bit
is XOR'd with the lowest bit of the scanline counter, and it is XOR'd with
the double height flag in the command register.

If the result is a 1, the memory pointer will be written back to the pointer
register for the sprite just rendered. If the result is 0, nothing is
written.

So, in normal operation the following occurs. On scanline 100 (for
argument's sake), sprite 0 will be shown in single height mode.

* On scanline 99, we determine that sprite 0 will be shown on scanline 100.
  The "sprite in range" bit is set for sprite 0, for use on the NEXT
  scanline. The "height" bit is set to 1, which is (99 & 1).

* On scanline 100, sprite 0 will be shown. It is fetched and stuffed into
  the FIFO. After the last byte is fetched, we XOR the height bit with the
  lowest bit of the scanline counter: i.e. result = (100 & 1) ^ height.
  This gives us a "1" bit. We XOR this with the bit in the command register:
  result = result ^ (command_height_bit).

* If the result is "1" then we update the sprite's pointer, and reload the
  height bit with (scanline & 1).

Repeating the above again on the NEXT scanline...

* On scanline 101, we load sprite 0 data into the FIFO as before. We then
  do: result = (scanline & 1) ^ height. This gives us, again, a result of 1.
  We then XOR this with the command height bit as before, which gives us a
  result of 1 again, so we must write the pointer back and update the height
  bit again.

This repeats for all scanlines of a displayed object. The height bit is
effectively toggled every scanline in this case.

In the case that the double height bit is set, the following occurs:

* On scanline 99, we determine sprite 0 will be shown on scanline 100. The
  "sprite in range" bit is set like before.
  The height bit is updated too, which will be 1.

* On scanline 100, sprite 0 is shown as before. After it is fetched, we
  calculate this again: result = (scanline & 1) ^ height, which will be 1
  now. We then XOR it with the command register's double height bit. We get
  a "0" now.

* Since the result was 0, we DO NOT update the pointer in the sprite regs.

At this point, we have shown one line of the sprite. On the next scanline...

* On scanline 101, sprite 0 is shown. After it is fetched we calculate
  again: result = (scanline & 1) ^ height ^ command_height_bit

* This time, the result will be 1, so NOW we update height with
  (scanline & 1) and write the pointer back to the sprite's pointer
  registers, thereby advancing to the next line of the sprite.

Every time the pointer is updated in the sprite registers, the sprite's Y
height is decremented. After decrementing, if the Y height is 0, then the
sprite's in range bit is cleared, ending rendering of the sprite.

I am pretty sure the above is how it works, because when rendering spills
past the end of the scanline, an excessively wide sprite will suddenly
become double height: the height bit does not get checked until the NEXT
scanline.

01234567890123456789012345678901234567890123456789012345678901234567890123456789
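
The double height bookkeeping described above can be sketched in C roughly
as follows. This is only a model of the theory as written, not a verified
description of the silicon. The struct, its field names, and the function
names are all invented for this example, and the pointer field is just a
counter standing in for the sprite's data pointer registers.

```c
#include <stdbool.h>

/* Hypothetical per-sprite state; names are illustrative only. */
typedef struct {
    bool     in_range;    /* "in range" bit                          */
    unsigned height_bit;  /* saved low bit of the scanline counter   */
    unsigned ysize;       /* 6 bit Y size register (0 acts as 64)    */
    unsigned pointer;     /* stand-in for the data pointer registers */
} sprite_state;

/* On the scanline before the first visible one: mark the sprite in range. */
void sprite_mark_in_range(sprite_state *s, unsigned scanline)
{
    s->in_range = true;
    s->height_bit = scanline & 1u;
}

/* After the last byte is fetched on a scanline. Returns true when the
   pointer is written back, i.e. the sprite advances to its next row. */
bool sprite_end_of_fetch(sprite_state *s, unsigned scanline,
                         unsigned double_height)
{
    unsigned result = (scanline & 1u) ^ s->height_bit ^ (double_height & 1u);

    if (result == 0u)
        return false;               /* same row is shown again next line */

    s->height_bit = scanline & 1u;
    s->pointer++;                   /* stand-in for the pointer write back */
    s->ysize = (s->ysize - 1u) & 0x3Fu;
    if (s->ysize == 0u)
        s->in_range = false;        /* sprite is finished */
    return true;
}

/* Count how many scanlines a sprite stays on screen under this model. */
unsigned count_scanlines(unsigned ysize, unsigned first_line,
                         unsigned double_height)
{
    sprite_state s = { false, 0u, ysize & 0x3Fu, 0u };
    unsigned line, shown = 0u;

    sprite_mark_in_range(&s, first_line - 1u);
    for (line = first_line; s.in_range && shown < 1000u; line++) {
        shown++;
        sprite_end_of_fetch(&s, line, double_height);
    }
    return shown;
}
```

Under this model, count_scanlines(2, 100, 0) comes out as 2 and
count_scanlines(2, 100, 1) as 4, matching the walkthrough above where each
row of the sprite is shown on two consecutive scanlines when the double
height bit is set.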