This is a disassembly of the decompression routine for the graphics format used by the Sega Genesis game "MLBPA Baseball." To see the game I am talking about, watch this video:
https://www.youtube.com/watch?v=oRTTBcjfyP0. The format is based on LZSS.
Code:
move.w $7414(a5), -(a7)
move.w #$ffff, $7414(a5)
movem.l a6/a5/a4/a3/a2/a1/a0/d7/d6/d5/d4/d3/d2/d1/d0, -(a7)
; set VRAM write command
; input: a1.w = VRAM starting location
move.w a1, d0
andi.l #$0000ffff, d0
asl.l #2, d0
lsr.w #2, d0
ori.w #$4000, d0
swap d0
move.l d0, $00c00004
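lea $00c00000, a1        ; a1 = VDP data port
lea $ffff6a9a, a4        ; a4 = 4-byte staging buffer
move.w #0, d1            ; d1 = cursor in the 4-byte buffer
move.l (a3)+, d2         ; d2 = byte count from the 4-byte header
lea $ffff744e, a2        ; a2 = 4096-byte buffer
movea.l a2, a0
move.l #$20202020, d3    ; fill the 4096-byte buffer with $20
move.w #$03ff, d0        ; $400 longwords = 4096 bytes
loc_0000076e:
move.l d3, (a0)+
dbf d0, loc_0000076e
move.w #$0fee, d7        ; d7 = cursor in the 4096-byte buffer
moveq #0, d3             ; d3 = bits left in the current bitfield
moveq #0, d6             ; d6 = current bitfield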
; read a bit from the bitfield
loc_0000077c:
dbf d3, loc_00000784
move.b (a3)+, d6
moveq #7, d3
loc_00000784:
lsr.b #1, d6
bcc.w loc_000007b2
; this is what happens when a 1 is read in the bitfield
; it's an uncompressed byte
move.b (a3)+, d0
move.b d0, (a4,d1.w)
addi.w #1, d1
andi.w #3, d1
bne.w loc_0000079e
move.l (a4), (a1)
loc_0000079e:
subq.l #1, d2
beq.w loc_000007fc
move.b d0, (a2,d7.w)
addq.w #1, d7
andi.w #$0fff, d7
bra.w loc_0000077c
; this is what happens when a 0 is read in the bitfield
; it's a backreference
loc_000007b2:
moveq #0, d4
move.b (a3)+, d4
move.b (a3)+, d0
move.b d0, d5
andi.w #$00f0, d0
asl.w #4, d0
or.w d0, d4
andi.w #$000f, d5
addq.w #2, d5
loc_000007c8:
move.b (a2,d4.w), d0
addq.w #1, d4
andi.w #$0fff, d4
move.b d0, (a4,d1.w)
addi.w #1, d1
andi.w #3, d1
bne.w loc_000007e4
move.l (a4), (a1)
loc_000007e4:
subq.l #1, d2
beq.w loc_000007fc
move.b d0, (a2,d7.w)
addq.w #1, d7
andi.w #$0fff, d7
dbf d5, loc_000007c8
bra.w loc_0000077c
loc_000007fc:
tst.w d1
beq.w loc_00000804
move.l (a4), (a1)
loc_00000804:
movem.l (a7)+, d0/d1/d2/d3/d4/d5/d6/d7/a0/a1/a2/a3/a4/a5/a6
move.w (a7)+, $7414(a5)
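As a cross-check on the "set VRAM write command" part of the listing, here is a rough C++ sketch of the same bit shuffling. The function name vram_write_command is made up for illustration. The layout (write flag $4000 plus address bits 13-0 in the upper word, address bits 15-14 in the lower word) is read straight off the asl.l / lsr.w / ori.w / swap sequence.

```cpp
#include <cstdint>

// Hypothetical helper: builds the 32-bit value the routine writes to $C00004
// before streaming data, given the 16-bit VRAM starting location.
constexpr std::uint32_t vram_write_command(std::uint16_t vram_addr) {
    // Upper word: write flag $4000 plus address bits 13-0.
    // Lower word: address bits 15-14.
    return (static_cast<std::uint32_t>(0x4000u | (vram_addr & 0x3FFFu)) << 16)
         | static_cast<std::uint32_t>(vram_addr >> 14);
}
```

For a starting location of $5380 this gives $53800001.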
As I study this, here are my notes:
- The routine is located at $000724 in ROM.
- The a5 register is always set to $FF0000 throughout the whole program.
- The lower half of a1 is used as input: The starting VRAM location. The a3 register is used as the source of the data.
- The data starts with a 4-byte header: the number of decompressed bytes. It is loaded into d2 and counted down as bytes are written.
- The buffer is 4096 bytes long and is pre-filled with $20 bytes, but the cursor starts at $FEE, 18 bytes before the end, rather than at the beginning. Is there any reason for that?
- a2 points to this buffer, which is located at $FF744E in RAM.
- a4 points to a 4-byte buffer, which is located at $FF6A9A.
- d7 is used to indicate where the 4096-byte buffer cursor is.
- d1 is used to indicate where the 4-byte buffer cursor is.
- Every time a byte is written, it's written to both buffers. If the fourth byte is written to the 4-byte buffer, those bytes get written to VRAM.
- d6 is used to read the bitfields.
- Bitfields are 8 bits long, read right to left (least significant bit first), and are interspersed with the data.
- In the bitfields, a 1 is an uncompressed byte and a 0 is a backreference.
- Backreferences are two bytes: Consider them as a single 16-bit value. Then, bits 0-3 are the length minus 3 (so add 3 to get the length). The address is made up of bits 7-4 then bits 15-8 from most significant to least significant.
- The algorithm stops when the appropriate number of bytes is written.
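Putting the notes together, the whole routine can be sketched in C++ roughly like this. It is only a sketch under assumptions: the function name decompress and the std::vector interface are mine, and it returns the bytes instead of streaming them to VRAM through the 4-byte buffer. As for the odd starting cursor, the parameters here (4096-byte buffer pre-filled with $20, cursor starting at 4096 - 18, lengths of 3-18) appear to match the widely circulated LZSS.C by Haruhiko Okumura, so the routine may simply be a port of that code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical C++ rendering of the routine. Returns the decompressed bytes.
std::vector<std::uint8_t> decompress(const std::vector<std::uint8_t>& src) {
    std::vector<std::uint8_t> out;
    if (src.size() < 4) return out;

    // 4-byte big-endian header: number of decompressed bytes.
    std::uint32_t remaining = (std::uint32_t(src[0]) << 24) | (std::uint32_t(src[1]) << 16)
                            | (std::uint32_t(src[2]) << 8)  |  std::uint32_t(src[3]);
    std::size_t pos = 4;

    std::vector<std::uint8_t> window(4096, 0x20);  // buffer pre-filled with $20
    unsigned cursor = 0xFEE;                       // cursor starts at $FEE
    unsigned flags = 0;
    int bits_left = 0;

    // Write one byte to the output and to the 4096-byte buffer.
    auto emit = [&](std::uint8_t b) {
        out.push_back(b);
        window[cursor] = b;
        cursor = (cursor + 1) & 0xFFF;
        --remaining;
    };

    while (remaining > 0) {
        if (bits_left == 0) {                      // fetch a new bitfield byte
            if (pos >= src.size()) break;          // input ran out early
            flags = src[pos++];
            bits_left = 8;
        }
        bool literal = (flags & 1) != 0;           // least significant bit first
        flags >>= 1;
        --bits_left;

        if (literal) {                             // 1 = uncompressed byte
            if (pos >= src.size()) break;
            emit(src[pos++]);
        } else {                                   // 0 = backreference
            if (pos + 1 >= src.size()) break;
            unsigned b0 = src[pos++];
            unsigned b1 = src[pos++];
            unsigned from = b0 | ((b1 & 0xF0) << 4);
            unsigned length = (b1 & 0x0F) + 3;
            // Byte-by-byte copy, stopping as soon as the byte count is reached.
            for (unsigned i = 0; i < length && remaining > 0; ++i) {
                emit(window[from]);
                from = (from + 1) & 0xFFF;
            }
        }
    }
    return out;
}
```

The early exits on truncated input are my addition; the original routine has no such checks and trusts the header.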
Here's an example of a backreference in this format. Say that the location is $123 and the length nybble is $4 (meaning 7 bytes). Then, these are stored as the 16-bit value $2314. If you want a C++ example:
Code:
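// "location" ranges from 0x000 to 0xFFF
// "length" ranges from 3 to 18
unsigned char byte[2];
byte[0] = location & 0xFF;                          // low 8 bits of the location
byte[1] = ((location >> 4) & 0xF0) | (length - 3);  // high 4 bits of the location, then length - 3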
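Going the other way, which is what the decompressor does, might look like the following. The Backref struct and the decode_backref name are made up for this sketch; the masks and shifts mirror the andi/asl/or instructions at loc_000007b2.

```cpp
#include <cstdint>

// Hypothetical result type for one decoded backreference.
struct Backref {
    std::uint16_t location;  // 0x000-0xFFF, position in the 4096-byte buffer
    std::uint8_t  length;    // 3-18 bytes
};

// Hypothetical helper: unpacks the two backreference bytes in stream order.
constexpr Backref decode_backref(std::uint8_t b0, std::uint8_t b1) {
    return Backref{
        static_cast<std::uint16_t>(b0 | ((b1 & 0xF0) << 4)),  // high nybble of b1 = location bits 11-8
        static_cast<std::uint8_t>((b1 & 0x0F) + 3)            // low nybble of b1 = length - 3
    };
}
```

With the bytes 23 14 from the example, this gives location $123 and length 7.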
Here's a real example from the game. The EA Sports logo that appears at the very beginning of the game has its graphics stored starting at $1069B4 in ROM, which are then decompressed to $5380 in VRAM.
The first bytes are 00 00 1A 00, which means we will be decompressing something that is 6,656 bytes long.
The next byte is $51, so that's our first bitfield. In binary, that's 01010001. Reading right to left, there will be a byte, 3 backreferences, a byte, a backreference, another byte, and another backreference.
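To double-check the reading order of a bitfield, a small C++ sketch like this one can list what each bit selects. Both function names are invented for illustration, and 'L' / 'B' are just my shorthand for an uncompressed (literal) byte and a backreference.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true if the n-th item (0-7) governed by this bitfield
// is an uncompressed byte. Bit 0 is consumed first.
constexpr bool is_literal(std::uint8_t bitfield, int n) {
    return ((bitfield >> n) & 1) != 0;
}

// Hypothetical helper: spells out all eight items in the order they are read.
std::string token_order(std::uint8_t bitfield) {
    std::string order;
    for (int n = 0; n < 8; ++n) {
        order += is_literal(bitfield, n) ? 'L' : 'B';
    }
    return order;
}
```

For $51 this produces LBBBLBLB, matching the sequence described above.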
FF | EE FF | 00 0F | 12 0F | F7 | 22 04 | 71 | 13 0D
Here's the breakdown.
- The first byte is FF.
- The next 2 bytes are a backreference. With EE FF, the location to copy from is $FEE and the length nybble is $F, or 18 bytes. The only byte written so far is the $FF at $FEE, so that byte is copied to $FEF-$FFF and then, wrapping around, to $000. All 19 bytes from $FEE through $000 now contain $FF.
- Another backreference, this time 00 0F, so the location is $000 and the length is 18 bytes. The result is now that bytes $001-$012 are all $FF.
- Yet another backreference, with bytes 12 0F. The location is $012 and the length is 18 bytes. Bytes $013-$024 are all $FF.
- Next is an uncompressed byte, $F7. $025 is now $F7.
- A backreference comes next. The bytes are 22 04, so the location is $022 and the length nybble is $4, or 7 bytes. Only four bytes exist from $022 up to the cursor (FF FF FF F7), so the copy is longer than the data available. When that happens, the copy runs into the bytes it has just written and the sequence repeats. Therefore, $026-$02C will contain FF FF FF F7 FF FF FF.
- Another uncompressed byte, this time a $71, so $02D is now 71.
- Last bit in the bitfield. It's a backreference, with bytes 13 0D. That means the location is $013 and the length nybble is $D, or 16 bytes. Everything written to $013-$022 was $FF, so 16 $FF bytes are written to $02E-$03D.
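The repeating behavior in the 22 04 step, and the wraparound in the EE FF step, both fall out of copying one byte at a time inside a 4096-byte ring. Here is a minimal C++ sketch of just that copy loop. The Window alias and the copy_backref name are assumptions of mine, not anything from the game.

```cpp
#include <array>
#include <cstdint>

using Window = std::array<std::uint8_t, 4096>;

// Hypothetical helper: copies "length" bytes from "from" to "cursor" inside
// the ring buffer and returns the new cursor position.
unsigned copy_backref(Window& window, unsigned cursor, unsigned from, unsigned length) {
    for (unsigned i = 0; i < length; ++i) {
        window[cursor] = window[from];   // may read a byte written earlier in this same copy
        cursor = (cursor + 1) & 0xFFF;   // both positions wrap at 4096
        from = (from + 1) & 0xFFF;
    }
    return cursor;
}
```

A block copy such as memcpy would not be a safe substitute here, since the source and destination ranges can overlap.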
And here's the result for bytes $000-$03D of the buffer:
Code:
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF F7 FF FF FF F7 FF FF FF 71 FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF
$FEE-$FFF also has all $FF bytes written. Then, the pattern continues until all 6,656 bytes have been written to VRAM, at which point decompression stops.
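For the VRAM side, the 4-byte buffer at $FF6A9A can be modeled roughly as below. This is a sketch with made-up names; in particular, starting the staging bytes at zero is my assumption, since the routine never initializes that buffer. The final flush mirrors loc_000007fc: if the byte count is not a multiple of 4, a full longword is still written, so the trailing bytes are whatever was left over from the previous group.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: groups decompressed bytes into the big-endian longwords
// that the routine writes to the VDP data port.
std::vector<std::uint32_t> to_longwords(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> writes;
    std::uint8_t staging[4] = {0, 0, 0, 0};  // assumption: real initial contents are unknown
    unsigned n = 0;                          // plays the role of d1

    auto flush = [&] {
        writes.push_back((std::uint32_t(staging[0]) << 24) | (std::uint32_t(staging[1]) << 16)
                       | (std::uint32_t(staging[2]) << 8)  |  std::uint32_t(staging[3]));
    };

    for (std::uint8_t b : bytes) {
        staging[n] = b;
        n = (n + 1) & 3;
        if (n == 0) flush();                 // fourth byte stored: write the longword
    }
    if (n != 0) flush();                     // partial group at the end, stale bytes included
    return writes;
}
```

Since 6,656 is a multiple of 4, the EA Sports logo never hits the partial case.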