nobody has shared CD images that attempt to do so.
Indeed, I really need some examples of sectors in the full channel-frame format and in the final decoded format, so that I can try to implement these conversion functions. I don't even have example data of what CIRC-encoded bytes should look like, to know whether I'm doing it right.
You would absolutely need a really well-organised community to do error correction on such raw CD rips, though, as every disc is going to rip differently every time, even on the same drive.
Honest question ... is it a problem? As long as all rips can be downsampled and error-corrected back to 2448-byte/sector frames, and the checksums match up across multiple disc reads (to rule out scratches and smudges), then the discs are acting like they would in the real world. And the real world isn't perfect, unfortunately. We can't expect our rips to be.
I suspect a data transformation process like byuu proposes, where you convert other formats at load time into an intermediate format suitable for emulation (likely 2448 bytes/sector, possibly with a C1/C2 error count per sector if you wanted to go that far), is the best of both worlds.
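As a rough illustration of what that intermediate format might look like per sector, here is a minimal sketch; the field names are my own and not byuu's actual API:

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch of a per-sector intermediate record: the full
// 2448 bytes (2352 main channel + 96 subchannel), plus optional C1/C2
// error counts carried over from the rip.
struct Sector {
  std::array<uint8_t, 2352> data;        // main channel (scrambled or descrambled)
  std::array<uint8_t, 96>   subchannel;  // deinterleaved P..W subchannel data
  uint16_t c1errors = 0;  // optional: C1-stage error count reported by the drive
  uint16_t c2errors = 0;  // optional: C2-stage error count reported by the drive
};
```

Lower-quality source formats would simply load with the optional fields zeroed.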
The API will be a challenge as well, but I'm going to try and make it agnostic to the rest of nall (my template library), although that may not be possible with the CUE parser. Text formats are a nightmare with the C++ STL.
But the goal is to design a clean API that lets us store and transform the data into any formats we want. The Mega CD can descramble data selectively, so we need scrambled data. The PC Engine CD might not allow that (I have no idea), so maybe we can omit it there. There may be a system where C1/C2 error rates are used as a ballpark check to detect emulation. There are systems that use wobble to encode additional data.
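Scrambling, at least, is well-specified: ECMA-130 Annex B defines a 15-bit LFSR (polynomial x^15 + x + 1, seeded with 0x0001) whose output is XORed over bytes 12..2351 of a sector, and since XOR is its own inverse, one table both scrambles and descrambles. A sketch of the table generator:

```cpp
#include <array>
#include <cstdint>

// ECMA-130 Annex B scrambler: a 15-bit LFSR (x^15 + x + 1) seeded with
// 0x0001 produces 2340 bytes, which are XORed over a sector's bytes
// 12..2351. Applying the same table twice restores the original data.
std::array<uint8_t, 2340> makeScrambleTable() {
  std::array<uint8_t, 2340> table{};
  uint16_t lfsr = 0x0001;
  for (auto& byte : table) {
    for (int bit = 0; bit < 8; bit++) {
      byte |= (lfsr & 1) << bit;                     // emit LSB first
      uint16_t feedback = (lfsr ^ (lfsr >> 1)) & 1;  // taps for x^15 + x + 1
      lfsr = (lfsr >> 1) | (feedback << 14);
    }
  }
  return table;
}
```

The table's first bytes come out as 01 80 00 60 ..., the sequence given in the standard, which makes a handy self-check.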
With the right library, we can take CDs in any format, and convert them to exactly the format we need. Obviously if you use a lower-quality rip with less information, the results may or may not work, but that just is what it is.
Right now these algorithms above F1 frames only exist in research papers. Let's fix that.
It's a real pain to convert from 2448 byte/sector back to 7203 bytes/sector, and to do the C1/C2 error correction yourself.
It really shouldn't be. Generating the 2048->2352 Reed-Solomon codes is trivial, as is verifying data with them; actually correcting bad 2048-byte data using them is not. Generation is only trivial because Neill Corlett figured it out and put it into code form: it's literally just a few XORs against a polynomial table. But to derive that from the official papers, you'd need a PhD in mathematics.
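The EDC half of that 2048->2352 encode shows what "a few XORs against a polynomial table" means in practice. This sketch follows the table-driven form Neill Corlett used in ecm.c (a 32-bit CRC with the reflected polynomial 0xD8018001); the ECC P/Q parity is the actual Reed-Solomon part and is built from similar GF(2^8) lookup tables:

```cpp
#include <cstddef>
#include <cstdint>

// CD-ROM EDC in the table-driven form Neill Corlett used in ecm.c:
// a 32-bit CRC over the reflected polynomial 0xD8018001.
struct Edc {
  uint32_t lut[256];
  Edc() {
    // Build the table: run each byte value through 8 steps of the LFSR.
    for (uint32_t i = 0; i < 256; i++) {
      uint32_t edc = i;
      for (int j = 0; j < 8; j++) edc = (edc >> 1) ^ (edc & 1 ? 0xD8018001 : 0);
      lut[i] = edc;
    }
  }
  // One table lookup and one XOR per byte: that's the whole computation.
  uint32_t compute(const uint8_t* data, size_t size) const {
    uint32_t edc = 0;
    while (size--) edc = (edc >> 8) ^ lut[(edc ^ *data++) & 0xff];
    return edc;
  }
};
```

Verifying a sector means recomputing this over the covered bytes and comparing against the stored value; correcting a sector that fails the check is the genuinely hard part.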
I don't believe C1/C2 is going to be an epic leap in difficulty (famous last words), but the problem is finding someone who can read ECMA-130, etc.
For the Saturn, you'd need that geometric data I talked about earlier to handle the copy protection (the "wobble"), but it's more likely you'd fake that and call it a day.
If it were me, my first idea would be to encode wobble as a separate bit-stream, something like:

repeat {
    <varint> number of bits for which the spiral is steady and within expected parameters
    <onebit> 0 = spiral becoming narrower; 1 = spiral becoming wider
}
The precision of how many physical bits on the disc are represented by the bitstream could be configurable, but in general the <varint> should keep the resulting spiral log files small when they're only used to encode a single copy-protection string. It'd be up to the individual system (e.g. the drive's firmware) how often to sample the drifting to extract data.
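A minimal sketch of an encoder for that proposed stream, under my own assumptions: the run length is a LEB128-style varint, and for simplicity the direction bit is stored in a whole byte here, where a real encoder would pack it into a single bit as the format describes:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the proposed wobble stream: pairs of
// {steady-run length, drift direction}. The run length is a LEB128-style
// varint (7 data bits per byte, high bit = continuation). For simplicity
// this sketch byte-aligns the direction flag; a real encoder would pack
// it into a single bit.
void encodeWobbleEvent(std::vector<uint8_t>& stream, uint64_t steadyBits, bool wider) {
  while (steadyBits >= 0x80) {
    stream.push_back(0x80 | (steadyBits & 0x7f));
    steadyBits >>= 7;
  }
  stream.push_back(steadyBits & 0x7f);
  stream.push_back(wider ? 1 : 0);  // 0 = spiral narrower, 1 = spiral wider
}
```

Because a single copy-protection string only needs a handful of drift events, the whole log stays a few bytes per event regardless of how long the steady stretches are.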