nobody has shared CD images that attempt to do so.
Indeed, I really need some examples of sectors in the full channel-frame format and in the final decoded format, so that I can try to implement these conversion functions. I don't even have example data of what CIRC-encoded bytes should look like, to know whether I'm doing it right.
You would absolutely need a really well-organised community to do error correction on such raw CD rips, though, as every disc is going to rip differently every time, even on the same drive.
Honest question ... is it a problem? As long as all rips can be downsampled and error-corrected back to 2448-byte/sector frames, and the checksums match up across multiple disc reads (to rule out scratches and smudges), then the discs are acting like they would in the real world. And the real world isn't perfect, unfortunately. We can't expect our rips to be.
I suspect a data transformation process like byuu proposes, where you convert other formats at load time into an intermediate format suitable for emulation (likely 2448 bytes/sector, possibly with a C1/C2 error count per sector if you wanted to go that far), is the best of both worlds.
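As a rough illustration of what that intermediate format might look like per sector, here is a minimal sketch; the field names are my own and not byuu's actual API:

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch of a per-sector intermediate record: the full
// 2448 bytes (2352 main channel + 96 subchannel), plus optional C1/C2
// error counts carried over from the rip.
struct Sector {
  std::array<uint8_t, 2352> data;        // main channel (scrambled or descrambled)
  std::array<uint8_t, 96>   subchannel;  // deinterleaved P..W subchannel data
  uint16_t c1errors = 0;  // optional: C1-stage error count reported by the drive
  uint16_t c2errors = 0;  // optional: C2-stage error count reported by the drive
};
```

Lower-quality source formats would simply load with the optional fields zeroed.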
The API will be a challenge as well, but I'm going to try and make it agnostic to the rest of nall (my template library), although that may not be possible with the CUE parser. Text formats are a nightmare with the C++ STL.
But the goal is to design a clean API that lets us store and transform the data into any formats we want. The Mega CD can descramble data selectively, so we need scrambled data. The PC Engine CD might not allow that (I have no idea), so maybe we can omit it there. There may be a system where C1/C2 error rates are used as a ballpark check to detect emulation. There are systems that use wobble to encode additional data.
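Scrambling, at least, is well-specified: ECMA-130 Annex B defines a 15-bit LFSR (polynomial x^15 + x + 1, seeded with 0x0001) whose output is XORed over bytes 12..2351 of a sector, and since XOR is its own inverse, one table both scrambles and descrambles. A sketch of the table generator:

```cpp
#include <array>
#include <cstdint>

// ECMA-130 Annex B scrambler: a 15-bit LFSR (x^15 + x + 1) seeded with
// 0x0001 produces 2340 bytes, which are XORed over a sector's bytes
// 12..2351. Applying the same table twice restores the original data.
std::array<uint8_t, 2340> makeScrambleTable() {
  std::array<uint8_t, 2340> table{};
  uint16_t lfsr = 0x0001;
  for (auto& byte : table) {
    for (int bit = 0; bit < 8; bit++) {
      byte |= (lfsr & 1) << bit;                     // emit LSB first
      uint16_t feedback = (lfsr ^ (lfsr >> 1)) & 1;  // taps for x^15 + x + 1
      lfsr = (lfsr >> 1) | (feedback << 14);
    }
  }
  return table;
}
```

The table's first bytes come out as 01 80 00 60 ..., the sequence given in the standard, which makes a handy self-check.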
With the right library, we can take CDs in any format, and convert them to exactly the format we need. Obviously if you use a lower-quality rip with less information, the results may or may not work, but that just is what it is.
Right now these algorithms above F1 frames only exist in research papers. Let's fix that.
It's a real pain to convert from 2448 byte/sector back to 7203 bytes/sector, and to do the C1/C2 error correction yourself.
It really shouldn't be. Generating the 2048->2352 Reed-Solomon codes is trivial, as is verifying data with them; actually correcting bad 2048-byte data using them is not. Generation is only trivial because Neill Corlett figured it out and put it into code form: it's literally just a few XORs against a polynomial table. But to derive that from the official papers, you'd need a PhD in mathematics.
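The EDC half of that 2048->2352 encode shows what "a few XORs against a polynomial table" means in practice. This sketch follows the table-driven form Neill Corlett used in ecm.c (a 32-bit CRC with the reflected polynomial 0xD8018001); the ECC P/Q parity is the actual Reed-Solomon part and is built from similar GF(2^8) lookup tables:

```cpp
#include <cstddef>
#include <cstdint>

// CD-ROM EDC in the table-driven form Neill Corlett used in ecm.c:
// a 32-bit CRC over the reflected polynomial 0xD8018001.
struct Edc {
  uint32_t lut[256];
  Edc() {
    // Build the table: run each byte value through 8 steps of the LFSR.
    for (uint32_t i = 0; i < 256; i++) {
      uint32_t edc = i;
      for (int j = 0; j < 8; j++) edc = (edc >> 1) ^ (edc & 1 ? 0xD8018001 : 0);
      lut[i] = edc;
    }
  }
  // One table lookup and one XOR per byte: that's the whole computation.
  uint32_t compute(const uint8_t* data, size_t size) const {
    uint32_t edc = 0;
    while (size--) edc = (edc >> 8) ^ lut[(edc ^ *data++) & 0xff];
    return edc;
  }
};
```

Verifying a sector means recomputing this over the covered bytes and comparing against the stored value; correcting a sector that fails the check is the genuinely hard part.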
I don't believe C1/C2 is going to be an epic leap in difficulty (famous last words), but the problem is finding someone who can read ECMA-130, etc.
For the Saturn, you'd need that geometric data I talked about earlier to handle the copy protection (the "wobble"), but it's more likely you'd fake that and call it a day.
If it were me, my first idea would be to encode wobble as a separate bit-stream, something like:

repeat {
    <varint> number of bits for which the spiral is steady and within expected parameters
    <onebit> 0 = spiral becoming narrower; 1 = spiral becoming wider
}
The precision of how many physical bits on the disc are represented by the bitstream could be configurable, but in general the <varint> should keep the resulting spiral log files small when they're only used to encode a single copy-protection string. It'd be up to the individual system (e.g. the drive's firmware) how often to sample the drifting to extract data.
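A minimal sketch of an encoder for that proposed stream, under my own assumptions: the run length is a LEB128-style varint, and for simplicity the direction bit is stored in a whole byte here, where a real encoder would pack it into a single bit as the format describes:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the proposed wobble stream: pairs of
// {steady-run length, drift direction}. The run length is a LEB128-style
// varint (7 data bits per byte, high bit = continuation). For simplicity
// this sketch byte-aligns the direction flag; a real encoder would pack
// it into a single bit.
void encodeWobbleEvent(std::vector<uint8_t>& stream, uint64_t steadyBits, bool wider) {
  while (steadyBits >= 0x80) {
    stream.push_back(0x80 | (steadyBits & 0x7f));
    steadyBits >>= 7;
  }
  stream.push_back(steadyBits & 0x7f);
  stream.push_back(wider ? 1 : 0);  // 0 = spiral narrower, 1 = spiral wider
}
```

Because a single copy-protection string only needs a handful of drift events, the whole log stays a few bytes per event regardless of how long the steady stretches are.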