ECMA-130 (Compact Disc) in plain terms

Three house moves are as bad as one fire. While digging through an old box that smelled of acetone and had a thick layer of dust at the bottom (good thing my wife did not see it), I came across compact discs I knew very well. Here is one of my favorite childhood movies... and here is what was once my favorite arcade game...

Curiosity is a strange thing. Here is a CD lying on the table - a data storage format hopelessly outdated in our enlightened 21st century; and still I wonder: how is the data stored there? What does the storage stack look like? How are errors corrected? How much redundancy does the code have?

As a child, all I knew was that there was a laser beam, some kind of head, “that spinning thing” and mysterious pits.
No sooner said than done. Reading through the ECMA-130 standard (there is, by the way, a domestic equivalent: GOST 27667-88), I found a lot of interesting details. For example, I suspected there was redundancy, but I would never have guessed that storing 700 MB of data means “actually” recording 1943 MB (that is, 2.776 times more)...

Schematically, the whole stack can be represented by the picture:

We will walk through the stack from top to bottom,
that is, from the moment the data is handed to the drive down to the recording of the pits themselves.
Note that not the whole disc surface is used for recording, storing and reading information.
The compact disc is divided into areas:

Center hole - that very hole, 15 mm in diameter (± 0.1 mm), by which the disc is mounted.
First transition area - a ring between the diameters of 15 and 26 mm.
Clamping area - as the name implies, this is where the drive grips the disc so that it does not slip during reading and writing (diameters from 26 to 33 mm).
Second transition area - the “second ring”, between the diameters of 33 and 44 mm.
Information area - the part of the CD that actually carries information. It lies between the diameters of 44 mm and 118 mm.
Rim area - the last area, a ring between the diameters of 118 mm and 120 mm.

It is the information area that interests us, so let's look at it in more detail. The information area is divided into the following sub-areas:

Lead-in area (inner buffer zone)
Program area (user data zone)
Lead-out area (outer buffer zone)

All three translations are taken from the GOST, so I did not translate them myself. What does “program” mean here? When we are talking about data, I cannot make sense of it. If someone on Habr can tell me why the user data zone was translated as the “program zone”, I would be extremely grateful!

Phew... That, I hope, is the physics crash course done. Of course, I skipped many points about widths and lengths, the coating method, the beam wavelength and other physical properties. But, first, that is not the purpose of this article; second, I have not worked through more than half of that material myself. And, to be honest, I have no desire to... My curiosity is of a purely “programmer” kind ;)

Information Tracks

The first stage is the split into “information tracks”. Already here there are two recording options: digital data ( Digital Data Tracks, DDT ) and audio data ( Audio Tracks ).
From here on we consider only digital data. Everything that follows holds only for DDT .

Sector

Data is split into 8-bit bytes and grouped into sectors .
Amusingly, the number of sectors on a disc is not defined by the standard. It depends on how much data one “manages” to write to the disc...

This looks a little absurd. Nevertheless, various companies once exploited this quirk to detect counterfeit discs. (Since this is a digression, I describe it in more detail in the spoiler just below, at the end of this chapter.)

Of course, the length of the track is more or less fixed and, in general, QoS can be guaranteed.

There are three ways (modes) to write data into sectors:

Sector Mode (00) - an empty sector whose data field consists of 0x00 bytes.
Sector Mode (01) - uses EDC , P and Q coding, and CIRC coding (more on these below).
Sector Mode (02) - no P and Q coding, only CIRC coding.

Here are the pictures:

Counterfeit Detection Method

The method, like everything ingenious, is simple.

Recording the license key.

First, write all the necessary information to the disc.
Then select 2 sectors, for example sectors 103123 and 120234. Call these sectors A and B.
Select two bytes, one in each sector. For example, the 4th byte of the first sector and the 8th byte of the second.
Then calculate the angle between these bytes on the disc. How? Suppose you have access to a low-level read driver and know the time of one revolution. Measure the time spent reading from A to B; dividing it by the time of one revolution gives the angle, within some error.
The angle, rounded to a chosen precision, is fed into a hash function . A few characters of the hash are kept, for example the last 3.
These three characters are written into the license key.
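The recording steps above can be sketched as follows. This is only an illustration: the article does not name a hash function, so SHA-256 is my assumption, and the function names and the timing values passed in are hypothetical (a real implementation would obtain them from a low-level drive interface).

```python
import hashlib


def angle_between(read_time_s: float, revolution_time_s: float) -> float:
    """Angle in degrees between byte A and byte B.

    read_time_s is the measured time to get from A to B. Whole
    revolutions are discarded: only the fractional turn matters.
    """
    turns = read_time_s / revolution_time_s
    return (turns % 1.0) * 360.0


def key_fragment(angle_deg: float, n_chars: int = 3) -> str:
    """Round the angle to whole degrees, hash it, keep the last n_chars hex digits."""
    rounded = round(angle_deg) % 360
    digest = hashlib.sha256(str(rounded).encode("ascii")).hexdigest()
    return digest[-n_chars:]


# Hypothetical measurement: 2.25 revolutions between A and B.
fragment = key_fragment(angle_between(0.45, 0.2))
```

The fragment would then be embedded somewhere in the license key shipped with the disc.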

Verifying the license key.

The user is asked to enter the license key.
Calculate the angle between the selected bytes of sectors A and B.
Build the list of angles to check. For example, we measured an angle of 33.343°. Suppose rounding is to one degree: we round and get 33°. Suppose the error is ± 2°. The list of angles is then [31°, 32°, 33°, 34°, 35°].
For each angle in the list, calculate the hash and take a few characters from it, for example the last 3.
If at least one hash from the list matches the hash from the license key, we conclude that the disc is licensed. Otherwise the disc is counterfeit.
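A minimal sketch of the verification procedure, under the same assumptions as the description above plus one of my own: SHA-256 stands in for the unnamed hash function. All function names are hypothetical.

```python
import hashlib


def fragment(angle_deg: int, n_chars: int = 3) -> str:
    """Last n_chars hex digits of the hash of a whole-degree angle."""
    digest = hashlib.sha256(str(angle_deg % 360).encode("ascii")).hexdigest()
    return digest[-n_chars:]


def candidate_angles(measured_deg: float, tolerance_deg: int = 2) -> list[int]:
    """Whole-degree angles within +/- tolerance of the rounded measurement."""
    center = round(measured_deg)
    return [(center + d) % 360 for d in range(-tolerance_deg, tolerance_deg + 1)]


def is_licensed(measured_deg: float, key_fragment: str,
                tolerance_deg: int = 2, n_chars: int = 3) -> bool:
    """True if any candidate angle hashes to the fragment stored in the key."""
    return any(fragment(a, n_chars) == key_fragment
               for a in candidate_angles(measured_deg, tolerance_deg))
```

With 3 hexadecimal characters there are only 4096 possible fragments and 5 candidate angles are tried, so a random disc would pass roughly once in 800 attempts; keeping more characters of the hash lowers that chance.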

Error correction (Sector Mode 01 only)

The payload ( User Data ) is 2048 bytes in mode 01, or 2336 bytes in mode 02.
Which mode should you choose? It depends on how much reliability you need.
Sector Mode 01 is more reliable, since it adds the EDC check and P and Q coding.
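To make the byte counts concrete, here is a sketch that cuts a raw (unscrambled) 2352-byte sector into its fields. The offsets follow from the numbers in this article (bytes 12 to 2075 protected by RSPC, check bytes 2076 to 2351); the 12-byte sync pattern and the 172/104 split between P and Q parity are stated from my reading of ECMA-130, and the function and field names are my own.

```python
SECTOR_SIZE = 2352
# 12-byte sync pattern: 0x00, ten times 0xFF, 0x00.
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])


def split_sector(raw: bytes) -> dict:
    """Split an unscrambled raw sector into named fields."""
    if len(raw) != SECTOR_SIZE:
        raise ValueError("a raw sector is 2352 bytes long")
    if raw[:12] != SYNC:
        raise ValueError("sync pattern not found")
    mode = raw[15]
    fields = {"sync": raw[0:12], "address": raw[12:15], "mode": mode}
    if mode == 1:
        fields["user_data"] = raw[16:2064]        # 2048 bytes
        fields["edc"] = raw[2064:2068]            # 4 bytes
        fields["intermediate"] = raw[2068:2076]   # 8 zero bytes
        fields["p_parity"] = raw[2076:2248]       # 172 bytes
        fields["q_parity"] = raw[2248:2352]       # 104 bytes
    elif mode in (0, 2):
        fields["user_data"] = raw[16:2352]        # 2336 bytes (all 0x00 in mode 0)
    else:
        raise ValueError(f"unknown sector mode {mode}")
    return fields
```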

EDC

The Error Detection Code (EDC) , as the name implies, is meant only for detecting errors, not for correcting them.

Its generator polynomial is: P(x) = (x¹⁶ + x¹⁵ + x² + 1) · (x¹⁶ + x² + x + 1)
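Multiplying the two factors over GF(2) gives a single 32-bit CRC polynomial. The sketch below does that multiplication and then uses the result in a plain bitwise CRC. The multiplication follows directly from the formula above; the CRC conventions (least significant bit first, zero initial value, no final XOR) are my reading of the standard and should be checked against it before relying on the result.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less (GF(2)) multiplication of two polynomials given as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


FACTOR_1 = 0x18005  # x^16 + x^15 + x^2 + 1
FACTOR_2 = 0x10007  # x^16 + x^2 + x + 1
GENERATOR = gf2_mul(FACTOR_1, FACTOR_2)  # x^32 + x^31 + x^16 + x^15 + x^4 + x^3 + x + 1


def reflect32(value: int) -> int:
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value:032b}"[::-1], 2)


# Bit-reversed form of the low 32 bits, for an LSB-first CRC.
POLY_REFLECTED = reflect32(GENERATOR & 0xFFFFFFFF)


def edc(data: bytes) -> int:
    """Bitwise LSB-first CRC with zero init and no final XOR (assumed conventions)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc
```

With these conventions, appending the 4 EDC bytes in little-endian order to the data and running `edc` over the whole thing yields zero, which is how a reader can detect a damaged sector.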

Intermediate

The eight bytes of the Intermediate field are filled with zero bytes ( 0x00 ).
I honestly do not know why they were left there... Maybe “in reserve” (IT standards love doing that), ~~or maybe it is a cunning plan for steganographic data transfer.~~

P and Q coding (RSPC)

The Reed-Solomon Product-like Code (RSPC) , also known as P + Q coding, is applied to bytes 12 to 2075 of the sector in mode 01 . I will omit the details; you can read them in Annex A of the ECMA-130 standard .

Bytes 12 through 2075 together with the check bytes 2076 through 2351 make up 2340 bytes of data. These are split into two blocks of 1170 bytes each, the way a gym class counts off: “one, two, one, two!”. That is, into even and odd bytes.

Next come the outer and inner codes. The outer one is called P-coding , the inner one Q-coding .
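The even/odd split is easy to show with slicing. A minimal sketch, with a function name of my own choosing; which of the two blocks the standard treats as the high or low byte of a word is not covered here.

```python
def split_planes(sector: bytes) -> tuple[bytes, bytes]:
    """Split bytes 12..2351 of a sector into its even and odd bytes."""
    if len(sector) != 2352:
        raise ValueError("a sector is 2352 bytes long")
    body = sector[12:2352]   # 2340 bytes: protected data plus check bytes
    even = body[0::2]        # bytes 12, 14, 16, ...
    odd = body[1::2]         # bytes 13, 15, 17, ...
    return even, odd


even, odd = split_planes(bytes(range(256)) * 9 + bytes(48))
```

Each block holds 1032 data bytes (half of bytes 12 to 2075) followed by 138 check bytes (half of bytes 2076 to 2351), which adds up to 1170.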

Picture with P and Q encoding

For better understanding: the picture with Q encoding only

The hardest part of the ECMA-130 stack is behind us. From here on it gets much easier.

Scrambling

On to scrambling. This is what one scrambled sector looks like:

Each such sector is called a Scrambled Sector .

So what is “scrambling” and why is it needed?

The point of scrambling was explained briefly and clearly by snapdragon in a comment on one of my posts:

A scrambler is needed to make the signal spectrum uniform. Otherwise, with homogeneous data (for example, long runs of ones or zeros), the signal energy would be concentrated in a narrow band.
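As an illustration, here is a sketch of a scrambler of the kind described in Annex B of ECMA-130, as I understand it: a 15-bit shift register with feedback polynomial x¹⁵ + x + 1, preset to 1, whose output is XORed with bytes 12 to 2351 (the 12 sync bytes stay untouched). Treat the register details as an assumption to verify against the standard; the function names are mine.

```python
def scramble_table(length: int = 2340) -> bytes:
    """Pseudo-random byte sequence from a 15-bit LFSR (x^15 + x + 1, preset to 1)."""
    reg = 0x0001
    out = bytearray()
    for _ in range(length):
        byte = 0
        for bit in range(8):
            byte |= (reg & 1) << bit              # output bits LSB first
            feedback = (reg ^ (reg >> 1)) & 1     # taps at bit 0 and bit 1
            reg = (reg >> 1) | (feedback << 14)
        out.append(byte)
    return bytes(out)


_TABLE = scramble_table()


def scramble(sector: bytes) -> bytes:
    """XOR bytes 12..2351 with the LFSR sequence; the sync bytes are left as they are."""
    if len(sector) != 2352:
        raise ValueError("a sector is 2352 bytes long")
    body = bytes(b ^ k for b, k in zip(sector[12:], _TABLE))
    return sector[:12] + body
```

Because the operation is a plain XOR, applying `scramble` twice returns the original sector, so the same routine serves for descrambling. Note how a sector full of zeros stops being a run of zeros after scrambling.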

F₁, F₂ and F₃ frames

Each Scrambled Sector is split into frames of 24 bytes each.
These frames are called F₁ frames .
Each Scrambled Sector consists of 2352 bytes.
Accordingly, each sector is divided into 2352 / 24 = 98 frames.
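The split into frames amounts to chunking. A minimal sketch, with a hypothetical function name; any byte reordering the standard prescribes inside a frame is deliberately ignored here.

```python
def f1_frames(scrambled_sector: bytes, frame_size: int = 24) -> list[bytes]:
    """Cut a 2352-byte Scrambled Sector into consecutive 24-byte frames."""
    if len(scrambled_sector) % frame_size != 0:
        raise ValueError("sector length must be a multiple of the frame size")
    return [scrambled_sector[i:i + frame_size]
            for i in range(0, len(scrambled_sector), frame_size)]


frames = f1_frames(bytes(2352))
```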

CIRC coding (F₂ frame)

Cross Interleaved Reed-Solomon Code (CIRC) coding is applied to every F₁ frame .
This error-correcting code takes a 24-byte input word and produces a 32-byte output word.
Unlike EDC and RSPC coding, CIRC coding is applied in all Sector Modes .
The resulting 32-byte sequence is called an F₂ frame .

Control byte

One control byte is added to the beginning of each F₂ frame , which gives an F₃ frame 33 bytes long (32 + 1 = 33).

8-to-14 encoding

At this stage, each byte of data (8 bits) is converted into 14 bits. The conversion is done with a lookup table.

I will not give the whole table, you can find it in Annex D of the ECMA-130 standard .

...       ...
00010000  10000000100000
00010001  10000010000000
00010010  10010010000000
00010011  00100000100000
00010100  01000010000000
00010101  00000010000000
00010110  00010010000000
...       ...
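The lookup itself is a dictionary access. The sketch below holds only the seven rows quoted above (the full 256-entry table is in Annex D) and adds a small helper that measures how many zeros separate neighbouring ones in a code word. The names are my own.

```python
# Excerpt only: the seven rows quoted in the article, not the full table.
EFM_EXCERPT = {
    0b00010000: "10000000100000",
    0b00010001: "10000010000000",
    0b00010010: "10010010000000",
    0b00010011: "00100000100000",
    0b00010100: "01000010000000",
    0b00010101: "00000010000000",
    0b00010110: "00010010000000",
}


def efm_encode(data: bytes) -> list[str]:
    """Map each byte to its 14-bit code word (raises KeyError outside the excerpt)."""
    return [EFM_EXCERPT[b] for b in data]


def zero_runs_between_ones(word: str) -> list[int]:
    """Lengths of the zero runs between consecutive ones in a code word."""
    ones = [i for i, c in enumerate(word) if c == "1"]
    return [b - a - 1 for a, b in zip(ones, ones[1:])]


runs = {word: zero_runs_between_ones(word) for word in EFM_EXCERPT.values()}
```

In every row of the excerpt, two ones are never closer than two zeros apart, which is the kind of regularity the hypotheses in this section try to explain.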

Why 8-to-14 coding is needed is not explained in the standard. (A standard does not have to answer the question WHY; a standard has to answer the question HOW)...

I have a hypothesis. The real world is not as “perfect” as programmers see it. For example, a drawn point is a small “blob”, and a drawn line always has an area; otherwise our eyes would not see the point or the line... For this reason, I will venture a few assumptions. I stress that I have never worked professionally in CD manufacturing. These are just assumptions. (Discussion in the comments is very welcome!)

Hypotheses.

A pit is not burned “perfectly” onto the disc surface, so some free space is needed next to it, because the burned pit may “spill over” into that space.
Most likely there are also issues with synchronizing the head itself: too many consecutive zeros are bad.
Perhaps a large number of ones puts extra “load” on the read head, so reducing them would noticeably extend the life of the CD drive. On average, 4 of the 8 bits of a data byte are ones. In the excerpt above, a 14-bit code word contains only one to three ones, so ones are much sparser.

Counting redundancy

Let's see how redundant the CD format is:

Depending on the Sector Mode:
- Sector Mode 01 (with PQ coding) - a block of 2048 bytes goes in, 2352 bytes come out. The redundancy is therefore 2352 / 2048 = 1.148
- Sector Mode 02 (without PQ coding) - 2352 / 2336 = 1.007
Scrambling - a trifle, but we count it for completeness: (12 + 2340) / 2340 = 1.005
F₁-F₂-F₃ frames - 33 / 24 = 1.375
8-to-14 encoding - 14 / 8 = 1.750

Multiplying everything, we get: 1.148 ⋅ 1.005 ⋅ 1.375 ⋅ 1.750 = 2.776 . So the disc ends up holding 2.776 times more information than the “useful information” itself.
For example, if the amount of “useful information” is 700 MB, 1943 MB of data is actually recorded on the disc.

For Sector Mode 02, PQ coding is not used. For this mode, the redundancy is: 1.007 ⋅ 1.005 ⋅ 1.375 ⋅ 1.750 = 2.435 .
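The arithmetic can be reproduced in a few lines. This sketch takes the four factors exactly as listed above, rounds each one to three decimals the way the article does, and multiplies them; the function names are my own.

```python
def ratio(out_size: int, in_size: int) -> float:
    """Expansion factor of one stage, rounded to three decimals as in the article."""
    return round(out_size / in_size, 3)


def redundancy(user_bytes: int) -> float:
    """Overall factor for a sector carrying user_bytes of payload (2048 or 2336)."""
    factors = [
        ratio(2352, user_bytes),   # sector mode overhead
        ratio(12 + 2340, 2340),    # scrambling stage, as counted in the article
        ratio(33, 24),             # F1 -> F2 -> F3 frames
        ratio(14, 8),              # 8-to-14 encoding
    ]
    product = 1.0
    for factor in factors:
        product *= factor
    return round(product, 3)


mode1 = redundancy(2048)
mode2 = redundancy(2336)
recorded_mb = round(700 * mode1)
```

If the intermediate rounding is skipped, the Mode 01 product comes out marginally higher (about 2.778), so the last digit of these figures should not be taken too literally.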

Bonus: SCSI Multimedia Commands

There is a standard called SCSI Multimedia Commands . It describes commands for raw data access. The READ CD and WRITE CD commands let you work with all 2,352 bytes of a sector. However, I did not find any commands for reading the F-frames... In principle, if you are writing redundant information for which partial losses are not critical (for example, video or telemetry), you could do without the F₁-F₂-F₃ frames and increase the “payload” 1.375 times.
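For reference, here is a sketch of how the 12-byte command descriptor block for READ CD could be assembled so that it requests the full 2,352 bytes of each sector. The field layout (opcode 0xBE, big-endian start address and length, flag byte 0xF8 selecting sync, headers, user data and EDC/ECC) is my understanding of the MMC specification and should be verified against it; the function name is hypothetical, and nothing here talks to a real drive.

```python
import struct


def read_cd_cdb(lba: int, sector_count: int = 1) -> bytes:
    """Build a READ CD command block asking for raw 2352-byte sectors."""
    if not 0 <= lba < 2 ** 32:
        raise ValueError("LBA must fit in 32 bits")
    if not 0 < sector_count < 2 ** 24:
        raise ValueError("sector count must fit in 24 bits")
    cdb = bytearray(12)
    cdb[0] = 0xBE                               # READ CD opcode
    cdb[1] = 0x00                               # expected sector type: any
    cdb[2:6] = struct.pack(">I", lba)           # starting logical block address
    cdb[6:9] = sector_count.to_bytes(3, "big")  # transfer length in sectors
    cdb[9] = 0xF8                               # sync + headers + user data + EDC/ECC
    cdb[10] = 0x00                              # no sub-channel data
    cdb[11] = 0x00                              # control byte
    return bytes(cdb)


cdb = read_cd_cdb(103123, 2)
```

Sending such a block to a device requires an OS-specific pass-through interface, which is outside the scope of this sketch.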

There are also a number of unused fields on a CD (for example, the same Intermediate field), which could be put to use too. For example, for steganography.

Unfortunately, I did not find any open-source code that implements these features...
If there are specialists in this area on Habr, I will be glad to get a pointer (and you will get karma from me).

Source: https://habr.com/ru/post/269311/
