We weren't sure about it a few years ago, but by now it should be clear to everyone that CD-ROM's are here to stay. Most PC's are equipped with CD-ROM readers, and most major PC software packages are being distributed on CD-ROM's.
Under DOS (and Windows, which uses the DOS file system) files are written to both hard and floppy disks with a so-called FAT (File Allocation Table) file system.
Files on a CD-ROM, however, are written to a different standard, called ISO9660. ISO9660 is rather complex and poorly written, and obviously contains a number of diplomatic compromises among advocates of DOS, UNIX, MVS and perhaps other operating systems.
The simplified version presented here includes only features that would normally be found on a CD-ROM to be used in a DOS system and which are supported by the Microsoft MS-DOS CD-ROM Extensions (MSCDEX). It is based on ISO9660, on certain documents regarding MSCDEX (version 2.10), and on the contents of some actual CD-ROM's.
Where a field has a specific value on a CD-ROM to be used with DOS, that value is given in this document. However, in some cases a brief description of values for use with other operating systems is given in square brackets.
ISO9660 makes provisions for sets of CD-ROM's, and apparently even permits a file system to span more than one CD-ROM. However, this feature is not supported by MSCDEX.
The directory structure on a CD-ROM is almost exactly like that on a DOS floppy or hard disk. (It is presumed that the reader of this document is reasonably familiar with the DOS file system.) For this reason, DOS and Windows applications can read files from a CD-ROM just as they would from a floppy or hard disk.
There are only a few differences, which do not affect most applications:
Of course, neither DOS, nor UNIX, nor any other operating system can WRITE files to a CD-ROM as it would to a floppy or hard disk, because a CD-ROM is not rewritable. Files must be written to the CD-ROM by a special program with special hardware.
The information on a CD-ROM is divided into sectors, which are numbered consecutively, starting with zero. There are no gaps in the numbering.
Each sector contains 2048 8-bit bytes. (ISO9660 apparently permits other sector sizes, but the 2048-byte size seems to be universal.)
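Because sectors are numbered from zero without gaps and all have the same size, finding a sector in a raw disc image is a single multiplication. The following Python sketch assumes the CD-ROM has been copied to an ordinary image file holding only the 2048 data bytes of each sector; the names sector_offset and read_sectors are made up for this illustration.

```python
SECTOR_SIZE = 2048  # bytes per sector, the size this document treats as universal


def sector_offset(sector_number: int) -> int:
    """Return the byte offset of a sector within a raw 2048-byte-per-sector image."""
    if sector_number < 0:
        raise ValueError("sector numbers start at zero")
    return sector_number * SECTOR_SIZE


def read_sectors(image, first_sector: int, count: int = 1) -> bytes:
    """Read `count` consecutive sectors from a seekable binary file object."""
    image.seek(sector_offset(first_sector))
    data = image.read(count * SECTOR_SIZE)
    if len(data) != count * SECTOR_SIZE:
        raise EOFError("image ends before the requested sectors")
    return data
```

With a file opened in binary mode, read_sectors(image, 16) would return the sector that holds the first volume descriptor.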
When a number of sectors are to be read from the CD-ROM, they should be read in order of increasing sector number, if possible, since that is the order in which they pass under the read head as the CD-ROM rotates. Most implementations arrange the information so sectors will be read in this order for typical file operations, although ISO9660 does not require this in all cases.
The order of bytes within a sector is considered to be the order in which they appear when read into memory; i.e., the "first" bytes are read into the lowest memory addresses. This is also the order used in this document; i.e., the "first" bytes in any list appear at the top of the list.
Names and extensions of files and directories, the volume name, and some other names are expressed in standard ASCII character codes (although ISO9660 does not use the name ASCII). According to ISO9660, only capital letters, digits, and underscores are permitted. However, DOS permits some other punctuation marks, which are sometimes found on CD-ROM's, in apparent defiance of ISO9660.
MSCDEX does offer support for the kanji (Japanese) character set. However, this document does not cover kanji.
Where ISO9660 requires file or directory names or extensions to be sorted, the usual ASCII collating sequence is used. That is, two different names or extensions are compared as follows:
A 16-bit numeric value (usually called a word) may be represented on a CD-ROM in any of three ways:
A 32-bit numeric value (usually called a double word) may be represented on a CD-ROM in any of three ways:
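The field descriptions in this document name the three forms as little endian, big endian and both endian. The Python sketch below decodes all of them. It assumes that a both endian value is the little endian form followed immediately by the big endian form of the same number, which matches the field lengths given in the tables (four bytes for a both endian word, eight for a both endian double word). The function names are invented for this example.

```python
def little_endian(data: bytes, size: int) -> int:
    """Decode `size` bytes stored least significant byte first."""
    return int.from_bytes(data[:size], "little")


def big_endian(data: bytes, size: int) -> int:
    """Decode `size` bytes stored most significant byte first."""
    return int.from_bytes(data[:size], "big")


def both_endian(data: bytes, size: int) -> int:
    """Decode a both endian field of 2 * size bytes.

    Assumption: the little endian copy comes first and the big endian copy
    second. The two copies are cross-checked against each other.
    """
    little = little_endian(data, size)
    big = big_endian(data[size:], size)
    if little != big:
        raise ValueError("the two halves of a both endian field disagree")
    return little


def encode_both_endian(value: int, size: int) -> bytes:
    """Encode a value as a both endian field (useful for building test data)."""
    return value.to_bytes(size, "little") + value.to_bytes(size, "big")


# A word uses size=2 and a double word uses size=4.
sector_size_field = encode_both_endian(2048, 2)  # bytes 0, 8, 8, 0
```

A reader that only cares about one byte order could simply pick the matching half of a both endian field; the cross-check is an optional sanity test.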
The first sixteen sectors (sector numbers 0 to 15, inclusive) contain nothing but zeros. ISO9660 does not define the contents of these sectors, but for DOS they are apparently always written as zeros. They are apparently reserved for use by systems that can be booted from a CD-ROM.
For example, PlayStation disks have the PSX logo TMD and the license information in these sectors.
Sector 16 and a few of the following sectors contain a series of volume descriptors. There are several kinds of volume descriptor, but only two are normally used with DOS. Each volume descriptor occupies exactly one sector.
The last volume descriptors in the series are one or more Volume Descriptor Set Terminators. The first seven bytes of a Volume Descriptor Set Terminator are 255, 67, 68, 48, 48, 49 and 1, respectively. The other 2041 bytes are zeros. (The middle bytes are the ASCII codes for the characters CD001.)
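The series of volume descriptors can be walked with a short loop: start at sector 16, check the CD001 signature, and stop at the first terminator. This is only a sketch, assuming a raw image file with 2048-byte sectors; the generator name volume_descriptors is hypothetical.

```python
SECTOR_SIZE = 2048
FIRST_DESCRIPTOR_SECTOR = 16
SIGNATURE = b"CD001\x01"   # bytes 67, 68, 48, 48, 49 and 1
TERMINATOR_TYPE = 255      # first byte of a Volume Descriptor Set Terminator


def volume_descriptors(image):
    """Yield (sector_number, type_byte, raw_sector) for each volume descriptor.

    `image` is a seekable binary file object. Iteration stops after the
    first Volume Descriptor Set Terminator has been yielded.
    """
    sector = FIRST_DESCRIPTOR_SECTOR
    while True:
        image.seek(sector * SECTOR_SIZE)
        raw = image.read(SECTOR_SIZE)
        if len(raw) != SECTOR_SIZE or raw[1:7] != SIGNATURE:
            raise ValueError(f"sector {sector} is not a volume descriptor")
        yield sector, raw[0], raw
        if raw[0] == TERMINATOR_TYPE:
            return
        sector += 1
```

A descriptor whose first byte is 1 is a Primary Volume Descriptor; other values between the two can be skipped by a DOS-oriented reader.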
The only volume descriptor of real interest under DOS is the Primary Volume Descriptor. There must be at least one, and there is usually only one. However, some CD-ROM's have two or more identical Primary Volume Descriptors. The contents of a Primary Volume Descriptor are as follows:
   length
  in bytes  contents
  --------  ---------------------------------------------------------
      1     1
      6     67, 68, 48, 48, 49 and 1, respectively (same as Volume
              Descriptor Set Terminator)
      1     0
     32     system identifier
     32     volume identifier
      8     zeros
      8     total number of sectors, as a both endian double word
     32     zeros
      4     1, as a both endian word [volume set size]
      4     1, as a both endian word [volume sequence number]
      4     2048 (the sector size), as a both endian word
      8     path table length in bytes, as a both endian double word
      4     number of first sector in first little endian path table,
              as a little endian double word
      4     number of first sector in second little endian path table,
              as a little endian double word, or zero if there is no
              second little endian path table
      4     number of first sector in first big endian path table,
              as a big endian double word
      4     number of first sector in second big endian path table,
              as a big endian double word, or zero if there is no
              second big endian path table
     34     root directory record, as described below
    128     volume set identifier
    128     publisher identifier
    128     data preparer identifier
    128     application identifier
     37     copyright file identifier
     37     abstract file identifier
     37     bibliographical file identifier
     17     date and time of volume creation
     17     date and time of most recent modification
     17     date and time when volume expires
     17     date and time when volume is effective
      1     1
      1     0
    512     reserved for application use (usually zeros)
    653     zeros
The first 11 characters of the volume identifier are returned as the volume identifier by standard DOS system calls and utilities.
Other identifiers are not used by DOS, and may be filled with ASCII blanks (32).
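Adding up the field lengths in the table gives fixed byte offsets within the sector (for example, the volume identifier starts at byte 40 and the root directory record at byte 156). The Python sketch below pulls out the fields a DOS-oriented reader is most likely to need. The function name and dictionary keys are invented for the example, and it reads only the little endian half of each both endian field.

```python
def parse_primary_volume_descriptor(raw: bytes) -> dict:
    """Extract selected fields from a 2048-byte Primary Volume Descriptor."""
    if len(raw) != 2048 or raw[0] != 1 or raw[1:7] != b"CD001\x01":
        raise ValueError("not a Primary Volume Descriptor")

    def text(start: int, length: int) -> str:
        # Identifiers are ASCII, padded on the right with blanks (32).
        field = raw[start:start + length]
        return field.decode("ascii", "replace").rstrip(" \x00")

    def le(start: int, length: int) -> int:
        return int.from_bytes(raw[start:start + length], "little")

    return {
        "system_identifier": text(8, 32),
        "volume_identifier": text(40, 32),
        "dos_volume_label": text(40, 11),   # first 11 characters only
        "total_sectors": le(80, 4),         # little endian half of 8 bytes
        "sector_size": le(128, 2),          # little endian half of 4 bytes
        "path_table_length": le(132, 4),
        "little_endian_path_table_sector": le(140, 4),
        "big_endian_path_table_sector": int.from_bytes(raw[148:152], "big"),
        "root_directory_record": raw[156:190],  # 34 bytes, parsed separately
    }
```

The 34-byte root directory record is returned unparsed, since it has the same layout as any other directory record.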
Each date and time field is of the following form:
   length
  in bytes  contents
  --------  ---------------------------------------------------------
      4     year, as four ASCII digits
      2     month, as two ASCII digits, where
              01=January, 02=February, etc.
      2     day of month, as two ASCII digits, in the range
              from 01 to 31
      2     hour, as two ASCII digits, in the range from 00 to 23
      2     minute, as two ASCII digits, in the range from 00 to 59
      2     second, as two ASCII digits, in the range from 00 to 59
      2     hundredths of a second, as two ASCII digits, in the range
              from 00 to 99
      1     offset from Greenwich Mean Time, in 15-minute intervals,
              as a twos complement signed number, positive for time
              zones east of Greenwich, and negative for time zones
              west of Greenwich
If the date and time are not specified, the first 16 bytes are all ASCII zeros (48), and the last byte is zero.
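As an illustration, the 17-byte field can be converted to a Python datetime as sketched below. The function name is hypothetical, and returning None for an unspecified date is a choice made for this example, not something the format prescribes.

```python
from datetime import datetime, timedelta, timezone


def parse_volume_datetime(field: bytes):
    """Convert a 17-byte volume descriptor date and time to a datetime.

    Returns None when the field is marked as not specified
    (sixteen ASCII zeros followed by a zero byte).
    """
    if len(field) != 17:
        raise ValueError("a volume descriptor date and time is 17 bytes long")
    if field[:16] == b"0" * 16 and field[16] == 0:
        return None
    digits = field[:16].decode("ascii")
    # The last byte is a signed count of 15-minute intervals from GMT.
    quarter_hours = int.from_bytes(field[16:17], "big", signed=True)
    zone = timezone(timedelta(minutes=15 * quarter_hours))
    return datetime(
        int(digits[0:4]),             # year
        int(digits[4:6]),             # month
        int(digits[6:8]),             # day of month
        int(digits[8:10]),            # hour
        int(digits[10:12]),           # minute
        int(digits[12:14]),           # second
        int(digits[14:16]) * 10_000,  # hundredths of a second as microseconds
        tzinfo=zone,
    )
```

A field with invalid digits (for example month 00 on an otherwise filled-in date) makes the datetime constructor raise ValueError, which a caller may prefer to catch and treat as unspecified.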
Other kinds of Volume Descriptors (which are normally ignored by DOS) have the following format:
   length
  in bytes  contents
  --------  ---------------------------------------------------------
      1     neither 1 nor 255
      6     67, 68, 48, 48, 49 and 1, respectively (same as Volume
              Descriptor Set Terminator)
   2041     other things
The path tables normally come right after the volume descriptors. However, ISO9660 merely requires that each path table begin in the sector specified by the Primary Volume Descriptor.
The path tables are actually redundant, since all of the information contained in them is also stored elsewhere on the CD-ROM. However, their use can make directory searches much faster.
There are two kinds of path table -- a little endian path table, in which multiple-byte values are stored in little endian order, and a big endian path table, in which multiple-byte values are stored in big endian order. The two kinds of path tables are identical in every other way.
A path table contains one record for each directory on the CD-ROM (including the root directory). The format of a record is as follows:
   length
  in bytes  contents
  --------  ---------------------------------------------------------
      1     N, the name length (or 1 for the root directory)
      1     0 [number of sectors in extended attribute record]
      4     number of the first sector in the directory, as a
              double word
      2     number of record for parent directory (or 1 for the root
              directory), as a word; the first record is number 1,
              the second record is number 2, etc.
      N     name (or 0 for the root directory)
    0 or 1  padding byte: if N is odd, this field contains a zero; if
              N is even, this field is omitted
According to ISO9660, a directory name consists of at least one and not more than 31 capital letters, digits and underscores. For DOS the upper limit is eight characters.
A path table occupies as many consecutive sectors as may be required to hold all its records. The first record always begins in the first byte of the first sector. Except for the single byte described above, no padding is used between records; hence the last record in a sector is usually continued in the next following sector. The unused part of the last sector is filled with zeros.
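Since records are packed back to back across sector boundaries, a path table is easiest to parse after all of its sectors have been read into one buffer. The following Python sketch takes that approach; parse_path_table and its dictionary keys are names invented here. It stops at the first zero name length, on the assumption that this marks the zero fill after the last record.

```python
def parse_path_table(table: bytes, byteorder: str = "little") -> list:
    """Parse the records of a path table held in one contiguous buffer.

    `byteorder` is "little" for a little endian path table and "big"
    for a big endian one; the layout is otherwise the same.
    """
    records = []
    pos = 0
    while pos < len(table) and table[pos] != 0:
        name_length = table[pos]
        first_sector = int.from_bytes(table[pos + 2:pos + 6], byteorder)
        parent = int.from_bytes(table[pos + 6:pos + 8], byteorder)
        name = table[pos + 8:pos + 8 + name_length]
        records.append({
            "number": len(records) + 1,   # records are numbered from 1
            "first_sector": first_sector,
            "parent": parent,
            # The root directory's name is a single zero byte.
            "name": "" if name == b"\x00" else name.decode("ascii", "replace"),
        })
        # 8 fixed bytes, the name, and one padding byte when N is odd.
        pos += 8 + name_length + (name_length % 2)
    return records
```

Slicing the buffer to the path table length given in the Primary Volume Descriptor before calling the function would make the zero-length stop condition unnecessary.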
The records in a path table are arranged in a precisely specified order. For this purpose, each directory has an associated number called its level. The level of the root directory is 1. The level of each other directory is one greater than the level of its parent. As noted above, ISO9660 does not permit levels greater than 8.
The relative positions of any two records are determined as follows:
A directory consists of a series of directory records in one or more consecutive sectors. However, unlike path records, directory records may not straddle sector boundaries. There may be unused space at the end of each sector, which is filled with zeros.
Each directory record represents a file or directory. Its format is as follows:
   length
  in bytes  contents
  --------  ---------------------------------------------------------
      1     R, the number of bytes in the record (which must be even)
      1     0 [number of sectors in extended attribute record]
      8     number of the first sector of file data or directory
              (zero for an empty file), as a both endian double word
      8     number of bytes of file data or length of directory,
              excluding the extended attribute record,
              as a both endian double word
      1     number of years since 1900
      1     month, where 1=January, 2=February, etc.
      1     day of month, in the range from 1 to 31
      1     hour, in the range from 0 to 23
      1     minute, in the range from 0 to 59
      1     second, in the range from 0 to 59
              (for DOS this is always an even number)
      1     offset from Greenwich Mean Time, in 15-minute intervals,
              as a twos complement signed number, positive for time
              zones east of Greenwich, and negative for time zones
              west of Greenwich (DOS ignores this field)
      1     flags, with bits as follows:
              bit     value
              ------  ------------------------------------------
              0 (LS)  0 for a normal file, 1 for a hidden file
              1       0 for a file, 1 for a directory
              2       0 [1 for an associated file]
              3       0 [1 for record format specified]
              4       0 [1 for permissions specified]
              5       0
              6       0
              7 (MS)  0 [1 if not the final record for the file]
      1     0 [file unit size for an interleaved file]
      1     0 [interleave gap size for an interleaved file]
      4     1, as a both endian word [volume sequence number]
      1     N, the identifier length
      N     identifier
      P     padding byte: if N is even, P = 1 and this field contains
              a zero; if N is odd, P = 0 and this field is omitted
  R-33-N-P  unspecified field for system use; must contain an even
              number of bytes
The length of a directory includes the unused space, if any, at the ends of sectors. Hence it is always an exact multiple of 2048 (the sector size). Since every directory, even a nominally empty one, contains at least two records, the length of a directory is never zero.
All fields in the first record (sometimes called the "." record) refer to the directory itself, except that the identifier length is 1, and the identifier is zero. The root directory record in the Primary Volume Descriptor also has this format.
All fields in the second record (sometimes called the ".." record) refer to the parent directory, except that the identifier length is 1, and the identifier is 1. The second record in the root directory refers to the root directory.
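The directory record layout and the two special records can be sketched in Python as follows. The function names and dictionary keys are made up for this example; it reads the little endian half of each both endian field and assumes that a zero record length marks the unused space at the end of a sector.

```python
SECTOR_SIZE = 2048


def parse_directory_record(buf: bytes, pos: int = 0):
    """Parse the directory record starting at buf[pos], or return None
    if that position holds the zero fill at the end of a sector."""
    record_length = buf[pos]
    if record_length == 0:
        return None
    rec = buf[pos:pos + record_length]
    identifier = rec[33:33 + rec[32]]
    if identifier == b"\x00":
        name = "."       # first record: the directory itself
    elif identifier == b"\x01":
        name = ".."      # second record: the parent directory
    else:
        name = identifier.decode("ascii", "replace")
    flags = rec[25]
    return {
        "record_length": record_length,
        "first_sector": int.from_bytes(rec[2:6], "little"),
        "data_length": int.from_bytes(rec[10:14], "little"),
        # (year, month, day, hour, minute, second)
        "modified": (1900 + rec[18], rec[19], rec[20], rec[21], rec[22], rec[23]),
        "hidden": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "name": name,
    }


def parse_directory(data: bytes) -> list:
    """Parse every record in a directory whose sectors are held in `data`."""
    entries = []
    for base in range(0, len(data), SECTOR_SIZE):
        pos = base
        end = min(base + SECTOR_SIZE, len(data))
        while pos < end:
            entry = parse_directory_record(data, pos)
            if entry is None:
                break    # the rest of this sector is unused
            entries.append(entry)
            pos += entry["record_length"]
    return entries
```

Because records never straddle sector boundaries, parse_directory restarts at the beginning of each sector instead of scanning through the zero fill.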
The identifier for a subdirectory is its name. The identifier for a file consists of the following fields, in the order given:
Some implementations for DOS omit (4) and (5), and some use punctuation marks other than underscores in file names and extensions.
Directory records other than the first two are sorted as follows:
[ISO9660 permits names containing more than eight characters and extensions containing more than three characters, as long as both of them together contain no more than 30 characters.]
It is apparently permissible under ISO9660 to use two or more consecutive records to represent consecutive pieces of the same file. Bit 7 of the flags byte is set in every record except the last one. However, this technique seems pointless and is apparently not used. It is not supported by MSCDEX.
Interleaving is another technique that is apparently seldom used. It is not supported by MSCDEX (version 2.10).
ISO9660 does not specify the order of directory or file sectors. It merely requires that the first sector of each directory or file be in the location specified by its directory record, and that the sectors for directories and non-interleaved files be consecutive.
However, most implementations arrange the directories so each directory follows its parent, and the data sectors for the files in each directory lie immediately after the directory and immediately before the next following directory. This appears to be an efficient arrangement for most applications.
Some implementations go one step further and order the directories in the same manner as the corresponding path table records.
Extended attribute records contain file and directory information used by operating systems other than DOS, such as permissions and logical record lengths.
A CD-ROM written for DOS normally does not contain any extended attribute records.
When reading a CD-ROM containing extended attribute records, early versions of MSCDEX simply returned incorrect results. Later versions learned to skip over extended attribute records.
Philip J. Erdelsky
San Diego, California USA
pje@acm.org
http://www.alumni.caltech.edu/~pje/