195 lines
5.6 KiB
Text
195 lines
5.6 KiB
Text
|
.SH
|
||
|
Block Devices
|
||
|
.PP
|
||
|
The block device I/O system is like a
|
||
|
protocol stack of filters.
|
||
|
There are a set of pseudo-devices that call
|
||
|
recursively to other pseudo-devices and real devices.
|
||
|
The protocol stack is compiled from a configuration
|
||
|
string that specifies the order of pseudo-devices and devices.
|
||
|
Each pseudo-device and device has a set of entry points
|
||
|
that corresponds to the operations that the file system
|
||
|
requires of a device.
|
||
|
The most notable operations are
|
||
|
.CW read ,
|
||
|
.CW write ,
|
||
|
and
|
||
|
.CW size .
|
||
|
.PP
|
||
|
The device stack can best be described by
|
||
|
describing the syntax of the configuration string
|
||
|
that specifies the stack.
|
||
|
Configuration strings are used
|
||
|
during the setup of the file system.
|
||
|
For a description see
|
||
|
.I fsconfig (8).
|
||
|
In the following recursive definition,
|
||
|
.I D
|
||
|
represents a
|
||
|
string that specifies a block device.
|
||
|
.IP "\fID\fP = (\fIDD\fP...)"
|
||
|
.br
|
||
|
This is a set of devices that
|
||
|
are concatenated to form a single device.
|
||
|
The size of the catenated device is the
|
||
|
sum of the sizes of each sub-device.
|
||
|
.IP "\fID\fP = [\fIDD\fP...]"
|
||
|
.br
|
||
|
This is the interleaving of the
|
||
|
individual devices.
|
||
|
If there are N devices in the list,
|
||
|
then the pseudo-device is the N-way block
|
||
|
interleaving of the sub-devices.
|
||
|
The size of the interleaved device is
|
||
|
N times the size of the smallest sub-device.
|
||
|
.IP "\fID\fP = {\fIDD\fP...}"
|
||
|
.br
|
||
|
This is a set of devices that
|
||
|
constitute a `mirror' of the first sub-device, and form a single device.
|
||
|
A write to the device is performed,
|
||
|
at the same block address,
|
||
|
on the sub-devices, in right-to-left order.
|
||
|
A read from the device is performed on each sub-device,
|
||
|
in left-to-right order, until a read succeeds without error,
|
||
|
or the set is exhausted.
|
||
|
One can think of this as a poor man's RAID 1.
|
||
|
The size of the device is the size of the smallest sub-device.
|
||
|
.IP "\fID\fP = \f(CWp\fP\fIDN1.N2\fP"
|
||
|
.br
|
||
|
This is a partition of a sub-device.
|
||
|
The sub-device is partitioned into 100 equal pieces.
|
||
|
If the size of the sub-device is not divisible by 100,
|
||
|
then there will be some slop thrown away at the top.
|
||
|
The pseudo-device starts at the N1-th piece and
|
||
|
continues for N2 pieces. Thus
|
||
|
.CW p\fID\fP67.33
|
||
|
will be the
|
||
|
last third of the device
|
||
|
.I D .
|
||
|
.IP "\fID\fP = \f(CWf\fP\fID\fP"
|
||
|
.br
|
||
|
This is a fake write-once-read-many device simulated by a
|
||
|
second read-write device.
|
||
|
This second device is partitioned
|
||
|
into a set of block flags and a set of blocks.
|
||
|
The flags are used to generate errors if a
|
||
|
block is ever written twice or read without being written first.
|
||
|
.IP "\fID\fP = \f(CWx\fP\fID\fP"
|
||
|
.br
|
||
|
This is a byte-swapped version of the file system on D.
|
||
|
Since the file server currently writes integers in metadata to disk
|
||
|
in native byte order, moving a file system to a machine of the other
|
||
|
major byte order (e.g., MIPS to Pentium)
|
||
|
requires the use of
|
||
|
.CW x .
|
||
|
It knows the sizes of the various integer fields in the file system metadata.
|
||
|
Ideally, the file server would follow the Plan 9 religion and write a consistent
|
||
|
byte order on disk, regardless of processor.
|
||
|
In the mean time, it should be possible to automatically determine the need
|
||
|
for byte-swapping by examining data in the super-block of each file system,
|
||
|
though this has not been implemented yet.
|
||
|
.IP "\fID\fP = \f(CWc\fP\fIDD\fP"
|
||
|
.br
|
||
|
This is the cache/WORM device made up of a cache (read-write)
|
||
|
device and a WORM (write-once-read-many) device.
|
||
|
More on this later.
|
||
|
.IP "\fID\fP = \f(CWo\fP"
|
||
|
.br
|
||
|
This is the dump file system that is the
|
||
|
two-level hierarchy of all dumps ever taken on a cache/WORM.
|
||
|
The read-only root of the cache/WORM file system
|
||
|
(on the dump taken Feb 18, 1995) can
|
||
|
be referenced as
|
||
|
.CW /1995/0218
|
||
|
in this pseudo device.
|
||
|
The second dump taken that day will be
|
||
|
.CW /1995/02181 .
|
||
|
.IP "\fID\fP = \f(CWw\fP\fIN1.N2.N3\fP"
|
||
|
.br
|
||
|
This is a SCSI disk on controller N1, target N2 and logical unit number N3.
|
||
|
.IP "\fID\fP = \f(CWh\fP\fIN1.N2.0\fP"
|
||
|
.br
|
||
|
This is an (E)IDE or *ATA disk on controller N1, target N2
|
||
|
(target 0 is the IDE master, 1 the slave device).
|
||
|
These disks are currently run via programmed I/O, not DMA,
|
||
|
so they tend to be slower to access than SCSI disks.
|
||
|
.IP "\fID\fP = \f(CWr\fP\fIN1\fP"
|
||
|
.br
|
||
|
This is the same as
|
||
|
.CW w ,
|
||
|
but refers to a side of a WORM disc.
|
||
|
See the
|
||
|
.I j
|
||
|
device.
|
||
|
.IP "\fID\fP = \f(CWl\fP\fIN1\fP"
|
||
|
.br
|
||
|
This is the same as
|
||
|
.CW r ,
|
||
|
but one block from the SCSI disk is removed for labeling.
|
||
|
.IP "\fID\fP = \f(CWj(\fP\fID\d\s-2\&1\s+2\u\fID\d\s-2\&2\s+2\u\f(CW*)\fID\d\s-2\&3\s+2\u\f1"
|
||
|
.br
|
||
|
.I D\d\s-2\&1\s+2\u
|
||
|
is the juke box SCSI interface.
|
||
|
The
|
||
|
.I D\d\s-2\&2\s+2\u 's
|
||
|
are the SCSI drives in the juke box
|
||
|
and the
|
||
|
.I D\d\s-2\&3\s+2\u 's
|
||
|
are the demountable platters in the juke box.
|
||
|
.I D\d\s-2\&1\s+2\u
|
||
|
and
|
||
|
.I D\d\s-2\&2\s+2\u
|
||
|
must be
|
||
|
.CW w .
|
||
|
.I D\d\s-2\&3\s+2\u
|
||
|
must be pseudo devices of
|
||
|
.CW w ,
|
||
|
.CW r ,
|
||
|
or
|
||
|
.CW l
|
||
|
devices.
|
||
|
.PP
|
||
|
For
|
||
|
.CW w ,
|
||
|
.CW h ,
|
||
|
.CW l ,
|
||
|
and
|
||
|
.CW r
|
||
|
devices any of the configuration numbers
|
||
|
can be replaced by an iterator of the form
|
||
|
.CW <\fIN1-N2\fP> .
|
||
|
N1 can be greater than N2, indicating a descending sequence.
|
||
|
Thus
|
||
|
.Ex
|
||
|
[w0.<2-6>]
|
||
|
.Ee
|
||
|
is the interleaved SCSI disks on SCSI targets
|
||
|
2 through 6 of SCSI controller 0.
|
||
|
The main file system on
|
||
|
Emelie
|
||
|
is defined by the configuration string
|
||
|
.Ex
|
||
|
c[w1.<0-5>.0]j(w6w5w4w3w2)(l<0-236>l<238-474>)
|
||
|
.Ee
|
||
|
This is a cache/WORM driver.
|
||
|
The cache is three interleaved disks on SCSI controller 1
|
||
|
targets 0, 1, 2, 3, 4, and 5.
|
||
|
The WORM half of the cache/WORM
|
||
|
is 474 jukebox disks.
|
||
|
Another file server,
|
||
|
.I choline ,
|
||
|
has a main file system defined by
|
||
|
.Ex
|
||
|
c[w<1-3>]j(w1.<6-0>.0)(l<0-124>l<128-252>)
|
||
|
.Ee
|
||
|
The order of
|
||
|
.CW w1.<6-0>.0
|
||
|
matters here, since the optical jukebox's WORM drives's
|
||
|
SCSI target ids,
|
||
|
as delivered,
|
||
|
run in descending order relative to the numbers of the drives
|
||
|
in SCSI commands
|
||
|
(e.g., the jukebox controller is SCSI target 6,
|
||
|
drive #1 is SCSI target 5,
|
||
|
and drive #6 is SCSI target 0).
|