As we have seen, storage units have a series of associated devices, depending on the type of interface:
IDE: /dev/hda master disk, first IDE connector;
/dev/hdb slave disk, first connector;
/dev/hdc master disk, second connector;
/dev/hdd slave disk, second connector.
SCSI: /dev/sda, /dev/sdb... devices, following the numbering of the peripheral devices on the SCSI bus.
Diskettes: /dev/fdx devices, where x is the diskette number (starting at 0). There are different devices depending on the capacity of the diskette; for example, a 1.44 MB diskette in drive A would be /dev/fd0H1440.
With regard to partitions, the number that follows the device name indicates the partition index within the disk, and each partition is treated as an independent device: /dev/hda1 is the first partition of the first IDE disk, and /dev/sdc2 is the second partition of the third SCSI device. In the case of IDE disks, up to four primary partitions are allowed, plus a higher number of logical partitions. Therefore, for /dev/hdan, if n is less than or equal to 4, it is a primary partition; otherwise, it is a logical partition, with n greater than or equal to 5.
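As a small illustration, the partitions of the first IDE disk could be listed with fdisk (the output depends entirely on the actual disk):

# fdisk -l /dev/hda

which would typically show entries such as /dev/hda1 or /dev/hda2 for the primary partitions and /dev/hda5 onwards for the logical partitions.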
With the disks and the associated file systems, the basic operations that we can carry out include the following (a brief combined example follows the list):
Creation or modification of partitions, through commands such as fdisk or similar tools (cfdisk, sfdisk).
Formatting diskettes: different tools may be used: fdformat (low-level formatting), superformat (formatting at different capacities in MSDOS format) and mformat (specific formatting, creating standard MSDOS file systems).
Creation of Linux file systems on partitions, using the mkfs command. There are specific versions for creating various file systems, mkfs.ext2, mkfs.ext3, and also for non-Linux file systems: mkfs.ntfs, mkfs.vfat, mkfs.msdos, mkfs.minix and others. For CD-ROMs, commands such as mkisofs create ISO9660 file systems (with Joliet or Rock Ridge extensions) in the form of an image that may subsequently be recorded onto a CD/DVD, which, along with commands such as cdrecord, finally allows us to create/save the CD/DVDs. Another particular case is the mkswap command, which allows us to create swap areas on partitions, which are subsequently activated or deactivated with swapon and swapoff.
Mounting file systems: the mount and umount commands.
Status verification: the main tool for verifying Linux file systems is the fsck command. This command checks the different areas of the file system to verify its consistency, check for possible errors and correct them where possible. The system automatically activates the command on booting when it detects situations where the system was not switched off properly (due to a cut in the electricity supply or an accidental shutdown of the machine) or when the system has been booted a certain number of times; this check usually takes a certain amount of time, normally a few minutes (depending on the size of the data). There are also specific versions for other file systems: fsck.ext2, fsck.ext3, fsck.vfat, fsck.msdos, etc. The fsck process is normally performed with the device in read-only mode over mounted partitions; it is advisable to unmount the partitions before performing the process if errors are detected and need to be corrected. In certain cases, for example if the file system that has to be checked is the root file system (/) and a critical error is detected, we will be asked to switch the system to the single-user (root) runlevel and to perform the verification there. In general, if it is necessary to verify the system, this should be done in superuser mode (we can switch between runlevels with the init or telinit commands).
Backup processes: whether of the disk, blocks of the disk, partitions, file systems, files... There are various useful tools for this: tar allows us to copy files to a file or to tape units; cpio, likewise, can perform backups of files to a file; both cpio and tar maintain information on the permissions and owners of the files; dd makes it possible to make copies, whether of files, devices, partitions or disks, to files; it is slightly complex and we need some low-level information on the type, size and block or sector layout, and it can also send the output to tapes.
Various utilities: some individual commands, several of which are used by the preceding processes to carry out various treatments: badblocks for finding defective blocks in the device; dumpe2fs for obtaining information on Linux file systems; tune2fs, which allows us to tune ext2 or ext3 Linux file systems and adjust different performance parameters.
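As a brief combined sketch of several of these operations, the following commands assume a spare partition /dev/hdb3 (and /dev/hdb4 for swap) that can safely be overwritten, and that the mount point /mnt/data exists; the device names and paths are merely illustrative:

# mkfs.ext3 /dev/hdb3
# mount /dev/hdb3 /mnt/data
# tar cvf /tmp/data_backup.tar /mnt/data
# umount /mnt/data
# fsck.ext3 /dev/hdb3
# dd if=/dev/hdb3 of=/tmp/data_partition.img
# tune2fs -c 30 /dev/hdb3
# mkswap /dev/hdb4
# swapon /dev/hdb4

We create an ext3 file system on the partition, mount it, back up its contents to a tar archive, unmount it and verify it; dd then takes a low-level image of the whole partition, tune2fs sets a periodic check every 30 mounts, and mkswap/swapon create and activate a swap area on another partition.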
We will now mention two subjects related to the concept of storage space, which are used in various environments for the creation of the base storage space: the use of RAID software and the creation of dynamic (logical) volumes.
The configuration of disks using RAID levels is currently one of the most widely-used high-availability storage schemes, when we have various disks for implementing our file systems.
The main focus of the different existing techniques is fault tolerance, provided at the level of the device and the set of disks, against different potential errors, whether physical or in the system, so as to avoid the loss of data or inconsistencies in the system. Some schemes are also designed to increase the performance of the disk system, increasing the bandwidth available to the system and applications.
Today we find hardware RAID mainly in corporate servers (although it is beginning to appear in desktops), where different hardware solutions are available that fulfil these requirements, in particular for disk-intensive applications, such as audio and/or video streaming, or large databases.
In general, this hardware comes in the form of RAID disk controller cards (or controllers integrated with the machine), which implement the management of one or more levels (of the RAID specification) over a set of disks administered through that controller.
In RAID a series of levels (or possible configurations) are distinguished, each of which can be provided by a given product (each manufacturer of specific hardware or software may support one or more of these levels). Each RAID level is applied over a set of disks, sometimes called a RAID array (or RAID disk matrix), which are usually disks of equal size (or of equal group sizes). For example, an array could use four 100 GB disks or, in another case, two groups of 100 GB, each formed by two disks, one of 30 GB and one of 70 GB. In some hardware controllers, the disks (or groups) cannot have different sizes; in others, they can, but the array is then defined by the size of the smallest disk (or group).
We will describe some basic concepts on some levels in the following list (it should be remembered that, in some cases, the terminology has not been fully accepted, and it may depend on each manufacturer):
RAID 0: The data are distributed equally between two or more disks, without parity or redundancy information and without offering fault tolerance. Only the data are distributed; if a disk fails physically, the information will be lost and we will have to recover it from the backup copies. What does increase is the performance, depending on the RAID 0 implementation, given that the read and write operations will be divided among the different disks.
RAID 1: An exact (mirror) copy is created on a set of two or more disks (known as a RAID array). In this case, it is useful for read performance (which can increase linearly with the number of disks) and especially for tolerating the failure of one of the disks, given that (for example, with two disks) the same information is available. RAID 1 is usually adequate for high availability, such as 24x7 environments, where the resources are critically needed. This configuration also makes it possible (if the hardware supports it) to hot swap disks: if we detect a fault in one of the disks, we can replace it with another disk without switching off the system.
RAID 2: In the preceding systems, the data would be divided in blocks for subsequent distribution; here, the data are divided into bits and redundant codes are used to correct the data. It is not widely used, despite the high performance levels that it could provide, as it ideally requires a high number of disks, one per data bit, and various for calculating the redundancy (for example, in a 32 bit system, up to 39 disks would be used).
RAID 3: It uses byte-level striping with a disk dedicated to parity. It is not very widely used either because, depending on the size of the data and their positions, it does not allow simultaneous accesses. RAID 4 is similar, but stripes the data at the block level instead of the byte level, which means that it is possible to service several requests simultaneously when each one involves only a single block.
RAID 5: Block-level striping is used, distributing the parity among the disks. It is widely used, due to the simple parity scheme and the fact that this calculation is easily implemented by the hardware, with good performance levels.
RAID 0+1 (or 01): A mirror of stripes; this is a nested RAID level: for example, we implement two RAID 0 groups, which are used in RAID 1 to create a mirror between them. An advantage is that, in the event of an error, the failed RAID 0 group may be rebuilt thanks to the other copy, but if more disks need to be added, they have to be added to all the RAID 0 groups equally.
RAID 10 (1+0): A stripe of mirrors; groups of RAID 1 combined under RAID 0. In this way, in each RAID 1 group, a disk may fail without loss of data. Of course, the failed disk has to be replaced, otherwise the disk that is left in the group becomes another possible point of failure within the system. This configuration is usually used for high-performance databases (due to the fault tolerance and the speed, as it is not based on parity calculations).
Some points that should be taken into account with regard to RAID in general:
RAID improves the system's uptime, as some of the levels make it possible for the system to carry on working consistently when disks fail and, depending on the hardware, it is even possible to hot swap the problematic hardware without having to stop the system, which is especially important in critical systems.
RAID can improve the performance of applications, especially in implementations with mirroring and striping, where the distribution of the data permits linear read operations to increase significantly, as the disks can service reads simultaneously, increasing the data transfer rate.
RAID does not protect the data in general; evidently, it does not protect the data from other possible malfunctions (viruses, general errors or natural disasters), so we must still rely on backup copy schemes.
Data recovery is not simplified. If a disk belongs to a RAID array, its recovery should be attempted within that environment, and software specific to the hardware controllers is necessary to access the data.
On the other hand, RAID does not usually improve the performance of typical user applications, such as most desktop applications, because these applications access RAM and small sets of data, which means they will not benefit from linear reading or sustained data transfers. In these environments, the improvement in performance and efficiency may hardly be noticed.
Moving information between systems is not improved or facilitated in any way; without RAID, it is quite easy to transfer data by simply moving the disk from one system to another, whereas with RAID it is almost impossible (unless we have the same hardware controller) to move an array of disks to another system.
In GNU/Linux, hardware RAID is supported through various kernel modules, associated with different manufacturers or chipsets of these RAID controllers. This permits the system to abstract itself from the hardware mechanisms and make them transparent to the system and the end user. In any case, these kernel modules allow us to access the details of these controllers and to configure their parameters at a very low level, which in some cases (especially in servers that support a high I/O load) may be beneficial for tuning the disk system that the server uses in order to maximise the system's performance.
The other option that we will analyse is that of carrying out these processes through software components, specifically GNU/Linux's RAID software component.
GNU/Linux has a kernel driver of the so-called Multiple Device (md) kind, which we can consider as kernel-level support for RAID. Through this driver we can generally implement RAID levels 0, 1, 4 and 5, as well as nested RAID levels (such as RAID 10), on different block devices such as IDE or SCSI disks. There is also the linear level, a linear concatenation of the available disks (it does not matter if they have different sizes), in which the disks are written to consecutively.
In order to use software RAID in Linux, we must have RAID support in the kernel and, if applicable, the md modules activated (as well as some specific drivers, depending on the case; see the drivers associated with RAID, for example in Debian with modconf). The preferred method for implementing arrays of RAID disks through the software RAID offered by Linux is either during the installation or through the mdadm utility, which allows us to create and manage the arrays.
Let's look at some examples (we will assume we are working with some SCSI /dev/sda, /dev/sdb disks... in which we have various partitions available for implementing RAID):
Creation of a linear array:
# mdadm --create --verbose /dev/md0 --level=linear --raid-devices=2 /dev/sda1 /dev/sdb1
where we create a linear array based on the first partitions of /dev/sda and /dev/sdb, creating the new device /dev/md0, which can already be used as a new disk (supposing that the mount point /media/diskRAID exists):
# mkfs.ext2 /dev/md0
# mount /dev/md0 /media/diskRAID
For a RAID 0 or RAID 1, we simply change the level (--level) to raid0 or raid1. With mdadm --detail /dev/md0, we can check the parameters of the newly created array.
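For example, a mirror (RAID 1) over two other partitions could be created in the same way; the partitions /dev/sda2 and /dev/sdb2 used here are merely illustrative:

# mdadm --create --verbose /dev/md1 --level=raid1 --raid-devices=2 /dev/sda2 /dev/sdb2
# mdadm --detail /dev/md1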
We can also consult the mdstat entry in /proc to determine the active arrays and their parameters. Especially in the cases with mirroring or parity (for example, levels 1, 5...), we can follow the initial reconstruction of the newly created array; in /proc/mdstat we will see the reconstruction progress (and the approximate completion time).
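For example (the following output is merely illustrative of the format; the exact figures depend on the arrays present):

# cat /proc/mdstat
Personalities : [linear] [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      1048512 blocks [2/2] [UU]
md0 : active linear sda1[0] sdb1[1]
      2097024 blocks
unused devices: <none>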
The mdadm utility provides many options that allow us to examine and manage the different RAID software arrays created (we can see a description and examples in man mdadm).
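For example, in a mirror we could mark a failing disk as faulty, remove it and add a replacement without stopping the array (the device names are merely illustrative):

# mdadm --manage /dev/md1 --fail /dev/sdb2
# mdadm --manage /dev/md1 --remove /dev/sdb2
# mdadm --manage /dev/md1 --add /dev/sdc2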
Another important consideration is the set of optimisations that should be made to the RAID arrays so as to improve performance, both by monitoring their behaviour in order to optimise the file system parameters and by using the RAID levels and their characteristics more effectively.
There is a need to abstract from the physical disk system and its configuration and number of devices, so that the (operating) system can take care of this work and we do not have to worry about these parameters directly. In this sense, the logical volume management system can be seen as a layer of storage virtualisation that provides a simpler view, making the storage simpler and more flexible to use.
In the Linux kernel, there is an LVM (logical volume manager), which is based on ideas developed from the storage volume managers used in HP-UX (HP's proprietary implementation of UNIX). There are currently two versions (LVM1 and LVM2), of which LVM2 is the most widely used, due to a series of added features.
The architecture of an LVM typically consists of the following (main) components:
Physical volumes (PV): PVs are hard disks or partitions or any other element that appears as a hard disk in the system (for example, RAID software or hardware).
Logical volumes (LV): These are equivalent to a partition on the physical disk. The LV is visible in the system as a raw block device (completely equivalent to a physical partition) and it may contain a file system (such as the users' /home). Normally, the volumes make more sense for the administrators, as names can be used to identify them (for example, we can use a logical device, named stock or marketing instead of hda6 or sdc3).
Volume groups (VG): This is the element on the upper layer, the administrative unit that encompasses our resources, whether logical volumes (LV) or physical volumes (PV). This unit stores the data on the available PVs and on how the LVs are formed from the PVs. Evidently, in order to use a volume group, we have to have physical supports (PVs), which are then organised into different logical units (LVs).
For example, in the following figure, we can see a volume group where we have 7 PVs (in the form of disk partitions), which are grouped to form two logical volumes (which have been used to build the /usr and /home file systems):
By using logical volumes, we can treat the available storage space (which may comprise a large number of different disks and partitions) more flexibly, according to our needs, and we can manage the space with more appropriate identifiers and with operations that allow us to adapt the space to the requirements that arise at any given moment.
Logical Volume Management allows us to:
Resize volume groups and logical volumes, adding new PVs or removing some of those initially available.
Create snapshots of the file system (read-only in LVM1, read and/or write in LVM2). This makes it possible to create a new device that reflects the state of an LV at a given moment. Likewise, we can create the snapshot, mount it, try various operations or configure new software or other elements and, if these do not work as we were expecting, return the original volume to the state it was in before performing the tests (see the sketch after this list).
RAID 0 of logical volumes.
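A minimal sketch of the snapshot mechanism, assuming a hypothetical volume group vg00 containing a logical volume data, with enough free space left in the group for the snapshot:

# lvcreate -L 500M -s -n data_snap /dev/vg00/data
# mkdir /mnt/snap
# mount -o ro /dev/vg00/data_snap /mnt/snap

The snapshot can then be inspected or backed up and, once it is no longer needed, unmounted and deleted with lvremove.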
RAID levels 1 or 5 are not implemented in LVM; if they are necessary (in other words, if redundancy and fault tolerance are required), then we either use software RAID or hardware RAID controllers to implement them, and place LVM as the upper layer.
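A brief sketch of this combination, assuming that a software RAID mirror such as /dev/md1 (like the one created earlier) is free to be used, and using purely illustrative group and volume names:

# pvcreate /dev/md1
# vgcreate group_safe /dev/md1
# lvcreate -L 500M -n volume_safe group_safe
# mkfs.ext3 /dev/group_safe/volume_safe

In this way, the redundancy is provided by the md device underneath, while LVM provides the flexible management of the space on top.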
We will provide a brief, typical example (in many cases, the distribution installer carries out a similar process if we set up an LVM as the initial storage system). Basically, we must: 1) create the physical volumes (PV); 2) create the volume group (VG); 3) create the logical volume (LV); and, finally, create and mount a file system on it:
1) For example, we have three partitions on different disks, with which we create three PVs and initialise their contents:
# dd if=/dev/zero of=/dev/hda1 bs=1k count=1
# dd if=/dev/zero of=/dev/hda2 bs=1k count=1
# dd if=/dev/zero of=/dev/hdb1 bs=1k count=1
# pvcreate /dev/hda1
Physical volume "/dev/hda1" successfully created
# pvcreate /dev/hda2
Physical volume "/dev/hda2" successfully created
# pvcreate /dev/hdb1
Physical volume "/dev/hdb1" successfully created
2) creation of a VG from the different PVs:
# vgcreate group_disks /dev/hda1 /dev/hda2 /dev/hdb1
Volume group "group_disks" successfully created
3) we create the LV (in this case, with a size of 1 GB) based on the space available in the VG group_disks (-n indicates the name of the volume):
# lvcreate -L1G -n logical_volume group_disks
lvcreate -- doing automatic backup of "group_disks"
lvcreate -- logical volume "/dev/group_disks/logical_volume" successfully created
And finally, we create a file system (a ReiserFS in this case):
# mkfs.reiserfs /dev/group_disks/logical_volume
We could then, for example, use it as backup space:
# mkdir /mnt/backup
# mount -t reiserfs /dev/group_disks/logical_volume /mnt/backup
Finally, we will have a logical volume device that implements a file system on our machine.
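If the volume later fills up, we could, as a sketch (assuming a hypothetical free partition /dev/hdb2 and using the names from the example above), add a new PV to the group and extend the volume and its file system:

# umount /mnt/backup
# pvcreate /dev/hdb2
# vgextend group_disks /dev/hdb2
# lvextend -L +500M /dev/group_disks/logical_volume
# resize_reiserfs /dev/group_disks/logical_volume
# mount -t reiserfs /dev/group_disks/logical_volume /mnt/backup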