One of the main daily tasks of the (root) administrator is to verify that the system works properly and to check for any errors or saturation of the machine's resources (memory, disks, etc.). In the following subsections, we will study the basic methods for examining the state of the system at a given point in time and for performing the operations required to avoid subsequent problems.
In this module's final tutorial, we will carry out a full examination of a sample system, so that we may see some of these techniques.
Booting a GNU/Linux system produces a large amount of interesting information: during start-up, the screen usually shows messages from the processes that detect the machine's characteristics and devices, from the system services as they start, and so on, and any problems that appear are reported.
In most distributions, this can be seen directly on the system console during the boot process. However, either the speed at which the messages scroll past, or the fact that some modern distributions hide them behind graphics, can stop us from reading them properly, which means that we need a series of tools for this process.
Basically, we can use:
dmesg command: shows the messages from the last kernel boot.
/var/log/messages file: general system log that contains the messages generated by the kernel and other daemons (there may be many different log files, normally in /var/log, and depending on the configuration of the syslog service).
uptime command: indicates how long the system has been active.
/proc system: pseudo file system (procfs) that the kernel uses to expose process and system information.
/sys system: pseudo file system (sysfs) that appeared in the kernel 2.6.x branch to provide a more coherent method of accessing the information on the devices and their drivers.
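As a quick sketch of these tools in action, the following commands (standard on virtually any GNU/Linux system) show the kind of information each source provides:

```shell
# Last kernel messages (on some systems dmesg is restricted to root)
dmesg | tail -n 5

# How long the system has been up, plus the load averages
uptime

# procfs exposes kernel data as ordinary readable files
cat /proc/version   # kernel version string
cat /proc/uptime    # seconds up, seconds idle
```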
When booting up, the kernel mounts a pseudo file system called /proc, into which it dumps the information compiled on the machine, as well as many other internal data, during execution. The /proc directory is implemented in memory and is not saved to disk. The data it contains are both static and dynamic (they vary during execution).
It should be remembered that, as /proc depends heavily on the kernel, its structure tends to vary with the system's kernel version, and the files it contains can change.
One of the interesting points is that in the /proc directory we can find the images of the processes being executed, along with the information that the kernel handles about them. Each of the system's processes can be found in the /proc/&lt;process_pid&gt; directory, which contains files that represent its state. This information is basic for debugging programs, and the system's own commands, such as ps or top, use it to show the state of the processes. In general, many of the system's utilities consult the system's dynamic information from /proc (especially some of the utilities provided in the procps package).
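For instance, we can inspect the /proc entry of our own shell; $$ expands to the shell's PID, so the paths below exist in any interactive session:

```shell
# Directory of files representing the state of the current shell
ls /proc/$$

# Selected fields of its human-readable status (used by ps and top)
grep -E '^(Name|State|VmRSS)' /proc/$$/status

# The command line it was started with (arguments are NUL-separated)
tr '\0' ' ' < /proc/$$/cmdline; echo
```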
Example 5-3. Note
The /proc directory is an extraordinary resource for obtaining low-level information on the system's working and many system commands rely on it for their tasks.
On another note, in /proc we can also find other files describing the global state of the system. We will briefly look at some of the files that we can examine to obtain important information.
As of kernel version 2.6, a progressive transition from procfs (/proc) to sysfs (/sys) has begun, in order to migrate all the information that is not related to processes, especially the devices and their drivers (kernel modules), to the /sys system.
The /sys system is in charge of making the information on devices and drivers held by the kernel available to user space, so that other APIs or applications can access the information on the devices (or their drivers) in a more flexible manner. It is typically used by layers such as HAL and the udev service for monitoring and dynamically configuring devices.
Underlying sysfs there is a tree data structure of the devices and drivers (the fixed conceptual model, so to speak), which can then be accessed through the sysfs file system (whose structure may change between versions).
When an added object is detected or appears in the system, a directory is created in sysfs in the driver model tree (drivers, devices including their different classes). The parent/child node relationship is reflected with subdirectories under /sys/devices/ (reflecting the physical layer and its identifiers). Symbolic links are placed in the /sys/bus subdirectory reflecting the manner in which the devices belong to the different physical buses of the system. And the devices are shown in /sys/class, grouped according to their class, for example network, whereas /sys/block/ contains the block devices.
Some of the information provided by /sys can also be found in /proc, but it was decided that this method involved mixing different elements (devices, processes, data, hardware, kernel parameters) in a manner that was not very coherent and this was one of the reasons that /sys was created. It is expected that the information will migrate from /proc to /sys to centralise the device data.
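A brief exploration of /sys illustrates this layout; the exact entries vary between kernels, but the class and block views below are present on current systems:

```shell
# Devices grouped by class, e.g. network interfaces
ls /sys/class/net

# Block devices known to the kernel
ls /sys/block

# Device attributes are plain files; the loopback interface (lo)
# exists on virtually every system
cat /sys/class/net/lo/address
```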
The processes executing at a given moment will generally be of different natures. We may find:
System processes, whether processes associated with the machine's local workings or the kernel, or processes (known as daemons) associated with the control of different services. These may be local or networked, depending on whether we are offering the service (acting as a server) or receiving its results (acting as a client). Most of these processes will appear as belonging to the root user, even if we are not present as users at that moment. There may be some services associated with other system users (lp, bin, www, mail, etc.); these are virtual, non-interactive users that the system uses to execute certain processes.
Processes of the administering user: when acting as the root user, our interactive processes or the launched applications will also appear as processes associated to the root user.
System users processes: associated to the execution of their applications, whether they are interactive tasks in text mode or in graphic mode.
As faster and more useful tools, we can use:
ps: the standard command; lists the processes, with data on the user, time, process identifier and the command line used. One of the most commonly used options is ps -ef (or -ax), but there are many others available (see man ps).
top: a version that provides a list updated at intervals, dynamically monitoring the changes. It also allows us to sort the list of processes by different criteria, such as memory or CPU usage, so as to obtain a ranking of the processes taking up the most resources. In situations where the system's resources are being used up, it is very useful for pointing to the possible source of the problem.
kill: this allows us to eliminate processes from the system by sending them signals, for example kill -9 pid_of_process (9 corresponding to SIGKILL), where we specify the process identifier. It is useful for processes with unstable behaviour or interactive programs that have stopped responding. We can see a list of the valid signals in the system with man 7 signal.
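A minimal session showing these commands together; here sleep stands in for a misbehaving process:

```shell
# Snapshot of the process list (first lines only)
ps -ef | head -n 5

# Launch a background process and terminate it
sleep 300 &
pid=$!
kill "$pid"                      # default signal: SIGTERM (15), catchable
wait "$pid" 2>/dev/null || true  # collect it; ignore its nonzero status

# Only if a process ignores SIGTERM should we resort to:
# kill -9 "$pid"                 # SIGKILL, cannot be caught or ignored
```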
Both the kernel and many of the service daemons, as well as the different GNU/Linux applications or subsystems, can generate messages that are sent to log files, either to obtain the trace of the system's functioning or to detect errors or fault warnings or critical situations. These types of logs are essential in many cases for administrative tasks and much of the administrator's time is spent processing and analysing their contents.
Most of the logs of the system itself are created in the /var/log directory, although some applications may modify this behaviour.
One particularly important system daemon is Syslogd. This daemon receives the messages sent by the kernel and by other service daemons and writes them to a log file, /var/log/messages by default. Syslogd is also configurable (in the /etc/syslog.conf file), so it is possible to create other files depending on the source, according to the daemon that sends the message, thereby sending each message to one log or another (classified by source), and/or to classify the messages by importance (priority level): alarm, warning, error, critical, etc.
Example 5-4. Note
The Syslogd daemon is the most important service for obtaining dynamic information on the machine. The process of analysing the logs helps us to understand how they work, the potential errors and the performance of the system.
Depending on the distribution, it may be configured in different modes by default; in Debian, for example, /var/log contains files such as kern.log, mail.err, mail.info..., which are the logs of the different services. We can examine the configuration to determine where the messages come from and in which files they are saved. An option that is often useful is the possibility of sending the messages to a virtual text console (in /etc/syslog.conf, a destination console such as /dev/tty8 or /dev/xconsole is specified for the type or types of message), so that we can see the messages as they appear. This is useful for monitoring the execution of the system without having to constantly check the log files. A simple variation of this method is to enter, from a terminal, the following instruction (for the general log):
tail -f /var/log/messages
This command leaves the terminal or terminal window open, so that the changes that occur in the file appear progressively on screen.
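By way of illustration, an /etc/syslog.conf fragment with rules of this kind might look as follows (the file names and the console are examples; the exact syntax is documented in man 5 syslog.conf):

```
# "-" before a file name means asynchronous (non-synced) writes
kern.*          -/var/log/kern.log
mail.err        /var/log/mail.err
# send emergency messages of any facility to a virtual console
*.emerg         /dev/tty8
```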
Other related commands:
uptime: time that the system has been active. Useful for checking that no unexpected system reboot has occurred.
last: analyses the log of users entering and leaving the system (/var/log/wtmp), as well as system reboots. The related lastlog command reports the last time each user was seen on the system (information in /var/log/lastlog).
Various utilities for combined processing of logs, that issue summaries (or alarms) of what has happened in the system, such as: logwatch, logcheck (Debian), log_analysis (Debian)...
Where the system's memory is concerned, we must remember that we have: a) the physical memory of the machine itself, and b) the virtual memory that can be addressed by the processes. Normally (unless we are dealing with corporate servers), we will not have very large amounts, so the physical memory will be less than the necessary virtual memory (up to 4 GB on 32-bit systems). This forces us to use a swap zone on disk to back the virtual memory of the processes.
This swap zone may be implemented as a file in the file system, but it is more usual to find it as a swap partition, created during the installation of the system. When partitioning the disk, it is declared as a Linux Swap type.
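As a sketch, a swap file can be created and formatted as follows (the path /tmp/swapfile and the 64 MB size are arbitrary examples; activating it with swapon requires root):

```shell
# Reserve the space and restrict its permissions
dd if=/dev/zero of=/tmp/swapfile bs=1M count=64 status=none
chmod 600 /tmp/swapfile

# Write the swap signature
mkswap /tmp/swapfile

# Activation and deactivation (root only):
# swapon /tmp/swapfile
# swapoff /tmp/swapfile

# Currently active swap areas
cat /proc/swaps
```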
To examine the information on the memory, we have various useful commands and methods:
/etc/fstab file: the swap partition appears (if it exists). With an fdisk command, we can find out its size (or check /proc/swaps).
ps command: allows us to see the processes that we have, with options showing the percentage of memory used by each one.
top command: a dynamic version of ps, updated at intervals. It can sort the processes according to the memory they use or their CPU time.
free command: reports on the global state of the memory, and also gives the size of the swap space.
vmstat command: reports on the state of the virtual memory and the use to which it is assigned.
Some packages, like dstat, allow us to collate data on the different parameters (memory, swap and others) by intervals of time (similar to top).
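A short memory-checking session with these tools might look like this:

```shell
# Global memory and swap state, in human-readable units
free -h

# Virtual-memory statistics: three samples, one second apart
vmstat 1 3

# The processes using the most resident memory
ps -eo pid,user,%mem,rss,comm --sort=-rss | head -n 5
```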
We will examine which disks are available, how they are organised and which partitions and file systems we have.
When we have a partition and we have a determined accessible file system, we will have to perform a mounting process, so as to integrate it in the system, whether explicitly or as programmed at startup/boot. During the mounting process, we connect the file system associated to the partition to a point in the directory tree.
In order to find out about the disks (or storage devices) present in the system, we can use the system boot information (dmesg), where the available devices are detected, such as /dev/hdx for IDE devices or /dev/sdx for SCSI devices. Other devices, such as hard disks connected by USB, flash disks (pen drive types), removable units, external CD-ROMs etc., may be devices with some form of SCSI emulation, so they will appear as devices of this type.
Any storage device will present a series of partitions of its space. Typically, an IDE disk supports a maximum of four primary partitions, or more if some of them are logical (an extended partition permits the placement of several logical partitions within it). Each partition may contain a different file system type, whether belonging to the same operating system or to different ones.
To examine the structure of a known device, or to change its structure by partitioning the disk, we can use the fdisk command or any of its more or less interactive variants (cfdisk, sfdisk). For example, when examining a sample IDE disk, /dev/hda, we are given the following information:
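By way of illustration, the output of fdisk -l /dev/hda for such a disk might look like this (all the figures here are invented for the example):

```
Disk /dev/hda: 20.5 GB, 20520493056 bytes
255 heads, 63 sectors/track, 2494 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1        1305    10482381    7  HPFS/NTFS
/dev/hda2   *        1306        2429     9028530   83  Linux
/dev/hda3            2430        2494      522112+  82  Linux swap
```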
This is a 20 GB disk with three partitions (identified by the number appended to the device name), where we observe two boot partitions (Boot column with *), of NTFS and Linux type, indicating the presence of a Windows NT/2000/XP/Vista alongside a GNU/Linux distribution, and a last partition that is used as swap space for Linux. In addition, we have information on the structure of the disk and the sizes of each partition.
Of the disks and partitions that we have, some will be mounted in our file system, some will be ready to be mounted on demand, and others may be mounted when the resource becomes available (in the case of removable devices).
We can obtain this information in different ways (we will see this in more detail in the final workshop):
The /etc/fstab file indicates the devices that are ready to be mounted on booting, or the removable devices that may be mounted. Not all of the system's devices will necessarily appear; only the ones that we want mounted at boot time. We can mount the others on demand using the mount command, and unmount them with umount.
mount command. This informs us of the file systems mounted at that moment (whether they are real devices or virtual file systems such as /proc). We may also obtain this information from the /etc/mtab file.
df -k command. This informs us of the storage file systems and allows us to verify the used space and available space. It's a basic command for controlling the available disk space.
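A quick check combining these sources might be:

```shell
# File systems currently mounted (equivalent information in /etc/mtab)
mount | head -n 5

# Used and available space, in 1 KB blocks
df -k

# The same, in human-readable units, for the root file system only
df -h /
```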
With regard to this last df -k command, one of our basic tasks as administrator of the machine is to control its resources and, in this case, the space available in the file systems used. These figures have to be monitored fairly frequently to avoid a system crash; a file system should never be left with less than 10% to 15% free space (especially if it is the /), as there are many daemon processes that are normally writing temporary information or logs, which can generate a large amount of data; a particular case is that of the core files that we have already mentioned, which can be very large (depending on the process). Normally, certain precautions of system hygiene should be taken if situations of file system saturation are detected:
Eliminate old temporary files. The /tmp and /var/tmp directories tend to accumulate many files created by different users or applications. Some systems or distributions take automatic hygiene measures, such as clearing /tmp every time the system boots up.
Logs: avoid excessive growth, since, depending on the system configuration (of Syslogd, for example), the information generated by the messages can be very large. Normally, the logs will have to be cleared regularly when they take up a certain amount of space and, in any case, if we need the information for subsequent analysis, backups can be made on removable devices. This process can be automated using cron scripts or specialised tools such as logrotate.
There are other parts of the system that tend to grow a lot, such as: a) user core files, which we can delete periodically or whose generation we can disable; b) the email system, which stores all of the emails sent and received; we can ask the users to clean out their mailboxes regularly or implement a quota system; c) the caches of browsers or other applications, which also usually occupy a lot of space and require regular clearing; d) the accounts of the users themselves, which may have quotas so that the pre-established allocated spaces are not exceeded, etc.
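When a file system starts to fill up, a few standard commands help locate where the space is going (the paths below are the usual suspects; adjust them to the system at hand):

```shell
# Largest directories under /var, biggest first
du -sk /var/* 2>/dev/null | sort -rn | head -n 5

# Files over 100 MB under /var/log
find /var/log -type f -size +100M 2>/dev/null

# Files in /tmp not accessed for more than 7 days
find /tmp -type f -atime +7 2>/dev/null
```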