10.2. Basic aspects

Before learning about the optimisation techniques, it is necessary to list the causes that might affect the performance of an operating system [Maj96]. Among these, we might mention:

a) Resource bottlenecks: the whole system slows down because some resources cannot satisfy the demand placed on them. The first step in optimising the system is to find these bottlenecks and their causes, and to learn their theoretical and practical limitations.

b) Amdahl's law: according to this law, "there is a limit to how much an overall system can be improved (or speeded-up) when only one part of the system is improved". In other words, if a program spends 10% of its execution time in one part and that part is optimised to run twice as fast, the total execution time only falls to 90% + 5% = 95% of the original, an overall speedup of about 5%. A great deal of effort can therefore be invested for a very modest result.

c) Estimates of the speedup: it is necessary to estimate how much the system will improve so as to avoid any unnecessary efforts and costs. We can use the previously described law to evaluate whether it is necessary to invest time or money in the system.

d) Bubble effect: it is very common to feel that, once one problem has been solved, another always appears. One manifestation of this is a system that constantly moves from CPU problems to input/output (I/O) problems, and vice versa.

e) Response time versus workload: if we have twenty users, improving throughput means that, taken together, they get more work done in the same time, but individual response times will not necessarily improve; some users may even get better response times than others. Improving response times means optimising the system so that individual tasks take as little time as possible.

f) User psychology: two parameters are fundamental: 1) users are generally dissatisfied when response times vary; and 2) users will not notice improvements in execution time of less than 20%.

g) Test effect: the act of monitoring affects the measurements themselves. We must proceed carefully when performing tests, because of the side effects of the testing programs themselves.

h) Importance of the average and of the variation: results must be interpreted with care because, if we obtain an average CPU usage of 50% from samples of 100, 0, 0 and 100%, we could reach the wrong conclusions. It is important to look at the variation around the average.

Example 10-1. Note

When optimising, consider the saturation of resources, Amdahl's law, knowledge of the available software and hardware, the response times and the number of jobs.

i) Basic knowledge of the hardware of the system to be optimised: to improve something, we need to "know" whether it can be improved. The person in charge of optimisation must have a sound basic knowledge of the underlying hardware (CPU, memory, buses, cache, I/O, disks, video...) and its interconnections in order to determine where the problems lie.

j) Basic knowledge of the operating system to be optimised: as with the preceding point, the administrator must know at least the basic aspects of the operating system they intend to optimise, including processes and threads (creation, execution, states, priorities, termination), system calls, cache buffers, the file system, memory administration and virtual memory (paging, swap) and the kernel tables.
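Points b), c) and h) rest on simple arithmetic that is worth doing before investing any effort. The following sketch (Python is our choice here, since the text itself contains only shell commands; the function names are illustrative) applies Amdahl's law, where the overall speedup is 1 / ((1 - f) + f / s) for a part taking a fraction f of the run time that is made s times faster, and compares the average of a series of CPU samples with its standard deviation:

```python
from statistics import mean, pstdev


def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when the part of the run time given by `fraction`
    (0..1) is made `local_speedup` times faster (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0 or local_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and local_speedup > 0")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def summarise(samples):
    """Return (average, population standard deviation) of the samples."""
    return mean(samples), pstdev(samples)


if __name__ == "__main__":
    # Point b): a part that takes 10% of the run time is made twice as fast.
    s = amdahl_speedup(0.10, 2.0)
    print(f"overall speedup {s:.3f}, run time reduced by {1 - 1 / s:.1%}")
    # Upper bound if that part took no time at all.
    print(f"limit {1 / (1 - 0.10):.3f}")
    # Point h): same 50% average, very different behaviour.
    print(summarise([100, 0, 0, 100]), summarise([50, 50, 50, 50]))
```

With these values the overall speedup is 1/0.95, about 1.053 (5% less run time), and no improvement of that part alone can exceed 1/0.90, about 1.11. The two CPU series both average 50%, but their standard deviations are 50 and 0.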

10.2.1. Monitoring on a UNIX System V

/proc appears as a directory but, in reality, it is a virtual file system: it does not exist on disk, and the kernel creates it in memory. It is used to provide information on the system (originally on the processes, hence the name), which is then used by the commands we will examine below. Some interesting files are the following (see the relevant manual page, proc(5), for more information):

/proc/1: directory with information on process 1 (each numbered directory is named after the PID of its process).

/proc/cpuinfo: information on the CPU (type, brand, model, performance...).

/proc/devices: list of devices configured in the kernel.

/proc/dma: DMA channels currently in use.

/proc/filesystems: file systems configured in the kernel.

/proc/interrupts: interrupts (IRQs) in use and how many of each have been handled.

/proc/ioports: I/O ports in use.

/proc/kcore: image of the physical memory of the system.

/proc/kmsg: messages generated by the kernel which are then sent to syslog.

/proc/ksyms: table of kernel symbols (/proc/kallsyms in 2.6 kernels).

/proc/loadavg: system load averages (over 1, 5 and 15 minutes).

/proc/meminfo: information on memory use.

/proc/modules: modules loaded by the kernel.

/proc/net: information on the network protocols.

/proc/stat: statistics on the system.

/proc/uptime: time since the system was booted, in seconds (followed by the idle time).

/proc/version: version of the kernel.

Bear in mind that these files are readable (text), but the data are sometimes in a "raw" form and commands are needed to interpret them. These are the commands we will examine next.
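As an illustration of how the raw data in these files can be interpreted, the following sketch parses /proc/loadavg and /proc/uptime. The field layout assumed (three load averages, running/total scheduling entities and the last PID used; then seconds since boot followed by idle seconds) is that of 2.6 kernels, so check proc(5) for your version:

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse /proc/loadavg, e.g. '0.21 0.25 0.33 1/124 4321': the 1, 5 and
    15 minute load averages, running/total scheduling entities, last PID."""
    one, five, fifteen, tasks, last_pid = text.split()[:5]
    running, total = tasks.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def parse_uptime(text):
    """Seconds since boot: the first field of /proc/uptime."""
    return float(text.split()[0])


if __name__ == "__main__" and Path("/proc/loadavg").exists():
    print(parse_loadavg(Path("/proc/loadavg").read_text()))
    hours = parse_uptime(Path("/proc/uptime").read_text()) / 3600
    print(f"up {hours:.1f} hours")
```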

UNIX SV-compatible systems use the sar and sadc commands to obtain system statistics (in Fedora Core, FC, they are included in the sysstat package, which also includes iostat and mpstat). The equivalent in Debian GNU/Linux is atsar (and atsadc), which is fully equivalent to the commands mentioned. The atsar command reads counters and statistics from the /proc file system and shows them on standard output. The first way of calling the command is:

atsar options t [n]

where the activity is shown n times, every t seconds, with a header showing the activity counters (the default value of n is 1). The second way of calling it is:

atsar -options -s time -e time -i sec -f file -n day#

The command extracts data from the file specified by -f (by default /var/log/atsar/atsarxx, where xx is the day of the month), which were previously saved by atsadc (the program that collects, saves and processes the data; in Debian it is located in /usr/lib/atsar). The -n parameter can be used to indicate the day of the month, and -s and -e the start and end times, respectively. To activate atsadc, for example, we could include lines such as the following in /etc/cron.d/atsar:

@reboot root test -x /usr/lib/atsar/atsadc && /usr/lib/atsar/atsadc /var/log/atsar/atsa`date +\%d`

10,20,30,40,50 * * * * root test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1

The first line creates the data file after a reboot; the second saves the data every 10 minutes with the shell script atsa1, which calls atsadc.

In atsar (or sar), the options are used to indicate which counters have to be shown; some examples include:

Example 10-2. Note

Monitoring with atsar

  • CPU: atsar -u

  • Memory: atsar -r

  • Disk: atsar -d

  • Paging: atsar -p

There are only a few differences between atsar and sar, in how the data are displayed, and sar includes a few additional (or different) options. Below are some examples of how to use sar (usage is the same with atsar; only the way the data are displayed differs) and the meaning of the information generated:

1) CPU use

sar -u 4 5

Linux 2.6.19-prep (localhost.localdomain) 24/03/07

In this case %idle is 100, which means that the CPU is idle: there are no processes to run and the workload is low. If %idle is around 10 and there are many processes, CPU optimisation should be considered, as the CPU could be a bottleneck in the system.
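sar computes these percentages from the CPU counters that the kernel exposes in /proc/stat. The following is a minimal sketch of the same calculation for %idle, assuming the 2.6 layout of the aggregate cpu line (counters in clock ticks); only the first eight counters are summed because, on kernels that append guest counters, guest time is already included in user time:

```python
import time
from pathlib import Path
from typing import List


def read_cpu_times(text) -> List[int]:
    """First 8 counters of the aggregate 'cpu' line of /proc/stat (user,
    nice, system, idle, iowait, irq, softirq, steal), in clock ticks.
    Older kernels report fewer fields; the missing ones count as 0."""
    for line in text.splitlines():
        if line.startswith("cpu "):
            fields = [int(v) for v in line.split()[1:9]]
            return fields + [0] * (8 - len(fields))
    raise ValueError("no aggregate 'cpu' line found")


def idle_percent(before: List[int], after: List[int]) -> float:
    """Share of the interval spent idle (idle counter only, as in %idle)."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[3] / total if total else 100.0


if __name__ == "__main__" and Path("/proc/stat").exists():
    first = read_cpu_times(Path("/proc/stat").read_text())
    time.sleep(1)  # sampling interval; 'sar -u 4 5' uses 4 seconds
    second = read_cpu_times(Path("/proc/stat").read_text())
    idle = idle_percent(first, second)
    print(f"%idle = {idle:.2f}")
    if idle < 10:
        print("CPU may be a bottleneck: check how many processes are runnable.")
```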

2) Number of interruptions per second

sar -I 4 5

Linux 2.6.19-prep (localhost.localdomain)  24/03/07

08:24:01      INTR    intr/s
08:24:06         4      0.00
Average:         4      0.00

This shows the frequency of the active interrupt levels listed in /proc/interrupts. It is useful for seeing whether a device is constantly interrupting the CPU's work.
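The intr/s values are rates computed from the cumulative counters in /proc/interrupts. A sketch of that calculation, assuming the usual layout (a header with one CPUn column per processor, then one line per interrupt with a count per CPU followed by the controller and device names):

```python
import time
from pathlib import Path


def parse_interrupts(text):
    """Map each interrupt line of /proc/interrupts (e.g. '0', '14', 'NMI')
    to its total count summed over all CPU columns."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for token in rest.split()[:ncpu]:
            if not token.isdigit():  # reached the controller/device names
                break
            total += int(token)
        counts[label.strip()] = total
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each line between two samples."""
    return {irq: (n - before.get(irq, n)) / seconds for irq, n in after.items()}


if __name__ == "__main__" and Path("/proc/interrupts").exists():
    first = parse_interrupts(Path("/proc/interrupts").read_text())
    time.sleep(1)
    second = parse_interrupts(Path("/proc/interrupts").read_text())
    busiest = sorted(interrupt_rates(first, second, 1).items(), key=lambda kv: -kv[1])
    for irq, rate in busiest[:5]:
        print(f"{irq:>6}: {rate:10.2f} intr/s")
```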

3) Memory and swap

sar -r 4 5

Linux 2.6.19-prep (localhost.localdomain) 24/03/07

In this case, kbmemfree is the free main memory (MM); used is the main memory in use; buffers is the main memory used for buffers; cached is the main memory used for the page cache; and swpfree/swpused are the free/used swap space. Remember that, if there is no room left in main memory, process pages will end up in swap, where there must be enough space. This should be compared with CPU usage. We can also check whether the size of the buffers is appropriate for the processes that are performing I/O operations.

It is also worth looking at the free command, which shows memory usage in a simplified form:

             total       used       free     shared    buffers     cached
Mem:       1026216     729716     296500          0      24324     459980
-/+ buffers/cache:     245412     780804
Swap:       963860          0     963860

This indicates that about 70% of the 1 GB of memory is occupied and that most of it is cache. It also shows that swap is not being used at all, so we can conclude that the system is in good shape. If we wish to see more detail, we can use the vmstat command (or sar -r) to analyse what is causing problems or what is consuming so much memory. The following is the output of vmstat 1 10:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r b swpd free buff cache si so bi  bo  in    cs    us  sy  id  wa    st
 0 0 0 295896 24384 459984 0 0 321  56  1249  724   11  2   81  5     0
 0 0 0 295896 24384 459984 0 0 0    28  1179  383   1   0   99  0     0
 0 0 0 295896 24384 460012 0 0 0    0   1260  498   0   0   100 0     0
 0 0 0 295896 24384 460012 0 0 0    0   1175  342   0   0   100 0
 0 0 0 295896 24384 460012 0 0 0    0   1275  526   0   0   100 0     0
 1 0 0 295896 24392 460004 0 0 0    72  1176  356   0   0   99  1     0
 0 0 0 295896 24392 460012 0 0 0    0   1218  420   0   0   100 0     0
 0 0 0 295896 24392 460012 0 0 0    0   1216  436   0   0   100 0     0
 0 0 0 295896 24392 460012 0 0 0    0   1174  361   0   0   100 0     0
 1 0 0 295896 24392 460012 0 0 0    0   1260  492   0   0   100 0     0
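The "-/+ buffers/cache" line of the free output shown earlier is simple arithmetic on the values in /proc/meminfo: used memory minus buffers and cache gives the memory actually held by processes, and free memory plus buffers and cache gives what is available to them. The following sketch reproduces those figures (newer versions of free may compute their columns somewhat differently, so results need not match them exactly):

```python
from pathlib import Path


def parse_meminfo(text):
    """Map /proc/meminfo keys to their numeric value (kB for memory lines)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.split():
            info[key.strip()] = int(value.split()[0])
    return info


def free_summary(info):
    """Mem, '-/+ buffers/cache' and Swap figures of the older free
    output (all values in kB)."""
    used = info["MemTotal"] - info["MemFree"]
    cache = info["Buffers"] + info["Cached"]
    return {
        "used": used,
        "used_without_cache": used - cache,          # held by processes
        "free_with_cache": info["MemFree"] + cache,  # free + reclaimable cache
        "swap_used": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    print(free_summary(parse_meminfo(Path("/proc/meminfo").read_text())))
```

With the values of the example, it gives 729716 kB used, 245412 kB used without buffers/cache and 780804 kB free with buffers/cache, matching the free output.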

4) Use of the tables of the kernel

sar -v 4 5

In this case, superb-sz is the current maximum number of superblocks kept by the kernel for the mounted file systems; inode-sz is the current maximum number of in-core inodes needed by the kernel, which would be at least one per disk; file-sz is the current maximum number of open files; and dquota-sz is the current maximum number of disk quota entries in use (for the remaining options, see the sar (or atsar) man page). This monitoring can be complemented with the ps -edaflm (process status) command and the top command, which show the activity and status of the processes in the system. The following are examples of both commands (only some of the lines are shown):

ps -edaflm

..

Example 10-3. Note

See the ps and top man pages for a description of the parameters and features.

The columns reflect the values of the kernel variables for each process. The most important from a monitoring perspective are: F, the flags (for example, 1 means forked but did not exec, and 4 means used superuser privileges); S, the state (D: uninterruptible sleep, usually I/O; R: runnable, on the run queue; S: sleeping; T: traced or stopped; Z: defunct, 'zombie' process); PRI, the priority; NI, the nice value; STIME, the start time; TTY, the terminal from which it was run; TIME, the CPU time used; and CMD, the program run and its parameters. If we want a display that is refreshed periodically, we can use the top command, which shows general statistics (processes, states, load, etc.) followed by information on each process, similar to ps but updated every few seconds (the interval can be changed with the -d option):

top - 08:26:52 up 25 min,  2 users,  load average: 0.21, 0.25, 0.33
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.8%us,  2.1%sy,  0.0%ni, 82.0%id,  4.9%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:   1026216k total,   731056k used,   295160k free,    24464k buffers
Swap:   963860k total,        0k used,   963860k free,   460208k cached
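The S (state) column of ps and the Tasks line of top come from the per-process files under /proc. A minimal sketch, assuming the 2.6 /proc/<pid>/stat format, that counts processes by state; it is handy, for example, for spotting zombies (Z) or processes blocked on I/O (D):

```python
import os
from collections import Counter


def state_from_stat(text):
    """State letter of a /proc/<pid>/stat line. The command name (2nd
    field) is in parentheses and may contain spaces, so split after the
    last ')'."""
    return text[text.rindex(")") + 1:].split()[0]


def count_states(proc="/proc"):
    """Count processes per state letter (R, S, D, T, Z...)."""
    states = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states[state_from_stat(f.read())] += 1
        except OSError:  # the process exited during the scan
            continue
    return states


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    states = count_states()
    print(f"{sum(states.values())} processes:", dict(states))
```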

Debian GNU/Linux also includes a set of monitoring tools equivalent to sar, which originated in BSD UNIX and offer similar functionality through different commands: vmstat (CPU, memory and I/O statistics), iostat (disk and CPU statistics) and uptime (CPU load and general status).

10.2.2. Optimising the system

We will now look at some recommendations for optimising the system in accordance with the data obtained.

1) Resolving the problems with the main memory

We must ensure that main memory can hold a high percentage of the running processes; otherwise, the operating system will page out to swap, and the execution of those processes will deteriorate significantly. In that situation, adding memory will improve response times significantly. To estimate the requirement, we must consider the size (SIZE) of the processes in R (runnable) state and add the memory used by the kernel, which can be obtained with the dmesg command (or with free). For example, dmesg shows:

Memory: 255048k/262080k available (1423k kernel code, 6644k reserved, 466k data, 240k init, 0k highmem)

We must then compare this against the physical memory and analyse whether the system is limited by the memory (a lot of paging activity can be seen with atsar -r and -p).
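This estimate can be approximated from /proc. The sketch below, our own illustration rather than the method of any particular tool, sums the resident size (VmRSS) of the processes in a given state (R by default) so that it can be compared with MemTotal. Because shared pages are counted once per process, the total overestimates the real demand:

```python
import os


def parse_status(text):
    """Turn /proc/<pid>/status (or /proc/meminfo) text into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kb(fields):
    """Resident set size in kB; kernel threads have no VmRSS line."""
    return int(fields.get("VmRSS", "0 kB").split()[0])


def demand_kb(proc="/proc", states="R"):
    """Sum VmRSS of the processes whose state letter is in `states`."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                fields = parse_status(f.read())
        except OSError:  # the process vanished during the scan
            continue
        if fields.get("State", "?")[:1] in states:
            total += rss_kb(fields)
    return total


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    with open("/proc/meminfo") as f:
        mem_total = int(parse_status(f.read())["MemTotal"].split()[0])
    print(f"R-state demand {demand_kb()} kB of {mem_total} kB physical")
```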

The solutions to memory problems are obvious: either we increase the capacity or we reduce the demand. Given the current price of memory, it is better to increase its size than to spend many hours trying to free up just a few hundred bytes by deleting, removing, reorganising or reducing the requirements of the running processes. The requirements can be reduced by shrinking the kernel tables, removing modules, limiting the maximum number of users, reducing the buffers, etc., all of which will degrade the system (bubble effect) and make performance worse (in some cases, the system could even become completely inoperative).

Example 10-4. Note

Where should we look?

1st Memory

2nd CPU

3rd In/Out

4th TCP/IP

5th Kernel

Another option is to reduce the memory used by users' processes, by eliminating redundant processes and changing the workload. To do this, we must monitor defunct (zombie) processes and eliminate them (in practice, through their parent process, since a zombie has already terminated), as well as processes that make no progress in their I/O (checking whether they are active, how much CPU they use and whether the 'users want them'). Changing the workload means using job scheduling so that processes needing a large amount of memory run when there is little activity (for example, at night, launched with the at command).

2) Too much CPU usage

Basically, we can detect this from the idle time (low values). With ps or top, we must identify the processes that 'devour CPU' and decide whether to postpone their execution, stop them temporarily, change their priority (the least disruptive option, with the command renice priority PID), optimise the program (for next time) or change the CPU (or add another one). As mentioned, GNU/Linux uses the /proc directory to hold all the kernel configuration variables, which can be analysed and, in certain cases, 'adjusted' to achieve different or better performance.

To do this, we can use the command systune dump > /tmp/sysfile to save all the variables and their values in the /tmp/sysfile file (in other distributions, this can be done with sysctl: sysctl -a lists the variables and sysctl -p file loads them). This file can be edited, changing the corresponding variable, and then reloaded into /proc with systune -c /tmp/sysfile. If the -c option is not given, systune reads /etc/systune.conf by default. For example, we could modify (carefully, because the kernel could be left inoperative) the variables of the /proc/sys/vm (virtual memory) or /proc/sys/kernel (kernel core configuration) categories.
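The files under /proc/sys follow a simple naming rule: a sysctl key such as vm.swappiness corresponds to the file /proc/sys/vm/swappiness. The following read-only sketch (not part of systune or sysctl; the function names are hypothetical) compares a file of 'key = value' settings, in the style of /etc/sysctl.conf, with the values currently in force, a prudent step before reloading them:

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def key_to_path(key, root=PROC_SYS):
    """Map a sysctl key such as 'vm.swappiness' to /proc/sys/vm/swappiness.
    (Keys whose components contain dots, e.g. some interface names, are
    not handled by this simple mapping.)"""
    return root.joinpath(*key.split("."))


def parse_settings(text):
    """Parse 'key = value' lines (sysctl.conf style), skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def pending_changes(settings, root=PROC_SYS):
    """Return {key: (running, wanted)} for keys whose running value differs.
    Nothing is written, so this is safe to run as any user."""
    changes = {}
    for key, wanted in settings.items():
        try:
            running = " ".join(key_to_path(key, root).read_text().split())
        except OSError:  # missing, unreadable or write-only entry
            continue
        if running != " ".join(wanted.split()):
            changes[key] = (running, wanted)
    return changes


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        for key, (running, wanted) in pending_changes(parse_settings(conf.read_text())).items():
            print(f"{key}: running {running!r}, file {wanted!r}")
```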

Along the same lines, it is also possible (for experts, or people with nothing to lose) to change the maximum time slice that the operating system's CPU scheduler assigns to each process in round-robin fashion (in practice, it is advisable to use renice instead). In GNU/Linux, unlike other operating systems, this is a fixed value in the code, because it is optimised for different functions (although it can be modified). We can "play" (at our own risk) with a set of variables that control the time slice assigned to each process (kernel-source-2.x.x/kernel/sched.c).

3) Reducing the number of calls

Another practical way of improving performance is to reduce the number of the system calls that cost the most CPU time, which are usually the fork() and exec() calls made by the shell. Because exec() does not cache anything, a badly configured PATH variable, in particular one with the current directory (indicated with ./) in the wrong place, can have a negative effect on execution. Consequently, the current directory should always be the last entry in PATH. For example, in bash (or in .bashrc): export PATH=$PATH:. (only if the current directory is not already in PATH; if it is, redefine PATH so that it is the last entry).
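Checking and reordering PATH can also be automated. A small sketch (a hypothetical helper, assuming a POSIX ':'-separated PATH) that leaves PATH untouched when the current directory is absent and otherwise moves it to the end; note that an empty entry, as in /bin::/usr/bin, also means the current directory:

```python
import os


def move_cwd_last(path_value):
    """Return a PATH value with any current-directory entry ('.', './' or
    an empty entry, which also means '.') moved to the end as '.'."""
    entries = path_value.split(":")
    others = [e for e in entries if e not in (".", "./", "")]
    had_cwd = len(others) != len(entries)
    return ":".join(others + (["."] if had_cwd else []))


if __name__ == "__main__":
    print(move_cwd_last(os.environ.get("PATH", "")))
```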

A high level of interrupt activity can also reduce the CPU time available to the running processes. By monitoring with atsar -I, we can see the number of interrupts per second and make decisions about the devices causing them. For example, we could replace a modem with a smarter one, or change the communications setup if we detect excessive activity on the serial port to which it is connected.

4) Too much disk use

After memory, a slow response time may be due to the disk system. First, we must verify that CPU time is available (for example, idle > 20%) and that the number of I/O operations is high (for example, > 30 I/O operations per second), using atsar -u and atsar -d. Possible solutions are:

a) In a multi-disk system, plan where the most frequently used files are located so as to balance the traffic to them (for example, /home on one disk and /usr on another), and ensure that they can make full use of the I/O capacity with GNU/Linux caching and concurrency (including, for example, planning which IDE bus each disk will be on). Then check that the traffic is balanced using atsar -d (or iostat). In critical situations, we can consider purchasing a RAID disk system, which makes these adjustments automatically.

b) Bear in mind that better performance is achieved with two small disks than with one large disk of the same total size.

c) In systems with only one disk, four partitions are generally created, in the following order (from the outside in): /, swap, /usr, /home. However, this gives very poor I/O response times because if, for example, a user compiles from their /home/user directory and the compiler is in /usr/bin, the disk head has to keep moving back and forth across the disk. In this case, it is better to merge the /usr and /home partitions into a single (larger) one, although this may cause some inconvenience in terms of maintenance.

d) Increase the I/O cache buffers (see, for example, /proc/ide/hd...).

e) If we use an ext2 file system, we can use dumpe2fs -h /dev/hd... to obtain information on the file system and tune2fs /dev/hd... to change some of its configurable parameters.

f) Obviously, replacing the disk with a faster one (higher RPM) will always have a positive effect on a system limited by disk I/O [Maj96].
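The two checks at the beginning of this point can be combined in a small script. This sketch uses the illustrative thresholds from the text and assumes the /proc/diskstats layout of 2.6 kernels, where the fourth and eighth fields of a device line are completed reads and completed writes; it computes I/O operations per second per device:

```python
import time
from pathlib import Path


def io_counts(text):
    """Map device -> completed reads + writes from /proc/diskstats.
    Lines with fewer than 14 fields (partition lines on some older 2.6
    kernels) are skipped."""
    counts = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14:
            counts[f[2]] = int(f[3]) + int(f[7])
    return counts


def busy_disks(idle_pct, before, after, seconds, idle_min=20.0, iops_min=30.0):
    """Rule of thumb from the text: suspect the disks only if the CPU is
    more than idle_min % idle; then report every device doing more than
    iops_min I/O operations per second."""
    if idle_pct <= idle_min:
        return {}
    rates = {dev: (n - before.get(dev, n)) / seconds for dev, n in after.items()}
    return {dev: r for dev, r in rates.items() if r > iops_min}


if __name__ == "__main__" and Path("/proc/diskstats").exists():
    first = io_counts(Path("/proc/diskstats").read_text())
    time.sleep(1)
    second = io_counts(Path("/proc/diskstats").read_text())
    # Placeholder idle value: in practice take it from atsar -u (or sar -u).
    print(busy_disks(50.0, first, second, 1))
```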

5) Improving TCP/IP aspects.

Examine the network with the atsar command (or with netstat -i or netstat -s | more) to see whether there are fragmented packets, errors, drops, overflows etc. that may be affecting communications and, consequently, the system (for example, an NFS, NIS, FTP or Web server). If any problems are detected, we can analyse the network and consider the following actions:

a) Segmenting the network with active elements (such as switches or routers) that discard faulty packets or packets not addressed to machines in the segment.

b) Planning where the servers will be to reduce the traffic to them and the access times.

c) Adjusting kernel parameters (/proc/sys/net/); for example, to improve throughput, type:

echo 600 > /proc/sys/net/core/netdev_max_backlog (300 by default).
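Besides atsar and netstat, the per-interface counters are available in /proc/net/dev. A sketch, assuming the usual two header lines and 16 counters per interface (8 for receive, 8 for transmit), that lists the interfaces with errors or drops:

```python
from pathlib import Path


def net_problems(text):
    """From /proc/net/dev text, return {interface: counters} for the
    interfaces with non-zero receive/transmit errors or drops."""
    problems = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, sep, data = line.partition(":")
        if not sep:
            continue
        v = [int(x) for x in data.split()]
        # Receive: bytes packets errs drop fifo frame compressed multicast,
        # then transmit: bytes packets errs drop fifo colls carrier compressed.
        counters = {"rx_errs": v[2], "rx_drop": v[3],
                    "tx_errs": v[10], "tx_drop": v[11]}
        if any(counters.values()):
            problems[iface.strip()] = counters
    return problems


if __name__ == "__main__" and Path("/proc/net/dev").exists():
    print(net_problems(Path("/proc/net/dev").read_text()) or "no errors or drops")
```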

6) Other actions on the parameters of the kernel.

There is another set of kernel parameters that can be tuned to obtain better performance although, in view of the points we have discussed, this should be done with care, as we could cause the opposite effect or disable the system. In the distribution's kernel source code, see the vm.txt, fs.txt, kernel.txt and sunrpc.txt files in kernel-source-2.4.18/Documentation/sysctl:

a) /proc/sys/vm: controls the system's virtual memory (VM). Virtual memory allows processes that do not fit in main memory to be accepted by the system, with part of them kept on the swap device, so the programmer has no practical limit on the size of their program (although, obviously, it must be smaller than the swap device). The tunable parameters can be changed very easily with gpowertweak.

b) /proc/sys/fs: the kernel-FS interaction parameters can be adjusted, such as file-max.

c) Likewise, the parameters in /proc/sys/kernel and /proc/sys/sunrpc.

7) Generating the kernel appropriate to our needs.

Optimising the kernel means choosing the compilation parameters according to our needs. It is very important to first read the README file in /usr/src/linux. A good kernel configuration will make it run faster, leave more memory for user processes and make the overall system more stable. There are two ways of building a kernel: monolithic (better performance) or modular (based on modules, which gives more portability if we have very heterogeneous systems and do not wish to compile a kernel for each of them). Each distribution has its own rules for compiling a kernel adapted to your hardware and needs (although the procedure is similar).

8) The following articles are very interesting:

http://people.redhat.com/alikins/system_tuning.html for information on optimising and tuning Linux server systems.

http://www.linuxjournal.com/article.php?sid=2396 Performance Monitoring Tools for Linux; although this is an old article and some options are not available, the methodology still stands.

10.2.3. General optimisations

There is a series of general optimisations that can improve the system's performance:

1) Static or dynamic libraries: a program can be compiled with a static library (libr.a), whose code is included in the executable, or with a dynamic library (libr.so.xx.x), which is loaded at run time. Although the former produces more portable and secure code, it consumes more memory. The programmer must decide which option suits their program, by including -static in the compiler options (omitting it means dynamic linking) or --disable-shared when running the configure command. It is advisable to use (almost all new distributions do so) the standard libraries libc.a and libc.so of version 2.2.x or higher (known as Libc6), which replace the older ones.

2) Selecting the appropriate processor: generate executable code for the architecture on which the applications will run. Some of the most influential compiler options are -march (for example, -march=i686 or -march=k6, simply by typing gcc -march=i686), the optimisation level -O1, -O2 or -O3 (-O3 usually generates the fastest version of the program, e.g. gcc -O3 -march=i686) and the -f options (see the gcc documentation for the different types).

3) Disk optimisation: currently, most computers include UltraDMA (100) disks by default; however, in many cases they are not configured to provide the best performance. The hdparm tool can be used to tune the kernel's settings for IDE disks. We must be careful with this tool, especially with UltraDMA disks (check the BIOS to ensure that DMA support is enabled), as it can render the disk unusable. Check the references and documentation ([Mou01] and man hdparm) to see which optimisations are the most important (and the risks involved), for example: -c3, -d1, -X34, -X66, -X12, -X68, -mXX, -a16, -u1, -W1, -k1, -K1. Each option is a different form of optimisation and some are very high-risk, so we must know the disk very well. To view the current settings and measure performance, we can use hdparm -vtT /dev/hdX (where X is the disk being optimised), and the hdparm call with all the chosen parameters can be placed in a script in /etc/init.d so that it is applied at boot.

10.2.4. Additional configurations

There are further complementary configurations, related more to security than to optimisation, which are mostly necessary when the system is connected to an intranet or to the Internet. These configurations involve the following actions [Mou01]:

a) Disable booting from other media or operating systems: if someone has physical access to the machine, they could start up another preconfigured operating system and modify the existing one, so we should enter the computer's BIOS settings to disable booting from floppies or CD-ROMs and set a password (remember the BIOS password, or you may have problems when you want to change the configuration).

b) Network configuration: it is advisable to disconnect from the network while adjusting the system. You can unplug the cable, disable networking with /etc/init.d/networking stop (use start to reactivate it), or disable a specific device with ifdown eth0 (use ifup eth0 to enable it).

c) Modify the files in /etc/security according to the system's usage and security needs. For example, access.conf controls who can log in to the system.

We should also configure group.conf, to control which groups users are given and under what conditions, and set maximum limits in limits.conf (CPU time, I/O, etc.) to prevent, for example, DoS attacks.

d) Keep the root password secure: use at least 6 characters, including at least one capital letter or a symbol such as '.-_,', and do not use trivial passwords. It is also advisable to enable password expiry, to force yourself to change it regularly, and to limit the number of times an incorrect password can be entered. We will also have to change the min=x parameter in the entry in /etc/pam.d/passwd to set the minimum number of characters in passwords (x being the number of characters).

e) Do not log in as root: create an account such as sysadm and work with it. If you access the system remotely, always use ssh to connect as sysadm and, if necessary, run su - to work as root.

f) Set the maximum inactivity time: set the TMOUT variable, to 360 for example (in seconds), which is the maximum idle time the shell will allow before closing the session; it can be set in the shell configuration files (for example, /etc/profile, ~/.bashrc...). If we are using graphical environments (KDE, Gnome, etc.), enable password protection for the screensaver.

g) Configure NFS in restricted mode: in /etc/exports, export only what is necessary, without using wildcards, allowing only read access and not allowing write access by root, for example: /directory_exported host.domain.com(ro,root_squash). There must be no space between the host name and the options in parentheses; otherwise, the options apply to all hosts.

h) Avoid booting from lilo (or grub) with parameters: the system may be booted as linux single, which starts the operating system in single-user mode. Configure the system so that the password is always required when booting in this mode. To do this, verify that the /etc/inittab file contains the line S:wait:/sbin/sulogin and that /sbin/sulogin is enabled. In addition, /etc/lilo.conf must have permissions such that no one except root can modify it (chmod 600 /etc/lilo.conf). To avoid accidental changes, set the immutable attribute with chattr +i /etc/lilo.conf (use chattr -i when you need to change it). This file permits a series of options that should be considered: timeout (or booting immediately, if the system only has one operating system), and restricted, which prevents anyone from passing commands at boot time, such as linux init=/bin/sh, to gain unauthorised root access; with restricted, the password is requested only when boot parameters are given. If only password is set, the password will be requested every time the kernel image is loaded. Grub has similar options.

i) Ctrl-Alt-Delete key combination: to prevent anyone from restarting the machine from the keyboard, comment out (insert # in the first column of) the following line in /etc/inittab:

ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -r now

Activate the changes with telinit q.

j) Do not admit unplanned services: make the /etc/services file immutable with chattr +i /etc/services, so that services that have not been contemplated cannot be added.

k) Root logins: modify /etc/securetty, which lists the TTYs and VCs (virtual consoles) on which root can log in, leaving only one of each, for example, tty1 and vc/1; if root access is needed from elsewhere, connect as sysadm and run su.

l) Eliminate user accounts that are not in use: delete the users and groups that are not needed, including those that come by default (for example, operator, shutdown, ftp, uucp, games...), and keep only the necessary ones (root, bin, daemon, sync, nobody, sysadm) and those created when installing packages or with commands (do the same with /etc/group). If the system is critical, we might consider making the /etc/passwd, /etc/shadow, /etc/group and /etc/gshadow files immutable (chattr +i file) to prevent their modification (be careful with this operation, because passwords cannot be changed afterwards until the attribute is removed).

m) Mount partitions restrictively: in /etc/fstab, use options such as nosuid (SUID and SGID bits are ignored on the partition), nodev (character and block device files on the partition are not interpreted) and noexec (files on the partition cannot be executed). For example:

/tmp /tmp ext2 defaults,nosuid,noexec 0 0

It is also advisable to mount the /boot on a separate partition and with read-only attributes.

n) Various protections: change the permissions of the files in /etc/init.d (system services) to 700, so that only root can modify, start or stop them, and modify /etc/issue and /etc/issue.net so that they do not reveal any information (operating system, version...) when someone connects through telnet, ssh, etc.

o) SUID and SGID: a user can execute a command with the privileges of its owner (or group) if the command has the SUID (or SGID) bit set, which is shown as an 's' in the permissions: SUID (-rwsr-xr-x) and SGID (-r-xr-sr-x). Therefore, the bit should be removed (chmod a-s file) from commands that do not need it. These files can be found with:

find / -type f \( -perm -4000 -o -perm -2000 \) -print

We must be careful when removing the SUID/SGID bit from files, because the corresponding commands could stop working.

p) Suspicious files: regularly check for files with unusual names, hidden files, or files without a valid uid/gid, such as '...' (three dots), '.. ' (dot dot space) or '..^G'. To do so, use:

find / -name ".*" -print | cat -v

or otherwise:

find / -name "..*" -print | cat -v

To search for files with non-valid uids/gids, use find / \( -nouser -o -nogroup \) -print (be careful, because some packages are installed with a user or group that does not exist on the system, and the administrator will then have to change the owner).
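Both searches can be wrapped in functions that scan one filesystem at a time. This is a sketch; the function names are hypothetical, and the pattern '..*' is chosen to cover the examples above ('...', '.. ', '..^G'):

```shell
#!/bin/sh
# Hypothetical helpers for item p).

odd_names() {
    # $1: directory to scan. Names starting with ".."; cat -v makes control
    # characters visible (a BEL character is shown as ^G).
    find "$1" -xdev -name '..*' -print 2>/dev/null | cat -v
}

orphan_files() {
    # $1: directory to scan. Files whose owner or group has no entry in the
    # user/group database.
    find "$1" -xdev \( -nouser -o -nogroup \) -print 2>/dev/null
}
```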

q) Connection without password: do not allow .rhosts files for any user unless it is strictly necessary (we recommend using ssh with public-key authentication instead of methods based on .rhosts).
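A sketch for locating existing .rhosts files follows (the function name is hypothetical); remember that root's home, /root, is normally outside /home and must be checked too. The commented lines show the usual client-side steps for setting up public-key login with OpenSSH, where user@server is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: list .rhosts files under a base directory (e.g. /home)
# so they can be reviewed and removed.
find_rhosts() {
    # $1: base directory holding the user home directories
    find "$1" -xdev -name .rhosts -type f -print 2>/dev/null
}

# Public-key login, run by the user on the client machine:
#   ssh-keygen -t rsa          # creates the key pair in ~/.ssh
#   ssh-copy-id user@server    # appends the public key to the server's
#                              # ~/.ssh/authorized_keys
```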

r) X Display Manager: modify the file /etc/X11/xdm/Xaccess to specify which hosts may connect through XDM, so that not just any host can obtain a login screen.

10.2.5. Monitoring

There are two very useful tools for monitoring the system: Munin and Monit. Munin produces graphs of different server parameters (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic etc.) without much configuration, whereas Monit checks the availability of services such as Apache, MySQL and Postfix and can take actions such as restarting a service that is not running. Combined, they provide graphs that help us recognise where problems arise and what is causing them.

Let's say that our system is called pirulo.org and that our web page is configured as www.pirulo.org, with its documents in /var/www/pirulo.org/web. To install Munin on Debian Sarge, we can execute, for example, apt-get install munin munin-node.

We must then configure munin (/etc/munin/munin.conf) with:

dbdir /var/lib/munin
htmldir /var/www/pirulo.org/web/monitoring
logdir /var/log/munin
rundir /var/run/munin
tmpldir /etc/munin/templates

[pirulo.org]
    address 127.0.0.1
    use_node_name yes

Next, we create the directory, change its owner and restart the service:

mkdir -p /var/www/pirulo.org/web/monitoring
chown munin:munin /var/www/pirulo.org/web/monitoring
/etc/init.d/munin-node restart

After a few minutes we will be able to see the first results in http://www.pirulo.org/monitoring/ in the browser. For example, two graphs (load and memory) are shown below.

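If you wish to keep the graphs private, all you have to do is protect access to the directory with a password through Apache (this requires Apache to allow .htaccess authentication directives for that directory, for example with AllowOverride AuthConfig). For example, we can save a .htaccess file with the following contents in the directory /var/www/pirulo.org/web/monitoring: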

AuthType Basic
AuthName "Members Only"
AuthUserFile /var/www/pirulo.org/.htpasswd
<Limit GET PUT POST>
require valid-user
</Limit>

We must then create the password file /var/www/pirulo.org/.htpasswd with the following command (as root):

htpasswd -c /var/www/pirulo.org/.htpasswd admin

When we connect to www.pirulo.org/monitoring, we will now be asked for the username (admin) and the password that we entered when running the preceding command.

To install Monit, we execute apt-get install monit and edit /etc/monit/monitrc. The default file includes a set of examples, and more can be obtained from http://www.tildeslash.com/monit/doc/examples.php. For example, to monitor proftpd, sshd, mysql, apache and postfix, and to enable Monit's web interface on port 3333, we can write the following in monitrc:

set daemon 60
set logfile syslog facility log_daemon
set mailserver localhost
set mail-format { from: monit@pirulo.org }
set alert root@localhost
set httpd port 3333 and
    allow admin:test

check process proftpd with pidfile /var/run/proftpd.pid
    start program = "/etc/init.d/proftpd start"
    stop program = "/etc/init.d/proftpd stop"
    if failed port 21 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout

check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/apache2.pid
    group www
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if failed host www.pirulo.org port 80 protocol http
        and request "/monit/token" then restart
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 500 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if 3 restarts within 5 cycles then timeout

check process postfix with pidfile /var/spool/postfix/pid/master.pid
    group mail
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout
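Each check process entry starts from a pid file: Monit reads the process number stored there and checks that this process is running. The following sketch reproduces only that first step, for illustration and only on Linux (it looks for /proc/<pid>); it does not reproduce Monit's own implementation, and the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the process whose pid is stored in a pid
# file is running (checked through /proc/<pid>, so Linux only).
pid_alive() {
    # $1: pid file, e.g. /var/run/sshd.pid
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    [ -d "/proc/$pid" ]
}
```

For example, pid_alive /var/run/sshd.pid || /etc/init.d/ssh start would start sshd again if its process has disappeared; this is a crude approximation of what Monit automates, without the port and protocol tests.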

For more details, consult the manual at http://www.tildeslash.com/monit/doc/manual.php. To check that the Apache server is working, the configuration above includes the test if failed host www.pirulo.org port 80 protocol http and request "/monit/token" then restart, with which Monit requests the file /monit/token from the server. If the request fails, Monit assumes that Apache is not working, so this file must exist (mkdir /var/www/pirulo.org/web/monit; echo "pirulo" > /var/www/pirulo.org/web/monit/token). It is also possible to configure Monit to work over SSL (see http://www.howtoforge.com/server_monitoring_monit_munin_p2).

Finally, we must edit /etc/default/monit to enable Monit, setting startup=1 and, for example, CHECK_INTERVALS=60 (in seconds). If we start Monit (/etc/init.d/monit start) and connect to http://www.pirulo.org:3333, we will see a screen similar to:

There are more sophisticated tools for monitoring the network and network services, such as those based on the Simple Network Management Protocol (SNMP) and the Multi Router Traffic Grapher (MRTG). More information on this subject can be found at http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch22_:_Monitoring_Server_Performance.

MRTG (http://oss.oetiker.ch/mrtg/) was created basically to graph network data, but it can also graph other data to show how they evolve, for example to generate load average statistics for the server. For this, we use the mrtg and atsar packages. Once they are installed, we configure the /etc/mrtg.cfg file:

WorkDir: /var/www/mrtg
Target[average]: `/usr/local/bin/cpu-load/average`
MaxBytes[average]: 1000
Options[average]: gauge, nopercent, growright, integer
YLegend[average]: Load average
kMG[average]: ,,
ShortLegend[average]:
Legend1[average]: Load average x 100
LegendI[average]: load:
LegendO[average]:
Title[average]: Load average x 100 for pirulo.org
PageTop[average]: <H1>Load average x 100 for pirulo.org</H1>
  <TABLE>
  <TR><TD>System:</TD> <TD>pirulo.org</TD></TR>
  <TR><TD>Maintainer:</TD> <TD>webmaster@pirulo.org</TD></TR>
  <TR><TD>Max used:</TD> <TD>1000</TD></TR>
  </TABLE>

To generate the data with atsar (or sar), we create a script /usr/local/bin/cpu-load/average (which must be executable by all users) that passes the data to MRTG:

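#!/bin/sh
# run atsar and keep field 10 of the last line of its output
load=$(/usr/bin/atsar -u 1 | tail -n 1 | awk '{print $10}')
value=$(echo "$load * 100" | bc | awk -F"." '{print $1}')
# MRTG reads four lines: first value, second value, uptime and target name
printf '%s\n%s\n%s\n%s\n' "$value" "$value" "$(uptime)" "$(hostname)"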

We must create the directory /var/www/mrtg and set its permissions. By default, mrtg is run from cron, but if we want to run it manually, we can execute mrtg /etc/mrtg.cfg, which generates the graphs and the page /var/www/mrtg/average.html, which we can view in the browser at http://www.pirulo.org/mrtg/average.html.
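As an alternative that does not depend on atsar, the load average can be read from /proc/loadavg, which is specific to Linux. The sketch below (function names hypothetical) multiplies the 1-minute value by 100, rounding to an integer, and prints the four lines that MRTG expects from an external Target script: two values, an uptime string and the target name.

```shell
#!/bin/sh
# Hypothetical MRTG external target: 1-minute load average x 100, read
# from /proc/loadavg (Linux only) instead of calling atsar.
load_x100() {
    # $1: file in /proc/loadavg format; the first field is the 1-minute load.
    # Adding 0.5 before truncating rounds to the nearest integer.
    awk '{ printf "%d\n", $1 * 100 + 0.5 }' "${1:-/proc/loadavg}"
}

mrtg_output() {
    value=$(load_x100 "$1")
    # MRTG reads four lines: first value, second value, uptime, target name.
    # The same value is given twice because only one variable is graphed.
    printf '%s\n%s\n%s\n%s\n' "$value" "$value" \
        "$(awk '{ printf "up %d seconds", $1 }' /proc/uptime)" "$(uname -n)"
}
```

To use it as the Target script, a final line calling mrtg_output would be appended to the file.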

Other interesting packages that should be taken into account when monitoring a system are:

We will now describe other, no less interesting, tools (in alphabetical order) that GNU/Linux distributions (for example Debian) incorporate for monitoring the system. This is not an exhaustive list, simply a selection of the most commonly used ones (we recommend reading the man page of each tool for more information):

The following figure shows the interfaces of ksensors, gkrellm and xosview, which present the results from the monitoring process in real time.

Below are the graphical interfaces of isag and gtop. isag displays the information collected by sysstat (scheduled in /etc/cron.d/sysstat) through the sa1 and sa2 commands, in this case accumulated over the day, whilst gtop shows one of its possible displays, with the process location, memory and additional CPU information.