14.6. Putting It All Together

To better understand what the driver model does, let us walk through the steps of a device's lifecycle within the kernel. We describe how the PCI subsystem interacts with the driver model, the basic concepts of how a driver is added and removed, and how a device is added and removed from the system. These details, while describing the PCI kernel code specifically, apply to all other subsystems that use the driver core to manage their drivers and devices.

The interaction between the PCI core, driver core, and the individual PCI drivers is quite complex, as Figure 14-3 shows.

Figure 14-3. Device-creation process

14.6.1. Add a Device

The PCI subsystem declares a single struct bus_type called pci_bus_type, which is initialized with the following values:

struct bus_type pci_bus_type = {
    .name      = "pci",
    .match     = pci_bus_match,
    .hotplug   = pci_hotplug,
    .suspend   = pci_device_suspend,
    .resume    = pci_device_resume,
    .dev_attrs = pci_dev_attrs,
};

This pci_bus_type variable is registered with the driver core when the PCI subsystem is loaded in the kernel with a call to bus_register. When that happens, the driver core creates the sysfs directory /sys/bus/pci, which contains two directories: devices and drivers.

All PCI drivers must define a struct pci_driver variable that describes the operations this PCI driver supports (for more information about the PCI subsystem and how to write a PCI driver, see Chapter 12). That structure contains a struct device_driver that is then initialized by the PCI core when the PCI driver is registered:

/* initialize common driver fields */
drv->driver.name = drv->name;
drv->driver.bus = &pci_bus_type;
drv->driver.probe = pci_device_probe;
drv->driver.remove = pci_device_remove;
drv->driver.kobj.ktype = &pci_driver_kobj_type;

This code points the driver's bus at pci_bus_type and points its probe and remove functions at functions within the PCI core. The ktype for the driver's kobject is set to the variable pci_driver_kobj_type so that the PCI driver's attribute files work properly. Then the PCI core registers the PCI driver with the driver core:

/* register with core */
error = driver_register(&drv->driver);

The driver is now ready to be bound to any PCI devices it supports.

The PCI core, with help from the architecture-specific code that actually talks to the PCI bus, starts probing the PCI address space, looking for all PCI devices. When a PCI device is found, the PCI core creates a new variable in memory of type struct pci_dev. A portion of the struct pci_dev structure looks like the following:

struct pci_dev {
    /* ... */
    unsigned int   devfn;
    unsigned short vendor;
    unsigned short device;
    unsigned short subsystem_vendor;
    unsigned short subsystem_device;
    unsigned int   class;
    /* ... */
    struct pci_driver *driver;
    /* ... */
    struct device dev;
    /* ... */
};

The bus-specific fields of this PCI device are initialized by the PCI core (the devfn, vendor, device, and other fields), and the struct device variable's parent variable is set to the PCI bus device that this PCI device lives on. The bus variable is set to point at the pci_bus_type structure. Then the name and bus_id variables are set, depending on the name and ID that are read from the PCI device.

After the PCI device structure is initialized, the device is registered with the driver core with a call to:

device_register(&dev->dev);

Within the device_register function, the driver core initializes a number of the device's fields, registers the device's kobject with the kobject core (which causes a hotplug event to be generated, but we discuss that later in this chapter), and then adds the device to the list of devices that are held by the device's parent. This is done so that all devices can be walked in the proper order, always knowing where in the hierarchy of devices each one lives.

The device is then added to the bus-specific list of all devices, in this example, the pci_bus_type list. Then the list of all drivers that are registered with the bus is walked, and the match function of the bus is called for every driver, specifying this device. For the pci_bus_type bus, the match function was set to point to the pci_bus_match function by the PCI core before the device was submitted to the driver core.

The pci_bus_match function casts the struct device that was passed to it by the driver core, back into a struct pci_dev. It also casts the struct device_driver back into a struct pci_driver and then looks at the PCI device-specific information of the device and driver to see if the driver states that it can support this kind of device. If the match is not successful, the function returns 0 back to the driver core, and the driver core moves on to the next driver in its list.

If the match is successful, the function returns 1 back to the driver core. This causes the driver core to set the driver pointer in the struct device to point to this driver, and then it calls the probe function that is specified in the struct device_driver.
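The cast-and-compare step performed by the match function can be illustrated with a small user-space model. This is only a sketch: the structures below are heavily simplified stand-ins for the kernel's, the container_of macro is written out by hand, and model_pci_bus_match and the reduced two-field ID table are names and simplifications assumed for illustration, not the PCI core's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct device { const char *bus_id; };
struct device_driver { const char *name; };

struct pci_device_id { unsigned short vendor, device; };

struct pci_dev {
    unsigned short vendor;
    unsigned short device;
    struct device dev;                      /* embedded generic device */
};

struct pci_driver {
    const char *name;
    const struct pci_device_id *id_table;   /* ends with an all-zero entry */
    struct device_driver driver;            /* embedded generic driver */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define to_pci_dev(d)    container_of(d, struct pci_dev, dev)
#define to_pci_driver(d) container_of(d, struct pci_driver, driver)

/* Returns 1 if the driver's ID table lists this device, 0 otherwise. */
int model_pci_bus_match(struct device *dev, struct device_driver *drv)
{
    struct pci_dev *pdev = to_pci_dev(dev);
    struct pci_driver *pdrv = to_pci_driver(drv);
    const struct pci_device_id *id;

    for (id = pdrv->id_table; id->vendor || id->device; id++)
        if (id->vendor == pdev->vendor && id->device == pdev->device)
            return 1;
    return 0;
}
```

The driver core only ever sees the generic struct device and struct device_driver pointers; because those are embedded inside the PCI-specific structures, the bus code can step back out to the enclosing structure without any extra lookup.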

Earlier, before the PCI driver was registered with the driver core, the probe variable was set to point at the pci_device_probe function. This function casts (yet again) the struct device back into a struct pci_dev and the struct device_driver that is set in the device back into a struct pci_driver. It again verifies that this driver states that it can support this device (which seems to be a redundant extra check for some unknown reason), increments the reference count of the device, and then calls the PCI driver's probe function with a pointer to the struct pci_dev structure it should bind to.

If the PCI driver's probe function determines that it cannot handle this device for some reason, it returns a negative error value, which is propagated back to the driver core and causes it to continue looking through the list of drivers to match one up with this device. If the probe function can claim the device, it does all the initialization that it needs to do to handle the device properly, and then it returns 0 back up to the driver core. This causes the driver core to add the device to the list of all devices currently bound by this specific driver and creates a symlink within the driver's directory in sysfs to the device that it is now controlling. This symlink allows users to see exactly which devices are bound to which drivers. This can be seen as:

$ tree /sys/bus/pci
/sys/bus/pci/
|-- devices
|   |-- 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
|   |-- 0000:00:00.1 -> ../../../devices/pci0000:00/0000:00:00.1
|   |-- 0000:00:00.2 -> ../../../devices/pci0000:00/0000:00:00.2
|   |-- 0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
|   |-- 0000:00:04.0 -> ../../../devices/pci0000:00/0000:00:04.0
|   |-- 0000:00:06.0 -> ../../../devices/pci0000:00/0000:00:06.0
|   |-- 0000:00:07.0 -> ../../../devices/pci0000:00/0000:00:07.0
|   |-- 0000:00:09.0 -> ../../../devices/pci0000:00/0000:00:09.0
|   |-- 0000:00:09.1 -> ../../../devices/pci0000:00/0000:00:09.1
|   |-- 0000:00:09.2 -> ../../../devices/pci0000:00/0000:00:09.2
|   |-- 0000:00:0c.0 -> ../../../devices/pci0000:00/0000:00:0c.0
|   |-- 0000:00:0f.0 -> ../../../devices/pci0000:00/0000:00:0f.0
|   |-- 0000:00:10.0 -> ../../../devices/pci0000:00/0000:00:10.0
|   |-- 0000:00:12.0 -> ../../../devices/pci0000:00/0000:00:12.0
|   |-- 0000:00:13.0 -> ../../../devices/pci0000:00/0000:00:13.0
|   `-- 0000:00:14.0 -> ../../../devices/pci0000:00/0000:00:14.0
`-- drivers
    |-- ALI15x3_IDE
    |   `-- 0000:00:0f.0 -> ../../../../devices/pci0000:00/0000:00:0f.0
    |-- ehci_hcd
    |   `-- 0000:00:09.2 -> ../../../../devices/pci0000:00/0000:00:09.2
    |-- ohci_hcd
    |   |-- 0000:00:02.0 -> ../../../../devices/pci0000:00/0000:00:02.0
    |   |-- 0000:00:09.0 -> ../../../../devices/pci0000:00/0000:00:09.0
    |   `-- 0000:00:09.1 -> ../../../../devices/pci0000:00/0000:00:09.1
    |-- orinoco_pci
    |   `-- 0000:00:12.0 -> ../../../../devices/pci0000:00/0000:00:12.0
    |-- radeonfb
    |   `-- 0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
    |-- serial
    `-- trident
        `-- 0000:00:04.0 -> ../../../../devices/pci0000:00/0000:00:04.0
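The walk over the bus's driver list described in this section can be modeled in user space roughly as follows. It is a sketch under stated assumptions: the structures are minimal stand-ins, bound_count stands in for the driver's list of bound devices, and model_device_attach together with the three example callbacks are hypothetical names, not driver core functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device;

struct device_driver {
    const char *name;
    int (*probe)(struct device *dev);   /* 0 on success, negative on failure */
    int bound_count;                    /* stands in for the bound-device list */
};

struct device {
    struct device_driver *driver;       /* NULL while unbound */
};

struct bus_type {
    int (*match)(struct device *dev, struct device_driver *drv);
    struct device_driver **drivers;     /* drivers registered with this bus */
    size_t ndrivers;
};

/* Try each registered driver in turn; stop at the first one that claims dev. */
int model_device_attach(struct bus_type *bus, struct device *dev)
{
    size_t i;

    for (i = 0; i < bus->ndrivers; i++) {
        struct device_driver *drv = bus->drivers[i];

        if (!bus->match(dev, drv))
            continue;                   /* driver does not support this device */

        dev->driver = drv;
        if (drv->probe(dev) == 0) {
            drv->bound_count++;         /* the real core also creates the symlink */
            return 1;
        }
        dev->driver = NULL;             /* probe declined: keep looking */
    }
    return 0;                           /* no driver claimed the device */
}

/* Hypothetical callbacks: every driver matches, but one declines in probe. */
int match_any(struct device *dev, struct device_driver *drv)
{
    (void)dev; (void)drv;
    return 1;
}

int probe_decline(struct device *dev) { (void)dev; return -ENODEV; }
int probe_accept(struct device *dev)  { (void)dev; return 0; }
```

With one driver using probe_decline and a second using probe_accept, the device ends up bound to the second driver, which mirrors how a negative return from probe sends the driver core on to the next driver in the list.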

14.6.2. Remove a Device

A PCI device can be removed from a system in a number of different ways. All CardBus devices are really PCI devices in a different physical form factor, and the kernel PCI core does not differentiate between them. Systems that allow the removal or addition of PCI devices while the machine is still running are becoming more popular, and Linux supports them. There is also a fake PCI Hotplug driver that allows developers to test whether their PCI driver properly handles the removal of a device while the system is running. This module is called fakephp and causes the kernel to think the PCI device is gone, but it does not allow users to physically remove a PCI device from a system that does not have the proper hardware to do so. See the documentation with this driver for more information on how to use it to test your PCI drivers.

The PCI core exerts a lot less effort to remove a device than it does to add it. When a PCI device is to be removed, the pci_remove_bus_device function is called. This function does some PCI-specific cleanups and housekeeping, and then calls the device_unregister function with a pointer to the struct pci_dev's struct device member.

In the device_unregister function, the driver core merely unlinks the sysfs files from the driver bound to the device (if there was one), removes the device from its internal list of devices, and calls kobject_del with a pointer to the struct kobject that is contained in the struct device structure. That function makes a hotplug call to user space stating that the kobject is now removed from the system, and then it deletes all sysfs files associated with the kobject and the sysfs directory itself that the kobject had originally created.

The kobject_del function also removes the kobject reference of the device itself. If that reference was the last one (meaning no user-space files were open for the sysfs entry of the device), then the release function for the PCI device itself, pci_release_dev, is called. That function merely frees up the memory that the struct pci_dev took up.
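The "last reference triggers release" behavior can be sketched with a tiny user-space reference counter. The names here (model_kobject, model_pci_dev, model_pci_release_dev, release_calls) are assumptions made for illustration, and the counter is a plain int, whereas the kernel uses an atomic count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_kobject {
    int refcount;                                   /* not atomic in this model */
    void (*release)(struct model_kobject *kobj);    /* runs when count hits 0 */
};

struct model_pci_dev {
    unsigned short vendor, device;
    struct model_kobject kobj;                      /* embedded, as in struct device */
};

int release_calls;                                  /* lets a caller observe the release */

void model_kobject_init(struct model_kobject *kobj,
                        void (*release)(struct model_kobject *))
{
    kobj->refcount = 1;                             /* the creator's own reference */
    kobj->release = release;
}

struct model_kobject *model_kobject_get(struct model_kobject *kobj)
{
    kobj->refcount++;
    return kobj;
}

void model_kobject_put(struct model_kobject *kobj)
{
    if (--kobj->refcount == 0)
        kobj->release(kobj);                        /* last reference is gone */
}

/* Release callback: frees the structure that embeds the kobject. */
void model_pci_release_dev(struct model_kobject *kobj)
{
    struct model_pci_dev *pdev = (struct model_pci_dev *)
        ((char *)kobj - offsetof(struct model_pci_dev, kobj));

    release_calls++;
    free(pdev);
}
```

If something else still holds a reference when the device is removed (an open sysfs file, for example), the put performed during removal leaves the count above zero, and the memory is freed only when that last holder drops its reference.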

After this, all sysfs entries associated with the device are removed, and the memory associated with the device is released. The PCI device is now totally removed from the system.

14.6.3. Add a Driver

A PCI driver is added to the PCI core when it calls the pci_register_driver function. This function merely initializes the struct device_driver structure that is contained within the struct pci_driver structure, as previously mentioned in the section about adding a device. Then the PCI core calls the driver_register function in the driver core with a pointer to the struct device_driver structure contained in the struct pci_driver structure.

The driver_register function initializes a few locks in the struct device_driver structure, and then calls the bus_add_driver function. This function does the following steps:

Looks up the bus that the driver is to be associated with. If this bus is not found, the function returns immediately.
Creates the driver's sysfs directory, based on the name of the driver and the bus that it is associated with.
Grabs the bus's internal lock and then walks all devices that have been registered with the bus, calling the match function for each of them, just like when a new device is added. If that match function succeeds, the rest of the binding process occurs, as described in the previous section.
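The last of these steps, walking the bus's devices when a new driver arrives, might be modeled like this in user space. The sketch makes several assumptions: the structures are minimal stand-ins, matching is reduced to comparing a single hypothetical kind value, locking and the probe call are omitted, and devices that already have a driver are skipped.

```c
#include <assert.h>
#include <stddef.h>

struct device_driver;

struct device {
    int kind;                       /* hypothetical device identifier */
    struct device_driver *driver;   /* NULL while unbound */
};

struct device_driver {
    int handles_kind;               /* the one kind this driver supports */
    int bound_count;                /* stands in for the bound-device list */
};

struct bus_type {
    struct device **devices;        /* devices registered with this bus */
    size_t ndevices;
};

/* Stand-in for the bus's match function. */
int model_match(struct device *dev, struct device_driver *drv)
{
    return dev->kind == drv->handles_kind;
}

/* Offer every unbound device on the bus to the newly added driver.
 * Returns the number of devices that were bound. */
int model_driver_attach(struct bus_type *bus, struct device_driver *drv)
{
    size_t i;
    int bound = 0;

    for (i = 0; i < bus->ndevices; i++) {
        struct device *dev = bus->devices[i];

        if (dev->driver)
            continue;               /* assumed: bound devices are left alone */
        if (model_match(dev, drv)) {
            dev->driver = drv;      /* the real core would call probe here */
            drv->bound_count++;
            bound++;
        }
    }
    return bound;
}
```

This is the mirror image of adding a device: there, one device is offered to every driver; here, one driver is offered every device.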

14.6.4. Remove a Driver

Removing a driver is a very simple action. For a PCI driver, the driver calls the pci_unregister_driver function. This function merely calls the driver core function driver_unregister, with a pointer to the struct device_driver portion of the struct pci_driver structure passed to it.

The driver_unregister function handles some basic housekeeping by cleaning up some sysfs attributes that were attached to the driver's entry in the sysfs tree. It then iterates over all devices that were attached to this driver and calls the release function for each of them. This happens exactly like the previously mentioned release function for when a device is removed from the system.

After all devices are unbound from the driver, the driver code does this unique bit of logic:

down(&drv->unload_sem);
up(&drv->unload_sem);

This is done right before returning to the caller of the function. This lock is grabbed because the code needs to wait for all reference counts on this driver to drop to 0 before it is safe to return. This is needed because the driver_unregister function is most commonly called in the exit path of a module that is being unloaded. The module needs to remain in memory for as long as the driver is being referenced by devices; waiting for this lock to be freed lets the kernel know when it is safe to remove the driver from memory.
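The down/up pair acts as a wait-until-released gate rather than as protection for a critical section. A single-threaded model can show the idea, assuming the semaphore starts out held and is released when the last reference to the driver is dropped. Everything here is a hypothetical stand-in: model_sem is a plain counter, and where the real down would sleep, model_try_down reports failure instead.

```c
#include <assert.h>

struct model_sem { int count; };    /* 1 = free, 0 = held */

/* Non-blocking stand-in for down(): 0 on success, -1 where down() would sleep. */
int model_try_down(struct model_sem *sem)
{
    if (sem->count > 0) {
        sem->count--;
        return 0;
    }
    return -1;
}

void model_up(struct model_sem *sem)
{
    sem->count++;
}

struct model_driver {
    int refcount;
    struct model_sem unload_sem;
};

void model_driver_register(struct model_driver *drv)
{
    drv->refcount = 1;              /* reference held by the registration */
    drv->unload_sem.count = 0;      /* assumed: the semaphore starts out held */
}

void model_driver_get(struct model_driver *drv)
{
    drv->refcount++;
}

void model_driver_put(struct model_driver *drv)
{
    if (--drv->refcount == 0)
        model_up(&drv->unload_sem); /* last reference gone: open the gate */
}

/* The down/up pair: returns 0 once it is safe to unload, -1 while the real
 * code would still be sleeping in down(). */
int model_driver_can_unload(struct model_driver *drv)
{
    if (model_try_down(&drv->unload_sem) != 0)
        return -1;
    model_up(&drv->unload_sem);     /* release again immediately */
    return 0;
}
```

The semaphore is taken and released back to back because nothing needs protecting; acquiring it at all is the proof that every reference has been dropped.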
