[draft] Hotswap and Linux

To: devfs@xxxxxxxxxxx
Subject: [draft] Hotswap and Linux
From: Johannes Erdfelt <jerdfelt@xxxxxxxxxxx>
Date: Wed, 21 Jun 2000 10:22:04 -0700
Sender: owner-devfs@xxxxxxxxxxx
This is a little white paper that I've been working on about Linux and
Hot-Swap and all of the associated problems.

It's a first draft and I'd appreciate any feedback.

---------------------------------------------------------------------

1. Introduction

I wrote this small white paper about hot-swap and Linux because, while
dealing with the problem in my work on Linux USB, I saw a significant
amount of misunderstanding about the topic within the kernel development
community.

I've been a developer working on USB support for Linux for the past year,
since the first driver went into the main kernel tree. I have written
a significant portion of the code for the USB support in Linux. I'm
trying to make this white paper generic enough to explain and tackle the
hot-swap problems in Linux, but I'm most familiar with USB, so much of
the paper will revolve around my experience with USB in particular.

Note: I will mention permissions extensively. By this I mean both the
permissions (read, write, etc) and the ownership (uid, gid) of the device
node.

2. Hot-Swap Interfaces

A hot-swap interface is any interface to a computer into which you can
plug and unplug devices while the machine is running. Insertion and
removal do not necessarily need to happen while the machine is running;
they can also happen while the machine is off or in a power saving mode.

There are a couple of commonly seen hot-swap interfaces on today's PCs.
Some are generally seen on desktops (USB, IEEE 1394) or notebooks (PCMCIA,
USB, IEEE 1394) while some may be seen on servers (Hot-Swap PCI, SCSI).
However, as is often the case with the computer industry, technologies
blur and USB is being seen on servers and Hot-Swap PCI and Hot-Swap SCSI
technologies will eventually find their way to desktops. PCMCIA, or more
specifically CardBus, is essentially a form of Hot-Swap PCI. However,
they are different enough to mention separately.

Most of the interfaces are busses, allowing multiple devices to be
connected. However, things like parport could also be considered a
hot-swap interface.

2.1 PCMCIA

This was one of the first hot-swap interfaces that came to PCs. It is
a compact form factor system bus interface. It looks like an ISA or PCI
bus (greatly simplified description) to the system.

It is a part of the system bus.

2.2 SCSI

I'm not an expert on SCSI, but I know that SCSI has supported hot-swap
devices for a while in some situations.

It is a higher level bus than the system bus.

2.3 Hot-Swap PCI

Hot-Swap PCI is a relatively recent technology which is mainly seen in
servers. It simply adds Hot-Swap capability to PCI (correct?)

PCMCIA (CardBus) and Hot-Swap PCI are very similar technologies.

It is a part of the system bus.

2.4 IEEE 1394

This is commonly called by its trade names Firewire (Apple) or iLink (Sony).
It is very similar to USB but is not as widely adopted.

It is a higher level bus than the system bus.

2.5 USB

Universal Serial Bus was originally introduced a couple of years ago and
has recently seen widespread adoption (iMac, etc).

It is a higher level bus than the system bus.

3. Linux and Hot-Swap problems

Unix and more specifically Linux handle hot-swap devices very poorly. Much
of the architecture of the OS assumes static or semi-static hardware
configurations.

This has not been a problem with Linux until recently as hot-swap interfaces
are a relatively recent addition to the kernel.

3.1 Device Naming

Devices are named on a first come, first served basis. For instance, when
the OS probes for SCSI devices, the first hard drive is assigned major/minor
8/0. The second hard drive is assigned major/minor 8/16. Traditional device
nodes under Linux for these devices are /dev/sda and /dev/sdb respectively.

When a device can be inserted or removed randomly, as in the case of a
hot-swap interface, the probing order becomes random. Since Linux still
uses a first come, first served naming scheme, the device node's name can
be essentially random.

The name is not guaranteed to be the same each time the device is inserted.

Most applications are configured with static device names. Using hot-swap
devices with these applications is troublesome.
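
To make the naming problem concrete, here is a minimal sketch of first
come, first served assignment. It assumes the SCSI disk numbering
described above (major 8, a block of 16 minors per disk). The function
name and the device labels are made up for illustration.

```python
import string

SD_MAJOR = 8           # major number for SCSI disks, as described above
MINORS_PER_DISK = 16   # each disk gets a block of 16 minors (8/0, 8/16, ...)

def assign_names(probe_order):
    """Assign /dev/sdX names in probe order (first come, first served).

    probe_order is a list of hypothetical device labels. The result maps
    each label to (node name, major, minor). Only 26 disks are handled,
    which is enough for an illustration.
    """
    names = {}
    for index, device in enumerate(probe_order):
        node = "/dev/sd" + string.ascii_lowercase[index]
        names[device] = (node, SD_MAJOR, index * MINORS_PER_DISK)
    return names

# The same two devices, plugged in a different order on two occasions.
monday = assign_names(["usb-floppy", "usb-zip"])
tuesday = assign_names(["usb-zip", "usb-floppy"])

print(monday["usb-floppy"])   # ('/dev/sda', 8, 0)
print(tuesday["usb-floppy"])  # ('/dev/sdb', 8, 16)
```

The floppy drive is the same physical device both times, yet an
application configured to use /dev/sda would talk to a different device
on the second day.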

3.2 Device Permissions

Linux, like almost every other Unix, stores permissions for a device in
the filesystem. They are associated with the name of the device node
(/dev/sda).

Since we cannot guarantee the same name each time a device is inserted (see
3.1), we cannot guarantee the device has the correct or same permissions.

Devices should be tracked by their characteristics, not by the name of
the device node corresponding to the device.

3.3 Enumeration and Driver Binding

Many of today's interfaces offer a complicated device structure. Some allow
multiple logical functions in one physical device, along with multiple
interfaces of varying complexity.

Selecting the correct configuration parameters can be complicated. USB
has the notion of configurations, interfaces, alternate settings and
endpoints. Choosing which configuration or alternate setting to use, or
what drivers to bind to each interface is complicated and involves
user defined policy.

3.4 Logical Function to Physical Device Association

Given a logical function of a device (say a SCSI drive, /dev/sdb) I cannot
determine what physical device it is.

For instance, I plug in a USB floppy drive. It appears as SCSI device
/dev/sdb. I cannot accurately determine that /dev/sdb is associated with
the device I just plugged in.

3.5 Userspace API

Hot-Swap interfaces that are not system busses can often export a userspace
visible API which can move complex code out of the kernel core and into
userspace for a variety of reasons.

There are many complicated issues with this that are interface specific.

4. Existing Solutions

The problem of hot-swap has been tackled in the past at varying levels
of complexity and varying levels of effectiveness.

4.1 PCMCIA/CardBus

PCMCIA (and CardBus; I will use PCMCIA to cover both technologies)
has been supported by Linux for a couple of releases and has been the
most widely used hot-swap interface. It has thus run into many of
these problems for a while, and a series of solutions has been developed
with varying degrees of effectiveness.

The core of their solution is the Card Manager.

The PCMCIA card manager is notified by the kernel when a device insertion
or removal occurs on any PCMCIA bus. It obtains from the kernel information
about the device. It then executes an arbitrary algorithm to select the
driver, loads a kernel module if necessary for the driver, and then binds
the device to the driver.

It can also execute programs (shell scripts being the most common) to
configure the device in a flexible manner.

4.1.1 PCMCIA and Device Naming

The shell scripts used to configure devices can be set up to create
symlinks to keep one name for a device.

This has the unfortunate side-effect of hardcoding names in shell scripts.
It does not dynamically handle multiple similar devices.

4.1.2 PCMCIA and Device Permissions

The same shell scripts that configure the device can be used to create
and apply permissions to device nodes.

This has the same problem as Device Naming in that it requires
hard-coded configuration in shell scripts.

4.1.3 PCMCIA and Enumeration and Driver Binding

As explained earlier, the Card Manager works with the kernel to enumerate
and bind drivers to devices.

4.1.4 PCMCIA and Logical Function to Physical Device Association

From David Hinds:

"Each socket has a minor device that cardmgr, etc use for the ioctl
interface and to watch for card status events.  The DS_GET_DEVICE_INFO
ioctl gives a list of device descriptors associated with that socket.
So it is domain specific in that you know in advance what physical
device you're asking about."

4.1.5 PCMCIA and Userspace API

Since PCMCIA is a system bus, it uses the same architecture-defined
access to devices. This includes IRQs, DMA channels, I/O ports, memory
ranges, etc. The existing userspace APIs for this are used.

The existing API is limited to I/O port and memory access.

4.2 scsidev

scsidev scans /proc/scsi and creates device nodes in /dev dynamically
depending on the devices on the SCSI busses.
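
As a rough illustration of the idea (not scsidev's actual code or naming
scheme), the sketch below parses text laid out like /proc/scsi/scsi and
derives one node name per device from its address. The sample devices,
the function name and the /dev/scsi/hNcNiNlN naming pattern are all
assumptions made for this example.

```python
import re

# Sample text modeled on the layout of /proc/scsi/scsi; the devices are made up.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ACME     Model: DISK             Rev: 1.00
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: ACME     Model: SCANNER          Rev: 2.10
  Type:   Scanner                          ANSI SCSI revision: 02
"""

ADDRESS = re.compile(r"Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)")

def scsi_node_names(text):
    """Derive one node name per device from its host/channel/id/lun address."""
    names = []
    for match in ADDRESS.finditer(text):
        host, channel, ident, lun = (int(g) for g in match.groups())
        # Hypothetical naming pattern based purely on the bus address.
        names.append(f"/dev/scsi/h{host}c{channel}i{ident}l{lun}")
    return names

print(scsi_node_names(SAMPLE))
# ['/dev/scsi/h0c0i0l0', '/dev/scsi/h1c0i3l0']
```

A name derived from the bus address stays the same regardless of probe
order, as long as the device stays at the same address.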

4.3 Any more?

I don't know of any others. Any help?

5. USB Related Problems

These are problems not necessarily related to the hot-swap nature of USB,
but are important to understand for making decisions about appropriate
solutions.

5.1 Complex Structure of USB Devices

This was touched on in section 3.3 (Enumeration and Driver Binding).

Each USB device can have multiple configurations. Only one configuration
can be active at once. Each configuration has one or more interfaces. Each
interface offers a logical function of the device, and all interfaces of
the active configuration are active at once. Each interface has one or
more alternate settings. Alternate settings select how much bandwidth a
device uses, the programming interface, etc. Only one alternate setting
can be active at a time per interface. Each alternate setting has zero or
more endpoints of its own.

To make things more complicated, the entire device has one control
endpoint, shared among all interfaces. This control endpoint is
required to configure generic device parameters as well as parameters
specific to each interface.

Since each interface is a logical function of the device, separate
permissions are required. Thus, multiple device nodes are required
for one device.

Tracking permissions for the device then entails tracking the device
and the permissions for each interface.

In summary, one device node is not sufficient to describe the device,
and the layout can vary wildly from device to device and from
configuration to configuration.
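
The hierarchy described in this section can be sketched as a small data
structure. This is only an illustration: the class names, the node naming
pattern and the example camera are made up, and real descriptors carry
far more information.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AltSetting:
    number: int
    endpoints: List[int] = field(default_factory=list)  # endpoint addresses

@dataclass
class Interface:
    number: int
    alt_settings: List[AltSetting]
    active_alt: int = 0      # only one alternate setting is active at a time

@dataclass
class Configuration:
    value: int
    interfaces: List[Interface]   # all of these are active at once

@dataclass
class Device:
    configurations: List[Configuration]
    active_config: int = 0   # index of the single active configuration

def usb_node_names(dev):
    """List hypothetical node names for the currently active layout."""
    names = ["control"]      # the one control endpoint shared by all interfaces
    config = dev.configurations[dev.active_config]
    for intf in config.interfaces:
        names.append(f"if{intf.number}")
        alt = intf.alt_settings[intf.active_alt]
        for ep in alt.endpoints:
            names.append(f"if{intf.number}/ep{ep:02x}")
    return names

camera = Device(configurations=[
    Configuration(value=1, interfaces=[
        Interface(number=0, alt_settings=[
            AltSetting(number=0, endpoints=[0x81]),
            AltSetting(number=1, endpoints=[0x81, 0x02]),
        ]),
    ]),
])

print(usb_node_names(camera))   # ['control', 'if0', 'if0/ep81']
camera.configurations[0].interfaces[0].active_alt = 1
print(usb_node_names(camera))   # ['control', 'if0', 'if0/ep81', 'if0/ep02']
```

Switching one alternate setting changes the set of nodes, which is why a
fixed, preallocated set of device nodes does not fit USB well.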

5.2 Userspace API

This was touched on in section 3.5 (Userspace API).

PCMCIA and Hot-Swap PCI do not have userspace APIs because they are
system busses, for which such an API is complicated to support correctly
while guaranteeing the security of the system.

USB, IEEE 1394 and SCSI are higher level busses and do not suffer from
the problems that system busses have with respect to userspace access.

USB has usbdevfs. There are two portions to this.

The first is the virtual filesystem portion which dynamically creates and
deletes device nodes (actually files, since no major/minor is needed). This
is needed because of the problem described in section 5.1.

The second portion is the actual file operations. This is a series of
open(), ioctl(), read(), write() and close() functions which provide
a variety of operations to userspace programs.

The existing userspace solution overloads one device node with ioctls
for each transfer type and feature for all endpoints (and thus
interfaces). The existing implementation will not be the final
solution since it has a variety of problems.

6. My Solution for USB

I've been working on this problem with respect to USB for the past couple
of months. I've had many an email discussion with many people about this.

The solution I've worked on does not solve problem 3.1 (Device naming) yet.
However, I attempted to design a solution which does not preclude solving
this in the future.

This solution is implemented using devfs and devfsd.

6.1 Required Features for the solution

Some people have had the misconception that this locks everyone into devfs
to use USB. This is not the case. It locks us into a certain feature set
that must be supported.

Right now, devfs is the only solution which offers all of the
infrastructure needed.

I cannot see how hot-swap can be implemented in a clean way without these
requirements.

6.1.1 Dynamic creation of device nodes

Once configurations and alternate settings are selected, the number of
interfaces and endpoints can change radically.

6.1.2 No fixed limit on device nodes

The structure and layout of device nodes is sparse and complicated,
quickly outgrowing the entire major/minor space as currently implemented.

This is required to solve the problem described in section 5.1.
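
A back-of-the-envelope check of the claim about the major/minor space.
It assumes the 16-bit device number (8-bit major, 8-bit minor) of
kernels of this era, and the USB limits of 127 devices per bus and at
most 31 endpoints per device (the control endpoint plus 15 IN and 15
OUT). Real systems will be far below the worst case, but the numbers
show how little room a single major leaves.

```python
MINOR_BITS = 8                       # assumed: 8-bit minor in a 16-bit dev_t
MINORS_PER_MAJOR = 2 ** MINOR_BITS

MAX_DEVICES_PER_BUS = 127            # 7-bit USB address, address 0 is reserved
MAX_ENDPOINTS_PER_DEVICE = 1 + 15 + 15   # control + 15 IN + 15 OUT

worst_case_nodes = MAX_DEVICES_PER_BUS * MAX_ENDPOINTS_PER_DEVICE

print(MINORS_PER_MAJOR)                      # 256
print(worst_case_nodes)                      # 3937
print(worst_case_nodes > MINORS_PER_MAJOR)   # True
```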

6.1.3 VFS intercepts for common syscalls (chmod, chown, etc)

To properly track device permissions, we need to know when permissions
are changed for device nodes so the permissions can be saved in a
database along with other identifying information about the device.

This is required to solve the problem described in section 3.2.

6.1.4 VFS intercepts to load modules on demand

Part of the design will track devices and their logical functions. The
VFS can intercept open() calls for devices and load modules on demand.

6.2 Goals

These are the goals I strove to meet while creating this solution.

6.2.1 Minimal code duplication

More and more hot-swap interfaces are coming up and they will all be
supported. Each solution will have common features which overlap with
all of the other solutions. For instance, notification of insertion and
removal to userspace. PCMCIA already has this and USB needs it as well.

This also minimizes the impact on unswappable kernel memory and kernel
image size.

6.2.2 Intuitive and Clean Architecture

Some other solutions have been suggested which dynamically negotiate
minor numbers between kernel and user space for device nodes. This is
neither clean nor intuitive and is just a kludge. (Nor does it meet all
of the requirements.)

The solution must be designed to stand up for the future.

6.2.3 User-friendly Solution

To meet this goal, I used gphoto as my test application. I have written
code for USB support which has been adopted by the gphoto development
team.

A user should be able to plug in a camera and change permissions at most
once. gphoto should then present the user with the cameras on the bus (to
the extent that gphoto knows about them), and the user should be able to
use them without administration overhead each time a camera is connected.

6.3 What I did NOT attempt to solve

I did not attempt to solve device naming yet. This is a problem, but
the best solution that was offered that did not require kernel
interaction was symlinks. This could become unwieldy with the
configurations of some systems. Unfortunately, kernel-side support may
be required. I welcome any thoughts on this topic.

Also, please don't confuse logical device naming (/dev/sda, /dev/sdb)
with physical device naming (/dev/usb/bus1/device1). There is no solution
for USB physical device naming, nor is there much desire for one.

6.4 Description of solution

The core of the solution is devfs and devfsd. Using devfs allows us to
meet goals 6.2.1 (Code Duplication) and 6.2.2 (Intuitive and Clean
Architecture) immediately.

devfs centralizes much of the common code, such as the kernel to userspace
channel to communicate device insertions, removals, chmod calls, etc. It
also avoids creating an extra virtual filesystem for each hot-swap bus,
as usbdevfs currently does.

It also allows us to easily solve problems 5.1 (Complex structure of USB
devices) and 5.2 (Userspace API), which are related.

The solution I propose removes the virtual filesystem portion of usbdevfs
and keeps the file operations only.

Most of the smarts are in devfsd. I've used the devfsd MFUNCTION to
intercept REGISTER, UNREGISTER and CHANGE events. This allows us to
see the insertion and removal of the device, changes to the permissions
of the device, as well as open() calls on devices with modules that
are not loaded.

6.4.1 USB and Logical Function to Physical Device Association

Using a design from David Hinds (maintainer of the Linux PCMCIA code) I
propose we create a generic ioctl() interface which can be used to
retrieve physical device data for a given logical device. This data will
be domain specific, that is, specific to each hot-swap interface.

An alternative solution is to use devfs to create logical device nodes
as a bus specific node, and create symlinks (/dev/scsi/host0/bus0/target0
-> /dev/usb/device1/scsi/target0 or something similar).

Both solutions have strengths and weaknesses. An alternative solution may
also be better. I am confident there is a solution to this problem.
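
The symlink alternative described above can be tried out in a scratch
directory. In this sketch a temporary directory stands in for /dev, an
empty file stands in for the bus specific node that devfs would create,
and both function names are made up.

```python
import os
import tempfile

def link_logical_to_physical(root, logical, physical):
    """Create root/logical as a symlink pointing at root/physical."""
    physical_path = os.path.join(root, physical)
    logical_path = os.path.join(root, logical)
    os.makedirs(os.path.dirname(physical_path), exist_ok=True)
    os.makedirs(os.path.dirname(logical_path), exist_ok=True)
    # Stand-in for the bus specific device node that devfs would create.
    open(physical_path, "w").close()
    os.symlink(physical_path, logical_path)
    return logical_path

def physical_device_for(logical_path):
    """Answer 'which physical device is this?' by reading the symlink."""
    return os.readlink(logical_path)

root = tempfile.mkdtemp()   # scratch directory standing in for /dev
logical = link_logical_to_physical(
    root, "scsi/host0/bus0/target0", "usb/device1/scsi/target0")

print(os.path.relpath(physical_device_for(logical), root))
# usb/device1/scsi/target0
```

The appeal of this approach is that no new ioctl() is needed. Any
program that can read a symlink can find the physical device.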

6.4.2 Processing on REGISTER events

When a REGISTER event occurs, the module obtains the descriptors for the
device and parses them, determining which configurations, interfaces and
alternate settings are available on the device. It then executes an
arbitrary algorithm (not explained here for brevity) to select a
configuration, then select alternate settings and drivers for each
interface. The software then programs the active configuration and the
active alternate setting for each interface.
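
The selection algorithm is left open above. Purely as an illustration of
what such a policy could look like, the sketch below picks the
configuration whose interfaces are best covered by known drivers. The
driver table, the function name and the example device are hypothetical.
The interface class codes (0x03 HID, 0x07 printer, 0x08 mass storage,
0xff vendor specific) are the standard USB ones.

```python
# Hypothetical driver table: USB interface class code -> driver module name.
DRIVERS = {0x03: "hid", 0x07: "printer", 0x08: "usb-storage"}

def select_configuration(configurations):
    """Pick the configuration whose interfaces are best covered by drivers.

    configurations maps a configuration value to a list of interface class
    codes. Returns (configuration value, {interface index: driver or None}).
    Ties go to the configuration listed first.
    """
    best = None
    for value, classes in configurations.items():
        binding = {i: DRIVERS.get(cls) for i, cls in enumerate(classes)}
        covered = sum(1 for drv in binding.values() if drv is not None)
        if best is None or covered > best[0]:
            best = (covered, value, binding)
    if best is None:
        raise ValueError("device reports no configurations")
    return best[1], best[2]

# A made-up device: configuration 1 is vendor specific (class 0xff),
# configuration 2 exposes a storage interface and a printer interface.
device = {1: [0xFF], 2: [0x08, 0x07]}

print(select_configuration(device))
# (2, {0: 'usb-storage', 1: 'printer'})
```

A real policy would also have to weigh user preferences, power
requirements and alternate settings, which is why it belongs in
userspace rather than in the kernel.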

If the driver must be loaded at insertion time, instead of at use time,
the module loads the driver, binds it to the necessary interfaces and
activates those interfaces.

The module then obtains unique or semi-unique characteristics from the
device and retrieves any previously saved permissions for the device.
If no information is retrieved from the local database, default values
from the configuration can be applied to the device.

Lastly, a script is executed to perform any local device configuration
necessary. For instance, a script can be used to configure an ethernet
device.

6.4.3 Processing on UNREGISTER events

On an UNREGISTER event, the device is deleted from all internal lists.

6.4.4 Processing on CHANGE events

On a CHANGE event, unique and semi-unique characteristics are obtained
from the device and saved in a database along with the new permissions
for later retrieval.
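
The three handlers can be sketched together as a small in-memory store
that keys saved permissions on the device's identity rather than on its
node name. The class and method names, the default permissions and the
example identity (vendor id, product id, serial number) are all made up.
A real implementation would keep the database on disk.

```python
class PermissionStore:
    """Remember permissions by device identity, not by device node name."""

    def __init__(self, default=(0o600, 0, 0)):
        self.default = default   # (mode, uid, gid) for unknown devices
        self.saved = {}          # identity -> (mode, uid, gid)
        self.present = {}        # node name -> identity of inserted devices

    def on_register(self, node, identity):
        """Device inserted: return the permissions to apply to its node."""
        self.present[node] = identity
        return self.saved.get(identity, self.default)

    def on_change(self, node, mode, uid, gid):
        """chmod/chown seen on a node: save against the device identity."""
        self.saved[self.present[node]] = (mode, uid, gid)

    def on_unregister(self, node):
        """Device removed: forget the node, keep the saved permissions."""
        self.present.pop(node, None)

store = PermissionStore()
camera = (0x1234, 0x5678, "SN0001")   # made-up vendor id, product id, serial

store.on_register("/dev/usb/bus1/device2", camera)        # first insertion
store.on_change("/dev/usb/bus1/device2", 0o660, 0, 100)   # admin changes perms
store.on_unregister("/dev/usb/bus1/device2")

# Re-inserted later, the camera shows up under a different node name.
mode, uid, gid = store.on_register("/dev/usb/bus1/device5", camera)
print(oct(mode), uid, gid)   # 0o660 0 100
```

The user changes permissions once, and they follow the camera to
whatever node name it gets on the next insertion.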

7. Future work

The work I've done is not just applicable to USB. Many other systems can
use similar algorithms and code.

Although I specifically focused on Hot-Swap interfaces, all interfaces
can be used with the proposed solution since their configuration can be
changed when the power is off. Since this is required of Hot-Swap
interfaces, "normal" interfaces can be treated the same but with a
subset of functionality (not Hot-Swap).

7.1 PCMCIA/CardBus

A solution similar to this could be implemented to replace the card
manager and centralize duplicated code and interfaces.

David Hinds has expressed interest in working on a common solution for
hot-swap interfaces. My solution has involved some ideas from him, but
he probably isn't comfortable with it yet since we have not talked about
it or any specific solutions.

7.2 SCSI

Many, if not all, modern SCSI devices have unique identifiers which
could be used to identify devices. These could be used to track
permissions on SCSI generic devices (CD burners, scanners, etc) as well
as the logical function.

7.3 PCI

The same driver binding code could be used with a different algorithm
to automatically load and bind drivers for hot-swap PCI devices. This,
as mentioned previously, ties into PCMCIA/CardBus.

7.4 ALSA

A similar, but different, problem exists for ALSA. Most devices supported
under ALSA export many interfaces to control the device. To appropriately
solve this problem, the ALSA team overloads /proc with the device nodes it
needs. Permissions aren't tracked for the device nodes.

8. Closing Remarks

I expect tweaks to be made when I get input from the kernel developer
community at large, including the other subsystems that may be
affected, such as PCMCIA, Hot-Swap PCI, IEEE 1394 and SCSI.

Hot-swap support is necessary for Linux. I don't think anyone will
disagree that the current framework for hot-swap interfaces is severely
lacking and requires a significant amount of work to help solve many
of the problems discussed.

I also want to reiterate that while I am actively pushing devfs right now,
that is because it's the best solution available. If an alternative
solution is developed, and is better, then I am interested in hearing
about it.

Also, please help clear up some of the terminology and descriptions. I
admit my English and explanations can suck.

JE