
Re: pci_* interface

To: linux-origin@xxxxxxxxxxx
Subject: Re: pci_* interface
From: "Leo Dagum" <dagum@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 31 Jan 2000 11:20:23 -0800
In-reply-to: Ralf Baechle <ralf@oss.sgi.com> "Re: pci_* interface" (Jan 31, 8:13pm)
References: <20000131193615.I12102@uni-koblenz.de> <ralf@uni-koblenz.de> <10001311048.ZM13252@barrel.engr.sgi.com> <20000131201357.A15341@uni-koblenz.de>
Sender: owner-linux-origin@xxxxxxxxxxx
On Jan 31,  8:13pm, Ralf Baechle wrote:
> >
> > Below is a rather old but still accurate document on bus layering
> > in Irix which describes the api for pio's, dma's and interrupts.
> > Section 7.1.2 describes dma's from crosstalk, the pci interfaces
> > are identical.
>
> Your email didn't make it to the list as it exceeded the maximum size
> of 40kb.  I've now changed this limit to 2mb, same as the usual
> sendsnail limit for the smtp mailers.
>

                                I/O Infrastructure - Layering
                                (rough draft: 01/29/96)
                                Len Widra

1.0 Acknowledgements
====================
Special thanks to Brad Eacker, who contributed many valuable comments regarding
the first draft of this document.  Also thanks to Bob Alfieri who insisted on
realism as he tried to implement the Kona driver using these interfaces.

1.1 Background
==============
There is a set of interrelated changes targeted for the latest software release
(kudzu) that focus on better support for large NUMA systems.  Collectively,
these changes are referred to as the "I/O Infrastructure" changes.  Major
changes to the I/O Infrastructure include:
        kernel and driver threads, especially for interrupt handling
        data structure encapsulation for API, ABI, and DP work
        hwgraph management
        Address/Length Lists
        NUMA-aware interfaces, especially for allocation
        I/O Bus Provider Layering

This document focuses on I/O Bus Provider Layering.


2.0 Goals
=========
The primary goals of I/O Bus Provider Layering are:

        -To better support driver code sharing across diverse platforms that
         happen to support the same bus types.  At a minimum, APIs must be
         common across SGI's platforms.  ABI compatibility is desirable.

        -To eliminate some concepts in the area of I/O that are insufficient
         on modern complex systems (e.g. Lego), and to provide appropriate
         concepts and abstractions that don't currently exist.  An example of
         an inappropriate abstraction in today's Irix is the concept of a
         single I/O Page Size.  An example of a new, more appropriate concept
         is an Address/Length List.

        -To permit both performance-oriented drivers and portability-oriented
         drivers to be written with an appropriate API.  This API must be
         reasonably simple to use.  Today, some drivers avoid the official
         APIs and directly manage system hardware (e.g. mapping hardware on
         the device's controlling adapter).  They do this in order to avoid
         some of the overhead associated with today's interfaces, which
         stress portability.

        -To provide a framework that will allow us to easily add new bus types
         and which will allow drivers to work with little or no change on
         future platforms.

        -To provide interfaces with names and parameters that appear to have
         been designed with some regularity rather than heavily evolved.


3.0 Philosophy
==============
In approaching an I/O API, one fundamental decision involves choosing among
three possible philosophies for device driver interfaces:

1) Use simple interfaces that work in a broad range of situations.  Sacrifice
        some performance and feature support in order to make drivers easy to
        write and maintain.  Try to make it easy for drivers to work (though
        probably not with optimal performance) on future platforms with
        little or no changes.

2) Place the burden on device driver writers.  Give driver writers facilities
        to determine what sort of a system their driver is on and give them
        the ability to manipulate all the hardware on that system.  Then it's
        the driver writer's job to figure out the right thing to do on each
        platform, and it's his/her job to figure out how to work and play well
        with other drivers that might want to share hardware resources.  When
        SGI comes out with new hardware, the driver writer may need to add or
        modify driver code.

3) Invent sufficiently complex interfaces that allow driver and system to
        coordinate in order to "do the right thing".  Performance-oriented
        drivers can get good performance, and portability-oriented drivers
        can get good portability.  The price is paid in interface and driver
        complexity, since such interfaces would likely be quite complex.

The philosophy we have chosen is to provide *portability* interfaces that
allow simple drivers to be written which work at sub-optimal performance
on all SGI platforms AND to provide simple *performance* interfaces that
sacrifice portability but allow a particular driver to exploit all of the
performance available on particular platforms.  If a driver writer uses
the performance interfaces, he assumes a greater responsibility for
understanding the system(s) in which his driver works.  We have chosen *not*
to combine the portability and performance interfaces into a single very
complex interface.

Today, Irix really only supports portability interfaces.  Individual drivers
achieve performance in ad-hoc ways.  It would be nice if we could implement
philosophy #3, but at this time it's just too complex and too likely to
break as we evolve hardware platforms and I/O buses.  History shows that it
is difficult to guess exactly what will be significant on future platforms,
and therefore difficult to provide forward-compatible interfaces.


4.0 Layering
============
Every bus type provides a layer which is both platform-independent and bus
implementation-independent and which serves as an API for all drivers that
control devices of that type.  For example, there are platform-independent
layers for PCI, VME, and Crosstalk.   Ideally, the layers provided by different
bus types provide similar services for similar operations like DMA Management,
PIO Management, and Interrupt Management.  In all cases, the platform-
independent API invokes an implementation-dependent layer to handle the
request.

For instance, a VME driver uses interfaces provided by the generic VME API
layer.  This layer calls implementation-specific routines provided by the
Newbridge adapter layer.  The Newbridge is itself a PCI device, so it calls
routines provided by the generic PCI API layer.  This layer invokes PCIBridge-
specific routines (on Lego or SpeedRacer).  The PCIBridge is itself a
Crosstalk device, so it calls routines provided by the generic Crosstalk API
layer.  On Lego, this Crosstalk API layer invokes Hub-specific routines; while
on SpeedRacer, the Crosstalk API layer invokes Heart-specific routines.

                        **********
The important thing to recognize in all this is that the VME driver only
understands "VME things", not "Newbridge things" nor "PCI things" nor
"PCIBridge things" nor "Crosstalk things" nor "Hub things" nor "Heart things".
Similarly for *all* device drivers -- they only understand things about their
bus type.
                        **********

Note a difference from older Irix:  We no longer attempt to provide a single
common layer for PIO and DMA that works across all bus types; rather, we leave
it to the bus-type API to define an appropriate interface for that bus type.
Designers of generic bus-type layers are encouraged to follow a standard model
such as the Crosstalk layer presented later in this document.

Note also that the implementation-independent generic layers are mostly stubs
that call the appropriate implementation-specific layer.  We expect compiler
Inter-Procedural Analysis and inlining capabilities to optimize out these extra
layers so that what remains is an API (and ABI) with none of the "extra" call
overhead.  [If the compiler fails to work as desired, we can always revert to
explicit inline directives, or if necessary for performance we can sacrifice
the ABI and use macros for the generic layers.]

The call sequence between layers is illustrated here for a VME device driver:

                        device driver
                              |
                              |
                       VME generic layer
                              |
                              |
                     Newbridge Adapter layer
                              |
                              |
                       PCI generic layer
                        /            \
                       /              \
             PCIBridge layer       MooseBridge Layer
                  |
                  |
         Crosstalk generic layer
            /                 \
           /                   \
     hub layer               heart layer


So if the XXX VME driver wishes to set up a piomap operation, and if the
driver is executing on a Lego system, then the following interfaces are
used (remember, some of these are really macros or inlined):
        vme_piomap_alloc
        newbridge_piomap_alloc
        pci_piomap_alloc
        pcibr_piomap_alloc
        xtalk_piomap_alloc
        hub_piomap_alloc



5.0 Generic Types
=================
There are a couple of generic types that are useful across all I/O bus types.

vertex_hdl_t is a type that uniquely identifies a particular instance of a
device.  From a vertex_hdl_t, it's possible (through hwgraph interfaces) to
understand how the device is connected to the system and to retrieve lots
of other information about the device.

paddr_t is a type that holds a "physical address".  A physical address
uniquely identifies a particular location in memory.  Drivers use well-defined
interfaces to translate from virtual to physical addresses.

iopaddr_t is a type that holds an "I/O physical address".  An iopaddr_t
uniquely identifies a particular item in the address space of an I/O bus.

alenlist_t is a type that holds an "Address/Length List" which is a list of
pairs of addresses and lengths.  There are well-defined operations on
Addr/Length Lists, including operations to create and initialize a List,
extract the next pair from a List, append a pair to the List, copy a List, grow
a List, etc.  Typically, a driver wishes to do a DMA to a location specified by
        a kernel or user virtual address/length  or
        a user virtual address vector (e.g. readv) or
        a buf structure or
        a virtual Address/Length List or
        a physical Address/Length List or
        a page list (e.g. vhand)
The driver uses standard routines (TBD) to convert from any of these forms
into a single canonical form, the Address/Length List.  The List is then
passed to the device's controlling adapter so that appropriate mappings can
be established.  See sys/alenlist.h for a complete interface description.

A device_desc_t is a type that holds a "device description".  This is a
description of the needs, desires, and policy information for an instance
of a device.  Currently, the device description consists of information
about DMA and interrupts for the device.  Typically, lboot extracts
device descriptions for various devices from configuration files and the
kernel associates this information with a device.  The information comes
from an administrator and is interpreted by kernel software.

An interrupt description in a device_desc_t contains platform-dependent and
administratively-controlled policy information including an Interrupt Target
and an Interrupt Priority.  Older versions of Irix embedded interrupt priority
information directly in the driver, and failed to provide an adequate way for
administrators to control interrupt targets.  See sys/iobus.h.

A DMA description in a device_desc_t contains platform-dependent and
administratively-controlled policy information including estimated bandwidth
requirements for a device, maximum quantity of DMA resources to allow a device
to use, and DMA Bandwidth Allocation information for a device (TBD: Bandwidth
Allocation interfaces -- may need to be more dynamic).

A PIO description could also be found in a device_desc_t, but there is
currently nothing to specify.

It's worth clarifying how the device_desc_t is managed.  Administrative
information finds its way from configuration file through lboot and
master.d into device_desc_t's.  These descriptors are associated with
specific hwgraph locator strings:  When a device is added to the hwgraph,
if a descriptor has been defined for that device, that descriptor is
associated with the newly-created vertex_hdl_t.  Subsequent driver calls
for PIO, DMA, and interrupt service automatically use this default
descriptor unless the default is overridden via calling arguments.  For
calls where no overriding information is defined by the caller and no
default description exists at the vertex, reasonable defaults are chosen
by the system.

6.0 I/O Bus Services
====================
There are five basic services provided by most I/O buses (notably, not SCSI
since SCSI is generally used for disks, tapes and other end-devices and is
generally not used as an intermediate bus).
        Programmed I/O Management
        Direct Memory Access Management
        Interrupt Management
        Configuration Management
        Error Management (TBD)

PIO Management allows drivers to perform CPU loads/stores that are mapped
into references to registers in I/O space.

DMA Management allows drivers to tell I/O devices to independently send and
receive data from memory.

Interrupt Management allows drivers to tell I/O devices to generate interrupts
that are handled by registered interrupt handlers.  It also allows instances of
interrupts to be blocked and unblocked.

Configuration Management allows software to determine what devices are
available on an I/O bus and to associate device drivers for those devices.

Error Management (TBD) allows system software to isolate failing components
and switch them offline, and it may allow the system to gracefully recover from
some system failures that are triggered by or associated with devices.

Beyond these five basic services, a bus type may provide whatever services make
sense for it.  If a new interface is provided for a bus type, it must be
supported by ALL implementations of that bus type, either with a bus-generic
implementation or with a bus provider-specific implementation.  It is *not*
acceptable, for example, for a VME driver to invoke a special feature that's
provided only on the NewBridge adapter, because that VME driver is then no
longer portable to non-NewBridge VME buses.

It is perfectly acceptable and desirable for a bus generic layer to provide
service functions desired by multiple implementations of that bus type.  For
instance, the generic PCI layer may provide services that are useful to both
PCIBridge and MooseBridge.  It is also acceptable for a generic layer to
provide to drivers some additional ease-of-use interfaces which are
implemented on top of the basic interfaces.




7.0 Crosstalk Layer
===================
To illustrate a typical basic I/O bus interface, this chapter focuses on the
crosstalk layer.  This document does not attempt to list every single detail
of the implementation; rather, it is a concise overview of the crosstalk
layer.

7.1 Crosstalk Generic Layer
===========================
sys/xtalk/xtalk.h defines a "xtalk_provider_t".  Lego hub and SpeedRacer
heart, both of which are "Crosstalk providers", each supply a
xtalk_provider_t structure whose members are populated with
implementation-specific functions.

typedef struct xtalk_provider_s {
        /* PIO MANAGEMENT */
        xtalk_piomap_alloc_f                    *piomap_alloc;
        xtalk_piomap_free_f                     *piomap_free;
        xtalk_piomap_addr_f                     *piomap_addr;
        xtalk_piomap_done_f                     *piomap_done;
        xtalk_piotrans_addr_f                   *piotrans_addr;

        /* DMA MANAGEMENT */
        xtalk_dmamap_alloc_f                    *dmamap_alloc;
        xtalk_dmamap_free_f                     *dmamap_free;
        xtalk_dmamap_addr_f                     *dmamap_addr;
        xtalk_dmamap_list_f                     *dmamap_list;
        xtalk_dmamap_done_f                     *dmamap_done;
        xtalk_dmatrans_addr_f                   *dmatrans_addr;
        xtalk_dmatrans_list_f                   *dmatrans_list;

        /* INTERRUPT MANAGEMENT */
        xtalk_intr_alloc_f                      *intr_alloc;
        xtalk_intr_free_f                       *intr_free;
        xtalk_intr_connect_f                    *intr_connect;
        xtalk_intr_disconnect_f                 *intr_disconnect;
        xtalk_intr_cpu_get_f                    *intr_cpu_get;
        xtalk_intr_block_f                      *intr_block;
        xtalk_intr_unblock_f                    *intr_unblock;

        /* CONFIGURATION MANAGEMENT */
        xtalk_provider_startup_f                *provider_startup;
        xtalk_provider_shutdown_f               *provider_shutdown;

        /* ERROR MANAGEMENT */
        /* TBD */

} xtalk_provider_t;

If there is only one Crosstalk provider on a platform, the generic crosstalk
layer directly invokes the implementation-specific interfaces.  This may allow
inlining to remove the layer completely.  If a platform supports more than one
Crosstalk provider, the generic crosstalk layer indirects through the structure
supplied by the implementation (sort of like a "Crosstalk Provider Object").


7.1.1 Generic Crosstalk PIO
===========================
A Crosstalk driver allocates PIO mapping resources using xtalk_piomap_alloc.
It then invokes xtalk_piomap_addr in order to use the allocated resources to
map to specific Crosstalk addresses.  When it's done accessing the device with
PIO's, the driver frees the mapping resources with xtalk_piomap_free.

xtalk_piomap_t
xtalk_piomap_alloc(vertex_hdl_t dev,    /* set up mapping for this device */
                device_desc_t dev_desc, /* device descriptor */
                iopaddr_t xtalk_addr,   /* map for this xtalk_addr range */
                ulong byte_count,
                ulong byte_count_max,   /* maximum size of a mapping */
                ulong flags);
The Crosstalk piomap Allocation interface allocates whatever hardware and
software resources are needed in order to be able to perform loads/stores to
the specified device/address range.  If dev_desc is non-0, it overrides the
default PIO descriptor for this device.  xtalk_piomap_alloc returns an opaque
"crosstalk piomap handle".  byte_count_max specifies the largest mapping that
will ever be requested (via xtalk_piomap_addr).  flag values include those
specified in sys/pio.h:
        PIO_FIXED       /* long-term mapping */
        PIO_UNFIXED     /* mapping needed only briefly */


void
xtalk_piomap_free(xtalk_piomap_t xtalk_piomap);
The Crosstalk piomap Free interface logically releases all software and
hardware resources that were allocated by an earlier xtalk_piomap_alloc.  The
crosstalk implementation layer may choose to use lazy release and leave the
mappings intact until the mapping resources are needed by some other
allocation.


caddr_t
xtalk_piomap_addr( xtalk_piomap_t xtalk_piomap, /* mapping resources */
                iopaddr_t xtalk_addr,           /* map for this xtalk_addr */
                ulong byte_count);              /* map this many bytes */
The Crosstalk piomap Addr interface establishes a hardware mapping to the
specified Crosstalk address using the mapping resources specified in an
earlier call to xtalk_piomap_alloc.  xtalk_piomap_addr returns a kernel
virtual address.  When software accesses this address, the corresponding
mapped Crosstalk address is accessed.  The address range specified to
xtalk_piomap_addr must be contained within the address range specified to
xtalk_piomap_alloc.  Additionally, byte_count must be no greater than
byte_count_max.  For all offsets such that offset < byte_count, loads and
stores to caddr_t+offset access xtalk_addr+offset.

void
xtalk_piomap_done( xtalk_piomap_t xtalk_piomap);
The Crosstalk piomap Done interface notifies the system that a driver is done
using piomap resources specified in an earlier piomap_addr call.  The piomap
resources are retained for future piomap_addr invocations.  [Note: This isn't
strictly necessary, but it provides a convenient place to add workarounds,
etc., so it's included as a portability interface.]

caddr_t
xtalk_piotrans_addr(vertex_hdl_t dev,           /* set up mapping for this device */
                device_desc_t dev_desc,         /* device descriptor */
                iopaddr_t xtalk_addr,           /* Crosstalk address */
                ulong byte_count,               /* map this many bytes */
                ulong flags);
The Crosstalk PIO Translate Address interface returns a system virtual address
range that maps to a specified Crosstalk address range.  If PIO mapping
hardware would be required, xtalk_piotrans_addr returns 0.  This interface
is a performance interface rather than a portability interface.



A Crosstalk driver that wishes to be both high-performance and
highly-compatible should try to use xtalk_piotrans_addr during setup for
PIOs.  If this interface returns 0 (error), the driver may then use the
compatible method: allocate mapping resources via xtalk_piomap_alloc and
establish mappings with xtalk_piomap_addr.



7.1.2 Generic Crosstalk DMA
===========================
A Crosstalk driver allocates DMA mapping resources using xtalk_dmamap_alloc.
It then invokes xtalk_dmamap_addr or xtalk_dmamap_list in order to use the
allocated resources to map to specific memory addresses.  After a DMA
completes but before making data available to the user that requested this
DMA, the Crosstalk driver calls xtalk_dmamap_done.  When the driver has
entirely finished accessing memory with DMA's, it frees the mapping resources
with xtalk_dmamap_free.  Usually, though, the driver saves the dma mapping
resources for later use with a different mapping.  Typically, a driver
allocates mapping resources during initialization, and it re-uses these
resources for many DMA's to various locations.

Drivers which are more performance oriented and which are less concerned with
portability to future platforms may use the "translate" operations rather than
the dmamap operations.  The translate operations work only for devices which
*know* that they won't need any mapping resources on the platform they're
connected to.  Devices and drivers eligible to use translate operations
typically:
        support scatter/gather AND
        support 64-bit addressing AND
        are interrupt driven AND
        understand NUMA issues (if on Lego) AND
        understand endianness AND
        understand Guaranteed Bandwidth AND
        understand caches and flushing requirements AND
        understand prefetching issues AND
        are aware of system-level bug workarounds AND
        are performance-oriented AND
        are SGI-internal drivers
New criteria may be added to this list at any time!  That's why the performance
interface is not very portable.  [TBD: We could add an interface that allows
a driver to declare its "level of sophistication" for a platform.  The
translate operations could fail if this level was too low.]


xtalk_dmamap_t
xtalk_dmamap_alloc(vertex_hdl_t dev,    /* set up mappings for this device */
                device_desc_t dev_desc, /* device descriptor */
                ulong byte_count_max,   /* max size of a mapping */
                ulong flags);           /* defined in dma.h */
The Crosstalk dmamap Allocation interface allocates whatever hardware and
software resources are needed in order to be able to perform DMA's of the
desired size to/from the specified device from/to the specified memory
range.  [TBD: exact semantics of desired size, considering misalignment and
list-oriented DMA operations].  If dev_desc is non-0, it overrides the default
DMA descriptor for this device.  byte_count_max specifies the size of the
largest mapping that will ever be requested (via xtalk_dmamap_addr).  flags are
defined in dma.h, and include information such as
        DMA_DATA                /* for data, not device control blocks */
        DMA_DESC                /* for device control descriptors */
        DMA_ADDR16              /* device handles 16-bit addresses */
        DMA_ADDR32              /* device handles 32-bit addresses */
        DMA_ADDR64              /* device handles 64-bit addresses */
        DMA_BIG_ENDIAN          /* device is big-endian */
        DMA_LITTLE_ENDIAN       /* device is little-endian */
xtalk_dmamap_alloc returns an opaque "crosstalk dmamap handle".


void
xtalk_dmamap_free(xtalk_dmamap_t dmamap);
The Crosstalk dmamap Free interface logically releases all software and
hardware resources that were allocated by an earlier xtalk_dmamap_alloc.  The
crosstalk implementation layer may choose to use lazy release and leave the
mappings intact until the mapping resources are needed by some other
allocation.


iopaddr_t
xtalk_dmamap_addr(xtalk_dmamap_t dmamap,   /* use these mapping resources */
                paddr_t paddr,             /* map for this address */
                ulong byte_count);         /* map this many bytes */
The Crosstalk dmamap Addr interface uses the resources allocated in an earlier
xtalk_dmamap_alloc call in order to establish a DMA mapping to the specified
physical address range.  It returns a Crosstalk address which represents the
start of the Crosstalk address range that maps to the specified physical
address range.  Typically, paddr/byte_count describes a single memory page.
The address range specified to xtalk_dmamap_addr must be contained within the
address range specified to xtalk_dmamap_alloc, and byte_count must be less
than or equal to byte_count_max specified to xtalk_dmamap_alloc.  This
interface is a portability interface rather than a performance interface.


alenlist_t
xtalk_dmamap_list(xtalk_dmamap_t dmamap,    /* use these mapping resources */
                alenlist_t alenlist);       /* map this address/length list */
The Crosstalk dmamap List interface uses the resources allocated in an earlier
xtalk_dmamap_alloc call in order to establish a DMA mapping to the physical
addresses listed in the specified Address/Length List.  It returns an
Address/Length List where the addresses are in the Crosstalk address space
rather than in system physical address space.  When possible, the mappings
established are sufficient to map the incoming list with a single
Address/Length Pair.  Upon return, the original List has been free'd.  The
driver must free the new (returned) list.  This interface is a portability
interface rather than a performance interface.

void
xtalk_dmamap_done(xtalk_dmamap_t dmamap);
The Crosstalk dmamap Done interface notifies the system that whatever DMA
may have been in progress after an earlier xtalk_dmamap_addr or
xtalk_dmamap_list call has now been completed.  This interface is used at
the completion of each DMA before the buffer is made available to other
consumers.  [Note: This isn't strictly necessary, but it provides a really
convenient place to add workarounds, etc., so it's included as a portability
interface.]


iopaddr_t
xtalk_dmatrans_addr(vertex_hdl_t dev,           /* translate for this device */
                device_desc_t dev_desc,         /* device descriptor */
                paddr_t paddr,                  /* system physical address */
                ulong byte_count,               /* length */
                ulong flags);                   /* defined in dma.h */
The Crosstalk DMA Translate Address interface translates from a system physical
address range into a Crosstalk address range.  If mapping resources would
be required for this operation, xtalk_dmatrans_addr returns 0.  This interface
is a performance interface rather than a portability interface.


alenlist_t
xtalk_dmatrans_list(vertex_hdl_t dev,   /* translate for this device */
                device_desc_t dev_desc, /* device descriptor */
                alenlist_t palenlist,   /* system address/length list */
                ulong flags);           /* defined in dma.h */
The Crosstalk DMA Translate interface translates from a list of system physical
Address/Length Pairs into a list of Crosstalk Address/Length pairs.  If mapping
resources would be required in order to map the entire list, this interface
returns 0.  On return, the original Address/Length List has been freed.  The
driver must free the new (returned) list.  This interface is a performance
interface rather than a portability interface.


Observe that the DMA interface is very similar to the PIO interface.  The DMA
interface provides a few extra list-oriented operations that *could* be
provided for PIO as well; however, these operations (map_list, trans_list)
would only be useful with drivers that want to efficiently support
PIO-intensive devices; so for now, we have omitted them.



7.1.3 Generic Crosstalk Interrupts
==================================
A Crosstalk driver allocates interrupt resources using xtalk_intr_alloc.
It then invokes xtalk_intr_connect in order to associate the allocated
resources with a software interrupt handler.  When a driver no longer
wishes to handle interrupts, it can disconnect the handler with
xtalk_intr_disconnect and/or it can free the allocated interrupt resources
with xtalk_intr_free.  If the Crosstalk driver uses kernel threads as a
programming model, it can use xtalk_intr_block and xtalk_intr_unblock to
block/unblock specific interrupts.  If the Crosstalk driver uses the
traditional "spl" model, it blocks interrupts with standard operations
provided by platform-dependent, bus-independent code (not shown in this
document).

xtalk_intr_t
xtalk_intr_alloc(vertex_hdl_t dev,              /* which crosstalk device */
                device_desc_t dev_desc,         /* device descriptor */
                vertex_hdl_t owner_dev);        /* device which owns this intr */
The Crosstalk Interrupt Allocation interface allocates whatever hardware and
software resources are needed in order for the specified device to generate
interrupts.  If dev_desc is non-0, it overrides the default interrupt
descriptor for this device.  owner_dev is recorded along with the interrupt
handle in order to assist debug, etc.  Returns opaque "crosstalk interrupt
handle".


void
xtalk_intr_free(xtalk_intr_t intr_hdl);
The Crosstalk Interrupt Free interface logically releases all software and
hardware resources that were allocated by an earlier xtalk_intr_alloc.  The
crosstalk implementation layer may choose to use lazy release and leave the
mappings intact until the mapping resources are needed by some other
allocation.


int
xtalk_intr_connect(xtalk_intr_t intr_hdl,       /* xtalk intr resource handle */
                intr_func_t *intr_func,         /* xtalk intr handler */
                void *intr_arg,                 /* arg to intr handler */
                xtalk_intr_setfunc_f setfunc,   /* func to set intr hw */
                void *setfunc_arg);             /* arg to setfunc */
The Crosstalk Interrupt Connect interface associates a software interrupt
handler with hardware interrupt resources.  intr_hdl is a crosstalk interrupt
handle, returned from an earlier xtalk_intr_alloc, and representing hardware
resources.
intr_func is a function to call when the interrupt is triggered, and intr_arg
is an argument to pass to intr_func.  setfunc is a function that can be called
at any time in order to retarget an interrupt.  setfunc_arg is an opaque
pointer-sized value to be interpreted by setfunc.  It must be sufficient to
determine which registers on which Crosstalk device need to be reprogrammed in
order to redirect the interrupt.  setfunc and setfunc_arg are used to support
interrupt migration in a way that's fairly transparent to the device driver.
If setfunc is NULL, then the driver takes responsibility for programming its
own hardware to generate interrupts, and it does not allow system software to
transparently migrate these interrupts.  [TBD: Details of transparent interrupt
migration.  This interface may be simplified (get rid of setfunc*) as a
result.]
For unloadable drivers, it is the driver's responsibility to disconnect
interrupts before allowing the unload to succeed.


void
xtalk_intr_disconnect(xtalk_intr_t intr_hdl);
The Crosstalk Interrupt Disconnect interface disconnects a software interrupt
handler from hardware interrupt resources.  The interrupt resources can be
re-connected to a different handler, or they can be left unconnected until
later.  Loadable drivers should disconnect as part of the unload operation,
and they should free as part of the unregister operation.
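The alloc/connect/disconnect/free lifecycle described above can be sketched
with toy stand-ins for the opaque types.  Everything below (the toy_* names,
the struct layout, the deliver helper) is illustrative and is NOT the IRIX
implementation; it only models the contract the text describes:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the opaque IRIX types described above. */
typedef int vertex_hdl_t;
typedef void intr_func_t(void *arg);

typedef struct xtalk_intr_s {
    vertex_hdl_t dev;        /* which crosstalk device */
    vertex_hdl_t owner_dev;  /* recorded to assist debugging */
    intr_func_t *func;       /* NULL until connected */
    void *arg;
} *xtalk_intr_t;

/* Allocate the (software-only, in this sketch) interrupt resources. */
static xtalk_intr_t toy_intr_alloc(vertex_hdl_t dev, vertex_hdl_t owner_dev)
{
    xtalk_intr_t hdl = calloc(1, sizeof(*hdl));
    hdl->dev = dev;
    hdl->owner_dev = owner_dev;
    return hdl;
}

/* Associate a handler with previously allocated resources. */
static void toy_intr_connect(xtalk_intr_t hdl, intr_func_t *func, void *arg)
{
    hdl->func = func;
    hdl->arg = arg;
}

/* Disconnect: resources stay allocated and may be re-connected later. */
static void toy_intr_disconnect(xtalk_intr_t hdl)
{
    hdl->func = NULL;
    hdl->arg = NULL;
}

static void toy_intr_free(xtalk_intr_t hdl) { free(hdl); }

/* Simulate the hardware delivering the interrupt; returns 1 if a
 * handler ran, 0 if the handle is currently unconnected. */
static int toy_intr_deliver(xtalk_intr_t hdl)
{
    if (hdl->func == NULL)
        return 0;
    hdl->func(hdl->arg);
    return 1;
}

static int toy_fired;
static void toy_count_handler(void *arg) { (void)arg; toy_fired++; }
```

Note how disconnect leaves the handle reusable: a loadable driver would call
toy_intr_disconnect at unload time and toy_intr_free at unregister time, in
that order.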

vertex_hdl_t
xtalk_intr_cpu_get(xtalk_intr_t intr_hdl);
The Crosstalk Interrupt CPU Get interface identifies which CPU is currently
targeted by the specified interrupt.


void
xtalk_intr_block(xtalk_intr_t intr_hdl);
The Crosstalk Interrupt Block routine prevents the specified interrupt from
reaching the CURRENT CPU.  This interface is intended for use with kthreads.
The calling thread must ensure that it cannot be migrated, since this interface
deals with only the CURRENT CPU.  The old-fashioned way to block interrupts
is with an interface provided by every *platform* (not every *bus*) which
blocks an entire "software level", or "spl", at the CPU side.  The intent of
intr_block is to block only the one specified interrupt, probably at the I/O
side.  It is permissible for the crosstalk implementation to employ *lazy*
interrupt blocking, but only on non-RealTime CPUs.  Blocking an interrupt from
reaching a CPU when the interrupt is already blocked from reaching that CPU
has no effect.
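The "no effect when already blocked" rule above means blocking behaves as a
flag, not a nesting counter: one unblock suffices no matter how many blocks
preceded it.  A toy model (all names hypothetical, not the IRIX code):

```c
#include <assert.h>

/* Toy per-interrupt block state, modelling the semantics of
 * xtalk_intr_block/xtalk_intr_unblock described above: blocking is a
 * flag, not a counter, so blocking an already-blocked interrupt has no
 * additional effect. */
typedef struct {
    int blocked;     /* is delivery currently blocked at the I/O side? */
    int delivered;   /* count of interrupts that reached the CPU */
} toy_intr_state;

static void toy_block(toy_intr_state *s)   { s->blocked = 1; }
static void toy_unblock(toy_intr_state *s) { s->blocked = 0; }

/* Deliver an interrupt unless it is blocked. */
static void toy_deliver(toy_intr_state *s)
{
    if (!s->blocked)
        s->delivered++;
}
```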


void
xtalk_intr_unblock(xtalk_intr_t intr_hdl);
The Crosstalk Interrupt Unblock routine unblocks an interrupt that was
previously blocked via xtalk_intr_block.


7.1.4 Generic Crosstalk Configuration
=====================================
void
xtalk_provider_startup(vertex_hdl_t xtalk_provider);
The Crosstalk Provider Startup interface is called once for every crosstalk
provider (e.g. hub, heart) found.  It performs whatever initialization is
needed for that provider.  Typically, this interface calls initialization
routines for pio, dma, and interrupts.

void
xtalk_provider_shutdown(vertex_hdl_t xtalk_provider);
The Crosstalk Provider Shutdown interface is called in order to turn off
an entire Crosstalk Provider.  Devices owned by that provider will no
longer be accessible through that provider.


7.1.5 Generic Crosstalk Support Interface
=========================================
This section describes some Crosstalk-specific auxiliary interfaces that have
a single implementation, independent of the particular implementation of the
Crosstalk provider.

int
xwidget_driver_register(xwidget_partnum_t part_num, xwidget_mfg_num_t mfg_num,
                char *driver_prefix, unsigned flags);
The Crosstalk Widget Initialization Function Add interface allows a crosstalk
widget's driver to advertise that it is available to handle a particular
crosstalk part.  Typically, xwidget_driver_register is called from a crosstalk
driver's *_init entry point.  It is the driver's responsibility to manage rev
numbers for Crosstalk widgets appropriately.  The infrastructure does not
provide any way to register different drivers for different revisions of a
part.

void
xwidget_unregister(char *driver_prefix)
The Crosstalk Widget Initialization Function Remove interface allows a
crosstalk driver to tell the system that the specified Crosstalk Widget driver
should no longer be used.  This is useful for a loadable Crosstalk driver that
wishes to unload.
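The register/unregister pair amounts to a registry keyed by (part number,
manufacturer number) that maps to a driver prefix.  A minimal sketch, with
toy_* names and a fixed-size table that are assumptions of this sketch rather
than anything specified above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy registry modelling xwidget_driver_register/xwidget_unregister.
 * The table size and the int key types are illustrative choices. */
#define TOY_MAX_DRIVERS 8

struct toy_entry {
    int part_num, mfg_num;
    const char *prefix;      /* driver prefix; NULL slot is free */
};
static struct toy_entry toy_registry[TOY_MAX_DRIVERS];

/* Called from a driver's *_init entry point. Returns 0 on success. */
static int toy_register(int part, int mfg, const char *prefix)
{
    for (int i = 0; i < TOY_MAX_DRIVERS; i++)
        if (toy_registry[i].prefix == NULL) {
            toy_registry[i].part_num = part;
            toy_registry[i].mfg_num = mfg;
            toy_registry[i].prefix = prefix;
            return 0;
        }
    return -1;               /* registry full */
}

/* Provider code looks up the driver for a discovered widget. */
static const char *toy_lookup(int part, int mfg)
{
    for (int i = 0; i < TOY_MAX_DRIVERS; i++)
        if (toy_registry[i].prefix &&
            toy_registry[i].part_num == part &&
            toy_registry[i].mfg_num == mfg)
            return toy_registry[i].prefix;
    return NULL;
}

/* Called by a loadable driver that wishes to unload. */
static void toy_unregister(const char *prefix)
{
    for (int i = 0; i < TOY_MAX_DRIVERS; i++)
        if (toy_registry[i].prefix &&
            strcmp(toy_registry[i].prefix, prefix) == 0)
            toy_registry[i].prefix = NULL;
}
```

Note there is exactly one slot per (part, mfg) driver: as the text says, the
infrastructure gives no way to register different drivers per revision.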

int
xwidget_init(   struct xwidget_hwid_s *hwid,    /* widget's hardware ID */
                vertex_hdl_t dev,               /* widget to initialize */
                xwidgetnum_t id,                /* widget's target id (0..f) */
                vertex_hdl_t master,            /* widget's master vertex */
                xwidgetnum_t targetid);         /* master's target id (0..f) */
The Crosstalk Widget Initialization interface is called from crosstalk
*provider* code.  It initializes a specified widget using the pre-registered
widget driver.  xwidget_init also allocates and initializes standard widget
information that will be needed later.

void
xwidget_reset(vertex_hdl_t xwidget);
The Crosstalk Widget Reset interface performs a hardware reset on the specified
xwidget.

xwidget_info_t
xwidget_info_get(vertex_hdl_t widget);
The Crosstalk Widget Information Get interface obtains a handle to standard
widget information for a specified widget.  It is called by drivers and
providers that need to access standard widget information such as the
"crosstalk ID", "crosstalk device type", "state", "master device", and
"crosstalk ID of the master device".

vertex_hdl_t
xwidget_info_dev_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information Device Get interface determines which
Crosstalk device is associated with a Crosstalk Widget information handle.
(This is just the reverse operation from xwidget_info_get.)

xwidgetnum_t
xwidget_info_id_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information ID Get interface returns the Crosstalk
widget number (a.k.a "target ID") of a specified Crosstalk Widget.

int
xwidget_info_type_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information Type Get interface returns the Crosstalk
Type of a specified Crosstalk Widget.

int
xwidget_info_state_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information State Get interface indicates what
"state" a specified Crosstalk Widget is in.  Possible states are specified
in sys/iobus.h, and include "INITIALIZING", "ATTACHING", "ERROR", "INACTIVE",
etc.  [TBD: The states need some work.]

vertex_hdl_t
xwidget_info_master_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information Master Get interface indicates the "master
device" for a specified Crosstalk Widget.  Every Crosstalk Widget is assigned
a master, which is the Crosstalk provider that handles DMA, PIO, and interrupts
for the widget.  [For example, in a Lego system with a Crossbow, one of the two
hubs connected to the crossbow is selected as a master for each of the widgets
hanging off that crossbow.]

xwidgetnum_t
xwidget_info_masterid_get(xwidget_info_t xwidget_info);
The Crosstalk Widget Information MasterID Get interface returns the Crosstalk
widget number (a.k.a. "target ID") of the specified widget's master.  This
information could also be obtained via this code
  xwidget_info_id_get(xwidget_info_get(xwidget_info_master_get(xwidget_info)))
but it is more efficient to use the xwidget_info_masterid_get interface.
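The efficiency claim above is just "cached field read versus three chained
lookups".  A toy model (the struct layout and toy_* names are assumptions of
this sketch; the real xwidget_info_t layout is opaque, and the sketch
collapses the vertex/info distinction into one struct):

```c
#include <assert.h>
#include <stddef.h>

/* Toy widget-info record modelling the accessor composition above: the
 * master's widget number can be found by following the master edge, or
 * read from a field cached at init time. */
typedef struct toy_info {
    int id;                    /* this widget's target ID */
    struct toy_info *master;   /* info for the widget's master */
    int masterid;              /* cached copy of master->id */
} toy_info_t;

static int toy_id_get(toy_info_t *w)             { return w->id; }
static toy_info_t *toy_master_get(toy_info_t *w) { return w->master; }

/* The composed, slower path shown in the text. */
static int toy_masterid_slow(toy_info_t *w)
{
    return toy_id_get(toy_master_get(w));
}

/* The direct accessor: a single field read. */
static int toy_masterid_get(toy_info_t *w) { return w->masterid; }
```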

xwidgetnum_t
xtalk_intr_target_get(xtalk_intr_t xtalk_intr);
The Crosstalk Interrupt Target Get interface returns the widget number (0..f)
associated with a specified interrupt.

xtalk_intr_vector_t
xtalk_intr_vector_get(xtalk_intr_t xtalk_intr);
The Crosstalk Interrupt Vector Get interface returns the Crosstalk interrupt
"vector" (0..255) associated with a specified interrupt.

iopaddr_t
xtalk_intr_addr_get(xtalk_intr_t xtalk_intr);
The Crosstalk Interrupt Address Get interface returns the crosstalk address
which, when written, generates the specified interrupt.

vertex_hdl_t
xtalk_intr_cpu_get(xtalk_intr_t xtalk_intr);
The Crosstalk Interrupt CPU Get interface returns the handle of the CPU
which eventually receives the specified interrupt.  (Should this interface
exist?)

void *
xtalk_intr_sfarg_get(xtalk_intr_t xtalk_intr);
The Crosstalk Interrupt SetFunc Argument Get interface returns the "setfunc
argument" associated with a specified interrupt.  The "setfunc arg" is an
arbitrary argument that the driver specified in an earlier invocation of
xtalk_intr_connect.

vertex_hdl_t
xtalk_pio_dev_get(xtalk_piomap_t xtalk_piomap);
The Crosstalk PIO Device Get interface returns the crosstalk device
associated with a given piomap.

xwidgetnum_t
xtalk_pio_target_get(xtalk_piomap_t xtalk_piomap);
The Crosstalk PIO Target Get interface returns the Crosstalk widget number
that is used for the specified piomap.  This is a widget number associated
with the device's Crosstalk provider (master).

iopaddr_t
xtalk_pio_xtalk_addr_get(xtalk_piomap_t xtalk_piomap);
The Crosstalk PIO Xtalk Address Get interface returns the starting Crosstalk
address mapped by the specified piomap.

ulong
xtalk_pio_mapsz_get(xtalk_piomap_t xtalk_piomap);
The Crosstalk PIO Map Size Get interface returns the size of the Crosstalk
address range mapped by the specified piomap.

caddr_t
xtalk_pio_kvaddr_get(xtalk_piomap_t xtalk_piomap);
The Crosstalk PIO Kernel Virtual Address Get interface returns the starting
kernel virtual address used to map to the Crosstalk address range associated
with the specified piomap.

vertex_hdl_t
xtalk_dma_dev_get(xtalk_dmamap_t xtalk_dmamap);
The Crosstalk DMA Device Get interface returns the device which requested
the specified DMA mapping.

xwidgetnum_t
xtalk_dma_target_get(xtalk_dmamap_t xtalk_dmamap);
The Crosstalk DMA Target Get interface returns the Crosstalk widget number
that is used for DMA's that use the specified dmamap.  This is a widget
number associated with the device's Crosstalk provider (master).

void
xtalk_init(void);
The Crosstalk Init interface is invoked once during startup to initialize
software needed to deal with Crosstalk providers and devices.



7.2 Hub as Crosstalk Provider
=============================
Lego's Crosstalk provider is a hub.  The code which implements
Hub-as-CrosstalkProvider is in ml/KLEGO/hubio.c.  The interfaces look
very much like the generic crosstalk layer, except for a bunch of casts.


7.3 Heart as Crosstalk Provider
===============================
SpeedRacer's Crosstalk provider is a heart.  The code which implements
Heart-as-CrosstalkProvider is TBD.  The interfaces will look very much
like the generic crosstalk layer, except for a bunch of casts.


7.4 xbow as Crosstalk Provider
==============================
Our hope for xbow is to keep it largely invisible as far as bus operations
are concerned.  We'll simply treat it as an extension of hub/heart -- it's
not really a Crosstalk Provider, but just the ASIC that implements a
Crosstalk switch.  We could instead have treated xbow as a Crosstalk *device*
that happens to also be a Crosstalk *provider*.  In that design, the xbow
implementation layer would provide all the interfaces (above) expected from a
xtalk provider, and it would also make generic xtalk calls which would be
handed off to hub or heart.



8.0 Bus-Independent Interfaces
==============================
In addition to the many existing bus-independent interfaces for use by
drivers, these new interfaces are also available.

Additions to manage device descriptors:
device_desc_t   device_desc_dup(vertex_hdl_t dev);
void            device_desc_free(device_desc_t device_desc);
device_desc_t   device_desc_default_get(vertex_hdl_t dev);
void            device_desc_default_set(vertex_hdl_t dev, device_desc_t device_desc);

Accessor interfaces for device descriptors:
vertex_hdl_t    device_desc_intr_target_get(device_desc_t device_desc);
int             device_desc_intr_policy_get(device_desc_t device_desc);
ilvl_t          device_desc_intr_swlevel_get(device_desc_t device_desc);
char *          device_desc_intr_name_get(device_desc_t device_desc);
int             device_desc_flags_get(device_desc_t device_desc);

void            device_desc_intr_target_set(device_desc_t device_desc, vertex_hdl_t target);
void            device_desc_intr_policy_set(device_desc_t device_desc, int policy);
void            device_desc_intr_swlevel_set(device_desc_t device_desc, ilvl_t swlevel);
void            device_desc_intr_name_set(device_desc_t device_desc, char *name);
void            device_desc_flag_set(device_desc_t device_desc, int flag);
void            device_desc_flag_clear(device_desc_t device_desc, int flag);

Additions to access edt fields (for I/O buses interfaces that require edt):
void *          edt_bus_info_get(edt_t *edt);
vertex_hdl_t    edt_connectpt_get(edt_t *edt);
vertex_hdl_t    edt_master_get(edt_t *edt);
device_desc_t   edt_device_desc_get(edt_t *edt);
[Note that the edt* interfaces are really only useful on buses like VME where
the *drivers*, as opposed to the bus-dependent code, must probe for devices.
On I/O buses like Crosstalk and PCI, the bus code manages the hardware graph
and device descriptors, and it calls the driver's initialization function.]


Interfaces to help manage device topology:
vertex_hdl_t    device_master_get(vertex_hdl_t vhdl);
void            device_master_set(vertex_hdl_t vhdl, vertex_hdl_t master);
These interfaces get and set the "master" for a specified device.  (The master
for crosstalk widgets is a crosstalk provider.)

cnodeid_t       master_node_get(vertex_hdl_t vhdl);
This interface returns the compact node ID of the node which "owns" the
specified vertex.  It determines the owner by following "master" edges in the
hwgraph until it reaches a node controller.  If it cannot determine a "master
node", this interface returns CNODEID_NONE.
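The "follow master edges until a node controller" walk can be sketched on a
toy hwgraph.  The vertex struct, the is_controller flag, and the -1 value for
CNODEID_NONE are assumptions of this sketch, not the IRIX definitions:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CNODEID_NONE (-1)

/* Toy hwgraph vertex: walk "master" edges until a vertex that is a node
 * controller is reached (an illustrative model of master_node_get). */
typedef struct toy_vertex {
    struct toy_vertex *master;   /* NULL if no master edge */
    int cnodeid;                 /* meaningful only when is_controller */
    int is_controller;
} toy_vertex_t;

static int toy_master_node_get(toy_vertex_t *v)
{
    while (v != NULL) {
        if (v->is_controller)
            return v->cnodeid;
        v = v->master;           /* follow the master edge */
    }
    return TOY_CNODEID_NONE;     /* no master node could be determined */
}
```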


Generic operations are provided (TBD) to convert from any of these
        a kernel or user virtual address/length  or
        a buf structure or
        a virtual Address/Length List or
        a physical Address/Length List or
        a page list
into an alenlist_t.

Versions of userdma and useracc are provided (TBD) which prepare a specified
buffer for DMA and return an alenlist_t that describes the prepared memory.
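The core of any virtual-range-to-alenlist conversion is splitting the range
at page boundaries.  A minimal sketch, assuming a 4 KB page size and a
fixed-size toy list (the real alenlist_t is opaque and growable):

```c
#include <assert.h>

/* Toy Address/Length List: convert a contiguous virtual range into
 * per-page address/length pairs, the kind of conversion the alenlist_t
 * interfaces above perform.  Page size and struct are illustrative. */
#define TOY_PAGESZ  4096UL
#define TOY_MAXPAIRS 16

typedef struct {
    unsigned long addr[TOY_MAXPAIRS];
    unsigned long len[TOY_MAXPAIRS];
    int npairs;
} toy_alenlist_t;

/* Split [vaddr, vaddr+len) at page boundaries, one pair per page piece. */
static void toy_alenlist_from_range(toy_alenlist_t *al,
                                    unsigned long vaddr, unsigned long len)
{
    al->npairs = 0;
    while (len > 0 && al->npairs < TOY_MAXPAIRS) {
        unsigned long in_page = TOY_PAGESZ - (vaddr % TOY_PAGESZ);
        unsigned long chunk = (len < in_page) ? len : in_page;
        al->addr[al->npairs] = vaddr;
        al->len[al->npairs] = chunk;
        al->npairs++;
        vaddr += chunk;
        len -= chunk;
    }
}
```

A real userdma-style routine would additionally pin the pages and translate
each virtual piece to a physical address; only the splitting step is shown.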

Error handling interfaces need to be defined.


9.0 Comparison with Old Irix
============================
This new interface expects drivers to use interfaces defined by the bus type
of the devices they control.  Old Irix attempted to squish all bus types
into a single set of interfaces.

This new interface requires a driver allocating resources to specify which
device it controls rather than which adapter that device is connected to.
The decision of which adapter to use is left to the system.  The hwgraph
allows the system to efficiently determine which adapter(s) connect to a
device based on the vertex_hdl_t.

This new interface manages policy information from administrative files in a
fairly transparent manner.  Policy information is extracted from files and
associated with a vertex_hdl_t.  Old Irix used to embed policy information
in the driver itself.

This new interface achieves all of the goals outlined earlier in chapter 2.0.

It is our intention to provide a compatibility layer for older, cruftier
drivers that use the Old Irix pio* and dma* interfaces.  This layer will
work for existing bus types, but it will *not* be extended to work with PCI,
Crosstalk, or other new bus types.  Even in Old Irix, this layer provided
no notable value.  Authors of new drivers will be encouraged to use the
new interfaces, like the one described here for Crosstalk.


10.0 Future Directions
======================
[Note: None of the things in this chapter are ready for implementation.
It is presented merely to show a long-term direction.]

The fact that PIO and DMA interfaces look so remarkably similar begs some
obvious questions: Why have separate interfaces for PIO and DMA?  What's
the essential difference between PIO and DMA?   PIO's typically use partial
reads/writes whereas DMA typically uses full cache line reads/writes.  PIO's
typically are initiated by a CPU and directed at a device whereas DMA typically
is initiated by a device and directed at a memory.

In an environment where crosstalk devices send an interrupt by initiating a
partial write to a hub/heart; and in an environment where *CPU*'s initiate DMA
through the use of write gatherers and Block Transfer Engines; and in an
environment with Peer-to-Peer support in which one device DMA's directly to
another device without going through memory, the distinction between PIO and
DMA and even the distinction between kinds of hardware components is blurred.

It makes sense to consider a single unifying interface that handles a more
generalized "mapping" from one "Address Space" to another.  Examples of
things that use and/or provide one or more Address Spaces include:
        processes
        threads
        kernels
        dpnodes
        cpus
        nodes
        physical memories
        I/O devices (VME, PCI, crosstalk, etc.), especially block devs
        Block Transfer Engines
        files
In general, anything capable of performing a read/write/cache operation
performs these operations into some address space.  Anything capable of being
read/written at an offset provides an address space.

Address space mappings are accomplished with various Mapping Resources,
which include:
        direct translations
        mapping RAMs (e.g. for DMA)
        mapping tables (e.g. for PIO)
        TLBs
        pure software data structures

Let's define an addr_t to be "an address in some address space".  We'll say
that, by definition, an addr_t type is large enough to hold any address in
any Address Space.

Let's also expand an alenlist_t so that it now designates *which* address
space is mapped.  (Somewhat arbitrarily, we've restricted an alenlist_t
to specify a list of address/length pairs all from ONE address space.  We
could have turned this into an address/length/space triplet.)

Finally, let's use an aspc_t to represent an Address Space defined by anything
that provides an address space (see list above).  Software that manages the
providers of Address Spaces must then provide the following operations on an
aspc_t *rather than* the DMA and PIO interfaces described above.  (TBD: This
interface is only a rough approximation.)


aspcmap_t
aspcmap_alloc(  aspc_t aspc_src,                /* source Address Space */
                aspc_t aspc_targ,               /* target Address Space */
                addr_t address,                 /* target address range */
                ulong byte_count,
                ulong byte_count_max,           /* maximum #bytes in a map */
                asmap_desc_t asmap_desc,        /* details about mapping */
                ulong flags)
Allocate whatever pre-allocatable mapping resources are required in order to
map up to byte_count_max bytes from the source Address Space to the target
Address Space.  asmap_desc specifies details about the mapping, such as:
        cached or non-cached
        partial lines or full line transfers
        bandwidth allocation information
        endianness
        usage: data or descriptor
        width of transfers (8, 16, 32, 64-bit)
aspcmap_alloc returns an opaque handle that describes a set of Mapping
Resources.  It is left to the implementation (TBD: much work) to determine an
appropriate "path" from source to target through intermediate Address Spaces.
This path is associated with the aspcmap_t that is returned.  If a mapping
from src to targ is not supported, aspcmap_alloc returns 0.


void
aspcmap_free(aspcmap_t aspcmap)                 /* map resources to free */
Frees the specified Mapping Resources.


addr_t
aspcmap_addr(aspcmap_t aspcmap_hdl,             /* map resources to use */
                addr_t addr_targ,               /* target address */
                ulong byte_count,               /* byte count to map */
                ulong flags)
Use the specified Mapping Resources to map to the specified target address.
Return an address in the source Address Space, which accesses the specified
target.  Note that the target Address Space was specified earlier when the
Mapping Resources were allocated; so there's no need to re-specify it.



alenlist_t
aspcmap_list(aspcmap_t aspcmap_hdl,             /* map resources to use */
                alenlist_t alenlist_targ)       /* List to map */
Same as aspcmap_addr, but for use with a *List* of Address/Length pairs.


addr_t
aspctrans_addr( aspc_t aspc_src,                /* source Address Space */
                aspc_t aspc_targ,               /* target Address Space */
                addr_t addr_targ,               /* target address */
                ulong byte_count,               /* #bytes in address range */
                ulong flags)
Given a target address range in some address space, provide an address in the
specified source address range which maps to the target without the need for
any pre-allocated mapping resources.  If pre-allocated mapping resources would
be required, return 0.


alenlist_t
aspctrans_list( aspc_t aspc_src,                /* source Address Space */
                aspc_t aspc_targ,               /* target Address Space */
                alenlist_t alenlist_targ,       /* List to map */
                ulong flags)
Similar to aspctrans_addr, but for use with a *List* of Address/Length pairs.


By using the aspc* interface, software can establish a mapping from any Address
Space to any other Address Space through whatever intermediate hardware is
needed.  This allows, for instance, a mapping to be set up between two devices
for peer-to-peer transfers.
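The unification above can be illustrated with a deliberately tiny model in
which every Address Space is just a window at some base offset in one flat
"global" space, so a mapping reduces to an offset delta.  All of this (the
toy_* names, the flat-space assumption, the delta representation) is an
assumption of the sketch; the real aspc* design is explicitly TBD:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the proposed aspc* interfaces.  An Address Space is a
 * base offset in one flat global space; an allocated mapping records
 * the delta needed to name a target address from the source space. */
typedef struct { unsigned long base; } toy_aspc_t;

typedef struct toy_aspcmap_s {
    long delta;                  /* src addr = targ addr + delta */
} *toy_aspcmap_t;

/* Pre-allocate the (trivial, here) mapping resources for src -> targ. */
static toy_aspcmap_t toy_aspcmap_alloc(toy_aspc_t src, toy_aspc_t targ)
{
    toy_aspcmap_t m = malloc(sizeof(*m));
    m->delta = (long)targ.base - (long)src.base;
    return m;
}

/* Return an address in the source space that accesses addr_targ in the
 * target space (the target space was fixed at alloc time). */
static unsigned long toy_aspcmap_addr(toy_aspcmap_t m, unsigned long addr_targ)
{
    return addr_targ + (unsigned long)m->delta;
}

static void toy_aspcmap_free(toy_aspcmap_t m) { free(m); }
```

Read kernel-space-to-device as "PIO" and device-to-memory as "DMA": the same
two calls cover both, which is the point of the unified interface.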

These aspc* interfaces can establish a mapping from a "Kernel Virtual Address
Space" to a particular device's Address Space -- this is "PIO".  They can also
establish a mapping from a device's Address Space to the "Physical Memory
Address Space" in order to handle "DMA".  In fact, it should be easily possible
to layer the pio* and dma* interfaces described above on top of the aspc*
interface.  Note that on NUMA systems, every memory provides its own Physical
Address Space; but, there is also a collective Physical Memory Address Space
for the entire system which comprises the individual memory spaces.  The same
approach applies to CPU Address Spaces -- we may very well permit some amount
of per-CPU Kernel Virtual Address Space as well as some amount of Global
Kernel Virtual Address Space.

Product release timing makes this large a change impractical at this time,
so we do not plan to implement any of the aspc* interfaces.  We should
consider changes like these at the next opportunity, especially in
conjunction with peer-to-peer support.




-- 
Leo Dagum    SGI  Mountain View, CA 94043 (650-933-2179)
