From owner-linux-origin@oss.sgi.com Fri Jan 28 19:27:01 2000
Received: by oss.sgi.com id ; Fri, 28 Jan 2000 19:26:51 -0800
Received: from mailhost.uni-koblenz.de ([141.26.64.1]:32720 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Fri, 28 Jan 2000 19:26:25 -0800
Received: from cacc-27.uni-koblenz.de (cacc-27.uni-koblenz.de [141.26.131.27]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id EAA06681 for ; Sat, 29 Jan 2000 04:29:04 +0100 (MET)
Received: by lappi.waldorf-gmbh.de id ; Sat, 29 Jan 2000 04:28:21 +0100
Date: Sat, 29 Jan 2000 04:20:37 +0100
From: Ralf Baechle
To: linux-origin@oss.sgi.com
Subject: linux-origin list
Message-ID: <20000129042037.A4144@uni-koblenz.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre3us
X-Accept-Language: de,en,fr
Sender: owner-linux-origin@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;linux-origin-outgoing

I've set up a new list, linux-origin@oss.sgi.com, which can be subscribed to via majordomo. The list is closed, that is, I as the moderator have to approve each subscription. That means we can discuss things in private, unlike linux@engr.sgi.com.

Right now only three people are on the list: me, Kanoj and Leo. Tell me who else should be subscribed, or just pass on the word.

  Ralf

From owner-linux-origin@oss.sgi.com Fri Jan 28 19:43:01 2000
Received: by oss.sgi.com id ; Fri, 28 Jan 2000 19:42:52 -0800
Received: from mailhost.uni-koblenz.de ([141.26.64.1]:59605 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Fri, 28 Jan 2000 19:42:48 -0800
Received: from cacc-27.uni-koblenz.de (cacc-27.uni-koblenz.de [141.26.131.27]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id EAA07987 for ; Sat, 29 Jan 2000 04:45:27 +0100 (MET)
Received: by lappi.waldorf-gmbh.de id ; Sat, 29 Jan 2000 04:45:03 +0100
Date: Sat, 29 Jan 2000 04:45:03 +0100
From: Ralf Baechle
To: linux-origin@oss.sgi.com
Subject: 2.3.41
Message-ID: <20000129044503.D4672@uni-koblenz.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre3us
Sender: owner-linux-origin@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;linux-origin-outgoing

Take a look at linux/Documentation/DMA-mapping.txt:

> Dynamic DMA mapping
> ===================
>
> David S. Miller
> Richard Henderson
> Jakub Jelinek

This text documents an API which we now need to examine to see whether it is suitable for SNx.

Kanoj, re our phone discussion - it says:

> Drivers converted fully to this interface should not use virt_to_bus any
> longer, nor should they use bus_to_virt.
Ralf From owner-linux-origin@oss.sgi.com Sat Jan 29 14:54:29 2000 Received: by oss.sgi.com id ; Sat, 29 Jan 2000 14:54:09 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:13140 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Sat, 29 Jan 2000 14:53:50 -0800 Received: from google.engr.sgi.com (google.engr.sgi.com [163.154.10.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id OAA01871 for ; Sat, 29 Jan 2000 14:52:09 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id OAA12434; Sat, 29 Jan 2000 14:55:18 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001292255.OAA12434@google.engr.sgi.com> Subject: Re: 2.3.41 To: ralf@uni-koblenz.de (Ralf Baechle) Date: Sat, 29 Jan 2000 14:55:17 -0800 (PST) Cc: linux-origin@oss.sgi.com In-Reply-To: <20000129044503.D4672@uni-koblenz.de> from "Ralf Baechle" at Jan 29, 2000 04:45:03 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing > > Take a look at linux/Documentation/DMA-mapping.txt: > > > Dynamic DMA mapping > > =================== > > > > David S. Miller > > Richard Henderson > > Jakub Jelinek > > This text documents an API which we now need to examine if suitable for > SNx. > > Kanoj, re our phone discussion - it says: > > > Drivers converted fully to this interface should not use virt_to_bus any > > longer, nor should they use bus_to_virt. > > Ralf > Leo, looks like other people are starting to get worried about the dma and pio issues too. After talking to Ralf, it is clear that some of us have to understand how irix handled this, issues involved in the sn0/sn1 platforms, and then get in touch with David and Jakub ... fast. We probably want to reach consensus internally about the general structure of the apis first though. Its time to start looking at the old papers people put up on bablyon regarding io apis in irix .... The sad truth is, we have to keep supporting older drivers which do not follow the apis ... 
Kanoj From owner-linux-origin@oss.sgi.com Sat Jan 29 17:45:48 2000 Received: by oss.sgi.com id ; Sat, 29 Jan 2000 17:45:39 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:18017 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Sat, 29 Jan 2000 17:45:25 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id RAA06888 for ; Sat, 29 Jan 2000 17:43:45 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: from barrel.engr.sgi.com (barrel.engr.sgi.com [163.154.5.63]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id RAA57123; Sat, 29 Jan 2000 17:47:55 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: (from dagum@localhost) by barrel.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id RAA10477; Sat, 29 Jan 2000 17:47:14 -0800 Date: Sat, 29 Jan 2000 17:47:14 -0800 From: dagum@barrel.engr.sgi.com (Leo Dagum) Message-Id: <200001300147.RAA10477@barrel.engr.sgi.com> To: ralf@uni-koblenz.de (Ralf Baechle), kanoj@google.engr.sgi.com (Kanoj Sarcar) Subject: Re: 2.3.41 Cc: linux-origin@oss.sgi.com Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing > > > > Take a look at linux/Documentation/DMA-mapping.txt: > > > > > Dynamic DMA mapping > > > =================== > > > > > > David S. Miller > > > Richard Henderson > > > Jakub Jelinek > > > > This text documents an API which we now need to examine if suitable for > > SNx. > > > > Kanoj, re our phone discussion - it says: > > > > > Drivers converted fully to this interface should not use virt_to_bus any > > > longer, nor should they use bus_to_virt. > > > > Ralf > > > > Leo, looks like other people are starting to get worried about the dma > and pio issues too. After talking to Ralf, it is clear that some of us > have to understand how irix handled this, issues involved in the sn0/sn1 > platforms, and then get in touch with David and Jakub ... fast. > We probably want to reach consensus internally about the general > structure of the apis first though. Its time to start looking at the > old papers people put up on bablyon regarding io apis in irix .... This is a good thing really, even if it doesn't go fully our way I'm guessing (w/o having seen the document yet) that it will be a lot easier as a starting point than virt_to_bus/bus_to_virt and should give us much better driver code capture than we could otherwise hope for. > > The sad truth is, we have to keep supporting older drivers which do > not follow the apis ... > Sure, but we were expecting that anyway. 
- leo

> Kanoj
>

Leo Dagum                SGI
Mountain View, CA 94043  (650-933-2179)

From owner-linux-origin@oss.sgi.com Sat Jan 29 18:01:48 2000
Received: by oss.sgi.com id ; Sat, 29 Jan 2000 18:01:39 -0800
Received: from pneumatic-tube.sgi.com ([204.94.214.22]:23839 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Sat, 29 Jan 2000 18:01:17 -0800
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id SAA03828 for ; Sat, 29 Jan 2000 18:06:38 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com)
Received: from barrel.engr.sgi.com (barrel.engr.sgi.com [163.154.5.63]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id SAA81972 for <@cthulhu.engr.sgi.com:linux-origin@oss.sgi.com>; Sat, 29 Jan 2000 18:03:48 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com)
Received: (from dagum@localhost) by barrel.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id SAA10538 for linux-origin@oss.sgi.com; Sat, 29 Jan 2000 18:03:38 -0800
Date: Sat, 29 Jan 2000 18:03:38 -0800
From: dagum@barrel.engr.sgi.com (Leo Dagum)
Message-Id: <200001300203.SAA10538@barrel.engr.sgi.com>
To: linux-origin@oss.sgi.com
Subject: Dynamic DMA Mapping document
Sender: owner-linux-origin@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;linux-origin-outgoing

This is the full document...

- leo

patchex: extracting file DMA-mapping.txt from patch-2.3.41 to stdout
diff -u --recursive --new-file v2.3.40/linux/Documentation/DMA-mapping.txt linux/Documentation/DMA-mapping.txt
--- v2.3.40/linux/Documentation/DMA-mapping.txt	Wed Dec 31 16:00:00 1969
+++ linux/Documentation/DMA-mapping.txt	Thu Jan 27 08:58:15 2000
@@ -0,0 +1,143 @@

			Dynamic DMA mapping
			===================

		David S. Miller
		Richard Henderson
		Jakub Jelinek

Most of the 64bit platforms have special hardware that translates bus addresses (DMA addresses) to physical addresses similarly to how page tables and/or TLB translate virtual addresses to physical addresses. This is needed so that e.g. PCI devices can access with a Single Address Cycle (32bit DMA address) any page in the 64bit physical address space. Previously in Linux those 64bit platforms had to set artificial limits on the maximum RAM size in the system, so that the virt_to_bus() static scheme works (the DMA address translation tables were simply filled on bootup to map each bus address to the physical page __pa(bus_to_virt())).

So that Linux can use the dynamic DMA mapping, it needs some help from the drivers, namely it has to take into account that DMA addresses should be mapped only for the time they are actually used and unmapped after the DMA transfer.

The following API will work of course even on platforms where no such hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on top of the virt_to_bus interface.

First of all, you should make sure #include <linux/pci.h> is in your driver. This file defines a dma_addr_t type which should be used everywhere you hold a DMA (bus) address returned from the DMA mapping functions.

There are two types of DMA mappings:

- static DMA mappings which are usually mapped at driver initialization, unmapped at the end and for which the hardware should not assume sequential accesses (from both the DMA engine in the card and CPU).
- streaming DMA mappings which are usually mapped for one DMA transfer, unmapped right after it (unless you use pci_dma_sync below) and for which hardware can optimize for sequential accesses.

To allocate and map a static DMA region, you should do:

	dma_addr_t dma_handle;

	cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);

where dev is a struct pci_dev *. You should pass NULL for PCI-like buses where devices don't have struct pci_dev (like ISA, EISA). This argument is needed because the DMA translations may be bus specific (and often are private to the bus to which the device is attached).

Size is the length of the region you want to allocate. This routine will allocate RAM for that region, so it acts similarly to __get_free_pages (but takes size instead of page order). It returns two values: the virtual address which you can use to access it from the CPU and dma_handle which you pass to the card. The returned address is guaranteed to be page aligned.

To unmap and free such a DMA region, you call:

	pci_free_consistent(dev, size, cpu_addr, dma_handle);

where dev and size are the same as in the above call and cpu_addr and dma_handle are the values pci_alloc_consistent returned.

The streaming DMA mapping routines can be called from interrupt context. There are two versions of each map/unmap: one which maps/unmaps a single memory region, and one which maps/unmaps a scatterlist.

To map a single region, you do:

	dma_addr_t dma_handle;

	dma_handle = pci_map_single(dev, addr, size);

and to unmap it:

	pci_unmap_single(dev, dma_handle, size);

You should call pci_unmap_single when the DMA activity is finished, e.g. from the interrupt which told you the DMA transfer is done.

Similarly with scatterlists, you map a region gathered from several regions by:

	int i, count = pci_map_sg(dev, sglist, nents);
	struct scatterlist *sg;

	for (i = 0, sg = sglist; i < count; i++, sg++) {
		hw_address[i] = sg_dma_address(sg);
		hw_len[i] = sg_dma_len(sg);
	}

where nents is the number of entries in the sglist. The implementation is free to merge several consecutive sglist entries into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any consecutive sglist entries can be merged into one provided the first one ends and the second one starts on a page boundary - in fact this is a huge advantage for cards which either cannot do scatter-gather or have a very limited number of scatter-gather entries) and returns the actual number of sg entries it mapped them to. Then you should loop count times (note: this can be less than nents times) and use the sg_dma_address() and sg_dma_length() macros where you previously accessed sg->address and sg->length as shown above.

To unmap a scatterlist, just call:

	pci_unmap_sg(dev, sglist, nents);

Again, make sure the DMA activity has finished.

Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} counterpart, because the bus address space is a shared resource (although in some ports the mapping is per each BUS, so fewer devices contend for the same bus address space) and you could render the machine unusable by eating all bus addresses.

If you need to use the same streaming DMA region multiple times and touch the data in between the DMA transfers, just map it with pci_map_{single,sg}, and after each DMA transfer call either:

	pci_dma_sync_single(dev, dma_handle, size);

or:

	pci_dma_sync_sg(dev, sglist, nents);

and after the last DMA transfer call one of the DMA unmap routines pci_unmap_{single,sg}.
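As a concrete illustration of the sequence just described (a minimal sketch only; pdev, buf, len and mydev_start_dma() are made-up names, while the pci_* calls are the ones documented above), a driver reusing one streaming mapping across two transfers would look roughly like:

	dma_addr_t handle;

	/* map once, before the first transfer */
	handle = pci_map_single(pdev, buf, len);

	mydev_start_dma(handle, len);			/* first DMA transfer */
	/* ... DMA-complete interrupt arrives ... */
	pci_dma_sync_single(pdev, handle, len);		/* before the CPU touches buf */

	mydev_start_dma(handle, len);			/* last DMA transfer */
	/* ... DMA-complete interrupt arrives ... */
	pci_unmap_single(pdev, handle, len);		/* unmap after the last transfer */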
If you don't touch the data from the first pci_map_* call till pci_unmap_*, then you don't have to call pci_sync_* routines. Drivers converted fully to this interface should not use virt_to_bus any longer, nor should they use bus_to_virt. Some drivers have to be changed a little bit, because there is no longer an equivalent to bus_to_virt in the dynamic DMA mapping scheme - you have to always store the DMA addresses returned by the pci_alloc_consistent and pci_map_single calls (pci_map_sg stores them in the scatterlist itself if the platform supports dynamic DMA mapping in hardware) in your driver structures and/or in the card registers. For PCI cards which recognize fewer address lines than 32 in Single Address Cycle, you should set corresponding pci_dev's dma_mask field to a different mask. The dma mapping routines then should either honour your request and allocate the DMA only with the bus address with bits set in your dma_mask or should complain that the device is not supported on that platform. Leo Dagum SGI Mountain View, CA 94043 (650-933-2179) From owner-linux-origin@oss.sgi.com Sat Jan 29 18:13:39 2000 Received: by oss.sgi.com id ; Sat, 29 Jan 2000 18:13:29 -0800 Received: from sgi.SGI.COM ([192.48.153.1]:40558 "EHLO sgi.com") by oss.sgi.com with ESMTP id ; Sat, 29 Jan 2000 18:13:05 -0800 Received: from google.engr.sgi.com ([163.154.10.145]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id SAA03144 for ; Sat, 29 Jan 2000 18:15:51 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id SAA30330; Sat, 29 Jan 2000 18:14:26 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001300214.SAA30330@google.engr.sgi.com> Subject: Re: Dynamic DMA Mapping document To: dagum@barrel.engr.sgi.com (Leo Dagum) Date: Sat, 29 Jan 2000 18:14:25 -0800 (PST) Cc: linux-origin@oss.sgi.com In-Reply-To: <200001300203.SAA10538@barrel.engr.sgi.com> from "Leo Dagum" at Jan 29, 2000 06:03:38 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing Right away you can see that we probably need to pass in another arg to the dma api's: an indicator whether we are dma'ing commands or data to/from the controller, so as to be able to set address attr bits like prefetch, virtual channel numbers etc in the returned dma_handle. Probably some more stuff, similar to all the info that pciio_dmatrans_addr() gets. 
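As a purely hypothetical sketch of the kind of extra argument described above (this is not an existing Linux interface; the function name and flag values are made up, loosely modeled on the IRIX dma.h flags quoted later in this thread), a driver might then distinguish control descriptors from payload data when mapping:

	/* hypothetical sketch only -- not a real kernel API */
	#define DMA_DATA	0x01	/* mapping carries payload data */
	#define DMA_DESC	0x02	/* mapping carries device control descriptors */

	dma_addr_t pci_map_single_attr(struct pci_dev *dev, void *addr,
				       size_t size, unsigned long flags);

	/* command ring vs. data buffers could then pick up different
	   address attributes (prefetch, virtual channel, ...): */
	cmd_handle  = pci_map_single_attr(dev, cmd_ring, ring_size, DMA_DESC);
	data_handle = pci_map_single_attr(dev, data_buf, buf_size,  DMA_DATA);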
Kanoj From owner-linux-origin@oss.sgi.com Sun Jan 30 17:22:29 2000 Received: by oss.sgi.com id ; Sun, 30 Jan 2000 17:22:20 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:39505 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Sun, 30 Jan 2000 17:21:59 -0800 Received: from google.engr.sgi.com (google.engr.sgi.com [163.154.10.145]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id RAA08201 for ; Sun, 30 Jan 2000 17:27:22 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id RAA13165 for linux-origin@oss.sgi.com; Sun, 30 Jan 2000 17:23:30 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001310123.RAA13165@google.engr.sgi.com> Subject: CVS Update@oss.sgi.com: linux (fwd) To: linux-origin@oss.sgi.com Date: Sun, 30 Jan 2000 17:23:30 -0800 (PST) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing Please put all the hacks that you are putting in, and intend to remove later on, in this file, so that we can easily track these issues ... Kanoj Forwarded message: > From owner-linux-cvs@oss.sgi.com Sun Jan 30 17:22:24 2000 > From: Kanoj Sarcar > To: linux-cvs@oss.sgi.com > Subject: CVS Update@oss.sgi.com: linux > Message-Id: <20000131011913Z305160-9818+87@oss.sgi.com> > Date: Sun, 30 Jan 2000 17:19:12 -0800 > X-Orcpt: rfc822;linux-cvs > Sender: owner-linux-cvs@oss.sgi.com > Precedence: bulk > > CVSROOT: /home/pub/cvs > Module name: linux > Changes by: kanoj@oss.sgi.com 00/01/30 17:19:12 > > Added files: > arch/mips64/sgi-ip27: TODO > > Log message: > New file to track all issues that need investigation. > From owner-linux-origin@oss.sgi.com Mon Jan 31 10:42:26 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 10:42:18 -0800 Received: from mailhost.uni-koblenz.de ([141.26.64.1]:44937 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 10:42:02 -0800 Received: from cacc-17.uni-koblenz.de (cacc-17.uni-koblenz.de [141.26.131.17]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id TAA24261 for ; Mon, 31 Jan 2000 19:44:40 +0100 (MET) Received: by lappi.waldorf-gmbh.de id ; Mon, 31 Jan 2000 19:36:15 +0100 Date: Mon, 31 Jan 2000 19:36:15 +0100 From: Ralf Baechle To: linux-origin@oss.sgi.com Subject: pci_* interface Message-ID: <20000131193615.I12102@uni-koblenz.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3us X-Accept-Language: de,en,fr Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing Seems others also aren't really hapy with the interface. On Friday Kanoj and me agreed that we first want to discuss on this list what kind of interfaces we actually want for the Origin but I guess we're somewhat under pressure now to try to fix the interfaces. Can anybody point me to documentation of the interfaces which IRIX uses for this purpose? 
Ralf From owner-linux-origin@oss.sgi.com Mon Jan 31 10:58:07 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 10:57:57 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:54884 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 10:57:47 -0800 Received: from rock.csd.sgi.com (fddi-rock.csd.sgi.com [150.166.9.10]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id KAA18306 for ; Mon, 31 Jan 2000 10:56:12 -0800 (PST) mail_from (len@sgi.com) Received: from sgi.com (hives.engr.sgi.com [163.154.15.44]) by rock.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id LAA78954; Mon, 31 Jan 2000 11:00:24 -0800 (PST) mail_from (len@sgi.com) Message-ID: <3895DBC2.89FDF88E@sgi.com> Date: Mon, 31 Jan 2000 11:00:19 -0800 From: Len Widra Organization: SGI X-Mailer: Mozilla 4.7C-SGI [en] (X11; I; IRIX 6.5 IP22) X-Accept-Language: en MIME-Version: 1.0 To: Ralf Baechle CC: linux-origin@oss.sgi.com Subject: Re: pci_* interface References: <20000131193615.I12102@uni-koblenz.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing > Seems others also aren't really hapy with the interface. On Friday I just joined this mail list. Who's not happy with what? > Can anybody point me to documentation of the interfaces which IRIX uses > for this purpose? You can start with the technical pubs library book on PCI Device Drivers: http://techpubs.engr.sgi.com/library/dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Developer/DevDriver_PG/@InfoSearch__BookTextView/42090?DwebQuery=device%2Band%2Bdriver Thanks, --Len -- =============================================================================== Len Widra 650-933-1189 len@sgi.com Principal Engineer, SGI From owner-linux-origin@oss.sgi.com Mon Jan 31 11:05:37 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:05:27 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:58983 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:05:20 -0800 Received: from google.engr.sgi.com (google.engr.sgi.com [163.154.10.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id LAA19536 for ; Mon, 31 Jan 2000 11:03:47 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id LAA49395; Mon, 31 Jan 2000 11:06:52 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001311906.LAA49395@google.engr.sgi.com> Subject: Re: pci_* interface To: len@sgi.com (Len Widra) Date: Mon, 31 Jan 2000 11:06:52 -0800 (PST) Cc: ralf@uni-koblenz.de (Ralf Baechle), linux-origin@oss.sgi.com In-Reply-To: <3895DBC2.89FDF88E@sgi.com> from "Len Widra" at Jan 31, 2000 11:00:19 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing > > > Seems others also aren't really hapy with the interface. On Friday > > I just joined this mail list. Who's not happy with what? > > > Can anybody point me to documentation of the interfaces which IRIX uses > > for this purpose? 
> > You can start with the technical pubs library book on PCI Device Drivers: > http://techpubs.engr.sgi.com/library/dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Developer/DevDriver_PG/@InfoSearch__BookTextView/42090?DwebQuery=device%2Band%2Bdriver > > Thanks, > --Len > > -- > =============================================================================== > Len Widra 650-933-1189 len@sgi.com > Principal Engineer, SGI > > > And I was foolishly trying to get thru http://babylon/sn0/io.html. Who wants to fix the links on babylon for the software docs and hardware specs? Len, we are talking about getting irix style pci/dma io-layering calls into Linux, couple of the RedHat guys are also interested with this stuff for sparc64 ... Kanoj From owner-linux-origin@oss.sgi.com Mon Jan 31 11:06:07 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:05:57 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:5736 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:05:55 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id LAA19613 for ; Mon, 31 Jan 2000 11:04:17 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: from barrel.engr.sgi.com (barrel.engr.sgi.com [163.154.5.63]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id LAA95806 for <@cthulhu.engr.sgi.com:linux-origin@oss.sgi.com>; Mon, 31 Jan 2000 11:08:28 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: (from dagum@localhost) by barrel.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA13308 for linux-origin@oss.sgi.com; Mon, 31 Jan 2000 11:08:18 -0800 From: "Leo Dagum" Message-Id: <10001311108.ZM13306@barrel.engr.sgi.com> Date: Mon, 31 Jan 2000 11:08:17 -0800 In-Reply-To: Len Widra "Re: pci_* interface" (Jan 31, 11:00am) References: <20000131193615.I12102@uni-koblenz.de> <3895DBC2.89FDF88E@sgi.com> X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: linux-origin@oss.sgi.com Subject: Re: pci_* interface Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing On Jan 31, 11:00am, Len Widra wrote: > Subject: Re: pci_* interface > > Seems others also aren't really hapy with the interface. On Friday > > I just joined this mail list. Who's not happy with what? > People are unhappy with the dynamic DMA mapping interface proposed and included in release 2.3.41. Below is a document outlining the interface. (Apologies to the people who've gotten this 3 times already from me..) - leo patchex: extracting file DMA-mapping.txt from patch-2.3.41 to stdout diff -u --recursive --new-file v2.3.40/linux/Documentation/DMA-mapping.txt linux/Documentation/DMA-mapping.txt --- v2.3.40/linux/Documentation/DMA-mapping.txt Wed Dec 31 16:00:00 1969 --- v2.3.40/linux/Documentation/DMA-mapping.txt Wed Dec 31 16:00:00 1969 ++ linux/Documentation/DMA-mapping.txt Thu Jan 27 08:58:15 2000 @@ -0,0 1,143 @@ Dynamic DMA mapping =================== David S. Miller Richard Henderson Jakub Jelinek Most of the 64bit platforms have special hardware that translates bus addresses (DMA addresses) to physical addresses similarly to how page tables and/or TLB translate virtual addresses to physical addresses. This is needed so that e.g. PCI devices can access with a Single Address Cycle (32bit DMA address) any page in the 64bit physical address space. 
Previously in Linux those 64bit platforms had to set artificial limits on the maximum RAM size in the system, so that the virt_to_bus() static scheme works (the DMA address translation tables were simply filled on bootup to map each bus address to the physical page __pa(bus_to_virt())). So that Linux can use the dynamic DMA mapping, it needs some help from the drivers, namely it has to take into account that DMA addresses should be mapped only for the time they are actually used and unmapped after the DMA transfer. The following API will work of course even on platforms where no such hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on top of the virt_to_bus interface. First of all, you should make sure #include is in your driver. This file defines a dma_addr_t type which should be used everywhere you hold a DMA (bus) address returned from the DMA mapping functions. There are two types of DMA mappings: - static DMA mappings which are usually mapped at driver initialization, unmapped at the end and for which the hardware should not assume sequential accesses (from both the DMA engine in the card and CPU). - streaming DMA mappings which are usually mapped for one DMA transfer, unmapped right after it (unless you use pci_dma_sync below) and for which hardware can optimize for sequential accesses. To allocate and map a static DMA region, you should do: dma_addr_t dma_handle; cpu_addr = pci_alloc_consistent(dev, size, &dma_handle); where dev is a struct pci_dev *. You should pass NULL for PCI like buses where devices don't have struct pci_dev (like ISA, EISA). This argument is needed because the DMA translations may be bus specific (and often is private to the bus which the device is attached to). Size is the length of the region you want to allocate. This routine will allocate RAM for that region, so it acts similarly to __get_free_pages (but takes size instead of page order). It returns two values: the virtual address which you can use to access it from the CPU and dma_handle which you pass to the card. The return address is guaranteed to be page aligned. To unmap and free such DMA region, you call: pci_free_consistent(dev, size, cpu_addr, dma_handle); where dev, size are the same as in the above call and cpu_addr and dma_handle are the values pci_alloc_consistent returned. The streaming DMA mapping routines can be called from interrupt context. There are two versions of each map/unmap, one which map/unmap a single memory region, one which map/unmap a scatterlist. To map a single region, you do: dma_addr_t dma_handle; dma_handle = pci_map_single(dev, addr, size); and to unmap it: pci_unmap_single(dev, dma_handle, size); You should call pci_unmap_single when the DMA activity is finished, e.g. from interrupt which told you the DMA transfer is done. Similarly with scatterlists, you map a region gathered from several regions by: int i, count = pci_map_sg(dev, sglist, nents); struct scatterlist *sg; for (i = 0, sg = sglist; i < count; i++, sg++) { hw_address[i] = sg_dma_address(sg); hw_len[i] = sg_dma_len(sg); } where nents is the number of entries in the sglist. The implementation is free to merge several consecutive sglist entries into one (e.g. 
if DMA mapping is done with PAGE_SIZE granularity, any consecutive sglist entries can be merged into one provided the first one ends and the second one starts on a page boundary - in fact this is a huge advantage for cards which either cannot do scatter-gather or have very limited number of scatter-gather entries) and returns the actual number of sg entries it mapped them too. Then you should loop count times (note: this can be less than nents times) and use sg_dma_address() and sg_dma_length() macros where you previously accessed sg->address and sg->length as shown above. To unmap a scatterlist, just call: pci_unmap_sg(dev, sglist, nents); Again, make sure DMA activity finished. Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} counterpart, because the bus address space is a shared resource (although in some ports the mapping is per each BUS so less devices contend for the same bus address space) and you could render the machine unusable by eating all bus addresses. If you need to use the same streaming DMA region multiple times and touch the data in between the DMA transfers, just map it with pci_map_{single,sg}, after each DMA transfer call either: pci_dma_sync_single(dev, dma_handle, size); or: pci_dma_sync_sg(dev, sglist, nents); and after the last DMA transfer call one of the DMA unmap routines pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* call till pci_unmap_*, then you don't have to call pci_sync_* routines. Drivers converted fully to this interface should not use virt_to_bus any longer, nor should they use bus_to_virt. Some drivers have to be changed a little bit, because there is no longer an equivalent to bus_to_virt in the dynamic DMA mapping scheme - you have to always store the DMA addresses returned by the pci_alloc_consistent and pci_map_single calls (pci_map_sg stores them in the scatterlist itself if the platform supports dynamic DMA mapping in hardware) in your driver structures and/or in the card registers. For PCI cards which recognize fewer address lines than 32 in Single Address Cycle, you should set corresponding pci_dev's dma_mask field to a different mask. The dma mapping routines then should either honour your request and allocate the DMA only with the bus address with bits set in your dma_mask or should complain that the device is not supported on that platform. 
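As a small illustrative sketch (the 24-bit limit here is an assumption chosen purely for illustration), a card that only decodes 24 address bits would restrict its mask before doing any mapping:

	/* hypothetical device limited to 24-bit DMA addressing */
	pdev->dma_mask = 0x00ffffff;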
-- Leo Dagum SGI Mountain View, CA 94043 (650-933-2179) From owner-linux-origin@oss.sgi.com Mon Jan 31 11:12:27 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:12:17 -0800 Received: from mailhost.uni-koblenz.de ([141.26.64.1]:9881 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:12:06 -0800 Received: from cacc-17.uni-koblenz.de (cacc-17.uni-koblenz.de [141.26.131.17]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id UAA28100; Mon, 31 Jan 2000 20:14:53 +0100 (MET) Received: by lappi.waldorf-gmbh.de id ; Mon, 31 Jan 2000 20:13:57 +0100 Date: Mon, 31 Jan 2000 20:13:57 +0100 From: Ralf Baechle To: Leo Dagum Cc: linux-origin@oss.sgi.com Subject: Re: pci_* interface Message-ID: <20000131201357.A15341@uni-koblenz.de> References: <20000131193615.I12102@uni-koblenz.de> <10001311048.ZM13252@barrel.engr.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3us In-Reply-To: <10001311048.ZM13252@barrel.engr.sgi.com> X-Accept-Language: de,en,fr Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing On Mon, Jan 31, 2000 at 10:48:34AM -0800, Leo Dagum wrote: > On Jan 31, 7:36pm, Ralf Baechle wrote: > > Subject: pci_* interface > > Seems others also aren't really hapy with the interface. On Friday > > Kanoj and me agreed that we first want to discuss on this list what > > kind of interfaces we actually want for the Origin but I guess we're > > somewhat under pressure now to try to fix the interfaces. > > > > Can anybody point me to documentation of the interfaces which IRIX uses > > for this purpose? > > > > Ralf > >-- End of excerpt from Ralf Baechle > > Below is a rather old but still accurate document on bus layering > in Irix which describes the api for pio's, dma's and interrupts. > Section 7.1.2 describes dma's from crosstalk, the pci interfaces > are identical. Your email didn't make it to the list as it exceeded the maximum size of 40kb. I've now changed this limit to 2mb, same as the usual sendsnail limit for the smtp mailers. Ralf From owner-linux-origin@oss.sgi.com Mon Jan 31 11:16:36 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:16:28 -0800 Received: from sgi.SGI.COM ([192.48.153.1]:3120 "EHLO sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:16:24 -0800 Received: from google.engr.sgi.com ([163.154.10.145]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
via ESMTP id LAA04836; Mon, 31 Jan 2000 11:18:59 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id LAA50994; Mon, 31 Jan 2000 11:17:41 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001311917.LAA50994@google.engr.sgi.com> Subject: Re: pci_* interface To: ralf@oss.sgi.com (Ralf Baechle) Date: Mon, 31 Jan 2000 11:17:40 -0800 (PST) Cc: dagum@barrel.engr.sgi.com (Leo Dagum), linux-origin@oss.sgi.com In-Reply-To: <20000131201357.A15341@uni-koblenz.de> from "Ralf Baechle" at Jan 31, 2000 08:13:57 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing > > On Mon, Jan 31, 2000 at 10:48:34AM -0800, Leo Dagum wrote: > > > On Jan 31, 7:36pm, Ralf Baechle wrote: > > > Subject: pci_* interface > > > Seems others also aren't really hapy with the interface. On Friday > > > Kanoj and me agreed that we first want to discuss on this list what > > > kind of interfaces we actually want for the Origin but I guess we're > > > somewhat under pressure now to try to fix the interfaces. > > > > > > Can anybody point me to documentation of the interfaces which IRIX uses > > > for this purpose? > > > > > > Ralf > > >-- End of excerpt from Ralf Baechle > > > > Below is a rather old but still accurate document on bus layering > > in Irix which describes the api for pio's, dma's and interrupts. > > Section 7.1.2 describes dma's from crosstalk, the pci interfaces > > are identical. > > Your email didn't make it to the list as it exceeded the maximum size > of 40kb. I've now changed this limit to 2mb, same as the usual > sendsnail limit for the smtp mailers. > > Ralf > Could someone pls resend the doc again? Kanoj From owner-linux-origin@oss.sgi.com Mon Jan 31 11:18:36 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:18:27 -0800 Received: from sgi.SGI.COM ([192.48.153.1]:48688 "EHLO sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:18:21 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
via ESMTP id LAA03848 for ; Mon, 31 Jan 2000 11:20:49 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: from barrel.engr.sgi.com (barrel.engr.sgi.com [163.154.5.63]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id LAA32339 for <@cthulhu.engr.sgi.com:linux-origin@oss.sgi.com>; Mon, 31 Jan 2000 11:20:33 -0800 (PST) mail_from (dagum@barrel.engr.sgi.com) Received: (from dagum@localhost) by barrel.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id LAA13368 for linux-origin@oss.sgi.com; Mon, 31 Jan 2000 11:20:23 -0800 From: "Leo Dagum" Message-Id: <10001311120.ZM13366@barrel.engr.sgi.com> Date: Mon, 31 Jan 2000 11:20:23 -0800 In-Reply-To: Ralf Baechle "Re: pci_* interface" (Jan 31, 8:13pm) References: <20000131193615.I12102@uni-koblenz.de> <10001311048.ZM13252@barrel.engr.sgi.com> <20000131201357.A15341@uni-koblenz.de> X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: linux-origin@oss.sgi.com Subject: Re: pci_* interface Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing On Jan 31, 8:13pm, Ralf Baechle wrote: > > > > Below is a rather old but still accurate document on bus layering > > in Irix which describes the api for pio's, dma's and interrupts. > > Section 7.1.2 describes dma's from crosstalk, the pci interfaces > > are identical. > > Your email didn't make it to the list as it exceeded the maximum size > of 40kb. I've now changed this limit to 2mb, same as the usual > sendsnail limit for the smtp mailers. > I/O Infrastructure - Layering (rough draft: 01/29/96) Len Widra 1.0 Acknowledgements ==================== Special thanks to Brad Eacker, who contributed many valuable comments regarding the first draft of this document. Also thanks to Bob Alfieri who insisted on realism as he tried to implement the Kona driver using these interfaces. 1.1 Background ============== There is a set of interrelated changes targeted for the latest software release (kudzu) that focus on better support for large NUMA systems. Collectively, these changes are referred to as the "I/O Infrastructure" changes. Major changes to the I/O Infrastructure include: kernel and driver threads, especially for interrupt handling data structure encapsulation for API, ABI, and DP work hwgraph management Address/Length Lists NUMA-aware interfaces, especially for allocation I/O Bus Provider Layering This document focuses on I/O Bus Provider Layering. 2.0 Goals ========= The primary goals of I/O Bus Provider Layering are: -To better support driver code sharing across diverse platforms that happen to support the same bus types. At a minimum, API's must be common across SGI's platforms. ABI compatibility is desirable. -To eliminate some concepts in the area of I/O that are insufficient on modern complex systems (e.g. Lego), and to provide appropriate concepts and abstractions that don't currently exist. An example of an inappropriate abstraction in today's Irix is the concept of a single I/O Page Size. An example of a new, more appropriate concept is an Address/Length List. -To permit both performance-oriented drivers and portability-oriented drivers to be written with an appropriate API. This API must be reasonably simple to use. Today, some drivers avoid using the official APIs and they directly manage system hardware (e.g. mapping hardware on the device's controlling adapter). 
They do this in order to avoid some of the overhead associated with today's interfaces, which stress portability. -To provide a framework that will allow us to easily add new bus types and which will allow drivers to work with little or no change on future platforms. -To provide interfaces with names and parameters that appear to have been designed with some regularity rather than heavily evolved. 3.0 Philosophy ============== In approaching an I/O API, one fundamental decision involves choosing among three possible philosophies for device driver interfaces: 1) Use simple interfaces that work in a broad range of situations. Sacrifice some performance and feature support in order to make drivers easy to write and maintain. Try to make it easy for drivers to work (though probably not with optimal performance) on future platforms with little or no changes. 2) Place the burden on device driver writers. Give driver writers facilities to determine what sort of a system their driver is on and give them the ability to manipulate all the hardware on that system. Then it's the driver writer's job to figure out the right thing to do on each platform, and it's his/her job to figure out how to work and play well with other drivers that might want to share hardware resources. When SGI comes out with new hardware, the driver writer may need to add or modify driver code. 3) Invent sufficiently complex interfaces that allow driver and system to coordinate in order to "do the right thing". Performance-oriented drivers can get good performance, and portability-oriented drivers can get good portability. The price is paid in interface and driver complexity, since such interfaces would likely be quite complex. The philosophy we have chosen is to provide *portability* interfaces which allow simple drivers to be written which will work at sub-optimal performance on all SGI platforms AND to provide simple *performance* interfaces which sacrifice portability, but which allow a particular driver to exploit all of the performance available on a particular platform(s). If a driver writer uses the performance interfaces, he assumes a greater responsibility for understanding the system(s) in which his driver works. We have chosen *not* to combine the portability and performance interfaces into a single very complex interface. Today, Irix really only supports portability interfaces. Individual drivers achieve performance in ad-hoc ways. It would be nice if we could implement philosophy #3, but at this time it's just too complex and too likely to break as we evolve hardware platforms and I/O buses. History shows that it is difficult to guess exactly what will be significant on future platforms, and therefore difficult to provide forward-compatible interfaces. 4.0 Layering ============ Every bus type provides a layer which is both platform-independent and bus implementation-independent and which serves as an API for all drivers that control devices of that type. For example, there are platform-independent layers for PCI, VME, and Crosstalk. Ideally, the layers provided by different bus types provide similar services for similar operations like DMA Management, PIO Management, and Interrupt Management. In all cases, the platform- independent API invokes an implementation-dependent layer to handle the request. For instance, a VME driver uses interfaces provided by the generic VME API layer. This layer calls implementation-specific routines provided by the Newbridge adapter layer. 
The Newbridge is itself a PCI device, so it calls routines provided by the generic PCI API layer. This layer invokes PCIBridge- specific routines (on Lego or SpeedRacer). The PCIBridge is itself a Crosstalk device, so it calls routines provided by the generic Crosstalk API layer. On Lego, this Crosstalk API layer invokes Hub-specific routines; while on SpeedRacer, the Crosstalk API layer invokes Heart-specific routines. ********** The important thing to recognize in all this is that the VME driver only understands "VME things", not "Newbridge things" nor "PCI things" nor "PCIBridge things" nor "Crosstalk things" nor "Hub things" nor "Heart things". Similarly for *all* device drivers -- they only understand things about their bus type. ********** Note a difference from older Irix: We no longer attempt to provide a single common layer for PIO, and DMA that works across all bus types; rather, we leave it to the bus-type-API to define an appropriate interface for that bus-type. Designers of generic bus-type layers are encouraged to follow a standard model such as the Crosstalk layer presented later in this document. Note also that the implementation-independent generic layers are mostly stubs that call the appropriate implementation-specific layer. We expect compiler Inter-Procedural Analysis and inlining capabilities to optimize out these extra layers so that what remains is an API (and ABI) with none of the "extra" call overhead. [If the compiler fails to work as desired, we can always revert to explicit inline directives, or if necessary for performance we can sacrifice the ABI and use macros for the generic layers.] The call sequence between layers is illustrated here for a VME device driver: device driver | | VME generic layer | | Newbridge Adapter layer | | PCI generic layer / \ / \ PCIBridge layer MooseBridge Layer | | Crosstalk generic layer / \ / \ hub layer heart layer So if the XXX VME driver wishes to set up a piomap operation, and if the driver is executing on a Lego system, then the following interfaces are used (remember, some of these are really macros or inlined): vme_piomap_alloc newbridge_piomap_alloc pci_piomap_alloc pcibr_piomap_alloc xtalk_piomap_alloc hub_piomap_alloc 5.0 Generic Types ================= There are a couple of generic types that are useful across all I/O bus types. vertex_hdl_t is a type that uniquely identifies a particular instance of a device. From a vertex_hdl_t, it's possible (through hwgraph interfaces) to understand how the device is connected to the system and to retrieve lots of other information about the device. paddr_t is a type that holds a "physical address". A physical address uniquely identifies a particular location in memory. Drivers use well-defined interfaces to translate from virtual to physical addresses. iopaddr_t is a type that holds an "I/O physical address". An iopaddr_t uniquely identifies a particular item in the address space of an I/O bus. alenlist_t is a type that holds an "Address/Length List" which is a list of pairs of addresses and lengths. There are well-defined operations on Addr/Length Lists, including operations to create and initialize a List, extract the next pair from a List, append a pair to the List, copy a List, grow a List, etc. Typically, a driver wishes to do a DMA to a location specified by a kernel or user virtual address/length or a user virtual address vector (e.g. readv) or a buf structure or a virtual Address/Length List or a physical Address/Length List or a page list (e.g. 
vhand) The driver uses standard routines (TBD) to convert from any of these forms into a single canonical form, the Address/Length List. The List is then passed to the device's controlling adapter so that appropriate mappings can be established. See sys/alenlist.h for a complete interface description. A device_desc_t is a type that holds a "device description". This is a description of the needs, desires, and policy information for an instance of a device. Currently, the device description consists of information about DMA, and interrupts for the device. Typically, lboot extracts device descriptions for various devices from configuration files and the kernel associates this information with a device. The information comes from an administrator and is interpreted by kernel software. An interrupt description in a device_desc_t contains platform-dependent and administratively-controlled policy information including an Interrupt Target and an Interrupt Priority. Older versions of Irix embedded interrupt priority information directly in the driver, and failed to provide an adequate way for administrators to control interrupt targets. See sys/iobus.h A DMA description in a device_desc_t contains platform-dependent and administratively-controlled policy information including estimated bandwidth requirements for a device, maximum quantity of DMA resources to allow a device to use, and DMA Bandwidth Allocation information for a device (TBD: Bandwidth Allocation interfaces -- may need to be more dynamic). A PIO description could also be found in a device_desc_t, but there is currently nothing to specify. It's worth clarifying how the device_desc_t is managed. Administrative information finds its way from configuration file through lboot and master.d into device_desc_t's. These descriptors are associated with specific hwgraph locator strings: When a device is added to the hwgraph, if a descriptor has been defined for that device, that descriptor is associated with the newly-created vertex_hdl_t. Subsequent driver calls for PIO, DMA, and interrupt service automatically use this default descriptor unless the default is overridden via calling arguments. For calls where no overriding information is defined by the caller and no default description exists at the vertex, reasonable defaults are chosen by the system. 6.0 I/O Bus Services ==================== There are five basic services provided by most I/O buses (notably, not SCSI since SCSI is generally used for disks, tapes and other end-devices and is generally not used as an intermediate bus). Programmed I/O Management Direct Memory Access Management Interrupt Management Configuration Management Error Management (TBD) PIO Management allows drivers to perform CPU loads/stores that are mapped into references to registers in I/O space. DMA Management allows drivers to tell I/O devices to independently send and receive data from memory. Interrupt Management allows drivers to tell I/O devices to generate interrupts that are handled by registered interrupt handlers. It also allows instances of interrupts to be blocked and unblocked. Configuration Management allows software to determine what devices are available on an I/O bus and to associate device drivers for those devices. Error Management (TBD) allows system software to isolate failing components and switch them offline, and it may allow the system to gracefully recover from some system failures that are triggered by or associated with devices. 
Beyond these five basic services, a bus type may provide whatever services make sense for it. If a new interface is provided for a bus type, it must be supported by ALL implementations of that bus type, either with a bus-generic implementation or with a bus provider-specific implementation. It is *not* very acceptable, for example, for a VME driver to invoke a special feature that's provided only on the NewBridge adapter; because that VME driver is no longer portable to non-NewBridge VME buses. It is perfectly acceptable and desirable for a bus generic layer to provide service functions desired by multiple implementations of that bus type. For instance, the generic PCI layer may provide services that are useful to both PCIBridge and MooseBridge. It is also acceptable for a generic layer to provide to drivers some additional ease-of-use interfaces which are implemented on top of the basic interfaces. 7.0 Crosstalk Layer =================== To illustrate a typical basic I/O bus interface, this chapter focuses on the crosstalk layer. This document does not attempt to list every single detail of the implementation; rather, it is a concise overview of the crosstalk layer. 7.1 Crosstalk Generic Layer =========================== sys/xtalk/xtalk.h defines a "xtalk_provider_t". Lego hub and SpeedRacer heart, both of which are "Crosstalk providers" each supply a xtalk_provider_t structure whose members are populated with implementation-specific functions. typedef struct xtalk_provider_s { /* PIO MANAGEMENT */ xtalk_piomap_alloc_f *piomap_alloc; xtalk_piomap_free_f *piomap_free; xtalk_piomap_addr_f *piomap_addr; xtalk_piomap_done_f *piomap_done; xtalk_piotrans_addr_f *piotrans_addr; /* DMA MANAGEMENT */ xtalk_dmamap_alloc_f *dmamap_alloc; xtalk_dmamap_free_f *dmamap_free; xtalk_dmamap_addr_f *dmamap_addr; xtalk_dmamap_list_f *dmamap_list; xtalk_dmamap_done_f *dmamap_done; xtalk_dmatrans_addr_f *dmatrans_addr; xtalk_dmatrans_list_f *dmatrans_list; /* INTERRUPT MANAGEMENT */ xtalk_intr_alloc_f *intr_alloc; xtalk_intr_free_f *intr_free; xtalk_intr_connect_f *intr_connect; xtalk_intr_disconnect_f *intr_disconnect; xtalk_intr_cpu_get_f *intr_cpu_get; xtalk_intr_block_f *intr_block; xtalk_intr_unblock_f *intr_unblock; /* CONFIGURATION MANAGEMENT */ xtalk_provider_startup_f *provider_startup; xtalk_provider_shutdown_f *provider_shutdown; /* ERROR MANAGEMENT */ /* TBD */ } xtalk_provider_t; If there is only one Crosstalk provider on a platform, the generic crosstalk layer directly invokes the implementation-specific interfaces. This may allow inlining to remove the layer completely. If a platform supports more than one Crosstalk provider, the generic crosstalk layer indirects through the structure supplied by the implementation (sort of like a "Crosstalk Provider Object"). 7.1.1 Generic Crosstalk PIO =========================== A Crosstalk driver allocates PIO mapping resources using xtalk_piomap_alloc. It then invokes xtalk_piomap_addr in order to use the allocated resources to map to specific Crosstalk addresses. When it's done accessing the device with PIO's, the driver frees the mapping resources with xtalk_piomap_free. 
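As a rough sketch of this lifecycle (the full interface descriptions follow below; vhdl, MYDEV_REG_BASE, MYDEV_REG_SPAN and MYDEV_RESET are made-up names used only for illustration):

	xtalk_piomap_t piomap;
	volatile unsigned int *regs;

	/* allocate PIO mapping resources for the device's register window */
	piomap = xtalk_piomap_alloc(vhdl, 0 /* default descriptor */,
				    MYDEV_REG_BASE, MYDEV_REG_SPAN,
				    MYDEV_REG_SPAN, PIO_FIXED);

	/* establish the mapping; loads/stores through regs now reach the device */
	regs = (volatile unsigned int *)
			xtalk_piomap_addr(piomap, MYDEV_REG_BASE, MYDEV_REG_SPAN);
	regs[0] = MYDEV_RESET;

	xtalk_piomap_done(piomap);	/* finished with this particular mapping */
	xtalk_piomap_free(piomap);	/* release the mapping resources */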
xtalk_piomap_t xtalk_piomap_alloc(vertex_hdl_t dev, /* set up mapping for this device */ device_desc_t dev_desc, /* device descriptor */ iopaddr_t xtalk_addr, /* map for this xtalk_addr range */ ulong byte_count, ulong byte_count_max, /* maximum size of a mapping */ ulong flags); The Crosstalk piomap Allocation interface allocates whatever hardware and software resources are needed in order to be able to perform loads/stores to the specified device/address range. If dev_desc is non-0, it overrides the default PIO descriptor for this device. xtalk_piomap_alloc returns an opaque "crosstalk piomap handle". byte_count_max specifies the largest mapping that will ever be requested (via xtalk_piomap_addr). flag values include those specified in sys/pio.h: PIO_FIXED /* long-term mapping */ PIO_UNFIXED /* mapping needed only briefly */ void xtalk_piomap_free(xtalk_piomap_t xtalk_piomap); The Crosstalk piomap Free interface logically releases all software and hardware resources that were allocated by an earlier xtalk_piomap_alloc. The crosstalk implementation layer may choose to use lazy release and leave the mappings intact until the mapping resources are needed by some other allocation. caddr_t xtalk_piomap_addr( xtalk_piomap_t xtalk_piomap, /* mapping resources */ iopaddr_t xtalk_addr, /* map for this xtalk_addr */ ulong byte_count); /* map this many bytes */ The Crosstalk piomap Addr interface establishes a hardware mapping to the specified Crosstalk address using the mapping resources specified in an earlier call to xtalk_piomap_alloc. xtalk_piomap_addr returns a kernel virtual address. When software accesses this address, the corresponding mapped Crosstalk address is accessed. The address range specified to xtalk_piomap_addr must be contained within the address range specified to xtalk_piomap_alloc. Additionally, byte_count must be no greater than byte_count_max. For all offsets such that offset < byte_count, loads and stores to caddr_t+offset access xtalk_addr+offset. void xtalk_piomap_done( xtalk_piomap_t xtalk_piomap); The Crosstalk piomap Done interface notifies the system that a driver is done using piomap resources specified in an earlier piomap_addr call. The piomap resources are retained for future piomap_addr invocations. [Note: This isn't strictly necessary, but it provides a convenient place to add workarounds, etc., so it's included as a portability interface.] caddr_t xtalk_piotrans_addr(vertex_hdl_t dev, /* set up mapping for this device */ device_desc_t dev_desc, /* device descriptor */ iopaddr_t xtalk_addr, /* Crosstalk address */ ulong byte_count, /* map this many bytes */ ulong flags); The Crosstalk PIO Translate Address interface returns a system virtual address range that maps to a specified Crosstalk address range. If PIO mapping hardware would be required, xtalk_piotrans_addr returns 0. This interface is a performance interface rather than a portable interface. A Crosstalk driver that wishes to be both high-performance and highly-compatible should try to use xtalk_piotrans_addr during set up for PIOs. If this interface returns 0 (error), the driver may then use the compatible method: allocate mapping resources via xtalk_piomap_alloc and establish mappings with xtalk_piomap_addr. 7.1.2 Generic Crosstalk DMA =========================== A Crosstalk driver allocates DMA mapping resources using xtalk_dmamap_alloc. It then invokes xtalk_dmamap_addr or xtalk_dmamap_list in order to use the allocated resources to map to specific memory addresses. 
After a DMA completes but before making data available to the user that requested this DMA, the Crosstalk driver calls xtalk_dmamap_done. When the driver has entirely finished accessing memory with DMA's, it frees the mapping resources with xtalk_dmamap_free. Usually, though, the driver saves the dma mapping resources for later use with a different mapping. Typically, a driver allocates mapping resources during initialization, and it re-uses these resources for many DMA's to various locations. Drivers which are more performance oriented and which are less concerned with portability to future platforms may use the "translate" operations rather than the dmamap operations. The translate operations work only for devices which *know* that they won't need any mapping resources on the platform they're connected to. Devices and drivers eligible to use translate operations typically: support scatter/gather AND support 64-bit addressing AND are interrupt driven AND understand NUMA issues (if on Lego) AND understand endianness AND understand Guaranteed Bandwidth AND understand caches and flushing requirements AND understand prefetching issues AND are aware of system-level bug workarounds AND are performance-oriented AND are SGI-internal drivers New criteria may be added to this list at any time! That's why the performance interface is not very portable. [TBD: We could add an interface that allows a driver to declare its "level of sophistication" for a platform. The translate operations could fail if this level was too low.] xtalk_dmamap_t xtalk_dmamap_alloc(vertex_hdl_t dev, /* set up mappings for this device */ device_desc_t dev_desc, /* device descriptor */ ulong byte_count_max, /* max size of a mapping */ ulong flags); /* defined in dma.h */ The Crosstalk dmamap Allocation interface allocates whatever hardware and software resources are needed in order to be able to perform DMA's of the desired size to/from the specified device from/to the specified memory range. [TBD: exact semantics of desired size, considering misalignment and list-oriented DMA operations]. If dev_desc is non-0, it overrides the default DMA descriptor for this device. byte_count_max specifies the size of the largest mapping that will ever be requested (via xtalk_dmamap_addr). flags are defined in dma.h, and include information such as DMA_DATA /* for data, not device control blocks */ DMA_DESC /* for device control descriptors */ DMA_ADDR16 /* device handles 16-bit addresses */ DMA_ADDR32 /* device handles 32-bit addresses */ DMA_ADDR64 /* device handles 64-bit addresses */ DMA_BIG_ENDIAN /* device is big-endian */ DMA_LITTLE_ENDIAN /* device is little-endian */ xtalk_dmamap_alloc returns an opaque "crosstalk dmamap handle". void xtalk_dmamap_free(xtalk_dmamap_t dmamap); The Crosstalk dmamap Free interface logically releases all software and hardware resources that were allocated by an earlier xtalk_dmamap_alloc. The crosstalk implementation layer may choose to use lazy release and leave the mappings intact until the mapping resources are needed by some other allocation. iopaddr_t xtalk_dmamap_addr(xtalk_dmamap_t dmamap, /* use these mapping resources */ paddr_t paddr, /* map for this address */ ulong byte_count); /* map this many bytes */ The Crosstalk dmamap Addr interface uses the resources allocated in an earlier xtalk_dmamap_alloc call in order to establish a DMA mapping to the specified physical address range. 
It returns a Crosstalk address which represents the start of the Crosstalk address range that maps to the specified physical address range. Typically, paddr/byte_count describes a single memory page. The address range specified to xtalk_dmamap_addr must be contained within the address range specified to xtalk_dmamap_alloc, and byte_count must be less than or equal to byte_count_max specified to xtalk_dmamap_alloc. This interface is a portability interface rather than a performance interface. alenlist_t xtalk_dmamap_list(xtalk_dmamap_t dmamap, /* use these mapping resources */ alenlist_t alenlist) /* map this address/length list */ The Crosstalk dmamap List interface uses the resources allocated in an earlier xtalk_dmamap_alloc call in order to establish a DMA mapping to the physical addresses listed in the specified Address/Length List. It returns an Address/Length List where the addresses are in the Crosstalk address space rather than in system physical address space. When possible, the mappings established are sufficient to map the incoming list with a single Address/Length Pair. Upon return, the original List has been free'd. The driver must free the new (returned) list. This interface is a portability interface rather than a performance interface. void xtalk_dmamap_done(xtalk_dmamap_t dmamap) The Crosstalk dmamap Done interface notifies the system that whatever DMA may have been in progress after an earlier xtalk_dmamap_addr or xtalk_dmamap_list call has now been completed. This interface is used at the completion of each DMA before the buffer is made available to other consumers. [Note: This isn't strictly necessary, but it provides a really convenient place to add workarounds, etc., so it's included as a portability interface.] iopaddr_t xtalk_dmatrans_addr(vertex_hdl_t dev, /* translate for this device */ device_desc_t dev_desc, /* device descriptor */ paddr_t paddr, /* system physical address */ ulong byte_count, /* length */ ulong flags); /* defined in dma.h */ The Crosstalk DMA Translate Address interface translates from a system physical address range into a Crosstalk address range. If mapping resources would be required for this operation, xtalk_dmatrans_addr returns 0. This interface is a performance interface rather than a portability interface. alenlist_t xtalk_dmatrans_list(vertex_hdl_t dev, /* translate for this device */ device_desc_t dev_desc, /* device descriptor */ alenlist_t palenlist, /* system address/length list */ ulong flags); /* defined in dma.h */ The Crosstalk DMA Translate interface translates from a list of system physical Address/Length Pairs into a list of Crosstalk Address/Length pairs. If mapping resources would be required in order to map the entire list, this interface returns 0. On return, the original Address/Length List has been freed. The driver must free the new (returned) list. This interface is a performance interface rather than a portability interface. Observe that the DMA interface is very similar to the PIO interface. The DMA interface provides a few extra list-oriented operations that *could* be provided for PIO as well; however, these operations (map_list, trans_list) would only be useful with drivers that want to efficiently support PIO-intensive devices; so for now, we have omitted them. 7.1.3 Generic Crosstalk Interrupts ================================== A Crosstalk driver allocates interrupt resources using xtalk_intr_alloc. 
It then invokes xtalk_intr_connect in order to associate the allocated resources with a software interrupt handler. When a driver no longer wishes to handle interrupts, it can disconnect the handler with xtalk_intr_disconnect and/or it can free the allocated interrupt resources with xtalk_intr_free.

If the Crosstalk driver uses kernel threads as a programming model, it can use xtalk_intr_block and xtalk_intr_unblock to block/unblock specific interrupts. If the Crosstalk driver uses the traditional "spl" model, it blocks interrupts with standard operations provided by platform-dependent, bus-independent code (not shown in this document).

    xtalk_intr_t
    xtalk_intr_alloc(vertex_hdl_t dev,        /* which crosstalk device */
                     device_desc_t dev_desc,  /* device descriptor */
                     vertex_hdl_t owner_dev); /* device which owns this intr */

The Crosstalk Interrupt Allocation interface allocates whatever hardware and software resources are needed in order for the specified device to generate interrupts. If dev_desc is non-0, it overrides the default interrupt descriptor for this device. owner_dev is recorded along with the interrupt handle in order to assist debugging, etc. Returns an opaque "crosstalk interrupt handle".

    void
    xtalk_intr_free(xtalk_intr_t intr_hdl);

The Crosstalk Interrupt Free interface logically releases all software and hardware resources that were allocated by an earlier xtalk_intr_alloc. The crosstalk implementation layer may choose to use lazy release and leave the resources intact until they are needed by some other allocation.

    int
    xtalk_intr_connect(xtalk_intr_t intr_hdl,        /* xtalk intr resource handle */
                       intr_func_t *intr_func,       /* xtalk intr handler */
                       void *intr_arg,               /* arg to intr handler */
                       xtalk_intr_setfunc_f setfunc, /* func to set intr hw */
                       void *setfunc_arg);           /* arg to setfunc */

The Crosstalk Interrupt Connect interface associates a software interrupt handler with hardware interrupt resources. intr_hdl is a crosstalk interrupt handle, returned from an earlier xtalk_intr_alloc, and representing hardware resources. intr_func is a function to call when the interrupt is triggered, and intr_arg is an argument to pass to intr_func. setfunc is a function that can be called at any time in order to retarget an interrupt. setfunc_arg is an opaque pointer-sized value to be interpreted by setfunc. It must be sufficient to determine which registers on which Crosstalk device need to be reprogrammed in order to redirect the interrupt. setfunc and setfunc_arg are used to support interrupt migration in a way that's fairly transparent to the device driver. If setfunc is NULL, then the driver takes responsibility for programming its own hardware to generate interrupts, and it does not allow system software to transparently migrate these interrupts. [TBD: Details of transparent interrupt migration. This interface may be simplified (get rid of setfunc*) as a result.] For drivers that can be unloaded, it is the driver's responsibility to disconnect interrupts before allowing the unload to succeed.

    void
    xtalk_intr_disconnect(xtalk_intr_t intr_hdl);

The Crosstalk Interrupt Disconnect interface disconnects a software interrupt handler from hardware interrupt resources. The interrupt resources can be re-connected to a different handler, or they can be left unconnected until later. Loadable drivers should disconnect as part of the unload operation, and they should free as part of the unregister operation.
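Pulling the pieces above together, here is a small sketch (not from the original document) of the allocate/connect and disconnect/free paths for a hypothetical "mydev" driver. It uses only the xtalk_intr_* calls defined in this section, passes a NULL setfunc (so the driver programs its own interrupt hardware and forgoes transparent migration), and assumes that a 0 handle and a nonzero connect return indicate failure.

    #include <sys/types.h>
    #include <sys/xtalk/xtalk.h>

    struct mydev_soft {                     /* hypothetical per-device state */
        xtalk_intr_t    msd_intr;
    };

    static void
    mydev_intr(void *arg)                   /* interrupt handler (intr_func) */
    {
        struct mydev_soft *soft = (struct mydev_soft *)arg;
        /* ... acknowledge and service the device using "soft" ... */
        (void)soft;
    }

    int
    mydev_intr_setup(vertex_hdl_t xconn, struct mydev_soft *soft)
    {
        soft->msd_intr = xtalk_intr_alloc(xconn, 0 /* default descriptor */,
                                          xconn /* owner, recorded for debug */);
        if (soft->msd_intr == 0)            /* assumption: 0 means failure */
            return -1;

        /* NULL setfunc: this driver programs its own interrupt registers. */
        if (xtalk_intr_connect(soft->msd_intr, (intr_func_t *)mydev_intr,
                               (void *)soft, NULL, NULL) != 0) {
            xtalk_intr_free(soft->msd_intr);
            return -1;
        }
        return 0;
    }

    void
    mydev_intr_teardown(struct mydev_soft *soft)
    {
        /* Unload path: disconnect the handler, then free the resources. */
        xtalk_intr_disconnect(soft->msd_intr);
        xtalk_intr_free(soft->msd_intr);
    }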
vertex_hdl_t xtalk_intr_cpu_get(xtalk_intr_t intr_hdl); The Crosstalk Interrupt CPU Get interface identifies which CPU is currently targeted by the specified interrupt. void xtalk_intr_block(xtalk_intr_t intr_hdl); The Crosstalk Interrupt Block routine prevents the specified interrupt from reaching the CURRENT CPU. This interface is intended for use with kthreads. The calling thread must ensure that it cannot be migrated, since this interface deals with only the CURRENT CPU. The old-fashioned way to block interrupts is with an interface provided by every *platform* (not every *bus*) which blocks an entire "software level", or "spl" at the CPU side. The intent of intr_block is to block only the one specified interrupt, probably at the I/O side. It is permissible for the crosstalk implementation to employ *lazy* interrupt blocking, but only on non-RealTime CPUs. Blocking an interrupt from reaching a CPU when the interrupt is already blocked from reaching that CPU has no effect. void xtalk_intr_unblock(xtalk_intr_t intr_hdl); The Crosstalk Interrupt Unblock routine unblocks an interrupt that was previously blocked via xtalk_intr_block. 7.1.4 Generic Crosstalk Configuration ===================================== void xtalk_provider_startup(vertex_hdl_t xtalk_provider); The Crosstalk Provider Startup interface is called once for every crosstalk provider (e.g. hub, heart) found. It performs whatever initialization is needed for that provider. Typically, this interface calls initialization routines for pio, dma, and interrupts. void xtalk_provider_shutdown(vertex_hdl_t xtalk_provider); The Crosstalk Provider Shutdown interface is called in order to turn off an entire Crosstalk Provider. Devices owned by that provider will no longer be accessible through that provider. 7.1.5 Generic Crosstalk Support Interface ========================================= This section describes some Crosstalk-specific auxiliary interfaces that have a single implementation, independent of the particular implementation of the Crosstalk provider. int xwidget_driver_register(xwidget_partnum_t part_num, xwidget_mfg_num_t mfg_num, char *driver_prefix, unsigned flags); The Crosstalk Widget Initialization Function Add interface allows a crosstalk widget's driver to advertise that it is available to handle a particular crosstalk part. Typically, xwidget_driver_register is called from a crosstalk driver's *_init entry point. It is the driver's responsibility to manage rev numbers for Crosstalk widgets appropriately. The infrastructure does not provide any way to register different drivers for different revisions of a part. void xwidget_unregister(char *driver_prefix) The Crosstalk Widget Initialization Function Remove interface allows a crosstalk driver to tell the system that the specified Crosstalk Widget driver should no longer be used. This is useful for a loadable Crosstalk driver that wishes to unload. int xwidget_init( struct xwidget_hwid_s *hwid, /* widget's hardware ID */ vertex_hdl_t dev, /* widget to initialize */ xwidgetnum_t id, /* widget's target id (0..f) */ vertex_hdl_t master, /* widget's master vertex */ xwidgetnum_t targetid); /* master's target id (0..f) */ The Crosstalk Widget Initialization interface is called from crosstalk *provider* code. It initializes a specified widget using the pre-registered widget driver. xwidget_init also allocates and initializes standard widget information that will be needed later. 
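As an illustration of the registration half of this interface, here is a brief sketch (not from the original document) of a hypothetical "mydev" widget driver advertising itself from its *_init entry point and withdrawing on unload. The part and manufacturer numbers are invented; only xwidget_driver_register and xwidget_unregister, as described above, are assumed.

    #include <sys/xtalk/xtalk.h>

    #define MYDEV_PART_NUM  ((xwidget_partnum_t)0x0c4)   /* made-up part number */
    #define MYDEV_MFG_NUM   ((xwidget_mfg_num_t)0x02a)   /* made-up mfg number */

    void
    mydev_init(void)            /* driver's *_init entry point */
    {
        /*
         * Advertise that the "mydev" driver handles this (part, mfg)
         * pair; the Crosstalk provider will later initialize matching
         * widgets through xwidget_init using this registration.
         */
        (void)xwidget_driver_register(MYDEV_PART_NUM, MYDEV_MFG_NUM,
                                      "mydev", 0 /* flags */);
    }

    void
    mydev_unload(void)          /* loadable-driver unload path */
    {
        /* Withdraw the registration so no new widgets match this driver. */
        xwidget_unregister("mydev");
    }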
void xwidget_reset(vertex_hdl_t xwidget); The Crosstalk Widget Reset interface performs a hardware reset on the specified xwidget. xwidget_info_t xwidget_info_get(vertex_hdl_t widget); The Crosstalk Widget Information Get interface obtains a handle to standard widget information for a specified widget. It is called by drivers and providers that need to access standard widget information such as the "crosstalk ID", "crosstalk device type", "state", "master device", and "crosstalk ID of the master device". vertex_hdl_t xwidget_info_dev_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information Device Get interface determines which Crosstalk device is associated with a Crosstalk Widget information handle. (This is just the reverse operation from xwidget_info_get.) xwidgetnum_t xwidget_info_id_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information ID Get interface returns the Crosstalk widget number (a.k.a "target ID") of a specified Crosstalk Widget. int xwidget_info_type_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information Type Get interface returns the Crosstalk Type of a specified Crosstalk Widget. int xwidget_info_state_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information State Get interface indicates what "state" a specified Crosstalk Widget is in. Possible states are specified in sys/iobus.h, and include "INITIALIZING", "ATTACHING", "ERROR", "INACTIVE", etc. [TBD: The states need some work.] vertex_hdl_t xwidget_info_master_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information Master Get interface indicates the "master device" for a specified Crosstalk Widget. Every Crosstalk Widget is assigned a master, which is the Crosstalk provider that handles DMA, PIO, and interrupts for the widget. [For example, in a Lego system with a Crossbow, one of the two hubs connected to the crossbow is selected as a master for each of the widgets hanging off that crossbow.] xwidgetnum_t xwidget_info_masterid_get(xwidget_info_t xwidget_info); The Crosstalk Widget Information MasterID Get interface returns the Crosstalk widget number (a.k.a. "target ID") of the specified widget's master. This information could also be obtained via this code xwidget_info_id_get(xwidget_info_get(xwidget_info_master_get(xwidget_info))) but it is more efficient to use the xwidget_info_masterid_get interface. xwidgetnum_t xtalk_intr_target_get(xtalk_intr_t xtalk_intr); The Crosstalk Interrupt Target Get interface returns the widget number (0..f) associated with a specified interrupt. xtalk_intr_vector_t xtalk_intr_vector_get(xtalk_intr_t xtalk_intr); The Crosstalk Interrupt Vector Get interface returns the Crosstalk interrupt "vector" (0..255) associated with a specified interrupt. iopaddr_t xtalk_intr_addr_get(xtalk_intr_t xtalk_intr); The Crosstalk Interrupt Address Get interface returns the crosstalk address which, when written, generates the specified interrupt. vertex_hdl_t xtalk_intr_cpu_get(xtalk_intr_t xtalk_intr); The Crosstalk Interrupt CPU Get interface returns the handle of the CPU which eventually receives the specified interrupt. (Should this interface exist?) void * xtalk_intr_sfarg_get(xtalk_intr_t xtalk_intr); The Crosstalk Interrupt SetFunc Argument Get interface returns the "setfunc argument" associated with a specified interrupt. The "setfunc arg" is an arbitrary argument that the driver specified in an earlier invocation of xtalk_intr_connect. 
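As a small usage note (not from the original document), the relationship among the widget-information accessors above can be shown in a few lines; the mydev_master_id wrapper is hypothetical and assumes only the accessors described in this section.

    #include <sys/xtalk/xtalk.h>

    xwidgetnum_t
    mydev_master_id(vertex_hdl_t widget)
    {
        xwidget_info_t info = xwidget_info_get(widget);

        /* Preferred: one accessor call for the master's target ID ... */
        return xwidget_info_masterid_get(info);

        /*
         * ... which, as noted above, is equivalent to (but cheaper than)
         * composing the individual accessors:
         *
         *   xwidget_info_id_get(
         *       xwidget_info_get(xwidget_info_master_get(info)));
         */
    }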
vertex_hdl_t xtalk_pio_dev_get(xtalk_piomap_t xtalk_piomap); The Crosstalk PIO Device Get interface returns the crosstalk device associated with a given piomap. xwidgetnum_t xtalk_pio_target_get(xtalk_piomap_t xtalk_piomap); The Crosstalk PIO Target Get interface returns the Crosstalk widget number that is used for the specified piomap. This is a widget number associated with the device's Crosstalk provider (master). iopaddr_t xtalk_pio_xtalk_addr_get(xtalk_piomap_t xtalk_piomap); The Crosstalk PIO Xtalk Address Get interface returns the starting Crosstalk address mapped by the specified piomap. ulong xtalk_pio_mapsz_get(xtalk_piomap_t xtalk_piomap); The Crosstalk PIO Map Size Get interface returns the size of the Crosstalk address range mapped by the specified piomap. caddr_t xtalk_pio_kvaddr_get(xtalk_piomap_t xtalk_piomap); The Crosstalk PIO Kernel Virtual Address Get interface returns the starting kernel virtual address used to map to the Crosstalk address range associated with the specified piomap. vertex_hdl_t xtalk_dma_dev_get(xtalk_dmamap_t xtalk_dmamap); The Crosstalk DMA Device Get interface returns the device which requested the specified DMA mapping. xwidgetnum_t xtalk_dma_target_get(xtalk_dmamap_t xtalk_dmamap); The Crosstalk DMA Target Get interface returns the Crosstalk widget number that is used for DMA's that use the specified dmamap. This is a widget number associated with the device's Crosstalk provider (master). void xtalk_init(void); The Crosstalk Init interface is invoked once during startup to initialize software needed to deal with Crosstalk providers and devices. 7.2 Hub as Crosstalk Provider ============================= Lego's Crosstalk provider is a hub. The code which implements Hub-as-CrosstalkProvider is in ml/KLEGO/hubio.c. The interfaces look very much like the generic crosstalk layer, except for a bunch of casts. 7.3 Heart as Crosstalk Provider =============================== SpeedRacer's Crosstalk provider is a heart. The code which implements Heart-as-CrosstalkProvider is TBD. The interfaces will look very much like the generic crosstalk layer, except for a bunch of casts. 7.4 xbow as Crosstalk Provider ============================== Our hope for xbow is to keep it largely invisible as far as bus operations are concerned. We'll simply treat it as an extension of hub/heart -- it's not really a Crosstalk Provider, but just the ASIC that implements a Crosstalk switch. We could have tried to treat xbow as a Crosstalk *device* that happens to also be a Crosstalk *provider*. Thus, the xbow implementation layer would provide all the interfaces (above) expected from a xtalk provider, and it would also make generic xtalk calls which would be handed off to hub or heart. 8.0 Bus-Independent Interfaces ============================== In addition to the many existing bus-independent interfaces for use by drivers, these new interfaces are also available. 
Additions to manage device descriptors:

    device_desc_t device_desc_dup(vertex_hdl_t dev);
    void          device_desc_free(device_desc_t device_desc);
    device_desc_t device_desc_default_get(vertex_hdl_t dev);
    void          device_desc_default_set(vertex_hdl_t dev, device_desc_t device_desc);

Accessor interfaces for device descriptors:

    vertex_hdl_t  device_desc_intr_target_get(device_desc_t device_desc);
    int           device_desc_intr_policy_get(device_desc_t device_desc);
    ilvl_t        device_desc_intr_swlevel_get(device_desc_t device_desc);
    char *        device_desc_intr_name_get(device_desc_t device_desc);
    int           device_desc_flags_get(device_desc_t device_desc);
    void          device_desc_intr_target_set(device_desc_t device_desc, vertex_hdl_t target);
    void          device_desc_intr_policy_set(device_desc_t device_desc, int policy);
    void          device_desc_intr_swlevel_set(device_desc_t device_desc, ilvl_t swlevel);
    void          device_desc_intr_name_set(device_desc_t device_desc, char *name);
    void          device_desc_flag_set(device_desc_t device_desc, int flag);
    void          device_desc_flag_clear(device_desc_t device_desc, int flag);

Additions to access edt fields (for I/O bus interfaces that require an edt):

    void *        edt_bus_info_get(edt_t *edt);
    vertex_hdl_t  edt_connectpt_get(edt_t *edt);
    vertex_hdl_t  edt_master_get(edt_t *edt);
    device_desc_t edt_device_desc_get(edt_t *edt);

[Note that the edt* interfaces are really only useful on buses like VME where the *drivers*, as opposed to the bus-dependent code, must probe for devices. On I/O buses like Crosstalk and PCI, the bus code manages the hardware graph and device descriptors and it calls the driver's initialization function.]

Interfaces to help manage device topology:

    vertex_hdl_t  device_master_get(vertex_hdl_t vhdl);
    void          device_master_set(vertex_hdl_t vhdl, vertex_hdl_t master);

These interfaces get and set the "master" for a specified device. (The master for crosstalk widgets is a crosstalk provider.)

    cnodeid_t     master_node_get(vertex_hdl_t vhdl);

This interface returns the compact node ID of the node which "owns" the specified vertex. It determines the owner by following "master" edges in the hwgraph until it reaches a node controller. If it cannot determine a "master node", this interface returns CNODEID_NONE.

Generic operations are provided (TBD) to convert any of the following into an alenlist_t: a kernel or user virtual address/length, a buf structure, a virtual Address/Length List, a physical Address/Length List, or a page list. Versions of userdma and useracc are provided (TBD) which prepare a specified buffer for DMA and return an alenlist_t that describes the prepared memory.

Error handling interfaces need to be defined.

9.0 Comparison with Old Irix
============================

This new interface expects drivers to use interfaces defined by the bus type of the devices they control. Old Irix attempted to squish all bus types into a single set of interfaces.

This new interface requires a driver allocating resources to specify which device it controls rather than which adapter that device is connected to. The decision of which adapter to use is left to the system. The hwgraph allows the system to efficiently determine which adapter(s) connect to a device based on the vertex_hdl_t.

This new interface manages policy information from administrative files in a fairly transparent manner. Policy information is extracted from files and associated with a vertex_hdl_t. Old Irix used to embed policy information in the driver itself.

This new interface achieves all of the goals outlined earlier in chapter 2.0.
It is our intention to provide a compatibility layer for older, cruftier drivers that use the Old Irix pio* and dma* interfaces. This layer will work for existing bus types, but it will *not* be extended to work with PCI, Crosstalk, or other new bus types. Even in Old Irix, this layer provided no notable value. Authors of new drivers will be encouraged to use the new interfaces, like the one described here for Crosstalk.

10.0 Future Directions
======================

[Note: None of the things in this chapter are ready for implementation. It is presented merely to show a long-term direction.]

The fact that the PIO and DMA interfaces look so remarkably similar raises some obvious questions: Why have separate interfaces for PIO and DMA? What's the essential difference between PIO and DMA?

PIO's typically use partial reads/writes, whereas DMA typically uses full cache line reads/writes. PIO's typically are initiated by a CPU and directed at a device, whereas DMA typically is initiated by a device and directed at memory.

But consider an environment where crosstalk devices send an interrupt by initiating a partial write to a hub/heart; where *CPU*'s initiate DMA through the use of write gatherers and Block Transfer Engines; and where Peer-to-Peer support lets one device DMA directly to another device without going through memory. There, the distinction between PIO and DMA, and even the distinction between kinds of hardware components, is blurred. It makes sense to consider a single unifying interface that handles a more generalized "mapping" from one "Address Space" to another.

Examples of things that use and/or provide one or more Address Spaces include:

    processes
    threads
    kernels
    dpnodes
    cpus
    nodes
    physical memories
    I/O devices (VME, PCI, crosstalk, etc.), especially block devs
    Block Transfer Engines
    files

In general, anything capable of performing a read/write/cache operation performs these operations into some address space. Anything capable of being read/written at an offset provides an address space.

Address space mappings are accomplished with various Mapping Resources, which include:

    direct translations
    mapping RAMs (e.g. for DMA)
    mapping tables (e.g. for PIO)
    TLBs
    pure software data structures

Let's define an addr_t to be "an address in some address space". We'll say that by definition, an addr_t type is large enough to hold any address in any Address Space. Let's also expand an alenlist_t so that it now designates *which* address space is mapped. (Somewhat arbitrarily, we've restricted an alenlist_t to specify a list of address/length pairs all from ONE address space. We could have turned this into an address/length/space triplet.) Finally, let's use an aspc_t to represent an Address Space defined by anything that provides an address space (see list above).

Software that manages the providers of Address Spaces must then provide the following operations on an aspc_t *rather than* the DMA and PIO interfaces described above. (TBD: This interface is only a rough approximation.)

    aspcmap_t
    aspcmap_alloc(aspc_t aspc_src,          /* source Address Space */
                  aspc_t aspc_targ,         /* target Address Space */
                  addr_t address,           /* target address range */
                  ulong byte_count,
                  ulong byte_count_max,     /* maximum #bytes in a map */
                  asmap_desc_t asmap_desc,  /* details about mapping */
                  ulong flags)

Allocate whatever pre-allocatable mapping resources are required in order to map up to byte_count_max bytes from the source Address Space to the target Address Space.
asmap_desc specifies details about the mapping, such as:

    cached or non-cached
    partial lines or full line transfers
    bandwidth allocation information
    endianness
    usage: data or descriptor
    width of transfers (8, 16, 32, 64-bit)

aspcmap_alloc returns an opaque handle that describes a set of Mapping Resources. It is left to the implementation (TBD: much work) to determine an appropriate "path" from source to target through intermediate Address Spaces. This path is associated with the aspcmap_t that is returned. If a mapping from src to targ is not supported, 0 is returned.

    void
    aspcmap_free(aspcmap_t aspcmap)         /* map resources to free */

Frees the specified Mapping Resources.

    addr_t
    aspcmap_addr(aspcmap_t aspcmap_hdl,     /* map resources to use */
                 addr_t addr_targ,          /* target address */
                 ulong byte_count,          /* byte count to map */
                 ulong flags)

Use the specified Mapping Resources to map to the specified target address. Return an address in the source Address Space which accesses the specified target. Note that the target Address Space was specified earlier, when the Mapping Resources were allocated, so there's no need to re-specify it.

    alenlist_t
    aspcmap_list(aspcmap_t aspcmap_hdl,     /* map resources to use */
                 alenlist_t alenlist_targ)  /* List to map */

Same as aspcmap_addr, but for use with a *List* of Address/Length pairs.

    addr_t
    aspctrans_addr(aspc_t aspc_src,         /* source Address Space */
                   aspc_t aspc_targ,        /* target Address Space */
                   addr_t addr_targ,        /* target address */
                   ulong byte_count,        /* #bytes in address range */
                   ulong flags)

Given a target address range in some address space, provide an address in the specified source address space which maps to the target without the need for any pre-allocated mapping resources. If pre-allocated mapping resources would be required, return 0.

    alenlist_t
    aspctrans_list(aspc_t aspc_src,         /* source Address Space */
                   aspc_t aspc_targ,        /* target Address Space */
                   alenlist_t alenlist_targ,/* List to map */
                   ulong flags)

Similar to aspctrans_addr, but for use with a *List* of Address/Length pairs.

By using the aspc* interface, software can establish a mapping from any Address Space to any other Address Space through whatever intermediate hardware is needed. This allows, for instance, a mapping to be set up between two devices for peer-to-peer transfers. These aspc* interfaces can establish a mapping from a "Kernel Virtual Address Space" to a particular device's Address Space -- this is "PIO". They can also establish a mapping from a device's Address Space to the "Physical Memory Address Space" in order to handle "DMA". In fact, it should be easy to layer the pio* and dma* interfaces described above on top of the aspc* interface.

Note that on NUMA systems, every memory provides its own Physical Address Space, but there is also a collective Physical Memory Address Space for the entire system which comprises the individual memory spaces. The same approach applies to CPU Address Spaces -- we may very well permit some amount of per-CPU Kernel Virtual Address Space as well as some amount of Global Kernel Virtual Address Space.

Product release timing makes this large a change impractical at this time, so we do not plan to implement any of the aspc* interfaces. We should consider changes like these at the next opportunity, especially in conjunction with peer-to-peer support.
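To make the proposal slightly more concrete, here is a speculative sketch (not from the original document) of what "PIO layered on aspc*" might look like. Everything in it is an assumption: the interface itself is marked TBD above, and the aspc_kernel_virtual handle, the aspc_of_device() helper, and the address constants are invented purely for illustration.

    /* Hypothetical: a kernel-virtual source space, and a helper that
     * yields the Address Space provided by a given device vertex. */
    extern aspc_t aspc_kernel_virtual;
    extern aspc_t aspc_of_device(vertex_hdl_t dev);

    #define MYDEV_REG_BASE  ((addr_t)0x100000)  /* made-up device address */
    #define MYDEV_REG_SPAN  0x1000              /* made-up span */

    addr_t
    mydev_pio_via_aspc(vertex_hdl_t mydev)
    {
        aspcmap_t map;

        /* Map the device's register range into kernel virtual space. */
        map = aspcmap_alloc(aspc_kernel_virtual,    /* source Address Space */
                            aspc_of_device(mydev),  /* target Address Space */
                            MYDEV_REG_BASE,         /* target address range */
                            MYDEV_REG_SPAN,
                            MYDEV_REG_SPAN,         /* byte_count_max */
                            0,                      /* asmap_desc: defaults */
                            0);                     /* flags */
        if (map == 0)
            return 0;

        /* Loads/stores through the returned source address are "PIO".
         * (The map would be released with aspcmap_free when the device
         * is closed.) */
        return aspcmap_addr(map, MYDEV_REG_BASE, MYDEV_REG_SPAN, 0);
    }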
-- Leo Dagum SGI Mountain View, CA 94043 (650-933-2179) From owner-linux-origin@oss.sgi.com Mon Jan 31 11:56:17 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 11:56:07 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:39801 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 11:56:00 -0800 Received: from google.engr.sgi.com (google.engr.sgi.com [163.154.10.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id LAA26392 for ; Mon, 31 Jan 2000 11:54:28 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id LAA56743 for linux-origin@oss.sgi.com; Mon, 31 Jan 2000 11:57:39 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200001311957.LAA56743@google.engr.sgi.com> Subject: CVS Update@oss.sgi.com: linux (fwd) To: linux-origin@oss.sgi.com Date: Mon, 31 Jan 2000 11:57:39 -0800 (PST) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing Forwarded message: > From owner-linux-cvs@oss.sgi.com Mon Jan 31 11:51:38 2000 > From: Kanoj Sarcar > To: linux-cvs@oss.sgi.com > Subject: CVS Update@oss.sgi.com: linux > Message-Id: <20000131194812Z305160-9822+102@oss.sgi.com> > Date: Mon, 31 Jan 2000 11:48:12 -0800 > X-Orcpt: rfc822;linux-cvs > Sender: owner-linux-cvs@oss.sgi.com > Precedence: bulk > > CVSROOT: /home/pub/cvs > Module name: linux > Changes by: kanoj@oss.sgi.com 00/01/31 11:48:11 > > Modified files: > arch/mips64/sgi-ip27: ip27-irq.c > > Log message: > Add in bridge byte swapping for the Qlogic scsi controller. Also need > to do something about RRB and WBs. > Ralf, Please verify that with the above change, your nfsroot setup still works. Just fyi: with CONFIG_IP_PNP_BOOTP, my disk boot shows much lesser syscalls, and panics much earlier compared to a disk boot with a kernel that does not have CONFIG_IP_PNP_BOOTP. Kanoj From owner-linux-origin@oss.sgi.com Mon Jan 31 15:27:49 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 15:27:39 -0800 Received: from mailhost.uni-koblenz.de ([141.26.64.1]:30356 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 15:27:35 -0800 Received: from cacc-30.uni-koblenz.de (cacc-30.uni-koblenz.de [141.26.131.30]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id AAA27622 for ; Tue, 1 Feb 2000 00:30:27 +0100 (MET) Received: by lappi.waldorf-gmbh.de id ; Tue, 1 Feb 2000 00:28:57 +0100 Date: Tue, 1 Feb 2000 00:28:57 +0100 From: Ralf Baechle To: linux-origin@oss.sgi.com Subject: Re: CVS Update@oss.sgi.com: linux Message-ID: <20000201002857.C15466@uni-koblenz.de> References: <20000131213410Z305160-9824+201@oss.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3us In-Reply-To: <20000131213410Z305160-9824+201@oss.sgi.com> X-Accept-Language: de,en,fr Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing On Mon, Jan 31, 2000 at 01:34:07PM -0800, Kanoj Sarcar wrote: > Log message: > Check in the bus_to_virt/virt_to_bus changes to get the Qlogic driver > working. Ralf to let me know if this will work for now, or he wants > me to implement one of the other two alternatives we discussed. The solution which you checked into CVS is probably the least evil solution for now, so let's stick with that. 
Ralf From owner-linux-origin@oss.sgi.com Mon Jan 31 16:56:01 2000 Received: by oss.sgi.com id ; Mon, 31 Jan 2000 16:55:51 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:22088 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 31 Jan 2000 16:55:32 -0800 Received: from google.engr.sgi.com (google.engr.sgi.com [163.154.10.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id QAA00651 for ; Mon, 31 Jan 2000 16:54:01 -0800 (PST) mail_from (kanoj@google.engr.sgi.com) Received: (from kanoj@localhost) by google.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id QAA88730 for linux-origin@oss.sgi.com; Mon, 31 Jan 2000 16:57:11 -0800 (PST) From: kanoj@google.engr.sgi.com (Kanoj Sarcar) Message-Id: <200002010057.QAA88730@google.engr.sgi.com> Subject: state of disk boots To: linux-origin@oss.sgi.com Date: Mon, 31 Jan 2000 16:57:11 -0800 (PST) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-origin@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;linux-origin-outgoing It should now be possible to use the default config file, and come up on a root disk (specifying root=/dev/sdXX at the prom, letting the bootp stuff time out) to see some system calls being generated by init. Kanoj