DIRECT MEMORY ACCESS
Direct memory access (DMA) is a feature of modern computers that allows certain hardware subsystems within the computer to access system memory for reading and/or writing independently of the central processing unit. Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, and sound cards. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel.
Without DMA, in programmed input/output (PIO) mode, the CPU is typically occupied for the entire duration of a transfer. With DMA, the CPU initiates the transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete. This is especially useful in real-time computing applications, where not stalling behind concurrent operations is critical.
| Contents |
| Principle |
| Cache coherency problem |
| DMA engines |
| Examples |
| ISA |
| PCI |
| AHB |
| See also |
| References |
Principle
DMA is an essential feature of all modern computers, as it allows devices to transfer data without imposing heavy overhead on the CPU. Otherwise, the CPU would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. During this time the CPU would be unavailable for any other task involving CPU bus access, although it could continue doing any work that did not require bus access.
A DMA transfer essentially copies a block of memory from one device to another. While the CPU initiates the transfer, it does not execute it. For so-called "third party" DMA, as is normally used with the ISA bus, the transfer is performed by a DMA controller which is typically part of the motherboard chipset. More advanced bus designs such as PCI typically use bus mastering DMA, where the device takes control of the bus and performs the transfer itself.
A typical use of DMA is copying a block of memory between system RAM and a buffer on the device. Such an operation does not stall the processor, which as a result can be scheduled to perform other tasks. DMA transfers are essential to high-performance embedded systems. They are also essential in providing so-called zero-copy implementations of peripheral device drivers, as well as functions such as network packet routing, audio playback and streaming video.
Cache coherency problem
DMA can lead to cache coherency problems. Imagine a CPU equipped with a cache and an external memory, which can be accessed directly by devices using DMA. When the CPU accesses location X in the memory, the current value will be stored in the cache. Subsequent operations on X will update its cached copy. If the cache is not flushed to the memory before the next time a device tries to access X, the device will receive a stale value of X.
Similarly, if the cached copy of X is not invalidated when a device writes a new value to the memory, then the CPU will operate on a stale value of X.
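The two failure cases above can be reproduced with a small software model. The sketch below is illustrative only: `WriteBackCache` and its `flush` and `invalidate` methods are hypothetical names for a toy write-back cache, not the interface of any real CPU or operating system, and the device is modelled simply as code that reads and writes the `memory` dictionary directly.

```python
class WriteBackCache:
    """Toy write-back cache in front of a shared memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified in cache but not yet in memory

    def read(self, addr):
        if addr not in self.lines:          # miss: fill the line from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value            # update only the cached copy
        self.dirty.add(addr)

    def flush(self, addr):
        """Write a dirty line back so a device reading memory sees it."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop the cached copy so the next read refetches from memory."""
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


X = 0x1000
memory = {X: 1}
cache = WriteBackCache(memory)

# CPU -> device: the CPU updates X, the device reads memory directly.
cache.read(X)
cache.write(X, 2)
stale_for_device = memory[X]     # device would see the old value
cache.flush(X)
fresh_for_device = memory[X]     # device now sees the CPU's update

# Device -> CPU: the device writes memory, bypassing the cache.
memory[X] = 3
stale_for_cpu = cache.read(X)    # CPU still hits the old cached copy
cache.invalidate(X)
fresh_for_cpu = cache.read(X)    # CPU refetches the device's value
```

In this model the flush must happen before the device reads, and the invalidate must happen after the device writes; skipping either step yields the stale values described above.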
DMA engines
In addition to hardware interaction, DMA can also be used to offload expensive memory operations, such as large copies or scatter-gather operations, from the CPU to a dedicated DMA engine. While normal memory copies are typically too small to be worthwhile to offload on today's desktop computers, they are frequently offloaded on embedded devices due to more limited resources.[1]
Newer Intel Xeon processors also include a DMA engine technology called I/OAT, meant to improve network performance on high-throughput network interfaces, in particular gigabit Ethernet and faster.[2] However, benchmarks of this approach by Intel Linux kernel developer Andrew Grover indicate no more than a 10% improvement in CPU utilization for receive workloads, and no improvement when transmitting data.[3]
Reconfigurable DMA circuits, for instance those based on Generic Address Generators (GAG), provide the enabling technology for auto-sequencing memory, which is programmed by flowware to generate the data streams that drive system architectures based on the anti-machine paradigm. Such a circuit could also be called a DMA engine.
Examples
ISA
For example, a PC's ISA DMA controller has 8 DMA channels, of which 7 are available for use by devices. Each DMA channel has an associated 16-bit address register and 16-bit count register. To initiate a data transfer, the device driver sets up the DMA channel's address and count registers together with the direction of the data transfer (read or write). It then instructs the DMA hardware to begin the transfer. When the transfer is complete, the device interrupts the CPU.
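The setup sequence just described (program the address and count registers, set the direction, start the transfer, receive an interrupt) can be sketched as a toy model. Everything named here is an assumption made for illustration: `DmaChannel`, its methods, the callback standing in for the interrupt, and the convention that "write" means device to memory. The count register holds the byte count directly, which is a simplification; real controllers differ in such details.

```python
class DmaChannel:
    """Toy model of one DMA channel with 16-bit address and count registers."""

    MASK16 = 0xFFFF

    def __init__(self, memory, on_complete):
        self.memory = memory            # bytearray standing in for system RAM
        self.on_complete = on_complete  # stands in for the completion interrupt
        self.address = 0
        self.count = 0
        self.direction = None

    def setup(self, address, count, direction):
        # Registers are 16 bits wide, so larger values cannot be represented.
        if not 0 <= address <= self.MASK16 or not 0 < count <= self.MASK16:
            raise ValueError("address and count must fit in 16 bits")
        if direction not in ("read", "write"):
            raise ValueError("direction must be 'read' or 'write'")
        self.address, self.count, self.direction = address, count, direction

    def start(self, device_buffer):
        """Run the whole transfer, then signal completion."""
        transferred = 0
        while self.count > 0:
            if self.direction == "write":      # device -> memory (assumed convention)
                self.memory[self.address] = device_buffer[transferred]
            else:                              # memory -> device
                device_buffer[transferred] = self.memory[self.address]
            self.address = (self.address + 1) & self.MASK16
            self.count -= 1
            transferred += 1
        self.on_complete(transferred)


ram = bytearray(0x10000)
interrupts = []                                # records each "interrupt"
channel = DmaChannel(ram, on_complete=interrupts.append)
channel.setup(address=0x2000, count=4, direction="write")
channel.start(bytearray(b"DATA"))
```

In the model, the driver code touches the channel only in `setup` and `start`; the per-byte loop stands for work done by the controller rather than the CPU.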
"Scatter-gather" DMA allows the transfer of data to and from multiple memory areas in a single DMA transaction. It is equivalent to chaining together multiple simple DMA requests. Again, the motivation is to offload multiple input/output interrupts and data copy tasks from the CPU.
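As a rough illustration of the chaining idea, the sketch below models a scatter-gather transaction as a list of (address, length) descriptors processed in one call. `Descriptor`, `gather` and `scatter` are hypothetical names chosen for this example; real hardware defines its own descriptor layout.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    address: int   # start of one memory region
    length: int    # number of bytes in that region


def gather(memory, descriptors):
    """Collect several memory regions into one contiguous device stream."""
    stream = bytearray()
    for d in descriptors:
        stream += memory[d.address:d.address + d.length]
    return bytes(stream)


def scatter(memory, descriptors, data):
    """Distribute one contiguous device stream across several memory regions."""
    if len(data) != sum(d.length for d in descriptors):
        raise ValueError("data length must match the descriptor list")
    offset = 0
    for d in descriptors:
        memory[d.address:d.address + d.length] = data[offset:offset + d.length]
        offset += d.length


ram = bytearray(64)
ram[0:4] = b"head"
ram[32:36] = b"body"

# One transaction reads two separate regions as a single stream.
packet = gather(ram, [Descriptor(0, 4), Descriptor(32, 4)])

# One transaction writes the stream back into two differently sized regions.
scatter(ram, [Descriptor(8, 2), Descriptor(16, 6)], packet)
```

Each call walks the whole descriptor list, so the CPU would need to be notified only once per list rather than once per region.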
DRQ stands for DMA request; DACK for DMA acknowledge. These symbols are generally seen on hardware schematics of computer systems with DMA functionality. They represent electronic signaling lines between a peripheral device and the DMA controller.
PCI
As mentioned above, a PCI architecture has no central DMA controller, unlike ISA. Instead, any PCI component can request control of the bus ("become the bus master") and request to read from and write to system memory. More precisely, a PCI component requests bus ownership from the PCI bus controller (usually the southbridge in a modern PC design), which arbitrates if several devices request bus ownership simultaneously, since there can be only one bus master at a time. When the component is granted ownership, it issues normal read and write commands on the PCI bus, which are claimed by the bus controller and forwarded to the memory controller using a scheme that is specific to each chipset.
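The request, grant and release cycle can be sketched with a toy arbiter. This is a simplified model under stated assumptions: `BusArbiter` and `dma_write` are hypothetical names, requests are served in first-come order (the text above does not specify an arbitration policy), and a device gives up the bus after a single write.

```python
from collections import deque


class BusArbiter:
    """Toy arbiter: at most one bus master at a time, FIFO order (assumed policy)."""

    def __init__(self):
        self.pending = deque()
        self.owner = None

    def request(self, device):
        if device != self.owner and device not in self.pending:
            self.pending.append(device)
        self._grant_next()

    def release(self, device):
        if self.owner != device:
            raise RuntimeError(f"{device} does not own the bus")
        self.owner = None
        self._grant_next()

    def _grant_next(self):
        if self.owner is None and self.pending:
            self.owner = self.pending.popleft()


def dma_write(arbiter, device, memory, address, data):
    """Write to system memory only while the device is the bus master."""
    if arbiter.owner != device:
        return False                 # not granted yet; the device must wait
    memory[address:address + len(data)] = data
    arbiter.release(device)
    return True


ram = bytearray(16)
arbiter = BusArbiter()
arbiter.request("nic")
arbiter.request("disk")              # queued: "nic" already owns the bus

disk_first_try = dma_write(arbiter, "disk", ram, 8, b"DISK")   # refused
nic_done = dma_write(arbiter, "nic", ram, 0, b"NIC!")          # bus then passes to "disk"
disk_second_try = dma_write(arbiter, "disk", ram, 8, b"DISK")
```

The second device's first attempt is refused because the bus already has a master; it succeeds only after the first device releases ownership.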
As an example, on a modern AMD Socket AM2-based PC, the southbridge forwards the transactions to the northbridge (which is integrated on the CPU die) using HyperTransport; the northbridge in turn converts them to DDR2 operations and sends them out on the DDR2 memory bus. As can be seen, quite a number of steps are involved in a PCI DMA transfer; however, this poses little problem, since the components outside the PCI bus are faster than the PCI bus itself, typically by an order of magnitude or more (see List of device bandwidths).
AHB
In systems-on-chip and embedded systems, the typical system bus infrastructure is an on-chip bus such as AMBA AHB. High-performance devices on this bus need to move large blocks of data to and from shared system memory on the same bus. AMBA defines two kinds of AHB component: master and slave. A slave interface is similar to programmed I/O: through it, software (running on an embedded CPU, e.g. ARM) can read and write I/O registers or (less commonly) local memory blocks inside the device. A master interface can be used by the device to perform DMA to and from system memory without software intervention. A high-bandwidth device such as a network controller will therefore have two interface adapters to the AHB bus: a master interface and a slave interface. A DMA engine with a number of channels that is configurable at hardware compile time is usually present in the device to perform scatter/gather operations.
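The division of labour between the two interfaces can be sketched as follows. This is a toy model, not AMBA signalling: `DmaDevice`, the register offsets and the `CTRL_START` bit are all hypothetical. Software uses only the slave-side register accessors, while the copy itself is done by the device through its master side.

```python
class DmaDevice:
    """Toy device with a slave port (registers) and a master port (memory access)."""

    REG_SRC, REG_DST, REG_LEN, REG_CTRL = 0x0, 0x4, 0x8, 0xC   # hypothetical offsets
    CTRL_START = 0x1                                           # hypothetical control bit

    def __init__(self, system_memory):
        self.system_memory = system_memory   # bytearray standing in for shared RAM
        self.regs = {self.REG_SRC: 0, self.REG_DST: 0,
                     self.REG_LEN: 0, self.REG_CTRL: 0}

    # Slave interface: software on the CPU reads and writes device registers.
    def slave_write(self, offset, value):
        if offset not in self.regs:
            raise KeyError(f"no register at offset {offset:#x}")
        self.regs[offset] = value
        if offset == self.REG_CTRL and value & self.CTRL_START:
            self._master_copy()

    def slave_read(self, offset):
        return self.regs[offset]

    # Master interface: the device itself moves data in system memory.
    def _master_copy(self):
        src = self.regs[self.REG_SRC]
        dst = self.regs[self.REG_DST]
        n = self.regs[self.REG_LEN]
        self.system_memory[dst:dst + n] = self.system_memory[src:src + n]
        self.regs[self.REG_CTRL] = 0         # clear START to report completion


ram = bytearray(32)
ram[0:5] = b"hello"
dev = DmaDevice(ram)

# The CPU's only involvement is four register writes through the slave port.
dev.slave_write(DmaDevice.REG_SRC, 0)
dev.slave_write(DmaDevice.REG_DST, 16)
dev.slave_write(DmaDevice.REG_LEN, 5)
dev.slave_write(DmaDevice.REG_CTRL, DmaDevice.CTRL_START)
```

In real hardware the copy would proceed concurrently with the CPU; the model runs it synchronously for simplicity.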
See also
★ Remote Direct Memory Access
★ Blitter
★ AT Attachment
References
★ mmap() and DMA, from Linux Device Drivers, 2nd Edition, Alessandro Rubini & Jonathan Corbet
★ Memory Mapping and DMA, from Linux Device Drivers, 3rd Edition, Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman
★ DMA and Interrupt Handling
★ DMA Modes & Bus Mastering
1. Jack Ganssle, Memory copies in hardware, Embedded Systems Programming.
2. Jonathan Corbet, Memory copies in hardware, LWN.net.
3. I/OAT on LinuxNet wiki.
This article is provided by Wikipedia.