Commit 106ffba7 authored by Gilles Chanteperdrix's avatar Gilles Chanteperdrix Committed by Philippe Gerum

rtnet: import

parent a9db34d4
This list was created when porting the pcnet32 driver to RTnet and was
extended and revised afterwards. It is absolutely unsorted. Some points may
not apply to every driver, some may have to be added for others. It is
recommended to take a look at pcnet32-rt.c or other existing drivers if some
steps remain unclear.
IMPORTANT: Check that the critical paths of the driver (xmit function, interrupt
handler) are free of any unbounded or unacceptably long delays, e.g. caused by
waiting on hardware events.
1. Add to beginning of file (also add a #define for MAX_UNITS if it is missing
so far):
#include <rtnet_port.h>
static int cards[MAX_UNITS] = { [0 ... (MAX_UNITS-1)] = 1 };
compat_module_int_param_array(cards, MAX_UNITS);
MODULE_PARM_DESC(cards, "array of cards to be supported (e.g. 1,0,1)");
2. disable any copybreak mechanism (rtskbs are all equally sized)
3. add the following fields to private data:
struct rtskb_queue skb_pool;
rtdm_irq_t irq_handle;
4. initialize skb pool in probe or init function:
if (rtskb_pool_init(&<priv>->skb_pool, RX_RING_SIZE*2) < RX_RING_SIZE*2) {
    rtskb_pool_release(&<priv>->skb_pool);
    return -ENOMEM;
}
5. free skb pool in cleanup function
6. replace unregister_netdev with rt_unregister_rtnetdev
7. call rt_rtdev_disconnect in cleanup function (and on error cleanups!)
8. cleanup device structure with rtdev_free
9. replace netif_stop_queue with rtnetif_stop_queue
10. add to the close function replacing the free_irq call:
if ((i = rtdm_irq_free(&<priv>->irq_handle)) < 0)
    return i;
11. replace struct sk_buff with struct rtskb
12. replace skb_XXX calls with rtskb_XXX
13. replace eth_type_trans with rt_eth_type_trans
14. replace netif_rx with rtnetif_rx
15. replace struct net_device with struct rtnet_device
16. replace netif_start_queue with rtnetif_start_queue
17. revise the xmit routine
17.1. add new locking scheme replacing any standard spin lock calls:
rtdm_lockctx_t context;
rtdm_lock_get_irqsave(&<priv>->lock, context);
rtdm_lock_put_irqrestore(&<priv>->lock, context);
/* ONLY IN EXCEPTIONAL CASES, e.g. if the operation can take more than a
 * few tens of microseconds: */
rtdm_irq_disable(&<priv>->irq_handle);
rtdm_lock_get(&<priv>->lock);
...
rtdm_lock_put(&<priv>->lock);
rtdm_irq_enable(&<priv>->irq_handle);
/* Note that the latter scheme does not work if the IRQ line is shared
 * with other devices. Also, rtdm_irq_disable/enable can be costly
 * themselves on certain architectures. */
17.2. add the following code right before the code which triggers the physical
transmission (take care if data has to be transfered manually, i.e.
without DMA):
/* get and patch time stamp just before the transmission */
if (skb->xmit_stamp)
*skb->xmit_stamp = cpu_to_be64(rtdm_clock_read() + *skb->xmit_stamp);
17.3. make the code above and the transmission triggering atomic by switching
off all interrupts:
rtdm_lockctx_t context;
rtdm_lock_irqsave(context);
<patch time stamp>
<trigger transmission>
rtdm_lock_irqrestore(context);
/* or combined with the spinlock: */
rtdm_lock_get_irqsave(&<priv>->lock, context);
<prepare transmission>
<patch time stamp>
<trigger transmission>
rtdm_lock_put_irqrestore(&<priv>->lock, context);
NOTE: Some hardware may require the driver to calculate the frame
checksum, thus making a patching of the frame effectively impossible. In
this case use the following strategy: switch off the interrupts only if
there is actually a time stamp to patch. Normally, frames using this
feature are rather short and will not cause long irq locks. Take a look
at 8139too-rt or via-rhine-rt to find some examples.
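The arithmetic of step 17.2 can be tried out in user space. The sketch below is a toy model, not RTnet code: rtdm_clock_read() is replaced by a mock, and a plain byte swap stands in for cpu_to_be64() on a little-endian CPU. The stamp field initially holds a sender-chosen offset and is overwritten with the big-endian encoding of "now + offset" right before transmission:

```c
#include <stdint.h>

/* Pretend "now" in nanoseconds; stands in for rtdm_clock_read(). */
static uint64_t mock_clock_ns = 1000000ULL;

static uint64_t mock_rtdm_clock_read(void)
{
    return mock_clock_ns;
}

/* cpu_to_be64() on a little-endian CPU: swap all eight bytes. */
static uint64_t to_be64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        r = (r << 8) | ((v >> (8 * i)) & 0xff);
    return r;
}

/* Patch the stamp exactly as in the driver snippet: the field holds an
 * offset and is replaced by the big-endian value of now + offset. */
static void patch_xmit_stamp(uint64_t *xmit_stamp)
{
    if (xmit_stamp)
        *xmit_stamp = to_be64(mock_rtdm_clock_read() + *xmit_stamp);
}
```

Since to_be64() is an involution, applying it twice recovers the raw timestamp, which is what a receiver effectively does after the byte-order conversion.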
18. modify interrupt handler:
static int XXX_interrupt(rtdm_irq_t *irq_handle)
struct rtnet_device *dev = rtdm_irq_get_arg(irq_handle, struct rtnet_device);
Also adapt the prototype of the interrupt handler accordingly if provided.
19. replace spin_lock/spin_unlock with rtdm_lock_get/rtdm_lock_put within the
interrupt handler
20. replace printk in the xmit function, interrupt handler, and any function
called within this context with rtdm_printk. Where possible, disable output in
critical functions (i.e. while interrupts are off) completely.
21. replace dev_kfree_skb[_XXX] with dev_kfree_rtskb
22. replace alloc_etherdev with the following lines:
dev = rt_alloc_etherdev(sizeof(struct XXX_private) /* or 0 */);
if (dev == NULL)
return -ENOMEM;
rtdev_alloc_name(dev, "rteth%d");
rt_rtdev_connect(dev, &RTDEV_manager);
dev->vers = RTDEV_VERS_2_0;
23. replace request_irq in open function with the following lines:
rt_stack_connect(dev, &STACK_manager);
retval = rtdm_irq_request(&<priv>->irq_handle, dev->irq, XXX_interrupt,
RTDM_IRQTYPE_SHARED, NULL /* or driver name */, dev);
if (retval)
return retval;
24. replace netif_queue_stopped with rtnetif_queue_stopped
25. replace netif_wake_queue with rtnetif_wake_queue
26. add to the beginning of the probe or card-init function:
static int cards_found = -1;

if (cards[++cards_found] == 0)
    return -ENODEV;
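The effect of the cards[] parameter from steps 1 and 26 can be simulated in user space: each probe call claims the next unit and is rejected if the corresponding slot is 0. MAX_UNITS of 3 and the pattern {1, 0, 1} (the example from the MODULE_PARM_DESC above) are assumptions for illustration:

```c
#include <errno.h>

#define MAX_UNITS 3

/* "array of cards to be supported", e.g. insmod ... cards=1,0,1 */
static int cards[MAX_UNITS] = { 1, 0, 1 };

/* Toy probe: each call handles the next detected card. No bounds check,
 * as in the original snippet; do not call more than MAX_UNITS times. */
static int probe_one(void)
{
    static int cards_found = -1;

    if (cards[++cards_found] == 0)
        return -ENODEV;  /* leave this unit to the Linux driver */
    return 0;            /* this unit is handled by the RT driver */
}
```

With cards=1,0,1 the first and third detected cards are claimed while the second probe returns -ENODEV.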
27. call rtdm_clock_read within the receive interrupt and set the time_stamp
field of the skb accordingly
28. initialize a new unsigned int old_packet_cnt with <priv>->stats.rx_packets
at the beginning of the interrupt handler
29. add to the end of the interrupt handler:
rtdm_lock_put(&<priv>->lock); /* if locking is not done in the interrupt main function */
if (old_packet_cnt != <priv>->stats.rx_packets)
    rt_mark_stack_mgr(dev);
30. disable any timer setup and delete calls
31. comment out MII-related assignments and functions that are not required(!)
32. comment out any other unused functions
33. replace register_netdev with rt_register_rtnetdev
34. replace netif_carrier_{on|off} with rtnetif_carrier_{on|off}
35. replace dev_alloc_skb(size) with dev_alloc_rtskb(size, &<priv>->skb_pool)
36. reduce RX_RING_SIZE to 8
and check if they are used appropriately
38. rename type of lock field in private data from spinlock_t to rtdm_lock_t
39. replace spin_lock_init(&<priv>->lock) with rtdm_lock_init(&<priv>->lock)
40. rtskb structure does not contain a data_len field => set any occurrence to zero
41. return from the interrupt handler only with RTDM_IRQ_HANDLED or
RTDM_IRQ_NONE as return value, depending on whether the IRQ was handled or not
42. fill rtdev field in every received rtskb object properly
skb->rtdev = rtdev;
XX. check the critical paths in xmit function and interrupt handler for delays
or hardware wait loops, disable or avoid them
HOWTO for using RTnet over FireWire (ETH1394)
To use RTnet over FireWire, one needs an additional package, RT-FireWire, which
can be checked out via "svn checkout svn://".
The RT-FireWire package is developed by the RT-FireWire project team; see the
project homepage for more information (
It is recommended to compile and test the RT-FireWire package first.
RT-FireWire only compiles against fusion. At the time of writing, this is the
CVS version of fusion, which will become release 0.9. Use --with-rtai=XXX to
specify the installation location of fusion on your system.
To compile RTnet's Eth1394 driver with RT-FireWire, two configuration steps
are needed:
1. add --with-rtfw=XXX to specify the source location of RT-FireWire
2. add --enable-eth1394 to enable the compilation of eth1394
Of course, don't forget --with-rtai=XXX for RTnet.
RT-FireWire comes with some basic testing tools, one of which is similar to
"rtping" on Ethernet. See the Readme of RT-FireWire for how to play around
with basic FireWire testing.
Currently, Eth1394 appears exactly the same as a normal Ethernet device. So
from the application point of view, no difference between the media can be
seen, which means applications running on Ethernet can be moved directly to
FireWire without any porting.
So, play around with your new medium, i.e. FireWire, with exactly the same
tools as on Ethernet :-).
Modification to RFC2734
Each IP-capable node must have its own unique hardware address in the network.
The original IPover1394 spec (RFC2734) employs the 64-bit GUID of each
FireWire adapter chip as the hardware address. That way, the hardware address
is guaranteed to be unique even on a worldwide scale, but the address
resolution process is not efficient, see below:
                ARP                     Eth1394 internal
             resolution                    resolution
IP address -----------> 48-bit MAC    ---------------> 16-bit
                        (64-bit GUID)                  FireWire nodeid
The modified ARP on IPover1394 directly uses the FireWire node id as the
hardware address of each Eth1394 node. That way, the mapping between IP
address and hardware address (FireWire node id) needs only a single resolution
step, which is more efficient than the original scheme. Note that here we
assume static allocation of 1394 address space to IPover1394, i.e. on each
node, the address space for Eth1394 is exactly the same, see "eth1394.h". So,
16 bits are enough to represent the hardware address. Now the address
resolution process is more efficient, as below:
              ARP resolution
IP address ---------------> 48-bit MAC (FireWire nodeid)
To give exactly the same look as normal Ethernet devices, the MAC address of
Eth1394 is extended to 6 bytes by appending zeros to the 2-byte FireWire node
id. This way, all the high-level services already working on Ethernet, like
RTnet's TDMA and RTcfg, can be moved directly to Eth1394.
Good Luck!
2005-08-02 Zhang Yuchen <>
19-May-2003 - Mathias Koehrer ( (original version)
21-Oct-2003 - Jan Kiszka (
This file documents the restrictions and pitfalls when using fragmented IP
packets with RTnet.
Ethernet provides 1500 bytes of payload within each packet. Subtracting the IP
header (20 bytes without options) and the UDP header (8 bytes), this leaves
1472 bytes of data for the (UDP) user. When sending larger packets, the RTnet
implementation of IP fragments the packet and sends it in multiple chunks over
the network. When an RTnet station receives a sequence of fragmented IP
packets, it reassembles them and passes the whole packet to the next layer
(UDP).
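The numbers above can be turned into a back-of-the-envelope fragment count. This is plain IPv4 arithmetic, not an RTnet API: each Ethernet frame carries at most 1500 bytes of IP packet, so after the 20-byte IP header 1480 bytes of IP payload remain per fragment (conveniently a multiple of 8, as fragmentation requires for all but the last fragment), and the 8-byte UDP header travels in the first fragment:

```c
#define ETH_PAYLOAD  1500                       /* Ethernet payload per frame */
#define IP_HEADER    20                         /* IPv4 header, no options   */
#define UDP_HEADER   8
#define FRAG_PAYLOAD (ETH_PAYLOAD - IP_HEADER)  /* 1480, a multiple of 8     */

/* Number of IP fragments needed for a UDP datagram of the given data size. */
static unsigned int ip_fragments_for_udp(unsigned int udp_data_bytes)
{
    unsigned int ip_payload = udp_data_bytes + UDP_HEADER;
    return (ip_payload + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD;  /* ceiling */
}
```

A 1472-byte UDP payload fits in a single unfragmented packet; one byte more already requires two fragments, and a 4 KB datagram occupies three collector slots' worth of fragments on the wire.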
Incoming IP fragments are collected by the IP layer. The collector mechanism
is a global resource: when all collector slots are in use, unassignable
fragmented packets are dropped! In order to guarantee bounded execution time
of the collector lookup mechanism, it is not possible to provide an unlimited
number of collectors (currently 10 are supported, see ipv4/ip_fragment.c).
Therefore, keep an eye on how many fragmented packets all of your stations
produce and whether one receiver might be overwhelmed with fragments!
Fragmented IP packets are generated AND received at the expense of the socket
rtskb pool. Adjust the pool size appropriately to provide sufficient rtskbs
(see also examples/frap_ip).
To identify the destination socket and to simplify the defragmentation, all IP
fragments must arrive in strictly ascending order. Unordered packets are
dropped; if they can be assigned to an existing collector, the already
collected fragments are cleaned up as well. However, for the typically
isolated real-time networks, this requirement can easily be fulfilled.
Known Issues:
When sending fragmented IP packets over a NIC without RTmac installed, the
NIC's transmission queue may easily overflow (take a look at the driver source
for the exact limit - typically TX_RING_SIZE). This is due to the still
missing flow control for packet transmission. It will hopefully be fixed
soon...
Buffer Pool Management
RTnet holds packet or packet fragments internally in so-called real-time socket
buffers (rtskbs, comparable to Linux skbs). These buffers are used to store
incoming data while it is processed by the stack and before it is copied to the
user buffer. They are also used for setting up outgoing packets and passing
them to the NIC driver.
Unlike buffers in a normal network stack, rtskbs have to be allocatable in a
strictly deterministic way. For this reason, rtskbs are kept preallocated in
multiple pools, one for each producer or consumer of packets. When a filled
buffer is passed from a producer to a consumer, the consumer has to return an
empty rtskb in exchange. This prevents a failing component from exhausting
global resources like the buffers and locking up the whole RTnet system.
The following is an overview of the rtskb pools in RTnet, how large they are
by default, and how they can be extended or shrunk.
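The exchange rule can be illustrated with a toy user-space model (the names are illustrative, not the RTnet API): every pool tracks its empty and filled buffers, and a filled buffer only moves from producer to consumer if an empty one moves back, so each pool's total never changes:

```c
struct toy_pool {
    int empty;   /* preallocated, ready-to-use buffers */
    int filled;  /* buffers currently holding packet data */
};

/* Hand one filled buffer from producer to consumer. Succeeds only if the
 * consumer can compensate with an empty buffer; otherwise the "packet"
 * is dropped and no pool changes size. Returns 0 on success, -1 on drop. */
static int hand_over(struct toy_pool *producer, struct toy_pool *consumer)
{
    if (producer->filled == 0 || consumer->empty == 0)
        return -1;
    producer->filled--;   /* filled buffer leaves the producer...      */
    consumer->filled++;   /* ...and now belongs to the consumer        */
    consumer->empty--;    /* an empty buffer goes back in exchange...  */
    producer->empty++;    /* ...so both pool totals stay constant      */
    return 0;
}
```

The invariant `empty + filled == const` per pool is exactly what makes a misbehaving consumer unable to starve the rest of the system: it can only exhaust its own pool.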
1. Socket Pools
Default Size: 16
Resizable: module parameter "socket_rtskbs"
Runtime Resize: [rt_dev_]setsockopt()
Initialization: real-time / non real-time (see text)
Every socket gets its own rtskb pool upon creation. This pool is used for
compensation when an incoming packet needs to be stored until the user fetches
it, and when a packet is prepared for transmission. The initial pool size can
be set with "socket_rtskbs".
During runtime the pool can be extended (RT_SO_EXTPOOL) or shrunk
(RT_SO_SHRPOOL) using the [rt_dev_]setsockopt() function. When a socket is to
be created within a real-time context (e.g. a kernel RT-task), the buffers are
allocated from the real-time rtskb cache (see below) instead of using a Linux
system call. When a real-time-created socket is closed again, the buffers
return to that cache. Note that a [rt_dev_]close() call can fail if not all
buffers have yet returned to the socket pool. In this case, be patient and
retry later. :)
2. Global Pool
Default Size: 0 + 16 * number of registered NICs
Resizable: module parameter "global_rtskbs" (base value)
module parameter "device_rtskbs" (increment per NIC)
Runtime Resize: by adding or removing NIC drivers
Initialization: non real-time
The global pool is used by the ARP protocol (transmission only) and by the
real-time protocol part of RTmac.
3. ICMP Pool
Default Size: 8
Resizable: -
Runtime Resize: -
Initialization: non real-time
For technical reasons, the ICMP pool, which is used for replying to incoming
requests, is separated from the global pool.
4. NIC Receiver Pool
Default Size: 16 (typically RX_RING_SIZE*2)
Resizable: module parameter "rx_pool_size" (8139too-rt.o only)
Runtime Resize: -
Initialization: non real-time
The receiver pools are used by the NICs to store incoming packets. Their size
is typically fixed and can only be changed by recompiling the driver.
5. VNIC Pool
Default Size: 32
Resizable: module parameter "vnic_rtskbs" (rtmac.o)
Runtime Resize: -
Initialization: non real-time
The VNIC pool is used for compensating incoming non-real-time packets when
they are queued for processing by Linux. The pool is also used for creating
outgoing VNIC packets.
6. rtnetproxy Pool
Default Size: 32
Resizable: module parameter "proxy_rtskbs" (rtnetproxy.o)
Runtime Resize: -
Initialization: non real-time
This pool is used the same way as the VNIC pool.
All module parameters at a glance:
Module     | Parameter     | Default Value
-----------+---------------+--------------
rtnet      | socket_rtskbs | 16
rtnet      | global_rtskbs | 0
rtnet      | device_rtskbs | 16
rtmac      | vnic_rtskbs   | 32
rtnetproxy | proxy_rtskbs  | 32
rt_8139too | rx_pool_size  | 16
Statistics on the currently allocated pools are available through the /proc
interface of RTnet (/proc/rtnet/rtskb).
IP Routing Subsystem
The IPv4 implementation of RTnet comes with a real-time routing subsystem which
has some differences compared to normal IP stacks. Basically, all dynamic
elements of the routing and device address resolution (ARP) process have been
converted into statically configurable mechanisms. This allows an easy analysis
of the routing and address resolution complexity for known real-time networks.
1. Concept
The routing system is based on two tables. The so-called host routing table
contains all destination IPs which can be reached directly over local network
segments. These IPs include local loopback addresses and network broadcasts.
The optional network routing table provides the addresses of gateways
to distant real-time networks, thus allowing more complex network structures.
In order to use the network routing feature, RTnet has to be compiled with
--enable-net-routing (see configure script).
When preparing the transmission of an IP packet, RTnet first tries to find the
destination address in the host routing table. If this fails and network
routing is available, the network routing table is queried. On success, the
host routing table is consulted again, this time using the gateway IP.
Incoming IP packets are no longer checked against any routing table on
standard RTnet nodes. Only if RTnet was compiled as a router by passing
--enable-router to the configure script is the destination IP checked for a
non-local address. If the destination address equals neither the unicast nor
the broadcast IP of the receiving device, and the input channel is not a
loopback device, the RTnet router will try to find the next hop by performing
the output routing procedure described above and, on success, will forward the
packet. Note that, just like in non-real-time networks, any RTnet router can
become a bottleneck for real-time messages if the traffic is not planned
thoroughly (packets of the RTmac VNICs do not interfere with the real-time
traffic).
2. Host Routing Table
The content of the host routing table is comparable to ARP tables of standard
IP stacks: destination IP address, the respective device address, and a
reference to the output device. While normal IP stacks perform ARP table
lookups only after the routing decision has been made, RTnet uses this table
already for the first and mostly sole routing step, regardless of the device
type, thus also for loopback IPs.
All entries of the host routing table are stored according to a hash mechanism.
The hash key is calculated using the least significant bits of the destination
IP. The size of the hash table, i.e. the number of relevant destination bits is
statically configured (default: 64, see ipv4/route.c). Also the number of
available host routing entries is statically limited (default: 32) and can be
set by recompiling RTnet with modified values.
Example (hash table size 64): <destination IP> & 0.0.0.63 = 35, the host hash key
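The host hash key computation can be sketched as plain C. The helper names and the sample address 192.168.0.99 are illustrative, not taken from the RTnet sources; with a 64-slot table, the key is simply the six least significant bits of the destination IP:

```c
#include <stdint.h>

/* Build a 32-bit IPv4 address in host byte order from dotted-quad parts. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Host hash key: mask the destination IP with (table_size - 1).
 * table_size must be a power of two, e.g. the default of 64. */
static uint32_t host_hash_key(uint32_t dest_ip, uint32_t table_size)
{
    return dest_ip & (table_size - 1);
}
```

For example, 192.168.0.99 hashes to slot 35 because 99 & 63 = 35; any address whose last byte shares those six bits lands in the same slot.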
Host routes are either added or updated manually via the rtroute tool or
automatically when an ARP request or reply arrives. Note that ARP messages are
only triggered by explicit user commands (rtroute solicit). Moreover, the
entries in the host routing table will not expire until they are manually
removed, e.g. by shutting down the respective output device.
The easiest way to create and maintain the host routing table is to use RTcfg,
see README.rtcfg for further information.
3. Network Routing Table
The entries of the network routing table contain the destination IP address, a
mask defining the relevant bits of the destination IP, and the IP of the
gateway to reach the destination network (or host). To simplify updates of host
routes, i.e. foremost changes of the destination device address, gateway IPs
have to be resolved through the host routing table.
Network routes are either stored using a hash key derived from the destination
IP or without any hashing mechanism. The size of the hash table and thus the
number of considered IP bits for generating the key is defined in the source
code (default: 32). The start of the bit range is specified by a module
parameter of rtnet.o called net_hash_key_shift (default: 8).
Example (hash table size 32, net_hash_key_shift 8):
    (<destination IP> >> 8) & 0.0.0.31 = 2, the network hash key
A new network route is only assigned to a hash key if the network mask of the
route completely covers the hash mask.
Examples (hash table size is 32, net_hash_key_shift is 8):
    rtroute add <destination IP> netmask 255.255.255.0 gw <gateway IP>
        hashmask = 0.0.0.31 << 8 = 0.0.31.0
        netmask & hashmask = 255.255.255.0 & 0.0.31.0 = 0.0.31.0 = hashmask => use key!
    rtroute add <destination IP> netmask 255.255.0.0 gw <gateway IP>
        netmask & hashmask = 255.255.0.0 & 0.0.31.0 = 0.0.0.0 != hashmask => no hash key!
In the latter case, RTnet adds the new route to the list of key-less network
routes. This list is queried only if a network route lookup in the hash table
fails. Thus, the network routing process effectively consists of two stages:
the hash-key-based lookup and a potential query of the key-less list of routes.
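The two-stage scheme rests on two small computations, sketched below in plain C (the helper names are illustrative, not the RTnet sources): the network hash key for the default 32-slot table with net_hash_key_shift = 8, and the coverage test that decides whether a route gets a hash key at all:

```c
#include <stdint.h>

#define NET_HASH_TBL_SIZE  32   /* default table size  */
#define NET_HASH_KEY_SHIFT 8    /* default key shift   */

/* Build a 32-bit IPv4 address or mask in host byte order. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Network hash key: shift the destination IP, then mask with the
 * table size minus one (i.e. 0.0.0.31 after the shift). */
static uint32_t net_hash_key(uint32_t dest_ip)
{
    return (dest_ip >> NET_HASH_KEY_SHIFT) & (NET_HASH_TBL_SIZE - 1);
}

/* A route is assigned a hash key only if its netmask completely covers
 * the hash mask (0.0.0.31 << 8 = 0.0.31.0); otherwise it goes to the
 * key-less list. */
static int route_uses_hash_key(uint32_t netmask)
{
    uint32_t hashmask = (uint32_t)(NET_HASH_TBL_SIZE - 1) << NET_HASH_KEY_SHIFT;
    return (netmask & hashmask) == hashmask;
}
```

For instance, the (hypothetical) destination 10.0.2.0 yields key 2, a /24 netmask covers the hash mask and so uses the keyed table, while a /16 netmask does not and falls back to the key-less list.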
RTnet provides by default a pool of 16 network routes. This number can be
modified in the source code (see ipv4/route.c). Network routes are only
manually added or removed via rtroute.
Real-Time Ethernet Capturing (RTcap)
RTnet can capture incoming and outgoing Ethernet packets with very low time
stamp jitter, typically below 10 us (depending on the hardware).
When RTnet is configured and compiled with --enable-rtcap, some extensions are
added to the RTnet stack and an additional module rtcap.o is created. This
module has to be loaded *after* all NIC drivers are inserted and *before* any
device is started or an RTmac discipline is attached to it. It creates two
read-only Linux shadow network devices for every NIC:
<rtdevX> (e.g. rteth0) and
<rtdevX>-mac (exception: the loopback device is only mirrored to "rtlo").
The first capturing device mirrors any incoming packet the hardware reports to
the stack and any outgoing packet sent by the local station using RTnet. The
second one captures only packets which have been delayed by an active RTmac
discipline. As the capturing time is dictated by the parent shadow device,
packet lists can be out of chronological order, but they provide a deeper look
at the influence of RTmac on the packet transmission process.
After these shadow devices are started up using ifconfig, any capturing tool
like tcpdump or Ethereal can be used for the actual analysis work. In order to
get hold of every packet on the network, the real-time NIC should furthermore
be switched to promiscuous mode when it is configured:
rtifconfig <rtdevX> up <IP> promisc
If you notice potential packet losses while capturing, you can try to increase
the number of real-time buffers used for storing packets before they can be
processed by Linux. The module parameter rtcap_rtskb controls this number; it
is set to 128 by default. Generally you should also tell RTcap to start the
RTAI timer (module parameter: start_timer=1) and prevent any other module or
program from doing so as well.
The capturing support adds a slight overhead to both packet paths; therefore,
this compilation option should only be switched on when the service is
actually required.
The Real-Time Configuration Service (RTcfg) provides a mechanism to start up
RTnet nodes synchronously. It implements a rendezvous during the RTnet start-up
process, exchanges MAC addresses and optionally IP routes, and distributes