Skip to content
  • Eli Cohen's avatar
    IPoIB: Copy small received SKBs in connected mode · f89271da
    Eli Cohen authored
    
    
    The connected mode implementation in the IPoIB driver has a large
    overhead in the way SKBs are handled in the receive flow.  It usually
    allocates an SKB with as big as was used in the currently received SKB
    and moves unused fragments from the old SKB to the new one. This
    involves a loop on all the remaining fragments and incurs overhead on
    the CPU.  This patch, for small SKBs, allocates an SKB just large
    enough to contain the received data and copies to it the data from the
    received SKB.  The newly allocated SKB is passed to the stack and the
    old SKB is reposted.
    
    When running netperf, UDP small messages, without this pach I get:
    
        UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
        14.4.3.178 (14.4.3.178) port 0 AF_INET
        Socket  Message  Elapsed      Messages
        Size    Size     Time         Okay Errors   Throughput
        bytes   bytes    secs            #      #   10^6bits/sec
    
        114688     128   10.00     5142034      0     526.31
        114688           10.00     1130489            115.71
    
    With this patch I get both send and receive at ~315 mbps.
    
    The reason that send performance actually slows down is as follows:
    When using this patch, the overhead of the CPU for handling RX packets
    is dramatically reduced.  As a result, we do not experience RNR NAK
    messages from the receiver which cause the connection to be closed and
    reopened again; when the patch is not used, the receiver cannot handle
    the packets fast enough so there is less time to post new buffers and
    hence the mentioned RNR NACKs.  So what happens is that the
    application *thinks* it posted a certain number of packets for
    transmission but these packets are flushed and do not really get
    transmitted.  Since the connection gets opened and closed many times,
    each time netperf gets the CPU time that otherwise would have been
    given to IPoIB to actually transmit the packets.  This can be verified
    when looking at the port counters -- the output of ifconfig and the
    oputput of netperf (this is for the case without the patch):
    
        tx packets
        ==========
        port counter:   1,543,996
        ifconfig:       1,581,426
        netperf:        5,142,034
    
        rx packets
        ==========
        netperf         1,1304,089
    
    Signed-off-by: default avatarEli Cohen <eli@mellanox.co.il>
    f89271da