1. 21 Apr, 2011 1 commit
  2. 06 Apr, 2011 1 commit
    • ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2) · 47482f13
      Neil Horman authored
      properly record sk_rxhash in ipv6 sockets (v2)
      Noticed while working on another project that flows to sockets which I had open
      on a test system weren't getting steered properly when I had RFS enabled.
      Looking more closely I found that:
      1) The affected sockets were all ipv6
      2) They weren't getting steered because sk->sk_rxhash was never set from the
      incoming skbs on that socket.
      This was occurring because there are several points in the IPv4 tcp and udp code
      which save the rxhash value when a new connection is established.  Those calls
      to sock_rps_save_rxhash were never added to the corresponding ipv6 code paths.
      This patch adds those calls.  Tested by myself to properly enable RFS
      functionality on ipv6.
      Change notes:
      	Filtered UDP to only arm RFS on bound sockets (Eric Dumazet)
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
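As a rough illustration of the kind of call this patch adds, the sketch below models in user space how a receive hash might be copied from an incoming packet onto its socket. The types `model_sock` and `model_skb` and both functions are hypothetical stand-ins, not the kernel's `struct sock`, `struct sk_buff`, or `sock_rps_save_rxhash()`. A hash of 0 is assumed here to mean "no hash recorded".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the hash fields are modeled. */
struct model_skb {
    uint32_t rxhash;      /* flow hash computed for the incoming packet */
};

struct model_sock {
    uint32_t sk_rxhash;   /* last flow hash recorded on the socket; 0 = none (assumption) */
};

/* Record the packet's hash on the socket. Returns true if the stored value changed. */
bool model_save_rxhash(struct model_sock *sk, const struct model_skb *skb)
{
    assert(sk != NULL && skb != NULL);
    if (sk->sk_rxhash == skb->rxhash)
        return false;
    sk->sk_rxhash = skb->rxhash;
    return true;
}

/* In this model, steering is only possible once a hash has been recorded. */
bool model_can_steer(const struct model_sock *sk)
{
    assert(sk != NULL);
    return sk->sk_rxhash != 0;
}
```

In this model, a socket whose receive path never calls `model_save_rxhash()` keeps a hash of 0 and can never be steered, which mirrors the symptom described for the ipv6 sockets.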
  3. 12 Mar, 2011 4 commits
  4. 01 Mar, 2011 1 commit
    • ipv6: Consolidate route lookup sequences. · 68d0c6d3
      David S. Miller authored
      Route lookups follow a general pattern in the ipv6 code wherein
      we first find the non-IPSEC route, potentially override the
      flow destination address due to ipv6 options settings, and then
      finally make an IPSEC search using either xfrm_lookup() or
      __xfrm_lookup().
      __xfrm_lookup() is used when we want to generate a blackhole route
      if the key manager needs to resolve the IPSEC rules (in this case
      -EREMOTE is returned and the original 'dst' is left unchanged).
      Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
      resolution is necessary, we simply fail the lookup completely.
      All of these cases are encapsulated into two routines,
      ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow.  The latter of which
      handles unconnected UDP datagram sockets.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Jan, 2011 1 commit
  6. 16 Dec, 2010 1 commit
    • net: fix nulls list corruptions in sk_prot_alloc · fcbdf09d
      Octavian Purdila authored
      Special care is taken inside sk_prot_alloc to avoid overwriting
      skc_node/skc_nulls_node. We should also avoid overwriting
      The patch fixes the following crash:
       BUG: unable to handle kernel paging request at fffffffffffffff0
       IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
       [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
       [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
       [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
       [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
       [<ffffffff812eda35>] udp_rcv+0x15/0x20
       [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
       [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
       [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
       [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
       [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
       [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0
      Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 10 Dec, 2010 2 commits
    • net/ipv6/udp.c: fix typo in flush_stack() · c0722400
      Jiri Pirko authored
      skb1 should be passed as parameter to sk_rcvqueues_full() here.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: optimize INET input path further · 68835aba
      Eric Dumazet authored
      Followup of commit b178bb3d (net: reorder struct sock fields)
      Optimize INET input path a bit further, by:
      1) moving sk_refcnt close to sk_lock.
      This reduces the number of dirtied cache lines by one on 64bit arches (and
      64 bytes cache line size).
      2) moving inet_daddr & inet_rcv_saddr to the beginning of sk
      (same cache line as hash / family / bound_dev_if / nulls_node)
      This reduces the number of accessed cache lines in lookups by one, and doesn't
      increase the size of inet and timewait socks.
      inet and tw sockets now share the same place-holder for these fields.
      Before patch :
      offsetof(struct sock, sk_refcnt) = 0x10
      offsetof(struct sock, sk_lock) = 0x40
      offsetof(struct sock, sk_receive_queue) = 0x60
      offsetof(struct inet_sock, inet_daddr) = 0x270
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
      After patch :
      offsetof(struct sock, sk_refcnt) = 0x44
      offsetof(struct sock, sk_lock) = 0x48
      offsetof(struct sock, sk_receive_queue) = 0x68
      offsetof(struct inet_sock, inet_daddr) = 0x0
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
      compute_score() (udp or tcp) now uses a single cache line per ignored
      item, instead of two.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
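The cache line arithmetic behind the offsets listed in this commit can be checked with a short sketch. It assumes the 64-byte cache line size named in the message; the helper names `cache_line_of` and `distinct_cache_lines` are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64u   /* the 64-byte line size the commit message assumes */

/* Index of the cache line that contains the given byte offset. */
size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}

/* Number of distinct cache lines covered by a set of field offsets. */
size_t distinct_cache_lines(const size_t *offsets, size_t n)
{
    size_t count = 0;

    assert(n == 0 || offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Count this line only if no earlier offset already landed on it. */
        for (size_t j = 0; j < i; j++) {
            if (cache_line_of(offsets[j]) == cache_line_of(offsets[i])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

Fed with the offsets of sk_refcnt and sk_lock, the helper gives two lines before the patch (0x10 and 0x40 fall on lines 0 and 1) and one line after it (0x44 and 0x48 both fall on line 1). Likewise inet_daddr moves from line 9 (offset 0x270) to line 0.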
  8. 16 Nov, 2010 1 commit
  9. 25 Oct, 2010 1 commit
  10. 21 Oct, 2010 2 commits
  11. 09 Sep, 2010 1 commit
    • udp: add rehash on connect() · 719f8358
      Eric Dumazet authored
      commit 30fff923, introduced in linux-2.6.33 (udp: bind() optimisation),
      added a secondary hash on UDP, hashed on (local addr, local port).
      The problem is that the following sequence:
      fd = socket(...)
      connect(fd, &remote, ...)
      not only selects the remote end point (address and port), but also sets
      the local address, while the UDP stack stored the socket in the secondary
      hash table while its local address was INADDR_ANY (or the ipv6 equivalent).
      The sequence is:
       - autobind() : choose a random local port, insert socket in hash tables
                    [while local address is INADDR_ANY]
       - connect() : set remote address and port, change local address to IP
                    given by a route lookup.
      When an incoming UDP frame arrives, if more than 10 sockets are found in
      the primary hash table, we switch to the secondary table, and fail to find
      the socket because its local address changed.
      One solution to this problem is to rehash datagram socket if needed.
      We add a new rehash(struct socket *) method in "struct proto", and
      implement this method for UDP v4 & v6, using a common helper.
      This rehashing only takes care of secondary hash table, since primary
      hash (based on local port only) is not changed.
      Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
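The local address change described in this commit can be observed from user space. The sketch below is only an approximation of the sequence: it calls bind() explicitly on INADDR_ANY with port 0 instead of relying on autobind, so that the address can be read before connect(). The remote endpoint 127.0.0.1:9 and the function name `udp_local_addr_around_connect` are arbitrary choices for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Reports the local address of a UDP socket before and after connect().
 * Returns 0 on success, -1 if any socket call fails.
 */
int udp_local_addr_around_connect(struct in_addr *before, struct in_addr *after)
{
    struct sockaddr_in local, remote, name;
    socklen_t len;
    int rc = -1;
    int fd;

    assert(before != NULL && after != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Explicit wildcard bind: stands in for the autobind step. */
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
        goto out;

    len = sizeof(name);
    if (getsockname(fd, (struct sockaddr *)&name, &len) < 0)
        goto out;
    *before = name.sin_addr;

    /* connect() on UDP sends nothing; it fixes the peer and picks a local address. */
    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    remote.sin_port = htons(9);
    if (connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0)
        goto out;

    len = sizeof(name);
    if (getsockname(fd, (struct sockaddr *)&name, &len) < 0)
        goto out;
    *after = name.sin_addr;
    rc = 0;
out:
    close(fd);
    return rc;
}
```

On a Linux host with the loopback interface up, the address read before connect() should be the wildcard and the one read afterwards the address chosen by the route lookup. A hash keyed on (local addr, local port) therefore changes between the two reads.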
  12. 02 Jun, 2010 1 commit
  13. 01 Jun, 2010 1 commit
  14. 29 May, 2010 1 commit
  15. 27 May, 2010 1 commit
    • net: fix lock_sock_bh/unlock_sock_bh · 8a74ad60
      Eric Dumazet authored
      This new sock lock primitive was introduced to speed up some user context
      socket manipulation. But it does not safely serialize two threads, one using
      regular lock_sock/release_sock and one using lock_sock_bh/unlock_sock_bh.
      This patch changes lock_sock_bh to be careful against 'owned' state.
      If owned is found to be set, we must take the slow path.
      lock_sock_bh() now returns a boolean to say if the slow path was taken,
      and this boolean is used at unlock_sock_bh time to call the appropriate
      unlock function.
      After this change, BH are either disabled or enabled during the
      lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
      so we rename these functions to lock_sock_fast()/unlock_sock_fast().
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
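The boolean-returning lock pattern described in this commit can be sketched in user space as follows. This is a loose model, not the kernel code: a pthread mutex stands in for the socket spinlock, a condition variable stands in for the wait queue, BH disabling is not modeled at all, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sock {
    pthread_mutex_t slock;  /* stands in for the socket spinlock */
    pthread_cond_t  wq;     /* waiters sleep here until the owner releases */
    bool            owned;  /* true while a slow-path user owns the socket */
};

#define MODEL_SOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Slow-path lock: become the owner, sleeping while someone else owns the socket. */
void model_lock_sock(struct model_sock *sk)
{
    assert(sk != NULL);
    pthread_mutex_lock(&sk->slock);
    while (sk->owned)
        pthread_cond_wait(&sk->wq, &sk->slock);
    sk->owned = true;
    pthread_mutex_unlock(&sk->slock);
}

/* Slow-path unlock: give up ownership and wake one waiter. */
void model_release_sock(struct model_sock *sk)
{
    assert(sk != NULL);
    pthread_mutex_lock(&sk->slock);
    sk->owned = false;
    pthread_cond_signal(&sk->wq);
    pthread_mutex_unlock(&sk->slock);
}

/*
 * Fast lock: if nobody owns the socket, return false with slock still held.
 * Otherwise fall back to the slow path and return true.
 */
bool model_lock_sock_fast(struct model_sock *sk)
{
    assert(sk != NULL);
    pthread_mutex_lock(&sk->slock);
    if (!sk->owned)
        return false;   /* fast path: the caller keeps holding slock */
    while (sk->owned)
        pthread_cond_wait(&sk->wq, &sk->slock);
    sk->owned = true;
    pthread_mutex_unlock(&sk->slock);
    return true;
}

/* The flag returned by the lock call selects the matching unlock. */
void model_unlock_sock_fast(struct model_sock *sk, bool slow)
{
    assert(sk != NULL);
    if (slow)
        model_release_sock(sk);
    else
        pthread_mutex_unlock(&sk->slock);
}
```

A caller keeps the returned flag and hands it back, for example `bool slow = model_lock_sock_fast(&sk); ... model_unlock_sock_fast(&sk, slow);`. Checking `owned` under the mutex is what keeps a fast-path user from running concurrently with a slow-path owner in this model.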
  16. 07 May, 2010 1 commit
  17. 28 Apr, 2010 2 commits
    • net: ip_queue_rcv_skb() helper · f84af32c
      Eric Dumazet authored
      When queueing a skb to a socket, we can immediately release its dst if
      the target socket does not use IP_CMSG_PKTINFO.
      tcp_data_queue() can drop dst too.
      This is to benefit from a hot cache line and to avoid having the receiver,
      possibly on another cpu, dirty this cache line itself.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: speedup udp receive path · 4b0b72f7
      Eric Dumazet authored
      Since commit 95766fff ([UDP]: Add memory accounting.),
      each received packet needs one extra lock_sock()/release_sock() pair.
      This added latency because of possible backlog handling. Then later,
      ticket spinlocks added yet another latency source in case of DDOS.
      This patch introduces lock_sock_bh() and unlock_sock_bh()
      synchronization primitives, avoiding one atomic operation and backlog
      processing.
      skb_free_datagram_locked() uses them instead of full blown
      lock_sock()/release_sock(). skb is orphaned inside locked section for
      proper socket memory reclaim, and finally freed outside of it.
      UDP receive path now takes the socket spinlock only once.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 27 Apr, 2010 1 commit
    • net: sk_add_backlog() take rmem_alloc into account · c377411f
      Eric Dumazet authored
      Current socket backlog limit is not enough to really stop DDOS attacks,
      because the user thread spends a lot of time processing a full backlog each
      round, and the user might spin heavily on the socket lock.
      We should add backlog size and receive_queue size (aka rmem_alloc) to
      pace writers, and let the user run without being slowed down too much.
      Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
      stress situations.
      Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
      receiver can now process ~200.000 pps (instead of ~100 pps before the
      patch) on an 8 core machine.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
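The check described in this commit, adding backlog size and rmem_alloc and comparing the sum with the receive buffer limit, might look roughly like the user-space sketch below. The struct `model_sock`, its field names, and `model_rcvqueues_full` are illustrative assumptions rather than the kernel's definitions, and the exact comparison used by sk_rcvqueues_full() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the socket fields that matter for the check. */
struct model_sock {
    unsigned int backlog_len; /* bytes queued on the backlog */
    unsigned int rmem_alloc;  /* bytes already held in the receive queue */
    unsigned int rcvbuf;      /* receive buffer limit in bytes */
};

/*
 * Returns true when accepting a packet of skb_truesize bytes would push the
 * combined backlog and receive queue past the receive buffer limit.
 */
bool model_rcvqueues_full(const struct model_sock *sk, unsigned int skb_truesize)
{
    unsigned int qsize;

    assert(sk != NULL);
    qsize = sk->backlog_len + sk->rmem_alloc;
    return qsize + skb_truesize > sk->rcvbuf;
}
```

Because the check only reads counters, a receive path can call it before trying to take the socket lock and drop the packet early when it reports full, which is the stress case the commit targets.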
  19. 24 Apr, 2010 2 commits
  20. 21 Apr, 2010 1 commit
  21. 08 Apr, 2010 1 commit
  22. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      The script does the following.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      6. percpu.h was updated not to include slab.h.
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  23. 05 Mar, 2010 2 commits
  24. 18 Feb, 2010 1 commit
  25. 13 Feb, 2010 1 commit
  26. 18 Jan, 2010 1 commit
  27. 11 Nov, 2009 2 commits
  28. 09 Nov, 2009 4 commits