1. 17 Nov, 2016 1 commit
    • Xin Long's avatar
      sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long authored
      
      
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      It may cause to keep returning -EBUSY when inserting a new node, as the
      chain is too long and sctp inserts a transport node in a loop, which
      could even lead to system hangs there.
      
      The new rhlist interface works for this case that there are many nodes
      with the same key in one chain. It puts them into a list then makes this
      list be as a node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7fda702f
  2. 16 Nov, 2016 6 commits
    • Eric Dumazet's avatar
      netpoll: more efficient locking · 89c4b442
      Eric Dumazet authored
      Callers of netpoll_poll_lock() own NAPI_STATE_SCHED
      
      Callers of netpoll_poll_unlock() have BH blocked between
      the NAPI_STATE_SCHED being cleared and poll_lock is released.
      
      We can avoid the spinlock which has no contention, and use cmpxchg()
      on poll_owner which we need to set anyway.
      
      This removes a possible lockdep violation after the cited commit,
      since sk_busy_loop() re-enables BH before calling busy_poll_stop()
      
      Fixes: 217f6974
      
       ("net: busy-poll: allow preemption in sk_busy_loop()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89c4b442
    • Eric Dumazet's avatar
      net: busy-poll: return busypolling status to drivers · 364b6055
      Eric Dumazet authored
      
      
      NAPI drivers use napi_complete_done() or napi_complete() when
      they drained RX ring and right before re-enabling device interrupts.
      
      In busy polling, we can avoid interrupts being delivered since
      we are polling RX ring in a controlled loop.
      
      Drivers can chose to use napi_complete_done() return value
      to reduce interrupts overhead while busy polling is active.
      
      This is optional, legacy drivers should work fine even
      if not updated.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      364b6055
    • Eric Dumazet's avatar
      net: busy-poll: allow preemption in sk_busy_loop() · 217f6974
      Eric Dumazet authored
      After commit 4cd13c21
      
       ("softirq: Let ksoftirqd do its job"),
      sk_busy_loop() needs a bit of care :
      softirqs might be delayed since we do not allow preemption yet.
      
      This patch adds preemptiom points in sk_busy_loop(),
      and makes sure no unnecessary cache line dirtying
      or atomic operations are done while looping.
      
      A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
      
      This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
      so that sk_busy_loop() does not have to grab it again.
      
      Similarly, netpoll_poll_lock() is done one time.
      
      This gives about 10 to 20 % improvement in various busy polling
      tests, especially when many threads are busy polling in
      configurations with large number of NIC queues.
      
      This should allow experimenting with bigger delays without
      hurting overall latencies.
      
      Tested:
       On a 40Gb mlx4 NIC, 32 RX/TX queues.
      
       echo 70 >/proc/sys/net/core/busy_read
       for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
      
          Before:      After:
       1:   90072   92819
       2:  157289  184007
       3:  235772  213504
       4:  344074  357513
       5:  394755  458267
       6:  461151  487819
       7:  549116  625963
       8:  544423  716219
       9:  720460  738446
      10:  794686  837612
      11:  915998  923960
      12:  937507  925107
      13: 1019677  971506
      14: 1046831 1113650
      15: 1114154 1148902
      16: 1105221 1179263
      17: 1266552 1299585
      18: 1258454 1383817
      19: 1341453 1312194
      20: 1363557 1488487
      21: 1387979 1501004
      22: 1417552 1601683
      23: 1550049 1642002
      24: 1568876 1601915
      25: 1560239 1683607
      26: 1640207 1745211
      27: 1706540 1723574
      28: 1638518 1722036
      29: 1734309 1757447
      30: 1782007 1855436
      31: 1724806 1888539
      32: 1717716 1944297
      33: 1778716 1869118
      34: 1805738 1983466
      35: 1815694 2020758
      36: 1893059 2035632
      37: 1843406 2034653
      38: 1888830 2086580
      39: 1972827 2143567
      40: 1877729 2181851
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      217f6974
    • David Lebrun's avatar
      ipv6: sr: add option to control lwtunnel support · 46738b13
      David Lebrun authored
      This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
      support of encapsulation with the lightweight tunnels. When this option
      is enabled, CONFIG_LWTUNNEL is automatically selected.
      
      Fix commit 6c8702c6
      
       ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      
      Without a proper option to control lwtunnel support for SR-IPv6, if
      CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
      of seg6_iptunnel_init() failure with EOPNOTSUPP:
      
      NET: Registered protocol family 10
      IPv6: Attempt to unregister permanent protocol 6
      IPv6: Attempt to unregister permanent protocol 136
      IPv6: Attempt to unregister permanent protocol 17
      NET: Unregistered protocol family 10
      
      Tested (compiling, booting, and loading ipv6 module when relevant)
      with possible combinations of CONFIG_IPV6={y,m,n},
      CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.
      Reported-by: default avatarLorenzo Colitti <lorenzo@google.com>
      Suggested-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46738b13
    • Andrey Vagin's avatar
      tcp: allow to enable the repair mode for non-listening sockets · 319b0534
      Andrey Vagin authored
      
      
      The repair mode is used to get and restore sequence numbers and
      data from queues. It used to checkpoint/restore connections.
      
      Currently the repair mode can be enabled for sockets in the established
      and closed states, but for other states we have to dump the same socket
      properties, so lets allow to enable repair mode for these sockets.
      
      The repair mode reveals nothing more for sockets in other states.
      Signed-off-by: default avatarAndrei Vagin <avagin@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      319b0534
    • Florian Westphal's avatar
      dctcp: update cwnd on congestion event · 47805667
      Florian Westphal authored
      
      
      draft-ietf-tcpm-dctcp-02 says:
      
      ... when the sender receives an indication of congestion
      (ECE), the sender SHOULD update cwnd as follows:
      
               cwnd = cwnd * (1 - DCTCP.Alpha / 2)
      
      So, lets do this and reduce cwnd more smoothly (and faster), as per
      current congestion estimate.
      
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47805667
  3. 15 Nov, 2016 1 commit
  4. 14 Nov, 2016 2 commits
  5. 13 Nov, 2016 13 commits
  6. 10 Nov, 2016 17 commits