1. 17 Nov, 2016 1 commit
    • sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long authored
      
      
      The sctp transport rhashtable currently uses hash(lport, dport, daddr)
      as the key to hash a node to a chain. If thousands of assocs on one host
      connect to one server with the same lport but different laddrs (not a
      normal case), all the transports are hashed into the same chain.
      
      Inserting a new node may then keep returning -EBUSY because the chain
      is too long. Since sctp inserts a transport node in a loop, this could
      even lead to system hangs.
      
      The new rhlist interface handles this case, where many nodes in one
      chain share the same key. It puts them into a list and makes that list
      a single node of the chain.
      
      This patch replaces the rhashtable_ interface with the rhltable_
      interface. Since a chain can no longer grow too long and inserting a
      node no longer returns -EBUSY, the reinsert loop is also removed here.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7fda702f
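The rhlist idea described in the sctp commit above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it does not use the kernel rhltable API, and every name in it (`struct table`, `table_insert()`, `chain_length()`, `entries_for_key()`) is hypothetical. It shows that entries sharing a key extend a per-key list, so the bucket chain itself grows only with the number of distinct keys.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: this is NOT the kernel rhltable API. */

#define NBUCKETS 8
#define MAX_KEYS 64

struct entry {                 /* one hashed object, e.g. a transport */
    unsigned int key;
    struct entry *next_same;   /* next entry that shares this key */
};

struct key_node {              /* one chain node per distinct key */
    unsigned int key;
    struct entry *list;        /* every entry inserted under this key */
    struct key_node *next;     /* next distinct key in the same bucket */
};

struct table {
    struct key_node *bucket[NBUCKETS];
    struct key_node pool[MAX_KEYS];   /* static storage for chain nodes */
    size_t used;
};

/* Find the chain node for a key, or NULL if the key is not present. */
static struct key_node *find_key(const struct table *t, unsigned int key)
{
    struct key_node *n = t->bucket[key % NBUCKETS];

    while (n && n->key != key)
        n = n->next;
    return n;
}

/* Insert e; duplicates of a key extend that key's list, not the chain. */
static int table_insert(struct table *t, struct entry *e)
{
    struct key_node *n = find_key(t, e->key);

    if (!n) {
        if (t->used == MAX_KEYS)
            return -1;         /* limit of this model, not of rhltable */
        n = &t->pool[t->used++];
        n->key = e->key;
        n->list = NULL;
        n->next = t->bucket[e->key % NBUCKETS];
        t->bucket[e->key % NBUCKETS] = n;
    }
    e->next_same = n->list;
    n->list = e;
    return 0;
}

/* Number of chain nodes (distinct keys) in the bucket that holds key. */
static size_t chain_length(const struct table *t, unsigned int key)
{
    size_t len = 0;

    for (const struct key_node *n = t->bucket[key % NBUCKETS]; n; n = n->next)
        len++;
    return len;
}

/* Number of entries stored under exactly this key. */
static size_t entries_for_key(const struct table *t, unsigned int key)
{
    size_t count = 0;
    const struct key_node *n = find_key(t, key);

    for (const struct entry *e = n ? n->list : NULL; e; e = e->next_same)
        count++;
    return count;
}
```

With a zero-initialized `struct table`, inserting a thousand entries under one key leaves that key's bucket chain at length one, which is the property the commit relies on to avoid the -EBUSY retries.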
  2. 16 Nov, 2016 6 commits
    • netpoll: more efficient locking · 89c4b442
      Eric Dumazet authored
      Callers of netpoll_poll_lock() own NAPI_STATE_SCHED.
      
      Callers of netpoll_poll_unlock() have BH blocked between
      NAPI_STATE_SCHED being cleared and poll_lock being released.
      
      We can avoid the spinlock, which has no contention, and use cmpxchg()
      on poll_owner, which we need to set anyway.
      
      This removes a possible lockdep violation after the cited commit,
      since sk_busy_loop() re-enables BH before calling busy_poll_stop().
      
      Fixes: 217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89c4b442
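The netpoll commit above replaces a spinlock with a compare-and-exchange on the owner field. A minimal userspace sketch of that pattern, using C11 atomics instead of the kernel's cmpxchg(), might look as follows. The type `struct poll_ctx` and the helpers `poll_ctx_init()`, `poll_try_lock()` and `poll_unlock()` are hypothetical names for this illustration, and the kernel code may differ in detail (for example, it may spin until the exchange succeeds).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for the structure that carries poll_owner. */
struct poll_ctx {
    atomic_int poll_owner;     /* CPU id of the current owner, or NO_OWNER */
};

static void poll_ctx_init(struct poll_ctx *c)
{
    atomic_init(&c->poll_owner, NO_OWNER);
}

/*
 * Take ownership with a single compare-and-exchange: the owner field
 * has to be written anyway, so no separate lock word is needed.
 */
static bool poll_try_lock(struct poll_ctx *c, int cpu)
{
    int expected = NO_OWNER;

    return atomic_compare_exchange_strong(&c->poll_owner, &expected, cpu);
}

/* Release ownership; the release store publishes earlier writes. */
static void poll_unlock(struct poll_ctx *c)
{
    atomic_store_explicit(&c->poll_owner, NO_OWNER, memory_order_release);
}
```

A second `poll_try_lock()` from another CPU id fails until the first owner calls `poll_unlock()`, which is the mutual exclusion the spinlock used to provide.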
    • lwtunnel: subtract tunnel headroom from mtu on output redirect · a23a8f5b
      David Lebrun authored
      
      
      This patch changes the lwtunnel_headroom() function which is called
      in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
      value when the lwtunnel state is OUTPUT_REDIRECT.
      
      This patch enables e.g. SR-IPv6 encapsulations to work without
      manually setting the route mtu.
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a23a8f5b
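The effect of the lwtunnel change above on the route MTU can be sketched in userspace. This is a model under assumptions, not the kernel function: the flag names and values, `struct lwt_state_model`, `headroom_for()` and `effective_mtu()` are all hypothetical, and `LWT_OTHER_REDIRECT` merely stands in for whichever state already returned a headroom before this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model only. */
#define LWT_OUTPUT_REDIRECT 0x1u   /* state newly handled by the patch */
#define LWT_OTHER_REDIRECT  0x2u   /* assumed pre-existing case */

struct lwt_state_model {
    unsigned int flags;
    unsigned int headroom;     /* bytes the encapsulation will prepend */
};

/* Headroom to subtract from mtu, or 0 if none applies. */
static unsigned int headroom_for(const struct lwt_state_model *s,
                                 unsigned int mtu)
{
    if (s != NULL &&
        (s->flags & (LWT_OUTPUT_REDIRECT | LWT_OTHER_REDIRECT)) &&
        s->headroom < mtu)
        return s->headroom;
    return 0;
}

/* MTU left for the inner packet once the tunnel header is accounted for. */
static unsigned int effective_mtu(const struct lwt_state_model *s,
                                  unsigned int mtu)
{
    return mtu - headroom_for(s, mtu);
}
```

For example, a state flagged `LWT_OUTPUT_REDIRECT` with 40 bytes of headroom on a 1500-byte link leaves 1460 bytes, so the route MTU no longer has to be lowered by hand.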
    • net: busy-poll: return busypolling status to drivers · 364b6055
      Eric Dumazet authored
      
      
      NAPI drivers use napi_complete_done() or napi_complete() when
      they have drained the RX ring, right before re-enabling device
      interrupts.
      
      In busy polling, we can avoid interrupts being delivered since
      we are polling the RX ring in a controlled loop.
      
      Drivers can choose to use the napi_complete_done() return value
      to reduce interrupt overhead while busy polling is active.
      
      This is optional; legacy drivers should work fine even
      if not updated.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      364b6055
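How a driver might act on the completion status described in the commit above can be sketched with a small userspace model. Everything here is an assumption made for illustration: `struct napi_model`, the `STATE_*` bits, `complete_done()` and `driver_poll()` are hypothetical and only mimic the idea that completion reports false while busy polling is active, in which case the driver skips re-enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits for this model only. */
#define STATE_SCHED        (1u << 0)
#define STATE_IN_BUSY_POLL (1u << 1)

struct napi_model {
    unsigned int state;
    int irq_enables;           /* how often the "driver" re-armed its IRQ */
};

/*
 * Model of the completion call: while a busy poller owns the context,
 * leave SCHED set and report false so the caller keeps interrupts off.
 */
static bool complete_done(struct napi_model *n)
{
    if (n->state & STATE_IN_BUSY_POLL)
        return false;
    n->state &= ~STATE_SCHED;
    return true;
}

/* Model of a driver poll routine that honors the return value. */
static int driver_poll(struct napi_model *n, int pending, int budget)
{
    int work_done = pending < budget ? pending : budget;

    /* Only re-enable interrupts when the ring is drained AND
     * completion really happened. */
    if (work_done < budget && complete_done(n))
        n->irq_enables++;
    return work_done;
}
```

A driver that ignores the return value and always re-enables interrupts still behaves correctly in this model; it only pays for interrupts that the busy poller would have made unnecessary.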
    • net: busy-poll: remove need_resched() from sk_can_busy_loop() · 21cb84c4
      Eric Dumazet authored
      
      
      Now that sk_busy_loop() can schedule by itself, we can remove
      the need_resched() check from sk_can_busy_loop().
      
      Also add a const to its struct sock parameter.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21cb84c4
    • net: busy-poll: allow preemption in sk_busy_loop() · 217f6974
      Eric Dumazet authored
      After commit 4cd13c21 ("softirq: Let ksoftirqd do its job"),
      sk_busy_loop() needs a bit of care:
      softirqs might be delayed since we do not allow preemption yet.
      
      This patch adds preemption points in sk_busy_loop(),
      and makes sure no unnecessary cache line dirtying
      or atomic operations are done while looping.
      
      A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
      
      This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
      so that sk_busy_loop() does not have to grab it again.
      
      Similarly, netpoll_poll_lock() is done one time.
      
      This gives about 10 to 20 % improvement in various busy polling
      tests, especially when many threads are busy polling in
      configurations with a large number of NIC queues.
      
      This should allow experimenting with bigger delays without
      hurting overall latencies.
      
      Tested:
       On a 40Gb mlx4 NIC, 32 RX/TX queues.
      
       echo 70 >/proc/sys/net/core/busy_read
       for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
      
          Before:  After:
       1:   90072   92819
       2:  157289  184007
       3:  235772  213504
       4:  344074  357513
       5:  394755  458267
       6:  461151  487819
       7:  549116  625963
       8:  544423  716219
       9:  720460  738446
      10:  794686  837612
      11:  915998  923960
      12:  937507  925107
      13: 1019677  971506
      14: 1046831 1113650
      15: 1114154 1148902
      16: 1105221 1179263
      17: 1266552 1299585
      18: 1258454 1383817
      19: 1341453 1312194
      20: 1363557 1488487
      21: 1387979 1501004
      22: 1417552 1601683
      23: 1550049 1642002
      24: 1568876 1601915
      25: 1560239 1683607
      26: 1640207 1745211
      27: 1706540 1723574
      28: 1638518 1722036
      29: 1734309 1757447
      30: 1782007 1855436
      31: 1724806 1888539
      32: 1717716 1944297
      33: 1778716 1869118
      34: 1805738 1983466
      35: 1815694 2020758
      36: 1893059 2035632
      37: 1843406 2034653
      38: 1888830 2086580
      39: 1972827 2143567
      40: 1877729 2181851
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Adam Belay <abelay@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      217f6974
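The "about 10 to 20 %" figure in the commit above can be checked against individual rows of its Before/After table with a one-line calculation. The helper below is a hypothetical convenience for this illustration, not part of the patch.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the 40-thread row (1877729 before, 2181851 after) this yields a gain of roughly 16 %, and for the 32-thread row (1717716 before, 1944297 after) roughly 13 %. A few rows, such as the 3-thread one, show a lower value after the patch, which is why the commit message quotes a range and not a single number.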
    • devres: add devm_alloc_percpu() · ff86aae3
      Madalin Bucur authored
      
      
      Introduce managed counterparts for alloc_percpu() and free_percpu().
      Add devm_alloc_percpu() and devm_free_percpu() into the managed
      interfaces list.
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff86aae3
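The notion of a "managed" allocation in the devres commit above can be illustrated with a small userspace model. It is a sketch under assumptions, not the kernel devres implementation and not per-CPU memory: `struct device_model`, `managed_alloc()`, `managed_free()` and `device_detach()` are hypothetical names. The point is only that each allocation is recorded against its device, so anything not freed explicitly is released when the device goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* One record per allocation that the device still owns. */
struct managed_res {
    struct managed_res *next;
    void *data;
};

struct device_model {
    struct managed_res *resources;
};

/* Allocate zeroed memory and tie its lifetime to dev. */
static void *managed_alloc(struct device_model *dev, size_t size)
{
    struct managed_res *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->data = calloc(1, size);
    if (!r->data) {
        free(r);
        return NULL;
    }
    r->next = dev->resources;
    dev->resources = r;
    return r->data;
}

/* Optional early release; returns 0 if data was owned by dev, else -1. */
static int managed_free(struct device_model *dev, void *data)
{
    for (struct managed_res **pp = &dev->resources; *pp; pp = &(*pp)->next) {
        struct managed_res *r = *pp;

        if (r->data == data) {
            *pp = r->next;
            free(r->data);
            free(r);
            return 0;
        }
    }
    return -1;
}

/* Release everything still owned; returns the number of allocations freed. */
static int device_detach(struct device_model *dev)
{
    int released = 0;

    while (dev->resources) {
        struct managed_res *r = dev->resources;

        dev->resources = r->next;
        free(r->data);
        free(r);
        released++;
    }
    return released;
}
```

In this model a driver's error paths need no explicit cleanup: whatever was allocated through `managed_alloc()` is reclaimed by `device_detach()`.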
  3. 15 Nov, 2016 6 commits
  4. 14 Nov, 2016 1 commit
  5. 13 Nov, 2016 9 commits
  6. 11 Nov, 2016 3 commits
    • mm: kmemleak: scan .data.ro_after_init · d7c19b06
      Jakub Kicinski authored
      Limit the number of kmemleak false positives by including
      .data.ro_after_init in memory scanning.  To achieve this we need to add
      symbols for start and end of the section to the linker scripts.
      
      The problem was uncovered by commit 56989f6d ("genetlink: mark
      families as __ro_after_init").
      
      Link: http://lkml.kernel.org/r/1478274173-15218-1-git-send-email-jakub.kicinski@netronome.com
      
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7c19b06
    • Revert "console: don't prefer first registered if DT specifies stdout-path" · c6c7d83b
      Hans de Goede authored
      This reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path").
      
      The reverted commit changes existing behavior on which many ARM boards
      rely.  Many ARM single-board computers, e.g. the Raspberry Pi, have
      both a video output and a serial console.  Depending on whether the user
      is using the device as a more regular computer or as a headless device,
      we need to have the console on either one or the other.
      
      Many users rely on the kernel behavior of the console being present on
      both outputs.  Before the reverted commit, the console setup with no
      console= kernel arguments on an ARM board which sets stdout-path in dt
      would look like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
        tty0                 -WU (E  p  )    4:1
      
      Whereas after the reverted commit, it looks like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
      
      This commit reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path") restoring the original
      behavior.
      
      Fixes: 05fd007e ("console: don't prefer first registered if DT specifies stdout-path")
      Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c6c7d83b
    • mm, frontswap: make sure allocated frontswap map is assigned · 5e322bee
      Vlastimil Babka authored
      Christian Borntraeger reports:
      
      With commit 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to
      static key") kmemleak complains about a memory leak in swapon
      
          unreferenced object 0x3e09ba56000 (size 32112640):
            comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
            hex dump (first 32 bytes):
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            backtrace:
               __vmalloc_node_range+0x194/0x2d8
               vzalloc+0x58/0x68
               SyS_swapon+0xd60/0x12f8
               system_call+0xd6/0x270
      
      Turns out kmemleak is right.  We now allocate the frontswap map
      depending on the kernel config (and no longer on the enablement)
      
        swapfile.c:
        [...]
            if (IS_ENABLED(CONFIG_FRONTSWAP))
                      frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));
      
      but later on this is passed along
        --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
      
      and ignored if frontswap is disabled
        --> frontswap_init(p->type, frontswap_map);
      
        static inline void frontswap_init(unsigned type, unsigned long *map)
        {
              if (frontswap_enabled())
                      __frontswap_init(type, map);
        }
      
      Thing is, that frontswap map is never freed.
      
      The leak itself is relatively minor, because swapon is an infrequent
      and privileged operation.  However, if the first frontswap backend is
      registered after a swap type has already been enabled, it will WARN_ON
      in frontswap_register_ops() and frontswap will not be available for the
      swap type.
      
      Fix this by making sure the map is assigned by frontswap_init() as long
      as CONFIG_FRONTSWAP is enabled.
      
      Fixes: 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to static key")
      Link: http://lkml.kernel.org/r/20161026134220.2566-1-vbabka@suse.cz
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e322bee
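The difference between the leaking behavior and the fix described in the frontswap commit above can be modeled in userspace. This is a sketch under assumptions, not the kernel code: `struct swap_info_model`, `init_before()` and `init_after()` are hypothetical, and the `backend_enabled` parameter stands in for the frontswap_enabled() check quoted in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-swap-type bookkeeping. */
struct swap_info_model {
    unsigned long *frontswap_map;
};

/*
 * Behavior as reported: the already allocated map is dropped when no
 * backend is enabled yet, so nothing references it any more.
 */
static void init_before(struct swap_info_model *sis, unsigned long *map,
                        bool backend_enabled)
{
    if (backend_enabled)
        sis->frontswap_map = map;
}

/*
 * Behavior after the fix, as the commit describes it: the map is always
 * assigned, and only backend-specific setup depends on enablement.
 */
static void init_after(struct swap_info_model *sis, unsigned long *map,
                       bool backend_enabled)
{
    sis->frontswap_map = map;
    if (backend_enabled) {
        /* per-backend initialization would happen here */
    }
}
```

With no backend enabled, `init_before()` leaves `frontswap_map` NULL while the allocation stays live, which matches the kmemleak report; `init_after()` keeps the pointer, so the map can be freed later and is available to a backend that registers afterwards.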
  7. 10 Nov, 2016 14 commits