1. 08 Mar, 2016 1 commit
  2. 22 Feb, 2016 1 commit
    • Neil Horman's avatar
      sctp: Fix port hash table size computation · d9749fb5
      Neil Horman authored
      
      
      Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
      its size computation, observing that the current method never guaranteed
      that the hashsize (measured in number of entries) would be a power of two,
      which the input hash function for that table requires.  The root cause of
      the problem is that two values need to be computed (one, the allocation
      order of the storage requries, as passed to __get_free_pages, and two the
      number of entries for the hash table).  Both need to be ^2, but for
      different reasons, and the existing code is simply computing one order
      value, and using it as the basis for both, which is wrong (i.e. it assumes
      that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).
      
      To fix this, we change the logic slightly.  We start by computing a goal
      allocation order (which is limited by the maximum size hash table we want
      to support.  Then we attempt to allocate that size table, decreasing the
      order until a successful allocation is made.  Then, with the resultant
      successful order we compute the number of buckets that hash table supports,
      which we then round down to the nearest power of two, giving us the number
      of entries the table actually supports.
      
      I've tested this locally here, using non-debug and spinlock-debug kernels,
      and the number of entries in the hashtable consistently work out to be
      powers of two in all cases.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      CC: Dmitry Vyukov <dvyukov@google.com>
      CC: Vladislav Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9749fb5
  3. 05 Jan, 2016 2 commits
  4. 16 Dec, 2015 2 commits
  5. 11 Sep, 2015 1 commit
    • Marcelo Ricardo Leitner's avatar
      sctp: fix race on protocol/netns initialization · 8e2d61e0
      Marcelo Ricardo Leitner authored
      Consider sctp module is unloaded and is being requested because an user
      is creating a sctp socket.
      
      During initialization, sctp will add the new protocol type and then
      initialize pernet subsys:
      
              status = sctp_v4_protosw_init();
              if (status)
                      goto err_protosw_init;
      
              status = sctp_v6_protosw_init();
              if (status)
                      goto err_v6_protosw_init;
      
              status = register_pernet_subsys(&sctp_net_ops);
      
      The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
      is possible for userspace to create SCTP sockets like if the module is
      already fully loaded. If that happens, one of the possible effects is
      that we will have readers for net->sctp.local_addr_list list earlier
      than expected and sctp_net_init() does not take precautions while
      dealing with that list, leading to a potential panic but not limited to
      that, as sctp_sock_init() will copy a bunch of blank/partially
      initialized values from net->sctp.
      
      The race happens like this:
      
           CPU 0                           |  CPU 1
        socket()                           |
         __sock_create                     | socket()
          inet_create                      |  __sock_create
           list_for_each_entry_rcu(        |
              answer, &inetsw[sock->type], |
              list) {                      |   inet_create
            /* no hits */                  |
           if (unlikely(err)) {            |
            ...                            |
            request_module()               |
            /* socket creation is blocked  |
             * the module is fully loaded  |
             */                            |
             sctp_init                     |
              sctp_v4_protosw_init         |
               inet_register_protosw       |
                list_add_rcu(&p->list,     |
                             last_perm);   |
                                           |  list_for_each_entry_rcu(
                                           |     answer, &inetsw[sock->type],
              sctp_v6_protosw_init         |     list) {
                                           |     /* hit, so assumes protocol
                                           |      * is already loaded
                                           |      */
                                           |  /* socket creation continues
                                           |   * before netns is initialized
                                           |   */
              register_pernet_subsys       |
      
      Simply inverting the initialization order between
      register_pernet_subsys() and sctp_v4_protosw_init() is not possible
      because register_pernet_subsys() will create a control sctp socket, so
      the protocol must be already visible by then. Deferring the socket
      creation to a work-queue is not good specially because we loose the
      ability to handle its errors.
      
      So, as suggested by Vlad, the fix is to split netns initialization in
      two moments: defaults and control socket, so that the defaults are
      already loaded by when we register the protocol, while control socket
      initialization is kept at the same moment it is today.
      
      Fixes: 4db67e80
      
       ("sctp: Make the address lists per network namespace")
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8e2d61e0
  6. 03 Sep, 2015 2 commits
  7. 21 Jul, 2015 2 commits
    • Marcelo Ricardo Leitner's avatar
      sctp: fix src address selection if using secondary addresses · 0ca50d12
      Marcelo Ricardo Leitner authored
      
      
      In short, sctp is likely to incorrectly choose src address if socket is
      bound to secondary addresses. This patch fixes it by adding a new check
      that checks if such src address belongs to the interface that routing
      identified as output.
      
      This is enough to avoid rp_filter drops on remote peer.
      
      Details:
      
      Currently, sctp will do a routing attempt without specifying the src
      address and compare the returned value (preferred source) with the
      addresses that the socket is bound to. When using secondary addresses,
      this will not match.
      
      Then it will try specifying each of the addresses that the socket is
      bound to and re-routing, checking if that address is valid as src for
      that dst. Thing is, this check alone is weak:
      
      # ip r l
      192.168.100.0/24 dev eth1  proto kernel  scope link  src 192.168.100.149
      192.168.122.0/24 dev eth0  proto kernel  scope link  src 192.168.122.147
      
      # ip a l
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0
             valid_lft 2160sec preferred_lft 2160sec
          inet 192.168.122.148/24 scope global secondary eth0
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe15:186a/64 scope link
             valid_lft forever preferred_lft forever
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff
          inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1
             valid_lft 2162sec preferred_lft 2162sec
          inet 192.168.100.148/24 scope global secondary eth1
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:feb3:9146/64 scope link
             valid_lft forever preferred_lft forever
      4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
          link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff
          inet6 fe80::5054:ff:fe05:47ee/64 scope link
             valid_lft forever preferred_lft forever
      
      # ip r g 192.168.100.193 from 192.168.122.148
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Even if you specify an interface:
      
      # ip r g 192.168.100.193 from 192.168.122.148 oif eth1
      192.168.100.193 from 192.168.122.148 dev eth1
          cache
      
      Although this would be valid, peers using rp_filter will drop such
      packets as their src doesn't match the routes for that interface.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ca50d12
    • Marcelo Ricardo Leitner's avatar
      sctp: reduce indent level on sctp_v4_get_dst · 07868284
      Marcelo Ricardo Leitner authored
      
      
      Paves the day for the next patch. Functionality stays untouched.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07868284
  8. 11 May, 2015 1 commit
  9. 02 Mar, 2015 1 commit
  10. 09 Sep, 2014 1 commit
  11. 08 Sep, 2014 1 commit
    • Tejun Heo's avatar
      percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo authored
      
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatar"David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  12. 01 Aug, 2014 1 commit
    • Jason Gunthorpe's avatar
      sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe authored
      
      
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a sycall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zerod when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Tested-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      299ee123
  13. 23 May, 2014 1 commit
  14. 07 May, 2014 1 commit
    • WANG Cong's avatar
      net: clean up snmp stats code · 698365fa
      WANG Cong authored
      commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
      reduced snmp array size to 1, so technically it doesn't have to be
      an array any more. What's more, after the following commit:
      
      	commit 933393f5
      
      
      	Date:   Thu Dec 22 11:58:51 2011 -0600
      
      	    percpu: Remove irqsafe_cpu_xxx variants
      
      	    We simply say that regular this_cpu use must be safe regardless of
      	    preemption and interrupt state.  That has no material change for x86
      	    and s390 implementations of this_cpu operations.  However, arches that
      	    do not provide their own implementation for this_cpu operations will
      	    now get code generated that disables interrupts instead of preemption.
      
      probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
      almost 3 years, no one complains.
      
      So, just convert the array to a single pointer and remove snmp_mib_init()
      and snmp_mib_free() as well.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      698365fa
  15. 27 Apr, 2014 1 commit
    • Xufeng Zhang's avatar
      sctp: reset flowi4_oif parameter on route lookup · 85350871
      Xufeng Zhang authored
      commit 813b3b5d (ipv4: Use caller's on-stack flowi as-is
      in output route lookups.) introduces another regression which
      is very similar to the problem of commit e6b45241
      
       (ipv4: reset
      flowi parameters on route connect) wants to fix:
      Before we call ip_route_output_key() in sctp_v4_get_dst() to
      get a dst that matches a bind address as the source address,
      we have already called this function previously and the flowi
      parameters have been initialized including flowi4_oif, so when
      we call this function again, the process in __ip_route_output_key()
      will be different because of the setting of flowi4_oif, and we'll
      get a networking device which corresponds to the inputted flowi4_oif
      as the output device, this is wrong because we'll never hit this
      place if the previously returned source address of dst match one
      of the bound addresses.
      
      To reproduce this problem, a vlan setting is enough:
        # ifconfig eth0 up
        # route del default
        # vconfig add eth0 2
        # vconfig add eth0 3
        # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
        # route add default gw 10.0.1.254 dev eth0.2
        # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
        # ip rule add from 10.0.0.14 table 4
        # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
        # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
      You'll detect that all the flow are routed to eth0.2(10.0.1.254).
      Signed-off-by: default avatarXufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85350871
  16. 15 Apr, 2014 1 commit
  17. 22 Jan, 2014 1 commit
  18. 17 Jan, 2014 1 commit
  19. 13 Jan, 2014 1 commit
    • Hannes Frederic Sowa's avatar
      ipv4: introduce hardened ip_no_pmtu_disc mode · 8ed1dc44
      Hannes Frederic Sowa authored
      
      
      This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
      to be honored by protocols which do more stringent validation on the
      ICMP's packet payload. This knob is useful for people who e.g. want to
      run an unmodified DNS server in a namespace where they need to use pmtu
      for TCP connections (as they are used for zone transfers or fallback
      for requests) but don't want to use possibly spoofed UDP pmtu information.
      
      Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
      if the returned packet is in the window or if the association is valid.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Suggested-by: default avatarFlorian Weimer <fweimer@redhat.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ed1dc44
  20. 26 Dec, 2013 1 commit
  21. 06 Dec, 2013 1 commit
  22. 09 Aug, 2013 3 commits
  23. 25 Jul, 2013 1 commit
  24. 02 Jul, 2013 1 commit
    • Daniel Borkmann's avatar
      net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann authored
      We should get rid of all own SCTP debug printk macros and use the ones
      that the kernel offers anyway instead. This makes the code more readable
      and conform to the kernel code, and offers all the features of dynamic
      debbuging that pr_debug() et al has, such as only turning on/off portions
      of debug messages at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completly turned off, then these
      statements will also compile into "empty" functions.
      
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs resp. where appropriate BUG_ONs so
      that those bugs can be more easily detected as probably not many people
      have SCTP debugging permanently turned on.
      
      To turn on all SCTP debugging, the following steps are needed:
      
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
      
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
      
      
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb33381d
  25. 20 Jun, 2013 1 commit
  26. 18 Jun, 2013 2 commits
  27. 18 Mar, 2013 1 commit
  28. 29 Dec, 2012 1 commit
  29. 16 Dec, 2012 1 commit
    • Neil Horman's avatar
      sctp: Change defaults on cookie hmac selection · 0d0863b0
      Neil Horman authored
      Recently I posted commit 3c68198e which made selection of the cookie hmac
      algorithm selectable.  This is all well and good, but Linus noted that it
      changes the default config:
      http://marc.info/?l=linux-netdev&m=135536629004808&w=2
      
      
      
      I've modified the sctp Kconfig file to reflect the recommended way of making
      this choice, using the thermal driver example specified, and brought the
      defaults back into line with the way they were prior to my origional patch
      
      Also, on Linus' suggestion, re-adding ability to select default 'none' hmac
      algorithm, so we don't needlessly bloat the kernel by forcing a non-none
      default.  This also led me to note that we won't honor the default none
      condition properly because of how sctp_net_init is encoded.  Fix that up as
      well.
      
      Tested by myself (allbeit fairly quickly).  All configuration combinations seems
      to work soundly.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: David Miller <davem@davemloft.net>
      CC: Linus Torvalds <torvalds@linux-foundation.org>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: linux-sctp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d0863b0
  30. 07 Dec, 2012 1 commit
  31. 26 Oct, 2012 1 commit
    • Neil Horman's avatar
      sctp: Make hmac algorithm selection for cookie generation dynamic · 3c68198e
      Neil Horman authored
      
      
      Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to
      generate cookie values when establishing new connections via two build time
      config options.  Theres no real reason to make this a static selection.  We can
      add a sysctl that allows for the dynamic selection of these algorithms at run
      time, with the default value determined by the corresponding crypto library
      availability.
      This comes in handy when, for example running a system in FIPS mode, where use
      of md5 is disallowed, but SHA1 is permitted.
      
      Note: This new sysctl has no corresponding socket option to select the cookie
      hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
      contains no option for this value, and I opted not to pollute the socket option
      namespace.
      
      Change notes:
      v2)
      	* Updated subject to have the proper sctp prefix as per Dave M.
      	* Replaced deafult selection options with new options that allow
      	  developers to explicitly select available hmac algs at build time
      	  as per suggestion by Vlad Y.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c68198e
  32. 15 Aug, 2012 2 commits