1. 20 Aug, 2016 1 commit
    • Marcelo Ricardo Leitner's avatar
      sctp: linearize early if it's not GSO · 4c2f2454
      Marcelo Ricardo Leitner authored
      Because otherwise when crc computation is still needed it's way more
      expensive than on a linear buffer to the point that it affects
      performance.
      
      It's so expensive that netperf test gives a perf output as below:
      
      Overhead  Command         Shared Object       Symbol
        18,62%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
         2,57%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
         1,94%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
         1,90%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
         1,66%  swapper         [kernel.vmlinux]    [k] intel_idle
         1,63%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
         1,59%  netserver       [sctp]              [k] sctp_packet_transmit
         1,55%  netserver       [kernel.vmlinux]    [k] memcpy_erms
         1,42%  netserver       [sctp]              [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462
      
      After patch:
      Overhead  Command         Shared Object      Symbol
         2,75%  netserver       [kernel.vmlinux]   [k] memcpy_erms
         2,63%  netserver       [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
         2,39%  netserver       [kernel.vmlinux]   [k] fib_table_lookup
         2,04%  netserver       [kernel.vmlinux]   [k] __pskb_pull_tail
         1,91%  netserver       [kernel.vmlinux]   [k] _raw_spin_lock
         1,91%  netserver       [sctp]             [k] sctp_packet_transmit
         1,72%  netserver       [mlx4_en]          [k] mlx4_en_process_rx_cq
         1,68%  netserver       [sctp]             [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849
      
      Fixes: 3acb50c1
      
       ("sctp: delay as much as possible skb_linearize")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c2f2454
  2. 19 Aug, 2016 21 commits
  3. 18 Aug, 2016 3 commits
    • Liping Zhang's avatar
      netfilter: cttimeout: fix use after free error when delete netns · b75911b6
      Liping Zhang authored
      
      
      In general, when we want to delete a netns, cttimeout_net_exit will
      be called before ipt_unregister_table, i.e. before ctnl_timeout_put.
      
      But after call kfree_rcu in cttimeout_net_exit, we will still decrease
      the timeout object's refcnt in ctnl_timeout_put, this is incorrect,
      and will cause a use after free error.
      
      It is easy to reproduce this problem:
        # while : ; do
        ip netns add xxx
        ip netns exec xxx nfct add timeout testx inet icmp timeout 200
        ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
        ip netns del xxx
        done
      
        =======================================================================
        BUG kmalloc-96 (Tainted: G    B       E  ): Poison overwritten
        -----------------------------------------------------------------------
        INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
        0x6b
        INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        age=104 cpu=0 pid=3330
        ___slab_alloc+0x4da/0x540
        __slab_alloc+0x20/0x40
        __kmalloc+0x1c8/0x240
        cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
        nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
        [ ... ]
      
      So only when the refcnt decreased to 0, we call kfree_rcu to free the
      timeout object. And like nfnetlink_acct do, use atomic_cmpxchg to
      avoid race between ctnl_timeout_try_del and ctnl_timeout_put.
      Signed-off-by: default avatarLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      b75911b6
    • Liping Zhang's avatar
      netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroy · 12be15dd
      Liping Zhang authored
      
      
      Suppose that we input the following commands at first:
        # nfacct add test
        # iptables -A INPUT -m nfacct --nfacct-name test
      
      And now "test" acct's refcnt is 2, but later when we try to delete the
      "test" nfacct and the related iptables rule at the same time, race maybe
      happen:
            CPU0                                    CPU1
        nfnl_acct_try_del                      nfnl_acct_put
        atomic_dec_and_test //ref=1,testfail          -
             -                                 atomic_dec_and_test //ref=0,testok
             -                                 kfree_rcu
        atomic_inc //ref=1                            -
      
      So after the rcu grace period, nf_acct will be freed but it is still linked
      in the nfnl_acct_list, and we can access it later, then oops will happen.
      
      Convert atomic_dec_and_test and atomic_inc combinaiton to one atomic
      operation atomic_cmpxchg here to fix this problem.
      Signed-off-by: default avatarLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      12be15dd
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 184ca823
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
          Fietkau.
      
       2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
      
       3) Fix some tg3 ethtool logic bugs, and one that would cause no
          interrupts to be generated when rx-coalescing is set to 0.  From
          Satish Baddipadige and Siva Reddy Kallam.
      
       4) QLCNIC mailbox corruption and napi budget handling fix from Manish
          Chopra.
      
       5) Fix fib_trie logic when walking the trie during /proc/net/route
          output than can access a stale node pointer.  From David Forster.
      
       6) Several sctp_diag fixes from Phil Sutter.
      
       7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
      
       8) Checksum fixup fixes in bpf from Daniel Borkmann.
      
       9) Memork leaks in nfnetlink, from Liping Zhang.
      
      10) Use after free in rxrpc, from David Howells.
      
      11) Use after free in new skb_array code of macvtap driver, from Jason
          Wang.
      
      12) Calipso resource leak, from Colin Ian King.
      
      13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
      
      14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
      
      15) Fix lockdep splats in macsec, from Sabrina Dubroca.
      
      16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
          handling.
      
      17) Various tc-action bug fixes, from CONG Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
        net_sched: allow flushing tc police actions
        net_sched: unify the init logic for act_police
        net_sched: convert tcf_exts from list to pointer array
        net_sched: move tc offload macros to pkt_cls.h
        net_sched: fix a typo in tc_for_each_action()
        net_sched: remove an unnecessary list_del()
        net_sched: remove the leftover cleanup_a()
        mlxsw: spectrum: Allow packets to be trapped from any PG
        mlxsw: spectrum: Unmap 802.1Q FID before destroying it
        mlxsw: spectrum: Add missing rollbacks in error path
        mlxsw: reg: Fix missing op field fill-up
        mlxsw: spectrum: Trap loop-backed packets
        mlxsw: spectrum: Add missing packet traps
        mlxsw: spectrum: Mark port as active before registering it
        mlxsw: spectrum: Create PVID vPort before registering netdevice
        mlxsw: spectrum: Remove redundant errors from the code
        mlxsw: spectrum: Don't return upon error in removal path
        i40e: check for and deal with non-contiguous TCs
        ixgbe: Re-enable ability to toggle VLAN filtering
        ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
        ...
      184ca823
  4. 17 Aug, 2016 15 commits