  1. 24 Mar, 2015 2 commits
  2. 09 Mar, 2015 1 commit
    • net: delete stale packet_mclist entries · 82f17091
      Francesco Ruggeri authored
      
      
      When an interface is deleted from a net namespace, the ifindex in the
      corresponding entries in PF_PACKET sockets' mclists becomes stale.
      This can create inconsistencies if an interface with the same ifindex
      is later moved in from a different namespace (not that unlikely, since
      ifindexes are per-namespace).
      In particular we saw problems with dev->promiscuity, resulting
      in "promiscuity touches roof, set promiscuity failed. promiscuity
      feature of device might be broken" warnings and EOVERFLOW failures of
      setsockopt(PACKET_ADD_MEMBERSHIP).
      This patch deletes the mclist entries for interfaces that are deleted.
      Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with
      EADDRNOTAVAIL if called after the interface is deleted, also make
      packet_mc_drop not fail.
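      To make the failure mode and the fix concrete, here is a small user-space
      model. The list type and the helpers mclist_add(), mclist_drop() and
      mclist_purge_ifindex() are hypothetical and much simpler than the kernel's
      packet_mclist handling: deleting an interface purges every entry that still
      carries its ifindex, and dropping an entry that is already gone succeeds
      instead of failing with EADDRNOTAVAIL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one entry of a socket's mclist. */
struct mc_entry {
	int ifindex;            /* interface the membership was added on */
	int count;              /* number of times it was added */
	struct mc_entry *next;
};

/* Add a membership, bumping the count if the entry already exists. */
static int mclist_add(struct mc_entry **head, int ifindex)
{
	struct mc_entry *e;

	assert(ifindex > 0);    /* ifindexes are positive */
	for (e = *head; e; e = e->next) {
		if (e->ifindex == ifindex) {
			e->count++;
			return 0;
		}
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->ifindex = ifindex;
	e->count = 1;
	e->next = *head;
	*head = e;
	return 0;
}

/* Drop one reference. A missing entry is not an error: it may already
 * have been purged when its interface was deleted. */
static int mclist_drop(struct mc_entry **head, int ifindex)
{
	struct mc_entry **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		struct mc_entry *e = *pp;

		if (e->ifindex == ifindex) {
			if (--e->count == 0) {
				*pp = e->next;
				free(e);
			}
			return 0;
		}
	}
	return 0;
}

/* Called on interface deletion: forget every entry that still refers to
 * the dead ifindex, so a reused ifindex cannot inherit stale state. */
static void mclist_purge_ifindex(struct mc_entry **head, int ifindex)
{
	struct mc_entry **pp = head;

	while (*pp) {
		struct mc_entry *e = *pp;

		if (e->ifindex == ifindex) {
			*pp = e->next;
			free(e);
		} else {
			pp = &e->next;
		}
	}
}

static int mclist_len(const struct mc_entry *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

      In the model, a PACKET_DROP_MEMBERSHIP issued after the device vanished
      simply finds nothing to drop and returns success.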
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Mar, 2015 4 commits
  4. 24 Feb, 2015 1 commit
    • af_packet: don't pass empty blocks for PACKET_V3 · 41a50d62
      Alexander Drozdov authored
      Before da413eec ("packet: Fixed TPACKET V3 to signal poll when block is
      closed rather than every packet"), poll listening on an af_packet socket was
      not signaled if there were no packets to process. After that patch, poll is
      signaled every time the block retire timer expires. That happens because
      af_packet closes the current block on timeout even if the block is empty.
      
      Passing empty blocks to the user not only wastes CPU but also wastes ring
      buffer space, increasing the probability of packet drops with small timeouts.
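      A minimal model of the retire-timer decision after this change. The names
      struct blk_model and retire_timer_expired() are hypothetical and stand in
      for the af_packet block-retire logic: on timeout the current block is
      closed (and handed to user space, waking poll) only if it holds packets;
      an empty block stays open and only the timer is refreshed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the block currently being filled. */
struct blk_model {
	unsigned int num_pkts;  /* packets written into this block so far */
	bool closed;            /* true once handed over to user space */
};

/* Returns true if the block was closed, i.e. poll would be signaled. */
static bool retire_timer_expired(struct blk_model *blk)
{
	assert(!blk->closed);
	if (blk->num_pkts == 0)
		return false;   /* empty block: keep it, just re-arm the timer */
	blk->closed = true;
	return true;
}
```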
      Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
      Cc: Dan Collins <dan@dcollins.co.nz>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Guy Harris <guy@alum.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 22 Feb, 2015 1 commit
  6. 13 Jan, 2015 1 commit
  7. 12 Jan, 2015 2 commits
  8. 22 Dec, 2014 1 commit
    • packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet · da413eec
      Dan Collins authored
      
      
      Make TPACKET_V3 signal poll when a block is closed rather than for every
      packet. A side effect is that poll will be signaled when the block retire
      timer expires, which did not happen previously. The issue was visible when
      sending packets at a very low frequency such that all blocks are retired
      before packets are received by TPACKET_V3. This caused avoidable packet
      loss. The fix ensures that the signal is sent when blocks are closed,
      which covers the normal path where the block is filled as well as the
      path where the timer expires. The case where a block is filled without
      moving to the next block (i.e. all blocks are full) will still cause poll
      to be signaled.
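      The difference in wakeup counts can be sketched with a toy model. Both
      functions are hypothetical: n packets arrive into blocks holding cap
      packets each, and for simplicity the retire timer is assumed to fire once
      after the last packet, closing any partially filled block.

```c
#include <assert.h>

/* Old behaviour (modelled): one poll wakeup for every packet. */
static unsigned int wakeups_per_packet(unsigned int n)
{
	return n;
}

/* New behaviour (modelled): one wakeup whenever a block is closed, either
 * because it filled up or because the retire timer closed a partial block. */
static unsigned int wakeups_per_block(unsigned int n, unsigned int cap)
{
	assert(cap > 0);
	unsigned int full = n / cap;
	unsigned int partial = (n % cap) ? 1 : 0;

	return full + partial;
}
```

      With 1000 packets and 64-packet blocks, the per-block scheme yields 16
      wakeups (15 full blocks plus one timer-closed partial block) instead of 1000.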
      Signed-off-by: Dan Collins <dan@dcollins.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 Dec, 2014 2 commits
  10. 24 Nov, 2014 4 commits
  11. 21 Nov, 2014 1 commit
  12. 05 Nov, 2014 1 commit
    • net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller authored
      
      
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be fewer places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
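      The shape of the refactoring can be shown with a user-space analogue. The
      functions copy_datagram_iovec() and copy_datagram_msg() below are
      hypothetical stand-ins, not the kernel helpers: the first scatters bytes
      from a flat buffer into an iovec array, and the second is a thin wrapper
      that takes the whole struct msghdr, so callers no longer reach into
      msg->msg_iov themselves.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Copy len bytes starting at offset of data into the iovec array.
 * Returns 0 on success, -1 if the range or the iovec space is too short. */
static int copy_datagram_iovec(const unsigned char *data, size_t data_len,
			       size_t offset, const struct iovec *iov,
			       size_t iovlen, size_t len)
{
	assert(iov != NULL || iovlen == 0);
	if (offset > data_len || len > data_len - offset)
		return -1;

	for (size_t i = 0; i < iovlen && len > 0; i++) {
		size_t chunk = iov[i].iov_len < len ? iov[i].iov_len : len;

		memcpy(iov[i].iov_base, data + offset, chunk);
		offset += chunk;
		len -= chunk;
	}
	return len == 0 ? 0 : -1;
}

/* The wrapper: callers pass the msghdr, so only this helper would need to
 * change if the iovec were later replaced by an iterator inside msghdr. */
static int copy_datagram_msg(const unsigned char *data, size_t data_len,
			     size_t offset, struct msghdr *msg, size_t len)
{
	return copy_datagram_iovec(data, data_len, offset, msg->msg_iov,
				   msg->msg_iovlen, len);
}
```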
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 02 Sep, 2014 2 commits
  14. 30 Aug, 2014 1 commit
  15. 25 Aug, 2014 1 commit
  16. 21 Aug, 2014 1 commit
  17. 29 Jul, 2014 1 commit
  18. 15 Jul, 2014 1 commit
  19. 11 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller authored
      
      
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But the moment we place the SKB onto the socket receive queue, it
      can be consumed and freed up. So this skb->len access potentially
      touches freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no implementation of this callback actually uses
      the length argument. Since nobody cared about its value, lots of
      call sites pass arbitrary values in, such as '0' and even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
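      The safe ordering can be illustrated with a user-space miniature. All the
      names (struct pkt, struct msock, deliver(), count_wakeup()) are
      hypothetical; the point is that the notification callback takes only the
      socket, so nothing dereferences the packet after it has been queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a packet and a socket with a receive queue. */
struct pkt {
	unsigned int len;
	struct pkt *next;
};

struct msock {
	struct pkt *head, *tail;               /* receive queue */
	unsigned int wakeups;
	void (*data_ready)(struct msock *sk);  /* new signature: no length */
};

static void queue_tail(struct msock *sk, struct pkt *p)
{
	p->next = NULL;
	if (sk->tail)
		sk->tail->next = p;
	else
		sk->head = p;
	sk->tail = p;
}

/* Enqueue, then notify. In the kernel a reader on another CPU may consume
 * and free the packet as soon as it is queued, so p is not touched after
 * queue_tail(). The old pattern was sk->data_ready(sk, p->len). */
static void deliver(struct msock *sk, struct pkt *p)
{
	assert(sk->data_ready != NULL);
	queue_tail(sk, p);
	sk->data_ready(sk);
}

static void count_wakeup(struct msock *sk)
{
	sk->wakeups++;
}
```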
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 03 Apr, 2014 2 commits
  21. 28 Mar, 2014 1 commit
    • packet: respect devices with LLTX flag in direct xmit · 43279500
      Daniel Borkmann authored
      
      
      Quite often it can be useful to test with dummy or similar
      devices as a blackhole sink for skbs. Such devices are only
      equipped with a single txq, but are marked as NETIF_F_LLTX as
      they do not require locking their internal queues on xmit
      (or implement locking themselves). Therefore, use the
      HARD_TX_{UN,}LOCK API instead, so that NETIF_F_LLTX is respected.
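      A user-space model of the idea. MODEL_F_LLTX, struct model_dev and the
      lock helpers are hypothetical stand-ins for NETIF_F_LLTX, struct
      net_device and the txq spinlock: the txq lock is taken only when the
      device does not advertise lockless transmit, which is what routing the
      direct xmit path through HARD_TX_{UN,}LOCK achieves.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_F_LLTX 0x1u       /* hypothetical stand-in for NETIF_F_LLTX */

struct model_txq {
	bool held;              /* stand-in for the txq spinlock */
	unsigned int lock_acquisitions;
};

struct model_dev {
	unsigned int features;
	struct model_txq txq;   /* single tx queue, as on dummy */
};

static void txq_lock(struct model_txq *q)
{
	assert(!q->held);       /* a real spinlock would deadlock here */
	q->held = true;
	q->lock_acquisitions++;
}

static void txq_unlock(struct model_txq *q)
{
	assert(q->held);
	q->held = false;
}

/* Take the txq lock only if the device is not lockless (LLTX). */
static void hard_tx_lock(struct model_dev *dev)
{
	if (!(dev->features & MODEL_F_LLTX))
		txq_lock(&dev->txq);
}

static void hard_tx_unlock(struct model_dev *dev)
{
	if (!(dev->features & MODEL_F_LLTX))
		txq_unlock(&dev->txq);
}

static void direct_xmit(struct model_dev *dev)
{
	hard_tx_lock(dev);
	/* ... hand the frame to the driver here ... */
	hard_tx_unlock(dev);
}
```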
      
      A trafgen mmap/TX_RING run against a dummy device with the config
      foo: { fill(0xff, 64) } shows the following performance
      improvement for such scenarios on an ordinary Core i7/2.80GHz:
      
      Before:
      
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
      
         160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
         293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
             192,501,104 branch-misses:k                                               ( +-  1.63% )
                     831 context-switches:k                                            ( +-  9.18% )
                       7 cpu-migrations:k                                              ( +-  7.40% )
                  69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
             671,552,021 cache-references:k                                            ( +-  1.29% )
      
            22.856401569 seconds time elapsed                                          ( +-  0.33% )
      
      After:
      
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
      
         133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
         145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
              59,867,100 branch-misses:k                                               ( +-  4.72% )
                     384 context-switches:k                                            ( +-  3.76% )
                       6 cpu-migrations:k                                              ( +-  6.28% )
                  70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
              90,879,408 cache-references:k                                            ( +-  1.35% )
      
            11.719372413 seconds time elapsed                                          ( +-  0.24% )
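      The headline ratios can be cross-checked with a little arithmetic on the
      figures quoted above; nothing new is measured here, and the constant names
      are just labels for the perf numbers.

```c
#include <assert.h>

/* Figures copied from the perf output above. */
static const double before_secs = 22.856401569;
static const double after_secs = 11.719372413;
static const double before_insns = 160975944159.0;
static const double before_cycles = 293319390278.0;
static const double after_insns = 133788739692.0;
static const double after_cycles = 145853213256.0;

static double ratio(double a, double b)
{
	assert(b != 0.0);
	return a / b;
}
```

      Elapsed time drops by a factor of about 1.95, kernel cycles by about 2.0,
      and instructions per cycle rise from about 0.55 to about 0.92, matching
      the annotations perf printed.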
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 26 Mar, 2014 1 commit
  23. 28 Feb, 2014 1 commit
    • packet: allow to transmit +4 byte in TX_RING slot for VLAN case · 52f1454f
      Daniel Borkmann authored
      Commit 57f89bfa ("network: Allow af_packet to transmit +4 bytes
      for VLAN packets.") added the possibility for non-mmaped frames to
      send an extra 4 bytes for the VLAN header, so the MTU increases from
      1500 to 1504 bytes, for example.
      
      Commit cbd89acb ("af_packet: fix for sending VLAN frames via
      packet_mmap") attempted to fix that for the mmap part but was
      reverted, as it caused regressions while using eth_type_trans()
      on the output path.
      
      Let's act analogously to 57f89bfa and add similar logic
      to TX_RING. We presume size_max to be overcharged by +4 bytes and,
      later on, after the skb has been built by tpacket_fill_skb(), check
      for an ETH_P_8021Q header on packets larger than the normal MTU. This
      can be easily reproduced with a slightly modified trafgen in mmap(2)
      mode; test cases:
      
       { fill(0xff, 12) const16(0x8100) fill(0xff, <1504|1505>) }
       { fill(0xff, 12) const16(0x0806) fill(0xff, <1500|1501>) }
      
      Note that we need to do the test right after tpacket_fill_skb()
      as sockets can have PACKET_LOSS set where we would not fail but
      instead just continue to traverse the ring.
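      The length check can be sketched in user space. tx_frame_len_ok() and the
      MODEL_* constants are hypothetical, and the 14-byte Ethernet header stands
      in for the device's hard header length: up to mtu + header + 4 bytes are
      accepted up front, and once the frame is built, anything above mtu + header
      is allowed only if its ethertype is really 802.1Q (0x8100).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN    14      /* Ethernet header length */
#define MODEL_VLAN_HLEN   4       /* 802.1Q tag length */
#define MODEL_ETH_P_8021Q 0x8100  /* 802.1Q ethertype */

/* Returns 1 if a frame of len bytes may be sent on a device with this MTU. */
static int tx_frame_len_ok(const uint8_t *frame, size_t len, size_t mtu)
{
	size_t size_max = mtu + MODEL_ETH_HLEN + MODEL_VLAN_HLEN;

	assert(frame != NULL);
	if (len < MODEL_ETH_HLEN || len > size_max)
		return 0;
	if (len > mtu + MODEL_ETH_HLEN) {
		/* bytes 12..13 hold the ethertype in network byte order */
		uint16_t proto = (uint16_t)((frame[12] << 8) | frame[13]);

		return proto == MODEL_ETH_P_8021Q;
	}
	return 1;
}
```

      With a 1500-byte MTU this accepts the 0x8100 case with 1504 payload bytes
      and the 0x0806 case with 1500, and rejects the 1505 and 1501 variants,
      mirroring the trafgen test cases above.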
      Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Phil Sutter <phil@nwl.cc>
      Tested-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 18 Feb, 2014 1 commit
  25. 17 Feb, 2014 1 commit
    • packet: check for ndo_select_queue during queue selection · 0fd5d57b
      Daniel Borkmann authored
      Mathias reported that on an AMD Geode LX embedded board (ALiX)
      with the ath9k driver, PACKET_QDISC_BYPASS, introduced in commit
      d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket
      option"), triggers a WARN_ON() coming from the driver itself
      via 066dae93 ("ath9k: rework tx queue selection and fix
      queue stopping/waking").
      
      This happened because ndo_select_queue() is not invoked from
      the direct xmit path, i.e. for the ieee80211 subsystem, which sets
      the queue and TID (similar to an 802.1d tag) that is put into the
      frame through 802.11e (WMM, QoS). If that is not set, the pending
      frame counter in e.g. ath9k can get messed up.
      
      So the WARN_ON() in ath9k is absolutely legitimate. Generally,
      the hw queue selection in ieee80211 depends on the type of
      traffic, and priorities are set according to the ieee80211_ac_numbers
      mapping; this works much like DiffServ, only on a lower
      layer, so that the AP can favour frames that have "real-time"
      requirements like voice or video data frames.
      
      Therefore, check for the presence of ndo_select_queue() in netdev
      ops and, if available, invoke it with __packet_pick_tx_queue() as
      the fallback handler, so that drivers such as bnx2x, ixgbe,
      or mlx4 can still select a hw queue for transmission in
      relation to the current CPU, while e.g. the ieee80211 subsystem
      can make its own choices.
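      The selection logic can be modelled in user space. struct model_dev,
      pick_tx_queue() and the two example hooks are hypothetical; pick_by_cpu()
      plays the role of __packet_pick_tx_queue() (current CPU modulo the number
      of tx queues), and the driver hook, when present, receives it as a
      fallback it may call or ignore. The out-of-range guard is this model's own
      choice.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

/* Fallback picker handed to the driver hook. */
typedef unsigned int (*fallback_fn)(const struct model_dev *dev);

struct model_dev {
	unsigned int real_num_tx_queues;
	unsigned int cpu;       /* stands in for the current CPU id */
	/* Optional driver hook; NULL when the driver has none. */
	unsigned int (*select_queue)(const struct model_dev *dev,
				     fallback_fn fallback);
};

/* Default policy: spread transmissions by CPU. */
static unsigned int pick_by_cpu(const struct model_dev *dev)
{
	return dev->cpu % dev->real_num_tx_queues;
}

static unsigned int pick_tx_queue(const struct model_dev *dev)
{
	unsigned int q;

	assert(dev->real_num_tx_queues > 0);
	if (!dev->select_queue)
		return pick_by_cpu(dev);

	q = dev->select_queue(dev, pick_by_cpu);
	/* Guard against an out-of-range answer by using queue 0. */
	return q < dev->real_num_tx_queues ? q : 0;
}

/* Example wifi-like hook: chooses by its own traffic policy (fixed here). */
static unsigned int wifi_select(const struct model_dev *dev, fallback_fn fb)
{
	(void)dev;
	(void)fb;
	return 2;
}

/* Example multiqueue NIC hook that simply defers to the fallback. */
static unsigned int nic_select(const struct model_dev *dev, fallback_fn fb)
{
	return fb(dev);
}
```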
      Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 23 Jan, 2014 1 commit
  27. 22 Jan, 2014 3 commits