1. 28 Apr, 2016 3 commits
  2. 14 Apr, 2016 2 commits
    • ipv6: udp: Do a route lookup and update during release_cb · e646b657
      Martin KaFai Lau authored
      This patch adds a release_cb for UDPv6.  It does a route lookup
      and updates sk->sk_dst_cache if it is needed.  It picks up the
      left-over job from ip6_sk_update_pmtu() if the sk was owned
      by user during the pmtu update.
      It takes a rcu_read_lock to protect the __sk_dst_get() operations
      because another thread may do ip6_dst_store() without taking the
      sk lock (e.g. sendmsg).
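The deferral described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToySock`, `pmtu_update`, and `release` are hypothetical names, not kernel APIs, and the model remembers the new MTU directly, whereas the patch's release_cb redoes a route lookup and updates sk->sk_dst_cache only if needed.

```python
class ToySock:
    """Toy stand-in for a connected UDPv6 socket (hypothetical, not a kernel API)."""

    def __init__(self, dst_mtu):
        self.dst_cache_mtu = dst_mtu   # models the MTU seen through sk->sk_dst_cache
        self.owned_by_user = False     # models sock_owned_by_user(sk)
        self.pending_mtu = None        # work left over for the release callback

    def pmtu_update(self, new_mtu):
        """Models the pmtu-update path triggered by a Packet Too Big message."""
        if self.owned_by_user:
            # The user currently owns the socket, so the refresh is deferred.
            self.pending_mtu = new_mtu
        else:
            self.dst_cache_mtu = new_mtu

    def release(self):
        """Models releasing the socket, which runs the release callback."""
        self.owned_by_user = False
        if self.pending_mtu is not None:
            # The release callback picks up the left-over job.
            self.dst_cache_mtu = self.pending_mtu
            self.pending_mtu = None


sk = ToySock(dst_mtu=1500)
sk.owned_by_user = True
sk.pmtu_update(1300)
mtu_while_owned = sk.dst_cache_mtu    # still the old value
sk.release()
mtu_after_release = sk.dst_cache_mtu  # refreshed by the callback
```

In this model the cached value stays stale only for as long as the user owns the socket.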
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Wei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update · 33c162a9
      Martin KaFai Lau authored
      With a connected UDP socket, there is a case in which
      getsockopt(IPV6_MTU) returns a stale MTU value. It can be
      reproduced with the following sequence:
      1. Create a connected UDP socket
      2. Send some datagrams out
      3. Receive an ICMPV6_PKT_TOOBIG
      4. No new outgoing datagrams to trigger the sk_dst_check()
         logic to update the sk->sk_dst_cache.
      5. getsockopt(IPV6_MTU) returns the mtu from the invalid
         sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
      This patch updates the sk->sk_dst_cache for a connected datagram sk
      during the pmtu-update code path.
      Note that the sk->sk_v6_daddr is used to do the route lookup
      instead of skb->data (i.e. iph).  This is because a UDP socket can become
      connected after sending out some datagrams in the unconnected state, or
      it can be connected multiple times to different destinations.  Hence,
      iph may not be related to where sk is currently connected.
      It is done under the '!sock_owned_by_user(sk)' condition because
      the user may make another ip6_datagram_connect() call (i.e. changing
      the sk->sk_v6_daddr) while the dst lookup is happening in the pmtu-update
      code path.
      For the sock_owned_by_user(sk) == true case, the next patch will
      introduce a release_cb() which will update the sk->sk_dst_cache.
      Server (Connected UDP Socket):
      Route Details:
      [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
      2fac::/64 dev eth0  proto kernel  metric 256  pref medium
      2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium
      A simple Python script that creates a connected UDP socket:
      import socket
      import errno
      HOST = '2fac::1'
      PORT = 8080
      s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
      s.bind((HOST, PORT))
      s.connect(('2fac:face::face', 53))
      while True:
          try:
              data = s.recv(1024)
          except socket.error as se:
              # recv() fails with EMSGSIZE once the ICMPV6_PKT_TOOBIG arrives
              if se.errno == errno.EMSGSIZE:
                  # 41 is IPPROTO_IPV6 and 24 is IPV6_MTU on Linux
                  pmtu = s.getsockopt(41, 24)
                  print("PMTU:%d" % pmtu)
      Python program output after getting an ICMPV6_PKT_TOOBIG:
      [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
      Cache routes after receiving TOOBIG:
      [root@arch-fb-vm1 ~]# ip -6 r show table cache
      2fac:face::face via 2fac::face dev eth0  metric 0
          cache  expires 463sec mtu 1300 pref medium
      Client (Send the ICMPV6_PKT_TOOBIG):
      scapy is used to generate the TOOBIG message.  Here is the scapy script I have used:
      >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
      >>> sendp(p, iface='qemubr0')
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reported-by: Wei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Apr, 2016 1 commit
    • sock: enable timestamping using control messages · c14ac945
      Soheil Hassas Yeganeh authored
      Currently, SO_TIMESTAMPING can only be enabled using setsockopt.
      This is very costly when users want to sample writes to gather
      tx timestamps.
      Add support for enabling SO_TIMESTAMPING via control messages by
      using tsflags added in `struct sockcm_cookie` (added in the previous
      patches in this series) to set the tx_flags of the last skb created in
      a sendmsg. With this patch, the timestamp recording bits in tx_flags
      of the skbuff are overridden if SO_TIMESTAMPING is passed in a cmsg.
      Please note that this is only effective for overriding the timestamp
      recording flags. Users should enable timestamp reporting using
      socket options and then ask for SOF_TIMESTAMPING_TX_*
      using control messages per sendmsg to sample timestamps for each write.
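A per-call request of this kind could be built from user space roughly as follows. This is a minimal sketch, assuming Linux: the numeric values of `SO_TIMESTAMPING` and `SOF_TIMESTAMPING_TX_SOFTWARE` are taken from the kernel uapi headers and hard-coded as assumptions, and `tx_timestamp_cmsg` is a hypothetical helper name.

```python
import socket
import struct

# Assumed Linux constants (asm-generic/socket.h and linux/net_tstamp.h).
SO_TIMESTAMPING = 37
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1


def tx_timestamp_cmsg(tx_flags):
    """Build ancillary data that asks for tx timestamps on one sendmsg call."""
    # The cmsg payload is a single native-endian 32-bit flags word.
    payload = struct.pack("=I", tx_flags)
    return [(socket.SOL_SOCKET, SO_TIMESTAMPING, payload)]


ancdata = tx_timestamp_cmsg(SOF_TIMESTAMPING_TX_SOFTWARE)

# Intended use on a socket whose timestamp reporting was already enabled
# with setsockopt (not executed here):
#     sock.sendmsg([b"payload"], ancdata)
```

Only the calls that carry this ancillary data are sampled, so the socket option does not have to be toggled around each write.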
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 20 Mar, 2016 1 commit
  5. 18 Feb, 2016 1 commit
  6. 10 Dec, 2015 1 commit
  7. 03 Dec, 2015 1 commit
  8. 24 Nov, 2015 1 commit
  9. 08 Oct, 2015 6 commits
  10. 25 Sep, 2015 2 commits
  11. 18 Sep, 2015 1 commit
    • netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman authored
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done its job, passing in the desired network namespace is in
      many cases a code simplification.
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
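The shape of the change can be illustrated with a toy model of a hook and its continuation. Everything here is a sketch: the Python functions only borrow the names from the description above, the namespace is a plain string, and the hook accepts every packet.

```python
def nf_hook(net, sk, skb, okfn):
    """Toy hook: accept every packet and continue with okfn, passing net along."""
    return okfn(net, sk, skb)


def dst_output(sk, skb):
    """Toy output function that has no use for the namespace."""
    return ("sent", skb)


def dst_output_okfn(net, sk, skb):
    """Adapter with the okfn signature; it silently drops net."""
    return dst_output(sk, skb)


def bridge_finish(net, sk, skb):
    """Continuation that needs the namespace and now receives it directly."""
    return ("bridged in " + net, skb)


r1 = nf_hook("netns-a", None, b"pkt", dst_output_okfn)
r2 = nf_hook("netns-a", None, b"pkt", bridge_finish)
```

The continuation no longer has to guess the namespace from the packet or the socket, because the caller hands it over.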
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 01 Aug, 2015 3 commits
  13. 31 Jul, 2015 1 commit
    • ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument · 343d60aa
      Roopa Prabhu authored
      This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
      for use cases where sk is not available (like mpls).
      sk appears to be needed to get the namespace 'net' and is optional
      otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
      to take net argument. sk remains optional.
      All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
      to pass net. I have modified them to use already available
      'net' in the scope of the call. I can change them to
      sock_net(sk) to avoid any unintended change in behaviour if the sock
      namespace is different. From code inspection, they do not seem to differ.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 30 Jul, 2015 1 commit
  15. 04 Jun, 2015 2 commits
    • net: Add full IPv6 addresses to flow_keys · c3f83241
      Tom Herbert authored
      This patch adds full IPv6 addresses into flow_keys and uses them as
      input to the flow hash function. The implementation supports either
      IPv4 or IPv6 addresses in a union, and a selector is used to determine
      how many words to input to jhash2.
      We also add flow_get_u32_dst and flow_get_u32_src functions which are
      used to get a u32 representation of the source and destination
      addresses. For IPv6, ipv6_addr_hash is called. These functions retain
      getting the legacy values of src and dst in flow_keys.
      With this patch, Ethertype and IP protocol are now included in the
      flow hash input.
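The idea of hashing a variable number of words depending on the address family can be sketched as follows. This is a toy layout, not the kernel's flow_keys structure: `flow_hash_input` and `flow_hash` are hypothetical names, and CRC32 stands in for jhash2 purely for illustration.

```python
import ipaddress
import struct
import zlib


def flow_hash_input(src, dst, ethertype, ip_proto, sport, dport):
    """Return the 32-bit words hashed for one flow (toy layout)."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip.version != dst_ip.version:
        raise ValueError("src and dst must be in the same address family")
    # The address family decides how many address bytes are hashed:
    # 8 bytes for an IPv4 pair, 32 bytes for an IPv6 pair.
    addr_bytes = src_ip.packed + dst_ip.packed
    # Ethertype and IP protocol are part of the hash input as well.
    header = struct.pack("!HBxHH", ethertype, ip_proto, sport, dport)
    data = header + addr_bytes
    return struct.unpack("!%dI" % (len(data) // 4), data)


def flow_hash(words):
    """Stand-in for jhash2: CRC32 over the selected words."""
    return zlib.crc32(struct.pack("!%dI" % len(words), *words))


v4_words = flow_hash_input("192.0.2.1", "192.0.2.2", 0x0800, 17, 8080, 53)
v6_words = flow_hash_input("2fac::1", "2fac:face::face", 0x86DD, 17, 8080, 53)
```

In this layout an IPv4 flow contributes 4 words and an IPv6 flow contributes 10, so the full IPv6 addresses take part in the hash instead of a folded 32-bit summary.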
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Get skb hash over flow_keys structure · 42aecaa9
      Tom Herbert authored
      This patch changes flow hashing to use jhash2 over the flow_keys
      structure instead of just doing jhash_3words over src, dst, and ports.
      This method will allow us to take more input into the hashing function
      so that we can include full IPv6 addresses, VLAN, flow labels etc.
      without needing to resort to xor'ing, which makes for a poor hash.
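Why xor'ing makes for a poor hash input can be seen in a short example. This is an illustrative sketch, not code from the patch: `xor_fold` is a hypothetical helper that reduces a 128-bit IPv6 address to 32 bits, and the two addresses are made up for the demonstration.

```python
import ipaddress
import struct


def ipv6_words(addr):
    """Split an IPv6 address into its four 32-bit words."""
    return struct.unpack("!4I", ipaddress.IPv6Address(addr).packed)


def xor_fold(addr):
    """Fold a 128-bit IPv6 address into 32 bits by xor'ing its four words."""
    w = ipv6_words(addr)
    return w[0] ^ w[1] ^ w[2] ^ w[3]


# Two different addresses whose first two words are swapped.
a = "2001:db8:aaaa:bbbb::1"
b = "aaaa:bbbb:2001:db8::1"

same_fold = xor_fold(a) == xor_fold(b)       # xor ignores word order
same_words = ipv6_words(a) == ipv6_words(b)  # the full inputs still differ
```

Because xor is commutative, any reordering of the words collides after folding, whereas hashing the full words keeps the two flows distinguishable.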
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 26 May, 2015 1 commit
  17. 25 May, 2015 2 commits
  18. 13 May, 2015 3 commits
  19. 04 May, 2015 1 commit
    • ipv6: Flow label state ranges · 82a584b7
      Tom Herbert authored
      This patch divides the IPv6 flow label space into two ranges:
      0-7ffff is reserved for the flow label manager, and 80000-fffff will be
      used for creating auto flow labels (per RFC6438). This only affects how
      labels are set on transmit; it does not affect receive. This range split
      can be disabled by sysctl.
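The split can be sketched in a few lines. This is a minimal illustration assuming that an auto flow label is derived from a flow hash by masking it to 20 bits and forcing the top bit; the function and constant names are hypothetical, and the `state_ranges` parameter merely models the sysctl switch.

```python
FLOWLABEL_MASK = 0xFFFFF   # flow labels are 20 bits wide
STATELESS_FLAG = 0x80000   # top bit of the 20-bit label


def auto_flowlabel(flow_hash, state_ranges=True):
    """Derive an auto (stateless) flow label from a flow hash."""
    label = flow_hash & FLOWLABEL_MASK
    if state_ranges:
        # Keep auto labels out of the range reserved for the label manager.
        label |= STATELESS_FLAG
    return label


def is_stateless_range(label):
    """True if the label falls in 80000-fffff, the auto flow label range."""
    return 0x80000 <= label <= 0xFFFFF


split_label = auto_flowlabel(0x12345)
unsplit_label = auto_flowlabel(0x12345, state_ranges=False)
```

With the split enabled, an auto label can never collide with a label handed out in the lower, stateful range, at the cost of one bit of entropy.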
      IPv6 flow labels have been an unmitigated disappointment thus far
      in the lifetime of IPv6. Support in HW devices to use them for ECMP
      is lacking, and OSes don't turn them on by default. If we had these
      we could get much better hashing in IPv6 networks without resorting
      to DPI, possibly eliminating some of the motivations to define new
      encaps in UDP just for getting ECMP.
      Unfortunately, the initial specifications of IPv6 did not clarify
      how they are to be used. There has always been a vague concept that
      these can be used for ECMP, flow hashing, etc. and we do now have a
      good standard for how to do this in RFC6438. The problem is that flow labels
      can be either stateful or stateless (as in RFC6438), and we are
      presented with the possibility that a stateless label may collide
      with a stateful one.  Attempts to split the flow label space were
      rejected in IETF. When we added support in Linux for RFC6438, we
      could not turn on flow labels by default due to this conflict.
      This patch splits the flow label space and should give us
      a path to enabling auto flow labels by default for all IPv6 packets.
      This is an API change so we need to consider compatibility with
      existing deployment. The stateful range is chosen to be the lower
      values in hopes that most uses would have chosen small numbers.
      Once we resolve the stateless/stateful issue, we can proceed to
      look at enabling RFC6438 flow labels by default (starting with
      scaled testing).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 08 Apr, 2015 1 commit
  21. 07 Apr, 2015 2 commits
  22. 25 Mar, 2015 1 commit
  23. 19 Mar, 2015 1 commit
  24. 27 Feb, 2015 1 commit