1. 01 Sep, 2015 2 commits
  2. 17 Sep, 2014 1 commit
  3. 30 Oct, 2013 1 commit
    • net: ipvs: sctp: do not recalc sctp csum when ports didn't change · 97203abe
      Daniel Borkmann authored
      
      
      Unlike UDP or TCP, we do not take the pseudo-header into
      account in SCTP checksums. So in case port mapping is the
      very same, we do not need to recalculate the whole SCTP
      checksum in software, which is very expensive.
      
      Also, as in TCP, take into account when a private
      helper mangled the packet. In that case, we also need to
      recalculate the checksum even if the ports are the same.
      
      Thanks for feedback regarding skb->ip_summed checks from
      Julian Anastasov; here's a discussion on these checks for
      snat and dnat:
      
      * For snat_handler(), we can see CHECKSUM_PARTIAL from
        virtual devices, and from LOCAL_OUT, otherwise it
        should be CHECKSUM_UNNECESSARY. In general, in snat it
        is more complex. skb contains the original route and
        ip_vs_route_me_harder() can change the route after
        snat_handler. So, for locally generated replies from
        local server we can not preserve the CHECKSUM_PARTIAL
        mode. It is a chicken-and-egg dilemma: snat_handler
        needs the device after rerouting (to check for
        NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
        the snat_handler() to put the new saddr for proper
        rerouting.
      
      * For dnat_handler(), we should not see CHECKSUM_COMPLETE
        for SCTP, in fact the small set of drivers that support
        SCTP offloading return CHECKSUM_UNNECESSARY on correctly
        received SCTP csum. We can see CHECKSUM_PARTIAL from
        local stack or received from virtual drivers. The idea is
        that SCTP decides to avoid csum calculation if hardware
        supports offloading. IPVS can change the device after
        rerouting to real server but we can preserve the
        CHECKSUM_PARTIAL mode if the new device supports
        offloading too. This works because skb dst is changed
        before dnat_handler and we see the new device. So, checks
        in the 'if' part will decide whether it is ok to keep
        CHECKSUM_PARTIAL for the output. If the packet came with
        CHECKSUM_NONE, we deal with an unknown checksum. As we
        recalculate the sum for IP header in all cases, it should
        be safe to use CHECKSUM_UNNECESSARY. We can forward wrong
        checksum in this case (without cp->app). In case of
        CHECKSUM_UNNECESSARY, the csum was valid on receive.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
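The decision described in this commit message can be sketched in C as follows. This is a minimal illustration, not the kernel code: `struct sctp_nat_ctx` and `sctp_needs_csum_recalc()` are hypothetical names, and the skb->ip_summed handling discussed above is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what a NAT handler knows about a packet. */
struct sctp_nat_ctx {
    uint16_t old_port;     /* port currently in the SCTP header */
    uint16_t new_port;     /* port the connection maps to */
    bool payload_mangled;  /* an application helper rewrote the payload */
};

/*
 * The SCTP checksum covers only the SCTP packet itself, with no IP
 * pseudo-header, so rewriting just the IP address leaves it valid.
 * A recalculation is needed only if the SCTP bytes changed.
 */
static bool sctp_needs_csum_recalc(const struct sctp_nat_ctx *ctx)
{
    if (ctx->payload_mangled)
        return true;
    return ctx->old_port != ctx->new_port;
}
```

With identical ports and no helper involvement the function returns false, which corresponds to the cheap path the commit introduces.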
  4. 28 Oct, 2013 1 commit
  5. 28 Jul, 2013 1 commit
  6. 26 Jun, 2013 2 commits
    • ipvs: replace the SCTP state machine · 61e7c420
      Julian Anastasov authored
      
      
      Convert the SCTP state table, so that it is more readable.
      Change the states to follow the diagram in RFC 2960
      and add more states suitable for a middle box. Still, such a
      change in states adds incompatibility if some systems in a sync
      setup include this change and others do not.
      
      With this change we also have proper transitions in INPUT-ONLY
      mode (DR/TUN) where we see packets only from client. Now
      we should not switch to 10-second CLOSED state at a time
      when we should stay in ESTABLISHED state.
      
      The state names are short because we have 16-char space
      in ipvsadm and an 11-char limit for the connection list format.
      It is a consequence of the TCP implementation where the longest
      state name is ESTABLISHED.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: sloppy TCP and SCTP · c6c96c18
      Alexander Frolkin authored
      
      
      This adds support for sloppy TCP and SCTP modes to IPVS.
      
      When enabled (sysctls net.ipv4.vs.sloppy_tcp and
      net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
      packet, not just a TCP SYN (or SCTP INIT).
      
      This allows connections to fail over from one IPVS director to another
      mid-flight.
      Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
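The effect of the two sysctls can be sketched as a small predicate. This is an assumption-laden illustration rather than the IPVS implementation: the enum, the function name and the reduction of a packet to four kinds are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical classification of the first packet seen for a flow. */
enum first_pkt_kind {
    PKT_TCP_SYN,
    PKT_TCP_OTHER,
    PKT_SCTP_INIT,
    PKT_SCTP_OTHER
};

/*
 * Without sloppy mode only a connection-opening packet may create state.
 * With sloppy mode enabled for the protocol, any packet may do so, which
 * is what lets a second director pick up a flow mid-flight.
 */
static bool may_create_conn(enum first_pkt_kind kind,
                            bool sloppy_tcp, bool sloppy_sctp)
{
    switch (kind) {
    case PKT_TCP_SYN:
    case PKT_SCTP_INIT:
        return true;
    case PKT_TCP_OTHER:
        return sloppy_tcp;
    case PKT_SCTP_OTHER:
        return sloppy_sctp;
    }
    return false;
}
```

The two flags stand in for net.ipv4.vs.sloppy_tcp and net.ipv4.vs.sloppy_sctp; each one relaxes the rule only for its own protocol.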
  7. 23 Apr, 2013 1 commit
  8. 01 Apr, 2013 3 commits
    • ipvs: do not disable bh for long time · ac69269a
      Julian Anastasov authored
      
      
      We used a global BH disable in LOCAL_OUT hook.
      Add _bh suffix to all places that need it and remove
      the disabling from LOCAL_OUT and sync code.
      
      Functions like ip_defrag need protection from
      BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
      RCU lock.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: convert services to rcu · ceec4c38
      Julian Anastasov authored
      
      
      This is the final step in RCU conversion.
      
      Things that are removed:
      
      - svc->usecnt: now svc is accessed under RCU read lock
      - svc->inc: and some unused code
      - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
      - __ip_vs_svc_lock: replaced with RCU
      - IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
      	RCU and work in parallel with configuration
      
      Other changes:
      
      - previously, the RCU read-side critical section covered only the
      call to the schedule method; now it is extended to include
      service lookup
      - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
      - svc->pe and svc->scheduler remain to the end (of grace period),
      	the schedulers are prepared for such RCU readers
      	even after done_service is called but they need
      	to use synchronize_rcu because last ip_vs_scheduler_put
      	can happen while RCU read-side critical sections
      	use an outdated svc->scheduler pointer
      - as planned, update_service is removed
      - empty services can be freed immediately after grace period.
      	If dests were present, the services are freed from
      	the dest trash code
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: convert app locks · 363c97d7
      Julian Anastasov authored
      
      
      We use locks like tcp_app_lock, udp_app_lock,
      sctp_app_lock to protect access to the protocol hash tables
      from readers in packet context while the application
      instances (inc) are [un]registered under global mutex.
      
      As the hash tables are mostly read when conns are
      created and bound to app, use RCU for readers and reclaim
      app instance after grace period.
      
      Simplify ip_vs_app_inc_get because we use usecnt
      only for statistics and rely on module refcounting.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  9. 19 Mar, 2013 1 commit
  10. 06 Feb, 2013 1 commit
    • ipvs: sctp: fix checksumming on snat and dnat handlers · 4b47bc9a
      Daniel Borkmann authored
      In our test lab, we have a simple SCTP client connecting to an SCTP
      server via an IPVS load balancer. On some machines, load balancing
      works, but on others the initial handshake just fails, thus no
      SCTP connection whatsoever can be established!
      
      We observed that the SCTP INIT-ACK handshake reply from the IPVS
      machine to the client had a correct IP checksum, but a corrupt SCTP
      checksum when forwarded, thus on the client side the packet was
      dropped and an initial handshake retriggered until all attempts
      were exhausted.
      
      To fix this issue, this patch i) adds a missing CHECKSUM_UNNECESSARY
      after the full checksum (re-)calculation (as done in IPVS TCP and UDP
      code as well), ii) calculates the checksum in little-endian format
      (as fixed with the SCTP code in commit 4458f04c: sctp: Clean up sctp
      checksumming code) and iii) refactors duplicate checksum code into a
      common function. Tested by myself.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
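The "full recalculation, stored little-endian" step can be illustrated in user space. The sketch below is not the kernel helper from this commit: `crc32c()`, `put_le32()` and `sctp_fill_csum()` are hypothetical names, and the CRC is a plain bitwise CRC32c rather than the kernel's table-driven or hardware-assisted version. It assumes the checksum field sits at bytes 8..11 of the SCTP common header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Store a 32-bit value least significant byte first. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)v;
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}

/*
 * Zero the checksum field, compute CRC32c over the whole SCTP packet
 * and write the result back in little-endian byte order.
 * Returns -1 if the buffer is too short to hold a common header.
 */
static int sctp_fill_csum(uint8_t *sctp, size_t len)
{
    if (len < 12)
        return -1;
    memset(sctp + 8, 0, 4);
    put_le32(sctp + 8, crc32c(sctp, len));
    return 0;
}
```

In the kernel, the handler would additionally mark the skb as CHECKSUM_UNNECESSARY after such a recalculation, which is item i) of the fix and has no user-space equivalent here.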
  11. 28 Sep, 2012 2 commits
    • ipvs: API change to avoid rescan of IPv6 exthdr · d4383f04
      Jesper Dangaard Brouer authored
      
      
      Reduce the number of times we scan/skip the IPv6 exthdrs.
      
      This patch contains a lot of API changes.  This is done, to avoid
      repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
      which is called by ip_vs_fill_iph_skb().
      
      Finding the IPv6 headers is done as early as possible, and passed on
      as a pointer "struct ip_vs_iphdr *" to the affected functions.
      
      This patch reduces/removes 19 calls to ip_vs_fill_iph_skb().
      
      Notice, I have chosen not to change the API of function
      pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
      used by external schedulers, via {un,}register_ip_vs_scheduler.
      Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
      they do, they are only interested in iph->{s,d}addr.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: Fix faulty IPv6 extension header handling in IPVS · 63dca2c0
      Jesper Dangaard Brouer authored
      IPv6 packets can contain extension headers, thus it is wrong to assume
      that the transport/upper-layer header starts right after (struct
      ipv6hdr) the IPv6 header.  IPVS uses this false assumption, and will
      write SNAT & DNAT modifications at a fixed position, which will corrupt
      the message.
      
      To fix this, proper header position must be found before modifying
      packets.  Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
      to skip the exthdrs. It finds (1) the transport header offset, (2) the
      protocol, and (3) detects if the packet is a fragment.
      
      Note that fragments in IPv6 are represented via an exthdr.  Thus, this
      is detected while skipping through the exthdrs.
      
      This patch depends on commit 84018f55:
       "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
      This also adds a dependency to ip6_tables.
      
      Originally based on patch from: Hans Schillstrom
      
      kABI notes:
      Changing struct ip_vs_iphdr is a potential minor kABI breaker,
      because external modules can be compiled with another version of
      this struct.  This should not matter, as they would most-likely
      be using a compiled-in version of ip_vs_fill_iphdr().  When
      recompiled, they will notice ip_vs_fill_iphdr() no longer exists,
      and they have to use ip_vs_fill_iph_skb() instead.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
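The three results listed in this commit message (transport header offset, protocol, fragment flag) can be illustrated with a user-space walk over a raw IPv6 packet. This is a simplified sketch and not ip_vs_fill_iph_skb() or ipv6_find_hdr(): `struct l4_info` and `find_l4_header()` are hypothetical, only hop-by-hop, routing, destination-options and fragment headers are skipped, and for a non-first fragment the reported offset only marks where the fragment payload begins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40
#define NH_HOPOPTS    0
#define NH_ROUTING   43
#define NH_FRAGMENT  44
#define NH_DSTOPTS   60

/* Hypothetical result record for the walk. */
struct l4_info {
    size_t  offset;      /* byte offset of the upper-layer header */
    uint8_t protocol;    /* upper-layer protocol number */
    bool    is_fragment; /* a fragment header was seen on the way */
};

/* Returns 0 on success, -1 if the packet is truncated. */
static int find_l4_header(const uint8_t *pkt, size_t len, struct l4_info *out)
{
    size_t off = IPV6_HDR_LEN;
    uint8_t next;

    if (len < IPV6_HDR_LEN)
        return -1;
    next = pkt[6];               /* Next Header field of the fixed header */
    out->is_fragment = false;

    for (;;) {
        size_t ext_len;

        switch (next) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            if (off + 2 > len)
                return -1;
            /* Length is in 8-octet units, not counting the first 8. */
            ext_len = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        case NH_FRAGMENT:
            ext_len = 8;         /* fragment header has a fixed size */
            out->is_fragment = true;
            break;
        default:
            out->offset = off;
            out->protocol = next;
            return 0;
        }
        if (off + ext_len > len)
            return -1;
        next = pkt[off];         /* first byte of every exthdr is Next Header */
        off += ext_len;
    }
}
```

A caller that rewrites ports would use `offset` instead of assuming the transport header starts 40 bytes in, which is the false assumption the commit removes.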
  12. 30 Apr, 2012 1 commit
  13. 01 Nov, 2011 1 commit
  14. 31 Mar, 2011 1 commit
  15. 03 Feb, 2011 1 commit
  16. 13 Jan, 2011 7 commits
  17. 25 Nov, 2010 1 commit
    • IPVS: Handle Scheduling errors. · a5959d53
      Hans Schillstrom authored
      
      
      If ip_vs_conn_fill_param_persist returns an error to ip_vs_sched_persist,
      this error must propagate as ignored=-1 to ip_vs_schedule().
      Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule()
      should also set *ignored=-1.
      
      This patch just relies on the fact that ignored is 1 before calling
      ip_vs_sched_persist().
      
      Sent from Julian:
        "The new case when ip_vs_conn_fill_param_persist fails
         should set *ignored = -1, so that we can use NF_DROP,
         see below. *ignored = -1 should be also used for ip_vs_conn_new
         failure in ip_vs_sched_persist() and ip_vs_schedule().
         The new negative value should be handled in tcp,udp,sctp"
      
      "To summarize:
      
      - *ignored = 1:
            protocol tried to schedule (eg. on SYN), found svc but the
            svc/scheduler decides that this packet should be accepted with
            NF_ACCEPT because it must not be scheduled.
      
      - *ignored = 0:
            scheduler can not find destination, so try bypass or
            return ICMP and then NF_DROP (ip_vs_leave).
      
      - *ignored = -1:
            scheduler tried to schedule but fatal error occurred, eg.
            ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param
            failure such as missing Call-ID, ENOMEM on skb_linearize
            or pe_data. In this case we should return NF_DROP without
            any attempts to send ICMP with ip_vs_leave."
      
      More or less all ideas and input to this patch are the work of
      Julian Anastasov.
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
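The three cases summarized above map onto a small dispatch on the sign of `*ignored`. The sketch below is illustrative only: the enum and `verdict_for_unscheduled()` are hypothetical, and it models just the situation where scheduling produced no connection.

```c
/* Hypothetical outcomes for a packet that did not get a connection. */
enum sched_verdict {
    VERDICT_ACCEPT, /* packet must not be scheduled, let it pass (NF_ACCEPT) */
    VERDICT_LEAVE,  /* no destination: try bypass or ICMP, then drop */
    VERDICT_DROP    /* fatal error: drop without sending ICMP (NF_DROP) */
};

/* Translate the scheduler's 'ignored' indication into an action. */
static enum sched_verdict verdict_for_unscheduled(int ignored)
{
    if (ignored > 0)
        return VERDICT_ACCEPT;
    if (ignored < 0)
        return VERDICT_DROP;
    return VERDICT_LEAVE;
}
```

Keeping the fatal-error case separate from the no-destination case is what lets the protocol handlers avoid the ip_vs_leave path, and its ICMP reply, when the failure was for example an allocation error.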
  18. 21 Oct, 2010 2 commits
    • ipvs: provide address family for debugging · 0d79641a
      Julian Anastasov authored
      
      
       	As skb->protocol is not valid in LOCAL_OUT add
      parameter for address family in packet debugging functions.
      Even if ports are not present in AH and ESP change them to
      use ip_vs_tcpudp_debug_packet to show at least valid addresses
      as before. This patch removes the last user of skb->protocol
      in IPVS.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: do not schedule conns from real servers · 190ecd27
      Julian Anastasov authored
      
      
       	This patch is needed to avoid scheduling of
      packets from local real server when we add ip_vs_in
      in LOCAL_OUT hook to support local client.
      
       	Currently, when ip_vs_in can not find existing
      connection it tries to create new one by calling ip_vs_schedule.
      
       	The default indication from ip_vs_schedule was if
      connection was scheduled to real server. If real server is
      not available we try to use the bypass forwarding method
      or to send ICMP error. But in some cases we do not want to use
      the bypass feature. So, add flag 'ignored' to indicate if
      the scheduler ignores this packet.
      
       	Make sure we do not create new connections from replies.
      We can hit this problem for persistent services and local real
      server when ip_vs_in is added to LOCAL_OUT hook to handle
      local clients.
      
       	Also, make sure ip_vs_schedule ignores SYN packets
      for Active FTP DATA from local real server. The FTP DATA
      connection should be created on SYN+ACK from client to assign
      correct connection daddr.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  19. 05 Oct, 2010 1 commit
  20. 02 Aug, 2010 1 commit
  21. 09 Jul, 2010 1 commit
  22. 18 Feb, 2010 1 commit