- 28 Sep, 2012 3 commits
-
-
Jesper Dangaard Brouer authored
IPv6 packets can contain extension headers, thus its wrong to assume that the transport/upper-layer header, starts right after (struct ipv6hdr) the IPv6 header. IPVS uses this false assumption, and will write SNAT & DNAT modifications at a fixed pos which will corrupt the message. To fix this, proper header position must be found before modifying packets. Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr() to skip the exthdrs. It finds (1) the transport header offset, (2) the protocol, and (3) detects if the packet is a fragment. Note, that fragments in IPv6 is represented via an exthdr. Thus, this is detected while skipping through the exthdrs. This patch depends on commit 84018f55 : "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()" This also adds a dependency to ip6_tables. Originally based on patch from: Hans Schillstrom kABI notes: Changing struct ip_vs_iphdr is a potential minor kABI breaker, because external modules can be compiled with another version of this struct. This should not matter, as they would most-likely be using a compiled-in version of ip_vs_fill_iphdr(). When recompiled, they will notice ip_vs_fill_iphdr() no longer exists, and they have to used ip_vs_fill_iph_skb() instead. Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Jesper Dangaard Brouer authored
Extend handling of ICMPv6, to all none Informational Messages (via ICMPV6_INFOMSG_MASK). This actually only extend our handling to type ICMPV6_PARAMPROB (Parameter Problem), and future types. Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Jesper Dangaard Brouer authored
Have not converted the proc file output to compressed IPv6 addresses. Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
- 10 Sep, 2012 2 commits
-
-
Eric W. Biederman authored
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> Acked-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Using list_del_init() instead of list_del() + INIT_LIST_HEAD(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/ ) Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 30 Aug, 2012 4 commits
-
-
Julia Lawall authored
Initialize return variable before exiting on an error path. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/ ) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by:
Julia Lawall <Julia.Lawall@lip6.fr> Acked-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Patrick McHardy authored
For mangling IPv6 packets the protocol header offset needs to be known by the NAT packet mangling functions. Add a so far unused protoff argument and convert the conntrack and NAT helpers to use it in preparation of IPv6 NAT. Signed-off-by:
Patrick McHardy <kaber@trash.net>
-
Patrick McHardy authored
The IPv6 conntrack fragmentation currently has a couple of shortcomings. Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the defragmented packet is then passed to conntrack, the resulting conntrack information is attached to each original fragment and the fragments then continue their way through the stack. Helper invocation occurs in the POSTROUTING hook, at which point only the original fragments are available. The result of this is that fragmented packets are never passed to helpers. This patch improves the situation in the following way: - If a reassembled packet belongs to a connection that has a helper assigned, the reassembled packet is passed through the stack instead of the original fragments. - During defragmentation, the largest received fragment size is stored. On output, the packet is refragmented if required. If the largest received fragment size exceeds the outgoing MTU, a "packet too big" message is generated, thus behaving as if the original fragments were passed through the stack from an outside point of view. - The ipv6_helper() hook function can't receive fragments anymore for connections using a helper, so it is switched to use ipv6_skip_exthdr() instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the reassembled packets are passed to connection tracking helpers. The result of this is that we can properly track fragmented packets, but still generate ICMPv6 Packet too big messages if we would have before. This patch is also required as a precondition for IPv6 NAT, where NAT helpers might enlarge packets up to a point that they require fragmentation. In that case we can't generate Packet too big messages since the proper MTU can't be calculated in all cases (f.i. when changing textual representation of a variable amount of addresses), so the packet is transparently fragmented iff the original packet or fragments would have fit the outgoing MTU. IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>. Signed-off-by:
Patrick McHardy <kaber@trash.net>
-
Jesper Dangaard Brouer authored
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using a common helper function __mtu_check_toobig_v6(). The MTU check for tunnel mode can also use this helper as ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to skb->len. And the 'mtu' variable have been adjusted before calling helper. Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6() were missing a check for skb_is_gso(). This bug e.g. caused issues for KVM IPVS setups, where different Segmentation Offloading techniques are utilized, between guests, via the virtio driver. This resulted in very bad performance, due to the ICMPv6 "too big" messages didn't affect the sender. Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 16 Aug, 2012 1 commit
-
-
Mathias Krause authored
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is not set, __ip_vs_get_timeouts() does not fully initialize the structure that gets copied to userland and that for leaks up to 12 bytes of kernel stack. Add an explicit memset(0) before passing the structure to __ip_vs_get_timeouts() to avoid the info leak. Signed-off-by:
Mathias Krause <minipli@googlemail.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 10 Aug, 2012 5 commits
-
-
Julian Anastasov authored
Disabling PMTU discovery can increase the output packet rate but some users have enough resources and prefer to fragment than to drop traffic. By default, we copy the DF bit but if pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
IPVS is missing the logic to update PMTU in routing for its IPIP packets. We monitor the dst_mtu and can return FRAG_NEEDED messages but if the tunneled packets get ICMP error we can not rely on other traffic to save the lowest MTU. The following patch adds ICMP handling for IPIP packets in incoming direction, from some remote host to our local IP used as saddr in the outer header. By this way we can forward any related ICMP traffic if it is for IPVS TUN connection. For the special case of PMTUD we update the routing and if client requested DF we can forward the error. To properly update the routing we have to bind the cached route (dest->dst_cache) to the selected saddr because ipv4_update_pmtu uses saddr for dst lookup. Add IP_VS_RT_MODE_CONNECT flag to force such binding with second route. Update ip_vs_tunnel_xmit to provide IP_VS_RT_MODE_CONNECT and change the code to copy DF. For now we prefer not to force PMTU discovery (outer DF=1) because we don't have configuration option to enable or disable PMTUD. As we do not keep any packets to resend, we prefer not to play games with packets without DF bit because the sender is not informed when they are rejected. Also, change ops->update_pmtu to be called only for local clients because there is no point to update MTU for input routes, in our case skb->dst->dev is lo. It seems the code is copied from ipip.c where the skb dst points to tunnel device. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Claudiu Ghioc authored
Removed the following sparse warnings, wether CONFIG_SYSCTL is defined or not: * warning: symbol 'ip_vs_control_net_init_sysctl' was not declared. Should it be static? * warning: symbol 'ip_vs_control_net_cleanup_sysctl' was not declared. Should it be static? Signed-off-by:
Claudiu Ghioc <claudiu.ghioc@gmail.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Get rid of the ftp_app pointer and allow applications to be registered without adding fields in the netns_ipvs structure. v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
The FTP application indirectly depends on the nf_conntrack_ftp helper for proper NAT support. If the module is not loaded, IPVS can resize the packets for the command connection, eg. PASV response but the SEQ adjustment logic in ipv4_confirm is not called without helper. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
- 17 Jul, 2012 2 commits
-
-
David S. Miller authored
This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Julian Anastasov authored
After commit 39f618b4 (3.4) "ipvs: reset ipvs pointer in netns" we can oops in ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup is called after the ipvs_core_ops subsys is unregistered and net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event if ipvs is NULL. It is safe because all services and dests for the net are already freed. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 25 Jun, 2012 1 commit
-
-
Eric Dumazet authored
After call to ip6_route_output() we must release dst or we leak it. Also should test dst->error, as ip6_route_output() never returns NULL. Use boolean while we are at it. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 07 Jun, 2012 1 commit
-
-
Alban Crequy authored
This patch is a cleanup. Use NFPROTO_* for consistency with other netfilter code. Signed-off-by:
Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by:
Javier Martinez Canillas <javier.martinez@collabora.co.uk> Reviewed-by:
Vincent Sanders <vincent.sanders@collabora.co.uk> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 04 Jun, 2012 1 commit
-
-
Eric Dumazet authored
Remove some dropwatch/drop_monitor false positives. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 08 May, 2012 17 commits
-
-
H Hartley Sweeten authored
Functions not referenced outside of a source file should be marked static to prevent it from being exposed globally. This quiets the sparse warnings: warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static? Signed-off-by:
H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
H Hartley Sweeten authored
Functions not referenced outside of a source file should be marked static to prevent it from being exposed globally. This quiets the sparse warnings: warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static? Signed-off-by:
H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Pablo Neira Ayuso authored
cp->flags is marked volatile but ip_vs_bind_dest can safely modify the flags, so save some CPU cycles by using temp variable. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Pablo Neira Ayuso authored
Allow master and backup servers to use many threads for sync traffic. Add sysctl var "sync_ports" to define the number of threads. Every thread will use single UDP port, thread 0 will use the default port 8848 while last thread will use port 8848+sync_ports-1. The sync traffic for connections is scheduled to many master threads based on the cp address but one connection is always assigned to same thread to avoid reordering of the sync messages. Remove ip_vs_sync_switch_mode because this check for sync mode change is still risky. Instead, check for mode change under sync_buff_lock. Make sure the backup socks do not block on reading. Special thanks to Aleksey Chudov for helping in all tests. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Tested-by:
Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Add two new sysctl vars to control the sync rate with the main idea to reduce the rate for connection templates because currently it depends on the packet rate for controlled connections. This mechanism should be useful also for normal connections with high traffic. sync_refresh_period: in seconds, difference in reported connection timer that triggers new sync message. It can be used to avoid sync messages for the specified period (or half of the connection timeout if it is lower) if connection state is not changed from last sync. sync_retries: integer, 0..3, defines sync retries with period of sync_refresh_period/8. Useful to protect against loss of sync messages. Allow sysctl_sync_threshold to be used with sysctl_sync_period=0, so that only single sync message is sent if sync_refresh_period is also 0. Add new field "sync_endtime" in connection structure to hold the reported time when connection expires. The 2 lowest bits will represent the retry count. As the sysctl_sync_period now can be 0 use ACCESS_ONCE to avoid division by zero. Special thanks to Aleksey Chudov for being patient with me, for his extensive reports and helping in all tests. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Tested-by:
Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Pablo Neira Ayuso authored
High rate of sync messages in master can lead to overflowing the socket buffer and dropping the messages. Fixed sleep of 1 second without wakeup events is not suitable for loaded masters, Use delayed_work to schedule sending for queued messages and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will reduce the rate of wakeups but to avoid sending long bursts we wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages. Add hard limit for the queued messages before sending by using "sync_qlen_max" sysctl var. It defaults to 1/32 of the memory pages but actually represents number of messages. It will protect us from allocating large parts of memory when the sending rate is lower than the queuing rate. As suggested by Pablo, add new sysctl var "sync_sock_size" to configure the SNDBUF (master) or RCVBUF (slave) socket limit. Default value is 0 (preserve system defaults). Change the master thread to detect and block on SNDBUF overflow, so that we do not drop messages when the socket limit is low but the sync_qlen_max limit is not reached. On ENOBUFS or other errors just drop the messages. Change master thread to enter TASK_INTERRUPTIBLE state early, so that we do not miss wakeups due to messages or kthread_should_stop event. Thanks to Pablo Neira Ayuso for his valuable feedback! Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
As the goal is to mirror the inactconns/activeconns counters in the backup server, make sure the cp->flags are updated even if cp is still not bound to dest. If cp->flags are not updated ip_vs_bind_dest will rely only on the initial flags when updating the counters. To avoid mistakes and complicated checks for protocol state rely only on the IP_VS_CONN_F_INACTIVE bit when updating the counters. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Tested-by:
Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Initially, when the synced connection is created we use the forwarding method provided by master but once we bind to destination it can be changed. As result, we must update the application and the transmitter. As ip_vs_try_bind_dest is called always for connections that require dest binding, there is no need to validate the cp and dest pointers. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
As the IP_VS_CONN_F_INACTIVE bit is properly set in cp->flags for all kind of connections we do not need to add special checks for synced connections when updating the activeconns/inactconns counters for first time. Now logic will look just like in ip_vs_unbind_dest. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
As IP_VS_CONN_F_NOOUTPUT is derived from the forwarding method we should get it from conn_flags just like we do it for IP_VS_CONN_F_FWD_MASK bits when binding to real server. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Sasha Levin authored
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol. This is safe since it will always run from a process context. Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Julian Anastasov authored
Schedulers are initialized and bound to services only on commands. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Schedulers are initialized and bound to services only on commands. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Schedulers are initialized and bound to services only on commands. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Schedulers are initialized and bound to services only on commands. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
Schedulers are initialized and bound to services only on commands. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Julian Anastasov authored
They are called only on initialization. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
- 30 Apr, 2012 3 commits
-
-
Hans Schillstrom authored
Change order of init so netns init is ready when register ioctl and netlink. Ver2 Whitespace fixes and __init added. Reported-by:
"Ryan O'Hara" <rohara@redhat.com> Signed-off-by:
Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Hans Schillstrom authored
ip_vs_create_timeout_table() can return NULL All functions protocol init_netns is affected of this patch. Signed-off-by:
Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-
Hans Schillstrom authored
Avoid crash when registering shedulers after the IPVS core initialization for netns fails. Do this by checking for present core (net->ipvs). Signed-off-by:
Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au>
-