1. 22 Sep, 2016 1 commit
  2. 11 Apr, 2016 1 commit
    • sctp: avoid refreshing heartbeat timer too often · ba6f5e33
      Marcelo Ricardo Leitner authored
      
      
      Currently on high rate SCTP streams the heartbeat timer refresh can
      consume quite a lot of resources as timer updates are costly and it
      contains a random factor, which a) is also costly and b) invalidates
      mod_timer() optimization for not editing a timer to the same value.
      It may even cause the timer to be slightly advanced, for no good reason.
      
      As suggested by David Laight this patch now removes this timer update
      from hot path by leaving the timer on and re-evaluating upon its
      expiration if the heartbeat is still needed or not, similarly to what is
      done for TCP. If it's not needed anymore the timer is re-scheduled to
      the new timeout, considering the time already elapsed.
      
      For this, we now record the last tx timestamp per transport, updated in
      the same spots as hb timer was restarted on tx. Also split up
      sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
      sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
      heartbeat one.
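
The lazy re-evaluation described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: the struct, its field names, and both functions are hypothetical, and timestamps are plain tick counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transport state; field names are illustrative only. */
struct hb_state {
	uint64_t last_tx;    /* timestamp of the last transmission (ticks) */
	uint64_t hb_timeout; /* idle time after which a HEARTBEAT is due */
};

/* Hot path: record the transmission time instead of touching the timer. */
void hb_note_tx(struct hb_state *t, uint64_t now)
{
	t->last_tx = now;
}

/*
 * Timer expiry: decide whether a HEARTBEAT is still needed.
 * Returns true if one should be sent now. *next_expiry is set to the
 * absolute time at which the timer should fire next.
 */
bool hb_timer_expired(const struct hb_state *t, uint64_t now,
		      uint64_t *next_expiry)
{
	assert(now >= t->last_tx);

	uint64_t idle = now - t->last_tx;

	if (idle >= t->hb_timeout) {
		/* The path was idle for a full period: send a heartbeat. */
		*next_expiry = now + t->hb_timeout;
		return true;
	}
	/* Data was sent meanwhile: re-arm for the remaining time only. */
	*next_expiry = t->last_tx + t->hb_timeout;
	return false;
}
```

The transmit path then costs a single store, and the timer is only modified once per expiry rather than once per packet.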
      
      On loopback with an MTU of 65535 and data chunks of 1636 bytes, so that we
      have a considerable number of chunks without stressing system calls,
      netperf -t SCTP_STREAM -l 30, perf looked like this before:
      
      Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
        Overhead  Command  Shared Object      Symbol
      +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
         - _raw_write_unlock_irqrestore
            - 96,54% _raw_spin_unlock_irqrestore
               - 36,14% mod_timer
                  + 97,24% sctp_transport_reset_timers
                  + 2,76% sctp_do_sm
               + 33,65% __wake_up_sync_key
               + 28,77% sctp_ulpq_tail_event
               + 1,40% del_timer
            - 1,84% mod_timer
               + 99,03% sctp_transport_reset_timers
               + 0,97% sctp_do_sm
            + 1,50% sctp_ulpq_tail_event
      
      And after this patch, now with netperf -l 60:
      
      Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
        Overhead  Command  Shared Object      Symbol
      +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
      +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            + 49,89% __wake_up_sync_key
            + 45,68% sctp_ulpq_tail_event
            - 2,85% mod_timer
               + 76,51% sctp_transport_reset_t3_rtx
               + 23,49% sctp_do_sm
            + 1,55% del_timer
      +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
      +    2,26%  netperf  [sctp]             [k] sctp_sendmsg
      
      Throughput-wise, this went from 6800 Mbps without the patch to 7050 Mbps
      with it, an improvement of ~3.7%.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Mar, 2016 1 commit
    • sctp: align MTU to a word · 3822a5ff
      Marcelo Ricardo Leitner authored
      
      
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      The same applies to ICMP frag needed messages, which are also fixed by
      this patch.
      
      Another effect of this misalignment is the inability to send an MTU-sized
      packet without queueing or fragmentation and without hitting Nagle.
      Consider the check performed in sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      With the above example MTU, if there are no other messages queued,
      one cannot send a message that exactly fills one packet (65532 bytes)
      without causing DATA chunk fragmentation or a delay.
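
The word truncation can be illustrated with a small userspace sketch. The macro definition below is an assumption (rounding down by clearing the two low bits), and `aligned_pmtu` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: clear the two low bits to round down to 4 bytes. */
#define WORD_TRUNC(s) ((s) & ~3u)

/* Usable PMTU for SCTP: the reported MTU rounded down to a word boundary. */
uint32_t aligned_pmtu(uint32_t mtu)
{
	uint32_t pmtu = WORD_TRUNC(mtu);

	assert(pmtu % 4 == 0 && pmtu <= mtu);
	return pmtu;
}
```

With the loopback value from the text, `aligned_pmtu(65535)` gives 65532, the packet size mentioned above; an already aligned MTU such as 1500 is returned unchanged.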
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 14 Mar, 2016 1 commit
    • sctp: fix the transports round robin issue when init is retransmitted · 39d2adeb
      Xin Long authored
      Prior to this patch, if we have two paths in one assoc at the beginning,
      they may have the same params other than last_time_heard, and the paths
      are tried like this:
      
      1st cycle:
        try trans1, fail.
        then trans2 is selected (because its last_time_heard is after trans1's).
      
      2nd cycle:
        try trans2, fail.
        then trans2 is selected (because its last_time_heard is after trans1's).
      
      3rd cycle:
        try trans2, fail.
        then trans2 is selected (because its last_time_heard is after trans1's).
      
      ....
      
      trans1 will never have a chance to be selected, which is not what we expect.
      We should keep round-robining through all the paths if they were just added
      at the beginning.
      
      So first, every transport's last_time_heard should be initialized to 0, so
      that they all have the same value at the beginning. Only then do all the
      transports get an equal chance to be selected.
      
      Then sctp_trans_elect_best should return trans_next when
      *trans == *trans_next, so that we can try the next one if it fails, but
      currently it always returns trans. We can fix this by exchanging these two
      params when we call sctp_trans_elect_tie().
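
The effect of swapping the two arguments can be sketched as follows. This is a simplified model under stated assumptions: `struct transport`, `elect_tie` and `elect_next` are hypothetical stand-ins, and the tie-break order (error count, then last time heard) is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical transport: only the fields the tie-break uses. */
struct transport {
	int error_count;
	uint64_t last_time_heard; /* 0 means "never heard from" */
};

/* Prefer fewer errors, then the more recently heard path; on a full
 * tie the first argument wins.
 */
struct transport *elect_tie(struct transport *t1, struct transport *t2)
{
	assert(t1 != NULL && t2 != NULL);

	if (t1->error_count > t2->error_count)
		return t2;
	if (t1->error_count == t2->error_count &&
	    t2->last_time_heard > t1->last_time_heard)
		return t2;
	return t1;
}

/* Pass the candidate first so that a full tie moves on to the next path. */
struct transport *elect_next(struct transport *curr,
			     struct transport *candidate)
{
	return elect_tie(candidate, curr);
}
```

With both transports initialized to `last_time_heard == 0` and equal error counts, each failed attempt hands over to the other path, which gives the round-robin behaviour the message asks for.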
      
      Fixes: 4c47af4d ('net: sctp: rework multihoming retransmission path selection to rfc4960')
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 28 Jan, 2016 2 commits
  6. 09 Nov, 2015 1 commit
    • remove abs64() · 79211c8e
      Andrew Morton authored
      
      
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 01 Aug, 2014 1 commit
    • sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe authored
      
      
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a syscall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zeroed when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 03 Jul, 2014 1 commit
    • net: sctp: improve timer slack calculation for transport HBs · 8f61059a
      Daniel Borkmann authored
      
      
      RFC4960, section 8.3 says:
      
        On an idle destination address that is allowed to heartbeat,
        it is recommended that a HEARTBEAT chunk is sent once per RTO
        of that destination address plus the protocol parameter
        'HB.interval', with jittering of +/- 50% of the RTO value,
        and exponential backoff of the RTO if the previous HEARTBEAT
        is unanswered.
      
      Currently, we calculate jitter via sctp_jitter() function first,
      and then add its result to the current RTO for the new timeout:
      
        TMO = RTO + (RAND() % RTO) - (RTO / 2)
                    `------------------------^-=> sctp_jitter()
      
      Instead, we can just simplify all this by directly calculating:
      
        TMO = (RTO / 2) + (RAND() % RTO)
      
      With the help of prandom_u32_max(), we don't need to open code
      our own global PRNG, but can instead just make use of the per
      CPU implementation of prandom with better quality numbers. Also,
      we can now spare us the conditional for divide by zero check
      since no div or mod operation needs to be used. Note that
      prandom_u32_max() won't emit the same result as a mod operation,
      but we really don't care here as we only want to have a random
      number scaled into RTO interval.
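
The simplified formula can be sketched in userspace C. The helper names below are hypothetical, and the random value is passed in by the caller so the example stays deterministic; the multiply-and-shift scaling is used here as a division-free stand-in for the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a 32-bit random value into [0, range) with a multiply and shift,
 * so no div or mod operation is needed.
 */
uint32_t scale_u32(uint32_t rnd, uint32_t range)
{
	return (uint32_t)(((uint64_t)rnd * range) >> 32);
}

/* TMO = (RTO / 2) + random value in [0, RTO), i.e. RTO with +/- 50% jitter. */
uint64_t hb_timeout(uint32_t rto, uint32_t rnd)
{
	uint64_t tmo = (uint64_t)rto / 2 + scale_u32(rnd, rto);

	assert(tmo >= rto / 2 && tmo <= (uint64_t)rto / 2 + rto);
	return tmo;
}
```

An RTO of 0 needs no special case: the scaled value is 0 and so is the timeout, which is why the divide-by-zero conditional can go away.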
      
      Note, exponential RTO backoff is handled elsewhere, namely in
      sctp_do_8_2_transport_strike().
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 11 Jun, 2014 1 commit
  10. 13 Feb, 2014 1 commit
  11. 06 Dec, 2013 2 commits
  12. 13 Aug, 2013 1 commit
  13. 09 Aug, 2013 1 commit
  14. 25 Jul, 2013 1 commit
  15. 02 Jul, 2013 1 commit
    • net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann authored
      We should get rid of all of SCTP's own debug printk macros and use the ones
      that the kernel offers instead. This makes the code more readable and
      consistent with the rest of the kernel, and offers all the features of dynamic
      debugging that pr_debug() et al. have, such as turning portions of debug
      messages on or off at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completely turned off, then these
      statements will also compile into "empty" functions.
      
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs or, where appropriate, BUG_ONs so
      that those bugs can be more easily detected, as probably not many people
      have SCTP debugging permanently turned on.
      
      To turn on all SCTP debugging, the following steps are needed:
      
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
      
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 18 Jun, 2013 1 commit
  17. 17 Apr, 2013 1 commit
  18. 04 Feb, 2013 2 commits
    • net: remove redundant check for timer pending state before del_timer · 25cc4ae9
      Ying Xue authored
      
      
      Since del_timer() already calls timer_pending() to check whether the
      timer to be deleted is pending, it is unnecessary to check the timer's
      pending state again before calling del_timer().
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: sctp_close: fix release of bindings for deferred call_rcu's · 8c98653f
      Daniel Borkmann authored
      
      
      It seems due to RCU usage, i.e. within SCTP's address binding list,
      a, say, ``behavioral change'' was introduced which does actually
      not conform to the RFC anymore. In particular consider the following
      (fictional) scenario to demonstrate this:
      
        do:
          Two SOCK_SEQPACKET-style sockets are opened (S1, S2)
          S1 is bound to 127.0.0.1, port 1024 [server]
          S2 is bound to 127.0.0.1, port 1025 [client]
          listen(2) is invoked on S1
          From S2 we call one sendmsg(2) with msg.msg_name and
             msg.msg_namelen parameters set to the server's
             address
          S1, S2 are closed
          goto do
      
      The first pass of this loop succeeds, while the second round
      fails during binding of S1 (address still in use). What is happening?
      In the first round, the initial handshake is done and, at the
      time close(2) is called on S1, a non-graceful shutdown is performed via
      ABORT since an unprocessed packet is present in S1's receive queue,
      thus signaling an error condition. This can be considered correct
      behavior.
      
      During close also all bound addresses are freed, thus nothing *must*
      be active anymore. In reference to RFC2960:
      
        After checking the Verification Tag, the receiving endpoint shall
        remove the association from its record, and shall report the
        termination to its upper layer. (9.1 Abort of an Association)
      
      Also, no half-open states are supported, thus after an ungraceful
      shutdown, we leave nothing behind. However, this seems not to be
      happening though. In a real-world scenario, this is exactly where
      it breaks the lksctp-tools functional test suite, *for instance*:
      
        ./test_sockopt
        test_sockopt.c  1 PASS : getsockopt(SCTP_STATUS) on a socket with no assoc
        test_sockopt.c  2 PASS : getsockopt(SCTP_STATUS)
        test_sockopt.c  3 PASS : getsockopt(SCTP_STATUS) with invalid associd
        test_sockopt.c  4 PASS : getsockopt(SCTP_STATUS) with NULL associd
        test_sockopt.c  5 BROK : bind: Address already in use
      
      The underlying problem is that sctp_endpoint_destroy() hasn't been
      triggered yet while the next bind attempt is being done. It will be
      triggered eventually (but too late) by sctp_transport_destroy_rcu()
      after one RCU grace period:
      
        sctp_transport_destroy()
          sctp_transport_destroy_rcu() ----.
            sctp_association_put() [*]  <--+--> sctp_packet_free()
              sctp_association_destroy()          [...]
                sctp_endpoint_put()                 skb->destructor
                  sctp_endpoint_destroy()             sctp_wfree()
                    sctp_bind_addr_free()               sctp_association_put() [*]
      
      Thus, we move out the condition with sctp_association_put() as well as
      the sctp_packet_free() invocation and the issue can be solved. We also
      better free the SCTP chunks first before putting the ref of the association.
      
      With this patch, the example above (which simulates a scenario similar
      to the implementation of this test case) and therefore also the test
      suite now run through successfully. Tested by myself.
      
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 07 Dec, 2012 1 commit
  20. 03 Dec, 2012 1 commit
    • sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call · 196d6759
      Michele Baldessari authored
      
      
      The current SCTP stack is lacking a mechanism to have per association
      statistics. This is an implementation modeled after OpenSolaris'
      SCTP_GET_ASSOC_STATS.
      
      Userspace part will follow on lksctp if/when there is a general ACK on
      this.
      V4:
      - Move ipackets++ before q->immediate.func() for consistency reasons
      - Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
        returning bogus RTO values
      - return asoc->rto_min when max_obs_rto value has not changed
      
      V3:
      - Increase ictrlchunks in sctp_assoc_bh_rcv() as well
      - Move ipackets++ to sctp_inq_push()
      - return 0 when no rto updates took place since the last call
      
      V2:
      - Implement partial retrieval of stat struct to cope for future expansion
      - Kill the rtxpackets counter as it cannot be precise anyway
      - Rename outseqtsns to outofseqtsns to make it clearer that these are out
        of sequence unexpected TSNs
      - Move asoc->ipackets++ under a lock to avoid potential miscounts
      - Fold asoc->opackets++ into the already existing asoc check
      - Kill unneeded (q->asoc) test when increasing rtxchunks
      - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
      - Don't count SHUTDOWNs as SACKs
      - Move SCTP_GET_ASSOC_STATS to the private space API
      - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
        future struct growth
      - Move association statistics in their own struct
      - Update idupchunks when we send a SACK with dup TSNs
      - return min_rto in max_rto when RTO has not changed. Also return the
        transport when max_rto last changed.
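
The partial retrieval mentioned in the V2 notes can be sketched like this. It is a userspace illustration under assumptions: the struct layout and `copy_stats` are hypothetical, and the only idea shown is clamping the copy length so older, smaller user buffers keep working when the struct grows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statistics layout; later versions may append more fields. */
struct assoc_stats {
	unsigned int assoc_id;
	unsigned long long ipackets;
	unsigned long long opackets;
};

/*
 * Copy as much of the statistics as the caller's buffer can hold.
 * Returns the number of bytes written, or -1 if the buffer cannot
 * even hold the leading association id.
 */
long copy_stats(void *buf, size_t buf_len, const struct assoc_stats *st)
{
	assert(buf != NULL && st != NULL);

	if (buf_len < sizeof(st->assoc_id))
		return -1;
	if (buf_len > sizeof(*st))
		buf_len = sizeof(*st);
	memcpy(buf, st, buf_len);
	return (long)buf_len;
}
```

A caller built against an older, shorter struct receives just the prefix it knows about, while a larger buffer is never overrun.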
      
      Signed-off: Michele Baldessari <michele@acksyn.org>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 28 Nov, 2012 1 commit
  22. 15 Aug, 2012 2 commits
  23. 22 Jul, 2012 1 commit
    • sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Neil Horman authored
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While it's possible to
      implement a faster failover on multihomed sctp associations this way, it's not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      
      Instead, let's implement the new ietf quick failover draft found here:
      http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      
      
      This will let the sctp stack identify transports that have had a small number of
      errors, and quickly stop using them until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe it's in compliance with the above draft and
      works well.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 20 Jul, 2012 1 commit
  25. 17 Jul, 2012 1 commit
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller authored
      
      
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 16 Jul, 2012 1 commit
  27. 01 Jul, 2012 1 commit
    • sctp: be more restrictive in transport selection on bundled sacks · 4244854d
      Neil Horman authored
      
      
      It was noticed recently that when we send data on a transport, it's possible that
      we might bundle a sack that arrived on a different transport.  While this isn't
      a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
      2960:
      
       An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
         etc.) to the same destination transport address from which it
         received the DATA or control chunk to which it is replying.  This
         rule should also be followed if the endpoint is bundling DATA chunks
         together with the reply chunk.
      
      This patch seeks to correct that.  It restricts the bundling of sack operations
      to only those transports which have moved the ctsn of the association forward
      since the last sack.  By doing this we guarantee that we only bundle outbound
      sacks on a transport that has received a chunk since the last sack.  This brings
      us into stricter compliance with the RFC.
      
      Vlad had initially suggested that we strictly allow only sack bundling on the
      transport that last moved the ctsn forward.  While this makes sense, I was
      concerned that doing so prevented us from bundling in the case where we had
      received chunks that moved the ctsn on multiple transports.  In those cases, the
      RFC allows us to select any of the transports having received chunks to bundle
      the sack on.  So I've modified the approach to allow for that, by adding a state
      variable to each transport that tracks whether it has moved the ctsn since the
      last sack.  This I think keeps our behavior (and performance), close enough to
      our current profile that I think we can do this without a sysctl knob to
      enable/disable it.
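
The per-transport state variable can be sketched as a simple flag. This is an illustration only: `struct path`, the flag name, and the three helpers are hypothetical, and the actual representation of this state may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transport flag; the real field name may differ. */
struct path {
	bool moved_ctsn; /* advanced the cumulative TSN since the last SACK */
};

/* Receive path: a chunk arriving on p moved the association's ctsn forward. */
void note_ctsn_advance(struct path *p)
{
	p->moved_ctsn = true;
}

/* Send path: bundle a SACK with DATA only on a path that received data. */
bool may_bundle_sack(const struct path *p)
{
	return p->moved_ctsn;
}

/* After a SACK is sent, every path starts a fresh tracking period. */
void sack_sent(struct path *paths, size_t n)
{
	assert(paths != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		paths[i].moved_ctsn = false;
}
```

Because several paths can have the flag set at once, a SACK may be bundled on any of them, which is the relaxation over the "last transport only" rule discussed above.
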
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yaseivch <vyasevich@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Reported-by: Michele Baldessari <michele@redhat.com>
      Reported-by: sorin serban <sserban@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 11 May, 2012 1 commit
  29. 08 Nov, 2011 1 commit
  30. 08 May, 2011 1 commit
  31. 27 Apr, 2011 3 commits
  32. 26 Aug, 2010 1 commit
  33. 16 May, 2010 1 commit
  34. 06 May, 2010 1 commit
    • sctp: Fix a race between ICMP protocol unreachable and connect() · 50b5d6ad
      Vlad Yasevich authored
      
      
      ICMP protocol unreachable handling completely disregarded
      the fact that the user may have locked the socket.  It proceeded
      to destroy the association, even though the user may have
      held the lock and had a ref on the association.  This resulted
      in the following:
      
      Attempt to release alive inet socket f6afcc00
      
      =========================
      [ BUG: held lock freed! ]
      -------------------------
      somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
      there!
       (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      1 lock held by somenu/2672:
       #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      
      stack backtrace:
      Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
      Call Trace:
       [<c1232266>] ? printk+0xf/0x11
       [<c1038553>] debug_check_no_locks_freed+0xce/0xff
       [<c10620b4>] kmem_cache_free+0x21/0x66
       [<c1185f25>] __sk_free+0x9d/0xab
       [<c1185f9c>] sk_free+0x1c/0x1e
       [<c1216e38>] sctp_association_put+0x32/0x89
       [<c1220865>] __sctp_connect+0x36d/0x3f4
       [<c122098a>] ? sctp_connect+0x13/0x4c
       [<c102d073>] ? autoremove_wake_function+0x0/0x33
       [<c12209a8>] sctp_connect+0x31/0x4c
       [<c11d1e80>] inet_dgram_connect+0x4b/0x55
       [<c11834fa>] sys_connect+0x54/0x71
       [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c11847ab>] sys_socketcall+0x6d/0x178
       [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
       [<c1002959>] syscall_call+0x7/0xb
      
      This was because sctp_wait_for_connect() would acquire the socket
      lock and then proceed to release the last reference on the
      association, thus causing the full destruction path to finish freeing
      the socket.
      
      The simplest solution is to start a very short timer in case the socket
      is owned by user.  When the timer expires, we can do some verification
      and be able to do the release properly.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>