1. 06 May, 2010 1 commit
    • sctp: Fix a race between ICMP protocol unreachable and connect() · 50b5d6ad
      Vlad Yasevich authored
      
      
      ICMP protocol unreachable handling completely disregarded
      the fact that the user may have locked the socket.  It proceeded
      to destroy the association, even though the user may have
      held the lock and had a ref on the association.  This resulted
      in the following:
      
      Attempt to release alive inet socket f6afcc00
      
      =========================
      [ BUG: held lock freed! ]
      -------------------------
      somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
      there!
       (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      1 lock held by somenu/2672:
       #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      
      stack backtrace:
      Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
      Call Trace:
       [<c1232266>] ? printk+0xf/0x11
       [<c1038553>] debug_check_no_locks_freed+0xce/0xff
       [<c10620b4>] kmem_cache_free+0x21/0x66
       [<c1185f25>] __sk_free+0x9d/0xab
       [<c1185f9c>] sk_free+0x1c/0x1e
       [<c1216e38>] sctp_association_put+0x32/0x89
       [<c1220865>] __sctp_connect+0x36d/0x3f4
       [<c122098a>] ? sctp_connect+0x13/0x4c
       [<c102d073>] ? autoremove_wake_function+0x0/0x33
       [<c12209a8>] sctp_connect+0x31/0x4c
       [<c11d1e80>] inet_dgram_connect+0x4b/0x55
       [<c11834fa>] sys_connect+0x54/0x71
       [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c11847ab>] sys_socketcall+0x6d/0x178
       [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
       [<c1002959>] syscall_call+0x7/0xb
      
      This was because sctp_wait_for_connect() would acquire the socket
      lock and then proceed to release the last reference count on the
      association, thus causing the full destruction path to finish
      freeing the socket.

      The simplest solution is to start a very short timer if the socket
      is owned by the user.  When the timer expires, we can do some
      verification and then do the release properly.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
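The deferral described in commit 50b5d6ad can be pictured with a small user-space model. This is a toy sketch in Python, not the kernel code: `Sock`, `Assoc`, `proto_unreachable` and `timer_expired` are hypothetical names, and a plain boolean stands in for the socket lock.

```python
class Sock:
    def __init__(self):
        # Models "the user currently holds the socket lock".
        self.owned_by_user = False


class Assoc:
    def __init__(self, sock):
        self.sock = sock
        self.dead = False
        self.timer_armed = False


def proto_unreachable(assoc):
    """Toy handler for an ICMP protocol unreachable message."""
    if assoc.sock.owned_by_user:
        # Do not tear anything down while the user holds the lock;
        # arm a short timer and retry from its callback instead.
        assoc.timer_armed = True
        return "deferred"
    assoc.dead = True
    return "destroyed"


def timer_expired(assoc):
    """Timer callback: verify ownership again before releasing."""
    if not assoc.timer_armed:
        return "idle"
    if assoc.sock.owned_by_user:
        return "rearmed"  # still locked, try again on the next expiry
    assoc.timer_armed = False
    assoc.dead = True
    return "destroyed"
```

In this model the association is only marked dead from a context where the lock is not held, which is the property the commit message asks for.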
  2. 28 Apr, 2010 1 commit
    • sctp: Fix oops when sending queued ASCONF chunks · c0786693
      Vlad Yasevich authored
      
      
      When we finish processing an ASCONF_ACK chunk, we try to send
      the next queued ASCONF.  This action runs the sctp state
      machine recursively, and it's not prepared to do so.
      
      kernel BUG at kernel/timer.c:790!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/module/ipv6/initstate
      Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
      uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
      floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]
      
      Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
      EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
      EIP is at add_timer+0xd/0x1b
      EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
      ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
      Stack:
       c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
      <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
      00000004
      <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
      000000d0
      Call Trace:
       [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
       [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
       [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
       [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
       [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
       [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
       [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
       [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
       [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
       [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
       [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
      Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
      Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.

      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
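The first rule the script applies (gfp.h if only gfp is used, slab.h if slab is used) can be illustrated roughly as follows. This is not the slabh-sweep.py script itself: the identifier patterns and the function name `needed_include` are assumptions made for the sketch, and the real script's matching rules are not given in the commit message.

```python
import re

# Hypothetical patterns for "slab usage" and "gfp usage".
SLAB_RE = re.compile(r"\b(kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\b")
GFP_RE = re.compile(r"\b(GFP_[A-Z_]+|gfp_t)\b")


def needed_include(source):
    """Return the header a C source text would need under the stated rule."""
    if SLAB_RE.search(source):
        # Slab users get slab.h even when they also use gfp flags.
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None
```

For example, `needed_include("p = kmalloc(4, GFP_KERNEL);")` selects `linux/slab.h`, while a file that only declares a `gfp_t` variable would get `linux/gfp.h`.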
  4. 09 Feb, 2010 1 commit
  5. 04 Dec, 2009 1 commit
  6. 29 Nov, 2009 1 commit
    • sctp: on T3_RTX retransmit all the in-flight chunks · 5fdd4bae
      Andrei Pelinescu-Onciul authored
      When retransmitting due to T3 timeout, retransmit all the
      in-flight chunks for the corresponding transport/path, including
      chunks sent less than 1 rto ago.
      This is the correct behaviour according to rfc4960 section 6.3.3
      E3 and
      "Note: Any DATA chunks that were sent to the address for which the
       T3-rtx timer expired but did not fit in one MTU (rule E3 above)
       should be marked for retransmission and sent as soon as cwnd
       allows (normally, when a SACK arrives). ".
      
      This fixes problems when more than one path is present and the T3
      retransmission of the first chunk that times out stops the T3 timer
      for the initial active path, leaving all the other in-flight
      chunks waiting forever or until a new chunk is transmitted on the
      same path and times out (and this will happen only if the cwnd
      allows sending new chunks, but since cwnd was dropped to MTU by
      the timeout, it will wait until the first heartbeat).

      Example: 10 packets in flight, sent at 0.1 s intervals on the
      primary path. The primary path is down and the first packet
      times out. The first packet is retransmitted on another path, the
      T3 timer for the primary path is stopped and cwnd is set to MTU.
      All the other 9 in-flight packets will not be retransmitted
      (unless more new packets are sent on the primary path, which depends
      on cwnd allowing it, and even in this case the 9 packets will be
      retransmitted only after a new packet times out, which even in the
      best case would take more than one RTO).
      
      This commit reverts d0ce9291 and also removes the now unused
      transport->last_rto, introduced in b6157d8e.

      P.S.  The problem is not limited to multiple paths.  It can also
      happen in a single-homed environment.  If the application stops
      sending data, it is possible to have a hung association.
      Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
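The behaviour commit 5fdd4bae describes, marking every in-flight chunk on the timed-out path regardless of when it was sent, might look like this in a toy model. The chunk dictionaries and the function name `mark_all_in_flight` are hypothetical and only mirror the 10-packet example from the commit message.

```python
def mark_all_in_flight(chunks, transport):
    """On T3-rtx expiry, mark every in-flight chunk sent on `transport`.

    The send time is deliberately not consulted: chunks sent less than
    one RTO ago are marked as well.
    """
    marked = []
    for chunk in chunks:
        if chunk["transport"] == transport and chunk["in_flight"]:
            chunk["needs_rtx"] = True
            marked.append(chunk["tsn"])
    return marked


# Ten chunks sent 0.1 s apart on the primary path.
chunks = [
    {"tsn": tsn, "transport": "primary", "in_flight": True,
     "sent_at": 0.1 * tsn, "needs_rtx": False}
    for tsn in range(10)
]
marked = mark_all_in_flight(chunks, "primary")
```

All ten chunks end up marked, so none of them is left waiting for a later timeout on a path whose T3 timer has been stopped.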
  7. 23 Nov, 2009 2 commits
  8. 04 Sep, 2009 3 commits
    • sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. · d4d6fb57
      Vlad Yasevich authored
      
      
      We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
      This results in a hung association if the remote only uses
      SHUTDOWNs (which it's allowed to do) to acknowledge DATA when
      closing.  The reason for that is that we simply honor the a_rwnd
      from the sack, but since we faked it to be 0, we enter 0-window
      probing.  The fix is to use the peer's old rwnd and add our flight
      size to it.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
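The arithmetic of the fix in d4d6fb57 is small enough to write out. The following sketch uses hypothetical function names and plain integers (bytes); it only contrasts the old value of 0 with the value the commit message describes.

```python
def faked_a_rwnd_old():
    """Previous behaviour: the faked SACK always advertised a zero window."""
    return 0


def faked_a_rwnd_fixed(peer_rwnd, flight_size):
    """Behaviour after the fix: peer's old rwnd plus our flight size."""
    return peer_rwnd + flight_size
```

With a peer rwnd of 3000 bytes and 1500 bytes in flight, the faked SACK would carry 4500 instead of 0, so the sender does not enter zero-window probing.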
    • sctp: Fix error count increments that were results of HEARTBEATS · b9f84786
      Vlad Yasevich authored
      
      
      SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
      errors against a given transport or endpoint.  As such, we
      should increment the error counts only for unacknowledged
      HBs, otherwise we detect failure too soon.  This goes for both
      the overall error count and the path error count.
      
      Now, there is a difference in how the detection is done
      between the two.  The path error detection is done after
      the increment, so to detect it properly, we actually need
      to exceed the path threshold.  The overall error detection
      is done _BEFORE_ the increment.  Thus to detect the failure,
      it's enough for the error count to match the threshold.
      This is why all the state functions use '>=' to detect failure,
      while path detection uses '>'.
      
      Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
      proposed patches to fix this issue and made me re-read the spec
      and the code to figure out how this cruft really works.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
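The '>' versus '>=' distinction in b9f84786 is easier to see by counting events. The sketch below is a toy model with hypothetical function names: it counts how many unacknowledged events it takes to declare failure when the check runs after the increment (path) versus before it (overall).

```python
def events_until_path_failure(threshold):
    """Path check: increment first, then test with '>'."""
    count = 0
    events = 0
    while True:
        events += 1
        count += 1
        if count > threshold:
            return events


def events_until_assoc_failure(threshold):
    """Overall check: test with '>=' first, then increment."""
    count = 0
    events = 0
    while True:
        events += 1
        if count >= threshold:
            return events
        count += 1
```

In this model both loops report failure on the same event for a given threshold, which is why the two comparisons can differ without the two detectors disagreeing.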
    • sctp: Send user messages to the lower layer as one · 9c5c62be
      Vlad Yasevich authored
      
      
      Currently, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a single packet (e.g. due
      to the user setting a low fragmentation threshold).

      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down to the state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue,
      thus causing the chunks to get transmitted.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
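The cork, queue, un-cork sequence from 9c5c62be can be modelled with a small queue that bundles pending chunks into packets when flushed. This is a simplified sketch: `OutQueue`, `send_msg` and the greedy bundling by size are assumptions for illustration, not the kernel's outqueue logic.

```python
class OutQueue:
    def __init__(self, mtu):
        self.mtu = mtu
        self.corked = False
        self.pending = []   # chunk sizes waiting to be sent
        self.packets = []   # each packet is a list of chunk sizes

    def cork(self):
        self.corked = True

    def uncork(self):
        self.corked = False
        self.flush()

    def add(self, chunk_len):
        self.pending.append(chunk_len)
        if not self.corked:
            self.flush()    # uncorked: every add goes out on its own

    def flush(self):
        packet = []
        for size in self.pending:
            if packet and sum(packet) + size > self.mtu:
                self.packets.append(packet)
                packet = []
            packet.append(size)
        if packet:
            self.packets.append(packet)
        self.pending = []


def send_msg(queue, fragments):
    """Hand a whole message down: cork, queue all fragments, un-cork."""
    queue.cork()
    for fragment in fragments:
        queue.add(fragment)
    queue.uncork()
```

Adding three 400-byte fragments one at a time to an uncorked queue with a 1500-byte MTU produces three packets, while `send_msg` with the same fragments produces a single bundled packet.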
  9. 03 Jun, 2009 1 commit
    • sctp: fix to choose alternate destination when retransmit ASCONF chunk · 9919b455
      Wei Yongjun authored
      
      
      RFC 5061 Section 5.1 ASCONF Chunk Procedures says:
      
      B4)  Re-transmit the ASCONF Chunk last sent and if possible choose an
           alternate destination address (please refer to [RFC4960],
           Section 6.4.1).  An endpoint MUST NOT add new parameters to this
           chunk; it MUST be the same (including its Sequence Number) as
           the last ASCONF sent.  An endpoint MAY, however, bundle an
           additional ASCONF with new ASCONF parameters with the next
           Sequence Number.  For details, see Section 5.5.
      
      This patch chooses an alternate destination address when
      retransmitting the ASCONF chunk, and cleans up some duplicated code.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
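Rule B4's "if possible choose an alternate destination address" can be sketched as a simple selection over known transports. The data layout and the name `pick_asconf_rtx_dest` are hypothetical; the actual selection policy in the kernel (and in RFC 4960 Section 6.4.1) is more involved.

```python
def pick_asconf_rtx_dest(transports, last_used):
    """Prefer an active address other than the one used last time.

    Falls back to `last_used` when no alternate is available, since the
    rule only asks for an alternate "if possible".
    """
    for transport in transports:
        if transport["active"] and transport["addr"] != last_used:
            return transport["addr"]
    return last_used
```

With two active addresses the retransmission moves to the other one; with a single address, or when all alternates are inactive, it stays on the original destination.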
  10. 03 Mar, 2009 2 commits
  11. 16 Feb, 2009 1 commit
    • sctp: Fix the RTO-doubling on idle-link heartbeats · faee47cd
      Vlad Yasevich authored
      
      
      SCTP incorrectly doubles rto every time a Heartbeat chunk
      is generated.  However, RFC 4960 states:
      
         On an idle destination address that is allowed to heartbeat, it is
         recommended that a HEARTBEAT chunk is sent once per RTO of that
         destination address plus the protocol parameter 'HB.interval', with
         jittering of +/- 50% of the RTO value, and exponential backoff of the
         RTO if the previous HEARTBEAT is unanswered.
      
      Essentially, we double the RTO only if the heartbeat is unacknowledged.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
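The corrected backoff rule from faee47cd, as a minimal sketch. The function name and the `rto_max` cap are assumptions for the example (RFC 4960 defines an RTO.Max parameter, but the commit message does not mention it).

```python
def rto_for_next_heartbeat(rto, prev_hb_acked, rto_max=60.0):
    """Back off the RTO only when the previous HEARTBEAT went unanswered."""
    if prev_hb_acked:
        return rto                  # idle link, heartbeat answered: no change
    return min(rto * 2, rto_max)    # exponential backoff, capped
```

On an idle link where every heartbeat is acknowledged the RTO stays constant in this model, instead of doubling each time a heartbeat is generated.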
  12. 08 Oct, 2008 1 commit
    • sctp: Rework the tsn map to use generic bitmap. · 8e1ee18c
      Vlad Yasevich authored
      
      
      The tsn map currently in use is 4K large and is stuck inside
      the sctp_association structure making memory references REALLY
      expensive.  What we really need is at most 4K worth of bits
      so the biggest map we would have is 512 bytes.   Also, the
      map is only really useful when we have gaps to store and
      report.  As such, starting with a minimal map of say 32 TSNs (bits)
      should be enough for normal low-loss operations.  We can grow
      the map by some multiple of 32 along with some extra room any
      time we receive a TSN which would put us outside of the map
      boundary.  As we close gaps, we can shift the map to rebase
      it on the latest TSN we've seen.  This saves 4088 bytes per
      association just in the map alone, along with savings from the now
      unnecessary structure members.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
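The grow-and-shift scheme described in 8e1ee18c can be approximated with a list of booleans. This is a toy model: the class name `TsnMap`, the exact growth formula, and the absence of 32-bit TSN wrap-around handling are all simplifications assumed for the sketch.

```python
class TsnMap:
    """Toy TSN bitmap; ignores TSN wrap-around for simplicity."""

    CHUNK = 32  # the map grows in multiples of 32 bits

    def __init__(self, next_tsn):
        self.base = next_tsn              # lowest TSN not yet received
        self.bits = [False] * self.CHUNK  # bits[i] tracks TSN base + i

    def mark(self, tsn):
        if tsn < self.base:
            return                        # already covered, nothing to do
        gap = tsn - self.base
        if gap >= len(self.bits):
            # Round up to a multiple of 32, plus one extra chunk of room.
            new_len = (gap // self.CHUNK + 2) * self.CHUNK
            self.bits.extend([False] * (new_len - len(self.bits)))
        self.bits[gap] = True
        # Closing a gap: shift the map so it is rebased on the new base.
        while self.bits[0]:
            self.bits.pop(0)
            self.bits.append(False)
            self.base += 1

    def has_gap(self):
        # Any bit still set means a TSN arrived ahead of the base.
        return any(self.bits)
```

The 512-byte upper bound in the commit message follows from 4096 bits / 8 bits per byte; in-order traffic never leaves a bit set, so the map stays at its initial 32 bits.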
  13. 01 Oct, 2008 1 commit
  14. 19 Jun, 2008 1 commit
    • sctp: Follow security requirement of responding with 1 packet · 2e3216cd
      Vlad Yasevich authored
      
      
      RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
      
      When an SCTP stack receives a packet containing multiple control or
      DATA chunks and the processing of the packet requires the sending of
      multiple chunks in response, the sender of the response chunk(s) MUST
      NOT send more than one packet.  If bundling is supported, multiple
      response chunks that fit into a single packet MAY be bundled together
      into one single response packet.  If bundling is not supported, then
      the sender MUST NOT send more than one response chunk and MUST
      discard all other responses.  Note that this rule does NOT apply to a
      SACK chunk, since a SACK chunk is, in itself, a response to DATA and
      a SACK does not require a response of more DATA.
      
      We implement this by not servicing our outqueue until we reach the end
      of the packet.  This enables maximum bundling.  We also identify
      'response' chunks and make sure that we only send 1 packet when sending
      such chunks.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 09 May, 2008 1 commit
  16. 13 Apr, 2008 3 commits
  17. 06 Mar, 2008 1 commit
  18. 05 Feb, 2008 1 commit
  19. 07 Nov, 2007 1 commit
    • SCTP: Fix difference cases of retransmit. · b6157d8e
      Vlad Yasevich authored
      Commit d0ce9291 broke several retransmit cases including fast
      retransmit.  The reason is that we should only delay by rto while
      doing retransmits as a result of a timeout.  Retransmits as a
      result of path mtu discovery, fast retransmit, or other events
      that should trigger immediate retransmissions got broken.

      Also, since rto is doubled prior to marking of packets eligible for
      retransmission, we never marked the correct chunks anyway.

      The fix is to provide a reason for a given retransmission so that we
      can mark chunks appropriately and to save the old rto value to do
      comparisons against.

      All regression tests passed with this code.
      
      Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
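The idea in b6157d8e of passing a reason and comparing against the saved (pre-doubling) rto might be sketched as below. The reason strings, the function name, and the use of `>=` for the age comparison are assumptions; the commit message does not spell out those details.

```python
def eligible_for_rtx(reason, chunk_age, saved_rto):
    """Decide whether a chunk should be marked for retransmission.

    `saved_rto` is the rto value from before it was doubled, so the
    age comparison is not skewed by the backoff.
    """
    if reason == "timeout":
        # Only timeout-driven retransmits are delayed by one rto.
        return chunk_age >= saved_rto
    # Fast retransmit, path MTU discovery, etc. mark immediately.
    return True
```

A chunk that is 0.5 s old with a saved rto of 1.0 s would be skipped on a timeout but marked at once for a fast retransmit.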
  20. 10 Oct, 2007 1 commit
  21. 30 Aug, 2007 1 commit
  22. 29 Aug, 2007 1 commit
  23. 04 May, 2007 1 commit
    • [SCTP]: Set assoc_id correctly during INIT collision. · 07d93967
      Vlad Yasevich authored
      
      
      During the INIT/COOKIE-ACK collision cases, it's possible to get
      into a situation where the association id is not yet set at the time
      of the user event generation.  As a result, user events have an
      association id set to 0 which will confuse applications.
      
      This happens if we hit case B of duplicate cookie processing.
      In the particular example found and provided by Oscar Isaula
      <Oscar.Isaula@motorola.com>, flow looks like this:
      A				B
      ---- INIT------->  (lost)
      	    <---------INIT------
      ---- INIT-ACK--->
      	    <------ Cookie ECHO
      
      When the Cookie Echo is received, we end up trying to update the
      association that was created on A as a result of the (lost) INIT,
      but that association doesn't have the ID set yet.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 26 Apr, 2007 2 commits
  25. 11 Feb, 2007 1 commit
  26. 30 Jan, 2007 1 commit
  27. 24 Jan, 2007 1 commit
  28. 03 Dec, 2006 6 commits