1. 14 May, 2014 1 commit
    • Benjamin Marzinski's avatar
      GFS2: remove transaction glock · 24972557
      Benjamin Marzinski authored
      
      
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In it's place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work for flushing out
      the and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesytem simply involes dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      24972557
  2. 28 Apr, 2014 1 commit
  3. 17 Apr, 2014 1 commit
  4. 16 Apr, 2014 11 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6ca2a88a
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Various fixes:
      
         - reboot regression fix
         - build message spam fix
         - GPU quirk fix
         - 'make kvmconfig' fix
      
        plus the wire-up of the renameat2() system call on i386"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Remove the PCI reboot method from the default chain
        x86/build: Supress "Nothing to be done for ..." messages
        x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
        x86/platform: Fix "make O=dir kvmconfig"
        i386: Wire up the renameat2() syscall
      6ca2a88a
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2a83dc7e
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, plus a simple hardware-enablement patch for the Intel
        RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
        still fine at this stage"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Instead of redirecting flex output, use -o
        perf tools: Fix double free in perf test 21 (code-reading.c)
        perf stat: Initialize statistics correctly
        perf bench: Set more defaults in the 'numa' suite
        perf bench: Fix segfault at the end of an 'all' execution
        perf bench: Update manpage to mention numa and futex
        perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
        perf probe: Fix to handle errors in line_range searching
        perf probe: Fix --line option behavior
        perf tools: Pick up libdw without explicit LIBDW_DIR
        MAINTAINERS: Change e-mail to kernel.org one
        perf callchains: Disable unwind libraries when libelf isn't found
        tools lib traceevent: Do not call warning() directly
        tools lib traceevent: Print event name when show warning if possible
        perf top: Fix documentation of invalid -s option
        perf/x86: Enable DRAM RAPL support on Intel Haswell
      2a83dc7e
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 17cf7db2
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "ARM VIC (Vectored Irq Controller) irqchip driver fix"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: vic: Properly chain the cascaded IRQs
      17cf7db2
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d99d5917
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
       "liblockdep fixes and mutex debugging fixes"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/mutex: Fix debug_mutexes
        tools/liblockdep: Add proper versioning to the shared obj
        tools/liblockdep: Ignore asmlinkage and visible
      d99d5917
    • Linus Torvalds's avatar
      Merge tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · 498f9620
      Linus Torvalds authored
      Pull fbdev fixes from Tomi Valkeinen:
       - fix build errors for bf54x-lq043fb and imxfb
       - fbcon fix for da8xx-fb
       - omapdss fixes for hdmi audio, irq handling and fclk calculation
      
      * tag 'fbdev-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
        video: bf54x-lq043fb: fix build error
        OMAPDSS: Change struct reg_field to dispc_reg_field
        OMAPDSS: Take pixelclock unit change into account in hdmi_compute_acr()
        OMAPDSS: fix shared irq handlers
        video: imxfb: Select LCD_CLASS_DEVICE unconditionally
        OMAPDSS: fix rounding when calculating fclk rate
        video: da8xx-fb: Fix casting of info->pseudo_palette
      498f9620
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 5f63517c
      Linus Torvalds authored
      Pull pincontrol fixes from Linus Walleij:
       "A first set of pin control fixes for the v3.15 series:
      
         - Fix a couple of barnsjukdomar on the Rockchip driver.
      
         - Remove an idiotic debug print I happened to leave behind in the
           Nomadik driver.
      
         - Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.
      
         - Three patches renaming the Broadcom Capri driver to BCM28155.  This
           has been falling between the chairs for some time due to some
           cross-tree synchronization misunderstandings, now I'm fed up with
           this and just rename it in this -rc1 phase"
      
      * tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: fix typo in bindings documentation
        Update bcm_defconfig with new pinctrl CONFIG
        pinctrl: Rename Broadcom Capri pinctrl driver
        pinctrl: msm: Correct interrupt code for TLMM v2
        pinctrl: nomadik: delete stray debug print
        pinctrl: rockchip: handle first half of rk3188-bank0 correctly
        pinctrl: rockchip: add return value to rockchip_set_mux
        pinctrl: rockchip: fix offset of mux registers for rk3188
      5f63517c
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 0f689a33
      Linus Torvalds authored
      Pull s390 patches from Martin Schwidefsky:
       "An update to the oops output with additional information about the
        crash.  The renameat2 system call is enabled.  Two patches in regard
        to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp_vt220: Fix kernel panic due to early terminal input
        s390/compat: fix typo
        s390/uaccess: fix possible register corruption in strnlen_user_srst()
        s390: add 31 bit warning message
        s390: wire up sys_renameat2
        s390: show_registers() should not map user space addresses to kernel symbols
        s390/mm: print control registers and page table walk on crash
        s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
        s390: fix control register update
      0f689a33
    • Linus Torvalds's avatar
      Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 7d38cc02
      Linus Torvalds authored
      Pull itanium erratum fix from Tony Luck:
       "Small workaround for a rare, but annoying, erratum #237"
      
      * tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
      7d38cc02
    • Tony Luck's avatar
      [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237) · c0b5a64d
      Tony Luck authored
      April 2014 Itanium processor specification update:
      
      http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
      
      describes this erratum:
      
      =========================================================================
      237. Under a complex set of conditions, store to load forwarding for a
      sub 8-byte load may complete incorrectly
      
      Problem: A load instruction may complete incorrectly when a code sequence
      using 4-byte or smaller load and store operations to the same address
      is executed in combination with specific timing of all the following
      concurrent conditions: store to load forwarding, alignment checking
      enabled, a mis-predicted branch, and complex cache utilization activity.
      
      Implication: The affected sub 8-byte instruction may complete
      incorrectly resulting in unpredictable system behavior. There is an
      extremely low probability of exposure due to the significant number of
      complex microarchitectural concurrent conditions required to encounter
      the erratum.
      
      Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
      Hyper-Threading will significantly reduce exposure to the conditions
      that contribute to encountering the erratum.
      
      Status: See the Summary Table of Changes for the affected steppings.
      =========================================================================
      
      [Table of changes essentially lists all models from McKinley to Tukwila]
      
      The PSR.ac bit controls whether the processor will always generate
      an unaligned reference trap (0x5a00) for a misaligned data access
      (when PSR.ac=1) or if it will let the access succeed when running
      on a cpu that implements logic to handle some unaligned accesses.
      
      Way back in 2008 in commit b704882e
      
      
        [IA64] Rationalize kernel mode alignment checking
      we made the decision to always enable strict checking. We were
      already doing so in trap/interrupt context because the common
      preamble code set this bit - but the rest of supervisor code
      (and by inheritance user code) ran with PSR.ac=0.
      
      We now reverse that decision and set PSR.ac=0 everywhere in the
      kernel (also inherited by user processes). This will avoid the
      erratum using the method described in the Itanium specification
      update.  Net effect for users is that the processor will handle
      unaligned access when it can (typically with a tiny performance
      bubble in the pipeline ... but much less invasive than taking a
      trap and having the OS perform the access).
      
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      c0b5a64d
    • Ingo Molnar's avatar
      x86: Remove the PCI reboot method from the default chain · 5be44a6f
      Ingo Molnar authored
      Steve reported a reboot hang and bisected it back to this commit:
      
        a4f1987e
      
       x86, reboot: Add EFI and CF9 reboot methods into the default list
      
      He heroically tested all reboot methods and found the following:
      
        reboot=t       # triple fault                  ok
        reboot=k       # keyboard ctrl                 FAIL
        reboot=b       # BIOS                          ok
        reboot=a       # ACPI                          FAIL
        reboot=e       # EFI                           FAIL   [system has no EFI]
        reboot=p       # PCI 0xcf9                     FAIL
      
      And I think it's pretty obvious that we should only try PCI 0xcf9 as a
      last resort - if at all.
      
      The other observation is that (on this box) we should never try
      the PCI reboot method, but close with either the 'triple fault'
      or the 'BIOS' (terminal!) reboot methods.
      
      Thirdly, CF9_COND is a total misnomer - it should be something like
      CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
      
      So this patch fixes the worst problems:
      
       - it orders the actual reboot logic to follow the reboot ordering
         pattern - it was in a pretty random order before for no good
         reason.
      
       - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
         BOOT_CF9_SAFE flags to make the code more obvious.
      
       - it tries the BIOS reboot method before the PCI reboot method.
         (Since 'BIOS' is a terminal reboot method resulting in a hang
          if it does not work, this is essentially equivalent to removing
          the PCI reboot method from the default reboot chain.)
      
       - just for the miraculous possibility of terminal (resulting
         in hang) reboot methods of triple fault or BIOS returning
         without having done their job, there's an ordering between
         them as well.
      
      Reported-and-bisected-and-tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5be44a6f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 10ec34fc
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix BPF filter validation of netlink attribute accesses, from
          Mathias Kruase.
      
       2) Netfilter conntrack generation seqcount not initialized properly,
          from Andrey Vagin.
      
       3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
          from Patrick McHardy.
      
       4) Properly limit MTU over ipv6, from Eric Dumazet.
      
       5) Fix seccomp system call argument population on 32-bit, from Daniel
          Borkmann.
      
       6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
          skb->mac_len needs to be used.  From Vlad Yasevich.
      
       7) We have several cases of using socket based communications to
          implement a tunnel.  For example, some tunnels are encapsulations
          over UDP so we use an internal kernel UDP socket to do the
          transmits.
      
          These tunnels should behave just like other software devices and
          pass the packets on down to the next layer.
      
          Most importantly we want the top-level socket (eg TCP) that created
          the traffic to be charged for the SKB memory.
      
          However, once you get into the IP output path, we have code that
          assumed that whatever was attached to skb->sk is an IP socket.
      
          To keep the top-level socket being charged for the SKB memory,
          whilst satisfying the needs of the IP output path, we now pass in an
          explicit 'sk' argument.
      
          From Eric Dumazet.
      
       8) ping_init_sock() leaks group info, from Xiaoming Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
        cxgb4: use the correct max size for firmware flash
        qlcnic: Fix MSI-X initialization code
        ip6_gre: don't allow to remove the fb_tunnel_dev
        ipv4: add a sock pointer to dst->output() path.
        ipv4: add a sock pointer to ip_queue_xmit()
        driver/net: cosa driver uses udelay incorrectly
        at86rf230: fix __at86rf230_read_subreg function
        at86rf230: remove check if AVDD settled
        net: cadence: Add architecture dependencies
        net: Start with correct mac_len in skb_network_protocol
        Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
        cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
        net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
        seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
        qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
        qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
        qlcnic: Fix PVID configuration on eSwitch port.
        qlcnic: Fix max ring count calculation
        qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
        qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
        ...
      10ec34fc
  5. 15 Apr, 2014 14 commits
  6. 14 Apr, 2014 12 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 55101e2d
      Linus Torvalds authored
      Pull KVM fixes from Marcelo Tosatti:
       - Fix for guest triggerable BUG_ON (CVE-2014-0155)
       - CR4.SMAP support
       - Spurious WARN_ON() fix
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: remove WARN_ON from get_kernel_ns()
        KVM: Rename variable smep to cr4_smep
        KVM: expose SMAP feature to guest
        KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
        KVM: Add SMAP support when setting CR4
        KVM: Remove SMAP bit from CR4_RESERVED_BITS
        KVM: ioapic: try to recover if pending_eoi goes out of range
        KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
      55101e2d
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · dafe344d
      Linus Torvalds authored
      Pull bmc2835 crypto fix from Herbert Xu:
       "This fixes a potential boot crash on bcm2835 due to the recent change
        that now causes hardware RNGs to be accessed on registration"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
      dafe344d
    • Mikulas Patocka's avatar
      user namespace: fix incorrect memory barriers · e79323bd
      Mikulas Patocka authored
      
      
      smp_read_barrier_depends() can be used if there is data dependency between
      the readers - i.e. if the read operation after the barrier uses address
      that was obtained from the read operation before the barrier.
      
      In this file, there is only control dependency, no data dependecy, so the
      use of smp_read_barrier_depends() is incorrect. The code could fail in the
      following way:
      * the cpu predicts that idx < entries is true and starts executing the
        body of the for loop
      * the cpu fetches map->extent[0].first and map->extent[0].count
      * the cpu fetches map->nr_extents
      * the cpu verifies that idx < extents is true, so it commits the
        instructions in the body of the for loop
      
      The problem is that in this scenario, the cpu read map->extent[0].first
      and map->nr_extents in the wrong order. We need a full read memory barrier
      to prevent it.
      
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e79323bd
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 00cbc3dc
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains three Netfilter fixes for your net tree,
      they are:
      
      * Fix missing generation sequence initialization which results in a splat
        if lockdep is enabled, it was introduced in the recent works to improve
        nf_conntrack scalability, from Andrey Vagin.
      
      * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
        disabled otherwise this crashes due to a double release, from Andrey
        Vagin.
      
      * Fix nf_tables cmp fast in big endian, from Patrick McHardy.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00cbc3dc
    • Vlad Yasevich's avatar
      net: Start with correct mac_len in skb_network_protocol · 1e785f48
      Vlad Yasevich authored
      Sometimes, when the packet arrives at skb_mac_gso_segment()
      its skb->mac_len already accounts for some of the mac lenght
      headers in the packet.  This seems to happen when forwarding
      through and OpenSSL tunnel.
      
      When we start looking for any vlan headers in skb_network_protocol()
      we seem to ignore any of the already known mac headers and start
      with an ETH_HLEN.  This results in an incorrect offset, dropped
      TSO frames and general slowness of the connection.
      
      We can start counting from the known skb->mac_len
      and return at least that much if all mac level headers
      are known and accounted for.
      
      Fixes: 53d6471c
      
       (net: Account for all vlan headers in skb_mac_gso_segment)
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Daniel Borkman <dborkman@redhat.com>
      Tested-by: default avatarMartin Filip <nexus+kernel@smoula.net>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e785f48
    • Marcelo Tosatti's avatar
      b351c39c
    • Feng Wu's avatar
      KVM: Rename variable smep to cr4_smep · 66386ade
      Feng Wu authored
      
      
      Rename variable smep to cr4_smep, which can better reflect the
      meaning of the variable.
      
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      66386ade
    • Feng Wu's avatar
      KVM: expose SMAP feature to guest · de935ae1
      Feng Wu authored
      
      
      This patch exposes SMAP feature to guest
      
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      de935ae1
    • Feng Wu's avatar
      KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode · e1e746b3
      Feng Wu authored
      
      
      SMAP is disabled if CPU is in non-paging mode in hardware.
      However KVM always uses paging mode to emulate guest non-paging
      mode with TDP. To emulate this behavior, SMAP needs to be
      manually disabled when guest switches to non-paging mode.
      
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      e1e746b3
    • Feng Wu's avatar
      KVM: Add SMAP support when setting CR4 · 97ec8c06
      Feng Wu authored
      
      
      This patch adds SMAP handling logic when setting CR4 for guests
      
      Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
      way to detect SMAP violation.
      
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      97ec8c06
    • Feng Wu's avatar
      KVM: Remove SMAP bit from CR4_RESERVED_BITS · 56d6efc2
      Feng Wu authored
      
      
      This patch removes SMAP bit from CR4_RESERVED_BITS.
      
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      56d6efc2
    • Daniel Borkmann's avatar
      Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" · 362d5204
      Daniel Borkmann authored
      This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management
      to reflect real state of the receiver's buffer") as it introduced a
      serious performance regression on SCTP over IPv4 and IPv6, though a not
      as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
      
      Current state:
      
      [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 17:56:21 GMT
      Connecting to host 192.168.241.3, port 5201
            Cookie: Lab200slot2.1397238981.812898.548918
      [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
      [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
      [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
      [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
      [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
      [  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
      [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
      [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
      [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
      [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
      [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
      [  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
      [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
      (etc)
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 19:08:41 GMT
      Connecting to host 2001:db8:0:f101::1, port 5201
            Cookie: Lab200slot2.1397243321.714295.2b3f7c
      [  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
      [  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
      [  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
      [  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
      [  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
      [  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
      [  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
      [  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
      [  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
      [  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
      [  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
      [  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
      (etc)
      
      After patch:
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
      Time: Mon, 14 Apr 2014 16:40:48 GMT
      Connecting to host 192.168.240.3, port 5201
            Cookie: Lab200slot2.1397493648.413274.65e131
      [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
      [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
      [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
      [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
      
      With the reverted patch applied, the SCTP/IPv4 performance is back
      to normal on latest upstream for IPv4 and IPv6 and has same throughput
      as 3.4.2 test kernel, steady and interval reports are smooth again.
      
      Fixes: ef2820a7
      
       ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
      Reported-by: default avatarPeter Butler <pbutler@sonusnet.com>
      Reported-by: default avatarDongsheng Song <dongsheng.song@gmail.com>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Tested-by: default avatarPeter Butler <pbutler@sonusnet.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
      Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      362d5204