1. 19 May, 2011 3 commits
    • Linus Torvalds's avatar
      list: remove prefetching from regular list iterators · e66eed65
      Linus Torvalds authored
      
      
      This is removes the use of software prefetching from the regular list
      iterators.  We don't want it.  If you do want to prefetch in some
      iterator of yours, go right ahead.  Just don't expect the iterator to do
      it, since normally the downsides are bigger than the upsides.
      
      It also replaces <linux/prefetch.h> with <linux/const.h>, because the
      use of LIST_POISON ends up needing it.  <linux/poison.h> is sadly not
      self-contained, and including prefetch.h just happened to hide that.
      
      Suggested by David Miller (networking has a lot of regular lists that
      are often empty or a single entry, and prefetching is not going to do
      anything but add useless instructions).
      
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e66eed65
    • Linus Torvalds's avatar
      hlist: remove software prefetching in hlist iterators · 75d65a42
      Linus Torvalds authored
      
      
      They not only increase the code footprint, they actually make things
      slower rather than faster.  On internationally acclaimed benchmarks
      ("make -j16" on an already fully built kernel source tree) the hlist
      prefetching slows down the build by up to 1%.
      
      (Almost all of it comes from hlist_for_each_entry_rcu() as used by
      avc_has_perm_noaudit(), which is very hot due to all the pathname
      lookups to see if there is anything to do).
      
      The cause seems to be two-fold:
      
       - on at least some Intel cores, prefetch(NULL) ends up with some
         microarchitectural stall due to the TLB miss that it incurs.  The
         hlist case triggers this very commonly, since the NULL pointer is the
         last entry in the list.
      
       - the prefetch appears to cause more D$ activity, probably because it
         prefetches hash list entries that are never actually used (because we
         ended the search early due to a hit).
      
      Regardless, the numbers clearly say that the implicit prefetching is
      simply a bad idea.  If some _particular_ user of the hlist iterators
      wants to prefetch the next list entry, they can do so themselves
      explicitly, rather than depend on all list iterators doing so
      implicitly.
      
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      75d65a42
    • Linus Torvalds's avatar
      Linux 2.6.39 · 61c4f2c8
      Linus Torvalds authored
      61c4f2c8
  2. 18 May, 2011 22 commits
  3. 17 May, 2011 10 commits
    • Jeff Layton's avatar
      cifs: fix cifsConvertToUCS() for the mapchars case · 11379b5e
      Jeff Layton authored
      As Metze pointed out, commit 84cdf74e broke mapchars option:
      
          Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
          (84cdf74e) does multiple steps
          in just one commit (moving the function and changing it without
          testing).
      
          put_unaligned_le16(temp, &target[j]); is never called for any
          codepoint the goes via the 'default' switch statement. As a result
          we put just zero (or maybe uninitialized) bytes into the target
          buffer.
      
      His proposed patch looks correct, but doesn't apply to the current head
      of the tree. This patch should also fix it.
      
      Cc: <stable@kernel.org> # .38.x: 581ade4d
      
      : cifs: clean up various nits in unicode routines (try #2)
      Reported-by: default avatarStefan Metzmacher <metze@samba.org>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      11379b5e
    • Jeff Layton's avatar
      cifs: add fallback in is_path_accessible for old servers · 221d1d79
      Jeff Layton authored
      
      
      The is_path_accessible check uses a QPathInfo call, which isn't
      supported by ancient win9x era servers. Fall back to an older
      SMBQueryInfo call if it fails with the magic error codes.
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: default avatarSandro Bonazzola <sandro.bonazzola@gmail.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      221d1d79
    • Linus Torvalds's avatar
      Merge branch 'timers-fixes-for-linus' of... · a085963a
      Linus Torvalds authored
      Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        tick: Clear broadcast active bit when switching to oneshot
        rtc: mc13xxx: Don't call rtc_device_register while holding lock
        rtc: rp5c01: Initialize drvdata before registering device
        rtc: pcap: Initialize drvdata before registering device
        rtc: msm6242: Initialize drvdata before registering device
        rtc: max8998: Initialize drvdata before registering device
        rtc: max8925: Initialize drvdata before registering device
        rtc: m41t80: Initialize clientdata before registering device
        rtc: ds1286: Initialize drvdata before registering device
        rtc: ep93xx: Initialize drvdata before registering device
        rtc: davinci: Initialize drvdata before registering device
        rtc: mxc: Initialize drvdata before registering device
        clocksource: Install completely before selecting
      a085963a
    • Borislav Petkov's avatar
      x86, AMD: Fix ARAT feature setting again · 14fb57dc
      Borislav Petkov authored
      
      
      Trying to enable the local APIC timer on early K8 revisions
      uncovers a number of other issues with it, in conjunction with
      the C1E enter path on AMD. Fixing those causes much more churn
      and troubles than the benefit of using that timer brings so
      don't enable it on K8 at all, falling back to the original
      functionality the kernel had wrt to that.
      
      Reported-and-bisected-by: default avatarNick Bowler <nbowler@elliptictech.com>
      Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Nick Bowler <nbowler@elliptictech.com>
      Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
      Signed-off-by: default avatarBorislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      14fb57dc
    • Borislav Petkov's avatar
      Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors" · 328935e6
      Borislav Petkov authored
      This reverts commit e20a2d20, as it crashes
      certain boxes with specific AMD CPU models.
      
      Moving the lower endpoint of the Erratum 400 check to accomodate
      earlier K8 revisions (A-E) opens a can of worms which is simply
      not worth to fix properly by tweaking the errata checking
      framework:
      
      * missing IntPenging MSR on revisions < CG cause #GP:
      
      http://marc.info/?l=linux-kernel&m=130541471818831
      
      * makes earlier revisions use the LAPIC timer instead of the C1E
      idle routine which switches to HPET, thus not waking up in
      deeper C-states:
      
      http://lkml.org/lkml/2011/4/24/20
      
      
      
      Therefore, leave the original boundary starting with K8-revF.
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      328935e6
    • Jens Axboe's avatar
      scsi: remove performance regression due to async queue run · 9937a5e2
      Jens Axboe authored
      Commit c21e6beb
      
       removed our queue request_fn re-enter
      protection, and defaulted to always running the queues from
      kblockd to be safe. This was a known potential slow down,
      but should be safe.
      
      Unfortunately this is causing big performance regressions for
      some, so we need to improve this logic. Looking into the details
      of the re-enter, the real issue is on requeue of requests.
      
      Requeue of requests upon seeing a BUSY condition from the device
      ends up re-running the queue, causing traces like this:
      
      scsi_request_fn()
              scsi_dispatch_cmd()
                      scsi_queue_insert()
                              __scsi_queue_insert()
                                      scsi_run_queue()
      					scsi_request_fn()
      						...
      
      potentially causing the issue we want to avoid. So special
      case the requeue re-run of the queue, but improve it to offload
      the entire run of local queue and starved queue from a single
      workqueue callback. This is a lot better than potentially
      kicking off a workqueue run for each device seen.
      
      This also fixes the issue of the local device going into recursion,
      since the above mentioned commit never moved that queue run out
      of line.
      
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      9937a5e2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · c1d10d18
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        net: Change netdev_fix_features messages loglevel
        vmxnet3: Fix inconsistent LRO state after initialization
        sfc: Fix oops in register dump after mapping change
        IPVS: fix netns if reading ip_vs_* procfs entries
        bridge: fix forwarding of IPv6
      c1d10d18
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 477de0de
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
        Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"
      477de0de
    • Randy Dunlap's avatar
      mm: fix kernel-doc warning in page_alloc.c · b5e6ab58
      Randy Dunlap authored
      
      
      Fix new kernel-doc warning in mm/page_alloc.c:
      
        Warning(mm/page_alloc.c:2370): No description found for parameter 'nid'
      
      Signed-off-by: default avatarRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5e6ab58
    • Yinghai Lu's avatar
      PCI: Clear bridge resource flags if requested size is 0 · 93d2175d
      Yinghai Lu authored
      During pci remove/rescan testing found:
      
        pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
        pci 0000:c0:03.0:   bridge window [io  0x1000-0x0fff]
        pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
        pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
        pci 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
        pci 0000:c0:03.0: Error enabling bridge (-22), continuing
        pci 0000:c0:03.0: enabling bus mastering
        pci 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: device not available (can't reserve [io  0x1000-0x0fff])
        pcieport: probe of 0000:c0:03.0 failed with error -22
      
      This bug was caused by commit c8adf9a3
      
       ("PCI: pre-allocate
      additional resources to devices only after successful allocation of
      essential resources.")
      
      After that commit, pci_hotplug_io_size is changed to additional_io_size
      from minium size.  So it will not go through resource_size(res) != 0
      path, and will not be reset.
      
      The root cause is: pci_bridge_check_ranges will set RESOURCE_IO flag for
      pci bridge, and later if children do not need IO resource.  those bridge
      resources will not need to be allocated.  but flags is still there.
      that will confuse the the pci_enable_bridges later.
      
      related code:
      
         static void assign_requested_resources_sorted(struct resource_list *head,
                                          struct resource_list_x *fail_head)
         {
                 struct resource *res;
                 struct resource_list *list;
                 int idx;
      
                 for (list = head->next; list; list = list->next) {
                         res = list->res;
                         idx = res - &list->dev->resource[0];
                         if (resource_size(res) && pci_assign_resource(list->dev, idx)) {
         ...
                                 reset_resource(res);
                         }
                 }
         }
      
      At last, We have to clear the flags in pbus_size_mem/io when requested
      size == 0 and !add_head.  becasue this case it will not go through
      adjust_resources_sorted().
      
      Just make size1 = size0 when !add_head. it will make flags get cleared.
      
      At the same time when requested size == 0, add_size != 0, will still
      have in head and add_list.  because we do not clear the flags for it.
      
      After this, we will get right result:
      
        pci 0000:c0:03.0: PCI bridge to [bus c4-c9]
        pci 0000:c0:03.0:   bridge window [io  disabled]
        pci 0000:c0:03.0:   bridge window [mem 0xf0000000-0xf00fffff]
        pci 0000:c0:03.0:   bridge window [mem 0xfc180000000-0xfc197ffffff 64bit pref]
        pci 0000:c0:03.0: enabling bus mastering
        pci 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: setting latency timer to 64
        pcieport 0000:c0:03.0: irq 160 for MSI/MSI-X
        pcieport 0000:c0:03.0: Signaling PME through PCIe PME interrupt
        pci 0000:c4:00.0: Signaling PME through PCIe PME interrupt
        pcie_pme 0000:c0:03.0:pcie01: service driver pcie_pme loaded
        aer 0000:c0:03.0:pcie02: service driver aer loaded
        pciehp 0000:c0:03.0:pcie04: Hotplug Controller:
      
      v3: more simple fix. also fix one typo in pbus_size_mem
      
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Reviewed-by: default avatarRam Pai <linuxram@us.ibm.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      93d2175d
  4. 16 May, 2011 5 commits
    • Thomas Gleixner's avatar
      tick: Clear broadcast active bit when switching to oneshot · 07f4beb0
      Thomas Gleixner authored
      
      
      The first cpu which switches from periodic to oneshot mode switches
      also the broadcast device into oneshot mode. The broadcast device
      serves as a backup for per cpu timers which stop in deeper
      C-states. To avoid starvation of the cpus which might be in idle and
      depend on broadcast mode it marks the other cpus as broadcast active
      and sets the brodcast expiry value of those cpus to the next tick.
      
      The oneshot mode broadcast bit for the other cpus is sticky and gets
      only cleared when those cpus exit idle. If a cpu was not idle while
      the bit got set in consequence the bit prevents that the broadcast
      device is armed on behalf of that cpu when it enters idle for the
      first time after it switched to oneshot mode.
      
      In most cases that goes unnoticed as one of the other cpus has usually
      a timer pending which keeps the broadcast device armed with a short
      timeout. Now if the only cpu which has a short timer active has the
      bit set then the broadcast device will not be armed on behalf of that
      cpu and will fire way after the expected timer expiry. In the case of
      Christians bug report it took ~145 seconds which is about half of the
      wrap around time of HPET (the limit for that device) due to the fact
      that all other cpus had no timers armed which expired before the 145
      seconds timeframe.
      
      The solution is simply to clear the broadcast active bit
      unconditionally when a cpu switches to oneshot mode after the first
      cpu switched the broadcast device over. It's not idle at that point
      otherwise it would not be executing that code.
      
      [ I fundamentally hate that broadcast crap. Why the heck thought some
        folks that when going into deep idle it's a brilliant concept to
        switch off the last device which brings the cpu back from that
        state? ]
      
      Thanks to Christian for providing all the valuable debug information!
      
      Reported-and-tested-by: default avatarChristian Hoffmann <email@christianhoffmann.info>
      Cc: John Stultz <johnstul@us.ibm.com>
      Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
      
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      07f4beb0
    • Michał Mirosław's avatar
      net: Change netdev_fix_features messages loglevel · 6f404e44
      Michał Mirosław authored
      
      
      Those reduced to DEBUG can possibly be triggered by unprivileged processes
      and are nothing exceptional. Illegal checksum combinations can only be
      caused by driver bug, so promote those messages to WARN.
      
      Since GSO without SG will now only cause DEBUG message from
      netdev_fix_features(), remove the workaround from register_netdevice().
      
      Signed-off-by: default avatarMichał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6f404e44
    • Thomas Jarosch's avatar
      vmxnet3: Fix inconsistent LRO state after initialization · ebde6f8a
      Thomas Jarosch authored
      During initialization of vmxnet3, the state of LRO
      gets out of sync with netdev->features.
      
      This leads to very poor TCP performance in a IP forwarding
      setup and is hitting many VMware users.
      
      Simplified call sequence:
      1. vmxnet3_declare_features() initializes "adapter->lro" to true.
      
      2. The kernel automatically disables LRO if IP forwarding is enabled,
      so vmxnet3_set_flags() gets called. This also updates netdev->features.
      
      3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
      set to true and LRO gets enabled again, even though
      netdev->features shows it's disabled.
      
      Fix it by updating "adapter->lro", too.
      
      The private vmxnet3 adapter flags are scheduled for removal
      in net-next, see commit a0d2730c
      
      
      "net: vmxnet3: convert to hw_features".
      
      Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.
      
      Please CC: comments.
      
      Signed-off-by: default avatarThomas Jarosch <thomas.jarosch@intra2net.com>
      Acked-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ebde6f8a
    • Ben Hutchings's avatar
      sfc: Fix oops in register dump after mapping change · 867955f5
      Ben Hutchings authored
      Commit 747df225
      
       ('sfc: Always map MCDI
      shared memory as uncacheable') introduced a separate mapping for the
      MCDI shared memory (MC_TREG_SMEM).  This means we can no longer easily
      include it in the register dump.  Since it is not particularly useful
      in debugging, substitute a recognisable dummy value.
      
      Signed-off-by: default avatarBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      867955f5
    • Linus Torvalds's avatar
      Merge branch 'omap-fixes-for-linus' of... · df8d06ad
      Linus Torvalds authored
      Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
      
      * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
        OMAP3: set the core dpll clk rate in its set_rate function
        omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured
      df8d06ad