1. 20 Feb, 2014 2 commits
  2. 14 Feb, 2014 1 commit
  3. 13 Feb, 2014 1 commit
  4. 11 Feb, 2014 2 commits
  5. 10 Feb, 2014 2 commits
    • blk-mq: rework flush sequencing logic · 18741986
      Christoph Hellwig authored
      
      
      Switch to using a preallocated flush_rq for blk-mq, similar to what's done
      with the old request path.  This allows us to set up the request properly
      with a tag from the actually allowed range and ->rq_disk as needed by
      some drivers.  To make life easier we also switch to dynamic allocation
      of ->flush_rq for the old path.
      
      This effectively reverts most of
      
          "blk-mq: fix for flush deadlock"
      
      and
      
          "blk-mq: Don't reserve a tag for flush request"
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
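
      A minimal sketch of the preallocation described above, in kernel C and
      assuming the 3.14-era structures; the helper name blk_alloc_flush_rq()
      is hypothetical, and the real patch wires this into queue init and
      teardown differently:

      	#include <linux/blkdev.h>
      	#include <linux/slab.h>

      	/* Allocate the flush request once up front instead of taking a
      	 * driver tag from the shared pool for every flush. */
      	static int blk_alloc_flush_rq(struct request_queue *q)
      	{
      		q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
      		return q->flush_rq ? 0 : -ENOMEM;
      	}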
    • blk-mq: rework I/O completions · 30a91cb4
      Christoph Hellwig authored
      
      
      Rework I/O completions to work more like the old code path.  blk_mq_end_io
      now stays out of the business of deferring completions to other CPUs
      and calling blk_mark_rq_complete.  The latter is very important to allow
      completing requests that have timed out and thus are already marked completed,
      the former allows using the IPI callout even for driver specific completions
      instead of having to reimplement them.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
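
      A rough sketch of the completion split described above; this is an
      approximation of what 30a91cb4 does, with __blk_mq_complete_request()
      standing in for the IPI/softirq deferral path:

      	/* Drivers call this.  A request already marked complete by the
      	 * timeout handler is skipped; everything else goes through the
      	 * common deferral path, which may IPI the submitting CPU. */
      	void blk_mq_complete_request(struct request *rq)
      	{
      		if (!blk_mark_rq_complete(rq))
      			__blk_mq_complete_request(rq);
      	}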
  6. 09 Feb, 2014 3 commits
  7. 07 Feb, 2014 4 commits
  8. 06 Feb, 2014 2 commits
    • swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li authored
      
      
      This is a patch to improve the swap readahead algorithm.  It's from Hugh
      and I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on a harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages?  But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in causes
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Testing shows Vanilla is slightly better than Hugh's patch in the
      sequential workload.  I observed that with Hugh's patch the readahead
      size sometimes shrinks too fast (from 8 to 1 immediately) in the
      sequential workload if there is no hit.  And in such a case, continuing
      to do readahead is actually good.
      
      I didn't prepare a sophisticated algorithm for the sequential workload
      because so far we can't guarantee sequentially accessed pages are swapped
      out sequentially.  So I slightly changed Hugh's heuristic - don't shrink
      the readahead size too fast.
      
      Here is my test result (unit: seconds, average of 3 runs):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      The attached graph shows the swapin/swapout throughput I collected with
      'vmstat 2'.  The first part is running a random workload (till around
      1200 on the x-axis) and the second part is running a sequential workload.
      With the patch, swapin and swapout throughput are almost identical in
      steady state in both workloads, which is the expected behavior; while in
      Vanilla, swapin is much bigger than swapout, especially in the random
      workload (because of wrong readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
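
      A simplified sketch of the heuristic described above, not the code from
      579f8290 itself; the function name and parameters are illustrative, and
      the hit tracking (via the PageReadahead flag on swapped-in pages) is
      assumed to happen elsewhere:

      	/* Widen the window after readahead hits, narrow it after misses -
      	 * but only by half at a time, per Shaohua's tweak, rather than
      	 * collapsing straight back to a single page. */
      	static unsigned int swapin_window(unsigned int prev, bool hit,
      					  unsigned int max /* 1 << page_cluster */)
      	{
      		unsigned int pages = hit ? prev << 1 : prev >> 1;

      		if (pages < 1)
      			pages = 1;
      		if (pages > max)
      			pages = max;
      		return pages;
      	}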
    • gpio: consumer.h: Move forward declarations outside #ifdef · a3485d08
      Lars-Peter Clausen authored
      
      
      Make sure that the forward declared structs in gpio/consumer.h are also visible
      on the else branch of the CONFIG_GPIOLIB #ifdef.
      
      Fixes the following warnings and their associated errors when CONFIG_GPIOLIB is
      not selected:
      	include/linux/gpio/consumer.h:67:14: warning: 'struct device' declared inside parameter list
      	include/linux/gpio/consumer.h:67:14: warning: its scope is only this definition or declaration, which is probably not what you want
      	[...]
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
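
      The shape of the fix, sketched from the description above; gpiod_get()
      stands in for the full set of prototypes, and the declarations are
      abbreviated from the real include/linux/gpio/consumer.h:

      	/* The forward declarations now sit outside the #ifdef, so the
      	 * inline stubs in the !CONFIG_GPIOLIB branch can also see them. */
      	struct device;
      	struct gpio_desc;

      	#ifdef CONFIG_GPIOLIB
      	struct gpio_desc *__must_check gpiod_get(struct device *dev,
      						 const char *con_id);
      	#else
      	static inline struct gpio_desc *__must_check gpiod_get(struct device *dev,
      								const char *con_id)
      	{
      		return ERR_PTR(-ENOSYS);
      	}
      	#endif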
  9. 05 Feb, 2014 4 commits
  10. 01 Feb, 2014 1 commit
  11. 31 Jan, 2014 7 commits
  12. 30 Jan, 2014 4 commits
    • blk-mq: Don't reserve a tag for flush request · f0276924
      Shaohua Li authored
      
      
      Reserving a tag (request) for flush to avoid deadlock is overkill.  A
      tag is a valuable resource.  We can track the number of flush requests
      and disallow having too many pending flush requests allocated.  With
      this patch, blk_mq_alloc_request_pinned() could busy-wait (but not loop
      forever) if too many flush requests are already pending and a new flush
      request is being allocated.  But this should not be a problem, as having
      too many pending flush requests is a very rare case.
      
      I verified this can fix the deadlock caused by too many pending flush
      requests.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
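
      A hypothetical sketch of the counting approach described above; the
      q->pending_flush field and the limit are made up here, and f0276924
      structures the accounting differently:

      	/* Cap the number of concurrently allocated flush requests instead
      	 * of permanently reserving a tag for them; a caller that loses the
      	 * race backs off and retries. */
      	#define MAX_PENDING_FLUSH	8	/* illustrative limit */

      	static bool flush_rq_may_alloc(struct request_queue *q)
      	{
      		if (atomic_inc_return(&q->pending_flush) <= MAX_PENDING_FLUSH)
      			return true;
      		atomic_dec(&q->pending_flush);
      		return false;
      	}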
    • fs/compat: fix lookup_dcookie() parameter handling · d8d14bd0
      Heiko Carstens authored
      Commit d5dc77bf ("consolidate compat lookup_dcookie()") converted all
      architectures to the new compat_sys_lookup_dcookie() syscall.
      
      The "len" paramater of the new compat syscall must have the type
      compat_size_t in order to enforce zero extension for architectures where
      the ABI requires that the caller of a function performed zero and/or
      sign extension to 64 bit of all parameters.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
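
      The kind of change involved, sketched rather than quoted from d8d14bd0;
      the w0/w1 parameter names and the cookie reassembly order (which depends
      on endianness) are illustrative:

      	/* Declaring "len" as compat_size_t makes the COMPAT_SYSCALL_DEFINE
      	 * wrapper zero-extend it before the native handler sees it. */
      	COMPAT_SYSCALL_DEFINE4(lookup_dcookie, u32, w0, u32, w1,
      			       char __user *, buf, compat_size_t, len)
      	{
      		return sys_lookup_dcookie(((u64)w0 << 32) | w1, buf, len);
      	}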
    • fs/compat: fix parameter handling for compat readv/writev syscalls · dfd948e3
      Heiko Carstens authored
      We got a report that the pwritev syscall does not work correctly in
      compat mode on s390.
      
      It turned out that with commit 72ec3516 ("switch compat readv/writev
      variants to COMPAT_SYSCALL_DEFINE") we lost the zero extension of a
      couple of syscall parameters because some parameter types hadn't been
      converted from unsigned long to compat_ulong_t.
      
      This is needed for architectures where the ABI requires that the caller
      of a function performed zero and/or sign extension to 64 bit of all
      parameters.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
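
      Roughly, for one of the affected syscalls, the declaration changes like
      this (a before/after sketch, not the literal hunk from dfd948e3; the
      function body is unaffected and omitted):

      	-COMPAT_SYSCALL_DEFINE3(readv, unsigned long, fd,
      	-		const struct compat_iovec __user *, vec,
      	-		unsigned long, vlen)
      	+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
      	+		const struct compat_iovec __user *, vec,
      	+		compat_ulong_t, vlen)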
    • mm/page-writeback.c: do not count anon pages as dirtyable memory · a1c3bfb2
      Johannes Weiner authored
      
      
      The VM is currently heavily tuned to avoid swapping.  Whether that is
      good or bad is a separate discussion, but as long as the VM won't swap
      to make room for dirty cache, we can not consider anonymous pages when
      calculating the amount of dirtyable memory, the baseline to which
      dirty_background_ratio and dirty_ratio are applied.
      
      A simple workload that occupies a significant size (40+%, depending on
      memory layout, storage speeds etc.) of memory with anon/tmpfs pages and
      uses the remainder for a streaming writer demonstrates this problem.  In
      that case, the actual cache pages are a small fraction of what is
      considered dirtyable overall, which results in a relatively large
      portion of the cache pages being dirtied.  As kswapd starts rotating
      these, random tasks enter direct reclaim and stall on IO.
      
      Only consider free pages and file pages dirtyable.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Tested-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
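
      A condensed sketch of what "only consider free pages and file pages
      dirtyable" amounts to; the real global_dirtyable_memory() after a1c3bfb2
      additionally subtracts reserved pages and handles highmem:

      	static unsigned long global_dirtyable_memory(void)
      	{
      		unsigned long x;

      		/* anon pages are no longer counted here */
      		x  = global_page_state(NR_FREE_PAGES);
      		x += global_page_state(NR_INACTIVE_FILE);
      		x += global_page_state(NR_ACTIVE_FILE);

      		return x;
      	}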
  13. 29 Jan, 2014 2 commits
  14. 28 Jan, 2014 5 commits