1. 28 May, 2014 1 commit
  2. 16 Apr, 2014 1 commit
  3. 15 Apr, 2014 2 commits
  4. 09 Apr, 2014 1 commit
  5. 21 Mar, 2014 1 commit
  6. 09 Mar, 2014 1 commit
  7. 21 Feb, 2014 1 commit
  8. 10 Feb, 2014 1 commit
    • Christoph Hellwig's avatar
      blk-mq: rework flush sequencing logic · 18741986
      Christoph Hellwig authored
      
      
      Witch to using a preallocated flush_rq for blk-mq similar to what's done
      with the old request path.  This allows us to set up the request properly
      with a tag from the actually allowed range and ->rq_disk as needed by
      some drivers.  To make life easier we also switch to dynamic allocation
      of ->flush_rq for the old path.
      
      This effectively reverts most of
      
          "blk-mq: fix for flush deadlock"
      
      and
      
          "blk-mq: Don't reserve a tag for flush request"
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      18741986
  9. 30 Jan, 2014 1 commit
    • Shaohua Li's avatar
      blk-mq: Don't reserve a tag for flush request · f0276924
      Shaohua Li authored
      
      
      Reserving a tag (request) for flush to avoid dead lock is a overkill. A
      tag is valuable resource. We can track the number of flush requests and
      disallow having too many pending flush requests allocated. With this
      patch, blk_mq_alloc_request_pinned() could do a busy nop (but not a dead
      loop) if too many pending requests are allocated and new flush request
      is allocated. But this should not be a problem, too many pending flush
      requests are very rare case.
      
      I verified this can fix the deadlock caused by too many pending flush
      requests.
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f0276924
  10. 24 Nov, 2013 3 commits
    • Kent Overstreet's avatar
      block: submit_bio_wait() conversions · c170bbb4
      Kent Overstreet authored
      
      
      It was being open coded in a few places.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Acked-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c170bbb4
    • Kent Overstreet's avatar
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      
      
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
    • Kent Overstreet's avatar
      block: submit_bio_wait() conversions · 33879d45
      Kent Overstreet authored
      
      
      It was being open coded in a few places.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Acked-by: default avatarNeilBrown <neilb@suse.de>
      33879d45
  11. 28 Oct, 2013 1 commit
    • Christoph Hellwig's avatar
      blk-mq: fix for flush deadlock · 3228f48b
      Christoph Hellwig authored
      
      
      The flush state machine takes in a struct request, which then is
      submitted multiple times to the underling driver.  The old block code
      requeses the same request for each of those, so it does not have an
      issue with tapping into the request pool.  The new one on the other hand
      allocates a new request for each of the actualy steps of the flush
      sequence. If have already allocated all of the tags for IO, we will
      fail allocating the flush request.
      
      Set aside a reserved request just for flushes.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3228f48b
  12. 25 Oct, 2013 1 commit
    • Jens Axboe's avatar
      blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Jens Axboe authored
      
      
      Linux currently has two models for block devices:
      
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        driver generally have to manage everything themselves.
      
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS was rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      
      blk-mq provides various helper functions, which include:
      
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      
      - Timeout handling without tracking request on a per-device
        basis. Basically the driver should be able to get a notification,
        if a request happens to fail.
      
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      
      Contributions in this patch from the following people:
      
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      320ae51f
  13. 22 Mar, 2013 1 commit
  14. 15 Feb, 2013 1 commit
    • Vladimir Davydov's avatar
      block: account iowait time when waiting for completion of IO request · 5577022f
      Vladimir Davydov authored
      
      
      Using wait_for_completion() for waiting for a IO request to be executed
      results in wrong iowait time accounting. For example, a system having
      the only task doing write() and fdatasync() on a block device can be
      reported being idle instead of iowaiting as it should because
      blkdev_issue_flush() calls wait_for_completion() which in turn calls
      schedule() that does not increment the iowait proc counter and thus does
      not turn on iowait time accounting.
      
      The patch makes block layer use wait_for_completion_io() instead of
      wait_for_completion() where appropriate to account iowait time
      correctly.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5577022f
  15. 24 Oct, 2011 2 commits
  16. 15 Aug, 2011 1 commit
    • Jeff Moyer's avatar
      block: fix flush machinery for stacking drivers with differring flush flags · 4853abaa
      Jeff Moyer authored
      Commit ae1b1539
      
      , block: reimplement
      FLUSH/FUA to support merge, introduced a performance regression when
      running any sort of fsyncing workload using dm-multipath and certain
      storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
      dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
      that dm-multipath always advertised flush+fua support, and passed
      commands on down the stack, where those flags used to get stripped off.
      The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command would come in here, have
      REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
              REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
      		if (!has_fua)
      			rq->cmd_flags &= ~REQ_FUA;
      	        return rq;
      	}
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all.  Instead,
      __elv_next_request just hands a request with flush and fua bits set to
      the scsi_request_fn, even if the underlying request_queue does not
      support flush or fua.
      
      The agreed upon approach is to fix the flush machinery to allow
      stacking.  While this isn't used in practice (since there is only one
      request-based dm target, and that target will now reflect the flush
      flags of the underlying device), it does future-proof the solution, and
      make it function as designed.
      
      In order to make this work, I had to add a field to the struct request,
      inside the flush structure (to store the original req->end_io).  Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers.  So, I didn't see a way around the additional field.
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      4853abaa
  17. 09 Aug, 2011 1 commit
    • Jeff Moyer's avatar
      allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH · fa1bf42f
      Jeff Moyer authored
      
      
      blk_insert_flush has the following check:
      
      	/*
      	 * If there's data but flush is not necessary, the request can be
      	 * processed directly without going through flush machinery.  Queue
      	 * for normal execution.
      	 */
      	if ((policy & REQ_FSEQ_DATA) &&
      	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
      		list_add_tail(&rq->queuelist, &q->queue_head);
      		return;
      	}
      
      However, blk_flush_policy will not return with policy set to only
      REQ_FSEQ_DATA:
      
      static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
      {
      	unsigned int policy = 0;
      
      	if (fflags & REQ_FLUSH) {
      		if (rq->cmd_flags & REQ_FLUSH)
      			policy |= REQ_FSEQ_PREFLUSH;
      		if (blk_rq_sectors(rq))
      			policy |= REQ_FSEQ_DATA;
      		if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
      			policy |= REQ_FSEQ_POSTFLUSH;
      	}
      	return policy;
      }
      
      Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set.  Fix this
      mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH
      check.
      
      Tejun notes:
      
        Hmmm... yes, this can become a correctness issue if (and only if)
        blk_queue_flush() is called to change q->flush_flags while requests
        are in-flight; otherwise, requests wouldn't reach the function at all.
        Also, I think it would be a generally good idea to always set
        FSEQ_DATA if the request has data.
      
      Cheers,
      Jeff
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      fa1bf42f
  18. 06 May, 2011 1 commit
    • shaohua.li@intel.com's avatar
      block: hold queue if flush is running for non-queueable flush drive · 3ac0cc45
      shaohua.li@intel.com authored
      In some drives, flush requests are non-queueable. When flush request is
      running, normal read/write requests can't run. If block layer dispatches
      such request, driver can't handle it and requeue it.  Tejun suggested we
      can hold the queue when flush is running. This can avoid unnecessary
      requeue.  Also this can improve performance. For example, we have
      request flush1, write1, flush 2. flush1 is dispatched, then queue is
      hold, write1 isn't inserted to queue. After flush1 is finished, flush2
      will be dispatched. Since disk cache is already clean, flush2 will be
      finished very soon, so looks like flush2 is folded to flush1.
      
      In my test, the queue holding completely solves a regression introduced by
      commit 53d63e6b
      
      :
      
          block: make the flush insertion use the tail of the dispatch list
      
          It's not a preempt type request, in fact we have to insert it
          behind requests that do specify INSERT_FRONT.
      
      which causes about 20% regression running a sysbench fileio
      workload.
      
      Stable: 2.6.39 only
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarShaohua Li <shaohua.li@intel.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      3ac0cc45
  19. 18 Apr, 2011 1 commit
  20. 05 Apr, 2011 2 commits
  21. 10 Mar, 2011 2 commits
    • Jens Axboe's avatar
      block: remove per-queue plugging · 7eaceacc
      Jens Axboe authored
      
      
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So lets kill off the old plugging along with aops->sync_page().
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      7eaceacc
    • Jens Axboe's avatar
      block: initial patch for on-stack per-task plugging · 73c10101
      Jens Axboe authored
      
      
      This patch adds support for creating a queuing context outside
      of the queue itself. This enables us to batch up pieces of IO
      before grabbing the block device queue lock and submitting them to
      the IO scheduler.
      
      The context is created on the stack of the process and assigned in
      the task structure, so that we can auto-unplug it if we hit a schedule
      event.
      
      The current queue plugging happens implicitly if IO is submitted to
      an empty device, yet callers have to remember to unplug that IO when
      they are going to wait for it. This is an ugly API and has caused bugs
      in the past. Additionally, it requires hacks in the vm (->sync_page()
      callback) to handle that logic. By switching to an explicit plugging
      scheme we make the API a lot nicer and can get rid of the ->sync_page()
      hack in the vm.
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      73c10101
  22. 02 Mar, 2011 2 commits
    • Tejun Heo's avatar
      block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() · 255bb490
      Tejun Heo authored
      blk-flush decomposes a flush into sequence of multiple requests.  On
      completion of a request, the next one is queued; however, block layer
      must not implicitly call into q->request_fn() directly from completion
      path.  This makes the queue behave unexpectedly when seen from the
      drivers and violates the assumption that q->request_fn() is called
      with process context + queue_lock.
      
      This patch makes blk-flush the following two changes to make sure
      q->request_fn() is not called directly from request completion path.
      
      - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
        use kblockd instead of calling directly into q->request_fn().
      
      - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
        ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
        request queue directly.
      
      Reported by Jan in the following threads.
      
       http://thread.gmane.org/gmane.linux.ide/48778
       http://thread.gmane.org/gmane.linux.ide/48786
      
      
      
      stable: applicable to v2.6.37.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJan Beulich <JBeulich@novell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      255bb490
    • Tejun Heo's avatar
      block: add @force_kblockd to __blk_run_queue() · 1654e741
      Tejun Heo authored
      
      
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is prerequisite for fixing ide oops caused by the new
              blk-flush implementation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      1654e741
  23. 25 Jan, 2011 2 commits
    • Tejun Heo's avatar
      block: reimplement FLUSH/FUA to support merge · ae1b1539
      Tejun Heo authored
      
      
      The current FLUSH/FUA support has evolved from the implementation
      which had to perform queue draining.  As such, sequencing is done
      queue-wide one flush request after another.  However, with the
      draining requirement gone, there's no reason to keep the queue-wide
      sequential approach.
      
      This patch reimplements FLUSH/FUA support such that each FLUSH/FUA
      request is sequenced individually.  The actual FLUSH execution is
      double buffered and whenever a request wants to execute one for either
      PRE or POSTFLUSH, it queues on the pending queue.  Once certain
      conditions are met, a flush request is issued and on its completion
      all pending requests proceed to the next sequence.
      
      This allows arbitrary merging of different type of flushes.  How they
      are merged can be primarily controlled and tuned by adjusting the
      above said 'conditions' used to determine when to issue the next
      flush.
      
      This is inspired by Darrick's patches to merge multiple zero-data
      flushes which helps workloads with highly concurrent fsync requests.
      
      * As flush requests are never put on the IO scheduler, request fields
        used for flush share space with rq->rb_node.  rq->completion_data is
        moved out of the union.  This increases the request size by one
        pointer.
      
        As rq->elevator_private* are used only by the iosched too, it is
        possible to reduce the request size further.  However, to do that,
        we need to modify request allocation path such that iosched data is
        not allocated for flush requests.
      
      * FLUSH/FUA processing happens on insertion now instead of dispatch.
      
      - Comments updated as per Vivek and Mike.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: "Darrick J. Wong" <djwong@us.ibm.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      ae1b1539
    • Tejun Heo's avatar
      block: add REQ_FLUSH_SEQ · 414b4ff5
      Tejun Heo authored
      
      
      rq == &q->flush_rq was used to determine whether a rq is part of a
      flush sequence, which worked because all requests in a flush sequence
      were sequenced using the single dedicated request.  This is about to
      change, so introduce REQ_FLUSH_SEQ flag to distinguish flush sequence
      requests.
      
      This patch doesn't cause any behavior change.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      414b4ff5
  24. 16 Sep, 2010 1 commit
    • Christoph Hellwig's avatar
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig authored
      
      
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  25. 10 Sep, 2010 8 commits
    • Tejun Heo's avatar
      block: use REQ_FLUSH in blkdev_issue_flush() · d391a2dd
      Tejun Heo authored
      
      
      Update blkdev_issue_flush() to use new REQ_FLUSH interface.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      d391a2dd
    • Tejun Heo's avatar
      block: make sure FSEQ_DATA request has the same rq_disk as the original · 09d60c70
      Tejun Heo authored
      
      
      rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has
      passed through remapping drivers.  FSEQ_DATA request incorrectly
      followed bio->bi_bdev->bd_disk ending up being issued w/ mismatching
      rq_disk.  Make it follow orig_rq->rq_disk.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Tested-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      09d60c70
    • Tejun Heo's avatar
      block: kick queue after sequencing REQ_FLUSH/FUA · 47f70d5a
      Tejun Heo authored
      
      
      While completing a request from a REQ_FLUSH/FUA sequence, another
      request can be pushed to the request queue.  If a driver tests
      elv_queue_empty() before completing a request and runs the queue again
      only if the queue wasn't empty, this may lead to hang.  Please note
      that most drivers either kick the queue unconditionally or test queue
      emptiness after completing the current request and don't have this
      problem.
      
      This patch removes this possibility by making REQ_FLUSH/FUA sequence
      code kick the queue if the queue was empty before completing a request
      from REQ_FLUSH/FUA sequence.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      47f70d5a
    • Tejun Heo's avatar
      block: initialize flush request with WRITE_FLUSH instead of REQ_FLUSH · 337238be
      Tejun Heo authored
      
      
      init_flush_request() only set REQ_FLUSH when initializing flush
      requests making them READ requests.  Use WRITE_FLUSH instead.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      337238be
    • Christoph Hellwig's avatar
      block: simplify queue_next_fseq · cde4c406
      Christoph Hellwig authored
      
      
      We need to call blk_rq_init and elv_insert for all cases in queue_next_fseq,
      so take these calls into common code.  Also move the end_io initialization
      from queue_flush into queue_next_fseq and rename queue_flush to
      init_flush_request now that it's old name doesn't apply anymore.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      cde4c406
    • Tejun Heo's avatar
      block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests · 4fed947c
      Tejun Heo authored
      
      
      Now that the backend conversion is complete, export sequenced
      FLUSH/FUA capability through REQ_FLUSH/FUA flags.  REQ_FLUSH means the
      device cache should be flushed before executing the request.  REQ_FUA
      means that the data in the request should be on non-volatile media on
      completion.
      
      Block layer will choose the correct way of implementing the semantics
      and execute it.  The request may be passed to the device directly if
      the device can handle it; otherwise, it will be sequenced using one or
      more proxy requests.  Devices will never see REQ_FLUSH and/or FUA
      which it doesn't support.
      
      Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are
      never failed with -EOPNOTSUPP.  If the underlying device doesn't
      support FLUSH/FUA, the block layer simply make those noop.  IOW, it no
      longer distinguishes between writeback cache which doesn't support
      cache flush and writethrough/no cache.  Devices which have WB cache
      w/o flush are very difficult to come by these days and there's nothing
      much we can do anyway, so it doesn't make sense to require everyone to
      implement -EOPNOTSUPP handling.  This will simplify filesystems and
      block drivers as they can drop -EOPNOTSUPP retry logic for barriers.
      
      * QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into
        blk-flush.c.
      
      * REQ_FLUSH w/o data can also be directly passed to drivers without
        sequencing but some drivers assume that zero length requests don't
        have rq->bio which isn't true for these requests requiring the use
        of proxy requests.
      
      * REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are
        copied from bio to request.
      
      * WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and
        WRITE_FLUSH_FUA are added.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      4fed947c
    • Tejun Heo's avatar
      block: rename barrier/ordered to flush · dd4c133f
      Tejun Heo authored
      
      
      With ordering requirements dropped, barrier and ordered are misnomers.
      Now all block layer does is sequencing FLUSH and FUA.  Rename them to
      flush.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      dd4c133f
    • Tejun Heo's avatar
      block: rename blk-barrier.c to blk-flush.c · 8839a0e0
      Tejun Heo authored
      
      
      Without ordering requirements, barrier and ordering are minomers.
      Rename block/blk-barrier.c to block/blk-flush.c.  Rename of symbols
      will follow.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      8839a0e0