    Jens Axboe
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      No intended functional changes in this commit.
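The packing described above can be illustrated with a minimal userspace sketch. The bit widths, the `sketch_` names, and the flag values below are assumptions made for illustration only; they are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the top 3 bits hold the op, the rest hold flags. */
#define SKETCH_OP_BITS   3u
#define SKETCH_OP_SHIFT  (32u - SKETCH_OP_BITS)
#define SKETCH_FLAG_MASK ((UINT32_C(1) << SKETCH_OP_SHIFT) - 1u)

/* Hypothetical op codes and flags, not the kernel's values. */
enum sketch_op { SKETCH_OP_READ = 0, SKETCH_OP_WRITE = 1, SKETCH_OP_DISCARD = 2 };
#define SKETCH_FLAG_SYNC (UINT32_C(1) << 0)
#define SKETCH_FLAG_META (UINT32_C(1) << 1)

struct sketch_bio {
    uint32_t bi_opf; /* op in the high bits, flags in the low bits */
};

/* Store the op and flags together, rejecting values that would overlap. */
static inline void sketch_set_op_attrs(struct sketch_bio *bio,
                                       enum sketch_op op, uint32_t flags)
{
    assert((uint32_t)op < (UINT32_C(1) << SKETCH_OP_BITS));
    assert((flags & ~SKETCH_FLAG_MASK) == 0);
    bio->bi_opf = ((uint32_t)op << SKETCH_OP_SHIFT) | flags;
}

/* Extract the op from the high bits. */
static inline enum sketch_op sketch_bio_op(const struct sketch_bio *bio)
{
    return (enum sketch_op)(bio->bi_opf >> SKETCH_OP_SHIFT);
}

/* Extract the flags from the low bits. */
static inline uint32_t sketch_bio_flags(const struct sketch_bio *bio)
{
    return bio->bi_opf & SKETCH_FLAG_MASK;
}
```

Under this assumed layout, old-style code that assigns `SKETCH_OP_WRITE | SKETCH_FLAG_SYNC` directly to the field would decode as a read, which is the kind of silent runtime breakage the rename turns into a compile error.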
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Jens Axboe
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe authored
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Jens Axboe
      block: shrink struct bio down to 2 cache lines again · 2c68f6dc
      Jens Axboe authored
      Commit bcf2843b3f8f added ->bi_error to clean up the error passing
      for struct bio, but that ended up adding 4 bytes and a 4 byte hole
      to the size of struct bio. For a clean config, that bumped it from
      128 bytes to 136 bytes on x86-64.
      The ->bi_flags member is currently an unsigned long, but it fits
      easily within an int. Change it to an unsigned int, adjust the
      pool offset code, and move ->bi_error into the new hole. Then
      we end up with a 128 byte bio again.
      Change the bio flag set/clear to use cmpxchg to ensure we don't
      lose any flags when manipulating them.
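A compare-and-exchange loop for flag updates might look roughly like the following userspace sketch. It uses C11 atomics rather than the kernel's cmpxchg(), and the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flags word of struct bio. */
struct sketch_bio {
    _Atomic unsigned int bi_flags;
};

/* Set one flag bit; retry if another thread changed the word meanwhile. */
static inline void sketch_bio_set_flag(struct sketch_bio *bio, unsigned int bit)
{
    assert(bit < 32);
    unsigned int old = atomic_load(&bio->bi_flags);
    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(&bio->bi_flags, &old,
                                         old | (1u << bit)))
        ;
}

/* Clear one flag bit with the same retry loop. */
static inline void sketch_bio_clear_flag(struct sketch_bio *bio, unsigned int bit)
{
    assert(bit < 32);
    unsigned int old = atomic_load(&bio->bi_flags);
    while (!atomic_compare_exchange_weak(&bio->bi_flags, &old,
                                         old & ~(1u << bit)))
        ;
}

/* Test whether a flag bit is currently set. */
static inline bool sketch_bio_flagged(struct sketch_bio *bio, unsigned int bit)
{
    assert(bit < 32);
    return (atomic_load(&bio->bi_flags) & (1u << bit)) != 0;
}
```

Because each update is computed from the value that was actually observed, a concurrent change to a different bit is never overwritten.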
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Jens Axboe
      block: manipulate bio->bi_flags through helpers · b7c44ed9
      Jens Axboe authored
      Some places use helpers now, others don't. We only have the 'is set'
      helper, add helpers for setting and clearing flags too.
      It was a bit of a mess of atomic vs non-atomic access. With
      BIO_UPTODATE gone, we don't have any risk of concurrent access to the
      flags. So relax the restriction and don't make any of them atomic. The
      flags that do have serialization issues (reffed and chained) are
      already handled separately.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Christoph Hellwig
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig authored
      Currently we have two different ways to signal an I/O error on a BIO:
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
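The idea of carrying the errno in the bio itself, so that it survives queuing and travels from child to parent, can be sketched in userspace as follows. This is a simplified model that assumes each parent has exactly one child; the `parent` link, the recording callback, and all `sketch_` names are hypothetical and are not the kernel's chaining code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bio. */
struct sketch_bio {
    int bi_error;                           /* 0 or a negative errno */
    struct sketch_bio *parent;              /* hypothetical chain link */
    void (*bi_end_io)(struct sketch_bio *); /* completion callback */
};

/* Example callback: remember the error seen at completion time. */
static int sketch_last_error;
static void sketch_record_error(struct sketch_bio *bio)
{
    sketch_last_error = bio->bi_error;
}

/*
 * Complete a bio and walk up the chain. The error lives in the bio,
 * so the callback takes no separate error argument, and a child's
 * error is copied to a parent that has not failed itself.
 */
static void sketch_bio_endio(struct sketch_bio *bio)
{
    while (bio) {
        struct sketch_bio *parent = bio->parent;

        assert(bio->bi_error <= 0);
        if (parent && bio->bi_error && !parent->bi_error)
            parent->bi_error = bio->bi_error;
        if (bio->bi_end_io)
            bio->bi_end_io(bio);
        bio = parent;
    }
}
```

A submitter then only has to inspect one field in its completion callback, and can see errors other than -EIO.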
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Christoph Hellwig
      block, dm: don't copy bios for request clones · 5f1b670d
      Christoph Hellwig authored
      Currently dm-multipath has to clone the bios for every request sent
      to the lower devices, which wastes cpu cycles and ties down memory.
      This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
      to not complete bios attached to a request, which we set on clone
      requests similar to bios in a flush sequence.  With this change I/O
      errors on a path failure only get propagated to dm-multipath, which
      can then either resubmit the I/O or complete the bios on the original
      request.
      I've done some basic testing of this on a Linux target with ALUA support,
      and it survives path failures during I/O nicely.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Jens Axboe
      bio: skip atomic inc/dec of ->bi_cnt for most use cases · dac56212
      Jens Axboe authored
      Struct bio has a reference count that controls when it can be freed.
      The most common use case is allocating the bio, which then returns with a
      single reference to it, doing IO, and then dropping that single
      reference. We can remove this atomic_dec_and_test() in the completion
      path, if nobody else is holding a reference to the bio.
      If someone does call bio_get() on the bio, then we flag the bio as
      now having a valid count and that we must properly honor the reference
      count when it's being put.
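The fast path described here could be modelled in userspace along these lines. The field names, the `freed` marker used in place of actually freeing memory, and the helper names are assumptions for illustration; the real kernel code also needs memory barriers that this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount-related parts of struct bio. */
struct sketch_bio {
    bool reffed;    /* set once someone takes an extra reference */
    atomic_int cnt; /* only consulted when 'reffed' is set */
    bool freed;     /* stands in for actually freeing the bio */
};

/* A freshly allocated bio holds a single implicit reference. */
static void sketch_bio_init(struct sketch_bio *bio)
{
    bio->reffed = false;
    atomic_init(&bio->cnt, 1);
    bio->freed = false;
}

/* Taking an extra reference marks the count as valid from now on. */
static void sketch_bio_get(struct sketch_bio *bio)
{
    bio->reffed = true;
    atomic_fetch_add(&bio->cnt, 1);
}

/* Dropping a reference skips the atomic op when nobody called get. */
static void sketch_bio_put(struct sketch_bio *bio)
{
    if (!bio->reffed) {
        bio->freed = true; /* common case: no atomic decrement needed */
        return;
    }
    int old = atomic_fetch_sub(&bio->cnt, 1);
    assert(old > 0);
    if (old == 1)
        bio->freed = true;
}
```

In the allocate, do IO, put sequence the atomic decrement is never executed; it only comes into play after an explicit get.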
      Tested-by: Robert Elliott <elliott@hp.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Jens Axboe
      bio: skip atomic inc/dec of ->bi_remaining for non-chains · c4cf5261
      Jens Axboe authored
      Struct bio has an atomic ref count for chained bio's, and we use this
      to know when to end IO on the bio. However, most bio's are not chained,
      so we don't need to always introduce this atomic operation as part of
      ending IO.
      Add a helper to elevate the bi_remaining count, and flag the bio as
      now actually needing the decrement at end_io time. Rename the field
      to __bi_remaining to catch any current users of this doing the
      incrementing manually.
      For high IOPS workloads, this reduces the overhead of bio_endio().
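A rough userspace sketch of the helper pair described above follows. The names are hypothetical, and the plain boolean flag ignores the memory ordering that real concurrent code would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the chaining-related parts of struct bio. */
struct sketch_bio {
    bool chained;         /* set once the bio takes part in a chain */
    atomic_int remaining; /* only decremented when 'chained' is set */
};

static void sketch_bio_init(struct sketch_bio *bio)
{
    bio->chained = false;
    atomic_init(&bio->remaining, 1);
}

/* Elevate the count and flag the bio as needing the decrement later. */
static void sketch_bio_inc_remaining(struct sketch_bio *bio)
{
    bio->chained = true;
    atomic_fetch_add(&bio->remaining, 1);
}

/* Return true when the bio may complete; skip the atomic if unchained. */
static bool sketch_bio_remaining_done(struct sketch_bio *bio)
{
    if (!bio->chained)
        return true;
    int old = atomic_fetch_sub(&bio->remaining, 1);
    assert(old > 0);
    if (old == 1) {
        bio->chained = false;
        return true;
    }
    return false;
}
```

Routing every increment through one helper is what lets the flag stay in sync with the counter, which is why the field rename is useful for catching callers that bump the count by hand.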
      Tested-by: Robert Elliott <elliott@hp.com>
      Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Shaohua Li
      blk-mq: fix FUA request hang · b2387ddc
      Shaohua Li authored
      When a FUA request enters the DATA stage of the flush pipeline, the
      request is added to the mq requeue list and will then be added to
      ctx->rq_list. blk_mq_attempt_merge() might merge the request with a bio.
      Later, when the request has finished the flush pipeline,
      request->__data_len is 0. Then I only saw endio get called for the bio;
      the original request never finished.
      Adding REQ_FLUSH_SEQ to REQ_NOMERGE_FLAGS looks like an easy fix.
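The effect of the fix can be sketched as a mask check. The bit positions and the other members of the mask below are assumptions for illustration; only the inclusion of the flush-sequence flag in the no-merge mask comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments, not the kernel's values. */
#define SKETCH_REQ_NOMERGE   (1u << 0)
#define SKETCH_REQ_STARTED   (1u << 1)
#define SKETCH_REQ_FLUSH     (1u << 2)
#define SKETCH_REQ_FUA       (1u << 3)
#define SKETCH_REQ_FLUSH_SEQ (1u << 4)

/* Requests carrying any of these flags must not be merged with a bio. */
#define SKETCH_REQ_NOMERGE_FLAGS                                   \
    (SKETCH_REQ_NOMERGE | SKETCH_REQ_STARTED | SKETCH_REQ_FLUSH | \
     SKETCH_REQ_FUA | SKETCH_REQ_FLUSH_SEQ)

/* Compile-time check that the flush-sequence flag is part of the mask. */
static_assert((SKETCH_REQ_NOMERGE_FLAGS & SKETCH_REQ_FLUSH_SEQ) != 0,
              "flush-sequence requests must be excluded from merging");

/* A request is a merge candidate only if no no-merge flag is set. */
static inline bool sketch_rq_mergeable(unsigned int cmd_flags)
{
    return (cmd_flags & SKETCH_REQ_NOMERGE_FLAGS) == 0;
}
```

With the flag in the mask, a request that is partway through the flush sequence is skipped by the merge path even after it has been requeued.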
      stable: 3.15+
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    Bart Van Assche
      Defer processing of REQ_PREEMPT requests for blocked devices · bba0bdd7
      Bart Van Assche authored
      SCSI transport drivers and SCSI LLDs block a SCSI device if the
      transport layer is not operational. This means that in this state
      no requests should be processed, even if the REQ_PREEMPT flag has
      been set. This patch prevents a rescan shortly after a cable
      pull from sporadically triggering the following kernel oops:
      BUG: unable to handle kernel paging request at ffffc9001a6bc084
      IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
      Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
      Call Trace:
       [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
       [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
       [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
       [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
       [<ffffffff81223b37>] __blk_run_queue+0x27/0x30
       [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
       [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
       [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
       [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
       [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
       [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
       [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
       [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
       [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
       [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
       [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
       [<ffffffff811589de>] vfs_write+0xce/0x140
       [<ffffffff81158b53>] sys_write+0x53/0xa0
       [<ffffffff81464592>] system_call_fastpath+0x16/0x1b
       [<00007f611c9d9300>] 0x7f611c9d92ff
      Reported-by: Max Gurtuvoy <maxg@mellanox.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Odin.com>
