    cfq: Disable writeback throttling by default · 142bbdfc

    Writeback throttling does not play well with CFQ since CFQ also tries
    to throttle async writes. As a result, async writeback can get starved
    in the presence of readers. As an example, take a benchmark simulating
    a PostgreSQL database running on a standard rotating SATA drive. There
    are 16 processes doing random reads from a huge file (2*machine
    memory), 1 process doing random writes to the huge file and calling
    fsync once per 50000 writes, and 1 process doing sequential 8k writes
    to a relatively small file, wrapping around at the end of the file and
    calling fsync every 5 writes (a sketch of this writer appears after
    the trace analysis below). Under this load, read latency easily
    exceeds the target latency of 75 ms (simply because there are so many
    reads hitting a relatively slow disk), and thus writeback is throttled
    to the point where only one write request is allowed at a time.
    Blktrace data then looks like:
    
      8,0    1        0     8.347751764     0  m   N cfq workload slice:40000000
      8,0    1        0     8.347755256     0  m   N cfq293A  / set_active wl_class: 0 wl_type:0
      8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    1     3814     8.347763916  5839 UT   N [kworker/u9:2] 1
      8,0    0        0     8.347777605     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    3     1596     8.354364057     0  C   R 156109528 + 8 (6906954) [0]
      8,0    3        0     8.354383193     0  m   N cfq6196SN / complete rqnoidle 0
      8,0    3        0     8.354386476     0  m   N cfq schedule dispatch
      8,0    3        0     8.354399397     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    3        0     8.354404705     0  m   N cfq293A  / dispatch_insert
      8,0    3        0     8.354409454     0  m   N cfq293A  / dispatched a request
      8,0    3        0     8.354412527     0  m   N cfq293A  / activate rq, drv=1
      8,0    3     1597     8.354414692     0  D   W 145961400 + 24 (6718452) [swapper/0]
      8,0    3        0     8.354484184     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    3        0     8.354487536     0  m   N cfq293A  / slice expired t=0
      8,0    3        0     8.354498013     0  m   N / served: vt=5888102466265088 min_vt=5888074869387264
      8,0    3        0     8.354502692     0  m   N cfq293A  / sl_used=6737519 disp=1 charge=6737519 iops=0 sect=24
      8,0    3        0     8.354505695     0  m   N cfq293A  / del_from_rr
    ...
      8,0    0     1810     8.354728768     0  C   W 145961400 + 24 (314076) [0]
      8,0    0        0     8.354746927     0  m   N cfq293A  / complete rqnoidle 0
    ...
      8,0    1     3829     8.389886102  5839  G   W 145962968 + 24 [kworker/u9:2]
      8,0    1     3830     8.389888127  5839  P   N [kworker/u9:2]
      8,0    1     3831     8.389908102  5839  A   W 145978336 + 24 <- (8,4) 44000
      8,0    1     3832     8.389910477  5839  Q   W 145978336 + 24 [kworker/u9:2]
      8,0    1     3833     8.389914248  5839  I   W 145962968 + 24 (28146) [kworker/u9:2]
      8,0    1        0     8.389919137     0  m   N cfq293A  / insert_request
      8,0    1        0     8.389924305     0  m   N cfq293A  / add_to_rr
      8,0    1     3834     8.389933175  5839 UT   N [kworker/u9:2] 1
    ...
      8,0    0        0     9.455290997     0  m   N cfq workload slice:40000000
      8,0    0        0     9.455294769     0  m   N cfq293A  / set_active wl_class:0 wl_type:0
      8,0    0        0     9.455303499     0  m   N cfq293A  / fifo=ffff880003166090
      8,0    0        0     9.455306851     0  m   N cfq293A  / dispatch_insert
      8,0    0        0     9.455311251     0  m   N cfq293A  / dispatched a request
      8,0    0        0     9.455314324     0  m   N cfq293A  / activate rq, drv=1
      8,0    0     2043     9.455316210  6204  D   W 145962968 + 24 (1065401962) [pgioperf]
      8,0    0        0     9.455392407     0  m   N cfq293A  / Not idling.  st->count:1
      8,0    0        0     9.455395969     0  m   N cfq293A  / slice expired t=0
      8,0    0        0     9.455404210     0  m   N / served: vt=5888958194597888 min_vt=5888941810597888
      8,0    0        0     9.455410077     0  m   N cfq293A  / sl_used=4000000 disp=1 charge=4000000 iops=0 sect=24
      8,0    0        0     9.455416851     0  m   N cfq293A  / del_from_rr
    ...
      8,0    0     2045     9.455648515     0  C   W 145962968 + 24 (332305) [0]
      8,0    0        0     9.455668350     0  m   N cfq293A  / complete rqnoidle 0
    ...
      8,0    1     4371     9.455710115  5839  G   W 145978336 + 24 [kworker/u9:2]
      8,0    1     4372     9.455712350  5839  P   N [kworker/u9:2]
      8,0    1     4373     9.455730159  5839  A   W 145986616 + 24 <- (8,4) 52280
      8,0    1     4374     9.455732674  5839  Q   W 145986616 + 24 [kworker/u9:2]
      8,0    1     4375     9.455737563  5839  I   W 145978336 + 24 (27448) [kworker/u9:2]
      8,0    1        0     9.455742871     0  m   N cfq293A  / insert_request
      8,0    1        0     9.455747550     0  m   N cfq293A  / add_to_rr
      8,0    1     4376     9.455756629  5839 UT   N [kworker/u9:2] 1
    
    So we can see a Q event for a write request, then IO is blocked by
    writeback throttling, and the G and I events for the request happen
    only once other writeback IO has completed. Thus CFQ always sees only
    one write request at a time. When it sees it, it queues the async
    queue behind all the read queues, and the async queue gets scheduled
    after about one second. When it is scheduled, that one request gets
    dispatched and the async queue is expired as it has no more requests
    to submit. Overall we submit about one write request per second.
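
    For concreteness, the small-file writer in the benchmark above roughly
    follows this pattern: sequential 8k writes that wrap around at the end
    of the file, with an fsync() every 5 writes. The file name and file
    size below are assumptions for illustration; this is a minimal sketch,
    not the actual benchmark source.

      /* Minimal sketch of the benchmark's small-file sequential writer. */
      #include <fcntl.h>
      #include <string.h>
      #include <sys/types.h>
      #include <unistd.h>

      #define BUF_SIZE   8192                   /* 8k sequential writes */
      #define FILE_SIZE  (64UL * 1024 * 1024)   /* assumed "small" file size */

      int main(void)
      {
              char buf[BUF_SIZE];
              off_t off = 0;
              unsigned long n;
              int fd;

              memset(buf, 'x', sizeof(buf));
              fd = open("smallfile", O_WRONLY | O_CREAT, 0644);
              if (fd < 0)
                      return 1;

              for (n = 0; ; n++) {              /* run until killed */
                      if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
                              break;
                      off += sizeof(buf);
                      if (off >= FILE_SIZE)     /* wrap around at end of file */
                              off = 0;
                      if ((n + 1) % 5 == 0)     /* fsync every 5 writes */
                              fsync(fd);
              }
              close(fd);
              return 0;
      }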
    
    Although this scheduling is beneficial for read latency, writes are
    heavily starved, and this causes large delays all over the system (due
    to processes blocking on page locks, transaction starts, etc.). When
    writeback throttling is disabled, write throughput is about one fifth
    of the read throughput, which roughly matches the readers/writers
    ratio, and overall system stalls are much shorter.
    
    Mixing writeback throttling logic with CFQ's own throttling logic is
    always a recipe for surprises, as CFQ assumes it sees most of the
    picture, which is not necessarily true when writeback throttling is
    blocking requests. So disable writeback throttling logic by default
    when CFQ is used as the IO scheduler.
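
    The shape of the change, sketched below, is small: when CFQ is set up
    on a request queue, it asks the writeback throttling code to disable
    itself for that queue. The hook and helper names used in the sketch
    (cfq_registered_queue(), wbt_disable()) are written from memory of the
    blk-wbt interface of that era and are illustrative rather than the
    exact diff:

      /*
       * Illustrative sketch only, not the exact patch: when the CFQ
       * elevator is registered on a queue, switch off writeback
       * throttling for that queue.
       */
      static void cfq_registered_queue(struct request_queue *q)
      {
              /*
               * CFQ already throttles async writes against reads itself;
               * writeback throttling on top of that only hides requests
               * from CFQ, so turn it off for this queue.
               */
              wbt_disable(q->rq_wb);
      }

    Independently of this default, the writeback throttling target can
    still be inspected and tuned at runtime through
    /sys/block/<dev>/queue/wbt_lat_usec (writing 0 disables throttling for
    that device).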
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jens Axboe <axboe@fb.com>