1. 21 Nov, 2018 1 commit
  2. 04 Oct, 2018 1 commit
  3. 26 Sep, 2018 1 commit
  4. 05 Jun, 2018 1 commit
  5. 29 Dec, 2017 1 commit
  6. 30 Oct, 2017 1 commit
    • drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects) · 23e87338
      Chris Wilson authored

      Kasan spotted
      
          [IGT] gem_tiled_pread_pwrite: exiting, ret=0
          ==================================================================
          BUG: KASAN: use-after-free in __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
          Read of size 8 at addr ffff8801359da310 by task kworker/3:2/182
      
          CPU: 3 PID: 182 Comm: kworker/3:2 Tainted: G     U          4.14.0-rc6-CI-Custom_3340+ #1
          Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
          Workqueue: events __i915_gem_free_work [i915]
          Call Trace:
           dump_stack+0x68/0xa0
           print_address_description+0x78/0x290
           ? __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
           kasan_report+0x23d/0x350
           __asan_report_load8_noabort+0x19/0x20
           __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
           ? i915_gem_object_truncate+0x100/0x100 [i915]
           ? lock_acquire+0x380/0x380
           __i915_gem_object_put_pages+0x30d/0x530 [i915]
           __i915_gem_free_objects+0x551/0xbd0 [i915]
           ? lock_acquire+0x13e/0x380
           __i915_gem_free_work+0x4e/0x70 [i915]
           process_one_work+0x6f6/0x1590
           ? pwq_dec_nr_in_flight+0x2b0/0x2b0
           worker_thread+0xe6/0xe90
           ? pci_mmcfg_check_reserved+0x110/0x110
           kthread+0x309/0x410
           ? process_one_work+0x1590/0x1590
           ? kthread_create_on_node+0xb0/0xb0
           ret_from_fork+0x27/0x40
      
          Allocated by task 1801:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xee/0x190
           kasan_slab_alloc+0x12/0x20
           kmem_cache_alloc+0xdc/0x2e0
           radix_tree_node_alloc.constprop.12+0x48/0x330
           __radix_tree_create+0x274/0x480
           __radix_tree_insert+0xa2/0x610
           i915_gem_object_get_sg+0x224/0x670 [i915]
           i915_gem_object_get_page+0xb5/0x1c0 [i915]
           i915_gem_pread_ioctl+0x822/0xf60 [i915]
           drm_ioctl_kernel+0x13f/0x1c0
           drm_ioctl+0x6cf/0x980
           do_vfs_ioctl+0x184/0xf30
           SyS_ioctl+0x41/0x70
           entry_SYSCALL_64_fastpath+0x1c/0xb1
      
          Freed by task 37:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xaf/0x190
           kmem_cache_free+0xbf/0x340
           radix_tree_node_rcu_free+0x79/0x90
           rcu_process_callbacks+0x46d/0xf40
           __do_softirq+0x21c/0x8d3
      
          The buggy address belongs to the object at ffff8801359da0f0
          which belongs to the cache radix_tree_node of size 576
          The buggy address is located 544 bytes inside of
          576-byte region [ffff8801359da0f0, ffff8801359da330)
          The buggy address belongs to the page:
          page:ffffea0004d67600 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
          flags: 0x8000000000008100(slab|head)
          raw: 8000000000008100 0000000000000000 0000000000000000 0000000100110011
          raw: ffffea0004b52920 ffffea0004b38020 ffff88015b416a80 0000000000000000
          page dumped because: kasan: bad access detected
      
          Memory state around the buggy address:
           ffff8801359da200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
           ffff8801359da280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
          >ffff8801359da300: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      			     ^
           ffff8801359da380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff8801359da400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ==================================================================
          Disabling lock debugging due to kernel taint
      
      which looks like the slab containing the radixtree iter was freed as we
      traversed the tree. Taking the rcu read lock across the loop should
      prevent that (deferring all the frees until the end).

      Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
      Fixes: 96d77634 ("drm/i915: Use a radixtree for random access to the object's backing storage")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171026130032.10677-1-chris@chris-wilson.co.uk
      Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
      (cherry picked from commit bea6e987)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
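The idea of "deferring all the frees until the end" can be sketched in userspace C. This is only a single-threaded toy model, not the kernel RCU API: `toy_rcu`, `toy_read_lock()`, `toy_read_unlock()` and `toy_retire()` are hypothetical names invented for illustration. Real RCU waits for a grace period covering all pre-existing readers on all CPUs. The sketch keeps only the property the commit relies on: a node retired while a reader is active stays valid until that reader leaves its read-side section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy, single-threaded model of deferred freeing. Hypothetical names. */
struct toy_node {
    int value;
    struct toy_node *next_free; /* links nodes waiting to be freed */
};

struct toy_rcu {
    int readers;               /* depth of active read-side sections */
    struct toy_node *pending;  /* nodes retired but not yet freed */
    int freed;                 /* number of nodes actually released */
};

/* Release every node on the pending list. */
static void toy_flush(struct toy_rcu *rcu)
{
    while (rcu->pending) {
        struct toy_node *n = rcu->pending;

        rcu->pending = n->next_free;
        free(n);
        rcu->freed++;
    }
}

static void toy_read_lock(struct toy_rcu *rcu)
{
    rcu->readers++;
}

/* Leaving the last read-side section lets the deferred frees run. */
static void toy_read_unlock(struct toy_rcu *rcu)
{
    assert(rcu->readers > 0);
    if (--rcu->readers == 0)
        toy_flush(rcu);
}

/* Retire a node: freed at once if nobody is reading, otherwise deferred. */
static void toy_retire(struct toy_rcu *rcu, struct toy_node *n)
{
    n->next_free = rcu->pending;
    rcu->pending = n;
    if (rcu->readers == 0)
        toy_flush(rcu);
}

static struct toy_node *toy_node_new(int value)
{
    struct toy_node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->value = value;
    n->next_free = NULL;
    return n;
}
```

In this model, a loop that walks the nodes between `toy_read_lock()` and `toy_read_unlock()` can still dereference a node that was retired mid-walk, which is the situation the KASAN report shows going wrong without the read lock.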
  7. 18 Oct, 2017 1 commit
  8. 09 Oct, 2017 1 commit
  9. 14 Sep, 2017 1 commit
    • mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko authored
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  Its
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic, it becomes much less clear when to use the
      high-level GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing callers provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with high-level semantics should be clearly defined to prevent
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      converting all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get the __GFP_RECLAIMABLE heuristic
      and so they will be placed properly for memory fragmentation prevention.

      I can see reasons we might want some gfp flag to reflect short-term
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) of its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 07 Sep, 2017 1 commit
    • drm/i915: wire up shrinkctl->nr_scanned · 912d572d
      Chris Wilson authored
      shrink_slab() allows us to report back the number of objects we
      successfully scanned (out of the target shrinkctl->nr_to_scan).  As we
      report the number of pages owned by each GEM object as a separate item
      to the shrinker, we cannot precisely control the number of shrinker
      objects we scan on each pass; and indeed may free more than requested.
      If we fail to tell the shrinker about the number of objects we process,
      it will continue to hold a grudge against us as any objects left
      unscanned are added to the next reclaim -- and so we will keep on
      "unfairly" shrinking our own slab in comparison to other slabs.
      
      Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
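The accounting described in this commit can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the i915 shrinker: `toy_shrink_control`, `toy_object` and `toy_scan()` are hypothetical names, and only the two counters mirror the `nr_to_scan` / `nr_scanned` fields named in the commit message. Each object contributes all of its pages at once, so the scan can overshoot the target, and the amount actually examined is written back for the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the control block passed to a shrinker. */
struct toy_shrink_control {
    unsigned long nr_to_scan;  /* target requested by the caller */
    unsigned long nr_scanned;  /* what we really examined, reported back */
};

/* Hypothetical buffer object: its whole page count is scanned as a unit. */
struct toy_object {
    unsigned long pages;
    int purgeable;  /* nonzero if its pages may be released */
    int freed;      /* nonzero once its pages have been released */
};

/*
 * Walk the objects until at least nr_to_scan pages have been examined.
 * Returns the number of pages freed and records the pages examined in
 * sc->nr_scanned, which may exceed the target.
 */
static unsigned long toy_scan(struct toy_object *objs, size_t count,
                              struct toy_shrink_control *sc)
{
    unsigned long freed = 0;
    size_t i;

    assert(sc != NULL);
    sc->nr_scanned = 0;

    for (i = 0; i < count && sc->nr_scanned < sc->nr_to_scan; i++) {
        struct toy_object *obj = &objs[i];

        if (obj->freed)
            continue;

        /* Every page of the object counts as scanned, freed or not. */
        sc->nr_scanned += obj->pages;

        if (obj->purgeable) {
            obj->freed = 1;
            freed += obj->pages;
        }
    }

    return freed;
}
```

With objects of 64, 16 and 32 pages and a target of 70, this model examines 80 pages before stopping. Reporting 80 rather than leaving the caller to assume some other figure is what keeps unscanned work from being carried over to the next reclaim pass.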
  11. 05 Sep, 2017 1 commit
  12. 30 Aug, 2017 3 commits
  13. 18 Aug, 2017 1 commit
    • drm/i915: Replace execbuf vma ht with an idr · d1b48c1e
      Chris Wilson authored

      This was the competing idea long ago, but it was only the rewrite
      of the idr as a radixtree and using the radixtree directly ourselves,
      along with the realisation that we can store the vma directly in the
      radixtree and only need a list for the reverse mapping, that made the
      patch performant enough to displace using a hashtable. Though the vma ht
      is fast and doesn't require any extra allocation (as we can embed the node
      inside the vma), it does require a thread for resizing and serialization
      and will have the occasional slow lookup. That is hairy enough to
      investigate alternatives and favour them if equivalent in peak performance.
      One advantage of allocating an indirection entry is that we can support a
      single shared bo between many clients, something that was done on a
      first-come first-serve basis for shared GGTT vma previously. To offset
      the extra allocations, we create yet another kmem_cache for them.

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
  14. 15 Aug, 2017 2 commits
  15. 27 Jul, 2017 9 commits
  16. 20 Jul, 2017 1 commit
  17. 12 Jul, 2017 1 commit
    • drm/i915: use __GFP_RETRY_MAYFAIL · dbb32956
      Michal Hocko authored
      Commit 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than
      incur the oom for gfx allocations") tried to avoid the disruptive OOM
      killer because userspace should be able to cope with allocation
      failures.
      
      At the time only __GFP_NORETRY could achieve that and it turned out that
      this would fail the allocations just too easily.  So "drm/i915: Remove
      __GFP_NORETRY from our buffer allocator" removed it and hoped for a
      better solution.  __GFP_RETRY_MAYFAIL is that solution.  It will keep
      retrying the allocation until there is no more progress and we would go
      OOM.  Instead we fail the allocation and let the caller deal with it.
      
      Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
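The difference between the two policies discussed in this commit can be sketched with a toy allocator model. It is an assumption-laden illustration, not the kernel page allocator: `toy_mem`, `toy_reclaim()`, `toy_alloc()` and the `TOY_*` policy names are hypothetical, and real reclaim is far more involved. The model captures only the behaviour the message describes: the no-retry policy gives up after one reclaim pass, while the retry-may-fail policy keeps going as long as reclaim makes progress and then returns failure instead of escalating to an OOM kill.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policies standing in for the two gfp behaviours. */
enum toy_policy { TOY_NORETRY, TOY_RETRY_MAYFAIL };

/* Hypothetical memory state, counted in pages. */
struct toy_mem {
    unsigned long free_pages;
    unsigned long reclaimable;   /* pages reclaim can still recover */
    unsigned long reclaim_batch; /* pages recovered per reclaim pass */
};

/* One reclaim pass; returns the number of pages recovered (the progress). */
static unsigned long toy_reclaim(struct toy_mem *m)
{
    unsigned long got = m->reclaimable < m->reclaim_batch ?
                        m->reclaimable : m->reclaim_batch;

    m->reclaimable -= got;
    m->free_pages += got;
    return got;
}

/* Returns 0 on success, -1 on failure (a stand-in for -ENOMEM). */
static int toy_alloc(struct toy_mem *m, unsigned long want,
                     enum toy_policy policy)
{
    int passes = 0;

    assert(m != NULL);

    for (;;) {
        if (m->free_pages >= want) {
            m->free_pages -= want;
            return 0;
        }

        /* The no-retry policy allows a single reclaim pass only. */
        if (policy == TOY_NORETRY && passes >= 1)
            return -1;

        /* No progress: fail the allocation rather than go OOM. */
        if (toy_reclaim(m) == 0)
            return -1;

        passes++;
    }
}
```

Starting from 0 free pages with 30 reclaimable in batches of 10, a request for 25 pages fails under `TOY_NORETRY` after one pass but succeeds under `TOY_RETRY_MAYFAIL` after three. With only 20 reclaimable pages, the retrying policy also fails, but only once reclaim stops making progress.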
  18. 28 Jun, 2017 2 commits
  19. 23 Jun, 2017 1 commit
    • drm/i915: Break modeset deadlocks on reset · 36703e79
      Chris Wilson authored
      Trying to do a modeset from within a reset is fraught with danger. We
      can fall into a cyclic deadlock where the modeset is waiting on a
      previous modeset that is waiting on a request, and since the GPU hung
      that request completion is waiting on the reset. As modesetting doesn't
      allow its locks to be broken and restarted, or for its *own* reset
      mechanism to take over the display, we have to do something very
      evil instead. If we detect that we are stuck waiting to prepare the
      display reset (by using a very simple timeout), resort to cancelling all
      in-flight requests and throwing the user data into /dev/null, which is
      marginally better than the driver locking up and keeping that data to
      itself.
      
      This is not a fix; this is just a workaround that unbreaks machines
      until we can resolve the deadlock in a way that doesn't lose data!
      
      v2: Move the retirement from set-wedged to the i915_reset() error path,
      after which we no longer need any delayed worker cleanup for
      i915_handle_error()
      v3: C abuse for syntactic sugar
      v4: Cover all waits with the timeout to catch more driver breakage
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=99093
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  20. 21 Jun, 2017 1 commit
  21. 20 Jun, 2017 3 commits
  22. 19 Jun, 2017 2 commits
    • drm/i915: Remove __GFP_NORETRY from our buffer allocator · ce2c5872
      Chris Wilson authored
      I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
      struggles with handling reclaim of our dirty buffers and relies on
      reclaim via kswapd. As a result, a single pass of direct reclaim is
      unreliable when i915 occupies the majority of available memory, and the
      only means of effectively waiting on kswapd to make progress is by not
      setting the __GFP_NORETRY flag and looping. That leaves us with the
      dilemma of invoking the oomkiller instead of propagating the allocation
      failure back to userspace where it can be handled more gracefully (one
      hopes).  In the future we may have __GFP_MAYFAIL to allow repeats up until
      we genuinely run out of memory and the oomkiller would have been invoked.
      Until then, let the oomkiller wreak havoc.
      
      v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
      v3: Update comments that direct reclaim only appears to be ignoring our
      dirty buffers!
      
      Fixes: 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
      Testcase: igt/gem_tiled_swapping
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit eaf41801)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Encourage our shrinker more when our shmemfs allocations fails · b8d5a9cc
      Chris Wilson authored
      Commit 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than
      incur the oom for gfx allocations") made the bold decision to try and
      avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
      failed after attempting to free enough buffer objects. In short, it
      appears we were giving up too easily (even before we start wondering if
      one pass of reclaim is as strong as we would like). Part of the problem
      is that if we only shrink just enough pages for our expected allocation,
      the likelihood of those pages becoming available to us is less than 100%.
      To counteract that we ask for twice the number of pages to be made
      available. Furthermore, we allow the shrinker to pull pages from the
      active list in later passes.
      
      v2: Be a little more cautious in paging out gfx buffers, and leave that
      to a more balanced approach from shrink_slab(). Important when combined
      with "drm/i915: Start writeback from the shrinker" as anything shrunk is
      immediately swapped out and so should be more conservative.
      
      Fixes: 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
      (cherry picked from commit 4846bf0c)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  23. 16 Jun, 2017 3 commits
    • drm/i915: Async GPU relocation processing · 7dd4f672
      Chris Wilson authored

      If the user requires patching of their batch or auxiliary buffers, we
      currently make the alterations on the cpu. If they are active on the GPU
      at the time, we wait under the struct_mutex for them to finish executing
      before we rewrite the contents. This happens if shared relocation trees
      are used between different contexts with separate address spaces (and the
      buffers then have different addresses in each): the 3D state will need
      to be adjusted between execution on each context. However, we don't need
      to use the CPU to do the relocation patching, as we could queue commands
      to the GPU to perform it and use fences to serialise the operation with
      the current activity and future - so the operation on the GPU appears
      just as atomic as performing it immediately. Performing the relocation
      rewrites on the GPU is not free: in terms of pure throughput, the number
      of relocations/s is about halved - but more importantly so is the time
      under the struct_mutex.
      
      v2: Break out the request/batch allocation for clearer error flow.
      v3: A few asserts to ensure rq ordering is maintained

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Chris Wilson authored

      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson authored

      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>