1. 06 Nov, 2022 2 commits
  2. 05 Nov, 2022 38 commits
    • evl/factory: use refcount_t for tracking references to elements · b57305b9
      Philippe Gerum authored
      This makes the logic clearer and less expensive.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/monitor: fix racy access to wchan in targeted signaling · 33c89468
      Philippe Gerum authored
      We may not retrieve the wait channel a thread pends on, then base a
      decision to signal it on that information, all locklessly: this is
      racy. We must pin the wait channel while doing so.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • x86: dovetail: mark invalid_op entry · 7ae31365
      Philippe Gerum authored
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: reschedule on failed locking attempt · 70bcf9f8
      Philippe Gerum authored
      Since the scheduler state might change as a result of undoing the PI
      walk after a failed blocking attempt to lock a mutex, we have to call
      evl_schedule() at the first opportunity before returning on error from
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: undo PI walk on wait error · 49d5172d
      Philippe Gerum authored
      On timeout/break error waiting for __evl_unlock_mutex() to wake us up,
      we have to undo the effects of the last PI walk, adjusting the
      priority of waiters in the chain according to the current boost
      priority of the mutex.
      Failing to do so would leave members of that PI chain indefinitely
      boosted after the thread has stopped waiting for the mutex.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/wait: lockdep: set lock debugging dependency on CONFIG_LOCKDEP · a254f180
      Philippe Gerum authored
      CONFIG_PROVE_LOCKING depends on the lockdep infrastructure, and the
      hard lock debugging feature directly depends on the latter; so does
      our wait/mutex support, hence the dependency on CONFIG_LOCKDEP.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/wait: drop legacy reordering interface · f1aae605
      Philippe Gerum authored
      This patch completes the series fixing the ABBA issue which plagued
      the mutex support code, by removing the last bits of the (broken)
      legacy infrastructure which exhibited this bug.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/thread: add deferred wchan requeuing · 93321058
      Philippe Gerum authored
      Changing the priority and/or scheduling policy of a thread requires
      requeuing it in the wait channel it may be pending on at the time of
      that change. In turn, such requeuing may require walking the PI
      chain. Meanwhile, we have to abide by the common locking order,
      which is stated as: wchan->lock first, thread->lock nested.
      Any other order would certainly trigger an ABBA locking issue. For
      this reason, we cannot have evl_set_thread_schedparam_locked() perform
      such requeuing immediately, only someone else in the call stack may do
      so later when the target thread and rq locks have been dropped.
      To address this, we introduce the T_WCHAN flag to mark a thread which
      needs a deferred requeuing after its (weighted) scheduling priority
      has changed, handling it at the first opportunity wherever
      applicable, by calling evl_adjust_wait_priority().
      CAUTION: to prevent any priority inversion, all code between the
      moment T_WCHAN is set and the subsequent call to
      evl_adjust_wait_priority() must run with irqs off.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: switch to sleep-and-retry locking pattern · 8fd59826
      Philippe Gerum authored
      Instead of having evl_unlock_mutex() eagerly transfer the ownership of
      a mutex to the next waiter in line directly, wake the latter up and
      let it retry the acquisition until it succeeds or an unrecoverable
      error happens. This sleep-and-retry pattern has key upsides:
      - it does not create any race with a preemptible PI walk running
      concurrently, with respect to the current owner of a mutex,
      simplifying and ironing out the logic there.
      - it removes the requirement for handling specifically the mutex steal
      case (T_ROBBED+T_WAKEN), where a high priority thread would be allowed
      to snatch the ownership until the current owner actually resumes
      execution after unblock. This property is naturally enforced by the
      new locking pattern, there is no need for specific code since
      ownership is always grabbed by the acquiring thread.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: fix PI walk vs unlock race · 733b00dc
      Philippe Gerum authored
      When dropping a mutex, the next owner its ownership is transferred to
      must be unblocked by a call to evl_wakeup_thread() before
      mutex->wchan.lock is released in __evl_unlock_mutex().
      Otherwise, we would still have owner->wchan pointing at mutex, which
      would confuse walk_pi_chain() as it might access the unlocked mutex
      state in the meantime, observing waiter->wchan == &mutex->wchan AND
      mutex->wchan.owner == waiter at the same time, which makes no
      sense. The PI walk would then issue a recursive spinlocking request,
      in an attempt to double-lock the (waiter, owner) pair pointing at the
      same thread.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: fix unlock vs lock race · 6557fbd8
      Philippe Gerum authored
      In a locking attempt, walking the PI chain is preemptible by other
      CPUs since mutex->wchan.lock may be released during the process, so
      the latest owner of the mutex might have dropped it while we were busy
      progressing down this chain.
      Do not block on the mutex if it has no owner on exit from the PI
      chain walk; drop out of the wait list, then retry the locking attempt
      from the start instead.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: fix mutex counting for T_WOLI|T_WEAK · f70f9b28
      Philippe Gerum authored
      When T_WOLI/T_WEAK is set for a thread acquiring or releasing a mutex,
      the count of mutexes it holds is tracked. Problem is that in some
      cases, a thread may release a mutex while bearing these bits, although
      it did not have them set when acquiring it, leading to imbalance in
      counting and general havoc due to any decision based on that
      information afterwards.
      We fix this by marking every mutex which participates in this
      counting with the new EVL_MUTEX_COUNTED flag, keeping the accounting
      accurate.
      Among several issues, this fixes the following kernel splat observed
      on armv7, caused by evl_drop_current_ownership() being prevented from
      unlocking a mutex because of a bad tracking count, which in turn
      would cause its caller to loop indefinitely:
      [   52.576621] WARNING: CPU: 1 PID: 249 at kernel/evl/mutex.c:1383 evl_drop_current_ownership+0x50/0x7c
      [   52.585878] Modules linked in:
      [   52.589006] CPU: 1 PID: 249 Comm: sched-quota-acc Not tainted 5.15.64-00687-g07ee2d34-dirty #572
      [   52.598170] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      [   52.604718] IRQ stage: Linux
      [   52.607625] [<c0111c28>] (unwind_backtrace) from [<c010bc44>] (show_stack+0x10/0x14)
      [   52.615411] [<c010bc44>] (show_stack) from [<c0ebaedc>] (dump_stack_lvl+0xac/0xfc)
      [   52.623017] [<c0ebaedc>] (dump_stack_lvl) from [<c01271e4>] (__warn+0xd4/0x154)
      [   52.630357] [<c01271e4>] (__warn) from [<c0eb5018>] (warn_slowpath_fmt+0x60/0xbc)
      [   52.637877] [<c0eb5018>] (warn_slowpath_fmt) from [<c02466b4>] (evl_drop_current_ownership+0x50/0x7c)
      [   52.647130] [<c02466b4>] (evl_drop_current_ownership) from [<c024dc50>] (cleanup_current_thread+0x60/0x3f4)
      [   52.656903] [<c024dc50>] (cleanup_current_thread) from [<c024e83c>] (put_current_thread+0x24/0xc8)
      [   52.665891] [<c024e83c>] (put_current_thread) from [<c025241c>] (thread_ioctl+0xa4/0xd0)
      [   52.674011] [<c025241c>] (thread_ioctl) from [<c0316bc8>] (sys_ioctl+0x5a8/0xef4)
      [   52.681531] [<c0316bc8>] (sys_ioctl) from [<c0100080>] (ret_fast_syscall+0x0/0x1c)
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: fix lock imbalance detection · 9a40d39e
      Philippe Gerum authored
      Depending on the caller undergoing the SCHED_WEAK policy to check
      for lock imbalance excludes the common case where threads don't.
      The proper way to detect lock imbalance is to make sure the atomic
      handle of a mutex matches the current unlocker, from the outer call
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: skip mutex transfer to unblocked waiter · 151f7601
      Philippe Gerum authored
      A thread which has been forcibly unblocked while waiting for a mutex
      might still be linked to the mutex wait list, until it resumes in
      wait_mutex_schedule() eventually.
      Let's detect this case by transferring the mutex to a waiter only if
      it still pends on it.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: finalize conversion to set_mutex_owner() · 90dc0379
      Philippe Gerum authored
      Convert the last callers of set_current_owner[_locked]() to using
      set_mutex_owner() instead, namely the lazy ceiling code for priority
      protection and the fast mutex acquisition paths (trylock, uncontended
      As a result, we remove the set_current_owner[_locked]() API entirely.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: use two-phase tracking on release · d4a06d19
      Philippe Gerum authored
      Releasing a mutex may involve transferring ownership to the next
      waiter in line. This entails the same actions as performed when
      locking it, i.e. adding the PP boost if need be and propagating this
      change to the wait channel the (new) owner pends on, which are two
      separate operations running under distinct locking conditions.
      To this end, update transfer_ownership() and __evl_unlock_mutex() to
      use set_mutex_owner() like evl_lock_mutex_timeout() does, with a split
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: introduce two-phase mutex tracking operations · 6cb980a3
      Philippe Gerum authored
      Mutexes are tracked by their respective owners in the per-thread
      owned_mutex list. Adding a new mutex to this list should then trigger
      an update of the wait channel priority for the owner/current thread,
      in case a priority protection boost is applicable to the thread
      currently owning this mutex.
      However, adding the boost and propagating the change to the wait
      channel are two separate operations, which run under distinct locking
      conditions.
      To this end, introduce set_mutex_owner() which open codes most of the
      work performed by the legacy set_current_owner_locked() except the
      wait channel update, telling the caller about the need for running the
      latter afterwards.
      Eventually, fix up evl_lock_mutex_timeout() in order to use
      set_mutex_owner() instead of set_current_owner_locked() accordingly.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/thread: hide exiting threads from sysfs attribute files · 63bd7cd7
      Philippe Gerum authored
      Threads are special elements in that they may exit independently from
      the existence of their respective backing cdev. Make sure to hide
      exiting threads from sysfs handlers before we dismantle things.
      Prior to this fix, with lockdep enabled, we could receive this kernel
      splat when stopping an EVL application while an evl-ps loop is
      extracting data concurrently from the sysfs attributes related to an
      exiting thread:
      [  455.474409] DEBUG_LOCKS_WARN_ON(1)
      [  455.474456] WARNING: CPU: 4 PID: 425 at kernel/locking/lockdep.c:249 __lock_acquire+0xa66/0xd10
      [  455.474473] Modules linked in:
      [  455.474480] CPU: 4 PID: 425 Comm: evl-ps Not tainted 5.15.64+ #369
      [  455.474488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
      [  455.474492] IRQ stage: Linux
      [  455.474496] RIP: 0010:__lock_acquire+0xa66/0xd10
      [  455.474504] Code: 48 b8 4a 83 e8 8b c3 25 00 8b 3d ad 3d 30 02 85 ff 0f 85 45 ff ff ff 48 c7 c6 40 e6 66 82 48 c7 c7 60 dc 66 82 e8 c5 f6 e3 00 <0f> 0b 31 ed e9 c7 f7 ff ff e8 8c 0f 62 00 85 c0 0f 84 f9 fe ff ff
      [  455.474511] RSP: 0018:ffff888107cd7a78 EFLAGS: 00010082
      [  455.474518] RAX: 0000000000000000 RBX: ffff888107276f98 RCX: 0000000000000000
      [  455.474523] RDX: 0000000000000001 RSI: ffffffff8114d5a4 RDI: ffffed1020f9af45
      [  455.474527] RBP: 0000000000000416 R08: 0000000000000001 R09: ffff8881f742d85b
      [  455.474531] R10: ffffed103ee85b0b R11: 284e4f5f4e524157 R12: ffff888107276600
      [  455.474536] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888107276f20
      [  455.474548] FS:  00007f485c93b740(0000) GS:ffff8881f7400000(0000) knlGS:0000000000000000
      [  455.474554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  455.474559] CR2: 00005653f7bb4e68 CR3: 0000000108d00000 CR4: 00000000000406e0
      [  455.474564] Call Trace:
      [  455.474567]  <TASK>
      [  455.474570]  ? __clear_bit+0x25/0x40
      [  455.474581]  lock_acquire+0x14c/0x3b0
      [  455.474589]  ? sched_show+0x47/0x1b0
      [  455.474598]  ? lock_downgrade+0xe0/0xe0
      [  455.474605]  ? evl_get_element+0x5/0xa0
      [  455.474613]  ? __test_and_set_bit.constprop.0+0xe/0x20
      [  455.474622]  ? do_raw_spin_unlock+0x97/0xf0
      [  455.474630]  ? device_remove_bin_file+0x20/0x20
      [  455.474641]  sched_show+0x6f/0x1b0
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/poll: lockdep: delete group mutex at release · b4b18aed
      Philippe Gerum authored
      With dynamic lock classes enabled for mutexes, we have to destroy the
      latter thoroughly after usage so that classes are removed. Otherwise,
      lockdep may complain loudly about registering classes twice.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/poll: lockdep: use evl_init_flag_on_stack() · c97e0d76
      Philippe Gerum authored
      This gets us a static lock class for the underlying wait queue lock,
      which is required for initialization from oob context.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/wait: lockdep: introduce evl_init_wait_on_stack() · c71331c5
      Philippe Gerum authored
      We need a static lock class for wait queues we want to initialize from
      oob context, we cannot register classes dynamically since this would
      involve in-band only code, such as RCU read sections.
      We may assume that only stack-based wait queues may have to be
      initialized dynamically from the oob stage. Let's provide
      evl_init_wait_on_stack() for this purpose, which defines a static lock
      class key instead.
      Extend the flag API which is based on wait queues to provide
      evl_init_flag_on_stack() as well.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/poll: lockdep: force registration of dynamic lock class · 91742f77
      Philippe Gerum authored
      Lockdep may register a lock class lazily at first lock acquisition,
      which might happen oob for us. Since that registration depends on RCU,
      we need to make sure this always happens in-band instead.
      Use might_lock() to force lockdep to pre-register the lock class at
      lock init time.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: sanitize dropping of owned mutexes at thread exit · 6915f0a0
      Philippe Gerum authored
      We should never rely on fundles to determine which mutexes should be
      dropped by an exiting thread: this data is shared with userland,
      therefore it is unsafe by nature.
      Since all mutexes a thread owns are duly tracked by the core, trust
      the tracking list exclusively to figure out which mutexes need to be
      released; there is no rationale for revalidating this information via
      fundle probing.
      While at it, use the common thread->lock to serialize access to this
      list: there is no point in throwing yet another lock at the tracking,
      since we either have to hold thread->lock anyway for the surrounding
      operations (locking/unlocking), or nobody races with us (i.e. at
      thread exit). This also fixes an ABBA issue which existed between
      wchan->lock and thread->tracking_lock. Now we only need to hold
      wchan->lock then thread->lock nested, which is deadlock-safe.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: introduce flat PI chain walk · 35e99432
      Philippe Gerum authored
      This starts addressing the general ABBA issue for the PI boost and
      deadlock detection paths (deboosting will be addressed by subsequent
      patches).
      The issue involves the locking order between mutex->wchan.lock and any
      thread->lock, which is not consistent. To fix this, we need to move
      from a recursive PI chain walk to a flat walk, carefully taking and
      releasing locks during the process in order to always abide by the
      following nesting order: mutex->wchan.lock first, thread->lock nested.
      When we need to lock two threads X and Y concurrently (e.g. for PI
      boosting), their lock addresses are considered in a partial order, so
      that if &X->lock < &Y->lock, then lock(X) is always acquired before
      lock(Y), avoiding the ABBA issue entirely.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: move lock to the wait channel base · 1042594d
      Philippe Gerum authored
      This is the second and last step to using the private lock provided
      by the wait channel descriptor, this time fixing up the mutex
      implementation.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/wait: move lock to the wait channel base · c55e3dd0
      Philippe Gerum authored
      In order to walk the wait dependency chain which may exist between
      threads, we need to access the lock guarding each wait channel we may
      traverse. Currently, this lock is provided by the container type which
      embeds the wait channel descriptor, i.e. mutex or wait queue.
      Let's have the wait channel directly provide a private spinlock which
      we can operate in a generic way instead, without knowing about the
      container type.
      Fix up the wait queue implementation and users to use this inner lock.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/mutex: disentangle locking dependencies · 814707b9
      Philippe Gerum authored
      This is a preparation patch for introducing a new locking model which
      will address a truckload of ABBA issues already detected by
      lockdep. This is not a fix per se, but several changes aimed at
      disentangling some existing locking constructs where doing so is
      straightforward, i.e. the low-hanging fruit first. For this reason,
      this is one bulk patch waiting for the whole locking approach to be
      revised in the following patch series.
      - We don't have to - and must not - hold the last and next owners'
      locks simultaneously when transferring ownership of a mutex. The state
      of the thread dropping the mutex should be updated independently
      first, before ownership is passed to the next waiter in line.
      Removing this useless lock nesting fixes an ABBA issue in the release
      path.
      - In-band ownership detection does not require holding curr->lock. Do
      the detection before grabbing this lock.
      - Untangle the locking constructs when about to sleep on the mutex, so
      that we can group the operations for the current and owner threads,
      both for acquisition and release.
      - Introduce a safe, double ordered locking helper for threads, to
      escape the ABBA issue when we need to lock the current and mutex owner
      threads concurrently. Threads are locked by increasing address order,
      which guarantees a fixed locking sequence between them.
      - We do not have to hold a lock on the originator of the locking
      request when traversing the dependency chain to detect a
      deadlock. The latter only requires matching such thread by address in
      any depending wait channel. Drop this constraint from the
      follow_depend() and reorder_wait() handlers.
      - When walking through all waiters for a mutex in
      evl_follow_mutex_depend(), we don't have to hold their respective
      thread->lock, since they would have to leave the wait list they belong
      to before they could go stale, which they cannot do in the first place
      since we hold mutex->lock while iterating. Eventually, drop the
      useless locking of waiter->lock.
      - We don't need to hold curr->lock across the call validating the lock
      dependency chain. Split the thread lock release code in order to drop
      curr->lock as early as possible, before check_lock_chain() is called.
      - In transfer_ownership(), we must guard against the new owner going to
      pend on a resource while removing it from the wait list.
      - Flatten the deadlock detection logic when traversing the lock chain.
      - There is no point in carrying over the (mutex, owner) tuple as
      argument in places where mutex->owner == owner by
      construction. Simplify some call signatures accordingly.
    • evl/wait: allow for optional follow_depend handler · 6cac9e45
      Philippe Gerum authored
      There is no point in trashing the I-cache with useless jumps to nop
      handlers. Typically, .follow_depend is not provided by the common wait
      queue, so make it optional and check for presence before branching.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/thread: lockdep: enable tracking for thread->lock · dfa98d69
      Philippe Gerum authored
      Help lockdep in disambiguating thread locks by defining a per-thread
      lock class. To this end, add a per-thread lock key, which the thread's
      spinlock guard is indexed on.
      This fully enables lockdep for tracking spinlock issues in the EVL
      mutex support code.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
    • evl/thread: release the oob state unconditionally on cleanup · bb6508ef
      Philippe Gerum authored
      A thread running a fork+exec sequence has its own mm_struct which is
      different from its parent internally, therefore we must release the
      per-mm(_struct) oob state in this case too.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>