- 08 Jan, 2023 1 commit
-
-
Philippe Gerum authored
The correct locking order is thread->lock => rq->lock; fix the loop moving the group members to the SCHED_FIFO class accordingly. While at it, make the locking more fine-grained. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
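A minimal sketch of the fixed loop under the ordering rule above; the list and member names are illustrative, not the actual TP class code:

	list_for_each_entry(thread, &tp_group_members, tp_link) {	/* names are made up */
		raw_spin_lock_irqsave(&thread->lock, flags);	/* thread->lock first... */
		raw_spin_lock(&thread->rq->lock);		/* ...then rq->lock */
		/* move @thread over to the SCHED_FIFO class */
		raw_spin_unlock(&thread->rq->lock);
		raw_spin_unlock_irqrestore(&thread->lock, flags);
	}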
-
- 07 Jan, 2023 1 commit
-
-
Philippe Gerum authored
Unconditionally switching the remaining TP threads on a runqueue to the SCHED_FIFO class before swapping the TP schedule plan is an arbitrary decision which may have unwanted side-effects, even papering over a consistency issue in the application. Besides, the implementation had the locking wrong in a couple of places, such as failing to grab the thread lock before calling evl_set_thread_schedparam_locked(). Change the logic by detecting the case and simply denying the change instead, returning -EBUSY to the caller. Users must update the schedule plan consistently, i.e. only when no thread is still running on the previous one. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
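A hedged sketch of the new policy, assuming a per-rq list of SCHED_TP members; the member name is hypothetical:

	static int tp_plan_swap_allowed(struct evl_rq *rq)
	{
		/* Deny the swap while any thread is still assigned to SCHED_TP. */
		if (!list_empty(&rq->tp.threads))	/* hypothetical member */
			return -EBUSY;	/* caller must drain the current plan first */

		return 0;
	}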
-
- 30 Dec, 2022 1 commit
-
-
Philippe Gerum authored
Applications were missing some features from ungated events (i.e. count/sema4 and mask) in order to make it easier to use them as building blocks of other synchronization mechanisms. This change set adds the following support to fix this:

- the ability to broadcast a count, which is essentially a way to unblock all waiters atomically, returning with a specific error code denoting the condition. This is much simpler and more reliable compared to unblocking all waiters manually.

- the ability to broadcast an event mask, so that all waiters receive the same set of bits. This feature makes it simpler to implement gang-based logic in applications, when multiple threads consume particular states of a given event (which gated events do not allow easily).

- support for conjunctive and disjunctive wait modes for event masks, so that threads can wait for a particular set of bits to be set in the mask, with AND/OR semantics. This departs from the forme...
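An illustration of the conjunctive vs. disjunctive matching described above; this is only the bit logic, not the EVL API:

	#include <stdbool.h>
	#include <stdint.h>

	static bool mask_wait_satisfied(uint32_t pending, uint32_t wanted, bool conjunctive)
	{
		if (conjunctive)			/* AND: all requested bits must be set */
			return (pending & wanted) == wanted;

		return (pending & wanted) != 0;		/* OR: any requested bit will do */
	}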
-
- 26 Dec, 2022 37 commits
-
-
Philippe Gerum authored
We can make good use of this information to implement broadcast wakeups efficiently. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
This fixes a build issue when CONFIG_DOVETAIL is off, allowing arch-specific placeholders to be defined as required by the generic code. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
This change establishes an additional invariant regarding the presence of EVL_T_BOOST in a thread's state and that thread's booster list, so that we can now rely on:

- if EVL_T_BOOST is set in thread->state, then thread->boosters is not empty.

- if EVL_T_BOOST is cleared in thread->state, then thread->boosters is empty.

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
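A hedged sketch of that invariant expressed as a debug check; the helper and the exact assert macro usage are illustrative:

	static inline void assert_boost_invariant(struct evl_thread *thread)
	{
		/* EVL_T_BOOST set <=> thread->boosters non-empty. */
		EVL_WARN_ON(CORE, !!(thread->state & EVL_T_BOOST) ==
			    list_empty(&thread->boosters));
	}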
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
drop_booster() may unlock the wait channel if some priority adjustment is required for the last owner as a result of dropping it. Meanwhile, a remote CPU might observe that this wait channel still has a valid owner kernel-wise, although it is being released concurrently. Fix this lock state inconsistency by clearing the mutex owner information before the wait channel is unlocked in drop_booster(). Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
We need to account for interrupt pipelining in these assertions. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
__evl_unlock_mutex() may run concurrently with a lock chain walk, which would drop the ownership of the mutex, setting its owner field to NULL. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Multiple CPUs may walk the same lock chain concurrently, leading to inconsistencies with respect to the priority boost of some wait channel owner(s). The former approach of solving them with a serial token in the chain walk routine is fragile, since an unlocked section existed between the owner boost adjustment and such chain walk, which could lead to an inverted sequence.

Fix this issue by always basing the priority boost of any thread owning wait channel(s) on the current state of its booster list. By doing so, we make the former issue impossible by construction. By relying on the invariant that a thread must always inherit the priority of the mutex leading its booster list, the implementation is significantly simplified as well:

- PP and PI boost routines are merged into a single, new evl_adjust_boost_thread() call (both PP/PI mutexes are queued to their owner's booster list as long as they are granting it a boost).

- we do not need the chain walk serial token anymore; the adjustment in evl_adjust_boost_thread() is performed based on the actual boost requirement, checking the booster list of the owner.

- as a result, we only need to maintain the consistency of such booster list at any point in time, before eventually calling evl_adjust_boost_thread(). All the former logic in the callers can go.

- finally, exploiting such invariant enables a much simpler implementation of the lock chain walk code.

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
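A hedged sketch of the invariant-based adjustment; the booster link member is hypothetical, and locking plus the actual priority update are omitted:

	static void adjust_boost_sketch(struct evl_thread *owner)
	{
		struct evl_mutex *top;

		if (list_empty(&owner->boosters)) {
			/* No booster left: clear EVL_T_BOOST, return to base priority. */
			owner->state &= ~EVL_T_BOOST;
			return;
		}

		/* Inherit the priority of the mutex leading the booster list. */
		top = list_first_entry(&owner->boosters, struct evl_mutex, next_booster);
		owner->state |= EVL_T_BOOST;
		(void)top;	/* ...raise owner to the priority @top grants... */
	}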
-
Philippe Gerum authored
Some companion cores may want to fully handle some of the traps directly from the out-of-band stage, without demoting the current context to the in-band stage. Typically, the undefined instruction and VFP exception error traps could be handled that way. For such traps, instead of applying the in-band fixups unconditionally, assume the following:

- if the companion core switches in-band upon notification, then the in-band kernel code may assume that all regular fixups need to be applied next. This is the common case for most traps.

- if the companion core stays on the out-of-band stage upon notification, then the in-band kernel code should assume that all required fixups were done from that stage already, returning asap from the outer trap handler.

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
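A hedged sketch of the resulting dispatch logic in a trap handler; the helper is hypothetical, and running_oob() is assumed to report the current pipeline stage:

	static bool want_inband_fixups(void)
	{
		/*
		 * If the companion core stayed on the oob stage when
		 * notified, it already applied whatever fixup was needed:
		 * skip the regular in-band fixups and leave the outer
		 * trap handler asap.
		 */
		return !running_oob();
	}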
-
Philippe Gerum authored
Some companion cores may want to fully handle some of the traps directly from the out-of-band stage, without demoting the current context to the in-band stage. Typically, FPSIMD access traps could be handled that way. For such a trap, instead of applying the in-band fixups unconditionally, assume the following:

- if the companion core switches in-band upon notification, then the in-band kernel code may assume that all regular fixups need to be applied next. This is the common case for most traps.

- if the companion core stays on the out-of-band stage upon notification, then the in-band kernel code should assume that all required fixups were done from that stage already, returning asap from the outer trap handler.

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Some companion cores may want to fully handle some of the traps directly from the out-of-band stage, without demoting the current context to the in-band stage. Typically, #UD, #NM or #DE could be handled that way. Instead of applying the in-band fixups unconditionally, assume the following:

- if the companion core switches in-band upon notification, then the in-band kernel code may assume that all regular fixups need to be applied next. This is the common case for most traps.

- if the companion core stays on the out-of-band stage upon notification, then the in-band kernel code should assume that all required fixups were done from that stage already, returning asap from the outer trap handler.

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Dovetail normally notifies the companion core about most traps, including when WARN[_ON]() has tripped, expecting a transition to the in-band stage if the CPU was running oob on entry. This is a problem when it comes to handling debug assertions, since the core might place them in sections of code where a stage switch would be unsafe, e.g. because they hold some hard spinlocks involved in context switching, leading to a chicken-and-egg situation and eventually to a deadlock due to a recursive locking attempt. Let's make handle_bug() oob-safe, skipping the notification when handling a WARN trap. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Fully disable the irq stage (pipeline-wise) on _every_ operation which would normally be serialized by raw_local_irq_save(), so that no oob preemption can occur while holding the inner lockdep __lock. This fixes a deadlock condition with the following typical backtrace:

(gdb) bt
#0  0xffffffff81133381 in cpu_relax () at evl-v5.15.y/arch/x86/include/asm/vdso/processor.h:19
#1  virt_spin_lock (lock=lock@entry=0xffffffff85314090 <__lock>) at evl-v5.15.y/arch/x86/include/asm/qspinlock.h:100
#2  queued_spin_lock_slowpath (lock=lock@entry=0xffffffff85314090 <__lock>, val=1) at evl-v5.15.y/kernel/locking/qspinlock.c:326
#3  0xffffffff8112e5e7 in queued_spin_lock (lock=0xffffffff85314090 <__lock>) at evl-v5.15.y/include/asm-generic/qspinlock.h:85
#4  lockdep_lock () at evl-v5.15.y/kernel/locking/lockdep.c:161
#5  graph_lock () at evl-v5.15.y/kernel/locking/lockdep.c:187
#6  mark_lock (curr=0xffff8881002b3100, this=0xffff8881002b3a20, new_bit=new_bit@entry=LOCK_USED_IN_HARDIRQ) at evl-v5.15.y/kernel/locking/lockdep.c:4619
#7  0xffffffff8112ec2a in mark_usage (curr=curr@entry=0xffff8881002b3100, hlock=hlock@entry=0xffff8881002b3a20, check=check@entry=1) at evl-v5.15.y/kernel/locking/lockdep.c:4530
#8  0xffffffff8112fae8 in __lock_acquire (lock=lock@entry=0xffffffff825474d8 <rcu_state+24>, subclass=subclass@entry=0, trylock=trylock@entry=0, read=read@entry=0, check=check@entry=1, hardirqs_off=<optimized out>, nest_lock=0x0 <fixed_percpu_data>, ip=18446744071580260235, references=0, pin_count=0) at evl-v5.15.y/kernel/locking/lockdep.c:5013
#9  0xffffffff8112ed0a in lock_acquire (lock=lock@entry=0xffffffff825474d8 <rcu_state+24>, subclass=subclass@entry=0, trylock=trylock@entry=0, read=read@entry=0, check=check@entry=1, nest_lock=nest_lock@entry=0x0 <fixed_percpu_data>, ip=18446744071580260235) at evl-v5.15.y/kernel/locking/lockdep.c:5677
#10 0xffffffff81a9ecc5 in __raw_spin_lock_irqsave (lock=0xffffffff825474c0 <rcu_state>) at evl-v5.15.y/include/linux/spinlock_api_smp.h:110
#11 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff825474c0 <rcu_state>) at evl-v5.15.y/kernel/locking/spinlock.c:162
#12 0xffffffff8115978b in print_other_cpu_stall (gp_seq=10445, gps=4294950930) at evl-v5.15.y/kernel/rcu/tree_stall.h:545
#13 0xffffffff8115f399 in check_cpu_stall (rdp=rdp@entry=0xffff888237dbfb40) at evl-v5.15.y/kernel/rcu/tree_stall.h:729
#14 0xffffffff8115f476 in rcu_pending (user=user@entry=0) at evl-v5.15.y/kernel/rcu/tree.c:3896
#15 0xffffffff8115fd05 in rcu_sched_clock_irq (user=0) at evl-v5.15.y/kernel/rcu/tree.c:2614
#16 0xffffffff8116b013 in update_process_times (user_tick=0) at evl-v5.15.y/kernel/time/timer.c:1788
#17 0xffffffff8117fae4 in tick_sched_handle (ts=ts@entry=0xffff888237db1da0, regs=regs@entry=0xffff888237dae9e0) at evl-v5.15.y/kernel/time/tick-sched.c:226
#18 0xffffffff8117fd0f in tick_sched_timer (timer=0xffff888237db1da0) at evl-v5.15.y/kernel/time/tick-sched.c:1420
#19 0xffffffff8116bb43 in __run_hrtimer (flags=0, now=0xffffc90000200f38, timer=0xffff888237db1da0, base=0xffff888237db1440, cpu_base=0xffff888237db13c0) at evl-v5.15.y/kernel/time/hrtimer.c:1686
#20 __hrtimer_run_queues (cpu_base=cpu_base@entry=0xffff888237db13c0, now=now@entry=186201588941, flags=flags@entry=0, active_mask=active_mask@entry=15) at evl-v5.15.y/kernel/time/hrtimer.c:1750
#21 0xffffffff8116cb2e in hrtimer_interrupt (dev=<optimized out>) at evl-v5.15.y/kernel/time/hrtimer.c:1812
#22 0xffffffff8118188f in proxy_irq_handler (sirq=<optimized out>, dev_id=<optimized out>) at evl-v5.15.y/kernel/time/tick-proxy.c:193
#23 0xffffffff8114c4fd in handle_synthetic_irq (desc=0xffff888100d1e000) at evl-v5.15.y/kernel/irq/pipeline.c:211
#24 0xffffffff8105b80b in arch_do_IRQ_pipelined (desc=<optimized out>) at evl-v5.15.y/arch/x86/kernel/irq_pipeline.c:203

This does increase the latency figures even more, but nobody should expect low latency from a system with lockdep enabled anyway. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
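A hedged sketch of the idea: hard_local_irq_save()/restore() disable the irq stage pipeline-wise around lockdep's inner __lock. This abridges the intent of the change, it is not the verbatim patch:

	static __always_inline void lockdep_lock_hard(unsigned long *flags)
	{
		*flags = hard_local_irq_save();	/* no oob preemption past this point */
		arch_spin_lock(&__lock);	/* lockdep's inner graph lock */
	}

	static __always_inline void lockdep_unlock_hard(unsigned long flags)
	{
		arch_spin_unlock(&__lock);
		hard_local_irq_restore(flags);
	}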
-
Philippe Gerum authored
Enforcing strict grace periods while running oob would not make sense; leave this to the in-band context when active. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Allow an out-of-band caller to start an RCU read-locked section by forcing an RCU entry from NMI. Conversely, leaving such a section unwinds the NMI entry. Like RCU read-locked sections, NMI entries can be nested. In the same move, stop assuming that any out-of-band context means that RCU is not watching anymore; as a matter of fact, the previous change would contradict this. The logic is based on the invariant that the pipeline stage cannot change while in an RCU read-locked section. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
A thread might have acquired a lock from user space via the fast procedure, then released it from kernel space - this pattern typically happens with the event monitor, e.g. fast_lock(x) -> signal(event) -> slow_unlock(x). In this case, the owner field in the wait channel might be unset if the lock was uncontended, although the atomic handle does match the thread's fundle. This is a legit case: skip the PI/PP deboosting then, because no boost can be in effect for such a lock, and wake up the heading waiter if any. CAUTION: this issue can be noticed only if T_WOLI is disabled for the thread (which means that CONFIG_EVL_DEBUG_WOLI is off too), otherwise only the slow acquire/release path would be taken, papering over the bug. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
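A hedged sketch of the release path for that case; the field and helper names are illustrative:

	if (mutex->owner == NULL) {
		/*
		 * The fast handle matches current's fundle but no kernel-side
		 * owner was recorded: the lock was grabbed uncontended from
		 * user space, so no PI/PP boost can be in effect. Skip the
		 * deboost, only wake up the heading waiter if any.
		 */
		wake_up_heading_waiter(mutex);	/* hypothetical helper */
		return;
	}
	/* otherwise, regular deboost + handover path */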
-
Philippe Gerum authored
Posting a flag should wake up the thread leading the wait queue, not all waiting threads. The latter would not make sense because the first thread to reach test_event_mask() would fetch and clear the flag value atomically, causing subsequent ones to go back to sleep anyway. Since threads wait in decreasing priority order, there is no point in waking up all waiters upon a post action either. Fix the thundering herd effect by calling evl_wake_up_head() instead of evl_flush_wait_lock() on event receipt. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Priority protection is about enforcing a static (ceiling) priority, so there is no point in tracking the top waiter priority dynamically for them. IOW, a PP mutex should be queued according to that static priority when part of a booster list. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Rename all state and information bits for EVL threads to prevent conflict with mainline symbols. Since these definitions are visible from user-space, bump the ABI revision in the same move. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
We may enter adjust_owner_boost() with no boosters because we did not hold owner->lock on entry; therefore this owner might have dropped all of them in the meantime if not current on the local CPU. Drop the assertion marking this condition as an issue. However, check that T_BOOST is cleared when this condition is met. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Checking for a PI/PP boosting mutex is not enough when dropping to in-band context: owning any mutex in this case would be wrong, since this would create a priority inversion. Extend the logic of evl_detect_boost_drop() to encompass any owned mutex, renaming it to evl_check_no_mutex() for consistency.

As a side-effect, a thread which attempts to switch in-band while owning mutex(es) now receives a single HMDIAG_LKDEPEND notification, instead of notifying all waiter(s) sleeping on those mutexes. As a consequence, we can drop detect_inband_owner() which becomes redundant, as it detects the same issue from the converse side without extending the test coverage (i.e. a contender would check whether the mutex owner is running in-band).

This change does affect the behavior for applications turning on T_WOLI on waiter threads explicitly. This said, the same issue would still be detected if CONFIG_EVL_DEBUG_WOLI is set globally, which is the recommended configuration during the development stage.

This change also solves an ABBA issue which existed in the former implementation:

[ 40.976962] ======================================================
[ 40.976964] WARNING: possible circular locking dependency detected
[ 40.976965] 5.15.77-00716-g8390add2 #156 Not tainted
[ 40.976968] ------------------------------------------------------
[ 40.976969] monitor-pp-lazy/363 is trying to acquire lock:
[ 40.976971] ffff99c5c14e5588 (test363.0){....}-{0:0}, at: evl_detect_boost_drop+0x80/0x200
[ 40.976987]
[ 40.976987] but task is already holding lock:
[ 40.976988] ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[ 40.976996]
[ 40.976996] which lock already depends on the new lock.
[ 40.976996]
[ 40.976997]
[ 40.976997] the existing dependency chain (in reverse order) is:
[ 40.976998]
[ 40.976998] -> #1 (monitor-pp-lazy:363){....}-{0:0}:
[ 40.977003]        fast_grab_mutex+0xca/0x150
[ 40.977006]        evl_lock_mutex_timeout+0x60/0xa90
[ 40.977009]        monitor_oob_ioctl+0x226/0xed0
[ 40.977014]        EVL_ioctl+0x41/0xa0
[ 40.977017]        handle_pipelined_syscall+0x3d8/0x490
[ 40.977021]        __pipeline_syscall+0xcc/0x2e0
[ 40.977026]        pipeline_syscall+0x47/0x120
[ 40.977030]        syscall_enter_from_user_mode+0x40/0xa0
[ 40.977036]        do_syscall_64+0x15/0xf0
[ 40.977039]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 40.977044]
[ 40.977044] -> #0 (test363.0){....}-{0:0}:
[ 40.977048]        __lock_acquire+0x133a/0x2530
[ 40.977053]        lock_acquire+0xce/0x2d0
[ 40.977056]        evl_detect_boost_drop+0xb0/0x200
[ 40.977059]        evl_switch_inband+0x41e/0x540
[ 40.977064]        do_oob_syscall+0x1bc/0x3d0
[ 40.977067]        handle_pipelined_syscall+0xbe/0x490
[ 40.977071]        __pipeline_syscall+0xcc/0x2e0
[ 40.977075]        pipeline_syscall+0x47/0x120
[ 40.977079]        syscall_enter_from_user_mode+0x40/0xa0
[ 40.977083]        do_syscall_64+0x15/0xf0
[ 40.977086]        entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 40.977090]
[ 40.977090] other info that might help us debug this:
[ 40.977090]
[ 40.977091]  Possible unsafe locking scenario:
[ 40.977091]
[ 40.977092]        CPU0                    CPU1
[ 40.977093]        ----                    ----
[ 40.977094]   lock(monitor-pp-lazy:363);
[ 40.977096]                               lock(test363.0);
[ 40.977098]                               lock(monitor-pp-lazy:363);
[ 40.977100]   lock(test363.0);
[ 40.977102]
[ 40.977102]  *** DEADLOCK ***
[ 40.977102]
[ 40.977103] 1 lock held by monitor-pp-lazy/363:
[ 40.977105]  #0: ffff99c5c243d818 (monitor-pp-lazy:363){....}-{0:0}, at: evl_detect_boost_drop+0x0/0x200
[ 40.977113]

Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
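A hedged sketch of the extended check; the list member name is hypothetical and the notification call is only hinted at in a comment:

	static bool may_switch_inband(struct evl_thread *curr)
	{
		/* Owning any mutex while going in-band would cause a priority inversion. */
		if (list_empty(&curr->owned_mutexes))	/* hypothetical member */
			return true;

		/* Send a single HMDIAG_LKDEPEND notification to the offender. */
		return false;
	}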
-
Philippe Gerum authored
While at it, privatize some helper macros, dropping an unused one. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
A thread which owns neither a PI mutex claimed by others nor a PP-activated mutex should not be boosted. Add a couple of assertions revealing that kind of brokenness. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
might_hard_lock() is the counterpart of the in-band might_lock() helper, usable with hard spinlocks. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
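A hedged sketch of how such an annotation would typically be used; the surrounding function and predicate are illustrative only:

	static void touch_rq(struct evl_rq *rq, bool need_update)
	{
		might_hard_lock(&rq->lock);	/* may be taken below, record the class now */

		if (need_update) {
			raw_spin_lock(&rq->lock);
			/* ...update the runqueue... */
			raw_spin_unlock(&rq->lock);
		}
	}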
-
Philippe Gerum authored
Hint about a new tracked lock using might_lock() when initializing a thread, so that we do not risk treading on in-band code to register the lock class lazily at the first locking attempt from oob. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
With a preemptible PI walk, we might find no waiter while adjusting the boost priority of an owner, since waiters can leave the wait channel while the latter is unlocked, which is followed by a corresponding adjustment call if the mutex can be dropped from its owner's booster list. We still have to detect this condition though, aborting the adjustment process immediately in this case, since we can expect another adjustment call to take place if need be (i.e. if the mutex has no more waiters). Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
-
Philippe Gerum authored
In the following scenario, the former implementation would yield a double removal of an event from the tracking list of its gate:

threadA:
       wait_monitor(E, G)
       <break/timeout>
[1]    __evl_wait_schedule(E)

threadB:
       enter_monitor(G)
       signal_event(E)
       exit_monitor(G)
             __untrack_event(E)
       enter_monitor(G)
       signal_event(E)
       exit_monitor(G)
             __untrack_event(E)

threadA:
[2]    untrack_event(E)

IOW, between points [1] and [2] of the code path returning from a wait state on an error condition, other thread(s) may signal the event which has no waiter anymore, therefore __untrack_event() would attempt to unlink that event from the tracking list of its gate at every release, leading to a list corruption such as:

Nov 11 21:25:07 10.243.1.49 [ 3877.744180] list_del corruption, ffff888151798240->next is LIST_POISON1 (dead000000000100)
Nov 11 21:25:07 10.243.1.49 [ 3877.846773] WARNING: CPU: 8 PID: 16134 at lib/list_debug.c:55 __list_del_entry_valid+0xc5/0x120
Nov 11 21:25:08 10.243.1.49 [ 3877.847043] Call Trace:
Nov 11 21:25:08 10.243.1.49 [ 3877.847045]  <TASK>
Nov 11 21:25:08 10.243.1.49 [ 3877.847059]  __untrack_event.part.9+0x26/0xe0
Nov 11 21:25:08 10.243.1.49 [ 3877.847096]  monitor_oob_ioctl+0x1356/0x1490
Nov 11 21:25:08 10.243.1.49 [ 3877.847367]  EVL_ioctl+0x81/0xf0
Nov 11 21:25:08 10.243.1.49 [ 3877.847386]  do_oob_syscall+0x699/0x6b0
Nov 11 21:25:08 10.243.1.49 [ 3877.847440]  handle_oob_syscall+0x189/0x230

Fix this by ensuring one-time removal based on the value of the event->gate backlink. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
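A hedged sketch of the one-time removal; the list member name is hypothetical and locking is omitted:

	if (event->gate) {			/* still tracked by a gate? */
		list_del(&event->next_tracked);	/* hypothetical member */
		event->gate = NULL;		/* forbid any further removal */
	}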
-
Philippe Gerum authored
We might receive a normal wakeup after a timeout or break condition occurred while sleeping on a waitqueue (the opposite cannot happen though). These conditions must be given higher priority when it comes to returning an operation status to the caller of __evl_wait_schedule(), e.g.:

With threadB.pri > threadA.pri, both running on the same CPU:

threadA:
       AAA
       AAA
       evl_sleep_on(timeout, &wq->wchan)
       evl_wait_schedule(wq)
       ...

threadB:
       BBB
       BBB
       IRQ<timeout>
             evl_wakeup_thread(threadA, T_PEND|T_DELAY, T_TIMEO)
       BBB
       BBB
       evl_wake_up_head(&wq->wchan)
             evl_wakeup_thread(threadA, T_PEND, 0) /* normal */
       BBB
       BBB

threadA:
       __evl_wait_schedule(&wq->wchan)
             * threadA is no more linked to wchan->wait_list
             * but T_TIMEO is set in threadA->info too
       AAA

As a result of the scenario above, threadA should receive -ETIMEDOUT when returning from __evl_wait_schedule(). The same goes for a T_BREAK condition, which should yield -EINTR. Signed-off-by:
Philippe Gerum <rpm@xenomai.org>
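A hedged sketch of the resulting status priority when leaving __evl_wait_schedule(); bit names follow the commit text, the body is illustrative:

	static int wait_status(int info)
	{
		if (info & T_TIMEO)
			return -ETIMEDOUT;	/* timeout wins over a normal wakeup */
		if (info & T_BREAK)
			return -EINTR;		/* forcible unblock wins too */

		return 0;			/* plain wakeup */
	}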
-