Commit 1641601f authored by Philippe Gerum

evl/thread: fix leaking ready state in evl_switch_inband()

When arming T_INBAND for the stage switching thread, we have to remove
it from the runqueue in the same move if T_READY is present in its
state flags.

Failing to do so creates a race with another CPU readying that thread
by calling evl_release_thread(), which leads to an inconsistent
scheduler state with both T_INBAND and T_READY set for the
thread. When this happens, evl_switch_inband() may pick the switching
thread from the runqueue for out-of-band scheduling in
__evl_schedule() although that thread is formally blocked by T_INBAND,
instead of waiting for the inband scheduler to pick it and complete
the transition to inband context.

As a result, dovetail_resume_inband() spuriously runs from the
out-of-band stage eventually (caught by CONFIG_DEBUG_DOVETAIL), which
leads to a galactic mess.

Signed-off-by: Philippe Gerum <>
parent e139f010
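
The invariant described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the EVL code: the `toy_` names, the flag values, and the `queued` field are hypothetical stand-ins, and the actual runqueue removal helper is not visible in the hunk below. It shows T_READY being dropped, together with the runqueue link, in the same step that arms T_INBAND, so that the two flags are never observed together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only (not the EVL ones). */
#define T_READY  0x1u
#define T_INBAND 0x2u

/* Toy model of a thread: state flags plus a runqueue membership bit. */
struct toy_thread {
	unsigned int state;
	bool queued;	/* true while linked to the toy runqueue */
};

/* Hypothetical stand-in for removing a thread from the runqueue. */
static void toy_dequeue(struct toy_thread *t)
{
	t->queued = false;
}

/* Invariant: a thread is never both T_INBAND and ready/queued. */
static bool toy_state_is_consistent(const struct toy_thread *t)
{
	bool inband = (t->state & T_INBAND) != 0;
	bool ready = (t->state & T_READY) != 0;

	return !(inband && (ready || t->queued));
}

/* Arm T_INBAND, dropping any pending ready state in the same move. */
static void toy_switch_inband(struct toy_thread *curr)
{
	if (curr->state & T_READY) {
		toy_dequeue(curr);
		curr->state &= ~T_READY;
	}
	curr->state |= T_INBAND;

	assert(toy_state_is_consistent(curr));
}
```

Calling `toy_switch_inband()` on a thread initialized with `state = T_READY` and `queued = true` leaves it with only T_INBAND set and no runqueue link. In the real code, the equivalent sequence would have to run under the lock that serializes scheduler state changes, which this model leaves out.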
@@ -691,6 +691,10 @@ void evl_switch_inband(int cause)
if (curr->state & T_READY) {
curr->state &= ~T_READY;
curr->info &= ~EVL_THREAD_INFO_MASK;
curr->state |= T_INBAND;
curr->local_info &= ~T_SYSRST;
@@ -1063,7 +1067,6 @@ int evl_join_thread(struct evl_thread *thread, bool uninterruptible)
* We allow multiple callers to join @thread, this is purely a
* synchronization mechanism with no resource collection.
if (thread->info & T_DORMANT) {
xnlock_put_irqrestore(&nklock, flags);
return 0;
@@ -1079,38 +1082,10 @@ int evl_join_thread(struct evl_thread *thread, bool uninterruptible)
xnlock_put_irqrestore(&nklock, flags);
* We have a tricky issue to deal with, which involves code
* relying on the assumption that a destroyed thread will have
* scheduled away from do_exit() before evl_join_thread()
* returns. A typical example is illustrated by the following
* sequence, with a EVL kthread implemented in a dynamically
* loaded module:
* CPU0: evl_cancel_kthread(kthread)
* evl_cancel_thread(kthread)
* evl_join_thread(kthread)
* ...<back to user>..
* rmmod(module)
* CPU1: in kthread()
* ...
* ...
* __evl_test_cancel()
* do_exit()
* schedule()
* In such a sequence, the code on CPU0 would expect the EVL
* kthread to have scheduled away upon return from
* evl_cancel_kthread(), so that unmapping the cancelled
* kthread code and data memory when unloading the module is
* always safe.
* To address this, the joiner first waits for the joinee to
* signal completion from the EVL thread cleanup handler
* (cleanup_current_thread), then waits for a full RCU grace
* period to have elapsed. Since the completion signal is sent
* on behalf of do_exit(), we may assume that the joinee has
* scheduled away before the RCU grace period ends.
* Wait until the joinee is fully dismantled in
* thread_factory_dispose(), which guarantees safe module
* removal afterwards if applicable. After this point, @thread
* is invalid.
if (uninterruptible)
@@ -1120,8 +1095,6 @@ int evl_join_thread(struct evl_thread *thread, bool uninterruptible)
return -EINTR;
/* The joinee is gone at this point, @thread is invalid. */
if (switched)
ret = evl_switch_oob();
@@ -1358,8 +1331,7 @@ void __evl_kick_thread(struct evl_thread *thread) /* nklock locked, irqs off */
* epilogue. Otherwise, we want that thread to enter the
* mayday trap asap.
if (thread != this_evl_rq_thread() &&
(thread->state & T_USER))
if ((thread->state & T_USER) && thread != this_evl_rq_thread())
@@ -1868,11 +1840,6 @@ static void handle_sigwake_event(struct task_struct *p)
thread->state &= ~T_SSTEP;
if (thread->state & T_INBAND) {
xnlock_put_irqrestore(&nklock, flags);
* A thread running on the oob stage may not be picked by the
* in-band scheduler as it bears the _TLF_OFFSTAGE flag. We