05 Jun, 2021 (40 commits)
    • Philippe Gerum's avatar
      x86: irq_pipeline: make sure no _TIF_WORK is left pending on IRQ exit · fdaba1b2
      Philippe Gerum authored
      The pipelined interrupt entry code must always run the common work
      loop before returning to user mode on the in-band stage, including
      after the preempted task was demoted from oob to in-band context as a
      result of handling the incoming IRQ.
      
      Failing to do so may cause in-band work to be left pending in this
      particular case, like _TIF_RETUSER and other _TIF_WORK conditions.
      
      This bug caused the smokey 'gdb' test to fail on x86:
      https://xenomai.org/pipermail/xenomai/2021-March/044522.html
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      fdaba1b2
    • Jan Kiszka's avatar
      x86: Ensure asm/proto.h can be included stand-alone · 61a3a02e
      Jan Kiszka authored
      
      
      Avoids
      
      ../arch/x86/include/asm/proto.h:14:30: warning: ‘struct task_struct’ declared inside parameter list will not be visible outside of this definition or declaration
       long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);
                                    ^~~~~~~~~~~
      ../arch/x86/include/asm/proto.h:40:34: warning: ‘struct task_struct’ declared inside parameter list will not be visible outside of this definition or declaration
       long do_arch_prctl_common(struct task_struct *task, int option,
                                        ^~~~~~~~~~~
      
      if linux/sched.h hasn't been included previously.
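
      The usual cure for this class of warning is a forward declaration of
      struct task_struct near the top of the header; whether the commit uses
      exactly this form is an assumption, but a minimal sketch looks like:

      struct task_struct;	/* forward declaration keeps the header self-contained */

      long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);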
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      61a3a02e
    • Philippe Gerum's avatar
      evl/thread: drop call to obsolete force_commit_memory() · 5f6e447d
      Philippe Gerum authored
      
      
      A process is now marked for COW-breaking on fork() upon the first call
      to dovetail_init_altsched(), and must ensure its memory is locked via
      a call to mlockall(MCL_CURRENT|MCL_FUTURE) as usual.
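
      A minimal user-space sketch of that requirement (illustrative only, not
      EVL's actual initialization code): lock all current and future mappings
      early, before attaching to the core or spawning real-time threads.

      #include <sys/mman.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(void)
      {
      	/* Fault in and pin every current and future mapping. */
      	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
      		perror("mlockall");
      		return EXIT_FAILURE;
      	}
      	/* ... attach to the out-of-band core, start real-time threads ... */
      	return EXIT_SUCCESS;
      }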
      
      As a result, force_commit_memory() became pointless and was removed
      from the Dovetail interface.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      5f6e447d
    • Philippe Gerum's avatar
      dovetail: rework address space pinning · c1599c5d
      Philippe Gerum authored
      
      
      Real-time applications controlled by the out-of-band core require some
      guarantees regarding how memory is managed for them in order to
      prevent unexpected delays:
      
      [1] paging must be disabled, all current and future pages must be
          faulted in.
      
      [2] copy-on-write must not be relied upon between a real-time parent
          and any of its children in order to share pages upon fork(). IOW,
          every child should get its own copy of the parent's pages upon
          fork(), and the latter should NOT have to be marked read-only as a
          result of this.
      
      The former implementation relied on Dovetail-specific code to address
      these requirements:
      
      - force_commit_memory() would scan all VMAs attached to the caller's
        address space in order to fault them in via commit_vma(). A new task
        attaching to the out-of-band core was expected to call
        force_commit_memory() in order to process the address space
        accordingly.
      
      - commit_vma() would populate a VMA by calling
        populate_vma_page_range() for common mappings, or pin special
        mappings via GUP such as huge pages.
      
      - commit_vma() would also be called when the protection bits of a page
        are changed, in order to catch cases which would require more
        COW-breaking as a result. This is useless: copy_pte_range() is the
        only code path where pages may have to be unCOWed.
      
      COW-breaking upon fork() was not yet performed by Dovetail.
      
      These applications can use mlockall(MCL_CURRENT|MCL_FUTURE) in order
      to enforce [1]; this ensures the mappings attached to the caller's
      mm are populated and faulted in when applicable. Locking the memory
      has been a requirement for these applications since day
      one. Therefore, force_commit_memory() is redundant with
      mlockall(MCL_CURRENT).
      
      [2] can be obtained by extending to Dovetail-aware memory the
      COW-breaking logic readily available to pinned pages (FOLL_PIN) in
      copy_pte_range() -> copy_present_pte() -> copy_present_page(). The
      real address space of a task which calls dovetail_init_altsched() can
      be marked as Dovetail-aware in the process, since such a call is a
      clear hint that the underlying task will require both [1] and [2].
      
      While at it, MMF_VM_PINNED is renamed MMF_DOVETAILED to fix a
      confusing name clash with the page pinning logic, which has different
      semantics.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      c1599c5d
    • Philippe Gerum's avatar
      x86: irq_pipeline: fix interrupt protection using temporary mm - take #2 · b28d00c8
      Philippe Gerum authored
      
      
      This change partially reverts commit #b8ccedcc, which wrongly
      re-enables hard irqs before the TLB is flushed. The entire section
      reinstating the temporary mm must run with hard irqs off, which is
      already enforced by local_irq_{save, restore}_full().
      
      Drop the broken hard irq toggles from {use, unuse}_temporary_mm(), but
      leave the assertion checking for the correct hard interrupt state on
      entry of the former.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      b28d00c8
    • Philippe Gerum's avatar
      x86: irq_pipeline: fix interrupt protection using temporary mm · b52414f7
      Philippe Gerum authored
      
      
      The protection use_temporary_mm() should provide still expects the
      pipeline entry code not to mess up on handling an interrupt, which
      defeats the purpose of such a precaution. Besides, the temp_state
      should be snapshotted under protection too.
      
      IOW, IRQs should be fully hard-disabled while using the temporary
      mm. We may assume that use_temporary_mm() is called with hard irqs on
      only at boot time.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      b52414f7
    • Philippe Gerum's avatar
      genirq: irq_pipeline: inband_irq helpers must not trace the interrupt state · a965c4fd
      Philippe Gerum authored
      
      
      First of all, since the inband_irq helpers are folded into local_irq*
      calls which already trace the inband interrupt state via the
      trace_hardirqs* helpers, tracing such state in the inband_irq*
      routines too is entirely redundant.
      
      In addition, inband_irq* helpers are also used by raw_local_irq*
      forms, which are supposed not to trace the interrupt state. Typically,
      lockdep assumes this when verifying the sanity of the global interrupt
      state vs lockdep interrupt state, i.e.
      
      	  /* irqs are (virtually) on */
      	  raw_local_irq_save(flags);
      	  check_flags(flags);
      	  ...
      
      In this case, check_flags() is supposed to find out that @flags
      represents an unstalled context, which should match hardirqs_enabled
      == 1, because raw_local_irq_save() should have skipped the
      trace_hardirqs* machinery.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      a965c4fd
    • Philippe Gerum's avatar
      arm64: dovetail: remove stale user_exit() in syscall entry · b936cf24
      Philippe Gerum authored
      
      
      user_exit() is now redundant when called from el0_svc_common.
      
      While at it, assume that we must be unstalled on entry when running
      inband, otherwise the virtual interrupt state is broken. Add the
      proper runtime assertion to check this.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      b936cf24
    • Jan Kiszka's avatar
      x86: dovetail: add support to 32-bit syscall path · 29dabc60
      Jan Kiszka authored
      
      
      Analogously to do_syscall_64.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      29dabc60
    • Philippe Gerum's avatar
      x86: dovetail: the #DB exception handler must run w/ hard irqs on · 1156dd5d
      Philippe Gerum authored
      Use local_irq_{enable, disable}_full() call forms to update the
      interrupt state in the #DB handler. Issue caught by a kernel splat
      running gdb on an application with CONFIG_DEBUG_DOVETAIL enabled:
      
      [   52.097079] WARNING: CPU: 2 PID: 1318 at ../kernel/irq/pipeline.c:316 inband_irq_enable+0x10/0x20
      [   52.097079] Modules linked in: 9p
      [   52.097080] CPU: 2 PID: 1318 Comm: latency Not tainted 5.10.19+ #41
      [   52.097080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [   52.097080] IRQ stage: Linux
      [   52.097081] RIP: 0010:inband_irq_enable+0x10/0x20
      [   52.097081] Code: 00 00 00 01 75 ee e8 cf fa ff ff 53 9d 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 80 3d 9a 38 b3 02 00 75 09 9c 58 f6 c4 02 75 02 <0f> 0b eb 8c 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48
      [   52.097081] RSP: 0000:ffffc90000783f20 EFLAGS: 00010046
      [   52.097082] RAX: 0000000000000046 RBX: ffffc90000783f58 RCX: 0000000000000000
      [   52.097082] RDX: ffffc90000783ef0 RSI: ffffffff8109e600 RDI: ffffffff81d4eee2
      [   52.097082] RBP: ffff888006e70000 R08: 0000000000000000 R09: 0000000000000000
      [   52.097083] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000004000
      [   52.097083] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [   52.097083] FS:  00007ffff7fe6640(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
      [   52.097084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   52.097084] CR2: 00007ffff7243610 CR3: 00000000070c6001 CR4: 0000000000370ee0
      [   52.097084] Call Trace:
      [   52.097084]  noist_exc_debug+0xf7/0x180
      [   52.097085]  ? asm_exc_debug+0x23/0x30
      [   52.097085]  asm_exc_debug+0x2b/0x30
      [   52.097085] RIP: 0033:0x401df3
      [   52.097086] Code: 00 00 e9 b0 fb ff ff ff 25 62 44 20 00 68 44 00 00 00 e9 a0 fb ff ff ff 25 5a 44 20 00 68 45 00 00 00 e9 90 fb ff ff 31 ed 90 <e8> f9 30 01 00 48 8d 65 d8 5b 41 5c 41 5d 41 70 44 40 00 48 c7 c1
      [   52.097086] RSP: 002b:00007fffffffe1c0 EFLAGS: 00000346
      [   52.097086] RAX: 00007ffff7ffe0e0 RBX: 00007ffff7ffe0e0 RCX: 00007ffff7df23c7
      [   52.097087] RDX: 0000103e00000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   52.097087] RBP: 00007fffffffe3a0 R08: 00007ffff6e8f008 R09: 0000000000000009
      [   52.097087] R10: 00007ffff7ffd990 R11: 0000000000000206 R12: 0000000000000000
      [   52.097087] R13: 00007ffff7ffe110 R14: 00007ffff7ffe110 R15: 00007ffff7fe6640
      [   52.097088] irq event stamp: 0
      [   52.097088] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [   52.097088] hardirqs last disabled at (0): [<ffffffff8106c648>] copy_process+0x718/0x1cd0
      [   52.097089] softirqs last  enabled at (0): [<ffffffff8106c648>] copy_process+0x718/0x1cd0
      [   52.097089] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [   52.097089] ---[ end trace b07496576d3779dc ]---
      
      See https://xenomai.org/pipermail/xenomai/2021-March/044662.html
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      1156dd5d
    • Philippe Gerum's avatar
      genirq: irq_pipeline: fix misnomer · 63bdc28a
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      63bdc28a
    • Philippe Gerum's avatar
      evl/sched: refine tracepoints · 23d657e2
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      23d657e2
    • Philippe Gerum's avatar
      evl/syscall: remove indirection via pointer table · cce881ed
      Philippe Gerum authored
      
      
      We have only very few syscalls; prefer a plain switch to pointer
      indirection, which ends up being fairly costly due to exploit
      mitigations.
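
      A sketch of the idea with made-up handler names (not the actual EVL
      syscall set): a direct switch lets the compiler emit plain calls,
      avoiding the retpoline-style cost of an indirect call through a
      function pointer table.

      #include <linux/errno.h>

      /* Hypothetical handlers, stubs for illustration only. */
      static long do_oob_read(unsigned long arg) { return 0; }
      static long do_oob_write(unsigned long arg) { return 0; }

      static long dispatch_oob_syscall(unsigned int nr, unsigned long arg)
      {
      	switch (nr) {
      	case 0:
      		return do_oob_read(arg);
      	case 1:
      		return do_oob_write(arg);
      	default:
      		return -ENOSYS;
      	}
      }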
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      cce881ed
    • Philippe Gerum's avatar
      evl: kconfig: clarify wording · 834f9969
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      834f9969
    • Philippe Gerum's avatar
      evl/wait: display waitqueue name in trace · 12bbc1f7
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      12bbc1f7
    • Philippe Gerum's avatar
      evl: kconfig: introduce high per-CPU concurrency switch · a34cfccc
      Philippe Gerum authored
      
      
      EVL_HIGH_PERCPU_CONCURRENCY optimizes the implementation for
      applications with many real-time threads running concurrently on any
      given CPU core (typically when eight or more threads may be sharing a
      single CPU core). It combines the scalable scheduler and rb-tree timer
      indexing behind a single configuration switch, since both aspects are
      normally coupled.
      
      If the application system runs only a few EVL threads per CPU core,
      then this option should be turned off, in order to minimize the cache
      footprint of the queuing operations performed by the scheduler and
      timer subsystems. Otherwise, it should be turned on in order to have
      constant-time queuing operations for a large number of runnable
      threads and outstanding timers.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      a34cfccc
    • Philippe Gerum's avatar
      evl/sched: enable fast linear thread scheduler (non-scalable) · 22920393
      Philippe Gerum authored
      
      
      For applications with only a few runnable tasks at any point in time,
      a linear queue ordering them for scheduling delivers better
      performance on low-end systems due to a smaller CPU cache footprint,
      compared to the multi-level queue used by the scalable scheduler.
      
      Allow users to select between the lightning-fast and the scalable
      scheduler implementations depending on the runtime profile of the
      application.
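
      A sketch of what a linear run queue boils down to, with made-up types
      (not EVL's actual scheduler structures): a single priority-ordered
      list, O(n) on insertion but with a very small cache footprint when
      only a handful of threads are runnable.

      #include <linux/list.h>

      struct toy_thread {
      	int prio;			/* higher value means higher priority */
      	struct list_head next;
      };

      static void toy_enqueue(struct list_head *runq, struct toy_thread *t)
      {
      	struct toy_thread *pos;

      	/* Insert right before the first lower-priority thread. */
      	list_for_each_entry(pos, runq, next) {
      		if (t->prio > pos->prio) {
      			list_add_tail(&t->next, &pos->next);
      			return;
      		}
      	}
      	list_add_tail(&t->next, runq);	/* lowest priority so far: append */
      }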
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      
      22920393
    • Philippe Gerum's avatar
      bd6858d8
    • Philippe Gerum's avatar
      evl/timer: add linear indexing method · 31eca2f4
      Philippe Gerum authored
      
      
      Add (back) the ability to index timers either in an rb-tree or on a
      basic linked list.
      
      The latter delivers lower latency on application systems with very
      few active timers at any point in time (typically fewer than 10
      active timers, e.g. no more than a couple of timed loops and very few
      timed syscalls).
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      31eca2f4
    • Philippe Gerum's avatar
      ARM: dovetail: fix compat mode branch · 1d3b7c35
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      1d3b7c35
    • Philippe Gerum's avatar
      ARM: dovetail: enable I-pipe compat for syscall routing · b403ee0d
      Philippe Gerum authored
      
      
      Allow legacy applications to issue syscalls following the I-pipe
      syscall convention in EABI mode, which specifies that r7 should be
      loaded with 0xf0042 in order to mark out-of-band syscalls.
      
      When CONFIG_IPIPE_COMPAT is enabled, OOB_SYSCALL_BIT is ORed into r7
      on the fly if its original value is 0xf0042, so that the syscall will
      be routed to the companion core as expected.
      
      This compat mode may be removed on short notice. Do NOT rely on it for
      new applications.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      b403ee0d
    • Philippe Gerum's avatar
      x86: irq_pipeline: do not duplicate RCU exit on stage switch · 4a9a37ab
      Philippe Gerum authored
      
      
      There is no point in manually exiting RCU upon unwinding a task
      context through a stage transition from out-of-band to in-band. This
      would wreck the RCU state machine badly, duplicating the notification
      about kernel exit.
      
      For sure, that task did originally move out-of-band on some kernel
      entry (e.g. a syscall), and will therefore notify RCU about leaving
      the kernel on its way back to userland.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      4a9a37ab
    • Philippe Gerum's avatar
      x86/mm: dovetail: fix interrupt state synchronization upon fault · de5a3a12
      Philippe Gerum authored
      
      
      When the fault handling code wants to re-enable IRQs, we need to
      carefully fix up the in-band stall bit (if applicable) and the
      hardware flag appropriately, which means:
      
      - re-enabling IRQs over the kernel context should lead to unstalling
        the in-band stage only if we did have to stall it upon taking the
        fault.
      
      - flip the in-band stall bit manually instead of calling
        local_irq_enable(), so that we may do this with hardware IRQs
        disabled without triggering the debug check.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      de5a3a12
    • Philippe Gerum's avatar
      genirq: clarify naming for out-of-band IPI service · cd4344cc
      Philippe Gerum authored
      
      
      irq_pipeline_send_remote() as a name fails to convey the idea of
      sending out-of-band IPIs.
      
      Since this service can only send this type of IRQ, let's rename it to
      irq_send_oob_ipi() for the sake of clarity and consistency.
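
      A short usage sketch of the renamed service; the exact prototype (an
      out-of-band IPI number plus a target cpumask) is assumed here, check
      the Dovetail documentation for the authoritative signature.

      #include <linux/cpumask.h>

      /* Illustrative wrapper only; header and prototype details are assumed. */
      static void kick_oob_cores(unsigned int oob_ipi, const struct cpumask *targets)
      {
      	irq_send_oob_ipi(oob_ipi, targets);
      }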
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      cd4344cc
    • Philippe Gerum's avatar
      evl: convert to fallthrough markers · c7494cde
      Philippe Gerum authored
      
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      c7494cde
    • Philippe Gerum's avatar
      050651db
    • Philippe Gerum's avatar
      irq_pipeline: locking: add prepare, finish helpers to hard spinlocks · dad24efb
      Philippe Gerum authored
      
      
      The companion core may make good use of a way to act upon a locking
      operation which is about to start, or an unlocking operation which has
      just taken place. Typically, some debug code could be enabled this
      way, checking for the consistency of such operations. Since hybrid
      spinlocks are based on hard spinlocks, those helpers are available in
      both cases.
      
      The locking process is now as follows:
      
      IRQ forms:
      
      * locking:     hard_disable_irqs + lock_prepare + spin_on_lock
      * try-locking: hard_disable_irqs + trylock_prepare + try_lock, trylock_fail if busy
      * unlocking:   unlock + lock_finish + hard_enable_irqs
      
      basic forms:
      
      * locking:     lock_prepare + spin_on_lock
      * try-locking: trylock_prepare + try_lock, trylock_fail if busy
      * unlocking:   unlock + lock_finish
      
      hard_spin_lock_prepare() and hard_spin_unlock_finish() are such
      helpers. An empty implementation is provided by
      include/dovetail/spinlock.h, which the core may override.
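
      A sketch of what the default (empty) helpers amount to; the parameter
      type is an assumption and the real include/dovetail/spinlock.h may
      differ. A companion core overrides them, typically to hook debug
      checks into the lock/unlock sequences listed above.

      /* No-op defaults which a companion core may override (prototypes assumed). */
      static inline void hard_spin_lock_prepare(struct raw_spinlock *lock) { }
      static inline void hard_spin_unlock_finish(struct raw_spinlock *lock) { }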
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      dad24efb
    • Philippe Gerum's avatar
      evl/lock: add preemption tracking · ba943b38
      Philippe Gerum authored
      
      
      An EVL lock is now distinct from a hard lock in that it tracks and
      disables preemption in the core when held.
      
      Such a spinlock may be useful when only EVL threads running
      out-of-band can contend for the lock, to the exclusion of out-of-band
      IRQ handlers. In this case, disabling preemption before attempting to
      grab the lock may be substituted for disabling hard irqs.
      
      There are gotchas when using such type of lock from the in-band
      context, see comments in evl/lock.h.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      ba943b38
    • Philippe Gerum's avatar
      evl/poll: convert waiter list lock to hard lock · 3ebde572
      Philippe Gerum authored
      
      
      Only very short sections of code, outside of any hot path, are
      protected by such a lock. Therefore we would not generally benefit
      from the preemption disabling feature we are going to add to the
      EVL-specific spinlock. Make it a hard lock to clarify the intent.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      3ebde572
    • Philippe Gerum's avatar
      evl/monitor: convert gate lock to hard lock · c58e6982
      Philippe Gerum authored
      
      
      For the most part, the gate lock is nested with a wait queue hard lock
      - which requires hard irqs to be off - to access the protected
      sections. Therefore we would not benefit in the common case from the
      preemption disabling feature we are going to add to the EVL-specific
      spinlock. Make it a hard lock to clarify the intent.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      c58e6982
    • Philippe Gerum's avatar
      evl/mutex: convert mutex lock to hard lock · 3a810978
      Philippe Gerum authored
      
      
      For the most part, a thread hard lock - which requires hard irqs to be
      off - is nested with the mutex lock to access the protected
      sections. Therefore we would not benefit in the common case from the
      preemption disabling feature we are going to add to the EVL-specific
      spinlock. Make it a hard lock to clarify the intent.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      3a810978
    • Philippe Gerum's avatar
      evl/thread: detect sleeping call with preemption disabled · b3e50a80
      Philippe Gerum authored
      
      
      Sleeping voluntarily with EVL preemption disabled is a bug. Add the
      proper assertion to detect this.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      b3e50a80
    • Philippe Gerum's avatar
      drivers/evl: hectic: stop disabling preemption manually · 5c03353e
      Philippe Gerum authored
      
      
      Given the semantics of an evl_flag, disabling preemption manually
      around the evl_raise_flag(to_flag) -> evl_wait_flag(from_flag)
      sequence does not make sense.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      5c03353e
    • Philippe Gerum's avatar
      evl/observable: convert observable, subscriber locks to hard locks · dc1edef8
      Philippe Gerum authored
      
      
      The subscriber lock is shared between both execution stages, but
      accessed from the in-band stage for the most part, which implies
      disabling hard irqs while holding it. Meanwhile, out-of-band IRQs and
      EVL threads may compete for the observable lock, which would require
      hard irqs to be disabled while holding it.  Therefore we would not
      generally benefit from the preemption disabling feature we are going
      to add to the EVL-specific spinlock in any case. Make these hard locks
      to clarify the intent.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      dc1edef8