- 04 Jun, 2021 8 commits
-
-
When running on top of Dovetail, we may want to perform a specific sequence of actions upon notification of a latency peak by some application. Let's create a dedicated trace call in the API to handle this particular case, instead of relying on the generic "user_freeze" event.

NOTE: the new trace call does not return any value, because there is nothing valuable to receive from a trace call. If required, xntrace_enabled() should be tested beforehand to figure out whether the kernel tracer is enabled.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
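A minimal sketch of the calling pattern described above; the commit does not quote the name of the new trace call, so xntrace_latpeak_freeze() below is a hypothetical placeholder, while xntrace_enabled() is the guard named in the text:

    /* Report a latency peak to the kernel tracer.
     * xntrace_latpeak_freeze() is an illustrative stand-in for the
     * dedicated trace call; it returns no value, so the only useful
     * check is whether tracing is enabled at all.
     */
    if (xntrace_enabled())
            xntrace_latpeak_freeze(peak_ns);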
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
[Jan: add missing struct xnsched_quota_group declaration, style fixes]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Started to bite us on newer kernels / with Dovetail.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
With Dovetail in, a thread control block is fully generic, merely composed of an alternate scheduling control descriptor. Refactor the definitions of the per-architecture control blocks accordingly.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Dovetail invokes arch_inband_task_init() for each emerging thread in the system, so that we may initialize our extended state with default settings. Make sure to intercept this hook in order to start from a fresh and clean state.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
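A sketch of what intercepting this hook can look like; the per-thread state layout and the dovetail_task_state() accessor are assumptions for illustration, not quoted from the change:

    /* Reset the Cobalt-specific per-task data when a new thread
     * emerges; the field names below are illustrative.
     */
    void arch_inband_task_init(struct task_struct *p)
    {
            struct cobalt_threadinfo *info = dovetail_task_state(p);

            info->thread = NULL;    /* no Cobalt shadow attached yet */
            info->process = NULL;   /* not bound to a Cobalt process yet */
    }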
-
Only detected by the compiler when formatted this way.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
This service uses CLOCK_REALTIME, not CLOCK_MONOTONIC. Fixes false positives regarding early timeouts.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
- 25 May, 2021 1 commit
-
-
The offset field we receive from the kernel in a vfile next() handler must progress in order for the loop to stop properly, independently from our own tracking of the end-of-list condition.

The bug is reproducible by running two loops in parallel:

- one continuously spawning an application which creates a few tens of threads (10-20 would suffice) before exiting shortly after;
- another one continuously reading from /proc/xenomai/sched/{threads, stat, acct}.

At some point, the vfile handler should cause a kernel crash.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
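The rule can be illustrated with a made-up next() handler; the types and field names below are hypothetical and not the actual Xenomai vfile API:

    /* The kernel-provided offset must advance on every next() call,
     * even when our private cursor already reports end-of-list;
     * otherwise the read loop may keep revisiting stale entries.
     */
    struct entry {
            struct entry *next;
    };

    struct vfile_iterator {
            long pos;               /* offset owned by the kernel read loop */
            struct entry *cursor;   /* our own position in the thread list */
    };

    static void *vfile_next(struct vfile_iterator *it)
    {
            it->pos++;                    /* always make the offset progress */

            if (it->cursor == NULL)       /* our end-of-list tracking */
                    return NULL;

            it->cursor = it->cursor->next;
            return it->cursor;
    }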
-
- 20 May, 2021 25 commits
-
-
Dovetail enables out-of-band access to the vDSO-based clock_gettime() vcall from applications. If present, select this method instead of relying on the hardware tick counter for CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME and CLOCK_HOST_REALTIME.

At binding time, receiving a null hardware clock frequency from the core means that we should obtain timestamps directly from the vDSO-based clock_gettime() vcall (see cobalt_use_legacy_tsc()). In this mode, Cobalt shares the in-band kernel's idea of time for all common clocks such as CLOCK_MONOTONIC* and CLOCK_REALTIME. As a result, CLOCK_HOST_REALTIME refers to the common CLOCK_REALTIME clock. Furthermore, libcobalt's clock_settime(CLOCK_REALTIME) is delegated to the underlying *libc, which means the caller may switch to secondary mode.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
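The binding-time decision and the resulting read path can be sketched as follows; every identifier below is an illustrative placeholder, only the "null frequency means vDSO" rule is taken from the text:

    #include <time.h>

    static int use_legacy_tsc;   /* stand-in for the real binding state */

    static void bind_timesource(unsigned long long hw_clockfreq)
    {
            /* A null frequency from the core means: read timestamps
             * through the vDSO-based clock_gettime() vcall; otherwise
             * keep scaling the hardware tick counter.
             */
            use_legacy_tsc = (hw_clockfreq != 0);
    }

    static unsigned long long read_hw_tsc_scaled(void)
    {
            /* legacy path: scale the raw hardware tick counter (elided) */
            return 0;
    }

    static unsigned long long read_monotonic_ns(void)
    {
            struct timespec ts;

            if (!use_legacy_tsc) {
                    /* Dovetail: out-of-band capable vDSO vcall */
                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    return (unsigned long long)ts.tv_sec * 1000000000ULL
                            + ts.tv_nsec;
            }

            return read_hw_tsc_scaled();
    }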
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
When the core runs on top of Dovetail, all time values are represented as counts of nanoseconds, in which case a Cobalt tick equals a nanosecond. Introduce inline wrappers for tick-to/from-ns conversion which are nops in the latter case. Cobalt passes us a null clock frequency at binding time (__cobalt_tsc_clockfreq) when conversion is not needed; otherwise, the frequency is used in scaled maths for converting timestamps between their hardware tick and nanosec representation.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
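A sketch of one such wrapper, assuming an illustrative name and type for the frequency variable; only __cobalt_tsc_clockfreq and the "null frequency means 1 tick == 1 ns" convention are taken from the text:

    extern unsigned long long __cobalt_tsc_clockfreq;

    static inline unsigned long long cobalt_ticks_to_ns(unsigned long long ticks)
    {
            if (__cobalt_tsc_clockfreq == 0)
                    return ticks;   /* Dovetail: one Cobalt tick is one ns */

            /* naive form for illustration; the real code uses
             * overflow-safe scaled math based on the clock frequency */
            return ticks * 1000000000ULL / __cobalt_tsc_clockfreq;
    }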
-
As we move away from the representation of time based on hardware clock ticks, keeping cobalt_read_hrclock() makes no sense anymore. This was an internal, undocumented service returning the hardware TSC value for the platform. The log of commit #d584a57d which introduced it clearly stated that applications should stick with the common representation used by clock_gettime(), i.e. nanosecs.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
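For applications that still relied on the removed service, the portable replacement the text points to is plain clock_gettime(); a minimal helper:

    #include <time.h>

    /* Read a monotonic timestamp in nanoseconds through the standard
     * POSIX interface, which libcobalt interposes on.
     */
    static unsigned long long now_ns(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }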
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Since we are dealing with pipeline specific code, we may flatten the call stack by using the Dovetail API directly.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Force the next tick to be programmed in the hardware as a result of leaving the ONESHOT_STOPPED mode.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Add an option to force the timer management code to reprogram the hardware, so that the real device controlled by the proxy ticks again as it leaves the ONESHOT_STOPPED mode. The I-pipe does not require any further action in this case, so the operation is a nop there.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Get the name of the real device controlled by the proxy tick device.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
[Philippe: clarify some variable names]
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
[Philippe: protect xntimer_start with nklock]
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Implement in-band SIRQ request through the synthetic_irq_domain, along with the corresponding free and post operations.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
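A sketch of the Dovetail synthetic IRQ (SIRQ) pattern this refers to; the handler and cookie names are illustrative, and the API calls are the ones Dovetail documents for in-band SIRQs, which may differ in detail from the actual Xenomai code:

    #include <linux/interrupt.h>
    #include <linux/irq.h>
    #include <linux/irqdomain.h>
    #include <linux/percpu.h>

    static DEFINE_PER_CPU(int, sirq_cookie);

    static irqreturn_t sirq_handler(int sirq, void *dev_id)
    {
            /* runs in-band (root stage) once the SIRQ is posted */
            return IRQ_HANDLED;
    }

    static int request_inband_sirq(void)
    {
            int sirq, ret;

            sirq = irq_create_direct_mapping(synthetic_irq_domain);
            ret = __request_percpu_irq(sirq, sirq_handler, IRQF_NO_THREAD,
                                       "cobalt-sirq", &sirq_cookie);
            if (ret) {
                    irq_dispose_mapping(sirq);
                    return ret;
            }

            return sirq;   /* later posted with irq_post_inband(sirq) */
    }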
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
[Jan: style fixes, dropped/linked shared files]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Those are not affected by pipeline differences.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Enable back-tracing for handle_oob_trap_entry.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
Implement OOB IRQ request, free and post for both TIMER_OOB_IPI and RESCHEDULE_OOB_IPI.

Signed-off-by: Hongzhan Chen <hongzhan.chen@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
We are using the regular request_irq/free_irq services under Dovetail. This also means there is no extra task to be done in the interrupt enable/disable services. The affinity hint set during request needs to be cleared before freeing the IRQ, or Linux will complain.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
[Jan: clear affinity hint on free, drop explicit enable/disable_irq]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
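A sketch of the free path described above using the standard genirq API; the wrapper name is illustrative:

    #include <linux/interrupt.h>

    static void release_device_irq(unsigned int irq, void *dev_id)
    {
            irq_set_affinity_hint(irq, NULL);   /* clear the hint first */
            free_irq(irq, dev_id);              /* then release the line */
    }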
-
A process is now marked for COW-breaking on fork() upon the first call to dovetail_init_altsched(), and must ensure its memory is locked via a call to mlockall(MCL_CURRENT|MCL_FUTURE) as usual. As a result, force_commit_memory() became pointless and was removed from the Dovetail interface.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
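For reference, the memory-locking call the text expects from applications looks like this (a minimal sketch, error handling simplified):

    #include <sys/mman.h>
    #include <stdlib.h>

    /* Lock current and future mappings so real-time threads do not
     * take page faults after initialization.
     */
    static void lock_process_memory(void)
    {
            if (mlockall(MCL_CURRENT | MCL_FUTURE))
                    exit(EXIT_FAILURE);
    }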
-
This symbol is now I-pipe specific; stick to the I-pipe nomenclature when referring to the high-priority execution domain.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
-
- 12 May, 2021 6 commits
-
-
The legacy x86_32 architecture is on its way out, with no support from Dovetail. Besides, it went untested with I-pipe configurations for many moons. We still keep 32-bit compat mode available for building the user-space libraries and executables though, along with IA32_EMULATION support in kernel space to cope with legacy applications.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
-
After raising the topic of (dis)continuing support for the x32 ABI multiple times on the mailing list, it turned out that Xenomai has no known users of this dying ABI. So let's remove it.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
-
Signed-off-by: Philippe Gerum <rpm@xenomai.org>
-
set_fs() is on its way out, so we cannot open code a file read operation by calling the VFS handler directly anymore, faking a user address space. We do have kernel interfaces for loading files though, particularly kernel_read_file(). So let's use that one for loading the configuration file contents. Unfortunately, the signature of this service changed during the 5.9-rc cycle, so we have to resort to an ugly wrapper to cope with all supported kernels once again. Sigh.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
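The shape of such a versioned wrapper could look as follows; the exact prototypes and the version boundary are best-effort assumptions and should be checked against the target kernel headers, only the use of kernel_read_file() itself comes from the text:

    #include <linux/version.h>
    #include <linux/kernel.h>
    #include <linux/fs.h>
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 10, 0)
    #include <linux/kernel_read_file.h>
    #endif

    /* Read the whole configuration file into a kernel buffer,
     * returning its length or a negative error code.
     */
    static ssize_t read_config_blob(struct file *filp, void **bufp)
    {
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 10, 0)
            size_t size = 0;
            ssize_t ret;

            ret = kernel_read_file(filp, 0, bufp, SIZE_MAX, &size,
                                   READING_UNKNOWN);
            return ret < 0 ? ret : (ssize_t)size;
    #else
            loff_t size = 0;
            int ret;

            ret = kernel_read_file(filp, bufp, &size, LLONG_MAX,
                                   READING_UNKNOWN);
            return ret ? ret : (ssize_t)size;
    #endif
    }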
-
Since v5.9-rc1, csum_partial_copy_nocheck() forces a zero seed as its last argument to csum_partial(). According to #cc44c17baf7f3, passing a non-zero value would not even yield the proper result on some architectures. However, other locations still expect a non-zero csum seed to be used in the next computation.

Meanwhile, some benchmarking (*) revealed that folding copy and checksum operations may not be as optimal as one would have thought when the caches are under pressure, so we switch to a split version: first memcpy(), then csum_partial(), so as to always benefit from memcpy() optimizations. As a bonus, we don't have to wrap calls to csum_partial_copy_nocheck() to follow the kernel API change. Instead we can provide a single implementation based on csum_partial() which works with any kernel version.

(*) Below are benchmark figures of the csum_copy (folded) vs csum+copy (split) performances in idle vs busy scenarios. Busy means a hackbench+dd loop streaming 128M in the background from zero -> null, in order to badly trash the D-caches while the test runs. Three different packet sizes are submitted to checksumming (32, 1024, 1500 bytes), all figures in nanosecs.

iMX6QP (Cortex A9)
------------------
=== idle
CSUM_COPY 32b: min=333, max=1333, avg=439
CSUM_COPY 1024b: min=1000, max=2000, avg=1045
CSUM_COPY 1500b: min=1333, max=2000, avg=1333
COPY+CSUM 32b: min=333, max=1333, avg=443
COPY+CSUM 1024b: min=1000, max=2334, avg=1345
COPY+CSUM 1500b: min=1666, max=2667, avg=1737
=== busy
CSUM_COPY 32b: min=333, max=4333, avg=466
CSUM_COPY 1024b: min=1000, max=5000, avg=1088
CSUM_COPY 1500b: min=1333, max=5667, avg=1393
COPY+CSUM 32b: min=333, max=1334, avg=454
COPY+CSUM 1024b: min=1000, max=2000, avg=1341
COPY+CSUM 1500b: min=1666, max=2666, avg=1745

C4 (Cortex A55)
---------------
=== idle
CSUM_COPY 32b: min=125, max=791, avg=130
CSUM_COPY 1024b: min=541, max=834, avg=550
CSUM_COPY 1500b: min=708, max=1875, avg=740
COPY+CSUM 32b: min=125, max=167, avg=133
COPY+CSUM 1024b: min=541, max=625, avg=553
COPY+CSUM 1500b: min=708, max=750, avg=730
=== busy
CSUM_COPY 32b: min=125, max=792, avg=133
CSUM_COPY 1024b: min=500, max=2000, avg=552
CSUM_COPY 1500b: min=708, max=1542, avg=744
COPY+CSUM 32b: min=125, max=375, avg=133
COPY+CSUM 1024b: min=500, max=709, avg=553
COPY+CSUM 1500b: min=708, max=916, avg=743

x86 (atom x5)
-------------
=== idle
CSUM_COPY 32b: min=67, max=590, avg=70
CSUM_COPY 1024b: min=245, max=385, avg=251
CSUM_COPY 1500b: min=343, max=521, avg=350
COPY+CSUM 32b: min=101, max=679, avg=117
COPY+CSUM 1024b: min=296, max=379, avg=298
COPY+CSUM 1500b: min=399, max=502, avg=404
=== busy
CSUM_COPY 32b: min=65, max=709, avg=71
CSUM_COPY 1024b: min=243, max=702, avg=252
CSUM_COPY 1500b: min=340, max=1055, avg=351
COPY+CSUM 32b: min=100, max=665, avg=120
COPY+CSUM 1024b: min=295, max=669, avg=298
COPY+CSUM 1500b: min=399, max=686, avg=403

arm64, which has no folded csum_copy implementation, makes the best of using the split copy+csum path. All architectures seem to benefit from optimized memcpy under load when it comes to the worst case execution time. x86 is less prone to jitter under cache trashing than others as usual, but even there, the max. figures for csum+copy in busy context look pretty much on par with the csum_copy version. Therefore, converting all users to csum+copy makes sense.

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
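A sketch of the split copy+checksum helper the text describes; memcpy() and csum_partial() are the stock kernel interfaces, while the wrapper name is illustrative:

    #include <linux/string.h>
    #include <net/checksum.h>

    /* Replacement for csum_partial_copy_nocheck(): copy first to get
     * the optimized memcpy(), then checksum in place, preserving the
     * caller-provided seed across calls.
     */
    static inline __wsum rtnet_csum_copy(void *dst, const void *src,
                                         int len, __wsum sum)
    {
            memcpy(dst, src, len);
            return csum_partial(dst, len, sum);
    }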
-