Skip to content
GitLab
  • Explore
  • Sign in
  • Register
  • xenomaixenomai
  • xenomaixenomai
  • Wiki
  • Finding_Spurious_Relaxes

Finding_Spurious_Relaxes · Changes

Page history
merge original static contents from xenomai.org authored Feb 21, 2018 by Philippe Gerum's avatar Philippe Gerum
Page slugs have been fixed up from Wordpress to Wiki specs -- a few
issues may remain though.
Hide whitespace changes
Inline Side-by-side
Finding_Spurious_Relaxes.asciidoc 0 → 100644
View page @ 0ef1a71a
Finding spurious relaxes
========================
:author: Philippe Gerum
:email: rpm@xenomai.org
:categories: Application
:tags: troubleshooting
Sources of spurious relaxes
---------------------------
In a dual kernel configuration, an application does not want to switch
to the regular Linux execution mode unexpectedly, aka _secondary
mode_. When short and bounded response time is a strong requirement,
the application wants to run in _primary mode_ exclusively, because
only this mode is deemed to deliver real-time capabilities.
There are several reasons for an application to be moved out of
real-time mode automatically by the Xenomai co-kernel. This happens
when the application is:
- invoking a regular Linux syscall triggering a processor fault
- receiving a Linux-originated signal (i.e. *not* a Xenomai-originated
- one), possibly sent by the Xenomai kernel upon detecting a
- consistency issue with the application
At any rate, we want to know *where* in the application code the
signal was received, which will be a strong hint to find out *why* we
switched away from real-time mode.
[TIP]
The reason Xenomai has to move the application out of real-time mode
upon such events stems from the way a dual kernel system works. As
the Cobalt kernel may preempt the regular Linux kernel at any time in
the course of its execution for dealing with a high priority event,
including when the latter assumes to be traversing some atomic
section, special care has to be taken for re-entering a valid Linux
context before a regular kernel service can be delivered.
Manually debugging spurious relaxes
-----------------------------------
Xenomai provides a mechanism for sending a misbehaving application a
regular Linux signal immediately when it switches to _secondary mode_
unexpectedly. When run over GDB, the backtrace of the current thread
receiving the signal can be examined. This feature is available with
both Xenomai 2 and Xenomai 3.
Each time a spurious migration to _secondary mode_ happens, the thread
is sent a SIGDEBUG signal, along with additional information in the
+siginfo_t+ data. First of all, the application should trap such
signal to a handler, like this:
-------------------------------------------------------------------------------
#include <signal.h>
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = sigdebug_handler;
sa.sa_flags = SA_SIGINFO;
sigaction(SIGDEBUG, &sa, NULL);
-------------------------------------------------------------------------------
The +sigdebug_handler()+ routine would have to retrieve and decode the
current backtrace, writing out the cause of the SIGDEBUG notification
and hopefully a meaningful call stack:
-------------------------------------------------------------------------------
#include <sys/types.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <execinfo.h>
static const char *sigdebug_reasons[] = {
[SIGDEBUG_UNDEFINED] = "latency: received SIGXCPU for unknown reason",
[SIGDEBUG_MIGRATE_SIGNAL] = "received signal",
[SIGDEBUG_MIGRATE_SYSCALL] = "invoked syscall",
[SIGDEBUG_MIGRATE_FAULT] = "triggered fault",
[SIGDEBUG_MIGRATE_PRIOINV] = "affected by priority inversion",
[SIGDEBUG_NOMLOCK] = "Xenomai: process memory not locked "
"(missing mlockall?)",
[SIGDEBUG_WATCHDOG] = "Xenomai: watchdog triggered "
"(period too short?)",
};
void sigdebug_handler(int sig, siginfo_t *si, void *context)
{
const char fmt[] = "Mode switch (reason: %s), aborting. Backtrace:\n";
unsigned int reason = sigdebug_reason(si);
static char buffer[256];
static void *bt[200];
unsigned int n;
if (reason > SIGDEBUG_WATCHDOG)
reason = SIGDEBUG_UNDEFINED;
switch(reason) {
case SIGDEBUG_UNDEFINED:
case SIGDEBUG_NOMLOCK:
case SIGDEBUG_WATCHDOG:
/* These errors are lethal, something went really wrong. */
n = snprintf(buffer, sizeof(buffer),
"%s\n", sigdebug_reasons[reason]);
write(STDERR_FILENO, buffer, n);
exit(EXIT_FAILURE);
}
/* Retrieve the current backtrace, and decode it to stdout. */
n = snprintf(buffer, sizeof(buffer), fmt, sigdebug_reason[reason]);
n = write(STDERR_FILENO, buffer, n);
n = backtrace(bt, sizeof(bt)/sizeof(bt[0]));
backtrace_symbols_fd(bt, n, STDERR_FILENO);
signal(sig, SIG_DFL);
kill(getpid(), sig);
}
-------------------------------------------------------------------------------
Then, you need to enable the _warn_upon_switch_ capability on a
per-thread basis. For instance, a POSIX-based application would run
this code from the thread to monitor for spurious relaxes:
-------------------------------------------------------------------------------
/* Ask Xenomai to warn us upon switches to secondary mode. */
pthread_set_mode_np(0, PTHREAD_WARNSW);
-------------------------------------------------------------------------------
Alternatively, an application based on the _native_ API would run this
code instead:
-------------------------------------------------------------------------------
/* Ask Xenomai to warn us upon switches to secondary mode. */
rt_task_set_mode(0, T_WARNSW, NULL);
-------------------------------------------------------------------------------
Debugging spurious relaxes with Xenomai 3 and *slackspot*
----------------------------------------------------------
Xenomai 3 introduces a new utility called *slackspot*, for pinpointing
code locations which caused transitions to _secondary mode_.
*slackspot* combines dedicated kernel support exporting trace
information about spurious relaxes through the
+/proc/xenomai/debug/relax+ virtual file, and a userland utility
called +sbin/slackspot+, which parses the virtual file output to
display the program backtrace, so that we can locate the offending
code easily.
The motivation to centralize tracing information about relaxes
directly into kernel space is fourfold:
- it allows to gather all the trace data into a single location and
keep it safe there, with no external log file involved.
- enabling the tracing does not impose any requirement on the
application (aside of being compiled with debug symbols for best
interpreting that information). We only need a kernel config switch
for this (i.e. CONFIG_XENO_OPT_DEBUG_TRACE_RELAX).
- the data is collected and can be made available exactly the same
way regardless of the application emitting the relax requests, or
whether it is still alive when the trace data are displayed.
- the kernel is able to provide accurate and detailed trace
information, such as the relative offset of instructions causing relax
requests within dynamic shared objects, without having to guess it
roughly from /proc/pid/maps, or relying on ldd's --function-relocs
feature, which both require to run on the target system to get the
needed information. Instead, we allow a build host to use a
cross-compilation toolchain later to extract the source location, from
the raw data the kernel has provided on the target system.
A typical debug session with *slackspot*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is a typical debugging session, involving a target board running
the Cobalt-based application, and a host system, providing the
cross-compilation toolchain used to build that application. We use the
netcat utility to pull the trace data over the wire from the target,
and process it locally on our debug host:
---------------------------------------------------------------------------
target> netcat -l -p <port> -c < /proc/xenomai/debug/relax
host> nc <target-ip> <port> | CROSS_COMPILE=ppc_6xx- ./slackspot --path=/opt/rootfs/MPC5200/lib:$HOME/frags/relax --filter thread=Task*
Thread[828] "Task 2" started by /relax:
#0 0xfff00000 ???
#1 0x000001bb ???
#2 0x00064393 _IO_file_doallocate() in ??:?
#3 0x00073d6f _IO_doallocbuf() in ??:?
#4 0x00072d87 _IO_file_overflow() in ??:?
#5 0x00075f83 __overflow() in ??:?
#6 0x0006997b putchar() in ??:?
#7 0x100017db task2_func() in /home/rpm/frags/relax/relax.c:23
#8 0x000078d7 task_entry() in /home/rpm/git/xenomai-forge/lib/alchemy/task.c:235
#9 0x00005a6b start_thread() in pthread_create.c:?
#10 0x000d389f __clone() in ??:?
Thread[828] "Task 2" started by /relax (4 times):
#0 0x000c443f write() in ??:?
#1 0x00072553 _IO_file_write() in ??:?
#2 0x000721cf _IO_file_seek() in ??:?
#3 0x000724c7 _IO_do_write() in ??:?
#4 0x00072c2f _IO_file_sync() in ??:?
#5 0x00064a4f _IO_fflush() in ??:?
#6 0x100017eb task2_func() in /home/rpm/frags/relax/relax.c:24
#7 0x000078d7 task_entry() in /home/rpm/git/xenomai-forge/lib/alchemy/task.c:235
#8 0x00005a6b start_thread() in pthread_create.c:?
#9 0x000d389f __clone() in ??:?
---------------------------------------------------------------------------
The *slackspot* output here shows that something wrong is happening at
lines 23 and 24 from the example file, as the application calls
buffered standard I/O routines from _primary mode_, which in turn
invoke regular kernel services, causing the thread to relax and leave
the real-time mode.
Clone repository
  • Analogy_General_Presentation
  • Analogy_Practical_Presentation
  • App_Setup_And_Init
  • Benchmarking_With_Xeno_Test
  • Building_Applications_For_Xenomai_3
  • Building_Debian_Packages
  • CXP_RTDM
  • Common_Xenomai_Platform
  • Configuring_For_X86_Based_Dual_Kernels
  • Dealing_With_X86_IRQ_Sharing
  • Dealing_With_X86_SMI_Troubles
  • Dovetail
  • Driver_Serial_16550A
  • FAQ
  • Finding_Spurious_Relaxes
View All Pages

Imprint & Privacy Policy