Commit 534c97b0 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull 'full dynticks' support from Ingo Molnar:
 "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
  kernel feature to the timer and scheduler subsystems: 'full dynticks',
  or CONFIG_NO_HZ_FULL=y.

  This feature extends the nohz variable-size timer tick feature from
  idle to busy CPUs (running at most one task) as well, potentially
  reducing the number of timer interrupts significantly.

  This feature got motivated by real-time folks and the -rt tree, but
  the general utility and motivation of full-dynticks runs wider than
  that:

   - HPC workloads get faster: CPUs running a single task should be able
     to utilize a maximum amount of CPU power.  A periodic timer tick at
     HZ=1000 can cause a constant overhead of up to 1.0%.  This feature
     removes that overhead - and speeds up the system by 0.5%-1.0% on
     typical distro configs even on modern systems.

   - Real-time workload latency reduction: CPUs running critical tasks
     should experience as little jitter as possible.  The last remaining
     source of kernel-related jitter was the periodic timer tick.

   - A single task executing on a CPU is a pretty common situation,
     especially with an increasing number of cores/CPUs, so this feature
     helps desktop and mobile workloads as well.

  The cost of the feature is mainly related to increased timer
  reprogramming overhead when a CPU switches its tick period, and thus
  slightly longer to-idle and from-idle latency.

  Configuration-wise a third mode of operation is added to the existing
  two NOHZ kconfig modes:

   - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
     as a config option.  This is the traditional Linux periodic tick
     design: there's a HZ tick going on all the time, regardless of
     whether a CPU is idle or not.

   - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
     periodic tick when a CPU enters idle mode.

   - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
     tick when a CPU is idle, also slows the tick down to 1 Hz (one
     timer interrupt per second) when only a single task is running on a
     CPU.

  The .config behavior is compatible: existing !CONFIG_NO_HZ and
  CONFIG_NO_HZ=y settings get translated to the new values, without the
  user having to configure anything.  CONFIG_NO_HZ_FULL is turned off by
  default.

  This feature is based on a lot of infrastructure work that has been
  steadily going upstream in the last 2-3 cycles: related RCU support
  and non-periodic cputime support in particular is upstream already.

  This tree adds the final pieces and activates the feature.  The pull
  request is marked RFC because:

   - it's marked 64-bit only at the moment - the 32-bit support patch is
     small but did not get ready in time.

   - it has a number of fresh commits that came in after the merge
     window.  The overwhelming majority of commits are from before the
     merge window, but still some aspects of the tree are fresh and so I
     marked it RFC.

   - it's a pretty wide-reaching feature with lots of effects - and
     while the components have been in testing for some time, the full
     combination is still not very widely used.  That it's default-off
     should reduce its regression abilities and obviously there are no
     known regressions with CONFIG_NO_HZ_FULL=y enabled either.

   - the feature is not completely idempotent: there is no 100%
     equivalent replacement for a periodic scheduler/timer tick.  In
     particular there's ongoing work to map out and reduce its effects
     on scheduler load-balancing and statistics.  This should not impact
     correctness though, there are no known regressions related to this
     feature at this point.

   - it's a pretty ambitious feature that with time will likely be
     enabled by most Linux distros, and we'd like you to make input on
     its design/implementation, if you dislike some aspect we missed.
     Without flaming us to crisp! :-)

  Future plans:

   - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
     the periodic tick altogether when there's a single busy task on a
     CPU.  We'd first like 1 Hz to be exposed more widely before we go
     for the 0 Hz target though.

   - once we reach 0 Hz we can remove the periodic tick assumption from
     nr_running>=2 as well, by essentially interrupting busy tasks only
     as frequently as the sched_latency constraints require us to do -
     once every 4-40 msecs, depending on nr_running.

  I am personally leaning towards biting the bullet and doing this in
  v3.10, like the -rt tree this effort has been going on for too long -
  but the final word is up to you as usual.

  More technical details can be found in Documentation/timers/NO_HZ.txt"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  sched: Keep at least 1 tick per second for active dynticks tasks
  rcu: Fix full dynticks' dependency on wide RCU nocb mode
  nohz: Protect smp_processor_id() in tick_nohz_task_switch()
  nohz_full: Add documentation.
  cputime_nsecs: use math64.h for nsec resolution conversion helpers
  nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
  nohz: Reduce overhead under high-freq idling patterns
  nohz: Remove full dynticks' superfluous dependency on RCU tree
  nohz: Fix unavailable tick_stop tracepoint in dynticks idle
  nohz: Add basic tracing
  nohz: Select wide RCU nocb for full dynticks
  nohz: Disable the tick when irq resume in full dynticks CPU
  nohz: Re-evaluate the tick for the new task after a context switch
  nohz: Prepare to stop the tick on irq exit
  nohz: Implement full dynticks kick
  nohz: Re-evaluate the tick from the scheduler IPI
  sched: New helper to prevent from stopping the tick in full dynticks
  sched: Kick full dynticks CPU that have more than one task enqueued.
  perf: New helper to prevent full dynticks CPUs from stopping tick
  perf: Kick full dynticks CPU if events rotation is needed
  ...
parents 64049d19 265f22a9
......@@ -191,7 +191,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
o A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
o A bug in the RCU implementation.
......
......@@ -131,8 +131,8 @@ sampling_rate_min:
The sampling rate is limited by the HW transition latency:
transition_latency * 100
Or by kernel restrictions:
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
If CONFIG_NO_HZ is not set or nohz=off boot parameter is used, the
If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us (20ms)
HZ=250: min=80000us (80ms)
......
......@@ -1964,6 +1964,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Valid arguments: on, off
Default: on
nohz_full= [KNL,BOOT]
In kernels built with CONFIG_NO_HZ_FULL=y, set
the specified list of CPUs whose tick will be stopped
whenever possible. The boot CPU will be forced outside
the range to maintain the timekeeping.
The CPUs in this range must also be included in the
rcu_nocbs= set.
noiotrap [SH] Disables trapped I/O port accesses.
noirqdebug [X86-32] Disables the code which attempts to detect and
......
NO_HZ: Reducing Scheduling-Clock Ticks
This document describes Kconfig options and boot parameters that can
reduce the number of scheduling-clock interrupts, thereby improving energy
efficiency and reducing OS jitter. Reducing OS jitter is important for
some types of computationally intensive high-performance computing (HPC)
applications and for real-time applications.
There are two main contexts in which the number of scheduling-clock
interrupts can be reduced compared to the old-school approach of sending
a scheduling-clock interrupt to all CPUs every jiffy whether they need
it or not (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n for older kernels):
1. Idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y for older kernels).
2. CPUs having only one runnable task (CONFIG_NO_HZ_FULL=y).
These two cases are described in the following two sections, followed
by a third section on RCU-specific considerations and a fourth and final
section listing known issues.
IDLE CPUs
If a CPU is idle, there is little point in sending it a scheduling-clock
interrupt. After all, the primary purpose of a scheduling-clock interrupt
is to force a busy CPU to shift its attention among multiple duties,
and an idle CPU has no duties to shift its attention among.
The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
scheduling-clock interrupts to idle CPUs, which is critically important
both to battery-powered devices and to highly virtualized mainframes.
A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
drain its battery very quickly, easily 2-3 times as fast as would the
same device running a CONFIG_NO_HZ_IDLE=y kernel. A mainframe running
1,500 OS instances might find that half of its CPU time was consumed by
unnecessary scheduling-clock interrupts. In these situations, there
is strong motivation to avoid sending scheduling-clock interrupts to
idle CPUs. That said, dyntick-idle mode is not free:
1. It increases the number of instructions executed on the path
to and from the idle loop.
2. On many architectures, dyntick-idle mode also increases the
number of expensive clock-reprogramming operations.
Therefore, systems with aggressive real-time response constraints often
run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
in order to avoid degrading from-idle transition latencies.
An idle CPU that is not receiving scheduling-clock interrupts is said to
be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
tickless". The remainder of this document will use "dyntick-idle mode".
There is also a boot parameter "nohz=" that can be used to disable
dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
dyntick-idle mode.
CPUs WITH ONLY ONE RUNNABLE TASK
If a CPU has only one runnable task, there is little point in sending it
a scheduling-clock interrupt because there is no other task to switch to.
The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
sending scheduling-clock interrupts to CPUs with a single runnable task,
and such CPUs are said to be "adaptive-ticks CPUs". This is important
for applications with aggressive real-time response constraints because
it allows them to improve their worst-case response times by the maximum
duration of a scheduling-clock interrupt. It is also important for
computationally intensive short-iteration workloads: If any CPU is
delayed during a given iteration, all the other CPUs will be forced to
wait idle while the delayed CPU finishes. Thus, the delay is multiplied
by one less than the number of CPUs. In these situations, there is
again strong motivation to avoid sending scheduling-clock interrupts.
By default, no CPU will be an adaptive-ticks CPU. The "nohz_full="
boot parameter specifies the adaptive-ticks CPUs. For example,
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
CPUs. Note that you are prohibited from marking all of the CPUs as
adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system calls
like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no
running user processes to observe slight drifts in clock rate.)
Therefore, the boot CPU is prohibited from entering adaptive-ticks
mode. Specifying a "nohz_full=" mask that includes the boot CPU will
result in a boot-time error message, and the boot CPU will be removed
from the mask.
Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies
that all CPUs other than the boot CPU are adaptive-ticks CPUs. This
Kconfig parameter will be overridden by the "nohz_full=" boot parameter,
so that if both the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter and
the "nohz_full=1" boot parameter is specified, the boot parameter will
prevail so that only CPU 1 will be an adaptive-ticks CPU.
Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
This is covered in the "RCU IMPLICATIONS" section below.
Normally, a CPU remains in adaptive-ticks mode as long as possible.
In particular, transitioning to kernel mode does not automatically change
the mode. Instead, the CPU will exit adaptive-ticks mode only if needed,
for example, if that CPU enqueues an RCU callback.
Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
not come for free:
1. CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
adaptive ticks without also running dyntick idle. This dependency
extends down into the implementation, so that all of the costs
of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.
2. The user/kernel transitions are slightly more expensive due
to the need to inform kernel subsystems (such as RCU) about
the change in mode.
3. POSIX CPU timers on adaptive-tick CPUs may miss their deadlines
(perhaps indefinitely) because they currently rely on
scheduling-tick interrupts. This will likely be fixed in
one of two ways: (1) Prevent CPUs with POSIX CPU timers from
entering adaptive-tick mode, or (2) Use hrtimers or other
adaptive-ticks-immune mechanism to cause the POSIX CPU timer to
fire properly.
4. If there are more perf events pending than the hardware can
accommodate, they are normally round-robined so as to collect
all of them over time. Adaptive-tick mode may prevent this
round-robining from happening. This will likely be fixed by
preventing CPUs with large numbers of perf events pending from
entering adaptive-tick mode.
5. Scheduler statistics for adaptive-tick CPUs may be computed
slightly differently than those for non-adaptive-tick CPUs.
This might in turn perturb load-balancing of real-time tasks.
6. The LB_BIAS scheduler feature is disabled by adaptive ticks.
Although improvements are expected over time, adaptive ticks is quite
useful for many types of real-time and compute-intensive applications.
However, the drawbacks listed above mean that adaptive ticks should not
(yet) be enabled by default.
RCU IMPLICATIONS
There are situations in which idle CPUs cannot be permitted to
enter either dyntick-idle mode or adaptive-tick mode, the most
common being when that CPU has RCU callbacks pending.
The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
to enter dyntick-idle mode or adaptive-tick mode anyway. In this case,
a timer will awaken these CPUs every four jiffies in order to ensure
that the RCU callbacks are processed in a timely fashion.
Another approach is to offload RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to
offload may be selected via several methods:
1. One of three mutually exclusive Kconfig options specify a
build-time default for the CPUs to offload:
a. The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in
no CPUs being offloaded.
b. The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes
CPU 0 to be offloaded.
c. The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all
CPUs to be offloaded. Note that the callbacks will be
offloaded to "rcuo" kthreads, and that those kthreads
will in fact run on some CPU. However, this approach
gives fine-grained control on exactly which CPUs the
callbacks run on, along with their scheduling priority
(including the default of SCHED_OTHER), and it further
allows this control to be varied dynamically at runtime.
2. The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1,
3, 4, and 5. The specified CPUs will be offloaded in addition to
any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
CONFIG_RCU_NOCB_CPU_ALL=y. This means that the "rcu_nocbs=" boot
parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.
The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
or adaptive-tick mode. That said, note that it is up to userspace to
pin the "rcuo" kthreads to specific CPUs if desired. Otherwise, the
scheduler will decide where to run them, which might or might not be
where you want them to run.
KNOWN ISSUES
o Dyntick-idle slows transitions to and from idle slightly.
In practice, this has not been a problem except for the most
aggressive real-time workloads, which have the option of disabling
dyntick-idle mode, an option that most of them take. However,
some workloads will no doubt want to use adaptive ticks to
eliminate scheduling-clock interrupt latencies. Here are some
options for these workloads:
a. Use PMQOS from userspace to inform the kernel of your
latency requirements (preferred).
b. On x86 systems, use the "idle=mwait" boot parameter.
c. On x86 systems, use the "intel_idle.max_cstate=" to limit
` the maximum C-state depth.
d. On x86 systems, use the "idle=poll" boot parameter.
However, please note that use of this parameter can cause
your CPU to overheat, which may cause thermal throttling
to degrade your latencies -- and that this degradation can
be even worse than that of dyntick-idle. Furthermore,
this parameter effectively disables Turbo Mode on Intel
CPUs, which can significantly reduce maximum performance.
o Adaptive-ticks slows user/kernel transitions slightly.
This is not expected to be a problem for computationally intensive
workloads, which have few such transitions. Careful benchmarking
will be required to determine whether or not other workloads
are significantly affected by this effect.
o Adaptive-ticks does not do anything unless there is only one
runnable task for a given CPU, even though there are a number
of other situations where the scheduling-clock tick is not
needed. To give but one example, consider a CPU that has one
runnable high-priority SCHED_FIFO task and an arbitrary number
of low-priority SCHED_OTHER tasks. In this case, the CPU is
required to run the SCHED_FIFO task until it either blocks or
some other higher-priority task awakens on (or is assigned to)
this CPU, so there is no point in sending a scheduling-clock
interrupt to this CPU. However, the current implementation
nevertheless sends scheduling-clock interrupts to CPUs having a
single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
tasks, even though these interrupts are unnecessary.
Better handling of these sorts of situations is future work.
o A reboot is required to reconfigure both adaptive idle and RCU
callback offloading. Runtime reconfiguration could be provided
if needed, however, due to the complexity of reconfiguring RCU at
runtime, there would need to be an earthshakingly good reason.
Especially given that you have the straightforward option of
simply offloading RCU callbacks from all CPUs and pinning them
where you want them whenever you want them pinned.
o Additional configuration is required to deal with other sources
of OS jitter, including interrupts and system-utility tasks
and processes. This configuration normally involves binding
interrupts and tasks to particular CPUs.
o Some sources of OS jitter can currently be eliminated only by
constraining the workload. For example, the only way to eliminate
OS jitter due to global TLB shootdowns is to avoid the unmapping
operations (such as kernel module unload operations) that
result in these shootdowns. For another example, page faults
and TLB misses can be reduced (and in some cases eliminated) by
using huge pages and by constraining the amount of memory used
by the application. Pre-faulting the working set can also be
helpful, especially when combined with the mlock() and mlockall()
system calls.
o Unless all CPUs are idle, at least one CPU must keep the
scheduling-clock interrupt going in order to support accurate
timekeeping.
o If there are adaptive-ticks CPUs, there will be at least one
CPU keeping the scheduling-clock interrupt going, even if all
CPUs are otherwise idle.
......@@ -30,8 +30,8 @@ DEFINE(UM_NSEC_PER_USEC, NSEC_PER_USEC);
#ifdef CONFIG_PRINTK
DEFINE(UML_CONFIG_PRINTK, CONFIG_PRINTK);
#endif
#ifdef CONFIG_NO_HZ
DEFINE(UML_CONFIG_NO_HZ, CONFIG_NO_HZ);
#ifdef CONFIG_NO_HZ_COMMON
DEFINE(UML_CONFIG_NO_HZ_COMMON, CONFIG_NO_HZ_COMMON);
#endif
#ifdef CONFIG_UML_X86
DEFINE(UML_CONFIG_UML_X86, CONFIG_UML_X86);
......
......@@ -79,7 +79,7 @@ long long os_nsecs(void)
return timeval_to_ns(&tv);
}
#ifdef UML_CONFIG_NO_HZ
#ifdef UML_CONFIG_NO_HZ_COMMON
static int after_sleep_interval(struct timespec *ts)
{
return 0;
......
......@@ -16,21 +16,27 @@
#ifndef _ASM_GENERIC_CPUTIME_NSECS_H
#define _ASM_GENERIC_CPUTIME_NSECS_H
#include <linux/math64.h>
typedef u64 __nocast cputime_t;
typedef u64 __nocast cputime64_t;
#define cputime_one_jiffy jiffies_to_cputime(1)
#define cputime_div(__ct, divisor) div_u64((__force u64)__ct, divisor)
#define cputime_div_rem(__ct, divisor, remainder) \
div_u64_rem((__force u64)__ct, divisor, remainder);
/*
* Convert cputime <-> jiffies (HZ)
*/
#define cputime_to_jiffies(__ct) \
((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
cputime_div(__ct, NSEC_PER_SEC / HZ)
#define cputime_to_scaled(__ct) (__ct)
#define jiffies_to_cputime(__jif) \
(__force cputime_t)((__jif) * (NSEC_PER_SEC / HZ))
#define cputime64_to_jiffies64(__ct) \
((__force u64)(__ct) / (NSEC_PER_SEC / HZ))
cputime_div(__ct, NSEC_PER_SEC / HZ)
#define jiffies64_to_cputime64(__jif) \
(__force cputime64_t)((__jif) * (NSEC_PER_SEC / HZ))
......@@ -45,7 +51,7 @@ typedef u64 __nocast cputime64_t;
* Convert cputime <-> microseconds
*/
#define cputime_to_usecs(__ct) \
((__force u64)(__ct) / NSEC_PER_USEC)
cputime_div(__ct, NSEC_PER_USEC)
#define usecs_to_cputime(__usecs) \
(__force cputime_t)((__usecs) * NSEC_PER_USEC)
#define usecs_to_cputime64(__usecs) \
......@@ -55,7 +61,7 @@ typedef u64 __nocast cputime64_t;
* Convert cputime <-> seconds
*/
#define cputime_to_secs(__ct) \
((__force u64)(__ct) / NSEC_PER_SEC)
cputime_div(__ct, NSEC_PER_SEC)
#define secs_to_cputime(__secs) \
(__force cputime_t)((__secs) * NSEC_PER_SEC)
......@@ -69,8 +75,10 @@ static inline cputime_t timespec_to_cputime(const struct timespec *val)
}
static inline void cputime_to_timespec(const cputime_t ct, struct timespec *val)
{
val->tv_sec = (__force u64) ct / NSEC_PER_SEC;
val->tv_nsec = (__force u64) ct % NSEC_PER_SEC;
u32 rem;
val->tv_sec = cputime_div_rem(ct, NSEC_PER_SEC, &rem);
val->tv_nsec = rem;
}
/*
......@@ -83,15 +91,17 @@ static inline cputime_t timeval_to_cputime(const struct timeval *val)
}
static inline void cputime_to_timeval(const cputime_t ct, struct timeval *val)
{
val->tv_sec = (__force u64) ct / NSEC_PER_SEC;
val->tv_usec = ((__force u64) ct % NSEC_PER_SEC) / NSEC_PER_USEC;
u32 rem;
val->tv_sec = cputime_div_rem(ct, NSEC_PER_SEC, &rem);
val->tv_usec = rem / NSEC_PER_USEC;
}
/*
* Convert cputime <-> clock (USER_HZ)
*/
#define cputime_to_clock_t(__ct) \
((__force u64)(__ct) / (NSEC_PER_SEC / USER_HZ))
cputime_div(__ct, (NSEC_PER_SEC / USER_HZ))
#define clock_t_to_cputime(__x) \
(__force cputime_t)((__x) * (NSEC_PER_SEC / USER_HZ))
......
......@@ -788,6 +788,12 @@ static inline int __perf_event_disable(void *info) { return -1; }
static inline void perf_event_task_tick(void) { }
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
extern bool perf_event_can_stop_tick(void);
#else
static inline bool perf_event_can_stop_tick(void) { return true; }
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
extern void perf_restore_debug_store(void);
#else
......
......@@ -123,6 +123,8 @@ void run_posix_cpu_timers(struct task_struct *task);
void posix_cpu_timers_exit(struct task_struct *task);
void posix_cpu_timers_exit_group(struct task_struct *task);
bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk);
void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
cputime_t *newval, cputime_t *oldval);
......
......@@ -1000,4 +1000,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define kfree_rcu(ptr, rcu_head) \
__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
#ifdef CONFIG_RCU_NOCB_CPU
extern bool rcu_is_nocb_cpu(int cpu);
#else
static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
#endif /* __LINUX_RCUPDATE_H */
......@@ -231,7 +231,7 @@ extern void init_idle_bootup_task(struct task_struct *idle);
extern int runqueue_is_locked(int cpu);
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
extern void nohz_balance_enter_idle(int cpu);
extern void set_cpu_sd_state_idle(void);
extern int get_nohz_timer_target(void);
......@@ -1764,13 +1764,13 @@ static inline int set_cpus_allowed_ptr(struct task_struct *p,
}
#endif
#ifdef CONFIG_NO_HZ
#ifdef CONFIG_NO_HZ_COMMON
void calc_load_enter_idle(void);
void calc_load_exit_idle(void);
#else
static inline void calc_load_enter_idle(void) { }
static inline void calc_load_exit_idle(void) { }
#endif /* CONFIG_NO_HZ */
#endif /* CONFIG_NO_HZ_COMMON */
#ifndef CONFIG_CPUMASK_OFFSTACK
static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
......@@ -1856,10 +1856,17 @@ extern void idle_task_exit(void);
static inline void idle_task_exit(void) {}
#endif
#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
extern void wake_up_idle_cpu(int cpu);
#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
extern void wake_up_nohz_cpu(int cpu);
#else
static inline void wake_up_idle_cpu(int cpu) { }
static inline void wake_up_nohz_cpu(int cpu) { }
#endif
#ifdef CONFIG_NO_HZ_FULL
extern bool sched_can_stop_tick(void);
extern u64 scheduler_tick_max_deferment(void);
#else
static inline bool sched_can_stop_tick(void) { return false; }
#endif
#ifdef CONFIG_SCHED_AUTOGROUP
......
......@@ -82,7 +82,7 @@ extern int tick_program_event(ktime_t expires, int force);
extern void tick_setup_sched_timer(void);
# endif
# if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
# if defined CONFIG_NO_HZ_COMMON || defined CONFIG_HIGH_RES_TIMERS
extern void tick_cancel_sched_timer(int cpu);
# else
static inline void tick_cancel_sched_timer(int cpu) { }
......@@ -123,7 +123,7 @@ static inline void tick_check_idle(int cpu) { }
static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */
# ifdef CONFIG_NO_HZ
# ifdef CONFIG_NO_HZ_COMMON
DECLARE_PER_CPU(struct tick_sched, tick_cpu_sched);
static inline int tick_nohz_tick_stopped(void)
......@@ -138,7 +138,7 @@ extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else /* !CONFIG_NO_HZ */
# else /* !CONFIG_NO_HZ_COMMON */
static inline int tick_nohz_tick_stopped(void)
{
return 0;
......@@ -155,7 +155,24 @@ static inline ktime_t tick_nohz_get_sleep_length(void)
}
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
# endif /* !NO_HZ */
# endif /* !CONFIG_NO_HZ_COMMON */
#ifdef CONFIG_NO_HZ_FULL
extern void tick_nohz_init(void);
extern int tick_nohz_full_cpu(int cpu);
extern void tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_all(void);
extern void tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline void tick_nohz_init(void) { }
static inline int tick_nohz_full_cpu(int cpu) { return 0; }
static inline void tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
static inline void tick_nohz_task_switch(struct task_struct *tsk) { }
#endif
# ifdef CONFIG_CPU_IDLE_GOV_MENU
extern void menu_hrtimer_cancel(void);
......
......@@ -323,6 +323,27 @@ TRACE_EVENT(itimer_expire,
(int) __entry->pid, (unsigned long long)__entry->now)
);
#ifdef CONFIG_NO_HZ_COMMON
TRACE_EVENT(tick_stop,
TP_PROTO(int success, char *error_msg),
TP_ARGS(success, error_msg),
TP_STRUCT__entry(
__field( int , success )
__string( msg, error_msg )
),
TP_fast_assign(
__entry->success = success;
__assign_str(msg, error_msg);
),
TP_printk("success=%s msg=%s", __entry->success ? "yes" : "no", __get_str(msg))
);
#endif
#endif /* _TRACE_TIMER_H */
/* This part must be outside protection */
......
......@@ -302,7 +302,7 @@ choice
# Kind of a stub config for the pure tick based cputime accounting
config TICK_CPU_ACCOUNTING
bool "Simple tick based cputime accounting"
depends on !S390
depends on !S390 && !NO_HZ_FULL
help
This is the basic tick based cputime accounting that maintains
statistics about user, system and idle time spent on per jiffies
......@@ -312,7 +312,7 @@ config TICK_CPU_ACCOUNTING
config VIRT_CPU_ACCOUNTING_NATIVE
bool "Deterministic task and CPU time accounting"
depends on HAVE_VIRT_CPU_ACCOUNTING
depends on HAVE_VIRT_CPU_ACCOUNTING && !NO_HZ_FULL
select VIRT_CPU_ACCOUNTING
help
Select this option to enable more accurate task and CPU time
......@@ -342,7 +342,7 @@ config VIRT_CPU_ACCOUNTING_GEN
config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
depends on HAVE_IRQ_TIME_ACCOUNTING
depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
help
Select this option to enable fine granularity task irq time
accounting. This is done by reading a timestamp on each
......@@ -576,7 +576,7 @@ config RCU_FANOUT_EXACT
config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
depends on NO_HZ && SMP
depends on NO_HZ_COMMON && SMP
default n
help
This option permits CPUs to enter dynticks-idle state even if
......@@ -687,7 +687,7 @@ choice
config RCU_NOCB_CPU_NONE
bool "No build_forced no-CBs CPUs"
depends on RCU_NOCB_CPU
depends on RCU_NOCB_CPU && !NO_HZ_FULL
help
This option does not force any of the CPUs to be no-CBs CPUs.
Only CPUs designated by the rcu_nocbs= boot parameter will be
......@@ -695,7 +695,7 @@ config RCU_NOCB_CPU_NONE
config RCU_NOCB_CPU_ZERO
bool "CPU 0 is a build_forced no-CBs CPU"
depends on RCU_NOCB_CPU
depends on RCU_NOCB_CPU && !NO_HZ_FULL
help
This option forces CPU 0 to be a no-CBs CPU. Additional CPUs
may be designated as no-CBs CPUs using the rcu_nocbs= boot
......
......@@ -544,6 +544,7 @@ asmlinkage void __init start_kernel(void)
idr_init_cache();
perf_event_init();
rcu_init();