1. 18 Jul, 2017 1 commit
  2. 28 Jun, 2017 1 commit
    • Akshay Adiga's avatar
      powerpc/powernv/idle: Clear r12 on wakeup from stop lite · 4d0d7c02
      Akshay Adiga authored
      pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup
      reason is an HMI in CHECK_HMI_INTERRUPT.
      When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is
      no point setting r12 to SRR1.
      However, we don't set r12 at all so r12 contains garbage (likely a kernel
      pointer), and is still used to check HMI assuming that it contained SRR1. This
      causes the OPAL msglog to be filled with the following print:
        HMI: Received HMI interrupt: HMER = 0x0040000000000000
      This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't
      accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of
      r12[42:45] corresponds to HMI as wakeup reason.
      Prior to commit 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle
      sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1
      to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1
      value would just happen to never have bits 42:45 set.
      Fixes: 9d292501
       ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths")
      Signed-off-by: default avatarAkshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Change log and comment massaging]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  3. 27 Jun, 2017 1 commit
  4. 19 Jun, 2017 3 commits
  5. 30 May, 2017 3 commits
    • Gautham R. Shenoy's avatar
      powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1 · 22c6663d
      Gautham R. Shenoy authored
      On Power9 DD1 due to a hardware bug the Power-Saving Level Status
      field (PLS) of the PSSCR for a thread waking up from a deep state can
      under-report if some other thread in the core is in a shallow stop
      state. The scenario in which this can manifest is as follows:
         1) All the threads of the core are in deep stop.
         2) One of the threads is woken up. The PLS for this thread will
            correctly reflect that it is waking up from deep stop.
         3) The thread that has woken up now executes a shallow stop.
         4) When some other thread in the core is woken, its PLS will reflect
            the shallow stop state.
      Thus, the subsequent thread for which the PLS is under-reporting the
      wakeup state will not restore the hypervisor resources.
      Hence, on DD1 systems, use the Requested Level (RL) field as a
      workaround to restore the contents of the hypervisor resources on the
      wakeup from the stop state.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Gautham R. Shenoy's avatar
      powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop · cb0be7ec
      Gautham R. Shenoy authored
      On wakeup from a deep stop state which is supposed to lose the
      hypervisor state, we don't restore the LPCR to the old value but set
      it to a "sane" value via cur_cpu_spec->cpu_restore().
      The problem is that the "sane" value doesn't include UPRT and the HR
      bits which are required to run correctly in Radix mode.
      Fix this on POWER9 onwards by restoring the LPCR value whatever it was
      before executing the stop instruction.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Gautham R. Shenoy's avatar
      powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore · ec486735
      Gautham R. Shenoy authored
      On POWER8, in case of
         -  nap: both timebase and hypervisor state is retained.
         -  fast-sleep: timebase is lost. But the hypervisor state is retained.
         -  winkle: timebase and hypervisor state is lost.
      Hence, the current code for handling exit from a idle state assumes
      that if the timebase value is retained, then so is the hypervisor
      state. Thus, the current code doesn't restore per-core hypervisor
      state in such cases.
      But that is no longer the case on POWER9 where we do have stop states
      in which timebase value is retained, but the hypervisor state is
      lost. So we have to ensure that the per-core hypervisor state gets
      restored in such cases.
      Fix this by ensuring that even in the case when timebase is retained,
      we explicitly check if we are waking up from a deep stop that loses
      per-core hypervisor state (indicated by cr4 being eq or gt), and if
      this is the case, we restore the per-core hypervisor state.
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  6. 16 May, 2017 1 commit
  7. 23 Apr, 2017 8 commits
  8. 10 Apr, 2017 1 commit
    • Gautham R. Shenoy's avatar
      powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy authored
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from can contain the paca pointer of
      its sibling.
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      threads's paca pointers. Basically, we should be able to arrive at the
      right paca pointer from any of the thread's existing paca pointer.
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each one its siblings paca_struct in the slot corresponding to
      this CPU's index in the core.
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  9. 20 Mar, 2017 1 commit
  10. 03 Mar, 2017 1 commit
    • Gautham R. Shenoy's avatar
      powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop · 424f8acd
      Gautham R. Shenoy authored
      Commit 09206b60 ("powernv: Pass PSSCR value and mask to
      power9_idle_stop") added additional code in power_enter_stop() to
      distinguish between stop requests whose PSSCR had ESL=EC=1 from those
      which did not. When ESL=EC=1, we do a forward-jump to a location
      labelled by "1", which had the code to handle the ESL=EC=1 case.
      Unfortunately just a couple of instructions before this label, is the
      macro IDLE_STATE_ENTER_SEQ() which also has a label "1" in its
      As a result, the current code can result in directly executing stop
      instruction for deep stop requests with PSSCR ESL=EC=1, without saving
      the hypervisor state.
      Fix this BUG by labeling the location that handles ESL=EC=1 case with
      a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion
      a la .Lxx from Anton Blanchard).
      While at it, rename the label "2" labelling the location of the code
      handling entry into deep stop states with ".Lhandle_deep_stop".
      For a good measure, change the label in IDLE_STATE_ENTER_SEQ() macro
      to an not-so commonly used value in order to avoid similar mishaps in
      the future.
      Fixes: 09206b60
       ("powernv: Pass PSSCR value and mask to power9_idle_stop")
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  11. 07 Feb, 2017 1 commit
  12. 30 Jan, 2017 2 commits
    • Gautham R. Shenoy's avatar
      powernv: Pass PSSCR value and mask to power9_idle_stop · 09206b60
      Gautham R. Shenoy authored
      The power9_idle_stop method currently takes only the requested stop
      level as a parameter and picks up the rest of the PSSCR bits from a
      hand-coded macro. This is not a very flexible design, especially when
      the firmware has the capability to communicate the psscr value and the
      mask associated with a particular stop state via device tree.
      This patch modifies the power9_idle_stop API to take as parameters the
      PSSCR value and the PSSCR mask corresponding to the stop state that
      needs to be set. These PSSCR value and mask are respectively obtained
      by parsing the "ibm,cpu-idle-state-psscr" and
      "ibm,cpu-idle-state-psscr-mask" fields from the device tree.
      In addition to this, the patch adds support for handling stop states
      for which ESL and EC bits in the PSSCR are zero. As per the
      architecture, a wakeup from these stop states resumes execution from
      the subsequent instruction as opposed to waking up at the System
      The older firmware sets only the Requested Level (RL) field in the
      psscr and psscr-mask exposed in the device tree. For older firmware
      where psscr-mask=0xf, this patch will set the default sane values that
      the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and
      TR). For the new firmware, the patch will validate that the invariants
      required by the ISA for the psscr values are maintained by the
      This skiboot patch that exports fully populated PSSCR values and the
      mask for all the stop states can be found here:
      [Optimize the number of instructions before entering STOP with
      ESL=EC=0, validate the PSSCR values provided by the firimware
      maintains the invariants required as per the ISA suggested by Balbir
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Gautham R. Shenoy's avatar
      powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro · 823b7bd5
      Gautham R. Shenoy authored
      Currently all the low-power idle states are expected to wake up
      at reset vector 0x100. Which is why the macro IDLE_STATE_ENTER_SEQ
      that puts the CPU to an idle state and never returns.
      On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU
      is expected to wake up at the next instruction of the idle
      This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the
      no-return variant and reuses the name IDLE_STATE_ENTER_SEQ
      for a variant that allows resuming operation at the instruction next
      to the idle-instruction.
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  13. 24 Oct, 2016 2 commits
    • Paul Mackerras's avatar
      powerpc/64: Fix race condition in setting lock bit in idle/wakeup code · 09b7e37b
      Paul Mackerras authored
      This fixes a race condition where one thread that is entering or
      leaving a power-saving state can inadvertently ignore the lock bit
      that was set by another thread, and potentially also clear it.
      The core_idle_lock_held function is called when the lock bit is
      seen to be set.  It polls the lock bit until it is clear, then
      does a lwarx to load the word containing the lock bit and thread
      idle bits so it can be updated.  However, it is possible that the
      value loaded with the lwarx has the lock bit set, even though an
      immediately preceding lwz loaded a value with the lock bit clear.
      If this happens then we go ahead and update the word despite the
      lock bit being set, and when called from pnv_enter_arch207_idle_mode,
      we will subsequently clear the lock bit.
      No identifiable misbehaviour has been attributed to this race.
      This fixes it by checking the lock bit in the value loaded by the
      lwarx.  If it is set then we just go back and keep on polling.
      Fixes: b32aadc1
       ("powerpc/powernv: Fix race in updating core_idle_state")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    • Paul Mackerras's avatar
      powerpc/64: Re-fix race condition between going idle and entering guest · 56c46222
      Paul Mackerras authored
      Commit 8117ac6a ("powerpc/powernv: Switch off MMU before entering
      nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
      thread entering a KVM guest could switch the MMU context to the guest
      while another thread was still in host kernel context with the MMU on.
      That commit moved the point where a thread entering a power-saving
      mode set its kvm_hstate.hwthread_state field in its PACA to
      KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
      MMU had been switched off.  That commit also added a comment
      explaining that we have to switch to real mode before setting
      hwthread_state to avoid this race.
      Nevertheless, commit 4eae2c9a ("powerpc/powernv: Make
      pnv_powersave_common more generic", 2016-07-08) subsequently moved
      the setting of hwthread_state back to a point where the MMU is on,
      thus reintroducing the race, despite the comment saying that this
      should not be done being included in full in the context lines of
      the patch that did it.
      This fixes the race again and adds a bigger and shoutier comment
      explaining the potential race condition.
      Fixes: 4eae2c9a
       ("powerpc/powernv: Make pnv_powersave_common more generic")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarShreyas B. Prabhu <shreyasbp@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  14. 12 Sep, 2016 1 commit
    • Gautham R. Shenoy's avatar
      powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss · bd00a240
      Gautham R. Shenoy authored
      pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is
      waking up from a complete hypervisor state loss. Hence, it currently
      restores the SPR contents only if cr4 is "eq".
      However, after commit bcef83a0 ("powerpc/powernv: Add platform
      support for stop instruction"), on ISA v3.0 CPUs, the function
      pnv_restore_hyp_resource() sets cr4 to contain the result of the
      comparison between the state the CPU has woken up from and the first
      deep stop state before calling pnv_wakeup_tb_loss().
      Thus if the CPU woke up from a state that is deeper than the first
      deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss()
      will fail to restore the SPRs on waking up from such a state.
      Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4
      is "eq" or "gt".
      Fixes: bcef83a0
       ("powerpc/powernv: Add platform support for stop instruction")
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: default avatarShreyas B. Prabhu <shreyasbp@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
  15. 09 Aug, 2016 2 commits
  16. 01 Aug, 2016 1 commit
  17. 17 Jul, 2016 1 commit
  18. 15 Jul, 2016 7 commits
  19. 20 Jun, 2016 1 commit
    • Mahesh Salgaonkar's avatar
      KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt · fd7bacbc
      Mahesh Salgaonkar authored
      When a guest is assigned to a core it converts the host Timebase (TB)
      into guest TB by adding guest timebase offset before entering into
      guest. During guest exit it restores the guest TB to host TB. This means
      under certain conditions (Guest migration) host TB and guest TB can differ.
      When we get an HMI for TB related issues the opal HMI handler would
      try fixing errors and restore the correct host TB value. With no guest
      running, we don't have any issues. But with guest running on the core
      we run into TB corruption issues.
      If we get an HMI while in the guest, the current HMI handler invokes opal
      hmi handler before forcing guest to exit. The guest exit path subtracts
      the guest TB offset from the current TB value which may have already
      been restored with host value by opal hmi handler. This leads to incorrect
      host and guest TB values.
      With split-core, things become more complex. With split-core, TB also gets
      split and each subcore gets its own TB register. When a hmi handler fixes
      a TB error and restores the TB value, it affects all the TB values of
      sibling subcores on the same core. On TB errors all the thread in the core
      gets HMI. With existing code, the individual threads call opal hmi handle
      independently which can easily throw TB out of sync if we have guest
      running on subcores. Hence we will need to co-ordinate with all the
      threads before making opal hmi handler call followed by TB resync.
      This patch introduces a sibling subcore state structure (shared by all
      threads in the core) in paca which holds information about whether sibling
      subcores are in Guest mode or host mode. An array in_guest[] of size
      MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
      The subcore id is used as index into in_guest[] array. Only primary
      thread entering/exiting the guest is responsible to set/unset its
      designated array element.
      On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
      this patch will now force guest to vacate the core/subcore. Primary
      thread from each subcore will then turn off its respective bit
      from the above bitmap during the guest exit path just after the
      guest->host partition switch is complete.
      All other threads that have just exited the guest OR were already in host
      will wait until all other subcores clears their respective bit.
      Once all the subcores turn off their respective bit, all threads will
      will make call to opal hmi handler.
      It is not necessary that opal hmi handler would resync the TB value for
      every HMI interrupts. It would do so only for the HMI caused due to
      TB errors. For rest, it would not touch TB value. Hence to make things
      simpler, primary thread would call TB resync explicitly once for each
      core immediately after opal hmi handler instead of subtracting guest
      offset from TB. TB resync call will restore the TB with host value.
      Thus we can be sure about the TB state.
      One of the primary threads exiting the guest will take up the
      responsibility of calling TB resync. It will use one of the top bits
      (bit 63) from subcore state flags bitmap to make the decision. The first
      primary thread (among the subcores) that is able to set the bit will
      have to call the TB resync. Rest all other threads will wait until TB
      resync is complete.  Once TB resync is complete all threads will then
      Signed-off-by: default avatarMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
  20. 03 Mar, 2016 1 commit