1. 16 Jul, 2016 3 commits
  2. 09 Jul, 2016 3 commits
  3. 05 Jul, 2016 1 commit
  4. 24 Mar, 2016 1 commit
  5. 02 Mar, 2016 2 commits
  6. 28 Feb, 2016 4 commits
  7. 15 Feb, 2016 1 commit
  8. 03 Feb, 2016 1 commit
  9. 01 Feb, 2016 1 commit
  10. 31 Jan, 2016 8 commits
    • Ulrich Weigand's avatar
      powerpc/module: Handle R_PPC64_ENTRY relocations · a33b8ff3
      Ulrich Weigand authored
      commit a61674bd
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      To remove this assumption, GCC will now generate instead this global
      entry point prolog sequence when using -mcmodel=large:
      	.quad .TOC.-func
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
      Signed-off-by: default avatarUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Boqun Feng's avatar
      powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · 4126ac7c
      Boqun Feng authored
      commit 81d7a329 upstream.
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
       ("powerpc: Fix atomic_xxx_return barrier semantics")
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Boqun Feng's avatar
      powerpc: Make value-returning atomics fully ordered · af69fe1f
      Boqun Feng authored
      commit 49e9cf3f upstream.
      According to memory-barriers.txt:
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      Which mean these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" can not
      guarantee fully ordered atomics, according to Paul Mckenney:
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      Fixes: b97021f8
       ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Stewart Smith's avatar
      powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type · 1e14dd5a
      Stewart Smith authored
      commit 98da62b7
      When running on newer OPAL firmware that supports sending extra
      OPAL_MSG types, we would print a warning on *every* message received.
      This could be a problem for kernels that don't support OPAL_MSG_OCC
      on machines that are running real close to thermal limits and the
      OCC is throttling the chip. For a kernel that is paying attention to
      the message queue, we could get these notifications quite often.
      Conceivably, future message types could also come fairly often,
      and printing that we didn't understand them 10,000 times provides
      no further information than printing them once.
      Signed-off-by: default avatarStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Michael Neuling's avatar
      powerpc/tm: Check for already reclaimed tasks · a54d3a42
      Michael Neuling authored
      commit 7f821fc9 upstream.
      Currently we can hit a scenario where we'll tm_reclaim() twice.  This
      results in a TM bad thing exception because the second reclaim occurs
      when not in suspend mode.
      The scenario in which this can happen is the following.  We attempt to
      deliver a signal to userspace.  To do this we need obtain the stack
      pointer to write the signal context.  To get this stack pointer we
      must tm_reclaim() in case we need to use the checkpointed stack
      pointer (see get_tm_stackpointer()).  Normally we'd then return
      directly to userspace to deliver the signal without going through
      Unfortunatley, if at this point we get an error (such as a bad
      userspace stack pointer), we need to exit the process.  The exit will
      result in a __switch_to().  __switch_to() will attempt to save the
      process state which results in another tm_reclaim().  This
      tm_reclaim() now causes a TM Bad Thing exception as this state has
      already been saved and the processor is no longer in TM suspend mode.
      This patch checks the state of the MSR to ensure we are TM suspended
      before we attempt the tm_reclaim().  If we've already saved the state
      away, we should no longer be in TM suspend mode.  This has the
      additional advantage of checking for a potential TM Bad Thing
      Found using syscall fuzzer.
      Fixes: fb09692e
       ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Michael Neuling's avatar
      powerpc/tm: Block signal return setting invalid MSR state · 567a215d
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
      Found using a syscall fuzzer.
      Fixes: 2b0a576d
       ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Rabin Vincent's avatar
      net: filter: make JITs zero A for SKF_AD_ALU_XOR_X · 5596242a
      Rabin Vincent authored
      [ Upstream commit 55795ef5 ]
      The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
      instructions since it XORs A with X while all the others replace A with
      some loaded value.  All the BPF JITs fail to clear A if this is used as
      the first instruction in a filter.  This was found using american fuzzy
      Add a helper to determine if A needs to be cleared given the first
      instruction in a filter, and use this in the JITs.  Except for ARM, the
      rest have only been compile-tested.
      Fixes: 34805931
       ("net: filter: get rid of BPF_S_* enum")
      Signed-off-by: default avatarRabin Vincent <rabin@rab.in>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR · e052d6ee
      Paul Mackerras authored
      commit c20875a3
      Currently it is possible for userspace (e.g. QEMU) to set a value
      for the MSR for a guest VCPU which has both of the TS bits set,
      which is an illegal combination.  The result of this is that when
      we execute a hrfid (hypervisor return from interrupt doubleword)
      instruction to enter the guest, the CPU will take a TM Bad Thing
      type of program interrupt (vector 0x700).
      Now, if PR KVM is configured in the kernel along with HV KVM, we
      actually handle this without crashing the host or giving hypervisor
      privilege to the guest; instead what happens is that we deliver a
      program interrupt to the guest, with SRR0 reflecting the address
      of the hrfid instruction and SRR1 containing the MSR value at that
      point.  If PR KVM is not configured in the kernel, then we try to
      run the host's program interrupt handler with the MMU set to the
      guest context, which almost certainly causes a host crash.
      This closes the hole by making kvmppc_set_msr_hv() check for the
      illegal combination and force the TS field to a safe value (00,
      meaning non-transactional).
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  11. 09 Nov, 2015 1 commit
    • Vasant Hegde's avatar
      powerpc/rtas: Validate rtas.entry before calling enter_rtas() · 29589707
      Vasant Hegde authored
      commit 8832317f upstream.
      Currently we do not validate rtas.entry before calling enter_rtas(). This
      leads to a kernel oops when user space calls rtas system call on a powernv
      platform (see below). This patch adds code to validate rtas.entry before
      making enter_rtas() call.
        Oops: Exception in kernel mode, sig: 4 [#1]
        SMP NR_CPUS=1024 NUMA PowerNV
        task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
        NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
        REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
        MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
        CFAR: c000000000009c0c SOFTE: 0
        NIP [0000000000000000]           (null)
        LR [0000000000009c14] 0x9c14
        Call Trace:
        [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
        [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
        [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
      Fixes: 55190f88
       ("powerpc: Add skeleton PowerNV platform")
      Reported-by: default avatarNAGESWARA R. SASTRY <nasastry@in.ibm.com>
      Signed-off-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Reword change log, trim oops, and add stable + fixes]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 27 Oct, 2015 1 commit
  13. 22 Oct, 2015 3 commits
    • Paul Mackerras's avatar
      powerpc/MSI: Fix race condition in tearing down MSI interrupts · e6b5ff2b
      Paul Mackerras authored
      commit e297c939 upstream.
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtal IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      Fixes: 05af7bd2
       ("[POWERPC] MPIC U3/U4 MSI backend")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gautham R. Shenoy's avatar
      KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit · 6d9cc6c1
      Gautham R. Shenoy authored
      commit 7e022e71 upstream.
      In guest_exit_cont we call kvmhv_commence_exit which expects the trap
      number as the argument. However r3 doesn't contain the trap number at
      this point and as a result we would be calling the function with a
      spurious trap number.
      Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
      r12 contains the trap number.
      Fixes: eddb60fb
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Huth's avatar
      KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store() · ffd269ee
      Thomas Huth authored
      commit 3eb4ee68 upstream.
      Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
      functions) has to be protected via the kvm->srcu lock.
      The kvmppc_h_logical_ci_load() and -store() functions are missing
      this lock so far, so let's add it there, too.
      This fixes the problem that the kernel reports "suspicious RCU usage"
      when lock debugging is enabled.
      Fixes: 99342cf8
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  14. 29 Sep, 2015 8 commits
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Recompute hash value after a failed update · f5a73e9c
      Aneesh Kumar K.V authored
      commit 36b35d5d upstream.
      If we had secondary hash flag set, we ended up modifying hash value in
      the updatepp code path. Hence with a failed updatepp we will be using
      a wrong hash value for the following hash insert. Fix this by
      recomputing hash before insert.
      Without this patch we can end up with using wrong slot number in linux
      pte. That can result in us missing an hash pte update or invalidate
      which can cause memory corruption or even machine check.
      Fixes: 6d492ecc
       ("powerpc/THP: Add code to handle HPTE faults for hugepages")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Benjamin Herrenschmidt's avatar
      powerpc/boot: Specify ABI v2 when building an LE boot wrapper · b46f51da
      Benjamin Herrenschmidt authored
      commit 655471f5 upstream.
      The kernel does it, not the boot wrapper, which breaks with some
      cross compilers that still default to ABI v1.
      Fixes: 147c0516
       ("powerpc/boot: Add support for 64bit little endian wrapper")
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Leonidas Da Silva Barbosa's avatar
      powerpc: Uncomment and make enable_kernel_vsx() routine available · b4092436
      Leonidas Da Silva Barbosa authored
      commit 72cd7b44
      enable_kernel_vsx() function was commented since anything was using
      it. However, vmx-crypto driver uses VSX instructions which are
      only available if VSX is enable. Otherwise it rises an exception oops.
      This patch uncomment enable_kernel_vsx() routine and makes it available.
      Signed-off-by: default avatarLeonidas S. Barbosa <leosilva@linux.vnet.ibm.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thomas Huth's avatar
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · ce813f1f
      Thomas Huth authored
      commit 1c2cb594 upstream.
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      Fixes: 587f83e8
       ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Michael Ellerman's avatar
      powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash · e4b33421
      Michael Ellerman authored
      commit 74b5037b upstream.
      The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
      However when built with a 4K PAGE_SIZE there is an additional config
      option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
      also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
      This is used in one obscure configuration, to support 64K pages for SPU
      local store on the Cell processor when the rest of the kernel is using
      4K pages.
      In this configuration, pte_pagesize_index() is defined to just pass
      through its arguments to get_slice_psize(). However pte_pagesize_index()
      is called for both user and kernel addresses, whereas get_slice_psize()
      only knows how to handle user addresses.
      This has been broken forever, however until recently it happened to
      work. That was because in get_slice_psize() the large kernel address
      would cause the right shift of the slice mask to return zero.
      However in commit 7aa0727f ("powerpc/mm: Increase the slice range to
      64TB"), the get_slice_psize() code was changed so that instead of a
      right shift we do an array lookup based on the address. When passed a
      kernel address this means we index way off the end of the slice array
      and return random junk.
      That is only fatal if we happen to hit something non-zero, but when we
      do return a non-zero value we confuse the MMU code and eventually cause
      a check stop.
      This fix is ugly, but simple. When we're called for a kernel address we
      return 4K, which is always correct in this configuration, otherwise we
      use the slice mask.
      Fixes: 7aa0727f
       ("powerpc/mm: Increase the slice range to 64TB")
      Reported-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gavin Shan's avatar
      powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail() · f1ab3c04
      Gavin Shan authored
      commit 25980013 upstream.
      The config space of some PCI devices can't be accessed when their
      PEs are in frozen state. Otherwise, fenced PHB might be seen.
      Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing
      EEH_PE_CFG_BLOCKED is set automatically when the PE is put to
      frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores
      PCI device BARs with eeh_pe_restore_bars(), which then calls
      eeh_ops->restore_config() to reinitialize the PCI device in
      (OPAL) firmware. eeh_ops->restore_config() produces PCI config
      access that causes fenced PHB. The problem was reported on below
         0001:01:00.0 0200: 14e4:168e (rev 10)
         0001:01:00.0 Ethernet controller: Broadcom Corporation \
                      NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
      This fixes the issue by skipping eeh_pe_restore_bars() in
      eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE.
      Fixes: b6541db1
       ("powerpc/eeh: Block PCI config access upon frozen PE")
      Reported-by: default avatarManvanthara B. Puttashankar <mputtash@in.ibm.com>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Daniel Axtens's avatar
      powerpc/eeh: Probe after unbalanced kref check · 91552f87
      Daniel Axtens authored
      commit e642d11b upstream.
      In the complete hotplug case, EEH PEs are supposed to be released
      and set to NULL. Normally, this is done by eeh_remove_device(),
      which is called from pcibios_release_device().
      However, if something is holding a kref to the device, it will not
      be released, and the PE will remain. eeh_add_device_late() has
      a check for this which will explictly destroy the PE in this case.
      This check in eeh_add_device_late() occurs after a call to
      eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
      which will exit without probing if there is an existing PE.
      This means that on PowerNV, devices with outstanding krefs will not
      be rediscovered by EEH correctly after a complete hotplug. This is
      affecting CXL (CAPI) devices in the field.
      Put the probe after the kref check so that the PE is destroyed
      and affected devices are correctly rediscovered by EEH.
      Fixes: d91dafc0
       ("powerpc/eeh: Delay probing EEH device during hotplug")
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Acked-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gavin Shan's avatar
      powerpc/pseries: Fix corrupted pdn list · 7584b2d8
      Gavin Shan authored
      commit 590c7567 upstream.
      Commit cca87d30 ("powerpc/pci: Refactor pci_dn") introduced pdn
      list for SRIOV VFs. It means the pdn is be put into the child list
      of its parent pdn when the pdn is created. When doing PCI hot
      unplugging on pSeries, the PCI device node as well as its pdn are
      released through procfs entry "powerpc/ofdt". Some one else grabs
      the memory chunk of the pdn and update it accordingly. At the same
      time, the pdn is still tracked in the child list of parent pdn. It
      leads to corrupted child list in the parent pdn.
      This fixes above issue by removing the pdn from the child list of
      its parent pdn when the device node is detached from the system.
      Note the pdn is free'd when the device node is released if the
      device node is dynamic one. Otherwise, the device node as well
      as the pdn won't be released.
      Fixes: cca87d30
       ("powerpc/pci: Refactor pci_dn")
      Reported-by: default avatarSantwana Samantray <santwana.samantray@in.ibm.com>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 21 Sep, 2015 2 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE · 73e56fdc
      Paul Mackerras authored
      commit 1e5bf454 upstream.
      The reference (R) and change (C) bits in a HPT entry can be set by
      hardware at any time up until the HPTE is invalidated and the TLB
      invalidation sequence has completed.  This means that when removing
      a HPTE, we need to read the HPTE after the invalidation sequence has
      completed in order to obtain reliable values of R and C.  The code
      in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd32
      ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
      read after invalidation as a side effect of other changes.  This
      restores the read of the HPTE after invalidation.
      The user-visible effect of this bug would be that when migrating a
      guest, there is a small probability that a page modified by the guest
      and then unmapped by the guest might not get re-transmitted and thus
      the destination might end up with a stale copy of the page.
      Fixes: 6f22bd32
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gautham R. Shenoy's avatar
      KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set · 76c77a45
      Gautham R. Shenoy authored
      commit 06554d9f upstream.
      The code that handles the case when we receive a H_DOORBELL interrupt
      has a comment which says "Hypervisor doorbell - exit only if host IPI
      flag set".  However, the current code does not actually check if the
      host IPI flag is set.  This is due to a comparison instruction that
      got missed.
      As a result, the current code performs the exit to host only
      if some sibling thread or a sibling sub-core is exiting to the
      host.  This implies that, an IPI sent to a sibling core in
      (subcores-per-core != 1) mode will be missed by the host unless the
      sibling core is on the exit path to the host.
      This patch adds the missing comparison operation which will ensure
      that when HOST_IPI flag is set, we unconditionally exit to the host.
      Fixes: 66feed61
      Signed-off-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>