1. 07 Oct, 2019 13 commits
    • ARM: 8898/1: mm: Don't treat faults reported from cache maintenance as writes · 6a684e00
      Will Deacon authored
      [ Upstream commit 83402036 ]
      
      Translation faults arising from cache maintenance instructions are
      rather unhelpfully reported with an FSR value where the WnR field is set
      to 1, indicating that the faulting access was a write. Since cache
      maintenance instructions on 32-bit ARM do not require any particular
      permissions, this can cause our private 'cacheflush' system call to fail
      spuriously if a translation fault is generated due to page aging when
      targeting a read-only VMA.
      
      In this situation, we will return -EFAULT to userspace, although this is
      unfortunately suppressed by the popular '__builtin___clear_cache()'
      intrinsic provided by GCC, which returns void.
      
      Although it's tempting to write this off as a userspace issue, we can
      actually do a little bit better on CPUs that support LPAE, even if the
      short-descriptor format is in use. On these CPUs, cache maintenance
      faults additionally set the CM field in the FSR, which we can use to
      suppress the write permission checks in the page fault handler and
      succeed in performing cache maintenance to read-only areas even in the
      presence of a translation fault.
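
The check described above can be sketched roughly as follows. This is a minimal illustration rather than the patch itself: the bit positions chosen for `FSR_WRITE` (WnR) and `FSR_CM` are assumptions made for the example, and `is_write_fault()` is used here only as an illustrative helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define FSR_WRITE (1u << 11) /* WnR: access reported as a write */
#define FSR_CM    (1u << 13) /* CM: fault raised by cache maintenance */

/*
 * Treat a fault as a write only when WnR is set and the fault did not
 * come from a cache maintenance instruction.
 */
static bool is_write_fault(uint32_t fsr)
{
	return (fsr & FSR_WRITE) && !(fsr & FSR_CM);
}
```

With a check of this shape, a cache maintenance fault on a read-only VMA would no longer trip the write permission test.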

      Reported-by: Orion Hodson <oth@google.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • MIPS: tlbex: Explicitly cast _PAGE_NO_EXEC to a boolean · 371077ea
      Nathan Chancellor authored
      [ Upstream commit c59ae0a1 ]
      
      clang warns:
      
      arch/mips/mm/tlbex.c:634:19: error: use of logical '&&' with constant
      operand [-Werror,-Wconstant-logical-operand]
              if (cpu_has_rixi && _PAGE_NO_EXEC) {
                               ^  ~~~~~~~~~~~~~
      arch/mips/mm/tlbex.c:634:19: note: use '&' for a bitwise operation
              if (cpu_has_rixi && _PAGE_NO_EXEC) {
                               ^~
                               &
      arch/mips/mm/tlbex.c:634:19: note: remove constant to silence this
      warning
              if (cpu_has_rixi && _PAGE_NO_EXEC) {
                              ~^~~~~~~~~~~~~~~~
      1 error generated.
      
      Explicitly cast this value to a boolean so that clang understands we
      intend for this to be a non-zero value.
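
A minimal sketch of the idea, assuming a stand-in value for `_PAGE_NO_EXEC` and a hypothetical wrapper function `needs_rixi_handling()`; the real definitions live in the MIPS headers and differ per configuration.

```c
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define _PAGE_NO_EXEC (1UL << 3)

/* Hypothetical wrapper around the condition quoted in the warning. */
static bool needs_rixi_handling(bool cpu_has_rixi)
{
	/*
	 * '!!' collapses the constant to 0 or 1, which makes it explicit
	 * that a logical test is intended rather than a mistyped
	 * bitwise '&'.
	 */
	return cpu_has_rixi && !!_PAGE_NO_EXEC;
}
```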
      
      Fixes: 00bf1c69 ("MIPS: tlbex: Avoid placing software PTE bits in Entry* PFN fields")
      Link: https://github.com/ClangBuiltLinux/linux/issues/609
      
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: clang-built-linux@googlegroups.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • MIPS: Ingenic: Disable broken BTB lookup optimization. · 3ed14a8d
      Zhou Yanjie authored
      [ Upstream commit 053951dd ]
      
      In order to further reduce power consumption, the XBurst core
      by default attempts to avoid branch target buffer lookups by
      detecting & special casing loops. This feature causes BogoMIPS and
      lpj to be calculated incorrectly. Set cp0 config7 bit 4 to
      disable this feature.
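
As a rough sketch of the bit manipulation involved: the macro and function names below are hypothetical, and the real change is a read-modify-write of the CP0 Config7 register rather than a plain integer operation.

```c
#include <stdint.h>

/* Hypothetical name for Config7 bit 4 as described in the commit text. */
#define XBURST_CONFIG7_BTB_LOOP_BIT (1u << 4)

/* Return the Config7 value with bit 4 set; all other bits are preserved. */
static uint32_t disable_btb_loop_opt(uint32_t config7)
{
	return config7 | XBURST_CONFIG7_BTB_LOOP_BIT;
}
```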

      Signed-off-by: Zhou Yanjie <zhouyanjie@zoho.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: ralf@linux-mips.org
      Cc: paul@crapouillou.net
      Cc: jhogan@kernel.org
      Cc: malat@debian.org
      Cc: gregkh@linuxfoundation.org
      Cc: tglx@linutronix.de
      Cc: allison@lohutok.net
      Cc: syq@debian.org
      Cc: chenhc@lemote.com
      Cc: jiaxun.yang@flygoat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc: dump kernel log before carrying out fadump or kdump · 324b0c9e
      Ganesh Goudar authored
      [ Upstream commit e7ca44ed ]
      
      Since commit 4388c9b3 ("powerpc: Do not send system reset request
      through the oops path"), pstore dmesg file is not updated when dump is
      triggered from HMC. This commit modified system reset (sreset) handler
      to invoke fadump or kdump (if configured), without pushing dmesg to
      pstore. This leaves pstore to have old dmesg data which won't be much
      of a help if kdump fails to capture the dump. This patch fixes that by
      calling kmsg_dump() before heading to fadump or kdump.
      
      Fixes: 4388c9b3 ("powerpc: Do not send system reset request through the oops path")
      Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • arm64: fix unreachable code issue with cmpxchg · 952d1c6d
      Arnd Bergmann authored
      [ Upstream commit 920fdab7 ]
      
      On arm64 builds with clang, sometimes __cmpxchg_mb is not inlined
      when CONFIG_OPTIMIZE_INLINING is set.
      Clang then fails a compile-time assertion, because it cannot tell at
      compile time what the size of the argument is:
      
      mm/memcontrol.o: In function `__cmpxchg_mb':
      memcontrol.c:(.text+0x1a4c): undefined reference to `__compiletime_assert_175'
      memcontrol.c:(.text+0x1a4c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__compiletime_assert_175'
      
      Mark all of the cmpxchg() style functions as __always_inline to
      ensure that the compiler can see the result.
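
The pattern can be sketched in userspace as below, assuming GCC or clang. `cmpxchg_sized()` and `my_cmpxchg()` are hypothetical names, the compiler's `__atomic` builtins stand in for the arm64 assembly, and a runtime trap replaces the kernel's compile-time assertion.

```c
#include <stdint.h>

/* Force inlining so 'size' is a known constant inside the switch. */
#define ALWAYS_INLINE inline __attribute__((__always_inline__))

static ALWAYS_INLINE uint64_t cmpxchg_sized(volatile void *ptr, uint64_t old,
					    uint64_t new_val, int size)
{
	switch (size) {
	case 4: {
		uint32_t expected = (uint32_t)old;

		/* On failure 'expected' is overwritten with the current value. */
		__atomic_compare_exchange_n((volatile uint32_t *)ptr, &expected,
					    (uint32_t)new_val, 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
		return expected;
	}
	case 8: {
		uint64_t expected = old;

		__atomic_compare_exchange_n((volatile uint64_t *)ptr, &expected,
					    new_val, 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
		return expected;
	}
	default:
		/* The kernel asserts at build time here; this sketch traps. */
		__builtin_trap();
	}
}

/* Dispatch on the pointee size, returning the previous value. */
#define my_cmpxchg(ptr, old, new_val)					\
	((__typeof__(*(ptr)))cmpxchg_sized((ptr), (uint64_t)(old),	\
					   (uint64_t)(new_val), sizeof(*(ptr))))
```

Because the helper is always inlined, the `sizeof` argument is a constant at every call site and the unused switch arms can be discarded by the compiler.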

      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/648
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Andrew Murray <andrew.murray@arm.com>
      Tested-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/pseries: correctly track irq state in default idle · b717a47d
      Nathan Lynch authored
      [ Upstream commit 92c94dfb ]
      
      prep_irq_for_idle() is intended to be called before entering
      H_CEDE (and it is used by the pseries cpuidle driver). However the
      default pseries idle routine does not call it, leading to mismanaged
      lazy irq state when the cpuidle driver isn't in use. Manifestations of
      this include:
      
      * Dropped IPIs in the time immediately after a cpu comes
        online (before it has installed the cpuidle handler), making the
        online operation block indefinitely waiting for the new cpu to
        respond.
      
      * Hitting this WARN_ON in arch_local_irq_restore():
      	/*
      	 * We should already be hard disabled here. We had bugs
      	 * where that wasn't the case so let's dbl check it and
      	 * warn if we are wrong. Only do that when IRQ tracing
      	 * is enabled as mfmsr() can be costly.
      	 */
      	if (WARN_ON_ONCE(mfmsr() & MSR_EE))
      		__hard_irq_disable();
      
      Call prep_irq_for_idle() from pseries_lpar_idle() and honor its
      result.
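
A toy model of "call it and honor its result", with all names hypothetical. It assumes the preparation step reports failure when an interrupt is already pending, in which case the idle routine should not cede.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant per-cpu state. */
struct cpu_model {
	bool irq_pending; /* an interrupt arrived while soft-disabled */
	int cede_calls;   /* how many times the cpu ceded */
};

/* Stand-in for prep_irq_for_idle(): refuse to idle if an irq is pending. */
static bool model_prep_irq_for_idle(const struct cpu_model *cpu)
{
	return !cpu->irq_pending;
}

/* Stand-in for the default idle routine: cede only when prep succeeds. */
static void model_lpar_idle(struct cpu_model *cpu)
{
	if (!model_prep_irq_for_idle(cpu))
		return; /* let the pending interrupt be handled first */

	cpu->cede_calls++; /* H_CEDE would be issued here */
}
```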
      
      Fixes: 363edbe2 ("powerpc: Default arch idle could cede processor on pseries")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/64s/exception: machine check use correct cfar for late handler · 0c09b028
      Nicholas Piggin authored
      [ Upstream commit 0b66370c ]
      
      Bare metal machine checks run an "early" handler in real mode before
      running the main handler which reports the event.
      
      The main handler runs exactly as a normal interrupt handler, after the
      "windup" which sets registers back as they were at interrupt entry.
      CFAR does not get restored by the windup code, so that will be wrong
      when the handler is run.
      
      Restore the CFAR to the saved value before running the late handler.

      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190802105709.27696-8-npiggin@gmail.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag · c1f7b3fb
      Sam Bobroff authored
      [ Upstream commit aa06e3d6 ]
      
      The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
      use of driver callbacks in drivers that have been bound part way
      through the recovery process. This is necessary to prevent later stage
      handlers from being called when the earlier stage handlers haven't,
      which can be confusing for drivers.
      
      However, the flag is set for all devices that are added after boot
      time and only cleared at the end of the EEH recovery process. This
      results in hot plugged devices erroneously having the flag set during
      the first recovery after they are added (causing their driver's
      handlers to be incorrectly ignored).
      
      To remedy this, clear the flag at the beginning of recovery
      processing. The flag is still cleared at the end of recovery
      processing, although it is no longer really necessary.
      
      Also clear the flag during eeh_handle_special_event(), for the same
      reasons.
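
The effect of clearing the flag up front can be modelled as below. The structure, the helper names and the bit value given to `EEH_DEV_NO_HANDLER` are assumptions for the sake of the sketch.

```c
#include <stddef.h>

#define EEH_DEV_NO_HANDLER (1u << 0) /* assumed bit value, illustrative */

/* Hypothetical, stripped-down device record. */
struct model_eeh_dev {
	unsigned int mode;
};

/* Clear the stale flag on every device before recovery starts. */
static void model_clear_no_handler(struct model_eeh_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].mode &= ~EEH_DEV_NO_HANDLER;
}

/* A driver's callbacks are used only while the flag is clear. */
static int model_handlers_allowed(const struct model_eeh_dev *dev)
{
	return !(dev->mode & EEH_DEV_NO_HANDLER);
}
```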

      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/b8ca5629d27de74c957d4f4b250177d1b6fc4bbd.1565930772.git.sbobroff@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/pseries/mobility: use cond_resched when updating device tree · 4c91e678
      Nathan Lynch authored
      [ Upstream commit ccfb5bd7 ]
      
      After a partition migration, pseries_devicetree_update() processes
      changes to the device tree communicated from the platform to
      Linux. This is a relatively heavyweight operation, with multiple
      device tree searches, memory allocations, and conversations with
      partition firmware.
      
      There are a few levels of nested loops which are bounded only by
      decisions made by the platform, outside of Linux's control, and indeed
      we have seen RCU stalls on large systems while executing this call
      graph. Use cond_resched() in these loops so that the cpu is yielded
      when needed.

      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190802192926.19277-4-nathanl@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function · 6d728a17
      Christophe Leroy authored
      [ Upstream commit 38a0d0cd ]
      
      We see warnings such as:
        kernel/futex.c: In function 'do_futex':
        kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized]
           return oldval == cmparg;
                         ^
        kernel/futex.c:1651:6: note: 'oldval' was declared here
          int oldval, ret;
              ^
      
      This is because arch_futex_atomic_op_inuser() only sets *oval if ret
      is 0 and GCC doesn't see that it will only use it when ret is 0.
      
      Anyway, the non-zero ret path is an error path that won't suffer from
      setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
      it will have no impact.
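
One way to picture the unconditional store is the sketch below. `model_atomic_op()` is a hypothetical stand-in for the arch helper, with a `fail` argument simulating a faulting user access; it is not the powerpc implementation.

```c
#include <errno.h>

/* Hypothetical stand-in: 'fail' simulates a fault on the user access. */
static int model_atomic_op(int fail, int current_val, int *oval)
{
	int oldval = 0; /* defined on every path in this sketch */
	int ret = fail ? -EFAULT : 0;

	if (!ret)
		oldval = current_val;

	/*
	 * Store unconditionally: the caller only uses *oval when ret is 0,
	 * but writing it on the error path is harmless and lets the
	 * compiler see that it is always initialized.
	 */
	*oval = oldval;
	return ret;
}
```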

      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      [mpe: reword change log slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/rtas: use device model APIs and serialization during LPM · 6aa455b0
      Nathan Lynch authored
      [ Upstream commit a6717c01 ]
      
      The LPAR migration implementation and userspace-initiated cpu hotplug
      can interleave their executions like so:
      
      1. Set cpu 7 offline via sysfs.
      
      2. Begin a partition migration, whose implementation requires the OS
         to ensure all present cpus are online; cpu 7 is onlined:
      
           rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
      
         This sets cpu 7 online in all respects except for the cpu's
         corresponding struct device; dev->offline remains true.
      
      3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
         already online and returns success. The driver core (device_online)
         sets dev->offline = false.
      
      4. The migration completes and restores cpu 7 to offline state:
      
           rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
      
      This leaves cpu7 in a state where the driver core considers the cpu
      device online, but in all other respects it is offline and
      unused. Attempts to online the cpu via sysfs appear to succeed but the
      driver core actually does not pass the request to the lower-level
      cpuhp support code. This makes the cpu unusable until the cpu device
      is manually set offline and then online again via sysfs.
      
      Instead of directly calling cpu_up/cpu_down, the migration code should
      use the higher-level device core APIs to maintain consistent state and
      serialize operations.
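
The interleaving in steps 1 to 4 can be replayed with a toy state model. Everything here is hypothetical naming: `online` stands for the low-level cpuhp state, `dev_offline` for the driver core's view, and the serialization aspect of the fix is not modelled.

```c
#include <stdbool.h>

struct model_cpu {
	bool online;      /* low-level cpu hotplug state */
	bool dev_offline; /* driver core's view (dev->offline) */
};

/* Low-level path: changes the cpu state, leaves the device state alone. */
static void model_cpu_up(struct model_cpu *c)   { c->online = true; }
static void model_cpu_down(struct model_cpu *c) { c->online = false; }

/* Driver-core path: does nothing if the device already looks online. */
static void model_device_online(struct model_cpu *c)
{
	if (!c->dev_offline)
		return;
	model_cpu_up(c);
	c->dev_offline = false;
}

/* Driver-core path: does nothing if the device already looks offline. */
static void model_device_offline(struct model_cpu *c)
{
	if (c->dev_offline)
		return;
	model_cpu_down(c);
	c->dev_offline = true;
}

/* Replay steps 1-4; use_device_api selects the proposed behaviour. */
static struct model_cpu model_replay(bool use_device_api)
{
	struct model_cpu c = { .online = true, .dev_offline = false };

	model_device_offline(&c);         /* 1. offline via sysfs */
	if (use_device_api)
		model_device_online(&c);  /* 2. migration onlines the cpu */
	else
		model_cpu_up(&c);
	model_device_online(&c);          /* 3. online via sysfs */
	if (use_device_api)
		model_device_offline(&c); /* 4. migration restores offline */
	else
		model_cpu_down(&c);
	return c;
}
```

In this model the direct `cpu_up`/`cpu_down` path ends with the cpu offline while the device still claims to be online, whereas the device-API path leaves both views in agreement.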
      
      Fixes: 120496ac ("powerpc: Bring all threads online prior to migration/hibernation")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/xmon: Check for HV mode when dumping XIVE info from OPAL · 25c501f0
      Cédric Le Goater authored
      [ Upstream commit c3e0dbd7 ]
      
      Currently, the xmon 'dx' command calls OPAL to dump the XIVE state in
      the OPAL logs and also outputs some of the fields of the internal XIVE
      structures in Linux. The OPAL calls can only be done on baremetal
      (PowerNV) and they crash a pseries machine. Fix by checking the
      hypervisor feature of the CPU.

      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190814154754.23682-2-clg@kaod.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window · 437399ed
      Alexey Kardashevskiy authored
      [ Upstream commit c37c792d ]
      
      We allocate only the first level of multilevel TCE tables for KVM
      already (alloc_userspace_copy==true), and the rest is allocated on demand.
      This is not enabled though for bare metal.
      
      This removes the KVM limitation (implicit, via the alloc_userspace_copy
      parameter) and always allocates just the first level. The on-demand
      allocation of missing levels is already implemented.
      
      As from now on DMA map might happen with disabled interrupts, this
      allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors [1].
      In practice just a single page is allocated there so chances for failure
      are quite low.
      
      To save time when creating a new clean table, this skips non-allocated
      indirect TCE entries in pnv_tce_free just like we already do in
      the VFIO IOMMU TCE driver.
      
      This changes the default level number from 1 to 2 to reduce the amount
      of memory required for the default 32bit DMA window at the boot time.
      The default window size is up to 2GB which requires 4MB of TCEs which is
      unlikely to be used entirely or at all as most devices these days are
      64bit capable so by switching to 2 levels by default we save 4032KB of
      RAM per device.
      
      While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace
      can trigger this path via VFIO, see the failure and try creating a table
      again with different parameters which might succeed.
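
The 4MB and 4032KB figures quoted for the default window can be reproduced with a little arithmetic. The sketch assumes 4KB IOMMU pages, 8-byte TCEs and a 64KB first-level allocation; those three values are not stated in the message and are chosen because they match its numbers. The helper names are hypothetical.

```c
#include <stdint.h>

#define KB 1024ULL
#define MB (1024ULL * KB)
#define GB (1024ULL * MB)

/* Bytes of TCEs needed to map a window with a flat, single-level table. */
static uint64_t flat_tce_bytes(uint64_t window_bytes,
			       uint64_t iommu_page_bytes,
			       uint64_t tce_entry_bytes)
{
	return window_bytes / iommu_page_bytes * tce_entry_bytes;
}

/* Memory saved at boot when only the first level is allocated up front. */
static uint64_t boot_savings_bytes(uint64_t flat_bytes,
				   uint64_t first_level_bytes)
{
	return flat_bytes - first_level_bytes;
}
```

Under these assumptions, a 2GB window needs 2GB / 4KB = 524288 entries of 8 bytes each, which is 4MB, and allocating only a 64KB first level saves 4096KB - 64KB = 4032KB.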
      
      [1]:
      ===
      BUG: sleeping function called from invalid context at mm/page_alloc.c:4596
      in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1
      2 locks held by scsi_eh_1/1038:
       #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80
       #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0
      irq event stamp: 500
      hardirqs last  enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0
      hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120
      softirqs last  enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80
      softirqs last disabled at (0): [<0000000000000000>] 0x0
      CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634
      Call Trace:
      [c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable)
      [c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310
      [c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560
      [c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130
      [c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0
      [c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0
      [c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0
      [c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550
      [c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0
      [c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470
      [c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0
      [c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0
      [c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640
      [c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0
      [c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0
      [c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0
      [c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0
      [c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100
      [c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510
      [c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0
      [c000003d064cfe20] [c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70
      ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      irq event stamp: 2305
      
      ========================================================
      hardirqs last  enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34
      hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654
      WARNING: possible irq lock inversion dependency detected
      5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G        W
      softirqs last  enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654
      softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180
      --------------------------------------------------------
      swapper/0/0 just changed the state of lock:
      0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120
      but this lock took another, HARDIRQ-unsafe lock in the past:
       (fs_reclaim){+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(fs_reclaim);
                                     local_irq_disable();
                                     lock(&(&host->lock)->rlock);
                                     lock(fs_reclaim);
        <Interrupt>
          lock(&(&host->lock)->rlock);
      
       *** DEADLOCK ***
      
      no locks held by swapper/0/0.
      
      the shortest dependencies between 2nd lock and 1st lock:
       -> (fs_reclaim){+.+.} ops: 167579 {
          HARDIRQ-ON-W at:
                            lock_acquire+0xf8/0x2a0
                            fs_reclaim_acquire.part.23+0x44/0x60
                            kmem_cache_alloc_node_trace+0x80/0x590
                            alloc_desc+0x64/0x270
                            __irq_alloc_descs+0x2e4/0x3a0
                            irq_domain_alloc_descs+0xb0/0x150
                            irq_create_mapping+0x168/0x2c0
                            xics_smp_probe+0x2c/0x98
                            pnv_smp_probe+0x40/0x9c
                            smp_prepare_cpus+0x524/0x6c4
                            kernel_init_freeable+0x1b4/0x650
                            kernel_init+0x2c/0x148
                            ret_from_kernel_thread+0x5c/0x70
          SOFTIRQ-ON-W at:
                            lock_acquire+0xf8/0x2a0
                            fs_reclaim_acquire.part.23+0x44/0x60
                            kmem_cache_alloc_node_trace+0x80/0x590
                            alloc_desc+0x64/0x270
                            __irq_alloc_descs+0x2e4/0x3a0
                            irq_domain_alloc_descs+0xb0/0x150
                            irq_create_mapping+0x168/0x2c0
                            xics_smp_probe+0x2c/0x98
                            pnv_smp_probe+0x40/0x9c
                            smp_prepare_cpus+0x524/0x6c4
                            kernel_init_freeable+0x1b4/0x650
                            kernel_init+0x2c/0x148
                            ret_from_kernel_thread+0x5c/0x70
          INITIAL USE at:
                           lock_acquire+0xf8/0x2a0
                           fs_reclaim_acquire.part.23+0x44/0x60
                           kmem_cache_alloc_node_trace+0x80/0x590
                           alloc_desc+0x64/0x270
                           __irq_alloc_descs+0x2e4/0x3a0
                           irq_domain_alloc_descs+0xb0/0x150
                           irq_create_mapping+0x168/0x2c0
                           xics_smp_probe+0x2c/0x98
                           pnv_smp_probe+0x40/0x9c
                           smp_prepare_cpus+0x524/0x6c4
                           kernel_init_freeable+0x1b4/0x650
                           kernel_init+0x2c/0x148
                           ret_from_kernel_thread+0x5c/0x70
        }
      ===

      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ru
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  2. 05 Oct, 2019 24 commits
    • arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328 · 174bbcc5
      Shawn Lin authored
      commit 03e61929 upstream.
      
      150MHz is a fundamental limitation of the RK3328 SoC. Without this
      limit, eMMC, for instance, will run at a 200MHz clock rate in HS200
      mode, which makes RK3328 boards not always boot properly. Adding the
      limit to rk3328.dtsi also obviates the worry of missing it when adding
      new boards.
      
      Fixes: 52e02d37 ("arm64: dts: rockchip: add core dtsi file for RK3328 SoCs")
      Cc: stable@vger.kernel.org
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Liang Chen <cl@rock-chips.com>
      Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: tlb: Ensure we execute an ISB following walk cache invalidation · 8cfe3b8a
      Will Deacon authored
      commit 51696d34 upstream.
      
      05f2d2f8 ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
      added a new TLB invalidation helper which is used when freeing
      intermediate levels of page table used for kernel mappings, but is
      missing the required ISB instruction after completion of the TLBI
      instruction.
      
      Add the missing barrier.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 05f2d2f8 ("arm64: tlbflush: Introduce __flush_tlb_kernel_pgtable")
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}" · fc7d6bfd
      Will Deacon authored
      commit d0b7a302 upstream.
      
      This reverts commit 24fe1b0e.
      
      Commit 24fe1b0e ("arm64: Remove unnecessary ISBs from
      set_{pte,pmd,pud}") removed ISB instructions immediately following updates
      to the page table, on the grounds that they are not required by the
      architecture and a DSB alone is sufficient to ensure that subsequent data
      accesses use the new translation:
      
        DDI0487E_a, B2-128:
      
        | ... no instruction that appears in program order after the DSB
        | instruction can alter any state of the system or perform any part of
        | its functionality until the DSB completes other than:
        |
        | * Being fetched from memory and decoded
        | * Reading the general-purpose, SIMD and floating-point,
        |   Special-purpose, or System registers that are directly or indirectly
        |   read without causing side-effects.
      
      However, the same document also states the following:
      
        DDI0487E_a, B2-125:
      
        | DMB and DSB instructions affect reads and writes to the memory system
        | generated by Load/Store instructions and data or unified cache
        | maintenance instructions being executed by the PE. Instruction fetches
        | or accesses caused by a hardware translation table access are not
        | explicit accesses.
      
      which appears to claim that the DSB alone is insufficient.  Unfortunately,
      some CPU designers have followed the second clause above, whereas in Linux
      we've been relying on the first. This means that our mapping sequence:
      
      	MOV	X0, <valid pte>
      	STR	X0, [Xptep]	// Store new PTE to page table
      	DSB	ISHST
      	LDR	X1, [X2]	// Translates using the new PTE
      
      can actually raise a translation fault on the load instruction because the
      translation can be performed speculatively before the page table update and
      then marked as "faulting" by the CPU. For user PTEs, this is ok because we
      can handle the spurious fault, but for kernel PTEs and intermediate table
      entries this results in a panic().
      
      Revert the offending commit to reintroduce the missing barriers.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 24fe1b0e ("arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}")
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up · 881edc16
      Luis Araneda authored
      commit b7005d4e upstream.
      
      This fixes a kernel panic on memcpy when
      FORTIFY_SOURCE is enabled.
      
      The initial smp implementation on commit aa7eb2bb
      ("arm: zynq: Add smp support")
      used memcpy, which worked fine until commit ee333554
      ("ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE")
      enabled overflow checks at runtime, producing a read
      overflow panic.
      
      The computed sizes of the memcpy args are:
      - p_size (dst): 4294967295 = (size_t) -1
      - q_size (src): 1
      - size (len): 8
      
      Additionally, the memory is marked as __iomem, so one of
      the memcpy_* functions should be used for read/write.
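
For readers unfamiliar with the I/O copy helpers, the following userspace stand-in shows the general shape of such a copy. `model_memcpy_toio()` is a hypothetical name and this is not the kernel's implementation, which is architecture specific; the sketch only illustrates copying through a volatile pointer instead of calling plain memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy 'len' bytes to a device-style destination one byte at a time.
 * The volatile qualifier keeps the compiler from eliding or merging
 * the stores.
 */
static void model_memcpy_toio(volatile void *dst, const void *src, size_t len)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	while (len--)
		*d++ = *s++;
}
```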
      
      Fixes: aa7eb2bb ("arm: zynq: Add smp support")
      Signed-off-by: Luis Araneda <luaraneda@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: samsung: Fix system restart on S3C6410 · 22092794
      Lihua Yao authored
      commit 16986074 upstream.
      
      S3C6410 system restart is triggered by watchdog reset.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 9f55342c ("ARM: dts: s3c64xx: Fix infinite interrupt in soft mode")
      Signed-off-by: Lihua Yao <ylhuajnu@outlook.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Manually calculate reserved bits when loading PDPTRS · 496cf984
      Sean Christopherson authored
      commit 16cfacc8 upstream.
      
      Manually generate the PDPTR reserved bit mask when explicitly loading
      PDPTRs.  The reserved bits that are being tracked by the MMU reflect the
      current paging mode, which is unlikely to be PAE paging in the vast
      majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
      __set_sregs(), etc...  This can cause KVM to incorrectly signal a bad
      PDPTR, or more likely, miss a reserved bit check and subsequently fail
      a VM-Enter due to a bad VMCS.GUEST_PDPTR.
      
      Add a one off helper to generate the reserved bits instead of sharing
      code across the MMU's calculations and the PDPTR emulation.  The PDPTR
      reserved bits are basically set in stone, and pushing a helper into
      the MMU's calculation adds unnecessary complexity without improving
      readability.
      
      Opportunistically fix/update the comment for load_pdptrs().
      
      Note, the buggy commit also introduced a deliberate functional change,
      "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
      effectively (and correctly) reverted by commit cd9ae5fe ("KVM: x86:
      Fix page-tables reserved bits").  A bit of SDM archaeology shows that
      the SDM from late 2008 had a bug (likely a copy+paste error) where it
      listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
      for 2mb entries.  I.e. the SDM contradicted itself, and bits 6:5 are and
      always have been reserved.
      
      Fixes: 20c466b5 ("KVM: Use rsvd_bits_mask in load_pdptrs()")
      Cc: stable@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Reported-by: Doug Reiland <doug.reiland@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      496cf984
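A one-off helper of the kind the PDPTR commit above describes might look like the following user-space sketch. The bit ranges (63:MAXPHYADDR, 8:5 and 2:1) are an assumption based on the PAE PDPTE layout; the commit message itself only states that bits 6:5 are reserved. The helper names and the `maxphyaddr` parameter are illustrative and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; assumes 0 <= s <= e and e - s < 63. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Reserved bits of a PAE PDPTE under the assumption stated above.
 * maxphyaddr is passed in directly; in KVM it would come from guest CPUID.
 */
static uint64_t pdptr_rsvd_bits(int maxphyaddr)
{
	return rsvd_bits(maxphyaddr, 63) | rsvd_bits(5, 8) | rsvd_bits(1, 2);
}

/* A present PDPTE is bad if any reserved bit is set; non-present ones pass. */
static bool pdpte_is_valid(uint64_t pdpte, int maxphyaddr)
{
	const uint64_t present = 1ULL << 0;

	if (!(pdpte & present))
		return true;
	return !(pdpte & pdptr_rsvd_bits(maxphyaddr));
}
```

Because the mask depends only on MAXPHYADDR, it does not change with the current paging mode, which is the property the commit message relies on.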
    • Jan Dakinevich's avatar
      KVM: x86: set ctxt->have_exception in x86_decode_insn() · 933e3e2b
      Jan Dakinevich authored
      commit c8848cee upstream.
      
      x86_emulate_instruction() takes into account ctxt->have_exception flag
      during instruction decoding, but in practice this flag is never set in
      x86_decode_insn().
      
      Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn")
      Cc: stable@vger.kernel.org
      Cc: Denis Lunev <den@virtuozzo.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      933e3e2b
    • Jan Dakinevich's avatar
      KVM: x86: always stop emulation on page fault · 9723e445
      Jan Dakinevich authored
      commit 8530a79c upstream.
      
      inject_emulated_exception() returns true if and only if a nested page
      fault happens. However, a page fault can come from a guest page table
      walk, either nested or not nested. In both cases we should stop the
      attempt to read under RIP and let the guest step over its own page
      fault handler.
      
      This is also visible when an emulated instruction causes a #GP fault
      and the VMware backdoor is enabled.  To handle the VMware backdoor,
      KVM intercepts #GP faults; with only the next patch applied,
      x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL
      instead of EMULATE_DONE.   EMULATE_FAIL causes handle_exception_nmi()
      (or gp_interception() for SVM) to re-inject the original #GP because it
      thinks emulation failed due to a non-VMware opcode.  This patch prevents
      the issue as x86_emulate_instruction() will return EMULATE_DONE after
      injecting the #GP.
      
      Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn")
      Cc: stable@vger.kernel.org
      Cc: Denis Lunev <den@virtuozzo.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9723e445
    • Madhavan Srinivasan's avatar
      powerpc/imc: Dont create debugfs files for cpu-less nodes · ecfe4b5f
      Madhavan Srinivasan authored
      commit 41ba17f2 upstream.
      
      Commit <684d9840> ('powerpc/powernv: Add debugfs interface for
      imc-mode and imc') added a debugfs interface for the nest imc pmu
      devices to support changing between different ucode modes, primarily
      as a debug capability. But the code did not consider the case of
      cpu-less nodes, so reading the _cmd_ or _mode_ file of a cpu-less
      node triggers this crash:
      
        Faulting instruction address: 0xc0000000000d0d58
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19
        NIP:  c0000000000d0d58 LR: c00000000049aa18 CTR:c0000000000d0d50
        REGS: c00020194548f9e0 TRAP: 0300   Not tainted  (5.2.0-rc6-next-20190627+)
        MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR:28022822  XER: 00000000
        CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR:40000000 IRQMASK: 0
        ...
        NIP imc_mem_get+0x8/0x20
        LR  simple_attr_read+0x118/0x170
        Call Trace:
          simple_attr_read+0x70/0x170 (unreliable)
          debugfs_attr_read+0x6c/0xb0
          __vfs_read+0x3c/0x70
          vfs_read+0xbc/0x1a0
          ksys_read+0x7c/0x140
          system_call+0x5c/0x70
      
      The patch fixes the issue with a more robust NULL check on vbase.
      
      Before patch, ls output for the debugfs imc directory
      
        # ls /sys/kernel/debug/powerpc/imc/
        imc_cmd_0    imc_cmd_251  imc_cmd_253  imc_cmd_255  imc_mode_0    imc_mode_251  imc_mode_253  imc_mode_255
        imc_cmd_250  imc_cmd_252  imc_cmd_254  imc_cmd_8    imc_mode_250  imc_mode_252  imc_mode_254  imc_mode_8
      
      After patch, ls output for the debugfs imc directory
      
        # ls /sys/kernel/debug/powerpc/imc/
        imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8
      
      The actual bug is that we have two loops with potentially different
      loop counts. In imc_get_mem_addr_nest(), the loop count is obtained
      from the dt entries, but in export_imc_mode_and_cmd() the loop was
      based on the for_each_nid() count. The patch bases the loop count in
      the latter on struct mem_info. Ideally it would be better to have the
      array size in struct imc_pmu.
      
      Fixes: 684d9840 ('powerpc/powernv: Add debugfs interface for imc-mode and imc')
      Reported-by: Qian Cai <cai@lca.pw>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190827101635.6942-1-maddy@linux.vnet.ibm.com
      Cc: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecfe4b5f
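The loop-count mismatch in the imc commit above can be illustrated with a small sketch that derives the count from the array itself. It assumes the per-node array ends with an entry whose vbase is NULL; that terminator convention, the simplified `struct mem_info` layout and the function name `count_imc_nodes` are assumptions for illustration, not the kernel's definitions.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's per-node IMC memory record. */
struct mem_info {
	unsigned int id;  /* node/chip id used in the debugfs file name */
	void *vbase;      /* NULL marks the end of the array (assumption) */
};

/*
 * Count the nodes that should get debugfs files by walking the array up
 * to its NULL-vbase terminator, instead of trusting a separate node count
 * that may include cpu-less nodes.
 */
static size_t count_imc_nodes(const struct mem_info *info)
{
	size_t n = 0;

	if (!info)
		return 0;
	while (info[n].vbase != NULL)
		n++;
	return n;
}
```

Driving both the allocation and the debugfs export from the same array means a cpu-less node can never get a file whose backing pointer was never set up.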
    • Gayatri Kammela's avatar
      x86/cpu: Add Tiger Lake to Intel family · e836cd29
      Gayatri Kammela authored
      [ Upstream commit 6e1c32c5 ]
      
      Add the model numbers/CPUIDs of Tiger Lake mobile and desktop to the
      Intel family.

      Suggested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rahul Tanwar <rahul.tanwar@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190905193020.14707-2-tony.luck@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e836cd29
    • Harald Freudenberger's avatar
      s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding · b21919ee
      Harald Freudenberger authored
      [ Upstream commit 9e323d45 ]
      
      With 'extra run-time crypto self tests' enabled, the selftest
      for s390-xts fails with
      
        alg: skcipher: xts-aes-s390 encryption unexpectedly succeeded on
        test vector "random: len=0 klen=64"; expected_error=-22,
        cfg="random: inplace use_digest nosimd src_divs=[2.61%@+4006,
        84.44%@+21, 1.55%@+13, 4.50%@+344, 4.26%@+21, 2.64%@+27]"
      
      The special case nbytes=0 is not handled correctly. This fix makes
      sure that -EINVAL is returned when en/decrypt is called with 0 bytes
      to en/decrypt.

      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b21919ee
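The behaviour the s390 self-test above expects can be sketched as an early length check. The function name `xts_check_request` is hypothetical and this is not the s390 driver code; it only shows the rule from the commit message that a zero-length request must fail with -EINVAL (the expected_error=-22 in the log) rather than succeed.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical entry check for an XTS en/decrypt request: a request
 * with nothing to process is rejected instead of silently succeeding.
 */
static int xts_check_request(size_t nbytes)
{
	if (nbytes == 0)
		return -EINVAL;
	return 0;
}
```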
    • Marek Szyprowski's avatar
      ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks · 6fceb241
      Marek Szyprowski authored
      [ Upstream commit 5b0eeeaa ]
      
      Commit aff138bf ("ARM: dts: exynos: Add TMU nodes regulator supply
      for Peach boards") assigned LDO10 to the Exynos Thermal Measurement Unit,
      but it turned out that it also supplies some other critical parts, and
      the board freezes/crashes when it is turned off.

      The mentioned commit made the Exynos TMU a consumer of that regulator, and
      in the typical case the Exynos TMU driver keeps it enabled from early boot.
      However, there are configurations (multi_v7_defconfig is one example) in
      which some of the regulators are compiled as modules and are not available
      from early boot. In such a case LDO10 may be turned off by the regulator
      core, because it has no consumers yet (consumer drivers cannot get it,
      because its supply regulators are not yet available). This in turn causes
      the board to crash. This patch restores the 'always-on' property for the
      LDO10 regulator.
      
      Fixes: aff138bf ("ARM: dts: exynos: Add TMU nodes regulator supply for Peach boards")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6fceb241
    • Song Liu's avatar
      x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable() · 7bbb7a9d
      Song Liu authored
      [ Upstream commit 825d0b73 ]
      
      pti_clone_pmds() assumes that the supplied address is either:
      
       - properly PUD/PMD aligned
      or
       - the address is actually mapped which means that independently
         of the mapping level (PUD/PMD/PTE) the next higher mapping
         exists.
      
      If that's not the case the unaligned address can be incremented by PUD or
      PMD size incorrectly. All callers supply mapped and/or aligned addresses,
      but for the sake of robustness it's better to handle that case properly and
      to emit a warning.
      
      [ tglx: Rewrote changelog and added WARN_ON_ONCE() ]

      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282352470.1938@nanos.tec.linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7bbb7a9d
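The incorrect increment described in the pti_clone_pagetable() commit above can be shown with a user-space sketch. The sizes assume x86-64 with 4 KiB base pages; the helper names are hypothetical, and rounding `addr + 1` up to the next boundary is one way to express the robust step, not necessarily the exact form used in the patch.

```c
#include <stdint.h>

#define PMD_SIZE (1ULL << 21)  /* 2 MiB, assuming x86-64 with 4 KiB pages */
#define PUD_SIZE (1ULL << 30)  /* 1 GiB */

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Naive step: lands on a boundary only if addr is already aligned to size. */
static uint64_t next_naive(uint64_t addr, uint64_t size)
{
	return addr + size;
}

/* Robust step: always lands on the next size-aligned boundary after addr. */
static uint64_t next_aligned(uint64_t addr, uint64_t size)
{
	return round_up_pow2(addr + 1, size);
}
```

For an aligned address both steps agree. For an unaligned one, the naive step overshoots the next boundary by the original misalignment, which is the "incremented by PUD or PMD size incorrectly" case from the changelog.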
    • Thomas Gleixner's avatar
      x86/mm/pti: Do not invoke PTI functions when PTI is disabled · 4b7d9c2a
      Thomas Gleixner authored
      [ Upstream commit 990784b5 ]
      
      When PTI is disabled at boot time either because the CPU is not affected or
      PTI has been disabled on the command line, the boot code still calls into
      pti_finalize() which then unconditionally invokes:
      
           pti_clone_entry_text()
           pti_clone_kernel_text()
      
      pti_clone_kernel_text() was called unconditionally before the 32bit support
      was added and 32bit added the call to pti_clone_entry_text().
      
      The call has no side effects, as cloning the page tables into the
      available second one, which was allocated for PTI, does no damage. But
      it does not make sense either, and if this functionality were extended
      later it might lead to hard-to-diagnose issues.
      
      Neither function should be called when PTI is runtime disabled. Make the
      invocation conditional.

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20190828143124.063353972@linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4b7d9c2a
    • Mark Rutland's avatar
      arm64: kpti: ensure patched kernel text is fetched from PoU · eb2485e3
      Mark Rutland authored
      [ Upstream commit f32c7a8e ]
      
      While the MMU is disabled, I-cache speculation can result in
      instructions being fetched from the PoC. During boot we may patch
      instructions (e.g. for alternatives and jump labels), and these may be
      dirty at the PoU (and stale at the PoC).
      
      Thus, while the MMU is disabled in the KPTI pagetable fixup code we may
      load stale instructions into the I-cache, potentially leading to
      subsequent crashes when executing regions of code which have been
      modified at runtime.
      
      Similarly to commit:

        8ec41987 ("arm64: mm: ensure patched kernel text is fetched from PoU")
      
      ... we can invalidate the I-cache after enabling the MMU to prevent such
      issues.
      
      The KPTI pagetable fixup code itself should be clean to the PoC per the
      boot protocol, so no maintenance is required for this code.

      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      eb2485e3
    • Neil Horman's avatar
      x86/apic/vector: Warn when vector space exhaustion breaks affinity · b6194965
      Neil Horman authored
      [ Upstream commit 743dac49 ]
      
      On x86, CPUs are limited in the number of interrupts they can have affined
      to them as they only support 256 interrupt vectors per CPU. 32 vectors are
      reserved for the CPU and the kernel reserves another 22 for internal
      purposes. That leaves 202 vectors for assignment to devices.
      
      When an interrupt is set up or the affinity is changed by the kernel or the
      administrator, the vector assignment code attempts to honor the requested
      affinity mask. If the vector space on the CPUs in that affinity mask is
      exhausted the code falls back to a wider set of CPUs and assigns a vector
      on a CPU outside of the requested affinity mask silently.
      
      While the effective affinity is reflected in the corresponding
      /proc/irq/$N/effective_affinity* files the silent breakage of the requested
      affinity can lead to unexpected behaviour for administrators.
      
      Add a pr_warn() when this happens so that administrators are at least
      informed about it in the syslog.
      
      [ tglx: Massaged changelog and made the pr_warn() more informative ]
      
      Reported-by: djuran@redhat.com
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: djuran@redhat.com
      Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b6194965
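The vector budget quoted in the vector-exhaustion commit above is simple arithmetic, spelled out in this sketch. The three counts come from the changelog; the enum and function names are hypothetical, and the capacity check is a simplification that ignores how the kernel actually tracks per-CPU allocations.

```c
#include <stdbool.h>

enum {
	VECTORS_PER_CPU      = 256, /* total interrupt vectors per CPU */
	VECTORS_CPU_RESERVED = 32,  /* reserved for the CPU */
	VECTORS_KERNEL_RSVD  = 22,  /* reserved by the kernel */
};

/* Vectors left for devices on one CPU: 256 - 32 - 22. */
static int device_vectors_per_cpu(void)
{
	return VECTORS_PER_CPU - VECTORS_CPU_RESERVED - VECTORS_KERNEL_RSVD;
}

/*
 * Simplified capacity check: can one more interrupt be placed on the CPUs
 * of an affinity mask, given how many device vectors are already in use?
 */
static bool mask_has_free_vector(int cpus_in_mask, int vectors_in_use)
{
	return vectors_in_use < cpus_in_mask * device_vectors_per_cpu();
}
```

When a check like this fails for the requested mask, the allocation falls back to CPUs outside the mask, which is the situation the new pr_warn() reports.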
    • Stefan Agner's avatar
      ARM: dts: imx7-colibri: disable HS400 · dfaf6058
      Stefan Agner authored
      [ Upstream commit a95fbda0 ]
      
      Force HS200 by masking bit 63 of the SDHCI capability register.
      The i.MX ESDHC driver uses SDHCI_QUIRK2_CAPS_BIT63_FOR_HS400. With
      that the stack checks bit 63 to decide whether HS400 is available.
      Using sdhci-caps-mask allows masking bit 63. The stack then selects
      HS200 as the operating mode.
      
      This prevents rare communication errors with minimal effect on
      performance:
      	sdhci-esdhc-imx 30b60000.usdhc: warning! HS400 strobe DLL
      		status REF not lock!
      Signed-off-by: Stefan Agner <stefan.agner@toradex.com>
      Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com>
      Reviewed-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dfaf6058
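The effect of masking bit 63, as described in the imx7-colibri commit above, can be sketched on a plain 64-bit value. The macro and function names are hypothetical, and treating the capabilities as a single 64-bit word is a simplification for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAPS_BIT63_HS400 (1ULL << 63)  /* bit the commit says gates HS400 */

/* Apply a capability mask: every bit set in mask is cleared from caps. */
static uint64_t apply_caps_mask(uint64_t caps, uint64_t mask)
{
	return caps & ~mask;
}

/* HS400 is considered available only while bit 63 is still set. */
static bool hs400_available(uint64_t caps)
{
	return (caps & CAPS_BIT63_HS400) != 0;
}
```

Once bit 63 is cleared, the remaining capability bits are untouched, so the next-best mode (HS200 in the commit) is still advertised.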
    • André Draszik's avatar
      ARM: dts: imx7d: cl-som-imx7: make ethernet work again · c20ee5d9
      André Draszik authored
      [ Upstream commit 9846a452 ]
      
      Recent changes to the atheros at803x driver caused
      ethernet to stop working on this board.
      In particular commit 6d4cd041
      ("net: phy: at803x: disable delay only for RGMII mode")
      and commit cd28d1d6
      ("net: phy: at803x: Disable phy delay for RGMII mode")
      fix the AR8031 driver to configure the phy's (RX/TX)
      delays as per the 'phy-mode' in the device tree.
      
      This now prevents ethernet from working on this board.
      
      It used to work before those commits, because the
      AR8031 comes out of reset with RX delay enabled, and
      the at803x driver didn't touch the delay configuration
      at all when "rgmii" mode was selected, and because
      arch/arm/mach-imx/mach-imx7d.c:ar8031_phy_fixup()
      unconditionally enables TX delay.
      
      Since the above commits, ar8031_phy_fixup() also has no
      effect anymore, and the end result is that all delays
      are disabled in the phy, so ethernet does not work.
      
      Update the device tree to restore functionality.

      Signed-off-by: André Draszik <git@andred.net>
      CC: Ilya Ledvich <ilya@compulab.co.il>
      CC: Igor Grinberg <grinberg@compulab.co.il>
      CC: Rob Herring <robh+dt@kernel.org>
      CC: Mark Rutland <mark.rutland@arm.com>
      CC: Shawn Guo <shawnguo@kernel.org>
      CC: Sascha Hauer <s.hauer@pengutronix.de>
      CC: Pengutronix Kernel Team <kernel@pengutronix.de>
      CC: Fabio Estevam <festevam@gmail.com>
      CC: NXP Linux Team <linux-imx@nxp.com>
      CC: devicetree@vger.kernel.org
      CC: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c20ee5d9
    • Finn Thain's avatar
      m68k: Prevent some compiler warnings in Coldfire builds · 21927786
      Finn Thain authored
      [ Upstream commit 94c04390 ]
      
      Since commit d3b41b6b ("m68k: Dispatch nvram_ops calls to Atari or
      Mac functions"), Coldfire builds generate compiler warnings due to the
      unconditional inclusion of asm/atarihw.h and asm/macintosh.h.
      
      The inclusion of asm/atarihw.h causes warnings like this:
      
      In file included from ./arch/m68k/include/asm/atarihw.h:25:0,
                       from arch/m68k/kernel/setup_mm.c:41,
                       from arch/m68k/kernel/setup.c:3:
      ./arch/m68k/include/asm/raw_io.h:39:0: warning: "__raw_readb" redefined
       #define __raw_readb in_8
      
      In file included from ./arch/m68k/include/asm/io.h:6:0,
                       from arch/m68k/kernel/setup_mm.c:36,
                       from arch/m68k/kernel/setup.c:3:
      ./arch/m68k/include/asm/io_no.h:16:0: note: this is the location of the previous definition
       #define __raw_readb(addr) \
      ...
      
      This issue is resolved by dropping the asm/raw_io.h include. It turns out
      that asm/io_mm.h already includes that header file.
      
      Moving the relevant macro definitions helps to clarify this dependency
      and make it safe to include asm/atarihw.h.
      
      The other warnings look like this:
      
      In file included from arch/m68k/kernel/setup_mm.c:48:0,
                       from arch/m68k/kernel/setup.c:3:
      ./arch/m68k/include/asm/macintosh.h:19:35: warning: 'struct irq_data' declared inside parameter list will not be visible outside of this definition or declaration
       extern void mac_irq_enable(struct irq_data *data);
                                         ^~~~~~~~
      ...
      
      This issue is resolved by adding the missing linux/irq.h include.

      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Acked-by: Greg Ungerer <gerg@linux-m68k.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      21927786
    • Qian Cai's avatar
      arm64/prefetch: fix a -Wtype-limits warning · 7d75275f
      Qian Cai authored
      [ Upstream commit b99286b0 ]
      
      Commit d5370f75 ("arm64: prefetch: add alternative pattern for
      CPUs without a prefetcher") introduced MIDR_IS_CPU_MODEL_RANGE() to be
      used in has_no_hw_prefetch() with rv_min=0, which generates a compilation
      warning from GCC:
      
      In file included from ./arch/arm64/include/asm/cache.h:8,
                     from ./include/linux/cache.h:6,
                     from ./include/linux/printk.h:9,
                     from ./include/linux/kernel.h:15,
                     from ./include/linux/cpumask.h:10,
                     from arch/arm64/kernel/cpufeature.c:11:
      arch/arm64/kernel/cpufeature.c: In function 'has_no_hw_prefetch':
      ./arch/arm64/include/asm/cputype.h:59:26: warning: comparison of
      unsigned expression >= 0 is always true [-Wtype-limits]
      _model == (model) && rv >= (rv_min) && rv <= (rv_max);  \
                              ^~
      arch/arm64/kernel/cpufeature.c:889:9: note: in expansion of macro
      'MIDR_IS_CPU_MODEL_RANGE'
      return MIDR_IS_CPU_MODEL_RANGE(midr, MIDR_THUNDERX,
             ^~~~~~~~~~~~~~~~~~~~~~~
      
      Fix it by converting MIDR_IS_CPU_MODEL_RANGE to a static inline
      function.

      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7d75275f
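The macro-to-function conversion in the -Wtype-limits commit above can be sketched as follows. In a function, rv_min is an ordinary parameter rather than a literal 0 pasted into the comparison, so the compiler no longer sees an always-true `unsigned >= 0` test. The mask values are reproduced here as assumptions for a self-contained example, and the lower-case function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field masks; values are assumptions reproduced for this sketch. */
#define MIDR_REVISION_MASK   0xfu
#define MIDR_VARIANT_SHIFT   20
#define MIDR_VARIANT_MASK    (0xfu << MIDR_VARIANT_SHIFT)
#define MIDR_CPU_MODEL_MASK  0xff0ffff0u  /* implementer, architecture, part */

/* True if midr matches model and its variant/revision lies in [rv_min, rv_max]. */
static inline bool midr_is_cpu_model_range(uint32_t midr, uint32_t model,
					   uint32_t rv_min, uint32_t rv_max)
{
	uint32_t _model = midr & MIDR_CPU_MODEL_MASK;
	uint32_t rv = midr & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK);

	return _model == model && rv >= rv_min && rv <= rv_max;
}
```

Callers can still pass 0 as rv_min; the comparison is simply evaluated at run time (or folded by the optimizer) without a diagnostic.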
    • chenzefeng's avatar
      ia64:unwind: fix double free for mod->arch.init_unw_table · 87bc43e2
      chenzefeng authored
      [ Upstream commit c5e5c48c ]
      
      The function free_module() in kernel/module.c is as follows:
      
      void free_module(struct module *mod) {
      	......
      	module_arch_cleanup(mod);
      	......
      	module_arch_freeing_init(mod);
      	......
      }
      
      Both module_arch_cleanup() and module_arch_freeing_init() free
      mod->arch.init_unw_table, which causes a double free.

      Here, set mod->arch.init_unw_table = NULL after removing the unwind
      table to avoid the double free.

      Signed-off-by: chenzefeng <chenzefeng2@huawei.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      87bc43e2
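The clear-after-free pattern used in the ia64 commit above can be shown with a minimal user-space sketch. `struct mod_arch`, `remove_init_unw_table` and the counter are hypothetical stand-ins; the real code removes an unwind table rather than calling free().

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the module's arch-specific state. */
struct mod_arch {
	void *init_unw_table;
};

static int tables_removed;  /* counts real removals, for illustration */

/*
 * Remove the unwind table if present and clear the pointer, so a second
 * cleanup path that calls this again becomes a no-op.
 */
static void remove_init_unw_table(struct mod_arch *arch)
{
	if (arch->init_unw_table) {
		free(arch->init_unw_table);
		arch->init_unw_table = NULL;
		tables_removed++;
	}
}
```

Calling the helper from two cleanup paths, as free_module() effectively does, then releases the table exactly once.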
    • Thomas Gleixner's avatar
      x86/apic: Soft disable APIC before initializing it · b40c15c2
      Thomas Gleixner authored
      [ Upstream commit 2640da4c ]
      
      If the APIC was already enabled on entry of setup_local_APIC() then
      disabling it soft via the SPIV register makes a lot of sense.
      
      That masks all LVT entries and brings it into a well defined state.
      
      Otherwise previously enabled LVTs which are not touched in the setup
      function stay unmasked and might surprise the just booting kernel.

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20190722105219.068290579@linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b40c15c2
    • Grzegorz Halat's avatar
      x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails · ce7fdd5c
      Grzegorz Halat authored
      [ Upstream commit 747d5a1b ]
      
      A reboot request sends an IPI via the reboot vector and waits for all other
      CPUs to stop. If one or more CPUs are in critical regions with interrupts
      disabled then the IPI is not handled on those CPUs and the shutdown hangs
      if native_stop_other_cpus() is called with the wait argument set.
      
      Such a situation can happen when one CPU was stopped within a lock held
      section and another CPU is trying to acquire that lock with interrupts
      disabled. There are other scenarios which can cause such a lockup as well.
      
      In theory the shutdown should be attempted by an NMI IPI after the timeout
      period elapsed. Though the wait loop after sending the reboot vector IPI
      prevents this. It checks the wait request argument and the timeout. If wait
      is set, which is true for sys_reboot() then it won't fall through to the
      NMI shutdown method after the timeout period has finished.
      
      This was an oversight when the NMI shutdown mechanism was added to handle
      the 'reboot IPI is not working' situation. The mechanism was added to deal
      with stuck panic shutdowns, which do not have the wait request set, so the
      'wait request' case was probably not considered.
      
      Remove the wait check from the post reboot vector IPI wait loop and enforce
      that the wait loop in the NMI fallback path is invoked even if NMI IPIs are
      disabled or the registration of the NMI handler fails. That second wait
      loop will then hang if not all CPUs shutdown and the wait argument is set.
      
      [ tglx: Avoid the hard to parse line break in the NMI fallback path,
        	add comments and massage the changelog ]
      
      Fixes: 7d007d21 ("x86/reboot: Use NMI to assist in shutting down if IRQ fails")
      Signed-off-by: Grzegorz Halat <ghalat@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: https://lkml.kernel.org/r/20190628122813.15500-1-ghalat@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ce7fdd5c
    • Thomas Gleixner's avatar
      x86/apic: Make apic_pending_intr_clear() more robust · d29c7b8b
      Thomas Gleixner authored
      [ Upstream commit cc8bf191 ]
      
      In the course of developing shorthand-based IPI support, issues were
      observed with the function which tries to clear possibly pending ISR
      bits in the local APIC.
      
        1) 0-day testing triggered the WARN_ON() in apic_pending_intr_clear().
      
           This warning is emitted when the function fails to clear pending ISR
           bits or observes pending IRR bits which are not delivered to the CPU
           after the stale ISR bit(s) are ACK'ed.
      
           Unfortunately the function only emits a WARN_ON() and fails to dump
           the IRR/ISR content. That's useless for debugging.
      
           Feng added spot on debug printk's which revealed that the stale IRR
           bit belonged to the APIC timer interrupt vector, but adding ad hoc
           debug code does not help with sporadic failures in the field.
      
           Rework the loop so the full IRR/ISR contents are saved and on failure
           dumped.
      
        2) The loop termination logic is interesting at best.
      
           If the machine has no TSC or cpu_khz is not known yet it tries 1
           million times to ack stale IRR/ISR bits. What?
      
           With TSC it uses the TSC to calculate the loop termination. It takes a
           timestamp at entry and terminates the loop when:
      
           	  (rdtsc() - start_timestamp) >= (cpu_khz << 10)
      
           That's roughly one second.
      
           Both methods are problematic. The APIC has 256 vectors, which means
           that in theory max. 256 IRR/ISR bits can be set. In practice this is
           impossible and the chance that more than a few bits are set is close
           to zero.
      
           With the pure loop based approach the 1 million retries are complete
           overkill.
      
           With TSC this can terminate too early in a guest which is running on a
           heavily loaded host even with only a couple of IRR/ISR bits set. The
           reason is that after acknowledging the highest priority ISR bit,
           pending IRRs must get serviced first before the next round of
           acknowledge can take place as the APIC (real and virtualized) does not
           honour EOI without a preceding interrupt on the CPU. And every APIC
           read/write takes a VMEXIT if the APIC is virtualized. While trying to
           reproduce the issue 0-day reported it was observed that the guest was
           scheduled out long enough under heavy load that it terminated after 8
           iterations.
      
           Make the loop terminate after 512 iterations. That's plenty enough
           in any case and does not take endless time to complete.

      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20190722105219.158847694@linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d29c7b8b
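The "roughly one second" figure for the old TSC-based termination in the apic_pending_intr_clear() commit above can be checked with a short calculation. The function names are hypothetical, and the sketch assumes the TSC ticks at cpu_khz thousand cycles per second.

```c
#include <stdint.h>

/* TSC cycles the old loop waited for: cpu_khz << 10 = cpu_khz * 1024. */
static uint64_t old_timeout_cycles(uint64_t cpu_khz)
{
	return cpu_khz << 10;
}

/* Convert TSC cycles to microseconds; cpu_khz cycles elapse per millisecond. */
static uint64_t cycles_to_us(uint64_t cycles, uint64_t cpu_khz)
{
	return cycles * 1000 / cpu_khz;
}
```

The result is 1,024,000 microseconds regardless of cpu_khz, i.e. about 1.024 seconds of TSC time. In a guest that is scheduled out, most of that budget can pass without the loop making progress, which is why the commit switches to a fixed iteration count.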
  3. 01 Oct, 2019 1 commit
  4. 21 Sep, 2019 2 commits