1. 21 Nov, 2018 40 commits
    • Ard Biesheuvel's avatar
      efi/arm/libstub: Pack FDT after populating it · 011d0d8b
      Ard Biesheuvel authored
      commit 72a58a63 upstream.
      
      Commit:
      
        24d7c494
      
       ("efi/arm-stub: Round up FDT allocation to mapping size")
      
      increased the allocation size for the FDT image created by the stub to a
      fixed value of 2 MB, to simplify the former code that made several
      attempts with increasing values for the size. This is reasonable
      given that the allocation is of type EFI_LOADER_DATA, which is released
      to the kernel unless it is explicitly memblock_reserve()d by the early
      boot code.
      
      However, this allocation size leaked into the 'size' field of the FDT
      header metadata, and so the entire allocation remains occupied by the
      device tree binary, even if most of it is not used to store device tree
      information.
      
      So call fdt_pack() to shrink the FDT data structure to its minimum size
      after populating all the fields, so that the remaining memory is no
      longer wasted.
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org> # v4.12+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 24d7c494 ("efi/arm-stub: Round up FDT allocation to mapping size")
      Link: http://lkml.kernel.org/r/20181114175544.12860-4-ard.biesheuvel@linaro.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      011d0d8b
    • Vasily Averin's avatar
      mm/swapfile.c: use kvzalloc for swap_info_struct allocation · d8569925
      Vasily Averin authored
      commit 873d7bcf upstream.
      
      Commit a2468cc9 ("swap: choose swap device according to numa node")
      changed 'avail_lists' field of 'struct swap_info_struct' to an array.
      In popular linux distros it increased size of swap_info_struct up to 40
      Kbytes and now swap_info_struct allocation requires order-4 page.
      Switch to kvzmalloc allows to avoid unexpected allocation failures.
      
      Link: http://lkml.kernel.org/r/fc23172d-3c75-21e2-d551-8b1808cbe593@virtuozzo.com
      Fixes: a2468cc9
      
       ("swap: choose swap device according to numa node")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Acked-by: default avatarAaron Lu <aaron.lu@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8569925
    • Mike Kravetz's avatar
      hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444! · 52fcb8dd
      Mike Kravetz authored
      commit 5e41540c upstream.
      
      This bug has been experienced several times by the Oracle DB team.  The
      BUG is in remove_inode_hugepages() as follows:
      
      	/*
      	 * If page is mapped, it was faulted in after being
      	 * unmapped in caller.  Unmap (again) now after taking
      	 * the fault mutex.  The mutex will prevent faults
      	 * until we finish removing the page.
      	 *
      	 * This race can only happen in the hole punch case.
      	 * Getting here in a truncate operation is a bug.
      	 */
      	if (unlikely(page_mapped(page))) {
      		BUG_ON(truncate_op);
      
      In this case, the elevated map count is not the result of a race.
      Rather it was incorrectly incremented as the result of a bug in the huge
      pmd sharing code.  Consider the following:
      
       - Process A maps a hugetlbfs file of sufficient size and alignment
         (PUD_SIZE) that a pmd page could be shared.
      
       - Process B maps the same hugetlbfs file with the same size and
         alignment such that a pmd page is shared.
      
       - Process B then calls mprotect() to change protections for the mapping
         with the shared pmd. As a result, the pmd is 'unshared'.
      
       - Process B then calls mprotect() again to chage protections for the
         mapping back to their original value. pmd remains unshared.
      
       - Process B then forks and process C is created. During the fork
         process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
         tables. Copying page tables for hugetlb mappings is done in the
         routine copy_hugetlb_page_range.
      
      In copy_hugetlb_page_range(), the destination pte is obtained by:
      
      	dst_pte = huge_pte_alloc(dst, addr, sz);
      
      If pmd sharing is possible, the returned pointer will be to a pte in an
      existing page table.  In the situation above, process C could share with
      either process A or process B.  Since process A is first in the list,
      the returned pte is a pointer to a pte in process A's page table.
      
      However, the check for pmd sharing in copy_hugetlb_page_range is:
      
      	/* If the pagetables are shared don't copy or take references */
      	if (dst_pte == src_pte)
      		continue;
      
      Since process C is sharing with process A instead of process B, the
      above test fails.  The code in copy_hugetlb_page_range which follows
      assumes dst_pte points to a huge_pte_none pte.  It copies the pte entry
      from src_pte to dst_pte and increments this map count of the associated
      page.  This is how we end up with an elevated map count.
      
      To solve, check the dst_pte entry for huge_pte_none.  If !none, this
      implies PMD sharing so do not copy.
      
      Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
      Fixes: c5c99429
      
       ("fix hugepages leak due to pagetable page sharing")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52fcb8dd
    • Arnd Bergmann's avatar
      lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn · efcbe502
      Arnd Bergmann authored
      commit 1c23b410 upstream.
      
      gcc-8 complains about the prototype for this function:
      
        lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]
      
      This is actually a GCC's bug. In GCC internals
      __ubsan_handle_builtin_unreachable() declared with both 'noreturn' and
      'const' attributes instead of only 'noreturn':
      
         https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210
      
      Workaround this by removing the noreturn attribute.
      
      [aryabinin: add information about GCC bug in changelog]
      Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efcbe502
    • Eric Biggers's avatar
      crypto: user - fix leaking uninitialized memory to userspace · fdc42744
      Eric Biggers authored
      commit f43f3995 upstream.
      
      All bytes of the NETLINK_CRYPTO report structures must be initialized,
      since they are copied to userspace.  The change from strncpy() to
      strlcpy() broke this.  As a minimal fix, change it back.
      
      Fixes: 4473710d
      
       ("crypto: user - Prepare for CRYPTO_MAX_ALG_NAME expansion")
      Cc: <stable@vger.kernel.org> # v4.12+
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdc42744
    • Andreas Gruenbacher's avatar
      gfs2: Put bitmap buffers in put_super · 6c75d139
      Andreas Gruenbacher authored
      commit 10283ea5 upstream.
      
      gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects
      attached to the resource group glocks.  That function should release the
      buffers attached to the gfs2_bitmap objects (bi_bh), but the call to
      gfs2_rgrp_brelse for doing that is missing.
      
      When gfs2_releasepage later runs across these buffers which are still
      referenced, it refuses to free them.  This causes the pages the buffers
      are attached to to remain referenced as well.  With enough mount/unmount
      cycles, the system will eventually run out of memory.
      
      Fix this by adding the missing call to gfs2_rgrp_brelse in
      gfs2_clear_rgrpd.
      
      (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.)
      
      Fixes: 39b0f1e9
      
       ("GFS2: Don't brelse rgrp buffer_heads every allocation")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c75d139
    • Guenter Roeck's avatar
      configfs: replace strncpy with memcpy · 56d10194
      Guenter Roeck authored
      commit 1823342a
      
       upstream.
      
      gcc 8.1.0 complains:
      
      fs/configfs/symlink.c:67:3: warning:
      	'strncpy' output truncated before terminating nul copying as many
      	bytes from a string as its length
      fs/configfs/symlink.c: In function 'configfs_get_link':
      fs/configfs/symlink.c:63:13: note: length computed here
      
      Using strncpy() is indeed less than perfect since the length of data to
      be copied has already been determined with strlen(). Replace strncpy()
      with memcpy() to address the warning and optimize the code a little.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNobuhiro Iwamatsu <nobuhiro.iwamatsu@cybertrust.co.jp>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56d10194
    • Miklos Szeredi's avatar
      fuse: fix leaked notify reply · 6ceec07c
      Miklos Szeredi authored
      commit 7fabaf30 upstream.
      
      fuse_request_send_notify_reply() may fail if the connection was reset for
      some reason (e.g. fs was unmounted).  Don't leak request reference in this
      case.  Besides leaking memory, this resulted in fc->num_waiting not being
      decremented and hence fuse_wait_aborted() left in a hanging and unkillable
      state.
      
      Fixes: 2d45ba38 ("fuse: add retrieve request")
      Fixes: b8f95e5d
      
       ("fuse: umount should wait for all requests")
      Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org> #v2.6.36
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ceec07c
    • Lukas Czerner's avatar
      fuse: fix use-after-free in fuse_direct_IO() · a42d933d
      Lukas Czerner authored
      commit ebacb812
      
       upstream.
      
      In async IO blocking case the additional reference to the io is taken for
      it to survive fuse_aio_complete(). In non blocking case this additional
      reference is not needed, however we still reference io to figure out
      whether to wait for completion or not. This is wrong and will lead to
      use-after-free. Fix it by storing blocking information in separate
      variable.
      
      This was spotted by KASAN when running generic/208 fstest.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Reported-by: default avatarZorro Lang <zlang@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 744742d6
      
       ("fuse: Add reference counting for fuse_io_priv")
      Cc: <stable@vger.kernel.org> # v4.6
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a42d933d
    • Maciej W. Rozycki's avatar
      rtc: hctosys: Add missing range error reporting · 677f423c
      Maciej W. Rozycki authored
      commit 7ce9a992
      
       upstream.
      
      Fix an issue with the 32-bit range error path in `rtc_hctosys' where no
      error code is set and consequently the successful preceding call result
      from `rtc_read_time' is propagated to `rtc_hctosys_ret'.  This in turn
      makes any subsequent call to `hctosys_show' incorrectly report in sysfs
      that the system time has been set from this RTC while it has not.
      
      Set the error to ERANGE then if we can't express the result due to an
      overflow.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Fixes: b3a5ac42
      
       ("rtc: hctosys: Ensure system time doesn't overflow time_t")
      Cc: stable@vger.kernel.org # 4.17+
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      677f423c
    • Scott Mayhew's avatar
      nfsd: COPY and CLONE operations require the saved filehandle to be set · 6d1c38aa
      Scott Mayhew authored
      commit 01310bb7
      
       upstream.
      
      Make sure we have a saved filehandle, otherwise we'll oops with a null
      pointer dereference in nfs4_preprocess_stateid_op().
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d1c38aa
    • Frank Sorenson's avatar
      sunrpc: correct the computation for page_ptr when truncating · a56609f7
      Frank Sorenson authored
      commit 5d7a5bcb
      
       upstream.
      
      When truncating the encode buffer, the page_ptr is getting
      advanced, causing the next page to be skipped while encoding.
      The page is still included in the response, so the response
      contains a page of bogus data.
      
      We need to adjust the page_ptr backwards to ensure we encode
      the next page into the correct place.
      
      We saw this triggered when concurrent directory modifications caused
      nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
      call to xdr_truncate_encode() corrupted the READDIR reply.
      Signed-off-by: default avatarFrank Sorenson <sorenson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a56609f7
    • Christophe Leroy's avatar
      kdb: print real address of pointers instead of hashed addresses · dedde93b
      Christophe Leroy authored
      commit 568fb6f4 upstream.
      
      Since commit ad67b74d ("printk: hash addresses printed with %p"),
      all pointers printed with %p are printed with hashed addresses
      instead of real addresses in order to avoid leaking addresses in
      dmesg and syslog. But this applies to kdb too, with is unfortunate:
      
          Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
          kdb> ps
          15 sleeping system daemon (state M) processes suppressed,
          use 'ps A' to see all.
          Task Addr       Pid   Parent [*] cpu State Thread     Command
          0x(ptrval)      329      328  1    0   R  0x(ptrval) *sh
      
          0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
          0x(ptrval)        3        2  0    0   D  0x(ptrval)  rcu_gp
          0x(ptrval)        4        2  0    0   D  0x(ptrval)  rcu_par_gp
          0x(ptrval)        5        2  0    0   D  0x(ptrval)  kworker/0:0
          0x(ptrval)        6        2  0    0   D  0x(ptrval)  kworker/0:0H
          0x(ptrval)        7        2  0    0   D  0x(ptrval)  kworker/u2:0
          0x(ptrval)        8        2  0    0   D  0x(ptrval)  mm_percpu_wq
          0x(ptrval)       10        2  0    0   D  0x(ptrval)  rcu_preempt
      
      The whole purpose of kdb is to debug, and for debugging real addresses
      need to be known. In addition, data displayed by kdb doesn't go into
      dmesg.
      
      This patch replaces all %p by %px in kdb in order to display real
      addresses.
      
      Fixes: ad67b74d
      
       ("printk: hash addresses printed with %p")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dedde93b
    • Christophe Leroy's avatar
      kdb: use correct pointer when 'btc' calls 'btt' · ce583650
      Christophe Leroy authored
      commit dded2e15 upstream.
      
      On a powerpc 8xx, 'btc' fails as follows:
      
      Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
      kdb> btc
      btc: cpu status: Currently on cpu 0
      Available cpus: 0
      kdb_getarea: Bad address 0x0
      
      when booting the kernel with 'debug_boot_weak_hash', it fails as well
      
      Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
      kdb> btc
      btc: cpu status: Currently on cpu 0
      Available cpus: 0
      kdb_getarea: Bad address 0xba99ad80
      
      On other platforms, Oopses have been observed too, see
      https://github.com/linuxppc/linux/issues/139
      
      This is due to btc calling 'btt' with %p pointer as an argument.
      
      This patch replaces %p by %px to get the real pointer value as
      expected by 'btt'
      
      Fixes: ad67b74d
      
       ("printk: hash addresses printed with %p")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce583650
    • Eric W. Biederman's avatar
      mount: Prevent MNT_DETACH from disconnecting locked mounts · f3f52974
      Eric W. Biederman authored
      commit 9c8e0a1b upstream.
      
      Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote:
      > As per mount_namespaces(7) unprivileged users should not be able to look under mount points:
      >
      >   Mounts that come as a single unit from more privileged mount are locked
      >   together and may not be separated in a less privileged mount namespace.
      >
      > However they can:
      >
      > 1. Create a mount namespace.
      > 2. In the mount namespace open a file descriptor to the parent of a mount point.
      > 3. Destroy the mount namespace.
      > 4. Use the file descriptor to look under the mount point.
      >
      > I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8.
      >
      > The setup:
      >
      > $ sudo sysctl kernel.unprivileged_userns_clone=1
      > kernel.unprivileged_userns_clone = 1
      > $ mkdir -p A/B/Secret
      > $ sudo mount -t tmpfs hide A/B
      >
      >
      > "Secret" is indeed hidden as expected:
      >
      > $ ls -lR A
      > A:
      > total 0
      > drwxrwxrwt 2 root root 40 Feb 12 21:08 B
      >
      > A/B:
      > total 0
      >
      >
      > The attack revealing "Secret":
      >
      > $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A"
      > /proc/self/fd/4/:
      > total 0
      > drwxr-xr-x 3 root root 60 Feb 12 21:08 B
      >
      > /proc/self/fd/4/B:
      > total 0
      > drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret
      >
      > /proc/self/fd/4/B/Secret:
      > total 0
      
      I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and
      disconnecting all of the mounts in a mount namespace.  Fix this by
      factoring drop_mounts out of drop_collected_mounts and passing
      0 instead of UMOUNT_SYNC.
      
      There are two possible behavior differences that result from this.
      - No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on
        the vfsmounts being unmounted.  This effects the lazy rcu walk by
        kicking the walk out of rcu mode and forcing it to be a non-lazy
        walk.
      - No longer disconnecting locked mounts will keep some mounts around
        longer as they stay because the are locked to other mounts.
      
      There are only two users of drop_collected mounts: audit_tree.c and
      put_mnt_ns.
      
      In audit_tree.c the mounts are private and there are no rcu lazy walks
      only calls to iterate_mounts. So the changes should have no effect
      except for a small timing effect as the connected mounts are disconnected.
      
      In put_mnt_ns there may be references from process outside the mount
      namespace to the mounts.  So the mounts remaining connected will
      be the bug fix that is needed.  That rcu walks are allowed to continue
      appears not to be a problem especially as the rcu walk change was about
      an implementation detail not about semantics.
      
      Cc: stable@vger.kernel.org
      Fixes: 5ff9d8a6
      
       ("vfs: Lock in place mounts from more privileged users")
      Reported-by: default avatarTimothy Baldwin <timbaldwin@fastmail.co.uk>
      Tested-by: default avatarTimothy Baldwin <timbaldwin@fastmail.co.uk>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3f52974
    • Eric W. Biederman's avatar
      mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts · c4585861
      Eric W. Biederman authored
      commit df7342b2 upstream.
      
      Jonathan Calmels from NVIDIA reported that he's able to bypass the
      mount visibility security check in place in the Linux kernel by using
      a combination of the unbindable property along with the private mount
      propagation option to allow a unprivileged user to see a path which
      was purposefully hidden by the root user.
      
      Reproducer:
        # Hide a path to all users using a tmpfs
        root@castiana:~# mount -t tmpfs tmpfs /sys/devices/
        root@castiana:~#
      
        # As an unprivileged user, unshare user namespace and mount namespace
        stgraber@castiana:~$ unshare -U -m -r
      
        # Confirm the path is still not accessible
        root@castiana:~# ls /sys/devices/
      
        # Make /sys recursively unbindable and private
        root@castiana:~# mount --make-runbindable /sys
        root@castiana:~# mount --make-private /sys
      
        # Recursively bind-mount the rest of /sys over to /mnnt
        root@castiana:~# mount --rbind /sys/ /mnt
      
        # Access our hidden /sys/device as an unprivileged user
        root@castiana:~# ls /mnt/devices/
        breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe
        LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system
        tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual
      
      Solve this by teaching copy_tree to fail if a mount turns out to be
      both unbindable and locked.
      
      Cc: stable@vger.kernel.org
      Fixes: 5ff9d8a6
      
       ("vfs: Lock in place mounts from more privileged users")
      Reported-by: default avatarJonathan Calmels <jcalmels@nvidia.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4585861
    • Eric W. Biederman's avatar
      mount: Retest MNT_LOCKED in do_umount · 5e64ee87
      Eric W. Biederman authored
      commit 25d202ed
      
       upstream.
      
      It was recently pointed out that the one instance of testing MNT_LOCKED
      outside of the namespace_sem is in ksys_umount.
      
      Fix that by adding a test inside of do_umount with namespace_sem and
      the mount_lock held.  As it helps to fail fails the existing test is
      maintained with an additional comment pointing out that it may be racy
      because the locks are not held.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 5ff9d8a6
      
       ("vfs: Lock in place mounts from more privileged users")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e64ee87
    • Vasily Averin's avatar
      ext4: fix buffer leak in __ext4_read_dirblock() on error path · 68b90875
      Vasily Averin authored
      commit de59fae0 upstream.
      
      Fixes: dc6982ff
      
       ("ext4: refactor code to read directory blocks ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.9
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68b90875
    • Vasily Averin's avatar
      ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path · be7e29ce
      Vasily Averin authored
      commit 53692ec0 upstream.
      
      Fixes: de05ca85
      
       ("ext4: move call to ext4_error() into ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.17
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be7e29ce
    • Vasily Averin's avatar
      ext4: fix buffer leak in ext4_xattr_move_to_block() on error path · 4096fc09
      Vasily Averin authored
      commit 6bdc9977 upstream.
      
      Fixes: 3f2571c1 ("ext4: factor out xattr moving")
      Fixes: 6dd4ee7c
      
       ("ext4: Expand extra_inodes space per ...")
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.23
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4096fc09
    • Vasily Averin's avatar
      ext4: release bs.bh before re-using in ext4_xattr_block_find() · 8113972d
      Vasily Averin authored
      commit 45ae932d upstream.
      
      bs.bh was taken in previous ext4_xattr_block_find() call,
      it should be released before re-using
      
      Fixes: 7e01c8e5
      
       ("ext3/4: fix uninitialized bs in ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.26
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8113972d
    • Vasily Averin's avatar
      ext4: fix buffer leak in ext4_xattr_get_block() on error path · 70ca35b4
      Vasily Averin authored
      commit ecaaf408 upstream.
      
      Fixes: dec214d0
      
       ("ext4: xattr inode deduplication")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.13
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70ca35b4
    • Vasily Averin's avatar
      ext4: fix possible leak of s_journal_flag_rwsem in error path · 181224f9
      Vasily Averin authored
      commit af18e35b upstream.
      
      Fixes: c8585c6f
      
       ("ext4: fix races between changing inode journal ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      181224f9
    • Theodore Ts'o's avatar
      ext4: fix possible leak of sbi->s_group_desc_leak in error path · 1124e5a8
      Theodore Ts'o authored
      commit 9e463084 upstream.
      
      Fixes: bfe0a5f4
      
       ("ext4: add more mount time checks of the superblock")
      Reported-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.18
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1124e5a8
    • Theodore Ts'o's avatar
      ext4: avoid possible double brelse() in add_new_gdb() on error path · bf04dace
      Theodore Ts'o authored
      commit 4f32c38b upstream.
      
      Fixes: b4097142
      
       ("ext4: add error checking to calls to ...")
      Reported-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.38
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf04dace
    • Vasily Averin's avatar
      ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing · 65b1ce49
      Vasily Averin authored
      commit f348e224 upstream.
      
      Fixes: 117fff10
      
       ("ext4: grow the s_flex_groups array as needed ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65b1ce49
    • Vasily Averin's avatar
      ext4: avoid buffer leak in ext4_orphan_add() after prior errors · cffd3297
      Vasily Averin authored
      commit feaf264c upstream.
      
      Fixes: d745a8c2 ("ext4: reduce contention on s_orphan_lock")
      Fixes: 6e3617e5
      
       ("ext4: Handle non empty on-disk orphan link")
      Cc: Dmitry Monakhov <dmonakhov@gmail.com>
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.34
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cffd3297
    • Vasily Averin's avatar
      ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty() · da497e7a
      Vasily Averin authored
      commit a6758309 upstream.
      
      ext4_mark_iloc_dirty() callers expect that it releases iloc->bh
      even if it returns an error.
      
      Fixes: 0db1ff22
      
       ("ext4: add shutdown bit and check for it")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.11
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da497e7a
    • Vasily Averin's avatar
      ext4: fix possible inode leak in the retry loop of ext4_resize_fs() · 3caa7b62
      Vasily Averin authored
      commit db6aee62 upstream.
      
      Fixes: 1c6bd717
      
       ("ext4: convert file system to meta_bg if needed ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3caa7b62
    • Vasily Averin's avatar
      ext4: missing !bh check in ext4_xattr_inode_write() · dd277533
      Vasily Averin authored
      commit eb6984fa upstream.
      
      According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write()
      should not return bh = NULL
      
      The only time that bh could be NULL, then, would be in the case of
      something really going wrong; a programming error elsewhere (perhaps a
      wild pointer dereference) or I/O error causing on-disk file system
      corruption (although that would be highly unlikely given that we had
      *just* allocated the blocks and so the metadata blocks in question
      probably would still be in the cache).
      
      Fixes: e50e5129
      
       ("ext4: xattr-in-inode support")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.13
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd277533
    • Vasily Averin's avatar
      ext4: avoid potential extra brelse in setup_new_flex_group_blocks() · 4e7e558e
      Vasily Averin authored
      commit 9e402893 upstream.
      
      Currently bh is set to NULL only during first iteration of for cycle,
      then this pointer is not cleared after end of using.
      Therefore rollback after errors can lead to extra brelse(bh) call,
      decrements bh counter and later trigger an unexpected warning in __brelse()
      
      Patch moves brelse() calls in body of cycle to exclude requirement of
      brelse() call in rollback.
      
      Fixes: 33afdcc5
      
       ("ext4: add a function which sets up group blocks ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.3+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e7e558e
    • Vasily Averin's avatar
      ext4: add missing brelse() add_new_gdb_meta_bg()'s error path · 64176ffd
      Vasily Averin authored
      commit 61a9c11e upstream.
      
      Fixes: 01f795f9
      
       ("ext4: add online resizing support for meta_bg ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64176ffd
    • Vasily Averin's avatar
      ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path · feddbc01
      Vasily Averin authored
      commit cea57941 upstream.
      
      Fixes: 33afdcc5
      
       ("ext4: add a function which sets up group blocks ...")
      Cc: stable@kernel.org # 3.3
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      feddbc01
    • Vasily Averin's avatar
      ext4: add missing brelse() update_backups()'s error path · 9910b257
      Vasily Averin authored
      commit ea0abbb6 upstream.
      
      Fixes: ac27a0ec
      
       ("ext4: initial copy of files from ext3")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9910b257
    • Michael Kelley's avatar
      clockevents/drivers/i8253: Add support for PIT shutdown quirk · 79d80b87
      Michael Kelley authored
      commit 35b69a42
      
       upstream.
      
      Add support for platforms where pit_shutdown() doesn't work because of a
      quirk in the PIT emulation. On these platforms setting the counter register
      to zero causes the PIT to start running again, negating the shutdown.
      
      Provide a global variable that controls whether the counter register is
      zero'ed, which platform specific code can override.
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org>
      Cc: "jgross@suse.com" <jgross@suse.com>
      Cc: "akataria@vmware.com" <akataria@vmware.com>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: vkuznets <vkuznets@redhat.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.com
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79d80b87
    • Filipe Manana's avatar
      Btrfs: fix data corruption due to cloning of eof block · d62eead0
      Filipe Manana authored
      commit ac765f83 upstream.
      
      We currently allow cloning a range from a file which includes the last
      block of the file even if the file's size is not aligned to the block
      size. This is fine and useful when the destination file has the same size,
      but when it does not and the range ends somewhere in the middle of the
      destination file, it leads to corruption because the bytes between the EOF
      and the end of the block have undefined data (when there is support for
      discard/trimming they have a value of 0x00).
      
      Example:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
      
       $ export foo_size=$((256 * 1024 + 100))
       $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo
       $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar
      
       $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar
      
       $ od -A d -t x1 /mnt/bar
       0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c
       *
       0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00
       0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       *
       0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       1048576
      
      The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527
      (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead
      of 0xb5.
      
      This is similar to the problem we had for deduplication that got recently
      fixed by commit de02b9f6 ("Btrfs: fix data corruption when
      deduplicating between different files").
      
      Fix this by not allowing such operations to be performed and return the
      errno -EINVAL to user space. This is what XFS is doing as well at the VFS
      level. This change however now makes us return -EINVAL instead of
      -EOPNOTSUPP for cases where the source range maps to an inline extent and
      the destination range's end is smaller then the destination file's size,
      since the detection of inline extents is done during the actual process of
      dropping file extent items (at __btrfs_drop_extents()). Returning the
      -EINVAL error is done early on and solely based on the input parameters
      (offsets and length) and destination file's size. This makes us consistent
      with XFS and anyone else supporting cloning since this case is now checked
      at a higher level in the VFS and is where the -EINVAL will be returned
      from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1
      by commit 07d19dc9
      
       ("vfs: avoid problematic remapping requests into
      partial EOF block"). So this change is more geared towards stable kernels,
      as it's unlikely the new VFS checks get removed intentionally.
      
      A test case for fstests follows soon, as well as an update to filter
      existing tests that expect -EOPNOTSUPP to accept -EINVAL as well.
      
      CC: <stable@vger.kernel.org> # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d62eead0
    • Filipe Manana's avatar
      Btrfs: fix infinite loop on inode eviction after deduplication of eof block · c7a082fb
      Filipe Manana authored
      commit 11023d3f upstream.
      
      If we attempt to deduplicate the last block of a file A into the middle of
      a file B, and file A's size is not a multiple of the block size, we end
      rounding the deduplication length to 0 bytes, to avoid the data corruption
      issue fixed by commit de02b9f6 ("Btrfs: fix data corruption when
      deduplicating between different files"). However a length of zero will
      cause the insertion of an extent state with a start value greater (by 1)
      then the end value, leading to a corrupt extent state that will trigger a
      warning and cause chaos such as an infinite loop during inode eviction.
      Example trace:
      
       [96049.833585] ------------[ cut here ]------------
       [96049.833714] WARNING: CPU: 0 PID: 24448 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs]
       [96049.833767] CPU: 0 PID: 24448 Comm: xfs_io Not tainted 4.19.0-rc7-btrfs-next-39 #1
       [96049.833768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [96049.833780] RIP: 0010:insert_state+0x101/0x120 [btrfs]
       [96049.833783] RSP: 0018:ffffafd2c3707af0 EFLAGS: 00010282
       [96049.833785] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006
       [96049.833786] RDX: 0000000000000007 RSI: ffff99045c143230 RDI: ffff99047b2168a0
       [96049.833787] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000
       [96049.833787] R10: ffffafd2c3707ab8 R11: 0000000000000000 R12: ffff9903b93b12c8
       [96049.833788] R13: 000000000004e000 R14: ffffafd2c3707b80 R15: ffffafd2c3707b78
       [96049.833790] FS:  00007f5c14e7d700(0000) GS:ffff99047b200000(0000) knlGS:0000000000000000
       [96049.833791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [96049.833792] CR2: 00007f5c146abff8 CR3: 0000000115f4c004 CR4: 00000000003606f0
       [96049.833795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [96049.833796] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [96049.833796] Call Trace:
       [96049.833809]  __set_extent_bit+0x46c/0x6a0 [btrfs]
       [96049.833823]  lock_extent_bits+0x6b/0x210 [btrfs]
       [96049.833831]  ? _raw_spin_unlock+0x24/0x30
       [96049.833841]  ? test_range_bit+0xdf/0x130 [btrfs]
       [96049.833853]  lock_extent_range+0x8e/0x150 [btrfs]
       [96049.833864]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
       [96049.833875]  btrfs_extent_same_range+0x14e/0x550 [btrfs]
       [96049.833885]  ? rcu_read_lock_sched_held+0x3f/0x70
       [96049.833890]  ? __kmalloc_node+0x2b0/0x2f0
       [96049.833899]  ? btrfs_dedupe_file_range+0x19a/0x280 [btrfs]
       [96049.833909]  btrfs_dedupe_file_range+0x270/0x280 [btrfs]
       [96049.833916]  vfs_dedupe_file_range_one+0xd9/0xe0
       [96049.833919]  vfs_dedupe_file_range+0x131/0x1b0
       [96049.833924]  do_vfs_ioctl+0x272/0x6e0
       [96049.833927]  ? __fget+0x113/0x200
       [96049.833931]  ksys_ioctl+0x70/0x80
       [96049.833933]  __x64_sys_ioctl+0x16/0x20
       [96049.833937]  do_syscall_64+0x60/0x1b0
       [96049.833939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [96049.833941] RIP: 0033:0x7f5c1478ddd7
       [96049.833943] RSP: 002b:00007ffe15b196a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
       [96049.833945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5c1478ddd7
       [96049.833946] RDX: 00005625ece322d0 RSI: 00000000c0189436 RDI: 0000000000000004
       [96049.833947] RBP: 0000000000000000 R08: 00007f5c14a46f48 R09: 0000000000000040
       [96049.833948] R10: 0000000000000541 R11: 0000000000000202 R12: 0000000000000000
       [96049.833949] R13: 0000000000000000 R14: 0000000000000004 R15: 00005625ece322d0
       [96049.833954] irq event stamp: 6196
       [96049.833956] hardirqs last  enabled at (6195): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.833958] hardirqs last disabled at (6196): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.833959] softirqs last  enabled at (6114): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.833964] softirqs last disabled at (6095): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.833965] ---[ end trace db7b05f01b7fa10c ]---
       [96049.935816] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.935822] irq event stamp: 6584
       [96049.935823] hardirqs last  enabled at (6583): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.935825] hardirqs last disabled at (6584): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.935827] softirqs last  enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.935828] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.935829] ---[ end trace db7b05f01b7fa123 ]---
       [96049.935840] ------------[ cut here ]------------
       [96049.936065] WARNING: CPU: 1 PID: 24463 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs]
       [96049.936107] CPU: 1 PID: 24463 Comm: umount Tainted: G        W         4.19.0-rc7-btrfs-next-39 #1
       [96049.936108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [96049.936117] RIP: 0010:insert_state+0x101/0x120 [btrfs]
       [96049.936119] RSP: 0018:ffffafd2c3637bc0 EFLAGS: 00010282
       [96049.936120] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006
       [96049.936121] RDX: 0000000000000007 RSI: ffff990445cf88e0 RDI: ffff99047b2968a0
       [96049.936122] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000
       [96049.936123] R10: ffffafd2c3637b88 R11: 0000000000000000 R12: ffff9904574301e8
       [96049.936124] R13: 000000000004e000 R14: ffffafd2c3637c50 R15: ffffafd2c3637c48
       [96049.936125] FS:  00007fe4b87e72c0(0000) GS:ffff99047b280000(0000) knlGS:0000000000000000
       [96049.936126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [96049.936128] CR2: 00005562e52618d8 CR3: 00000001151c8005 CR4: 00000000003606e0
       [96049.936129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [96049.936131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [96049.936131] Call Trace:
       [96049.936141]  __set_extent_bit+0x46c/0x6a0 [btrfs]
       [96049.936154]  lock_extent_bits+0x6b/0x210 [btrfs]
       [96049.936167]  btrfs_evict_inode+0x1e1/0x5a0 [btrfs]
       [96049.936172]  evict+0xbf/0x1c0
       [96049.936174]  dispose_list+0x51/0x80
       [96049.936176]  evict_inodes+0x193/0x1c0
       [96049.936180]  generic_shutdown_super+0x3f/0x110
       [96049.936182]  kill_anon_super+0xe/0x30
       [96049.936189]  btrfs_kill_super+0x13/0x100 [btrfs]
       [96049.936191]  deactivate_locked_super+0x3a/0x70
       [96049.936193]  cleanup_mnt+0x3b/0x80
       [96049.936195]  task_work_run+0x93/0xc0
       [96049.936198]  exit_to_usermode_loop+0xfa/0x100
       [96049.936201]  do_syscall_64+0x17f/0x1b0
       [96049.936202]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [96049.936204] RIP: 0033:0x7fe4b80cfb37
       [96049.936206] RSP: 002b:00007ffff092b688 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [96049.936207] RAX: 0000000000000000 RBX: 00005562e5259060 RCX: 00007fe4b80cfb37
       [96049.936208] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00005562e525faa0
       [96049.936209] RBP: 00005562e525faa0 R08: 00005562e525f770 R09: 0000000000000015
       [96049.936210] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fe4b85d1e64
       [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910
       [96049.936216] irq event stamp: 6616
       [96049.936219] hardirqs last  enabled at (6615): [<ffffffff91b00663>] console_unlock+0x503/0x640
       [96049.936219] hardirqs last disabled at (6616): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c
       [96049.936222] softirqs last  enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421
       [96049.936222] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0
       [96049.936223] ---[ end trace db7b05f01b7fa124 ]---
      
      The second stack trace, from inode eviction, is repeated forever due to
      the infinite loop during eviction.
      
      This is the same type of problem fixed way back in 2015 by commit
      113e8283 ("Btrfs: fix inode eviction infinite loop after extent_same
      ioctl") and commit ccccf3d6 ("Btrfs: fix inode eviction infinite loop
      after cloning into it").
      
      So fix this by returning immediately if the deduplication range length
      gets rounded down to 0 bytes, as there is nothing that needs to be done in
      such case.
      
      Example reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
      
       $ xfs_io -f -c "pwrite -S 0xe6 0 100" /mnt/foo
       $ xfs_io -f -c "pwrite -S 0xe6 0 1M" /mnt/bar
      
       # Unmount the filesystem and mount it again so that we start without any
       # extent state records when we ask for the deduplication.
       $ umount /mnt
       $ mount /dev/sdb /mnt
      
       $ xfs_io -c "dedupe /mnt/foo 0 500K 100" /mnt/bar
      
       # This unmount triggers the infinite loop.
       $ umount /mnt
      
      A test case for fstests will follow soon.
      
      Fixes: de02b9f6
      
       ("Btrfs: fix data corruption when deduplicating between different files")
      CC: <stable@vger.kernel.org> # 4.19+
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7a082fb
    • Robbie Ko's avatar
      Btrfs: fix cur_offset in the error case for nocow · 6dcd34f1
      Robbie Ko authored
      commit 506481b2
      
       upstream.
      
      When the cow_file_range fails, the related resources are unlocked
      according to the range [start..end), so the unlock cannot be repeated in
      run_delalloc_nocow.
      
      In some cases (e.g. cur_offset <= end && cow_start != -1), cur_offset is
      not updated correctly, so move the cur_offset update before
      cow_file_range.
      
        kernel BUG at mm/page-writeback.c:2663!
        Internal error: Oops - BUG: 0 [#1] SMP
        CPU: 3 PID: 31525 Comm: kworker/u8:7 Tainted: P O
        Hardware name: Realtek_RTD1296 (DT)
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        task: ffffffc076db3380 ti: ffffffc02e9ac000 task.ti: ffffffc02e9ac000
        PC is at clear_page_dirty_for_io+0x1bc/0x1e8
        LR is at clear_page_dirty_for_io+0x14/0x1e8
        pc : [<ffffffc00033c91c>] lr : [<ffffffc00033c774>] pstate: 40000145
        sp : ffffffc02e9af4f0
        Process kworker/u8:7 (pid: 31525, stack limit = 0xffffffc02e9ac020)
        Call trace:
        [<ffffffc00033c91c>] clear_page_dirty_for_io+0x1bc/0x1e8
        [<ffffffbffc514674>] extent_clear_unlock_delalloc+0x1e4/0x210 [btrfs]
        [<ffffffbffc4fb168>] run_delalloc_nocow+0x3b8/0x948 [btrfs]
        [<ffffffbffc4fb948>] run_delalloc_range+0x250/0x3a8 [btrfs]
        [<ffffffbffc514c0c>] writepage_delalloc.isra.21+0xbc/0x1d8 [btrfs]
        [<ffffffbffc516048>] __extent_writepage+0xe8/0x248 [btrfs]
        [<ffffffbffc51630c>] extent_write_cache_pages.isra.17+0x164/0x378 [btrfs]
        [<ffffffbffc5185a8>] extent_writepages+0x48/0x68 [btrfs]
        [<ffffffbffc4f5828>] btrfs_writepages+0x20/0x30 [btrfs]
        [<ffffffc00033d758>] do_writepages+0x30/0x88
        [<ffffffc0003ba0f4>] __writeback_single_inode+0x34/0x198
        [<ffffffc0003ba6c4>] writeback_sb_inodes+0x184/0x3c0
        [<ffffffc0003ba96c>] __writeback_inodes_wb+0x6c/0xc0
        [<ffffffc0003bac20>] wb_writeback+0x1b8/0x1c0
        [<ffffffc0003bb0f0>] wb_workfn+0x150/0x250
        [<ffffffc0002b0014>] process_one_work+0x1dc/0x388
        [<ffffffc0002b02f0>] worker_thread+0x130/0x500
        [<ffffffc0002b6344>] kthread+0x10c/0x110
        [<ffffffc000284590>] ret_from_fork+0x10/0x40
        Code: d503201f a9025bb5 a90363b7 f90023b9 (d4210000)
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarRobbie Ko <robbieko@synology.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6dcd34f1
    • Lu Fengqi's avatar
      btrfs: fix pinned underflow after transaction aborted · 84e7fe7b
      Lu Fengqi authored
      commit fcd5e742 upstream.
      
      When running generic/475, we may get the following warning in dmesg:
      
      [ 6902.102154] WARNING: CPU: 3 PID: 18013 at fs/btrfs/extent-tree.c:9776 btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.109160] CPU: 3 PID: 18013 Comm: umount Tainted: G        W  O      4.19.0-rc8+ #8
      [ 6902.110971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      [ 6902.112857] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
      [ 6902.118921] RSP: 0018:ffffc9000459bdb0 EFLAGS: 00010286
      [ 6902.120315] RAX: ffff880175050bb0 RBX: ffff8801124a8000 RCX: 0000000000170007
      [ 6902.121969] RDX: 0000000000000002 RSI: 0000000000170007 RDI: ffffffff8125fb74
      [ 6902.123716] RBP: ffff880175055d10 R08: 0000000000000000 R09: 0000000000000000
      [ 6902.125417] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175055d88
      [ 6902.127129] R13: ffff880175050bb0 R14: 0000000000000000 R15: dead000000000100
      [ 6902.129060] FS:  00007f4507223780(0000) GS:ffff88017ba00000(0000) knlGS:0000000000000000
      [ 6902.130996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6902.132558] CR2: 00005623599cac78 CR3: 000000014b700001 CR4: 00000000003606e0
      [ 6902.134270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6902.135981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6902.137836] Call Trace:
      [ 6902.138939]  close_ctree+0x171/0x330 [btrfs]
      [ 6902.140181]  ? kthread_stop+0x146/0x1f0
      [ 6902.141277]  generic_shutdown_super+0x6c/0x100
      [ 6902.142517]  kill_anon_super+0x14/0x30
      [ 6902.143554]  btrfs_kill_super+0x13/0x100 [btrfs]
      [ 6902.144790]  deactivate_locked_super+0x2f/0x70
      [ 6902.146014]  cleanup_mnt+0x3b/0x70
      [ 6902.147020]  task_work_run+0x9e/0xd0
      [ 6902.148036]  do_syscall_64+0x470/0x600
      [ 6902.149142]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [ 6902.150375]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 6902.151640] RIP: 0033:0x7f45077a6a7b
      [ 6902.157324] RSP: 002b:00007ffd589f3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [ 6902.159187] RAX: 0000000000000000 RBX: 000055e8eec732b0 RCX: 00007f45077a6a7b
      [ 6902.160834] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e8eec73490
      [ 6902.162526] RBP: 0000000000000000 R08: 000055e8eec734b0 R09: 00007ffd589f26c0
      [ 6902.164141] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e8eec73490
      [ 6902.165815] R13: 00007f4507ac61a4 R14: 0000000000000000 R15: 00007ffd589f40d8
      [ 6902.167553] irq event stamp: 0
      [ 6902.168998] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
      [ 6902.170731] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.172773] softirqs last  enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00
      [ 6902.174671] softirqs last disabled at (0): [<0000000000000000>]           (null)
      [ 6902.176407] ---[ end trace 463138c2986b275c ]---
      [ 6902.177636] BTRFS info (device dm-3): space_info 4 has 273465344 free, is not full
      [ 6902.179453] BTRFS info (device dm-3): space_info total=276824064, used=4685824, pinned=18446744073708158976, reserved=0, may_use=0, readonly=65536
      
      In the above line there's "pinned=18446744073708158976" which is an
      unsigned u64 value of -1392640, an obvious underflow.
      
      When transaction_kthread is running cleanup_transaction(), another
      fsstress is running btrfs_commit_transaction(). The
      btrfs_finish_extent_commit() may get the same range as
      btrfs_destroy_pinned_extent() got, which causes the pinned underflow.
      
      Fixes: d4b450cd
      
       ("Btrfs: fix race between transaction commit and empty block group removal")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84e7fe7b
    • Mathieu Malaterre's avatar
      watchdog/core: Add missing prototypes for weak functions · ca35750b
      Mathieu Malaterre authored
      commit 81bd415c upstream.
      
      The split out of the hard lockup detector exposed two new weak functions,
      but no prototypes for them, which triggers the build warning:
      
        kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes]
        kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes]
      
      Add the prototypes.
      
      Fixes: 73ce0511
      
       ("kernel/watchdog.c: move hardlockup detector to separate file")
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Babu Moger <babu.moger@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca35750b