1. 20 Sep, 2018 10 commits
    • Roman Gushchin's avatar
      mm: slowly shrink slabs with a relatively small number of objects · 172b06c3
      Roman Gushchin authored
      9092c71b ("mm: use sc->priority for slab shrink targets") changed the
      way that the target slab pressure is calculated and made it
      priority-based:
      
          delta = freeable >> priority;
          delta *= 4;
          do_div(delta, shrinker->seeks);
      
      The problem is that on a default priority (which is 12) no pressure is
      applied at all, if the number of potentially reclaimable objects is less
      than 4096 (1<<12).
      
      This causes the last objects on slab caches of no longer used cgroups to
      (almost) never get reclaimed.  It's obviously a waste of memory.
      
      It can be especially painful, if these stale objects are holding a
      reference to a dying cgroup.  Slab LRU lists are reparented on memcg
      offlining, but corresponding objects are still holding a reference to the
      dying cgroup.  If we don't scan these objects, the dying cgroup can't go
      away.  Most likely, the parent cgroup hasn't any directly charged objects,
      only remaining objects from dying children cgroups.  So it can easily hold
      a reference to hundreds of dying cgroups.
      
      If there are no big spikes in memory pressure, and new memory cgroups are
      created and destroyed periodically, this causes the number of dying
      cgroups grow steadily, causing a slow-ish and hard-to-detect memory
      "leak".  It's not a real leak, as the memory can be eventually reclaimed,
      but it could not happen in a real life at all.  I've seen hosts with a
      steadily climbing number of dying cgroups, which doesn't show any signs of
      a decline in months, despite the host is loaded with a production
      workload.
      
      It is an obvious waste of memory, and to prevent it, let's apply a minimal
      pressure even on small shrinker lists.  E.g.  if there are freeable
      objects, let's scan at least min(freeable, scan_batch) objects.
      
      This fix significantly improves a chance of a dying cgroup to be
      reclaimed, and together with some previous patches stops the steady growth
      of the dying cgroups number on some of our hosts.
      
      Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
      Fixes: 9092c71b
      
       ("mm: use sc->priority for slab shrink targets")
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarRik van Riel <riel@surriel.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      172b06c3
    • YueHaibing's avatar
    • Joel Fernandes (Google)'s avatar
      mm: shmem.c: Correctly annotate new inodes for lockdep · b45d71fb
      Joel Fernandes (Google) authored
      Directories and inodes don't necessarily need to be in the same lockdep
      class.  For ex, hugetlbfs splits them out too to prevent false positives
      in lockdep.  Annotate correctly after new inode creation.  If its a
      directory inode, it will be put into a different class.
      
      This should fix a lockdep splat reported by syzbot:
      
      > ======================================================
      > WARNING: possible circular locking dependency detected
      > 4.18.0-rc8-next-20180810+ #36 Not tainted
      > ------------------------------------------------------
      > syz-executor900/4483 is trying to acquire lock:
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
      > include/linux/fs.h:765 [inline]
      > 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
      > shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >
      > but task is already holding lock:
      > 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
      > drivers/staging/android/ashmem.c:448
      >
      > which lock already depends on the new lock.
      >
      > -> #2 (ashmem_mutex){+.+.}:
      >        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
      >        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
      >        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
      >        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
      >        call_mmap include/linux/fs.h:1844 [inline]
      >        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
      >        do_mmap+0xa10/0x1220 mm/mmap.c:1535
      >        do_mmap_pgoff include/linux/mm.h:2298 [inline]
      >        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
      >        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
      >        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
      >        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
      >        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #1 (&mm->mmap_sem){++++}:
      >        __might_fault+0x155/0x1e0 mm/memory.c:4568
      >        _copy_to_user+0x30/0x110 lib/usercopy.c:25
      >        copy_to_user include/linux/uaccess.h:155 [inline]
      >        filldir+0x1ea/0x3a0 fs/readdir.c:196
      >        dir_emit_dot include/linux/fs.h:3464 [inline]
      >        dir_emit_dots include/linux/fs.h:3475 [inline]
      >        dcache_readdir+0x13a/0x620 fs/libfs.c:193
      >        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
      >        __do_sys_getdents fs/readdir.c:231 [inline]
      >        __se_sys_getdents fs/readdir.c:212 [inline]
      >        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > -> #0 (&sb->s_type->i_mutex_key#9){++++}:
      >        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
      >        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
      >        inode_lock include/linux/fs.h:765 [inline]
      >        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
      >        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
      >        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
      >        vfs_ioctl fs/ioctl.c:46 [inline]
      >        file_ioctl fs/ioctl.c:501 [inline]
      >        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
      >        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
      >        __do_sys_ioctl fs/ioctl.c:709 [inline]
      >        __se_sys_ioctl fs/ioctl.c:707 [inline]
      >        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
      >        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      >
      > other info that might help us debug this:
      >
      > Chain exists of:
      >   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
      >
      >  Possible unsafe locking scenario:
      >
      >        CPU0                    CPU1
      >        ----                    ----
      >   lock(ashmem_mutex);
      >                                lock(&mm->mmap_sem);
      >                                lock(ashmem_mutex);
      >   lock(&sb->s_type->i_mutex_key#9);
      >
      >  *** DEADLOCK ***
      >
      > 1 lock held by syz-executor900/4483:
      >  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
      > ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448
      
      Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
      
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Suggested-by: default avatarNeilBrown <neilb@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b45d71fb
    • Dominique Martinet's avatar
      fs/proc/kcore.c: fix invalid memory access in multi-page read optimization · a1b3d2f2
      Dominique Martinet authored
      The 'm' kcore_list item could point to kclist_head, and it is incorrect to
      look at m->addr / m->size in this case.
      
      There is no choice but to run through the list of entries for every
      address if we did not find any entry in the previous iteration
      
      Reset 'm' to NULL in that case at Omar Sandoval's suggestion.
      
      [akpm@linux-foundation.org: add comment]
      Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
      Fixes: bf991c22
      
       ("proc/kcore: optimize multiple page reads")
      Signed-off-by: default avatarDominique Martinet <asmadeus@codewreck.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@osandov.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1b3d2f2
    • Pasha Tatashin's avatar
      mm: disable deferred struct page for 32-bit arches · 889c695d
      Pasha Tatashin authored
      Deferred struct page init is needed only on systems with large amount of
      physical memory to improve boot performance.  32-bit systems do not
      benefit from this feature.
      
      Jiri reported a problem where deferred struct pages do not work well with
      x86-32:
      
      [    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
      [    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
      [    0.036269] Initializing CPU#0
      [    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
      [    0.038459] page:f6780000 is uninitialized and poisoned
      [    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
      [    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
      [    0.040038] ------------[ cut here ]------------
      [    0.040399] kernel BUG at include/linux/page-flags.h:293!
      [    0.040823] invalid opcode: 0000 [#1] SMP PTI
      [    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
      [    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [    0.042496] EIP: free_highmem_page+0x64/0x80
      [    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
      [    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
      [    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
      [    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
      [    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
      [    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.046913] DR6: fffe0ff0 DR7: 00000400
      [    0.047220] Call Trace:
      [    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
      [    0.047854]  set_highmem_pages_init+0x5b/0x71
      [    0.048202]  mem_init+0x2b/0x1e8
      [    0.048460]  start_kernel+0x1d2/0x425
      [    0.048757]  i386_start_kernel+0x93/0x97
      [    0.049073]  startup_32_smp+0x164/0x168
      [    0.049379] Modules linked in:
      [    0.049626] ---[ end trace 337949378db0abbb ]---
      
      We free highmem pages before their struct pages are initialized:
      
      mem_init()
       set_highmem_pages_init()
        add_highpages_with_active_regions()
         free_highmem_page()
          .. Access uninitialized struct page here..
      
      Because there is no reason to have this feature on 32-bit systems, just
      disable it.
      
      Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
      Fixes: 2e3ca40f
      
       ("mm: relax deferred struct page requirements")
      Signed-off-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reported-by: default avatarJiri Slaby <jslaby@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      889c695d
    • KJ Tsanaktsidis's avatar
      fork: report pid exhaustion correctly · f83606f5
      KJ Tsanaktsidis authored
      Make the clone and fork syscalls return EAGAIN when the limit on the
      number of pids /proc/sys/kernel/pid_max is exceeded.
      
      Currently, when the pid_max limit is exceeded, the kernel will return
      ENOSPC from the fork and clone syscalls.  This is contrary to the
      documented behaviour, which explicitly calls out the pid_max case as one
      where EAGAIN should be returned.  It also leads to really confusing error
      messages in userspace programs which will complain about a lack of disk
      space when they fail to create processes/threads for this reason.
      
      This error is being returned because alloc_pid() uses the idr api to find
      a new pid; when there are none available, idr_alloc_cyclic() returns
      -ENOSPC, and this is being propagated back to userspace.
      
      This behaviour has been broken before, and was explicitly fixed in
      commit 35f71bc0 ("fork: report pid reservation failure properly"),
      so I think -EAGAIN is definitely the right thing to return in this case.
      The current behaviour change dates from commit 95846ecf ("pid:
      replace pid bitmap implementation with IDR AIP") and was I believe
      unintentional.
      
      This patch has no impact on the case where allocating a pid fails because
      the child reaper for the namespace is dead; that case will still return
      -ENOMEM.
      
      Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
      Fixes: 95846ecf
      
       ("pid: replace pid bitmap implementation with IDR AIP")
      Signed-off-by: default avatarKJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f83606f5
    • Miguel Ojeda's avatar
      Compiler Attributes: naked can be shared · ae596de1
      Miguel Ojeda authored
      The naked attribute is supported by at least gcc >= 4.6 (for ARM,
      which is the only current user), gcc >= 8 (for x86), clang >= 3.1
      and icc >= 13. See https://godbolt.org/z/350Dyc
      
      Therefore, move it out of compiler-gcc.h so that the definition
      is shared by all compilers.
      
      This also fixes Clang support for ARM32 --- 815f0ddb
      ("include/linux/compiler*.h: make compiler-*.h mutually exclusive").
      
      Fixes: 815f0ddb
      
       ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Eli Friedman <efriedma@codeaurora.org>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-sparse@vger.kernel.org
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Tested-by: default avatarStefan Agner <stefan@agner.ch>
      Reviewed-by: default avatarStefan Agner <stefan@agner.ch>
      Reviewed-by: default avatarLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae596de1
    • Miguel Ojeda's avatar
      Compiler Attributes: naked was fixed in gcc 4.6 · d124b44f
      Miguel Ojeda authored
      Commit 9c695203 ("compiler-gcc.h: gcc-4.5 needs noclone
      and noinline on __naked functions") added noinline and noclone
      as a workaround for a gcc 4.5 bug, which was resolved in 4.6.0.
      
      Since now the minimum gcc supported version is 4.6,
      we can clean it up.
      
      See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
      and https://godbolt.org/z/h6NMIL
      
      Fixes: 815f0ddb
      
       ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Eli Friedman <efriedma@codeaurora.org>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-sparse@vger.kernel.org
      Tested-by: default avatarStefan Agner <stefan@agner.ch>
      Reviewed-by: default avatarStefan Agner <stefan@agner.ch>
      Reviewed-by: default avatarLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d124b44f
    • Greg Kroah-Hartman's avatar
      Merge tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd · 4b92e7fd
      Greg Kroah-Hartman authored
      Boris writes:
        "- Fixes a bug in the ->read/write_reg() implementation of the m25p80
           driver
         - Make sure of_node_get/put() calls are balanced in the partition
           parsing code
         - Fix a race in the denali NAND controller driver
         - Fix false positive WARN_ON() in the marvell NAND controller driver"
      
      * tag 'mtd/fixes-for-4.19-rc5' of git://git.infradead.org/linux-mtd:
        mtd: devices: m25p80: Make sure the buffer passed in op is DMA-able
        mtd: partitions: fix unbalanced of_node_get/put()
        mtd: rawnand: denali: fix a race condition when DMA is kicked
        mtd: rawnand: marvell: prevent harmless warnings
      4b92e7fd
    • Greg Kroah-Hartman's avatar
      Merge tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · d8292084
      Greg Kroah-Hartman authored
      Takashi writes:
        "sound fixes for 4.19-rc5
      
         here comes a collection of various fixes, mostly for stable-tree
         or regression fixes.
      
         Two relatively high LOCs are about the (rather simple) conversion of
         uapi integer types in topology API, and a regression fix about HDMI
         hotplug notification on AMD HD-audio.  The rest are all small
         individual fixes like ASoC Intel Skylake race condition, minor
         uninitialized page leak in emu10k1 ioctl, Firewire audio error paths,
         and so on."
      
      * tag 'sound-4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
        ALSA: fireworks: fix memory leak of response buffer at error path
        ALSA: oxfw: fix memory leak of discovered stream formats at error path
        ALSA: oxfw: fix memory leak for model-dependent data at error path
        ALSA: bebob: fix memory leak for M-Audio FW1814 and ProjectMix I/O at error path
        ALSA: hda - Enable runtime PM only for discrete GPU
        ALSA: oxfw: fix memory leak of private data
        ALSA: firewire-tascam: fix memory leak of private data
        ALSA: firewire-digi00x: fix memory leak of private data
        sound: don't call skl_init_chip() to reset intel skl soc
        sound: enable interrupt after dma buffer initialization
        Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
        ALSA: emu10k1: fix possible info leak to userspace on SNDRV_EMU10K1_IOCTL_INFO
        ASoC: cs4265: fix MMTLR Data switch control
        ASoC: AMD: Ensure reset bit is cleared before configuring
        ALSA: fireface: fix memory leak in ff400_switch_fetching_mode()
        ALSA: bebob: use address returned by kmalloc() instead of kernel stack for streaming DMA mapping
        ASoC: rsnd: don't fallback to PIO mode when -EPROBE_DEFER
        ASoC: rsnd: adg: care clock-frequency size
        ASoC: uniphier: change status to orphan
        ASoC: rsnd: fixup not to call clk_get/set under non-atomic
        ...
      d8292084
  2. 19 Sep, 2018 5 commits
  3. 18 Sep, 2018 12 commits
  4. 17 Sep, 2018 13 commits
    • Vaibhav Nagarnaik's avatar
      ring-buffer: Allow for rescheduling when removing pages · 83f36555
      Vaibhav Nagarnaik authored
      When reducing ring buffer size, pages are removed by scheduling a work
      item on each CPU for the corresponding CPU ring buffer. After the pages
      are removed from ring buffer linked list, the pages are free()d in a
      tight loop. The loop does not give up CPU until all pages are removed.
      In a worst case behavior, when lot of pages are to be freed, it can
      cause system stall.
      
      After the pages are removed from the list, the free() can happen while
      the work is rescheduled. Call cond_resched() in the loop to prevent the
      system hangup.
      
      Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com
      
      Cc: stable@vger.kernel.org
      Fixes: 83f40318
      
       ("ring-buffer: Make removal of ring buffer pages atomic")
      Reported-by: default avatarJason Behmer <jbehmer@google.com>
      Signed-off-by: default avatarVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      83f36555
    • Greg Kroah-Hartman's avatar
      Merge tag 'spi-fix-v4.19-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 3918c21e
      Greg Kroah-Hartman authored
      Mark writes:
        "spi: Fixes for v4.19
      
        As well as one driver fix there's a couple of fixes here which address
        issues with the use of IDRs for allocation of dynamic bus numbers,
        ensuring that dynamic bus numbers interact well with static bus numbers
        assigned via DT and otherwise."
      
      * tag 'spi-fix-v4.19-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spi-fsl-dspi: fix broken DSPI_EOQ_MODE
        spi: Fix double IDR allocation with DT aliases
        spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
      3918c21e
    • Takashi Iwai's avatar
      Merge tag 'asoc-v4.19-rc4' of... · 196f4eee
      Takashi Iwai authored
      Merge tag 'asoc-v4.19-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
      
      ASoC: Fixes for v4.19
      
      This is the usual set of small fixes scatterd around various drivers,
      plus one fix for DAPM and a UAPI build fix.  There's not a huge amount
      that stands out here relative to anything else.
      196f4eee
    • zhong jiang's avatar
      net: ethernet: Fix a unused function warning. · c7348091
      zhong jiang authored
      
      
      Fix the following compile warning:
      
      drivers/net/ethernet/microchip/lan743x_main.c:2964:12: warning: ‘lan743x_pm_suspend’ defined but not used [-Wunused-function]
       static int lan743x_pm_suspend(struct device *dev)
      drivers/net/ethernet/microchip/lan743x_main.c:2987:12: warning: ‘lan743x_pm_resume’ defined but not used [-Wunused-function]
       static int lan743x_pm_resume(struct device *dev)
      Signed-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7348091
    • Andrew Lunn's avatar
      net: dsa: mv88e6xxx: Fix ATU Miss Violation · ddca24df
      Andrew Lunn authored
      Fix a cut/paste error and a typo which results in ATU miss violations
      not being reported.
      
      Fixes: 0977644c
      
       ("net: dsa: mv88e6xxx: Decode ATU problem interrupt")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ddca24df
    • Daniel Borkmann's avatar
      tls: fix currently broken MSG_PEEK behavior · 50c6b58a
      Daniel Borkmann authored
      In kTLS MSG_PEEK behavior is currently failing, strace example:
      
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] listen(4, 10)               = 0
        [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] close(4)                    = 0
        [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64
      
      As can be seen from strace, there are two TLS records sent,
      i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
      peeking 'test_read_peektest_read_peektest'. This is clearly
      wrong, and what happens is that given peek cannot call into
      tls_sw_advance_skb() to unpause strparser and proceed with
      the next skb, we end up looping over the current one, copying
      the 'test_read_peek' over and over into the user provided
      buffer.
      
      Here, we can only peek into the currently held skb (current,
      full TLS record) as otherwise we would end up having to hold
      all the original skb(s) (depending on the peek depth) in a
      separate queue when unpausing strparser to process next
      records, minimally intrusive is to return only up to the
      current record's size (which likely was what c46234eb
      ("tls: RX path for ktls") originally intended as well). Thus,
      after patch we properly peek the first record:
      
        [pid  2046] wait4(2075,  <unfinished ...>
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] listen(4, 10)               = 0
        [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] close(4)                    = 0
        [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14
      
      Fixes: c46234eb
      
       ("tls: RX path for ktls")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50c6b58a
    • David S. Miller's avatar
      Merge branch 'hv_netvsc-associate-VF-and-PV-device-by-serial-number' · aa079bd0
      David S. Miller authored
      
      
      Stephen Hemminger says:
      
      ====================
      hv_netvsc: associate VF and PV device by serial number
      
      The Hyper-V implementation of PCI controller has concept of 32 bit serial number
      (not to be confused with PCI-E serial number).  This value is sent in the protocol
      from the host to indicate SR-IOV VF device is attached to a synthetic NIC.
      
      Using the serial number (instead of MAC address) to associate the two devices
      avoids lots of potential problems when there are duplicate MAC addresses from
      tunnels or layered devices.
      
      The patch set is broken into two parts, one is for the PCI controller
      and the other is for the netvsc device. Normally, these go through different
      trees but sending them together here for better review. The PCI changes
      were submitted previously, but the main review comment was "why do you
      need this?". This is why.
      
      v2 - slot name can be shorter.
           remove locking when creating pci_slots; see comment for explaination
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa079bd0
    • Stephen Hemminger's avatar
      hv_netvsc: pair VF based on serial number · 00d7ddba
      Stephen Hemminger authored
      
      
      Matching network device based on MAC address is problematic
      since a non VF network device can be creted with a duplicate MAC
      address causing confusion and problems.  The VMBus API does provide
      a serial number that is a better matching method.
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00d7ddba
    • Stephen Hemminger's avatar
      PCI: hv: support reporting serial number as slot information · a15f2c08
      Stephen Hemminger authored
      
      
      The Hyper-V host API for PCI provides a unique "serial number" which
      can be used as basis for sysfs PCI slot table. This can be useful
      for cases where userspace wants to find the PCI device based on
      serial number.
      
      When an SR-IOV NIC is added, the host sends an attach message
      with serial number. The kernel doesn't use the serial number, but
      it is useful when doing the same thing in a userspace driver such
      as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
      way to find the matching PCI device.
      
      There maybe some cases where serial number is not unique such
      as when using GPU's. But the PCI slot infrastructure will handle
      that.
      
      This has a side effect which may also be useful. The common udev
      network device naming policy uses the slot information (rather
      than PCI address).
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a15f2c08
    • Michael Chan's avatar
      bnxt_en: Fix VF mac address regression. · 28ea334b
      Michael Chan authored
      The recent commit to always forward the VF MAC address to the PF for
      approval may not work if the PF driver or the firmware is older.  This
      will cause the VF driver to fail during probe:
      
        bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
        bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
        bnxt_en 0000:00:03.0: Unable to initialize mac address.
        bnxt_en: probe of 0000:00:03.0 failed with error -99
      
      We fix it by treating the error as fatal only if the VF MAC address is
      locally generated by the VF.
      
      Fixes: 707e7e96
      
       ("bnxt_en: Always forward VF MAC address to the PF.")
      Reported-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Reported-by: default avatarSiwei Liu <loseweigh@gmail.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28ea334b
    • Eric Dumazet's avatar
      ipv6: fix possible use-after-free in ip6_xmit() · bbd6528d
      Eric Dumazet authored
      In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
      we need to call skb_set_owner_w() before consuming original skb,
      otherwise we risk a use-after-free.
      
      Bring IPv6 in line with what we do in IPv4 to fix this.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbd6528d
    • Colin Ian King's avatar
      net: hp100: fix always-true check for link up state · a7f38002
      Colin Ian King authored
      
      
      The operation ~(p100_inb(VG_LAN_CFG_1) & HP100_LINK_UP) returns a value
      that is always non-zero and hence the wait for the link to drop always
      terminates prematurely.  Fix this by using a logical not operator instead
      of a bitwise complement.  This issue has been in the driver since
      pre-2.6.12-rc2.
      
      Detected by CoverityScan, CID#114157 ("Logical vs. bitwise operator")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7f38002
    • Nicolas Ferre's avatar
      ARM: dts: at91: add new compatibility string for macb on sama5d3 · 321cc359
      Nicolas Ferre authored
      
      
      We need this new compatibility string as we experienced different behavior
      for this 10/100Mbits/s macb interface on this particular SoC.
      Backward compatibility is preserved as we keep the alternative strings.
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      321cc359