1. 20 Jun, 2013 17 commits
  2. 13 Jun, 2013 14 commits
    • Linus Torvalds's avatar
      Merge tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 25e33ed9
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "This is an alternative fix for the regression introduced in 3.9 whose
        previous fix had to be reverted right before 3.10-rc5, because it
        broke one of the Tony's machines.
      
        In this one the check is confined to the ACPI video driver (which is
        the only one causing the problem to happen in the first place) and the
        Tony's box shouldn't even notice it.
      
         - ACPI fix for an issue causing ACPI video driver to attempt to bind
           to devices it shouldn't touch from Rafael J Wysocki."
      
      * tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / video: Do not bind to device objects with a scan handler
      25e33ed9
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb03dc09
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "Another set of fixes, the biggest bit of this is yet another tweak to
        the UEFI anti-bricking code; apparently we finally got some feedback
        from Samsung as to what makes at least their systems fail.  This set
        should actually fix the boot regressions that some other systems (e.g.
        SGI) have exhibited.
      
        Other than that, there is a patch to avoid a panic with particularly
        unhappy memory layouts and two minor protocol fixes which may or may
        not be manifest bugs"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix typo in kexec register clearing
        x86, relocs: Move __vvar_page from S_ABS to S_REL
        Modify UEFI anti-bricking code
        x86: Fix adjust_range_size_mask calling position
      cb03dc09
    • Linus Torvalds's avatar
      Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · cb7e9704
      Linus Torvalds authored
      Pull RCU fixes from Paul McKenney:
       "I must confess that this past merge window was not RCU's best showing.
        This series contains three more fixes for RCU regressions:
      
         1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
              interrupt from idle rather than as a task switch from idle.
              This change is needed due to the recent use of _rcuidle()
              tracepoints that can be invoked from interrupt handlers as well
              as from idle.  Without this fix, invoking _rcuidle() tracepoints
              from interrupt handlers results in splats and (more seriously)
              confusion on RCU's part as to whether a given CPU is idle or not.
              This confusion can in turn result in too-short grace periods and
              therefore random memory corruption.
      
         2.   A fix to a subtle deadlock that could result due to RCU doing
              a wakeup while holding one of its rcu_node structure's locks.
              Although the probability of occurrence is low, it really
              does happen.  The fix, courtesy of Steven Rostedt, uses
              irq_work_queue() to avoid the deadlock.
      
         3.   A fix to a silent deadlock (invisible to lockdep) due to the
              interaction of timeouts posted by RCU debug code enabled by
              CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
              hotplug operations.  This will not occur in production kernels,
              but really does occur in randconfig testing.  Diagnosis courtesy
              of Steven Rostedt"
      
      * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
        rcu: Don't call wakeup() with rcu_node structure ->lock held
        trace: Allow idle-safe tracepoints to be called from irq
      cb7e9704
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · dcae7f2d
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "Three kvm related memory management fixes, a fix for show_trace, a fix
        for early console output and a patch from Ben to help prevent compile
        errors in regard to irq functions (or our lack thereof)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: Implement IRQ functions if !PCI
        s390/sclp: fix new line detection
        s390/pgtable: make pgste lock an explicit barrier
        s390/pgtable: Save pgste during modify_prot_start/commit
        s390/dumpstack: fix address ranges for asynchronous and panic stack
        s390/pgtable: Fix guest overindication for change bit
      dcae7f2d
    • Linus Torvalds's avatar
      Merge tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound · 509768f7
      Linus Torvalds authored
      Pull ASoC sound updates from Mark Brown:
       "Takashi is travelling at the minute and it'd be good to get the
        MAINTAINERS update in here merged so sending directly.
      
        As well as the usual driver specifics we've got a couple of core fixes
        here, one fixing capabilities for unidirectional streams and the other
        fixing suspend while audio streams are active.
      
        The suspend fix is a little involved but mostly as a result of
        removing some special casing that was doing the wrong thing."
      
      * tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound:
        ASoC: tlv320aic3x: Remove deadlock from snd_soc_dapm_put_volsw_aic3x()
        ASoC: dapm: Treat DAI widgets like AIF widgets for power
        ASoC: arizona: Correct AEC loopback enable
        ASoC: pcm: Require both CODEC and CPU support when declaring stream caps
        MAINTAINERS: Remove myself from Wolfson maintainers
        ASoC: wm8994: Ensure microphone detection state is reset on removal
        ASoC: wm8994: Avoid leaking pm_runtime reference on removed jack race
        ASoC: cs42l52: fix hp_gain_enum shift value.
        ASoC: cs42l52: use correct PCM mixer TLV dB scale to match datasheet.
      509768f7
    • Linus Torvalds's avatar
      Merge tag 'md-3.10-fixes' of git://neil.brown.name/md · 82ea4be6
      Linus Torvalds authored
      Pull md bugfixes from Neil Brown:
       "A few bugfixes for md
      
        Some tagged for -stable"
      
      * tag 'md-3.10-fixes' of git://neil.brown.name/md:
        md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place
        md/raid1,raid10: use freeze_array in place of raise_barrier in various places.
        md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
        md: md_stop_writes() should always freeze recovery.
      82ea4be6
    • Josh Triplett's avatar
      turbostat: Increase output buffer size to accommodate C8-C10 · b844db31
      Josh Triplett authored
      
      
      On platforms with C8-C10 support, the additional C-states cause
      turbostat to overrun its output buffer of 128 bytes per CPU.  Increase
      this to 256 bytes per CPU.
      
      [ As a bugfix, this should go into 3.10; however, since the C8-C10
        support didn't go in until after 3.9, this need not go into any stable
        kernel. ]
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b844db31
    • H. Peter Anvin's avatar
      Merge tag 'efi-urgent' into x86/urgent · 45df901c
      H. Peter Anvin authored
      
      
       * More tweaking to the EFI variable anti-bricking algorithm. Quite a
         few users were reporting boot regressions in v3.9. This has now been
         fixed with a more accurate "minimum storage requirement to avoid
         bricking" value from Samsung (5K instead of 50%) and code to trigger
         garbage collection when we near our limit - Matthew Garrett.
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      45df901c
    • H. Peter Anvin's avatar
      md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      H. Peter Anvin authored
      
      
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drivers behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      5026d7a9
    • NeilBrown's avatar
      md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      NeilBrown authored
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The later has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d
      
       (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      e2d59925
    • Alex Lyakas's avatar
      md/raid1: consider WRITE as successful only if at least one non-Faulty and... · 3056e3ae
      Alex Lyakas authored
      
      md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives to the array. It is failed by drive A, so
      r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
      succeeds in writing it, so the same r1_bio is marked as
      R1BIO_Uptodate.
      - r1_bio arrives to handle_write_finished, badblocks are disabled,
      md_error()->error() does nothing because we don't fail the last drive
      of raid1
      - raid_end_bio_io()  calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlex Lyakas <alex@zadarastorage.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      3056e3ae
    • NeilBrown's avatar
      md: md_stop_writes() should always freeze recovery. · 6b6204ee
      NeilBrown authored
      
      
      __md_stop_writes() will currently sometimes freeze recovery.
      So any caller must be ready for that to happen, and indeed they are.
      
      However if __md_stop_writes() doesn't freeze_recovery, then
      a recovery could start before mddev_suspend() is called, which
      could be awkward.  This can particularly cause problems or dm-raid.
      
      So change __md_stop_writes() to always freeze recovery.  This is safe
      and more predicatable.
      Reported-by: default avatarBrassow Jonathan <jbrassow@redhat.com>
      Tested-by: default avatarBrassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      6b6204ee
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26e04462
      Linus Torvalds authored
      Pull networking update from David Miller:
      
       1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
          dump all objects properly, from Pablo Neira Ayuso.
      
       2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
          present.  Fix from Phil Oester.
      
       3) qdisc_get_rtab() looks for an existing matching rate table and uses
          that instead of creating a new one.  However, it's key matching is
          incomplete, it fails to check to make sure the ->data[] array is
          identical too.  Fix from Eric Dumazet.
      
       4) ip_vs_dest_entry isn't fully initialized before copying back to
          userspace, fix from Dan Carpenter.
      
       5) Fix ubuf reference counting regression in vhost_net, from Jason
          Wang.
      
       6) When sock_diag dumps a socket filter back to userspace, we have to
          translate it out of the kernel's internal representation first.
          From Nicolas Dichtel.
      
       7) davinci_mdio holds a spinlock while calling pm_runtime, which
          sleeps.  Fix from Sebastian Siewior.
      
       8) Timeout check in sh_eth_check_reset is off by one, from Sergei
          Shtylyov.
      
       9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
          from Daniel Borkmann.
      
      10) netlink_mmap() does not propagate errors properly, from Patrick
          McHardy.
      
      11) Disable powersave and use minstrel by default in ath9k.  From Sujith
          Manoharan.
      
      12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
          which prevents vhost from being able to use zerocopy.  From Jason
          Wang.
      
      13) Fix race between port lookup and TX path in team driver, from Jiri
          Pirko.
      
      14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
          Hedberg.
      
      15) rtlwifi fails to connect to networking using any encryption method
          other than WPA2.  Fix from Larry Finger.
      
      16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
          management stuff.  From Yijing Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        b43: stop format string leaking into error msgs
        ath9k: Use minstrel rate control by default
        Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
        ath9k: Disable PowerSave by default
        net: wireless: iwlegacy: fix build error for il_pm_ops
        rtlwifi: Fix a false leak indication for PCI devices
        wl12xx/wl18xx: scan all 5ghz channels
        wl12xx: increase minimum singlerole firmware version required
        wl12xx: fix minimum required firmware version for wl127x multirole
        rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
        mwifiex: debugfs: Fix out of bounds array access
        Bluetooth: Fix mgmt handling of power on failures
        Bluetooth: Fix missing length checks for L2CAP signalling PDUs
        Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
        Bluetooth: Fix checks for LE support on LE-only controllers
        team: fix checks in team_get_first_port_txable_rcu()
        team: move add to port list before port enablement
        team: check return value of team_get_port_by_index_rcu() for NULL
        tuntap: set SOCK_ZEROCOPY flag during open
        netlink: fix error propagation in netlink_mmap()
        ...
      26e04462
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 645a9929
      Linus Torvalds authored
      Pull input layer bugfix from Jiri Kosina:
       "Memory leak regression fix from Benjamin Tissoires"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: multitouch: prevent memleak with the allocated name
      645a9929
  3. 12 Jun, 2013 9 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · b2cc9c19
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "Outside of bcache (which really isn't super big), these are all
        few-liners.  There are a few important fixes in here:
      
         - Fix blk pm sleeping when holding the queue lock
      
         - A small collection of bcache fixes that have been done and tested
           since bcache was included in this merge window.
      
         - A fix for a raid5 regression introduced with the bio changes.
      
         - Two important fixes for mtip32xx, fixing an oops and potential data
           corruption (or hang) due to wrong bio iteration on stacked devices."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        scatterlist: sg_set_buf() argument must be in linear mapping
        raid5: Initialize bi_vcnt
        pktcdvd: silence static checker warning
        block: remove refs to XD disks from documentation
        blkpm: avoid sleep when holding queue lock
        mtip32xx: Correctly handle bio->bi_idx != 0 conditions
        mtip32xx: Fix NULL pointer dereference during module unload
        bcache: Fix error handling in init code
        bcache: clarify free/available/unused space
        bcache: drop "select CLOSURES"
        bcache: Fix incompatible pointer type warning
      b2cc9c19
    • Linus Torvalds's avatar
      Merge branch 'akpm' (updates from Andrew Morton) · a568fa1c
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Bunch of fixes and one little addition to math64.h"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
        include/linux/math64.h: add div64_ul()
        mm: memcontrol: fix lockless reclaim hierarchy iterator
        frontswap: fix incorrect zeroing and allocation size for frontswap_map
        kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
        mm: migration: add migrate_entry_wait_huge()
        ocfs2: add missing lockres put in dlm_mig_lockres_handler
        mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
        drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
        aio: fix io_destroy() regression by using call_rcu()
        rtc-at91rm9200: use shadow IMR on at91sam9x5
        rtc-at91rm9200: add shadow interrupt mask
        rtc-at91rm9200: refactor interrupt-register handling
        rtc-at91rm9200: add configuration support
        rtc-at91rm9200: add match-table compile guard
        fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
        swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
        drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
        cciss: fix broken mutex usage in ioctl
        audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
        drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
        ...
      a568fa1c
    • Alex Shi's avatar
      include/linux/math64.h: add div64_ul() · c2853c8d
      Alex Shi authored
      
      
      There is div64_long() to handle the s64/long division, but no mocro do
      u64/ul division.  It is necessary in some scenarios, so add this
      function.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c2853c8d
    • Johannes Weiner's avatar
      mm: memcontrol: fix lockless reclaim hierarchy iterator · 89dc991f
      Johannes Weiner authored
      
      
      The lockless reclaim hierarchy iterator currently has a misplaced
      barrier that can lead to use-after-free crashes.
      
      The reclaim hierarchy iterator consist of a sequence count and a
      position pointer that are read and written locklessly, with memory
      barriers enforcing ordering.
      
      The write side sets the position pointer first, then updates the
      sequence count to "publish" the new position.  Likewise, the read side
      must read the sequence count first, then the position.  If the sequence
      count is up to date, it's guaranteed that the position is up to date as
      well:
      
        writer:                         reader:
        iter->position = position       if iter->sequence == expected:
        smp_wmb()                           smp_rmb()
        iter->sequence = sequence           position = iter->position
      
      However, the read side barrier is currently misplaced, which can lead to
      dereferencing stale position pointers that no longer point to valid
      memory.  Fix this.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: <stable@kernel.org>		[3.10+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      89dc991f
    • Akinobu Mita's avatar
      frontswap: fix incorrect zeroing and allocation size for frontswap_map · 7b57976d
      Akinobu Mita authored
      
      
      The bitmap accessed by bitops must have enough size to hold the required
      numbers of bits rounded up to a multiple of BITS_PER_LONG.  And the
      bitmap must not be zeroed by memset() if the number of bits cleared is
      not a multiple of BITS_PER_LONG.
      
      This fixes incorrect zeroing and allocation size for frontswap_map.  The
      incorrect zeroing part doesn't cause any problem because frontswap_map
      is freed just after zeroing.  But the wrongly calculated allocation size
      may cause the problem.
      
      For 32bit systems, the allocation size of frontswap_map is about twice
      as large as required size.  For 64bit systems, the allocation size is
      smaller than requeired if the number of bits is not a multiple of
      BITS_PER_LONG.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b57976d
    • Chen Gang's avatar
      kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() · 736f3203
      Chen Gang authored
      
      
      audit_add_tree_rule() must set 'rule->tree = NULL;' firstly, to protect
      the rule itself freed in kill_rules().
      
      The reason is when it is killed, the 'rule' itself may have already
      released, we should not access it.  one example: we add a rule to an
      inode, just at the same time the other task is deleting this inode.
      
      The work flow for adding a rule:
      
          audit_receive() -> (need audit_cmd_mutex lock)
            audit_receive_skb() ->
              audit_receive_msg() ->
                audit_receive_filter() ->
                  audit_add_rule() ->
                    audit_add_tree_rule() -> (need audit_filter_mutex lock)
                      ...
                      unlock audit_filter_mutex
                      get_tree()
                      ...
                      iterate_mounts() -> (iterate all related inodes)
                        tag_mount() ->
                          tag_trunk() ->
                            create_trunk() -> (assume it is 1st rule)
                              fsnotify_add_mark() ->
                                fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                              ...
                              get_tree(); (each inode will get one)
                      ...
                      lock audit_filter_mutex
      
      The work flow for deleting an inode:
      
          __destroy_inode() ->
           fsnotify_inode_delete() ->
             __fsnotify_inode_delete() ->
              fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
                fsnotify_destroy_mark() ->
                 fsnotify_destroy_mark_locked() ->
                   audit_tree_freeing_mark() ->
                     evict_chunk() ->
                       ...
                       tree->goner = 1
                       ...
                       kill_rules() ->   (assume current->audit_context == NULL)
                         call_rcu() ->   (rule->tree != NULL)
                           audit_free_rule_rcu() ->
                             audit_free_rule()
                       ...
                       audit_schedule_prune() ->  (assume current->audit_context == NULL)
                         kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                           prune_one() ->    (delete it from prue_list)
                             put_tree(); (match the original get_tree above)
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      736f3203
    • Naoya Horiguchi's avatar
      mm: migration: add migrate_entry_wait_huge() · 30dad309
      Naoya Horiguchi authored
      
      
      When we have a page fault for the address which is backed by a hugepage
      under migration, the kernel can't wait correctly and do busy looping on
      hugepage fault until the migration finishes.  As a result, users who try
      to kick hugepage migration (via soft offlining, for example) occasionally
      experience long delay or soft lockup.
      
      This is because pte_offset_map_lock() can't get a correct migration entry
      or a correct page table lock for hugepage.  This patch introduces
      migration_entry_wait_huge() to solve this.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[2.6.35+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30dad309
    • Xue jiufei's avatar
      ocfs2: add missing lockres put in dlm_mig_lockres_handler · 27749f2f
      Xue jiufei authored
      
      
      dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
      Signed-off-by: default avatarjoyce <xuejiufei@huawei.com>
      Reviewed-by: default avatarshencanquan <shencanquan@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27749f2f
    • Tomasz Stanislawski's avatar
      mm/page_alloc.c: fix watermark check in __zone_watermark_ok() · 026b0814
      Tomasz Stanislawski authored
      
      
      The watermark check consists of two sub-checks.  The first one is:
      
      	if (free_pages <= min + lowmem_reserve)
      		return false;
      
      The check assures that there is minimal amount of RAM in the zone.  If
      CMA is used then the free_pages is reduced by the number of free pages
      in CMA prior to the over-mentioned check.
      
      	if (!(alloc_flags & ALLOC_CMA))
      		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
      
      This prevents the zone from being drained from pages available for
      non-movable allocations.
      
      The second check prevents the zone from getting too fragmented.
      
      	for (o = 0; o < order; o++) {
      		free_pages -= z->free_area[o].nr_free << o;
      		min >>= 1;
      		if (free_pages <= min)
      			return false;
      	}
      
      The field z->free_area[o].nr_free is equal to the number of free pages
      including free CMA pages.  Therefore the CMA pages are subtracted twice.
      This may cause a false positive fail of __zone_watermark_ok() if the CMA
      area gets strongly fragmented.  In such a case there are many 0-order
      free pages located in CMA.  Those pages are subtracted twice therefore
      they will quickly drain free_pages during the check against
      fragmentation.  The test fails even though there are many free non-cma
      pages in the zone.
      
      This patch fixes this issue by subtracting CMA pages only for a purpose of
      (free_pages <= min + lowmem_reserve) check.
      
      Laura said:
      
        We were observing allocation failures of higher order pages (order 5 =
        128K typically) under tight memory conditions resulting in driver
        failure.  The output from the page allocation failure showed plenty of
        free pages of the appropriate order/type/zone and mostly CMA pages in
        the lower orders.
      
        For full disclosure, we still observed some page allocation failures
        even after applying the patch but the number was drastically reduced and
        those failures were attributed to fragmentation/other system issues.
      Signed-off-by: default avatarTomasz Stanislawski <t.stanislaws@samsung.com>
      Signed-off-by: default avatarKyungmin Park <kyungmin.park@samsung.com>
      Tested-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org>	[3.7+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      026b0814