1. 05 Nov, 2013 8 commits
    • tracing: Make register/unregister_ftrace_command __init · 38de93ab
      Tom Zanussi authored
      register/unregister_ftrace_command() are only ever called from __init
      functions, so can themselves be made __init.
      Also make register_snapshot_cmd() __init for the same reason.
      Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Update event filters for multibuffer · f306cc82
      Tom Zanussi authored
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case. Before this change:
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above. After this change,
      each instance keeps its own filter:
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
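      The fallback described in this commit can be pictured with a small
      userspace model. This is only a sketch: the struct layouts, the value
      of the USE_CALL_FILTER bit, and pick_filter() are hypothetical
      stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the real value lives in the kernel headers. */
#define USE_CALL_FILTER (1u << 0)

/* Simplified stand-in for an event call (shared by all instances). */
struct event_call {
	unsigned int flags;
	const char *filter;	/* legacy, call-wide filter */
};

/* Simplified stand-in for an event file (one per tracing instance). */
struct event_file {
	struct event_call *call;
	const char *filter;	/* per-instance filter */
};

/* Return the filter that applies to this event file. */
const char *pick_filter(const struct event_file *file)
{
	assert(file != NULL && file->call != NULL);

	/* Events without multibuffer support keep the old behavior. */
	if (file->call->flags & USE_CALL_FILTER)
		return file->call->filter;

	return file->filter;
}
```

      With the flag clear, two files that share one call can hold different
      filters; with the flag set, both resolve to the call-wide filter.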
    • recordmcount.pl: Add support for __fentry__ · f02b625d
      Jamie Iles authored
      With gcc 4.6.0 the -mfentry feature places the function profiling call
      at the start of the function. When this is used, the call is to
      __fentry__ and not mcount.  This is required for Ksplice as the C
      version of recordmcount doesn't insert section symbols for the
      __mcount_loc section so we fall back to the perl version.
      Based on 48bb5dc6 (ftrace: Make recordmcount.c handle __fentry__).
      Link: http://lkml.kernel.org/r/1383648129-10724-1-git-send-email-jamie.iles@oracle.com
      Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Have control op function callback only trace when RCU is watching · b5aa3a47
      Steven Rostedt (Red Hat) authored
      Dave Jones reported that trinity would be able to trigger the following
      back trace:
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #38 Not tainted
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU!  rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       1 lock held by trinity-child1/18786:
        #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
       stack backtrace:
       CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
        0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
        ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
        0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
       Call Trace:
        [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
        [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
        [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
        [<ffffffff81309289>] ? __const_udelay+0x29/0x30
        [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
        [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
        [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
        [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
        [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
        [<ffffffff816f4000>] ftrace_call+0x5/0x2f
        [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
        [<ffffffff81103673>] rcu_user_enter+0x13/0x20
        [<ffffffff8114541a>] user_enter+0x6a/0xd0
        [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
        [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
       ------------[ cut here ]------------
      Perf uses rcu_read_lock() but as the function tracer can trace functions
      even when RCU is not currently active, this makes the rcu_read_lock()
      used by perf ineffective.
      As perf is currently the only user of the ftrace_ops_control_func() and
      perf is also the only function callback that actively uses rcu_read_lock(),
      the quick fix is to prevent the ftrace_ops_control_func() from calling
      its callbacks if RCU is not active.
      With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
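      The guard described in this commit can be sketched in userspace as
      follows. Everything here is a hypothetical model: rcu_is_watching_model()
      is a plain flag standing in for the kernel's rcu_is_watching(), and
      control_func_model() only mimics the idea of skipping callbacks while
      RCU is not active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for RCU state; in the kernel this comes from rcu_is_watching(). */
bool rcu_watching = true;

bool rcu_is_watching_model(void)
{
	return rcu_watching;
}

typedef void (*trace_cb)(unsigned long ip);

/* Counts how often the callback actually ran. */
int calls_seen;

void counting_cb(unsigned long ip)
{
	(void)ip;
	calls_seen++;
}

/* Model of a control-op dispatcher: do nothing while RCU is not watching. */
void control_func_model(trace_cb cb, unsigned long ip)
{
	assert(cb != NULL);

	if (!rcu_is_watching_model())
		return;

	cb(ip);
}
```

      Calling control_func_model() with rcu_watching set to false leaves
      calls_seen unchanged, which is the behavior the quick fix aims for.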
    • rcu: Do not trace rcu_is_watching() functions · 9418fb20
      Steven Rostedt authored
      As perf uses the rcu_read_lock() primitives for recording into its
      ring buffer, perf tracing cannot be called when RCU is inactive.
      With the perf function tracing, there are functions that can be
      traced when RCU is not active, and perf must not have its function
      callback called when this is the case.
      Luckily, Paul McKenney has created a way to detect when RCU is
      active or not with the rcu_is_watching() function. Unfortunately,
      this function can also be traced, and if that happens it can cause
      a bit of overhead for the perf function calls that do the check.
      Recursion protection prevents anything bad from happening, but
      there is a bit of added overhead for every function being traced that
      must detect that the rcu_is_watching() is also being traced.
      As rcu_is_watching() is a helper routine and not part of the
      critical logic in RCU, it does not need to be traced in order to
      debug RCU itself. Add the "notrace" annotation to all the rcu_is_watching()
      definitions so that it is never traced.
      Link: http://lkml.kernel.org/r/20131104202736.72dd8e45@gandalf.local.home
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
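      A minimal sketch of what such an annotation looks like. It assumes that
      "notrace" corresponds to the compiler's no_instrument_function
      attribute, which is how GCC-built kernels of this period are understood
      to define it; treat that mapping, and the helper names below, as
      assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed definition: keep the compiler from instrumenting the function. */
#define notrace __attribute__((no_instrument_function))

/* Hypothetical stand-in for the RCU state consulted by the helper. */
static bool watching = true;

/* The annotated helper never gets a profiling call inserted. */
notrace bool rcu_is_watching_sketch(void)
{
	return watching;
}

/* Test hook for the model; also kept out of tracing. */
notrace void set_watching_sketch(bool on)
{
	watching = on;
}
```

      The attribute only affects instrumentation; the function's behavior is
      unchanged.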
    • Merge branch 'idle.2013.09.25a' of... · 44847da1
      Steven Rostedt (Red Hat) authored
      Merge branch 'idle.2013.09.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into HEAD
      Need to use Paul McKenney's "rcu_is_watching()" changes to fix
      a perf/ftrace bug.
    • ftrace/x86: skip over the breakpoint for ftrace caller · ab4ead02
      Kevin Hao authored
      In commit 8a4d0a68 "ftrace: Use breakpoint method to update ftrace
      caller", we chose to use the breakpoint method to update the ftrace
      caller. But ftrace_int3_handler() also needs to skip over the
      breakpoint for the ftrace caller. Otherwise weird things would happen.
      Cc: stable@vger.kernel.org # 3.5+
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
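      The idea of skipping the breakpoint can be modeled in userspace like
      this. It is a sketch under stated assumptions: the addresses, the
      5-byte instruction length, and all names ending in _model are
      hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed length of the call instruction that is being rewritten. */
#define INSN_SIZE 5

/* Hypothetical addresses for a traced call site and the ftrace caller. */
#define CALL_SITE_ADDR 0x1000UL
#define CALLER_ADDR    0x2000UL

struct regs_model {
	unsigned long ip;
};

bool is_call_site(unsigned long ip)
{
	return ip == CALL_SITE_ADDR;
}

bool is_caller(unsigned long ip)
{
	return ip == CALLER_ADDR;
}

/* Returns 1 if the breakpoint was one of ours and was skipped, else 0. */
int int3_handler_model(struct regs_model *regs)
{
	unsigned long ip;

	assert(regs != NULL);

	/* After a 1-byte breakpoint traps, ip points just past it. */
	ip = regs->ip - 1;

	/* The fix: accept the ftrace caller as well as ordinary call sites. */
	if (!is_call_site(ip) && !is_caller(ip))
		return 0;

	/* Resume after the remaining bytes of the instruction. */
	regs->ip += INSN_SIZE - 1;
	return 1;
}
```

      Without the is_caller() test, a breakpoint hit at the caller address
      would be reported as not handled.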
    • trace/trace_stat: use rbtree postorder iteration helper instead of opencoding · 9cd804ac
      Cody P Schafer authored
      Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead
      of open-coding an alternate postorder iteration that modifies the tree.
      Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com
      Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
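      Why postorder suits tree teardown can be shown with a plain binary
      tree: each node is freed only after both of its children, so no freed
      node is ever dereferenced. This is a userspace sketch with a
      hypothetical node type; it uses recursion for brevity, whereas the
      kernel helper named above iterates without recursion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type standing in for an rbtree entry. */
struct node {
	struct node *left;
	struct node *right;
	int key;
};

/* Allocate a node with the given children. */
struct node *new_node(int key, struct node *left, struct node *right)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->key = key;
	n->left = left;
	n->right = right;
	return n;
}

/* Free the whole tree, children before parent; returns nodes freed. */
size_t destroy_postorder(struct node *n)
{
	size_t freed;

	if (n == NULL)
		return 0;

	freed = destroy_postorder(n->left);
	freed += destroy_postorder(n->right);
	free(n);	/* safe: both subtrees are already gone */
	return freed + 1;
}
```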
  2. 19 Oct, 2013 5 commits
    • ftrace: Add set_graph_notrace filter · 29ad23b0
      Namhyung Kim authored
      The set_graph_notrace filter is analogous to set_ftrace_notrace and
      can be used for eliminating uninteresting parts of the function graph
      trace output.  It also works nicely with set_graph_function.
        # cd /sys/kernel/debug/tracing/
        # echo do_page_fault > set_graph_function
        # perf ftrace live true
         2)               |  do_page_fault() {
         2)               |    __do_page_fault() {
         2)   0.381 us    |      down_read_trylock();
         2)   0.055 us    |      __might_sleep();
         2)   0.696 us    |      find_vma();
         2)               |      handle_mm_fault() {
         2)               |        handle_pte_fault() {
         2)               |          __do_fault() {
         2)               |            filemap_fault() {
         2)               |              find_get_page() {
         2)   0.033 us    |                __rcu_read_lock();
         2)   0.035 us    |                __rcu_read_unlock();
         2)   1.696 us    |              }
         2)   0.031 us    |              __might_sleep();
         2)   2.831 us    |            }
         2)               |            _raw_spin_lock() {
         2)   0.046 us    |              add_preempt_count();
         2)   0.841 us    |            }
         2)   0.033 us    |            page_add_file_rmap();
         2)               |            _raw_spin_unlock() {
         2)   0.057 us    |              sub_preempt_count();
         2)   0.568 us    |            }
         2)               |            unlock_page() {
         2)   0.084 us    |              page_waitqueue();
         2)   0.126 us    |              __wake_up_bit();
         2)   1.117 us    |            }
         2)   7.729 us    |          }
         2)   8.397 us    |        }
         2)   8.956 us    |      }
         2)   0.085 us    |      up_read();
         2) + 12.745 us   |    }
         2) + 13.401 us   |  }
        # echo handle_mm_fault > set_graph_notrace
        # perf ftrace live true
         1)               |  do_page_fault() {
         1)               |    __do_page_fault() {
         1)   0.205 us    |      down_read_trylock();
         1)   0.041 us    |      __might_sleep();
         1)   0.344 us    |      find_vma();
         1)   0.069 us    |      up_read();
         1)   4.692 us    |    }
         1)   5.311 us    |  }
      Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
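      The visible effect of the filter, dropping a function together with
      everything it calls, can be modeled as a pass over a list of call
      entries tagged with their nesting depth. This is an illustrative
      sketch only: the kernel applies the filter while tracing rather than
      afterwards, and struct call_event and drop_notrace() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One function entry in call order, with its nesting depth. */
struct call_event {
	const char *name;
	int depth;
};

/*
 * Copy the names of all entries except `notrace` and its callees into
 * `kept` (which must have room for n entries). Returns the number kept.
 */
size_t drop_notrace(const struct call_event *in, size_t n,
		    const char *notrace, const char **kept)
{
	size_t out = 0;
	int skip_depth = -1;	/* -1: not inside a hidden subtree */

	assert(in != NULL && notrace != NULL && kept != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Deeper than the hidden function: one of its callees. */
		if (skip_depth >= 0 && in[i].depth > skip_depth)
			continue;
		skip_depth = -1;

		/* Hide the filtered function itself and start skipping. */
		if (strcmp(in[i].name, notrace) == 0) {
			skip_depth = in[i].depth;
			continue;
		}
		kept[out++] = in[i].name;
	}
	return out;
}
```

      Fed the do_page_fault() call chain with handle_mm_fault as the filter,
      the model keeps find_vma() and up_read() but drops handle_mm_fault()
      and its callees, matching the shape of the second trace above.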
    • ftrace: Narrow down the protected area of graph_lock · 6a10108b
      Namhyung Kim authored
      The parser set up is just a generic utility that uses local variables
      allocated by the function. There's no need to hold the graph_lock for
      this set up.
      This also makes the code simpler.
      Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Introduce struct ftrace_graph_data · faf982a6
      Namhyung Kim authored
      The struct ftrace_graph_data is for generalizing the access to
      set_graph_function file.  This is a preparation for adding support to
      Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Get rid of ftrace_graph_filter_enabled · 9aa72b4b
      Namhyung Kim authored
      ftrace_graph_filter_enabled means that the user has set a function
      filter, which always has the same meaning as ftrace_graph_count > 0.
      Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix potential out-of-bounds in trace_get_user() · 057db848
      Steven Rostedt authored
      Andrey reported the following:
      ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
      ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
      Accessed by thread T13003:
        #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
        #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
        #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
        #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
        #4 ffffffff812a1065 (__fput+0x155/0x360)
        #5 ffffffff812a12de (____fput+0x1e/0x30)
        #6 ffffffff8111708d (task_work_run+0x10d/0x140)
        #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
        #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
        #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
        #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
      Allocated by thread T5167:
        #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
        #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
        #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
        #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
        #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
        #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
        #6 ffffffff8129b668 (finish_open+0x68/0xa0)
        #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
        #8 ffffffff812b7350 (path_openat+0x120/0xb50)
        #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
        #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
        #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
        #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)
      Shadow bytes around the buggy address:
        ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
        ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
        ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
        ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
        ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
      Shadow byte legend (one shadow byte represents 8 application bytes):
        Addressable:           00
        Partially addressable: 01 02 03 04 05 06 07
        Heap redzone:          fa
        Heap kmalloc redzone:  fb
        Freed heap region:     fd
        Shadow gap:            fe
      The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'
      Although the bad write was reported in ftrace_regex_release(), the real
      bug is in trace_get_user(), which increments parser->idx without
      checking it against the size. It is triggered when userspace sends in
      128 characters (EVENT_BUF_SIZE + 1): the loop that reads the last
      character stores it and then breaks out because there are no more
      characters. Then the last character is read to determine what to do
      next, and the index is incremented without checking the size.
      The caller of trace_get_user() then usually terminates the string
      with a zero, but since the index is equal to the size, it writes a nul
      character after the allocated space, which can corrupt memory.
      Luckily, only root user has write access to this file.
      Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
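      The missing check can be illustrated with a small bounded-append
      model. It is a sketch, not the kernel patch: struct parser_model,
      parser_put() and parser_terminate() are hypothetical names, and the
      only point is that the index must stay below size - 1 so the
      terminating nul still fits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state (buffer, index, size). */
struct parser_model {
	char *buffer;
	size_t idx;
	size_t size;
};

/* Store ch only if a slot remains for it and for the trailing nul. */
int parser_put(struct parser_model *p, char ch)
{
	assert(p != NULL && p->buffer != NULL);

	if (p->size == 0 || p->idx >= p->size - 1)
		return -EINVAL;

	p->buffer[p->idx++] = ch;
	return 0;
}

/* Terminate the string; in bounds because parser_put() caps idx. */
void parser_terminate(struct parser_model *p)
{
	assert(p != NULL && p->idx < p->size);
	p->buffer[p->idx] = '\0';
}
```

      With a 4-byte buffer, three characters are accepted, the fourth is
      rejected, and the terminator lands on the last valid byte.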
  3. 10 Oct, 2013 1 commit
  4. 06 Oct, 2013 4 commits
    • Linux 3.12-rc4 · d0e639c9
      Linus Torvalds authored
    • net: Update the sysctl permissions handler to test effective uid/gid · 2433c8f0
      Eric W. Biederman authored
      Modify the code to use current_euid() and in_egroup_p(), as is done
      in fs/proc/proc_sysctl.c:test_perm().
      Cc: stable@vger.kernel.org
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
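      A generic owner/group/other permission check keyed on the effective
      identity can be sketched as below. This is not the kernel's code: the
      two boolean parameters stand in for comparisons against current_euid()
      and in_egroup_p(), and test_perm_model() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

/*
 * mode: permission bits such as 0644.
 * op:   requested access, a mask of MAY_READ, MAY_WRITE and MAY_EXEC.
 * Returns 0 if access is allowed, -EACCES otherwise.
 */
int test_perm_model(int mode, int op, bool euid_is_owner, bool in_owner_egroup)
{
	assert((mode & ~0777) == 0);

	/* Pick the owner, group, or other triplet by effective identity. */
	if (euid_is_owner)
		mode >>= 6;
	else if (in_owner_egroup)
		mode >>= 3;

	/* Every requested bit must be present in the selected triplet. */
	if ((op & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
		return 0;

	return -EACCES;
}
```

      The point of the commit is which identity feeds the two booleans: the
      effective uid and gid rather than the real ones.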
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 13caa8ed
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the outstanding target fixes queued up for v3.12-rc4 code.
        The highlights include:
         - Make vhost/scsi tag percpu_ida_alloc() use GFP_ATOMIC
         - Allow sess_cmd_map allocation failure fallback to use vzalloc
         - Fix COMPARE_AND_WRITE se_cmd->data_length bug with FILEIO backends
         - Fixes for COMPARE_AND_WRITE callback recursive failure OOPs + non
           zero scsi_status bug
         - Make iscsi-target do acknowledgement tag release from RX context
         - Setup iscsi-target with extra (cmdsn_depth / 2) percpu_ida tags
        Also included is an iscsi-target patch CC'ed for v3.10+ that avoids
        legacy wait_for_task=true release during fast-path StatSN
        acknowledgement, and two other SRP target related patches that address
        long-standing issues that are CC'ed for v3.3+.
        Extra thanks to Thomas Glanzmann for his testing feedback with
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        iscsi-target; Allow an extra tag_num / 2 number of percpu_ida tags
        iscsi-target: Perform release of acknowledged tags from RX context
        iscsi-target: Only perform wait_for_tasks when performing shutdown
        target: Fail on non zero scsi_status in compare_and_write_callback
        target: Fix recursive COMPARE_AND_WRITE callback failure
        target: Reset data_length for COMPARE_AND_WRITE to NoLB * block_size
        ib_srpt: always set response for task management
        target: Fall back to vzalloc upon ->sess_cmd_map kzalloc failure
        vhost/scsi: Use GFP_ATOMIC with percpu_ida_alloc for obtaining tag
        ib_srpt: Destroy cm_id before destroying QP.
        target: Fix xop->dbl assignment in target_xcopy_parse_segdesc_02
    • Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 831ae3c1
      Linus Torvalds authored
      Pull slave-dmaengine fixes from Vinod Koul:
       "Here are the slave dmaengine fixes.  We have the fix for deadlock issue
        on imx-dma by Michael and Josh's edma config fix along with author
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: imx-dma: fix callback path in tasklet
        dmaengine: imx-dma: fix lockdep issue between irqhandler and tasklet
        dmaengine: imx-dma: fix slow path issue in prep_dma_cyclic
        dma/Kconfig: Make TI_EDMA select TI_PRIV_EDMA
        edma: Update author email address
  5. 05 Oct, 2013 9 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e62063d6
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "This is a small collection of fixes, including a regression fix from
        Liu Bo that solves rare crashes with compression on.
        I've merged my for-linus up to 3.12-rc3 because the top commit is only
        meant for 3.12.  The rest of the fixes are also available in my master
        branch on top of my last 3.11 based pull"
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: Fix crash due to not allocating integrity data for a bioset
        Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
        Btrfs: eliminate races in worker stopping code
        Btrfs: fix crash of compressed writes
        Btrfs: fix transid verify errors when recovering log tree
    • Merge tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 85f6d2db
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "Two patches for the OMAP driver, dealing with setting up IRQs properly
        on the device tree boot path"
      * tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio/omap: auto-setup a GPIO when used as an IRQ
        gpio/omap: maintain GPIO and IRQ usage separately
    • Merge tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 4ed54764
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are nine fixes for various USB driver problems.  The majority are
        gadget/musb fixes, but there are some new device ids in here as well"
      * tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: chipidea: add Intel Clovertrail pci id
        usb: gadget: s3c-hsotg: fix can_write limit for non-periodic endpoints
        usb: gadget: f_fs: fix error handling
        usb: musb: dsps: do not bind to "musb-hdrc"
        USB: serial: option: Ignore card reader interface on Huawei E1750
        usb: musb: gadget: fix otg active status flag
        usb: phy: gpio-vbus: fix deferred probe from __init
        usb: gadget: pxa25x_udc: fix deferred probe from __init
        usb: musb: fix otg default state
    • Merge tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · e3757a1f
      Linus Torvalds authored
      Pull tty fixes from Greg KH:
       "Here are two tty driver fixes for 3.12-rc4.
        One fixes the reported regression in the n_tty code that a number of
        people found recently, and the other one fixes an issue with xen
        consoles that broke in 3.10"
      * tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        xen/hvc: allow xenboot console to be used again
        tty: Fix pty master read() after slave closes
    • Merge tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 20fa7867
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are 4 tiny staging and iio driver fixes for 3.12-rc4.  Nothing
        major, just some small fixes for reported issues"
      * tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice
        iio:magnetometer: Bugfix magnetometer default output registers
        iio: Remove debugfs entries in iio_device_unregister()
        iio: amplifiers: ad8366: Remove regulator_put
    • btrfs: Fix crash due to not allocating integrity data for a bioset · b208c2f7
      Darrick J. Wong authored
      When btrfs creates a bioset, we must also allocate the integrity data pool.
      Otherwise btrfs will crash when it tries to submit a bio to a checksumming
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       PGD 2305e4067 PUD 23063d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
      sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
      raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
       CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
       RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       RSP: 0018:ffff880230699688  EFLAGS: 00010286
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
       RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
       RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
       R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
       R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
       FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
        ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
        ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
        0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
       Call Trace:
        [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
        [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
        [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
        [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
        [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
        [<ffffffff8123e58a>] generic_make_request+0xca/0x100
        [<ffffffff8123e639>] submit_bio+0x79/0x160
        [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
        [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
        [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
        [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
        [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
        [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
        [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
        [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
        [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
        [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
        [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
        [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
        [<ffffffff81176ba0>] mount_fs+0x20/0xd0
        [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
        [<ffffffff81193320>] do_mount+0x200/0xa40
        [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
        [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
        [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
       Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
      89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
      44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
       RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
        RSP <ffff880230699688>
       CR2: 0000000000000018
       ---[ end trace 7a96042017ed21e2 ]---
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Merge branch 'for-linus' into for-linus-3.12 · 1329dfc8
      Chris Mason authored
    • Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 · a5c984cc
      Linus Torvalds authored
      Pull CIFS fixes from Steve French:
       "Small set of cifs fixes.  Most important is Jeff's fix that works
        around disconnection problems which can be caused by simultaneous use
        of user space tools (starting a long running smbclient backup then
        doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
        Jim's fix to deal with reexport of cifs share.
        I expect to send two more cifs fixes next week (being tested now) -
        fixes to address an SMB2 unmount hang when server dies and a fix for
        cifs symlink handling of Windows "NFS" symlinks"
      * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
        [CIFS] update cifs.ko version
        [CIFS] Remove ext2 flags that have been moved to fs.h
        [CIFS] Provide sane values for nlink
        cifs: stop trying to use virtual circuits
        CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
    • Merge tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 95167aad
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "We merged what was intended to be an MMCONFIG cleanup, but in fact,
        for systems without _CBA (which is almost everything), it broke
        extended config space for domain 0 and it broke all config space for
        other domains.
        This reverts the change"
      * tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero"
  6. 04 Oct, 2013 13 commits
    • Bjorn Helgaas's avatar
      Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero" · 67d470e0
      Bjorn Helgaas authored
      This reverts commit 07f9b61c.
      07f9b61c was intended to be a cleanup that didn't change anything, but in
      fact, for systems without _CBA (which is almost everything), it broke
      extended config space for domain 0 and all config space for other domains.
      Reference: http://lkml.kernel.org/r/20131004011806.GE20450@dangermouse.emea.sgi.com
      Reported-by: Hedi Berriche <hedi@sgi.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7dee8dff
      Linus Torvalds authored
      Pull ACPI and power management fixes from Rafael Wysocki:
       - The resume part of user space driven hibernation (s2disk) is now
         broken after the change that moved the creation of memory bitmaps to
         after the freezing of tasks, because I forgot that the resume utility
         loaded the image before freezing tasks and needed the bitmaps for
         that.  The fix adds special handling for that case.
       - One of the recent commits changed the export of acpi_bus_get_device() to
         EXPORT_SYMBOL_GPL(), which was technically correct but broke existing
         binary modules using that function including one in particularly
         widespread use.  Change it back to EXPORT_SYMBOL().
       - The intel_pstate driver sometimes fails to disable turbo if its
         no_turbo sysfs attribute is set.  Fix from Srinivas Pandruvada.
       - One of the recent cpufreq fixes forgot to update a check in cpufreq-cpu0
         which still (incorrectly) treats non-NULL as non-error.  Fix from
         Philipp Zabel.
       - The SPEAr cpufreq driver uses a wrong variable type in one place
         preventing it from catching errors returned by one of the functions
         called by it.  Fix from Sachin Kamat.
      * tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: Use EXPORT_SYMBOL() for acpi_bus_get_device()
        intel_pstate: fix no_turbo
        cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2
        cpufreq: SPEAr: Fix incorrect variable type
        PM / hibernate: Fix user space driven resume regression
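The cpufreq-cpu0 item above turns on the kernel convention that a failed lookup returns an error code encoded in a pointer, which is never NULL. The following is a minimal user-space sketch of that convention, not the driver's code: `err_ptr()`, `is_err()`, `regulator_usable_buggy()` and `regulator_usable_fixed()` are hypothetical stand-ins modelled on the kernel's `ERR_PTR()`/`IS_ERR()` helpers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: like the kernel, reserve the top 4095 addresses for errors. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (stand-in for ERR_PTR()). */
static void *err_ptr(intptr_t error)
{
    return (void *)error;
}

/* True if the pointer lies in the reserved error range (stand-in for IS_ERR()). */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Buggy check: an encoded error is non-NULL, so it is wrongly accepted,
 * and a NULL handle is wrongly rejected. */
static int regulator_usable_buggy(const void *reg)
{
    return reg != NULL;
}

/* Fixed check: only encoded errors are rejected; NULL counts as valid. */
static int regulator_usable_fixed(const void *reg)
{
    return !is_err(reg);
}
```

With these helpers, `regulator_usable_buggy(err_ptr(-ENODEV))` reports the handle as usable, while `regulator_usable_fixed()` rejects it and still accepts NULL.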
    • Linus Torvalds's avatar
      Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs · 3dbecf0a
      Linus Torvalds authored
      Pull xfs bugfixes from Ben Myers:
       "There are lockdep annotations for project quotas, a fix for dirent
        dtype support on v4 filesystems, a fix for a memory leak in recovery,
        and a fix for the build error that resulted from it.  D'oh"
      * tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
        xfs: Use kmem_free() instead of free()
        xfs: fix memory leak in xlog_recover_add_to_trans
        xfs: dirent dtype presence is dependent on directory magic numbers
        xfs: lockdep needs to know about 3 dquot-deep nesting
    • Linus Torvalds's avatar
      selinux: remove 'flags' parameter from avc_audit() · ab354062
      Linus Torvalds authored
      Now avc_audit() has no more users with that parameter. Remove it.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Linus Torvalds's avatar
      selinux: avc_has_perm_flags has no more users · cb4fbe57
      Linus Torvalds authored
      .. so get rid of it.  The only indirect users were all the
      avc_has_perm() callers which just expanded to have a zero flags argument.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Ilya Dryomov's avatar
      Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing · 1357272f
      Ilya Dryomov authored
      free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
      can be processed before btrfs_scratch_superblock is called, which would
      result in a use-after-free on btrfs_device contents.  Fix this by
      zeroing the superblock before the rcu callback is registered.
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Ilya Dryomov's avatar
      Btrfs: eliminate races in worker stopping code · 964fb15a
      Ilya Dryomov authored
      The current implementation of worker threads in Btrfs has races in
      worker stopping code, which cause all kinds of panics and lockups when
      running btrfs/011 xfstest in a loop.  The problem is that
      btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
      check_busy_worker and __btrfs_start_workers.
      E.g., check_idle_worker race flow:
             btrfs_stop_workers():            check_idle_worker(aworker):
      - grabs the lock
      - splices the idle list into the
        working list
      - removes the first worker from the
        working list
      - releases the lock to wait for
        its kthread's completion
                                        - grabs the lock
                                        - if aworker is on the working list,
                                          moves aworker from the working list
                                          to the idle list
                                        - releases the lock
      - grabs the lock
      - puts the worker
      - removes the second worker from the
        working list
              btrfs_stop_workers returns, aworker is on the idle list
                       FS is umounted, memory is freed
                    aworker is woken up, fireworks ensue
      With this applied, I wasn't able to trigger the problem in 48 hours,
      whereas previously I could reliably reproduce at least one of these
      races within an hour.
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Liu Bo's avatar
      Btrfs: fix crash of compressed writes · 385fe0be
      Liu Bo authored
      The crash[1] was found by xfstests/generic/208 with "-o compress";
      it is not reproduced every time, but it does panic.
      The bug is quite interesting: it was actually introduced by a recent commit
      (Btrfs: actually limit the size of delalloc range).
      Btrfs implements delayed allocation, so during writeback, we
      (1) get a page A and lock it
      (2) search the state tree for delalloc bytes and lock all pages within the range
      (3) process the delalloc range, including finding disk space, creating the
          ordered extent and so on.
      (4) submit the page A.
      This works well in normal cases, but in a racy case, e.g.
      buffered compressed writes and aio-dio writes,
      we may sometimes fail to lock all pages in the 'delalloc' range,
      in which case we need to fall back to searching the state tree again with
      a smaller range limit (max_bytes = PAGE_CACHE_SIZE - offset).
      The mentioned commit has a side effect, that is, in the fallback case,
      we can find delalloc bytes before the index of the page we already have locked,
      so we're in the case of (delalloc_end <= *start) and return with (found > 0).
      This ends with not locking delalloc pages but making ->writepage still
      process them, and the crash happens.
      Fix this by treating that case as having found nothing and returning to the
      caller, which knows how to deal with it properly.
      ------------[ cut here ]------------
      kernel BUG at mm/page-writeback.c:2170!
      CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
      RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731] Stack:
      [ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
      [ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
      [ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
      [ 4934.248731] Call Trace:
      [ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
      [ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
      [ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
      [ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
      [ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
      [ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
      [ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
      [ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731]  RSP <ffff8801869f9c48>
      [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
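The check described in this commit message can be sketched in isolation. This is a simplified illustration under stated assumptions, not the Btrfs function: `usable_delalloc()` and its parameters are hypothetical names, and the real code also handles the state tree and page locking.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decide how many delalloc bytes the caller may process.
 * page_start   - file offset of the page that is already locked
 * delalloc_end - last byte of the delalloc range the search returned
 * found        - number of delalloc bytes the search reported
 *
 * If the range ends at or before the locked page, its pages were never
 * locked, so report "nothing found" instead of passing found through.
 */
static uint64_t usable_delalloc(uint64_t page_start, uint64_t delalloc_end,
                                uint64_t found)
{
    if (found == 0 || delalloc_end <= page_start)
        return 0;
    return found;
}
```

Returning 0 for a range that lies entirely before the locked page is what keeps ->writepage from processing pages that were never locked.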
    • Josef Bacik's avatar
      Btrfs: fix transid verify errors when recovering log tree · 60e7cd3a
      Josef Bacik authored
      If we crash with a log, remount and recover that log, and then crash before we
      can commit another transaction we will get transid verify errors on the next
      mount.  This is because we were not zero'ing out the log when we committed the
      transaction after recovery.  This is ok as long as we commit another transaction
      at some point in the future, but if you abort or something else goes wrong you
      can end up in this weird state because the recovery stuff says that the tree log
      should have a generation+1 of the super generation, which won't be the case of
      the transaction that was started for recovery.  Fix this by removing the check
      and _always_ zero out the log portion of the super when we commit a transaction.
      This fixes the transid verify issues I was seeing with my force errors tests.
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Linus Torvalds's avatar
      selinux: remove 'flags' parameter from inode_has_perm · 19e49834
      Linus Torvalds authored
      Every single user passes in '0'.  I think we had non-zero users back in
      some stone age when selinux_inode_permission() was implemented in terms
      of inode_has_perm(), but that complicated case got split up into a
      totally separate code-path so that we could optimize the much simpler
      special cases.
      See commit 2e334057 ("SELinux: delay initialization of audit data in
      selinux_inode_permission") for example.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Thierry Reding's avatar
      xfs: Use kmem_free() instead of free() · b2a42f78
      Thierry Reding authored
      This fixes a build failure caused by calling the free() function which
      does not exist in the Linux kernel.
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      (cherry picked from commit aaaae980)
    • tinguely@sgi.com's avatar
      xfs: fix memory leak in xlog_recover_add_to_trans · 9b3b77fe
      tinguely@sgi.com authored
      Free the memory in error path of xlog_recover_add_to_trans().
      Normally this memory is freed in recovery pass2, but is leaked
      in the error path.
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      (cherry picked from commit 519ccb81)
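The leak pattern this commit describes, a buffer that is normally released later but is dropped on an early error return, can be sketched in user space. This is an illustration under stated assumptions, not the XFS code: `add_region()`, `struct item`, and the allocation counter are hypothetical, and `malloc()`/`free()` stand in for the kernel's allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Counts outstanding allocations so a leak is observable in this sketch. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical item that takes ownership of a copied region on success. */
struct item {
    void *buf;
    size_t len;
};

/* Copy src into a new buffer and attach it to the item.
 * Returns 0 on success, -1 on failure. */
static int add_region(struct item *it, const void *src, size_t len,
                      size_t max_len)
{
    void *copy = tracked_alloc(len);

    if (!copy)
        return -1;
    memcpy(copy, src, len);

    if (len > max_len) {
        /* Error path: nothing owns the copy yet, so free it here.
         * Omitting this call is the leak. */
        tracked_free(copy);
        return -1;
    }

    it->buf = copy;   /* ownership passes to the item, freed later */
    it->len = len;
    return 0;
}
```

On the success path the item's eventual owner frees the buffer, which mirrors the commit's note that the memory is normally released at a later stage.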
    • Dave Chinner's avatar
      xfs: dirent dtype presence is dependent on directory magic numbers · 6d313498
      Dave Chinner authored
      The determination of whether a directory entry contains a dtype
      field originally was dependent on the filesystem having CRCs
      enabled. This meant that the format for dtype being enabled could be
      determined by checking the directory block magic number rather than
      doing a feature bit check. This was useful in that it meant that we
      didn't need to pass a struct xfs_mount around to functions that
      were already supplied with a directory block header.
      Unfortunately, the introduction of dtype fields into the v4
      structure via a feature bit meant this "use the directory block
      magic number" method of discriminating the dirent entry sizes is
      broken. Hence we need to convert the places that use magic number
      checks to use feature bit checks so that they work correctly and not
      by chance.
      The current code works on v4 filesystems only because the dirent
      size roundup covers the extra byte needed by the dtype field in the
      places where this problem occurs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      (cherry picked from commit 367993e7)