1. 20 Feb, 2014 2 commits
  2. 14 Feb, 2014 2 commits
  3. 13 Feb, 2014 3 commits
  4. 12 Feb, 2014 1 commit
  5. 11 Feb, 2014 4 commits
  6. 10 Feb, 2014 8 commits
  7. 09 Feb, 2014 3 commits
  8. 07 Feb, 2014 7 commits
  9. 06 Feb, 2014 5 commits
    • Shaohua Li's avatar
      swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li authored
      
      
      This is a patch to improve swap readahead algorithm.  It's from Hugh and
      I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Test shows Vanilla is slightly better in sequential workload than Hugh's
      patch.  I observed with Hugh's patch sometimes the readahead size is
      shrinked too fast (from 8 to 1 immediately) in sequential workload if
      there is no hit.  And in such case, continuing doing readahead is good
      actually.
      
      I don't prepare a sophisticated algorithm for the sequential workload
      because so far we can't guarantee sequential accessed pages are swap out
      sequentially.  So I slightly change Hugh's heuristic - don't shrink
      readahead size too fast.
      
      Here is my test result (unit second, 3 runs average):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      Attached graph is the swapin/swapout throughput I collected with 'vmstat
      2'.  The first part is running a random workload (till around 1200 of
      the x-axis) and the second part is running a sequential workload.
      swapin and swapout throughput are almost identical in steady state in
      both workloads.  These are expected behavior.  while in Vanilla, swapin
      is much bigger than swapout especially in random workload (because wrong
      readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      579f8290
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: fix racy rule deletion · 0165d932
      Pablo Neira Ayuso authored
      
      
      We may lost race if we flush the rule-set (which happens asynchronously
      via call_rcu) and we try to remove the table (that userspace assumes
      to be empty).
      
      Fix this by recovering synchronous rule and chain deletion. This was
      introduced time ago before we had no batch support, and synchronous
      rule deletion performance was not good. Now that we have the batch
      support, we can just postpone the purge of old rule in a second step
      in the commit phase. All object deletions are synchronous after this
      patch.
      
      As a side effect, we save memory as we don't need rcu_head per rule
      anymore.
      
      Cc: Patrick McHardy <kaber@trash.net>
      Reported-by: default avatarArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0165d932
    • Lars-Peter Clausen's avatar
      gpio: consumer.h: Move forward declarations outside #ifdef · a3485d08
      Lars-Peter Clausen authored
      
      
      Make sure that the forward declared structs in gpio/consumer.h are also visible
      on the else branch of the CONFIG_GPIOLIB #ifdef.
      
      Fixes the following warnings and their associated errors when CONFIG_GPIOLIB is
      not selected:
      	include/linux/gpio/consumer.h:67:14: warning: 'struct device' declared inside parameter list
      	include/linux/gpio/consumer.h:67:14: warning: its scope is only this definition or declaration, which is probably not what you want
      	[...]
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Reviewed-by: default avatarAlexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      a3485d08
    • Patrick McHardy's avatar
      netfilter: nf_tables: add reject module for NFPROTO_INET · 05513e9e
      Patrick McHardy authored
      
      
      Add a reject module for NFPROTO_INET. It does nothing but dispatch
      to the AF-specific modules based on the hook family.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      05513e9e
    • Patrick McHardy's avatar
      netfilter: nft_reject: split up reject module into IPv4 and IPv6 specifc parts · cc4723ca
      Patrick McHardy authored
      
      
      Currently the nft_reject module depends on symbols from ipv6. This is
      wrong since no generic module should force IPv6 support to be loaded.
      Split up the module into AF-specific and a generic part.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      cc4723ca
  10. 05 Feb, 2014 5 commits
    • Patrick McHardy's avatar
      netfilter: nf_tables: add AF specific expression support · 64d46806
      Patrick McHardy authored
      
      
      For the reject module, we need to add AF-specific implementations to
      get rid of incorrect module dependencies. Try to load an AF-specific
      module first and fall back to generic modules.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      64d46806
    • Linus Torvalds's avatar
      execve: use 'struct filename *' for executable name passing · c4ad8f98
      Linus Torvalds authored
      
      
      This changes 'do_execve()' to get the executable name as a 'struct
      filename', and to free it when it is done.  This is what the normal
      users want, and it simplifies and streamlines their error handling.
      
      The controlled lifetime of the executable name also fixes a
      use-after-free problem with the trace_sched_process_exec tracepoint: the
      lifetime of the passed-in string for kernel users was not at all
      obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
      the pathname allocation lifetime with the execve() having finished,
      which in turn meant that the trace point that happened after
      mm_release() of the old process VM ended up using already free'd memory.
      
      To solve the kernel string lifetime issue, this simply introduces
      "getname_kernel()" that works like the normal user-space getname()
      function, except with the source coming from kernel memory.
      
      As Oleg points out, this also means that we could drop the tcomm[] array
      from 'struct linux_binprm', since the pathname lifetime now covers
      setup_new_exec().  That would be a separate cleanup.
      Reported-by: default avatarIgor Zhbanov <i.zhbanov@samsung.com>
      Tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c4ad8f98
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · e53376be
      Pablo Neira Ayuso authored
      
      
      With this patch, the conntrack refcount is initially set to zero and
      it is bumped once it is added to any of the list, so we fulfill
      Eric's golden rule which is that all released objects always have a
      refcount that equals zero.
      
      Andrey Vagin reports that nf_conntrack_free can't be called for a
      conntrack with non-zero ref-counter, because it can race with
      nf_conntrack_find_get().
      
      A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
      ref-counter says that this conntrack is used. So when we release
      a conntrack with non-zero counter, we break this assumption.
      
      CPU1                                    CPU2
      ____nf_conntrack_find()
                                              nf_ct_put()
                                               destroy_conntrack()
                                              ...
                                              init_conntrack
                                               __nf_conntrack_alloc (set use = 1)
      atomic_inc_not_zero(&ct->use) (use = 2)
                                               if (!l4proto->new(ct, skb, dataoff, timeouts))
                                                nf_conntrack_free(ct); (use = 2 !!!)
                                              ...
                                              __nf_conntrack_alloc (set use = 1)
       if (!nf_ct_key_equal(h, tuple, zone))
        nf_ct_put(ct); (use = 0)
         destroy_conntrack()
                                              /* continue to work with CT */
      
      After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
      race in nf_conntrack_find_get" another bug was triggered in
      destroy_conntrack():
      
      <4>[67096.759334] ------------[ cut here ]------------
      <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
      ...
      <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
      <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
      <4>[67096.760255] Call Trace:
      <4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
      <4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
      <4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
      <4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
      <4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
      <4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
      <4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
      <4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
      <4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
      <4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
      <4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
      <4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
      <4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
      <4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      
      I have reused the original title for the RFC patch that Andrey posted and
      most of the original patch description.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Vagin <avagin@parallels.com>
      Cc: Florian Westphal <fw@strlen.de>
      Reported-by: default avatarAndrew Vagin <avagin@parallels.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAndrew Vagin <avagin@parallels.com>
      e53376be
    • Geert Uytterhoeven's avatar
      of/device: Nullify match table in of_match_device() for CONFIG_OF=n · 1db73ae3
      Geert Uytterhoeven authored
      
      
      If the of_device_id table inside a device driver is protected by #ifdef
      CONFIG_OF, the driver still has to provide a dummy declaration of the
      table, or wrap it inside of_match_ptr(), when calling of_match_device()
      in the CONFIG_OF=n case, else the driver fails to compile with e.g.
      
      drivers/spi/spi-rspi.c: In function 'rspi_probe':
      drivers/spi/spi-rspi.c:1203:26: error: 'rspi_of_match' undeclared (first use in this function)
      drivers/spi/spi-rspi.c:1203:26: note: each undeclared identifier is reported only once for each function it appears in
      
      Make of_match_device() nullify the table pointer if CONFIG_OF=n to fix
      this.
      Reported-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@linux-m68k.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      1db73ae3
    • Rob Herring's avatar
      of: restructure for_each macros to fix compile warnings · 662372e4
      Rob Herring authored
      Commit 00b2c76a
      
       "include/linux/of.h: make for_each_child_of_node()
      reference its args when CONFIG_OF=n" fixed warnings for unused
      variables, but introduced variable "used uninitialized" warnings.
      Simply initializing the variables would result in "set but not used"
      warnings with W=1.
      
      Fix both types of warnings by making all the for_each macros
      unconditional and rely on the dummy static inline functions to
      initialize and reference any variables.
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Acked-by: default avatarGrant Likely <grant.likely@linaro.org>
      662372e4