1. 23 Apr, 2014 13 commits
    • cgroup: implement dynamic subtree controller enable/disable on the default hierarchy · f8f22e53
      Tejun Heo authored
      
      
      cgroup is switching away from multiple hierarchies and will use one
      unified default hierarchy where controllers can be dynamically enabled
      and disabled per subtree.  The default hierarchy will serve as the
      unified hierarchy to which all controllers are attached and a css on
      the default hierarchy would need to also serve the tasks of descendant
      cgroups which don't have the controller enabled - i.e. the tree may be
      collapsed from leaf towards root when viewed from specific
      controllers.  This has been implemented through effective css in the
      previous patches.
      
      This patch finally implements dynamic subtree controller
      enable/disable on the default hierarchy via a new knob -
      "cgroup.subtree_control" which controls which controllers are enabled
      on the child cgroups.  Let's assume a hierarchy like the following.
      
        root - A - B - C
                     \ D
      
      root's "cgroup.subtree_control" determines which controllers are
      enabled on A.  A's on B.  B's on C and D.  This coincides with the
      fact that controllers on the immediate sub-level are used to
      distribute the resources of the parent.  In fact, it's natural to
      assume that resource control knobs of a child belong to its parent.
      Enabling a controller in "cgroup.subtree_control" declares that
      distribution of the respective resources of the cgroup will be
      controlled.  Note that this means that controller enable states are
      shared among siblings.
      
      The default hierarchy has an extra restriction - only cgroups which
      don't contain any task may have controllers enabled in
      "cgroup.subtree_control".  Combined with the other properties of the
      default hierarchy, this guarantees that, from the viewpoint of
      controllers, tasks are only on the leaf cgroups.  In other words, only
      leaf csses may contain tasks.  This rules out situations where child
      cgroups compete against internal tasks of the parent, which is a
      competition between two different types of entities without any clear
      way to determine resource distribution between the two.  Different
      controllers handle it differently and all the implemented behaviors
      are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having these
      structural constraints imposed from cgroup core removes the burden
      from controller implementations and enables showing one consistent
      behavior across all controllers.
      
      When a controller is enabled or disabled, css associations for the
      controller in the subtrees of each child should be updated.  After
      enabling, the whole subtree of a child should point to the new css of
      the child.  After disabling, the whole subtree of a child should point
      to the cgroup's css.  This is implemented by first updating cgroup
      states such that cgroup_e_css() result points to the appropriate css
      and then invoking cgroup_update_dfl_csses() which migrates all tasks
      in the affected subtrees to the self cgroup on the default hierarchy.
      
      * When read, "cgroup.subtree_control" lists all the currently enabled
        controllers on the children of the cgroup.
      
      * A whitespace-separated list of controller names prefixed with either
        '+' or '-' can be written to "cgroup.subtree_control".  The ones
        prefixed with '+' are enabled and the ones prefixed with '-' are
        disabled.
      
      * A controller can be enabled iff the parent's
        "cgroup.subtree_control" enables it and disabled iff no child's
        "cgroup.subtree_control" has it enabled.
      
      * If a cgroup has tasks, no controller can be enabled via
        "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
        some controllers enabled, tasks can't be migrated into the cgroup.
      
      * All controllers which aren't bound on other hierarchies are
        automatically associated with the root cgroup of the default
        hierarchy.  All the controllers which are bound to the default
        hierarchy are listed in the read-only file "cgroup.controllers" in
        the root directory.
      
      * "cgroup.controllers" in all non-root cgroups is a read-only file whose
        content is equal to that of "cgroup.subtree_control" of the parent.
        This indicates which controllers can be used in the cgroup's
        "cgroup.subtree_control".
      
      This is still experimental and there are some holes, one of which is
      that ->can_attach() failure during cgroup_update_dfl_csses() may leave
      the cgroups in an undefined state.  The issues will be addressed by
      future patches.
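
      The write format and the enable rules listed above can be sketched
      in userspace C.  This is a minimal illustration, not the kernel
      code: the controller table, the toy_ names, the -1 error
      convention and "a later token overrides an earlier one" are all
      assumptions, and the disable rule (no child may still have the
      controller enabled) is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical controller table: bit i of a mask stands for toy_names[i]. */
static const char *const toy_names[] = { "cpu", "memory", "io" };
#define TOY_NR (sizeof(toy_names) / sizeof(toy_names[0]))

/*
 * Parse a whitespace-separated list such as "+cpu -memory".
 * Returns 0 and fills the two masks, or -1 on a missing prefix or an
 * unknown controller name.  Assumption: a later token for the same
 * controller overrides an earlier one.
 */
static int toy_parse_subtree_control(const char *buf, unsigned int *enable,
                                     unsigned int *disable)
{
    assert(buf && enable && disable);
    *enable = 0;
    *disable = 0;

    for (;;) {
        size_t len, i;
        char op;

        buf += strspn(buf, " \t\n");        /* skip separators */
        if (!*buf)
            return 0;

        op = *buf++;
        if (op != '+' && op != '-')
            return -1;

        len = strcspn(buf, " \t\n");        /* length of the name */
        for (i = 0; i < TOY_NR; i++)
            if (strlen(toy_names[i]) == len &&
                !strncmp(buf, toy_names[i], len))
                break;
        if (i == TOY_NR)
            return -1;

        if (op == '+') {
            *enable |= 1u << i;
            *disable &= ~(1u << i);
        } else {
            *disable |= 1u << i;
            *enable &= ~(1u << i);
        }
        buf += len;
    }
}

/*
 * A controller may be enabled only if the parent's subtree_control has
 * it and the cgroup itself holds no tasks.  Returns 0 if allowed.
 */
static int toy_check_enable(unsigned int enable,
                            unsigned int parent_subtree_control,
                            int has_tasks)
{
    if (enable & ~parent_subtree_control)
        return -1;
    if (enable && has_tasks)
        return -1;
    return 0;
}
```

      With this table, "+cpu -memory" yields an enable mask of 0x1 and a
      disable mask of 0x2; the enable request is then rejected if the
      parent does not offer "cpu" or if the cgroup still holds tasks.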
      
      v2: Non-root cgroups now also have "cgroup.controllers".
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: prepare migration path for unified hierarchy · f817de98
      Tejun Heo authored
      
      
      Unified hierarchy implementation would require re-migrating tasks onto
      the same cgroup on the default hierarchy to reflect updated effective
      csses.  Update cgroup_migrate_prepare_dst() so that it accepts NULL as
      the destination cgrp.  When NULL is specified, the destination is
      considered to be the cgroup on the default hierarchy associated with
      each css_set.
      
      After this change, the identity check in cgroup_migrate_add_src()
      isn't sufficient for noop detection as the associated csses may change
      without any cgroup association changing.  The only way to tell whether
      a migration is noop or not is testing whether the source and
      destination csets are identical.  The noop check in
      cgroup_migrate_add_src() is removed and cset identity test is added to
      cgroup_migrate_prepare_dst().  If it's detected that source and
      destination csets are identical, the cset is removed from
      @preloaded_csets and all the migration nodes are cleared which makes
      cgroup_migrate() ignore the cset.
      
      Also, make the function append the destination css_sets to
      @preloaded_list so that destination css_sets always come after source
      css_sets.
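
      The noop detection described above can be sketched as follows.
      This is a toy model under stated assumptions: struct toy_cset, its
      fields and toy_drop_noops() are hypothetical, and a plain array
      stands in for the preloaded list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy css_set: @dst is the set its tasks would be migrated to. */
struct toy_cset {
    struct toy_cset *dst;
    int preloaded;          /* still queued for migration? */
};

/*
 * Compact @list in place, dropping every entry whose destination is the
 * set itself.  Returns the number of entries that remain.
 */
static size_t toy_drop_noops(struct toy_cset **list, size_t n)
{
    size_t i, kept = 0;

    assert(list || n == 0);
    for (i = 0; i < n; i++) {
        struct toy_cset *src = list[i];

        if (src->dst == src) {      /* identical csets: nothing to migrate */
            src->dst = NULL;
            src->preloaded = 0;
            continue;
        }
        list[kept++] = src;
    }
    return kept;
}
```

      Comparing the cset pointers, rather than the cgroups, is what
      catches the case where the effective csses changed while the
      cgroup association stayed the same.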
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: update subsystem rebind restrictions · 7fd8c565
      Tejun Heo authored
      
      
      Because the default root couldn't have any non-root csses attached to
      it, rebinding away from it was always allowed; however, the default
      hierarchy will soon host the unified hierarchy and have non-root csses
      so the rebind restrictions need to be updated accordingly.
      
      Instead of special casing rebinding from the default hierarchy and
      then checking whether the source hierarchy has children cgroups, which
      implies non-root csses for !dfl hierarchies, simply check whether the
      source hierarchy has non-root csses for the subsystem using
      css_next_child().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: add css_set->dfl_cgrp · 6803c006
      Tejun Heo authored
      
      
      To implement the unified hierarchy behavior, we'll need to be able to
      determine the associated cgroup on the default hierarchy from css_set.
      Let's add css_set->dfl_cgrp so that it can be accessed conveniently
      and efficiently.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: allow cgroup creation and suppress automatic css creation in the unified hierarchy · bd53d617
      Tejun Heo authored
      
      
      Now that effective css handling has been added and iterators updated
      accordingly, it's safe to allow cgroup creation in the default
      hierarchy.  Unblock cgroup creation in the default hierarchy.
      
      As the default hierarchy will implement explicit enabling and
      disabling of controllers on each cgroup, suppress automatic css
      enabling on cgroup creation.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: cgroup->subsys[] should be cleared after the css is offlined · e3297803
      Tejun Heo authored
      
      
      After a css finishes offlining, offline_css() mistakenly performs
      RCU_INIT_POINTER(css->cgroup->subsys[ss->id], css) which just sets the
      cgroup->subsys[] pointer to the current value.  The intention was to
      clear it after offline is complete, not reassign the same value.
      
      Update it to assign NULL instead of the current value.  This makes
      cgroup_css() return NULL once offline is complete.  All the
      existing users of the function either can handle NULL return already
      or guarantee that the css doesn't get offlined.
      
      While this is a bugfix, as css lifetime is currently tied to the
      cgroup it belongs to, this bug doesn't cause any actual problems.
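
      A minimal sketch of the corrected behavior, assuming toy types and
      a plain pointer assignment in place of RCU_INIT_POINTER().  The
      toy_ names are hypothetical and only the slot-clearing step is
      modeled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SUBSYS_COUNT 2

struct toy_cgroup;

struct toy_css {
    struct toy_cgroup *cgroup;
    int ssid;
    int online;
};

struct toy_cgroup {
    struct toy_css *subsys[TOY_SUBSYS_COUNT];
};

/* Toy counterpart of cgroup_css(): just read the slot. */
static struct toy_css *toy_cgroup_css(struct toy_cgroup *cgrp, int ssid)
{
    return cgrp->subsys[ssid];
}

static void toy_offline_css(struct toy_css *css)
{
    assert(css && css->cgroup);
    css->online = 0;
    /*
     * The buggy form stored @css back into the slot, which changes
     * nothing.  The fix stores NULL.  A plain assignment stands in for
     * RCU_INIT_POINTER() here.
     */
    css->cgroup->subsys[css->ssid] = NULL;
}
```

      After toy_offline_css() the lookup returns NULL, which is the
      behavior callers must either tolerate or rule out.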
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: teach css_task_iter about effective csses · 3ebb2b6e
      Tejun Heo authored
      
      
      Currently, css_task_iter iterates tasks associated with a css by
      visiting each css_set associated with the owning cgroup and walking
      tasks of each of them.  This works fine for !unified hierarchies as
      each cgroup has its own css for each associated subsystem on the
      hierarchy; however, on the planned unified hierarchy, a cgroup may not
      have csses associated and its tasks would be considered associated
      with the matching css of the nearest ancestor which has the subsystem
      enabled.
      
      This means that on the default unified hierarchy, just walking all
      tasks associated with a cgroup isn't enough to walk all tasks which
      are associated with the specified css.  If any of its children doesn't
      have the matching css enabled, task iteration should also include all
      tasks from the subtree.  We already added cgroup->e_csets[] to list
      all css_sets effectively associated with a given css and walk css_sets
      on that list instead to achieve such iteration.
      
      This patch updates css_task_iter iteration such that it walks css_sets
      on cgroup->e_csets[] instead of cgroup->cset_links if iteration is
      requested on a non-dummy css.  Thanks to the previous iteration
      update, this change can be achieved with the addition of
      css_task_iter->ss and minimal updates to css_advance_task_iter() and
      css_task_iter_start().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: reorganize css_task_iter · 0f0a2b4f
      Tejun Heo authored
      
      
      This patch reorganizes css_task_iter so that adding effective css
      support is easier.
      
      * s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency
      
      * ->origin_css is used to determine whether the iteration reached the
        last css_set.  Replace it with explicit ->cset_head so that
        css_advance_task_iter() doesn't have to know the termination
        condition directly.
      
      * css_task_iter_next() currently assumes that it's walking list of
        cgrp_cset_link and reaches into the current cset through the current
        link to determine the termination conditions for task walking.  As
        this won't always be true for effective css walking, add
        ->tasks_head and ->mg_tasks_head and use them to control task
        walking so that css_task_iter_next() doesn't have to know how
        css_sets are being walked.
      
      This patch doesn't make any behavior changes.  The iteration logic
      stays unchanged after the patch.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make css_next_child() skip missing csses · 3b281afb
      Tejun Heo authored
      
      
      css_next_child() walks the children of the specified css.  It does
      this by finding the next cgroup and then returning the requested css.
      On the default unified hierarchy, a cgroup may not have a css
      associated with it even if the hierarchy has the subsystem enabled.
      This patch updates css_next_child() so that it skips children without
      the requested css associated.
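
      The skipping behavior can be illustrated with a toy sibling list.
      This is a sketch only: the types, the single-subsystem
      simplification and the toy_next_child_css() signature are
      assumptions and do not match the real css_next_child() interface.

```c
#include <assert.h>
#include <stddef.h>

struct toy_css { int id; };

/* Toy cgroup: one subsystem only, siblings chained via @next_sibling. */
struct toy_cgroup {
    struct toy_cgroup *next_sibling;
    struct toy_css *css;        /* NULL if the controller is not enabled */
};

/*
 * Return the css of the first cgroup at or after *@pos that has one and
 * advance *@pos past it.  Returns NULL once the sibling list is
 * exhausted.
 */
static struct toy_css *toy_next_child_css(struct toy_cgroup **pos)
{
    struct toy_cgroup *c;

    assert(pos);
    for (c = *pos; c; c = c->next_sibling) {
        if (c->css) {
            *pos = c->next_sibling;
            return c->css;
        }
    }
    *pos = NULL;
    return NULL;
}
```

      A child without a css is passed over silently, so callers only
      ever see children that actually have the requested css.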
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: implement cgroup->e_csets[] · 2d8f243a
      Tejun Heo authored
      
      
      On the default unified hierarchy, a cgroup may be associated with
      csses of its ancestors, which means that a css of a given cgroup may
      be associated with css_sets of descendant cgroups.  This means that we
      can't walk all tasks associated with a css by iterating the css_sets
      associated with the cgroup as there are css_sets which are pointing to
      the css but linked on the descendants.
      
      This patch adds per-subsystem list heads cgroup->e_csets[].  Any
      css_set which is pointing to a css is linked to
      css->cgroup->e_csets[$SUBSYS_ID] through
      css_set->e_cset_node[$SUBSYS_ID].  The lists are protected by
      css_set_rwsem and will allow us to walk all css_sets associated with a
      given css so that we can find out all associated tasks.
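
      A minimal sketch of the per-subsystem lists, assuming singly
      linked lists in place of the kernel's list heads and leaving out
      the css_set_rwsem locking.  All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SUBSYS_COUNT 2

struct toy_css_set {
    int nr_tasks;
    /* stand-in for e_cset_node[]: next set on the same per-subsys list */
    struct toy_css_set *e_next[TOY_SUBSYS_COUNT];
};

struct toy_cgroup {
    /* stand-in for e_csets[]: one list head per subsystem */
    struct toy_css_set *e_csets[TOY_SUBSYS_COUNT];
};

/* Link @cset onto the list of the cgroup that owns its css for @ssid. */
static void toy_link_e_cset(struct toy_cgroup *css_owner,
                            struct toy_css_set *cset, int ssid)
{
    assert(ssid >= 0 && ssid < TOY_SUBSYS_COUNT);
    cset->e_next[ssid] = css_owner->e_csets[ssid];
    css_owner->e_csets[ssid] = cset;
}

/* Count every task whose css for @ssid is the one owned by @cgrp. */
static int toy_count_tasks(const struct toy_cgroup *cgrp, int ssid)
{
    const struct toy_css_set *cset;
    int n = 0;

    assert(ssid >= 0 && ssid < TOY_SUBSYS_COUNT);
    for (cset = cgrp->e_csets[ssid]; cset; cset = cset->e_next[ssid])
        n += cset->nr_tasks;
    return n;
}
```

      If a child does not have subsystem 0 enabled, its css_set is
      linked on the parent's list for subsystem 0, so walking that one
      list reaches the tasks of both cgroups.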
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: introduce effective cgroup_subsys_state · aec3dfcb
      Tejun Heo authored
      
      
      In the planned default unified hierarchy, controllers may get
      dynamically attached to and detached from a cgroup and a cgroup may
      not have csses for all the controllers associated with the hierarchy.
      
      When a cgroup doesn't have its own css for a given controller, the css
      of the nearest ancestor with the controller enabled will be used,
      which is called the effective css.  This patch introduces
      cgroup_e_css() and for_each_e_css() to access the effective csses and
      convert compare_css_sets(), find_existing_css_set() and
      cgroup_migrate() to use the effective csses so that they can handle
      cgroups with partial csses correctly.
      
      This means that for two css_sets to be considered identical, they
      should have both matching csses and cgroups.  compare_css_sets()
      already compares both, not for correctness but for optimization.  As
      this now becomes a matter of correctness, update the comments
      accordingly.
      
      For all !default hierarchies, cgroup_e_css() always equals
      cgroup_css(), so this patch doesn't change behavior.
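
      The effective css lookup can be sketched as a walk towards the
      root.  This is a toy model, not the kernel's cgroup_e_css(): the
      types and the toy_e_css() name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SUBSYS_COUNT 2

struct toy_css { int id; };

struct toy_cgroup {
    struct toy_cgroup *parent;                  /* NULL for the root */
    struct toy_css *subsys[TOY_SUBSYS_COUNT];   /* NULL if not enabled */
};

/*
 * Return the cgroup's own css for @ssid if it has one, otherwise the
 * css of the nearest ancestor that does.  NULL if no ancestor has it.
 */
static struct toy_css *toy_e_css(struct toy_cgroup *cgrp, int ssid)
{
    assert(ssid >= 0 && ssid < TOY_SUBSYS_COUNT);
    for (; cgrp; cgrp = cgrp->parent)
        if (cgrp->subsys[ssid])
            return cgrp->subsys[ssid];
    return NULL;
}
```

      When every cgroup has its own css, as on non-default hierarchies,
      the loop returns on the first iteration and the result equals the
      cgroup's own css.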
      
      While at it, fix incorrect locking comment for for_each_css().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask · f392e51c
      Tejun Heo authored
      94419627 ("cgroup: move ->subsys_mask from cgroupfs_root to cgroup")
      moved ->subsys_mask from cgroup_root to cgroup to prepare for the
      unified hierarchy; however, it turns out that carrying the
      subsys_mask of the children in the parent, instead of itself, is a lot
      more natural.  This patch restores cgroup_root->subsys_mask and morphs
      cgroup->subsys_mask into cgroup->child_subsys_mask.
      
      * Uses of root->cgrp.subsys_mask are restored to root->subsys_mask.
      
      * Remove automatic setting and clearing of cgrp->subsys_mask and
        instead just inherit ->child_subsys_mask from the parent during
        cgroup creation.  Note that this doesn't affect any current
        behaviors.
      
      * Undo __kill_css() separation.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: cgroup_apply_cftypes() shouldn't skip the default hierarhcy · ea8fd3b4
      Tejun Heo authored
      
      
      cgroup_apply_cftypes() skips creating or removing files if the
      subsystem is attached to the default hierarchy, which led to missing
      files in the root of the default hierarchy.
      
      Skipping made sense when the default hierarchy was dummy; however, now
      that the default hierarchy is fully functional and planned to be used
      as the unified hierarchy, it shouldn't be skipped over.
      
      Reported-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 17 Apr, 2014 1 commit
    • cgroup: fix the retry path of cgroup_mount() · e37a06f1
      Li Zefan authored
      
      
      If we hit the retry path, we'll call parse_cgroupfs_options() again,
      but the string we pass to it has been modified by the previous call
      to this function.
      
      This bug can be observed by:
      
        # mount -t cgroup -o name=foo,cpuset xxx /mnt && umount /mnt && \
          mount -t cgroup -o name=foo,cpuset xxx /mnt
        mount: wrong fs type, bad option, bad superblock on xxx,
               missing codepage or helper program, or other error
        ...
      
      The second mount passed "name=foo,cpuset" to the parser, and then it
      hit the retry path and called the parser again, but this time the
      string passed to the parser was "name=foo".
      
      To fix this, we avoid calling parse_cgroupfs_options() again in this
      case.
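
      The failure mode can be reproduced with any parser that splits its
      input in place.  The sketch below is an assumption-laden stand-in:
      toy_parse_options() is hypothetical and only mimics the
      destructive splitting, not the real option handling.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical destructive parser: splits @data on ',' in place and
 * returns the number of options seen.
 */
static int toy_parse_options(char *data)
{
    int count = 0;
    char *p = data;

    assert(data);
    while (*p) {
        char *comma = strchr(p, ',');

        count++;
        if (!comma)
            break;
        *comma = '\0';      /* the caller's buffer is modified here */
        p = comma + 1;
    }
    return count;
}
```

      Parsing "name=foo,cpuset" once reports two options, but a second
      pass over the same buffer sees only "name=foo".  Parsing once and
      reusing the result on retry sidesteps the problem.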
      
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 13 Apr, 2014 12 commits
    • Linux 3.15-rc1 · c9eaa447
      Linus Torvalds authored
    • mm: Initialize error in shmem_file_aio_read() · f7c1d074
      Geert Uytterhoeven authored
      Some versions of gcc even warn about it:
      
        mm/shmem.c: In function ‘shmem_file_aio_read’:
        mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function
      
      If the loop is aborted during the first iteration by one of the two
      first break statements, error will be uninitialized.
      
      Introduced by commit 6e58e79d ("introduce copy_page_to_iter, kill
      loop over iovec in generic_file_aio_read()").
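
      The pattern behind the warning can be shown with a toy read loop.
      This is a sketch under assumptions: toy_read(), its input encoding
      and the "bytes copied, else error" return convention are
      illustrative and are not taken from mm/shmem.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy read loop: each entry of @chunk is a byte count, 0 for "nothing
 * more to read", or a negative error code.  Returns the bytes copied,
 * or the error if nothing was copied.
 */
static long toy_read(const int *chunk, size_t n)
{
    long copied = 0;
    int error = 0;      /* the fix: defined even if the loop breaks early */
    size_t i;

    assert(chunk || n == 0);
    for (i = 0; i < n; i++) {
        if (chunk[i] == 0)
            break;              /* early exit that never assigns error */
        if (chunk[i] < 0) {
            error = chunk[i];
            break;
        }
        copied += chunk[i];
    }
    return copied ? copied : error;
}
```

      Without the initializer, the first break would let the function
      return an indeterminate value whenever nothing had been copied.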
      
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cifs: Use min_t() when comparing "size_t" and "unsigned long" · e686bd8d
      Geert Uytterhoeven authored
      On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
      following warning when comparing with PAGE_SIZE, which is always "unsigned
      long":
      
        fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
        fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
      
      Introduced by commit 7f25bba8 ("cifs_iovec_read: keep iov_iter
      between the calls of cifs_readdata_to_iov()"), which changed the
      signedness of "remaining" and the code from min_t() to min().
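
      The idea behind min_t() is to cast both operands to one named type
      before comparing them.  The macro below is a simplified userspace
      stand-in, not the kernel definition: it evaluates its arguments
      more than once, and toy_min_t() and toy_chunk() are hypothetical
      names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for min_t(): compare after casting to @type. */
#define toy_min_t(type, a, b) \
    ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Clamp @remaining (a size_t) to @page_size (an unsigned long).  On a
 * 32-bit target the two types differ, so the comparison type is named
 * explicitly.
 */
static size_t toy_chunk(size_t remaining, unsigned long page_size)
{
    assert(page_size != 0);
    return toy_min_t(size_t, remaining, page_size);
}
```

      Naming the type at the call site makes the conversion explicit,
      which is what avoids the distinct-pointer-types warning that the
      type-checking min() produces.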
      
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux · bf3a3407
      Linus Torvalds authored
      Pull slab changes from Pekka Enberg:
       "The biggest change is byte-sized freelist indices which reduces slab
        freelist memory usage:
      
          https://lkml.org/lkml/2013/12/2/64"
      
      * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
        mm: slab/slub: use page->list consistently instead of page->lru
        mm/slab.c: cleanup outdated comments and unify variables naming
        slab: fix wrongly used macro
        slub: fix high order page allocation problem with __GFP_NOFAIL
        slab: Make allocations with GFP_ZERO slightly more efficient
        slab: make more slab management structure off the slab
        slab: introduce byte sized index for the freelist of a slab
        slab: restrict the number of objects in a slab
        slab: introduce helper functions to get/set free object
        slab: factor out calculate nr objects in cache_estimate
    • Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 321d03c8
      Linus Torvalds authored
      Pull misc kbuild changes from Michal Marek:
       "Here is the non-critical part of kbuild:
         - One bogus coccinelle check removed, one check fixed not to suggest
           the obsolete PTR_RET macro
         - scripts/tags.sh does not index the generated *.mod.c files
         - new objdiff tool to list differences between two versions of an
           object file
         - A fix for scripts/bootgraph.pl"
      
      * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        scripts/coccinelle: Use PTR_ERR_OR_ZERO
        scripts/bootgraph.pl: Add graphic header
        scripts: objdiff: detect object code changes between two commits
        Coccicheck: Remove memcpy to struct assignment test
        scripts/tags.sh: Ignore *.mod.c
    • sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue · fd1232b2
      Mikulas Patocka authored
      
      
      This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
      returns QUEUE FULL status.
      
      When the controller encounters an error (including QUEUE FULL or BUSY
      status), it aborts all not yet submitted requests in the function
      sym_dequeue_from_squeue.
      
      This function aborts them with DID_SOFT_ERROR.
      
      If the disk has a full tag queue, the request that caused the overflow is
      aborted with QUEUE FULL status (and the scsi midlayer properly retries
      it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
      the following requests with DID_SOFT_ERROR --- for them, the midlayer
      does just a few retries and then signals the error up to sd.
      
      The result is that a disk returning QUEUE FULL causes request failures.
      
      The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
      (rebranded ST336607LC) with command queue 48 or 64 tags.  The disk has
      64 tags, but under some access patterns it returns QUEUE FULL when there
      are fewer than 64 pending tags.  The SCSI specification allows returning
      QUEUE FULL anytime and it is up to the host to retry.
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc: Don't try to set LPCR unless we're in hypervisor mode · 18aa0da3
      Paul Mackerras authored
      Commit 8f619b54 ("powerpc/ppc64: Do not turn AIL (reloc-on
      interrupts) too early") added code to set the AIL bit in the LPCR
      without checking whether the kernel is running in hypervisor mode.  The
      result is that when the kernel is running as a guest (i.e., under
      PowerKVM or PowerVM), the processor takes a privileged instruction
      interrupt at that point, causing a panic.  The visible result is that
      the kernel hangs after printing "returning from prom_init".
      
      This fixes it by checking for hypervisor mode being available before
      setting LPCR.  If we are not in hypervisor mode, we enable relocation-on
      interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • futex: update documentation for ordering guarantees · d7e8af1a
      Davidlohr Bueso authored
      Commits 11d4616b ("futex: revert back to the explicit waiter
      counting code") and 69cd9eba ("futex: avoid race between requeue and
      wake") changed some of the finer details of how we think about futexes.
      One was a late fix and the other a consequence of overlooking the whole
      requeuing logic.
      
      The first change caused our documentation to be incorrect, and the
      second made us aware that we need to explicitly add more details to it.
      
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 454fd351
      Linus Torvalds authored
      Pull yet more networking updates from David Miller:
      
       1) Various fixes to the new Redpine Signals wireless driver, from
          Fariya Fatima.
      
       2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
          Dmitry Petukhov.
      
       3) UFO and TSO packets differ in whether they include the protocol
          header in gso_size, account for that in skb_gso_transport_seglen().
         From Florian Westphal.
      
       4) If VLAN untagging fails, we double free the SKB in the bridging
          output path.  From Toshiaki Makita.
      
       5) Several call sites of sk->sk_data_ready() were referencing an SKB
          just added to the socket receive queue in order to calculate the
          second argument via skb->len.  This is dangerous because the moment
          the skb is added to the receive queue it can be consumed in another
          context and freed up.
      
          It turns out also that none of the sk->sk_data_ready()
          implementations even care about this second argument.
      
          So just kill it off and thus fix all these use-after-free bugs as a
          side effect.
      
       6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
      
       7) pktgen needs to do locking properly for LLTX devices, from Daniel
          Borkmann.
      
       8) xen-netfront driver initializes TX array entries in RX loop :-) From
          Vincenzo Maffione.
      
       9) After refactoring, some tunnel drivers allow a tunnel to be
          configured on top of itself.  Fix from Nicolas Dichtel.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        vti: don't allow to add the same tunnel twice
        gre: don't allow to add the same tunnel twice
        drivers: net: xen-netfront: fix array initialization bug
        pktgen: be friendly to LLTX devices
        r8152: check RTL8152_UNPLUG
        net: sun4i-emac: add promiscuous support
        net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
        net: ipv6: Fix oif in TCP SYN+ACK route lookup.
        drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
        drivers: net: cpsw: discard all packets received when interface is down
        net: Fix use after free by removing length arg from sk_data_ready callbacks.
        Drivers: net: hyperv: Address UDP checksum issues
        Drivers: net: hyperv: Negotiate suitable ndis version for offload support
        Drivers: net: hyperv: Allocate memory for all possible per-pecket information
        bridge: Fix double free and memory leak around br_allowed_ingress
        bonding: Remove debug_fs files when module init fails
        i40evf: program RSS LUT correctly
        i40evf: remove open-coded skb_cow_head
        ixgb: remove open-coded skb_cow_head
        igbvf: remove open-coded skb_cow_head
        ...
    • Merge tag 'blackfin-for-linus' of... · fd18f00d
      Linus Torvalds authored
      Merge tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux
      
      Pull blackfin updates from Steven Miao:
       "Code cleanup, some previously ignored patches, and bug fixes"
      
      * tag 'blackfin-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
        blackfin: cleanup board files
        bf609: clock: drop unused clock bit set/clear functions
        Blackfin: bf537: rename "CONFIG_ADT75"
        Blackfin: bf537: rename "CONFIG_AD7314"
        Blackfin: bf537: rename ad2s120x ->ad2s1200
        blackfin: bf537: fix typo "CONFIG_SND_SOC_ADV80X_MODULE"
        blackfin: dma: current count mmr is read only
        bfin_crc: Move architecture independant crc header file out of the blackfin folder.
        bf54x: drop unuesd HOST status,control,timeout registers bit define macros
        blackfin: portmux: cleanup head file
        Blackfin: remove "config IP_CHECKSUM_L1"
        blackfin: Remove GENERIC_GPIO config option again
        blackfin:Use generic /proc/interrupts implementation
        blackfin: bf60x: fix typo "CONFIG_PM_BFIN_WAKE_PA15_POL"
    • Merge tag 'remoteproc-3.15-cleanups' of... · de0c9cf9
      Linus Torvalds authored
      Merge tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
      
      Pull remoteproc cleanups from Ohad Ben-Cohen:
       "Several remoteproc cleanup patches coming from Jingoo Han, Julia
        Lawall and Uwe Kleine-König"
      
      * tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
        remoteproc/ste_modem: staticize local symbols
        remoteproc/davinci: simplify use of devm_ioremap_resource
        remoteproc/davinci: drop needless devm_clk_put
    • Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel · 09c9b61d
      Linus Torvalds authored
      Pull llvm patches from Behan Webster:
       "These are some initial updates to support compiling the kernel with
        clang.
      
        These patches have been through the proper reviews to the best of my
        ability, and have been soaking in linux-next for a few weeks.  These
        patches by themselves still do not completely allow clang to be used
        with the kernel code, but lay the foundation for other patches which
        are still under review.
      
        Several other of the LLVMLinux patches have been already added via
        maintainer trees"
      
      * tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
        x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
        x86 kbuild: LLVMLinux: More cc-options added for clang
        x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
        LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
        LLVMLinux: Remove warning about returning an uninitialized variable
        kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
        Documentation: LLVMLinux: Update Documentation/dontdiff
        kbuild: LLVMLinux: Adapt warnings for compilation with clang
        kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
      09c9b61d
  4. 12 Apr, 2014 14 commits
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 141eaccd
      Linus Torvalds authored
      Pull SCSI target updates from Nicholas Bellinger:
       "Here are the target pending updates for v3.15-rc1.  Apologies in
        advance for waiting until the second to last day of the merge window
        to send these out.
      
        The highlights this round include:
      
         - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
         - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
         - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
         - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
         - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
         - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
         - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)
      
        Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
        metadata into KVM guest have been left-out for now, as there were a
        few comments from MST + Paolo that were not able to be addressed in
        time for v3.15.  Please expect this feature for v3.16-rc1"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
        ib_srpt: Use correct ib_sg_dma primitives
        target/tcm_fc: Rename ft_tport_create to ft_tport_get
        target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
        target/tcm_fc: Rename structs and list members for clarity
        target/tcm_fc: Limit to 1 TPG per wwn
        target/tcm_fc: Don't export ft_lport_list
        target/tcm_fc: Fix use-after-free of ft_tpg
        target: Add check to prevent Abort Task from aborting itself
        target: Enable READ_STRIP emulation in target_complete_ok_work
        target/sbc: Add sbc_dif_read_strip software emulation
        target: Enable WRITE_INSERT emulation in target_execute_cmd
        target/sbc: Add sbc_dif_generate software emulation
        target/sbc: Only expose PI read_cap16 bits when supported by fabric
        target/spc: Only expose PI mode page bits when supported by fabric
        target/spc: Only expose PI inquiry bits when supported by fabric
        target: Pass in transport supported PI at session initialization
        target/iblock: Fix double bioset_integrity_free bug
        Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
        target/rd: T10-Dif: RAM disk is allocating more space than required.
        iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
        ...
      141eaccd
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 93094449
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A series of bug fix patches for v3.15-rc1.  Most are just driver
        fixes.  There are some changes at remote controller core level, fixing
        some definitions on a new API added for Kernel v3.15.
      
        It also adds the missing include at include/uapi/linux/v4l2-common.h,
        to allow its compilation on userspace, as pointed by you"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits)
        [media] gpsca: remove the risk of a division by zero
        [media] stk1160: warrant a NUL terminated string
        [media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers
        [media] v4l: ti-vpe: Set correct field parameter for output and capture buffers
        [media] v4l: ti-vpe: zero out reserved fields in try_fmt
        [media] v4l: ti-vpe: Fix initial configuration queue data
        [media] v4l: ti-vpe: Use correct bus_info name for the device in querycap
        [media] v4l: ti-vpe: report correct capabilities in querycap
        [media] v4l: ti-vpe: Allow usage of smaller images
        [media] v4l: ti-vpe: Use video_device_release_empty
        [media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs
        [media] lgdt3305: include sleep functionality in lgdt3304_ops
        [media] drx-j: use customise option correctly
        [media] m88rs2000: fix sparse static warnings
        [media] r820t: fix size and init values
        [media] rc-core: remove generic scancode filter
        [media] rc-core: split dev->s_filter
        [media] rc-core: do not change 32bit NEC scancode format for now
        [media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03
        [media] xc2028: add missing break to switch
        ...
      93094449
    • Linus Torvalds's avatar
      Merge tag 'ntb-3.15' of git://github.com/jonmason/ntb · 07f5fef9
      Linus Torvalds authored
      Pull PCIe non-transparent bridge fixes and features from Jon Mason:
       "NTB driver bug fixes to address issues in list traversal, skb leak in
        ntb_netdev, a typo, and a leak of msix entries in the error path.
        Clean ups of the event handling logic, as well as an overall style
        cleanup.  Finally, the driver was converted to use the new
        pci_enable_msix_range logic (and the refactoring to go along with it)"
      
      * tag 'ntb-3.15' of git://github.com/jonmason/ntb:
        ntb: Use pci_enable_msix_range() instead of pci_enable_msix()
        ntb: Split ntb_setup_msix() into separate BWD/SNB routines
        ntb: Use pci_msix_vec_count() to obtain number of MSI-Xs
        NTB: Code Style Clean-up
        NTB: client event cleanup
        ntb: Fix leakage of ntb_device::msix_entries[] array
        NTB: Fix typo in setting one translation register
        ntb_netdev: Fix skb free issue in open
        ntb_netdev: Fix list_for_each_entry exit issue
      07f5fef9
    • Linus Torvalds's avatar
      ceph: fix pr_fmt() redefinition · 96c57ade
      Linus Torvalds authored
      
      
      The vfs merge caused a latent bug to show up:
      
         In file included from fs/ceph/super.h:4:0,
                          from fs/ceph/ioctl.c:3:
         include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
          #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
          ^
         In file included from include/linux/kernel.h:13:0,
                          from include/linux/uio.h:12,
                          from include/linux/socket.h:7,
                          from include/uapi/linux/in.h:22,
                          from include/linux/in.h:23,
                          from fs/ceph/ioctl.c:1:
         include/linux/printk.h:214:0: note: this is the location of the previous definition
          #define pr_fmt(fmt) fmt
          ^
      
      where the reason is that <linux/ceph_debug.h> is included much too late
      for the "pr_fmt()" define.
      
      The include of <linux/ceph_debug.h> needs to be the first include in the
      file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
      noticeable until some unrelated header file changes brought in an
      indirect earlier include of <linux/kernel.h>.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96c57ade
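The ordering requirement described in the commit above can be illustrated with a minimal user-space sketch. It assumes the fallback definition in the printk header is guarded by `#ifndef pr_fmt`; `demo_pr_info` is a hypothetical helper, and `KBUILD_MODNAME` is defined by hand here because kbuild is not involved.

```c
#include <stdio.h>

/* Stand-in for what a subsystem debug header does: define the prefix
 * BEFORE anything that pulls in the printk-style header. */
#define KBUILD_MODNAME "ceph"   /* normally supplied by kbuild; set by hand here */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Stand-in for the fallback in the printk header. Because pr_fmt is
 * already defined above, this default is skipped and nothing is redefined.
 * If the two blocks were swapped, the compiler would warn about a
 * redefinition, which is the symptom shown in the commit message. */
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif

/* Hypothetical helper: formats into a buffer so the result can be
 * inspected. Requires at least one argument after the format string. */
#define demo_pr_info(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)
```

With this ordering, `demo_pr_info(buf, sizeof(buf), "ioctl %d\n", 7)` writes a message that carries the module prefix.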
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5166701b
      Linus Torvalds authored
      Pull vfs updates from Al Viro:
       "The first vfs pile, with deep apologies for being very late in this
        window.
      
        Assorted cleanups and fixes, plus a large preparatory part of iov_iter
        work.  There's a lot more of that, but it'll probably go into the next
        merge window - it *does* shape up nicely, removes a lot of
        boilerplate, gets rid of locking inconsistencies between aio_write and
        splice_write and I hope to get Kent's direct-io rewrite merged into
        the same queue, but some of the stuff after this point is having
        (mostly trivial) conflicts with the things already merged into
        mainline and with some I want more testing.
      
        This one passes LTP and xfstests without regressions, in addition to
        usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
        giving failures since "mm/readahead.c: fix readahead failure for
        memoryless NUMA nodes and limit readahead pages" - might be a false
        positive, might be a real regression..."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
        missing bits of "splice: fix racy pipe->buffers uses"
        cifs: fix the race in cifs_writev()
        ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
        kill generic_file_buffered_write()
        ocfs2_file_aio_write(): switch to generic_perform_write()
        ceph_aio_write(): switch to generic_perform_write()
        xfs_file_buffered_aio_write(): switch to generic_perform_write()
        export generic_perform_write(), start getting rid of generic_file_buffer_write()
        generic_file_direct_write(): get rid of ppos argument
        btrfs_file_aio_write(): get rid of ppos
        kill the 5th argument of generic_file_buffered_write()
        kill the 4th argument of __generic_file_aio_write()
        lustre: don't open-code kernel_recvmsg()
        ocfs2: don't open-code kernel_recvmsg()
        drbd: don't open-code kernel_recvmsg()
        constify blk_rq_map_user_iov() and friends
        lustre: switch to kernel_sendmsg()
        ocfs2: don't open-code kernel_sendmsg()
        take iov_iter stuff to mm/iov_iter.c
        process_vm_access: tidy up a bit
        ...
      5166701b
    • David S. Miller's avatar
      Merge branch 'tunnels' · eda43ce0
      David S. Miller authored
      
      
      Nicolas Dichtel says:
      
      ====================
      tunnels: don't allow to add the same tunnel twice
      
      This series fixes the check of an existing tunnel with the same
      parameters when a new tunnel is added.  I've checked all users of
      ip_tunnel_newlink(): gre, gretap, ipip and vti. The bug exists only
      for gre and vti.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eda43ce0
    • Nicolas Dichtel's avatar
      vti: don't allow to add the same tunnel twice · 8d89dcdf
      Nicolas Dichtel authored
      Before the patch, it was possible to add the same tunnel twice:
      ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit b9959fd3 ("vti: switch to new ip tunnel code").
      
      CC: Cong Wang <amwang@redhat.com>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d89dcdf
    • Nicolas Dichtel's avatar
      gre: don't allow to add the same tunnel twice · 5a455275
      Nicolas Dichtel authored
      Before the patch, it was possible to add the same tunnel twice:
      ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
      ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit c5441932 ("GRE: Refactor GRE tunneling code.").
      
      CC: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a455275
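The failure mode shared by the vti and gre commits above can be sketched as follows. This is a simplified model, not the kernel code: every `demo_` name is hypothetical, the type value is an arbitrary nonzero constant, and the lookup is assumed to compare the device type along with the addresses.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TUNNEL_TYPE 1   /* arbitrary nonzero type, for illustration only */

struct demo_tunnel {
	unsigned int remote, local;
	unsigned short type;
};

/* Simplified lookup: a tunnel matches only if addresses AND type agree. */
static bool demo_tunnel_find(const struct demo_tunnel *tbl, size_t n,
			     unsigned int remote, unsigned int local,
			     unsigned short type)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].remote == remote && tbl[i].local == local &&
		    tbl[i].type == type)
			return true;
	return false;
}

/* Returns 0 on success, -1 if an identical tunnel already exists.
 * type_set_in_setup models the fix: the type is assigned before the
 * duplicate check instead of only at registration time. */
static int demo_newlink(struct demo_tunnel *tbl, size_t *n,
			unsigned int remote, unsigned int local,
			bool type_set_in_setup)
{
	struct demo_tunnel dev = { remote, local, 0 };

	if (type_set_in_setup)
		dev.type = DEMO_TUNNEL_TYPE;

	/* Without the fix, dev.type is still 0 here and never matches. */
	if (demo_tunnel_find(tbl, *n, remote, local, dev.type))
		return -1;

	dev.type = DEMO_TUNNEL_TYPE;   /* registration sets the type */
	tbl[(*n)++] = dev;
	return 0;
}
```

With `type_set_in_setup` false, two calls with the same addresses both succeed; with it true, the second call is rejected.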
    • Vincenzo Maffione's avatar
      drivers: net: xen-netfront: fix array initialization bug · 810d8ced
      Vincenzo Maffione authored
      
      
      This patch fixes the initialization of an array used in the TX
      datapath that was mistakenly initialized together with the
      RX datapath arrays. An out of range array access could happen
      when RX and TX rings had different sizes.
      
      Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      810d8ced
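A minimal sketch of the pattern this fix describes: each array is initialized in a loop bounded by its own ring size, so a TX array is never indexed with the RX bound. The struct, field names, and ring sizes are hypothetical and chosen only to make the two sizes differ.

```c
#include <stddef.h>

#define DEMO_TX_RING_SIZE 4   /* hypothetical sizes, deliberately different */
#define DEMO_RX_RING_SIZE 8

struct demo_queue {
	int tx_refs[DEMO_TX_RING_SIZE];
	int rx_refs[DEMO_RX_RING_SIZE];
};

static void demo_init_rings(struct demo_queue *q)
{
	size_t i;

	/* TX arrays use the TX bound. Initializing tx_refs inside the RX
	 * loop below would write past its end whenever the RX ring is
	 * larger than the TX ring. */
	for (i = 0; i < DEMO_TX_RING_SIZE; i++)
		q->tx_refs[i] = -1;

	/* RX arrays use the RX bound. */
	for (i = 0; i < DEMO_RX_RING_SIZE; i++)
		q->rx_refs[i] = -1;
}
```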
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · dcfba949
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to e1000, e1000e, igb, igbvf, ixgb, ixgbe,
      ixgbevf and i40evf.
      
      Mark fixes an issue with ixgbe and ixgbevf by adding a bit to indicate
      when workqueues have been initialized.  This prevents the register read
      error handling from attempting to use them prior to that, which also
      generates warnings.  Checking for a detected removal after initializing
      the work queues allows the probe function to return an error without
      getting the workqueue involved.  Further, if the error_detected
      callback is entered before the workqueues are initialized, exit without
      recovery since the device initialization was so truncated.
      
      Francois Romieu provides several patches to all the drivers to remove
      the open coded skb_cow_head.
      
      Jakub Kicinski provides a fix for igb where last_rx_timestamp should be
      updated only when Rx time stamp is read.
      
      Mitch provides a fix for i40evf where a recent change broke the RSS LUT
      programming causing it to be programmed with all 0's.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dcfba949
    • Linus Torvalds's avatar
      Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 0a7418f5
      Linus Torvalds authored
      Pull more tracing updates from Steven Rostedt:
       "This includes the final patch to clean up and fix the issue with the
        design of tracepoints and how a user could register a tracepoint and
        have that tracepoint not be activated but no error was shown.
      
        The design was for an out of tree module but broke in tree users.  The
        clean up was to remove the saving of the hash table of tracepoint
        names such that they can be enabled before they exist (enabling a
        module tracepoint before that module is loaded).  This added more
        complexity than needed.  The clean up was to remove that code and just
        enable tracepoints that exist or fail if they do not.
      
        This removed a lot of code as well as the complexity that it brought.
        As a side effect, instead of registering a tracepoint by its name, the
        tracepoint needs to be registered with the tracepoint descriptor.
        This removes having to duplicate the tracepoint names that are
        enabled.
      
        The second patch was added that simplified the way modules were
        searched for.
      
        This cleanup required changes that were in the 3.15 queue as well as
        some changes that were added late in the 3.14-rc cycle.  This final
        change waited till the two were merged in upstream and then the change
        was added and full tests were run.  Unfortunately, the test found some
        errors, but after it was already submitted to the for-next branch and
        not to be rebased.  Sparse errors were detected by Fengguang Wu's bot
        tests, and my internal tests discovered that the anonymous union
        initialization triggered a bug in older gcc compilers.  Luckily, there
        was a bugzilla for the gcc bug which gave a work around to the
        problem.  The third and fourth patch handled the sparse error and the
        gcc bug respectively.
      
        A final patch was tacked on to add missing documentation to the
        README file"
      
      * tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Add missing function triggers dump and cpudump to README
        tracing: Fix anonymous unions in struct ftrace_event_call
        tracepoint: Fix sparse warnings in tracepoint.c
        tracepoint: Simplify tracepoint module search
        tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
      0a7418f5
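The move from name-based to descriptor-based registration described in the pull message above can be sketched roughly as below. This is a user-space illustration, not the kernel API: all `demo_` names are hypothetical, and the model allows only one probe per tracepoint for brevity.

```c
#include <stddef.h>

typedef void (*demo_probe_fn)(void *data);

/* Hypothetical descriptor: the caller must hold a pointer to an
 * existing tracepoint, so there is no name lookup that can miss. */
struct demo_tracepoint {
	const char *name;
	demo_probe_fn probe;
	void *data;
};

/* Registers a probe on a concrete descriptor. Fails immediately when the
 * tracepoint does not exist (NULL) rather than remembering a name for
 * later activation. Returns 0 on success, -1 on failure. */
static int demo_probe_register(struct demo_tracepoint *tp,
			       demo_probe_fn probe, void *data)
{
	if (!tp || !probe)
		return -1;
	if (tp->probe)          /* simplification: a single probe slot */
		return -1;
	tp->probe = probe;
	tp->data = data;
	return 0;
}

/* Fires the tracepoint if a probe is attached. */
static void demo_trace(struct demo_tracepoint *tp)
{
	if (tp->probe)
		tp->probe(tp->data);
}

/* Example probe: counts how often the tracepoint fired. */
static void demo_count_hit(void *data)
{
	(*(int *)data)++;
}
```

The design point is that a failed registration is reported to the caller at once, which is the behaviour the pull message says the old name-hash scheme lacked.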
    • Linus Torvalds's avatar
      Merge git://git.infradead.org/users/eparis/audit · 0b747172
      Linus Torvalds authored
      Pull audit updates from Eric Paris.
      
      * git://git.infradead.org/users/eparis/audit: (28 commits)
        AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
        audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
        audit: do not cast audit_rule_data pointers pointlesly
        AUDIT: Allow login in non-init namespaces
        audit: define audit_is_compat in kernel internal header
        kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
        sched: declare pid_alive as inline
        audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
        syscall_get_arch: remove useless function arguments
        audit: remove stray newline from audit_log_execve_info() audit_panic() call
        audit: remove stray newlines from audit_log_lost messages
        audit: include subject in login records
        audit: remove superfluous new- prefix in AUDIT_LOGIN messages
        audit: allow user processes to log from another PID namespace
        audit: anchor all pid references in the initial pid namespace
        audit: convert PPIDs to the inital PID namespace.
        pid: get pid_t ppid of task in init_pid_ns
        audit: rename the misleading audit_get_context() to audit_take_context()
        audit: Add generic compat syscall support
        audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
        ...
      0b747172
    • Al Viro's avatar
      missing bits of "splice: fix racy pipe->buffers uses" · a786c06d
      Al Viro authored
      
      
      that commit has fixed only the parts of that mess in fs/splice.c itself;
      there had been more in several other ->splice_read() instances...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a786c06d
    • Al Viro's avatar
      cifs: fix the race in cifs_writev() · 19dfc1f5
      Al Viro authored
      
      
      O_APPEND handling there hadn't been completely fixed by Pavel's
      patch; it checks the right value, but it's racy - we can't really
      do that until i_mutex has been taken.
      
      Fix by switching to __generic_file_aio_write() (open-coding
      generic_file_aio_write(), actually) and pulling mutex_lock() above
      inode_size_read().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      19dfc1f5
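The locking order this fix describes, taking the mutex before reading the file size for an append, can be sketched in user space as follows. A pthread mutex stands in for i_mutex, and the struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct demo_inode {
	pthread_mutex_t lock;   /* stands in for i_mutex */
	size_t size;
};

/* Appends len bytes and returns the offset the write landed at.
 * The size is read only after the lock is held; reading it before
 * locking would let two appenders pick the same offset. */
static size_t demo_append(struct demo_inode *ino, size_t len)
{
	size_t pos;

	pthread_mutex_lock(&ino->lock);
	pos = ino->size;            /* O_APPEND position, sampled under the lock */
	ino->size = pos + len;      /* the write extends the file */
	pthread_mutex_unlock(&ino->lock);

	return pos;
}
```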