1. 02 Feb, 2011 1 commit
  2. 07 Jan, 2011 1 commit
    • Nick Piggin's avatar
      fs: rcu-walk for path lookup · 31e6b01f
      Nick Piggin authored
      
      
      Perform common cases of path lookups without any stores or locking in the
      ancestor dentry elements. This is called rcu-walk, as opposed to the current
      algorithm which is a refcount based walk, or ref-walk.
      
      This results in far fewer atomic operations on every path element,
      significantly improving path lookup performance. It also avoids cacheline
      bouncing on common dentries, significantly improving scalability.
      
      The overall design is like this:
      * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
      * Take the RCU lock for the entire path walk, starting with the acquiring
        of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
        not required for dentry persistence.
      * synchronize_rcu is called when unregistering a filesystem, so we can
        access d_ops and i_ops during rcu-walk.
      * Similarly take the vfsmount lock for the entire path walk. So now mnt
        refcounts are not required for persistence. Also we are free to perform mount
        lookups, and to assume dentry mount points and mount roots are stable up and
        down the path.
      * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
        so we can load this tuple atomically, and also check whether any of its
        members have changed.
      * Dentry lookups (based on parent, candidate string tuple) recheck the parent
        sequence after the child is found in case anything changed in the parent
        during the path walk.
      * inode is also RCU protected so we can load d_inode and use the inode for
        limited things.
      * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
      * i_op can be loaded.
      
      When we reach the destination dentry, we lock it, recheck lookup sequence,
      and increment its refcount and mountpoint refcount. RCU and vfsmount locks
      are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
      not match, we can not drop rcu-walk gracefully at the current point in the
      lokup, so instead return -ECHILD (for want of a better errno). This signals the
      path walking code to re-do the entire lookup with a ref-walk.
      
      Aside from the final dentry, there are other situations that may be encounted
      where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
      a reference on the last good dentry) and continue with a ref-walk. Again, if
      we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
      using ref-walk. But it is very important that we can continue with ref-walk
      for most cases, particularly to avoid the overhead of double lookups, and to
      gain the scalability advantages on common path elements (like cwd and root).
      
      The cases where rcu-walk cannot continue are:
      * NULL dentry (ie. any uncached path element)
      * parent with d_inode->i_op->permission or ACLs
      * dentries with d_revalidate
      * Following links
      
      In future patches, permission checks and d_revalidate become rcu-walk aware. It
      may be possible eventually to make following links rcu-walk aware.
      
      Uncached path elements will always require dropping to ref-walk mode, at the
      very least because i_mutex needs to be grabbed, and objects allocated.
      Signed-off-by: default avatarNick Piggin <npiggin@kernel.dk>
      31e6b01f
  3. 05 Jan, 2011 1 commit
  4. 30 Nov, 2010 1 commit
  5. 15 Nov, 2010 1 commit
  6. 20 Oct, 2010 3 commits
  7. 02 Aug, 2010 1 commit
  8. 28 Jul, 2010 1 commit
  9. 16 Jul, 2010 1 commit
  10. 12 Apr, 2010 13 commits
  11. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  12. 23 Feb, 2010 1 commit
    • wzt.wzt@gmail.com's avatar
      Security: add static to security_ops and default_security_ops variable · 189b3b1c
      wzt.wzt@gmail.com authored
      
      
      Enhance the security framework to support resetting the active security
      module. This eliminates the need for direct use of the security_ops and
      default_security_ops variables outside of security.c, so make security_ops
      and default_security_ops static. Also remove the secondary_ops variable as
      a cleanup since there is no use for that. secondary_ops was originally used by
      SELinux to call the "secondary" security module (capability or dummy),
      but that was replaced by direct calls to capability and the only
      remaining use is to save and restore the original security ops pointer
      value if SELinux is disabled by early userspace based on /etc/selinux/config.
      Further, if we support this directly in the security framework, then we can
      just use &default_security_ops for this purpose since that is now available.
      Signed-off-by: default avatarZhitong Wang <zhitong.wangzt@alibaba-inc.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      189b3b1c
  13. 04 Feb, 2010 1 commit
  14. 10 Jan, 2010 1 commit
  15. 16 Dec, 2009 1 commit
  16. 09 Nov, 2009 1 commit
    • Eric Paris's avatar
      security: report the module name to security_module_request · dd8dbf2e
      Eric Paris authored
      
      
      For SELinux to do better filtering in userspace we send the name of the
      module along with the AVC denial when a program is denied module_request.
      
      Example output:
      
      type=SYSCALL msg=audit(11/03/2009 10:59:43.510:9) : arch=x86_64 syscall=write success=yes exit=2 a0=3 a1=7fc28c0d56c0 a2=2 a3=7fffca0d7440 items=0 ppid=1727 pid=1729 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.nfsd exe=/usr/sbin/rpc.nfsd subj=system_u:system_r:nfsd_t:s0 key=(null)
      type=AVC msg=audit(11/03/2009 10:59:43.510:9) : avc:  denied  { module_request } for  pid=1729 comm=rpc.nfsd kmod="net-pf-10" scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      dd8dbf2e
  17. 11 Oct, 2009 2 commits
  18. 24 Sep, 2009 1 commit
  19. 10 Sep, 2009 1 commit
    • David P. Quigley's avatar
      LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. · 1ee65e37
      David P. Quigley authored
      
      
      This patch introduces three new hooks. The inode_getsecctx hook is used to get
      all relevant information from an LSM about an inode. The inode_setsecctx is
      used to set both the in-core and on-disk state for the inode based on a context
      derived from inode_getsecctx.The final hook inode_notifysecctx will notify the
      LSM of a change for the in-core state of the inode in question. These hooks are
      for use in the labeled NFS code and addresses concerns of how to set security
      on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's
      explanation of the reason for these hooks is pasted below.
      
      Quote Stephen Smalley
      
      inode_setsecctx:  Change the security context of an inode.  Updates the
      in core security context managed by the security module and invokes the
      fs code as needed (via __vfs_setxattr_noperm) to update any backing
      xattrs that represent the context.  Example usage:  NFS server invokes
      this hook to change the security context in its incore inode and on the
      backing file system to a value provided by the client on a SETATTR
      operation.
      
      inode_notifysecctx:  Notify the security module of what the security
      context of an inode should be.  Initializes the incore security context
      managed by the security module for this inode.  Example usage:  NFS
      client invokes this hook to initialize the security context in its
      incore inode to the value provided by the server for the file when the
      server returned the file's attributes to the client.
      Signed-off-by: default avatarDavid P. Quigley <dpquigl@tycho.nsa.gov>
      Acked-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      1ee65e37
  20. 07 Sep, 2009 2 commits
  21. 02 Sep, 2009 1 commit
    • David Howells's avatar
      KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] · ee18d64c
      David Howells authored
      
      
      Add a keyctl to install a process's session keyring onto its parent.  This
      replaces the parent's session keyring.  Because the COW credential code does
      not permit one process to change another process's credentials directly, the
      change is deferred until userspace next starts executing again.  Normally this
      will be after a wait*() syscall.
      
      To support this, three new security hooks have been provided:
      cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
      the blank security creds and key_session_to_parent() - which asks the LSM if
      the process may replace its parent's session keyring.
      
      The replacement may only happen if the process has the same ownership details
      as its parent, and the process has LINK permission on the session keyring, and
      the session keyring is owned by the process, and the LSM permits it.
      
      Note that this requires alteration to each architecture's notify_resume path.
      This has been done for all arches barring blackfin, m68k* and xtensa, all of
      which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
      replacement to be performed at the point the parent process resumes userspace
      execution.
      
      This allows the userspace AFS pioctl emulation to fully emulate newpag() and
      the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
      alter the parent process's PAG membership.  However, since kAFS doesn't use
      PAGs per se, but rather dumps the keys into the session keyring, the session
      keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
      the newpag flag.
      
      This can be tested with the following program:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      
      	#define KEYCTL_SESSION_TO_PARENT	18
      
      	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
      
      	int main(int argc, char **argv)
      	{
      		key_serial_t keyring, key;
      		long ret;
      
      		keyring = keyctl_join_session_keyring(argv[1]);
      		OSERROR(keyring, "keyctl_join_session_keyring");
      
      		key = add_key("user", "a", "b", 1, keyring);
      		OSERROR(key, "add_key");
      
      		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
      		OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");
      
      		return 0;
      	}
      
      Compiled and linked with -lkeyutils, you should see something like:
      
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
      	[dhowells@andromeda ~]$ /tmp/newpag
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	1055658746 --alswrv   4043  4043   \_ user: a
      	[dhowells@andromeda ~]$ /tmp/newpag hello
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: hello
      	340417692 --alswrv   4043  4043   \_ user: a
      
      Where the test program creates a new session keyring, sticks a user key named
      'a' into it and then installs it on its parent.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      ee18d64c
  22. 31 Aug, 2009 1 commit
    • Paul Moore's avatar
      lsm: Add hooks to the TUN driver · 2b980dbd
      Paul Moore authored
      
      
      The TUN driver lacks any LSM hooks which makes it difficult for LSM modules,
      such as SELinux, to enforce access controls on network traffic generated by
      TUN users; this is particularly problematic for virtualization apps such as
      QEMU and KVM.  This patch adds three new LSM hooks designed to control the
      creation and attachment of TUN devices, the hooks are:
      
       * security_tun_dev_create()
         Provides access control for the creation of new TUN devices
      
       * security_tun_dev_post_create()
         Provides the ability to create the necessary socket LSM state for newly
         created TUN devices
      
       * security_tun_dev_attach()
         Provides access control for attaching to existing, persistent TUN devices
         and the ability to update the TUN device's socket LSM state as necessary
      Signed-off-by: default avatarPaul Moore <paul.moore@hp.com>
      Acked-by: default avatarEric Paris <eparis@parisplace.org>
      Acked-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      2b980dbd
  23. 17 Aug, 2009 2 commits
    • Eric Paris's avatar
      security: define round_hint_to_min in !CONFIG_SECURITY · 1d995973
      Eric Paris authored
      
      
      Fix the header files to define round_hint_to_min() and to define
      mmap_min_addr_handler() in the !CONFIG_SECURITY case.
      
      Built and tested with !CONFIG_SECURITY
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      1d995973
    • Eric Paris's avatar
      Security/SELinux: seperate lsm specific mmap_min_addr · 788084ab
      Eric Paris authored
      
      
      Currently SELinux enforcement of controls on the ability to map low memory
      is determined by the mmap_min_addr tunable.  This patch causes SELinux to
      ignore the tunable and instead use a seperate Kconfig option specific to how
      much space the LSM should protect.
      
      The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
      permissions will always protect the amount of low memory designated by
      CONFIG_LSM_MMAP_MIN_ADDR.
      
      This allows users who need to disable the mmap_min_addr controls (usual reason
      being they run WINE as a non-root user) to do so and still have SELinux
      controls preventing confined domains (like a web server) from being able to
      map some area of low memory.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      788084ab