1. 03 Jun, 2016 4 commits
  2. 31 May, 2016 1 commit
  3. 28 May, 2016 7 commits
    • <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin authored
      
      
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
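      The override pattern described in this commit can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's code: DEMO_ARCH_OVERRIDE, HAVE_ARCH_HASH_MUL32, hash_mul32() and hash_selftest() are names made up for the sketch, and the multiplier is assumed to be the 32-bit golden-ratio constant.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u	/* assumed multiplier for this sketch */

/* --- what a hypothetical <asm/hash.h> could contain --- */
#ifdef DEMO_ARCH_OVERRIDE		/* made-up switch, stands in for the CONFIG_ symbol */
#define HAVE_ARCH_HASH_MUL32 1		/* value 1: promises the generic result */
static inline uint32_t hash_mul32(uint32_t val)
{
	/* an architecture would place its tuned instruction sequence here */
	return val * GOLDEN_RATIO_32;
}
#endif

/* --- generic side: only compiled if the arch did not define HAVE_* --- */
#ifndef HAVE_ARCH_HASH_MUL32
static inline uint32_t hash_mul32(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}
#endif

/* Reference copy that a self-test can always compare against. */
static inline uint32_t hash_mul32_generic(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

/* Keep the top `bits` bits of the multiplied value. */
static inline uint32_t hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return hash_mul32(val) >> (32 - bits);
}

/* Equality with the generic version is only checked when HAVE_* is 1. */
static inline int hash_selftest(uint32_t val)
{
#ifdef HAVE_ARCH_HASH_MUL32
#if HAVE_ARCH_HASH_MUL32 == 1
	return hash_mul32(val) == hash_mul32_generic(val);
#endif
#endif
	(void)val;
	return 1;	/* nothing to compare: arch result may legitimately differ */
}
```

      Building with -DDEMO_ARCH_OVERRIDE selects the "arch" version; without it the generic fallback is used and the self-test has nothing to compare.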
    • Eliminate bad hash multipliers from hash_32() and hash_64() · ef703f49
      George Spelvin authored
      The "simplified" prime multipliers made very bad hash functions, so get rid
      of them.  This completes the work of 689de1d6.
      
      To avoid the inefficiency which was the motivation for the "simplified"
      multipliers, hash_64() on 32-bit systems is changed to use a different
      algorithm.  It makes two calls to hash_32() instead.
      
      drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
      for some horrible reason, so it inherits a copy of the old definition.
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
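      A rough sketch of the two strategies mentioned above, written as plain userspace C. The way the two halves are combined in hash_64_narrow() (hash the high word, XOR it into the low word, hash again) is an assumption made for illustration; the function names hash_64_wide() and hash_64_narrow() and the constants are likewise chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u		/* assumed 32-bit multiplier */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull	/* assumed 64-bit multiplier */

static inline uint32_t mul_hash_32(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

static inline uint32_t hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return mul_hash_32(val) >> (32 - bits);
}

/* 64-bit machines: a single 64-bit multiply, keep the top bits. */
static inline uint32_t hash_64_wide(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* 32-bit machines: two 32-bit multiplies instead of one 64-bit multiply. */
static inline uint32_t hash_64_narrow(uint64_t val, unsigned int bits)
{
	uint32_t hi = (uint32_t)(val >> 32);
	uint32_t lo = (uint32_t)val;

	return hash_32(lo ^ mul_hash_32(hi), bits);
}
```

      The two variants are not expected to return the same value for the same input; each only needs to distribute its inputs well over the requested number of bits.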
    • Change hash_64() return value to 32 bits · 92d56774
      George Spelvin authored
      
      
      That's all that's ever asked for, and it makes the return
      type of hash_long() consistent.
      
      It also allows (upcoming patch) an optimized implementation
      of hash_64 on 32-bit machines.
      
      I tried adding a BUILD_BUG_ON to ensure the number of bits requested
      was never more than 32 (most callers use a compile-time constant), but
      adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
      unless tools/perf/MANIFEST is updated, and understanding that code base
      well enough to update it is too much trouble.  I did the rest of an
      allyesconfig build with such a check, and nothing tripped.
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
    • <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() · 917ea166
      George Spelvin authored
      
      
      Finally, the first use of the previous two patches: eliminate the
      separate ad-hoc string hash functions in the sunrpc code.
      
      Now hash_str() is a wrapper around hashlen_string(), and hash_mem() is
      likewise a wrapper around full_name_hash().
      
      Note that sunrpc code *does* call hash_mem() with a zero length, which
      is why the previous patch needed to handle that in full_name_hash().
      (Thanks, Bruce, for finding that!)
      
      This also eliminates the only caller of hash_long which asks for
      more than 32 bits of output.
      
      The comment about the quality of hashlen_string() and full_name_hash()
      is jumping the gun by a few patches; they aren't very impressive now,
      but will be improved greatly later in the series.
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
      Tested-by: J. Bruce Fields <bfields@redhat.com>
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: linux-nfs@vger.kernel.org
    • fs/namei.c: Add hashlen_string() function · fcfd2fbf
      George Spelvin authored
      
      
      We'd like to make more use of the highly-optimized dcache hash functions
      throughout the kernel, rather than have every subsystem create its own,
      and a function that hashes basic null-terminated strings is required
      for that.
      
      (The name is to emphasize that it returns both hash and length.)
      
      It's actually useful in the dcache itself, specifically d_alloc_name().
      Other uses in the next patch.
      
      full_name_hash() is also tweaked to make it more generally useful:
      1) Take a "char *" rather than "unsigned char *" argument, to
         be consistent with hash_name().
      2) Handle zero-length inputs.  If we want more callers, we don't want
         to make them worry about corner cases.
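      To illustrate "returns both hash and length", here is a small userspace sketch that packs the two into one 64-bit value. The byte mixer toy_mix() is a deliberately simple placeholder and an assumption of this sketch, not the optimized dcache hash; the packing layout (length in the high 32 bits) is also assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a 32-bit hash and a 32-bit length into one 64-bit value. */
#define hashlen_create(hash, len) (((uint64_t)(len) << 32) | (uint32_t)(hash))
#define hashlen_hash(hashlen)	((uint32_t)(hashlen))
#define hashlen_len(hashlen)	((uint32_t)((hashlen) >> 32))

/* Placeholder byte-at-a-time mixer; NOT the kernel's optimized hash. */
static inline uint32_t toy_mix(uint32_t c, uint32_t prev)
{
	return (prev + (c << 4) + (c >> 4)) * 11;
}

/* Hash a buffer of known length; a zero length simply yields 0. */
static inline uint32_t full_name_hash(const char *name, unsigned int len)
{
	uint32_t hash = 0;

	while (len--)
		hash = toy_mix((unsigned char)*name++, hash);
	return hash;
}

/* Hash a NUL-terminated string and return hash and length together. */
static inline uint64_t hashlen_string(const char *name)
{
	uint32_t hash = 0, len = 0;

	assert(name != NULL);
	while (name[len] != '\0') {
		hash = toy_mix((unsigned char)name[len], hash);
		len++;
	}
	return hashlen_create(hash, len);
}
```

      A caller that needs only one of the two values can unpack it with hashlen_hash() or hashlen_len(), so the string is walked once.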
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
    • Pull out string hash to <linux/stringhash.h> · f4bcbe79
      George Spelvin authored
      
      
      ... so they can be used without the rest of <linux/dcache.h>.
      
      The hashlen_* macros will make sense in the next patch.
      Signed-off-by: George Spelvin <linux@sciencehorizons.net>
    • switch ->setxattr() to passing dentry and inode separately · 3767e255
      Al Viro authored
      smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
      we'd hashed the new dentry and attached it to inode, we need ->setxattr()
      instances getting the inode as an explicit argument rather than obtaining
      it from dentry.
      
      Similar change for ->getxattr() had been done in commit ce23e640.  Unlike
      ->getxattr() (which is used by both selinux and smack instances of
      ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
      it got missed back then.
      Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
      Tested-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 27 May, 2016 4 commits
    • make IS_ERR_VALUE() complain about non-pointer-sized arguments · aa00edc1
      Linus Torvalds authored
      
      
      Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses
      on integers, add a cast to a pointer and back to the argument, so that
      any new mis-uses of IS_ERR_VALUE() will cause warnings like
      
         warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      
      so that we don't re-introduce any bogus uses.
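      The cast trick can be sketched in userspace as below. MAX_ERRNO is assumed to be 4095 and the kernel's branch-prediction hint is omitted; err_from_value() is a helper invented for this sketch. On a 64-bit target, passing a plain int to the macro makes the compiler emit the warning quoted above, because the int is narrower than a pointer.

```c
#include <assert.h>

#define MAX_ERRNO 4095	/* assumed size of the errno range */

/*
 * Round-trip the argument through a pointer type.  A pointer-sized
 * integer (e.g. unsigned long) passes silently; a narrower integer
 * triggers -Wint-to-pointer-cast at the call site.
 */
#define IS_ERR_VALUE(x) \
	((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

/* Sketch helper: recover the negative errno from an error value. */
static inline long err_from_value(unsigned long v)
{
	assert(IS_ERR_VALUE(v));	/* only meaningful for error values */
	return (long)v;
}
```

      Values in the top MAX_ERRNO slots of the unsigned long range are treated as encoded errors; everything else is an ordinary value such as an address.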
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove more IS_ERR_VALUE abuses · 5d22fc25
      Linus Torvalds authored
      
      
      The do_brk() and vm_brk() return value was "unsigned long" and returned
      the starting address on success, and an error value on failure.  The
      reasons are entirely historical, and go back to it basically behaving
      like the mmap() interface does.
      
      However, nobody actually wanted that interface, and it causes totally
      pointless IS_ERR_VALUE() confusion.
      
      What every single caller actually wants is just the simpler integer
      return of zero for success and negative error number on failure.
      
      So just convert to that much clearer and more common calling convention,
      and get rid of all the IS_ERR_VALUE() uses wrt vm_brk().
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix section mismatch warning · 7ded384a
      Linus Torvalds authored
      The register_page_bootmem_info_node() function needs to be marked __init
      in order to avoid a new warning introduced by commit f65e91df ("mm:
      use early_pfn_to_nid in register_page_bootmem_info_node").
      
      Otherwise you'll get a warning about how a non-init function calls
      early_pfn_to_nid (which is __meminit).
      
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • switch xattr_handler->set() to passing dentry and inode separately · 59301226
      Al Viro authored
      
      
      preparation for similar switch in ->setxattr() (see the next commit for
      rationale).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 26 May, 2016 5 commits
  6. 25 May, 2016 19 commits
    • ceph: make logical calculation functions return bool · 3b33f692
      Zhang Zhuoyu authored
      
      
      This patch makes several logical calculation functions return bool to
      improve readability, since these particular functions only use 0/1
      as their return value.
      
      No functional change.
      Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
    • ceph: using hash value to compose dentry offset · f3c4ebe6
      Yan, Zheng authored
      
      
      If the MDS sorts dentries in a dirfrag in hash order, we use the hash
      value to compose the dentry offset. The dentry offset is:
      
        (0xff << 52) | ((24 bits hash) << 28) |
        (n, the index of the entry among those with the same hash)
      
      This offset is stable across directory fragmentation. This also means
      there is no need to reset the readdir offset if the directory gets fragmented
      in the middle of readdir.
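      The offset layout above can be sketched as a pair of pack/unpack helpers. All names here (make_hash_fpos(), fpos_hash(), fpos_nth(), HASH_ORDER_TAG) are invented for the sketch, and the 28-bit width of the low field is inferred from the shift amounts in the formula.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS	28	/* low field: index among colliding entries */
#define HASH_BITS	24	/* middle field: 24 bits of the name hash */
#define HASH_ORDER_TAG	(0xffULL << (OFFSET_BITS + HASH_BITS))	/* 0xff << 52 */

/* Compose a hash-ordered directory offset from hash and collision index. */
static inline uint64_t make_hash_fpos(uint32_t hash, uint32_t nth)
{
	assert(hash < (1u << HASH_BITS));
	assert(nth < (1u << OFFSET_BITS));
	return HASH_ORDER_TAG | ((uint64_t)hash << OFFSET_BITS) | nth;
}

/* Extract the 24-bit hash field. */
static inline uint32_t fpos_hash(uint64_t pos)
{
	return (uint32_t)((pos >> OFFSET_BITS) & ((1u << HASH_BITS) - 1));
}

/* Extract the collision index. */
static inline uint32_t fpos_nth(uint64_t pos)
{
	return (uint32_t)(pos & ((1u << OFFSET_BITS) - 1));
}

/* True if the offset carries the hash-order marker. */
static inline int fpos_is_hash_order(uint64_t pos)
{
	return (pos & HASH_ORDER_TAG) == HASH_ORDER_TAG;
}
```

      Because the offset depends only on the entry's hash and its position among equal hashes, it does not change when the directory is split into different fragments, and offsets compare in hash order.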
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: define 'end/complete' in readdir reply as bit flags · 956d39d6
      Yan, Zheng authored
      
      
      Set a flag in the readdir request indicating that the client interprets
      'end/complete' as bit flags, so that the MDS can send additional flags
      in the readdir reply.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • Ilya Dryomov's avatar
      737cc81e
    • libceph: replace ceph_monc_request_next_osdmap() · 7cca78c9
      Ilya Dryomov authored
      
      
      ... with a wrapper around maybe_request_map() - no need for two
      osdmap-specific functions.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: pool deletion detection · 4609245e
      Ilya Dryomov authored
      
      
      This adds the "map check" infrastructure for sending osdmap version
      checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
      -ENOENT if the target pool doesn't exist or has just been deleted.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: async MON client generic requests · d0b19705
      Ilya Dryomov authored
      
      
      For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
      messages asynchronously and get a callback on completion.  Refactor MON
      client to allow firing off generic requests asynchronously and add an
      async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
      switched over and remains sync.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: support for checking on status of watch · b07d3c4b
      Ilya Dryomov authored
      
      
      Implement ceph_osdc_watch_check() to be able to check on status of
      watch.  Note that the time it takes for a watch/notify event to get
      delivered through the notify_wq is taken into account.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: support for sending notifies · 19079203
      Ilya Dryomov authored
      
      
      Implement ceph_osdc_notify() for sending notifies.
      
      Due to the fact that the current messenger can't do read-in into
      pagelists (it can only do write-out from them), I had to go with a page
      vector for a NOTIFY_COMPLETE payload, for now.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Ilya Dryomov authored
      
      
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggy back on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph, the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: a major OSD client update · 5aea3dcd
      Ilya Dryomov authored
      
      
      This is a major sync up, up to ~Jewel.  The highlights are:
      
      - per-session request trees (vs a global per-client tree)
      - per-session locking (vs a global per-client rwlock)
      - homeless OSD session
      - no ad-hoc global per-client lists
      - support for pool quotas
      - foundation for watch/notify v2 support
      - foundation for map check (pool deletion detection) support
      
      The switchover is incomplete: lingering requests can be set up and
      torn down but aren't ever reestablished.  This functionality is
      restored with the introduction of the new lingering infrastructure
      (ceph_osd_linger_request, linger_work, etc) in a later commit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: protect osdc->osd_lru list with a spinlock · 9dd2845c
      Ilya Dryomov authored
      
      
      OSD client is getting moved from the big per-client lock to a set of
      per-session locks.  The big rwlock would only be held for read most of
      the time, so a global osdc->osd_lru needs additional protection.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: handle_one_map() · 42c1b124
      Ilya Dryomov authored
      
      
      Separate osdmap handling from decoding and iterating over a bag of maps
      in a fresh MOSDMap message.  This sets up the scene for the updated OSD
      client.
      
      Of particular importance here is the addition of pi->was_full, which
      can be used to answer "did this pool go full -> not-full in this map?".
      This is the key bit for supporting pool quotas.
      
      We won't be able to downgrade map_sem for much longer, so drop
      downgrade_write().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: allocate dummy osdmap in ceph_osdc_init() · e5253a7b
      Ilya Dryomov authored
      
      
      This leads to a simpler osdmap handling code, particularly when dealing
      with pi->was_full, which is introduced in a later commit.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e
      Ilya Dryomov authored
      
      
      If you specify ACK | ONDISK and set ->r_unsafe_callback, both
      ->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
      very confusing.  Redo this so that only one of them is called:
      
          ->r_unsafe_callback(true), on ack
          ->r_unsafe_callback(false), on commit
      
      or
      
          ->r_callback, on ack|commit
      
      Decode everything in decode_MOSDOpReply() to reduce clutter.
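      The "only one of them is called" rule can be modelled with a toy dispatcher. Everything below is a simplified stand-in invented for illustration (struct toy_request, toy_handle_reply(), the counters); in particular, delivering ->r_callback once on the first ack or commit is this sketch's reading of "on ack|commit", not a statement about the actual libceph code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reply_kind { REPLY_ACK, REPLY_COMMIT };

/* Toy stand-in for an OSD request; only the fields the sketch needs. */
struct toy_request {
	void (*r_callback)(struct toy_request *req);
	void (*r_unsafe_callback)(struct toy_request *req, bool unsafe);
	bool completed;		/* r_callback already delivered */
};

/* Counters so a caller can observe which callbacks fired. */
static int n_callback, n_unsafe_true, n_unsafe_false;

static inline void count_callback(struct toy_request *req)
{
	(void)req;
	n_callback++;
}

static inline void count_unsafe(struct toy_request *req, bool unsafe)
{
	(void)req;
	if (unsafe)
		n_unsafe_true++;
	else
		n_unsafe_false++;
}

/* Deliver exactly one kind of callback per request, never both. */
static inline void toy_handle_reply(struct toy_request *req, enum reply_kind kind)
{
	assert(req != NULL);

	if (req->r_unsafe_callback) {
		/* unsafe path: (true) on ack, (false) on commit */
		req->r_unsafe_callback(req, kind == REPLY_ACK);
		return;
	}
	if (req->r_callback && !req->completed) {
		/* plain path: one completion, on whichever reply comes first */
		req->completed = true;
		req->r_callback(req);
	}
}
```

      A request that sets both pointers only ever sees ->r_unsafe_callback in this model, which removes the double notification on ack that the commit describes as confusing.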
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov authored
      
      
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov authored
      
      
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all calls to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: switch to calc_target(), part 1 · a66dd383
      Ilya Dryomov authored
      
      
      Replace __calc_request_pg() and most of __map_request() with
      calc_target() and start using req->r_t.
      
      ceph_osdc_build_request() however still encodes base_oid, because it's
      called before calc_target() is and target_oid is empty at that point in
      time; a printf in osdc_show() also shows base_oid.  This is fixed in
      "libceph: switch to calc_target(), part 2".
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1
      Ilya Dryomov authored
      
      
      Introduce ceph_osd_request_target, containing all mapping-related
      fields of ceph_osd_request and calc_target() for calculating mappings
      and populating it.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>