Commit e9e3d724 authored by Neil Horman's avatar Neil Horman Committed by Linus Torvalds
Browse files

nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)

The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):


It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set


2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.
Signed-off-by: default avatarNeil Horman <>
CC: Trond Myklebust <>
CC: Jeff Layton <>
Signed-off-by: default avatarLinus Torvalds <>
parent 3256f80f
......@@ -51,6 +51,7 @@
#include <linux/sunrpc/bc_xprt.h>
#include <linux/xattr.h>
#include <linux/utsname.h>
#include <linux/mm.h>
#include "nfs4_fs.h"
#include "delegation.h"
......@@ -3252,6 +3253,35 @@ static void buf_to_pages(const void *buf, size_t buflen,
static int buf_to_pages_noslab(const void *buf, size_t buflen,
struct page **pages, unsigned int *pgbase)
struct page *newpage, **spages;
int rc = 0;
size_t len;
spages = pages;
do {
len = min(PAGE_CACHE_SIZE, buflen);
newpage = alloc_page(GFP_KERNEL);
if (newpage == NULL)
goto unwind;
memcpy(page_address(newpage), buf, len);
buf += len;
buflen -= len;
*pages++ = newpage;
} while (buflen != 0);
return rc;
for(; rc > 0; rc--)
return -ENOMEM;
struct nfs4_cached_acl {
int cached;
size_t len;
......@@ -3420,13 +3450,23 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
.rpc_argp = &arg,
.rpc_resp = &res,
int ret;
int ret, i;
if (!nfs4_server_supports_acls(server))
i = buf_to_pages_noslab(buf, buflen, arg.acl_pages, &arg.acl_pgbase);
if (i < 0)
return i;
buf_to_pages(buf, buflen, arg.acl_pages, &arg.acl_pgbase);
ret = nfs4_call_sync(server, &msg, &arg, &res, 1);
* Free each page after tx, so the only ref left is
* held by the network stack
for (; i > 0; i--)
* Acl update can result in inode attribute update.
* so mark the attribute cache invalid.
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment