• Nishanth Aravamudan's avatar
    hugetlb: introduce nr_overcommit_hugepages sysctl · d1c3fb1f
    Nishanth Aravamudan authored
    hugetlb: introduce nr_overcommit_hugepages sysctl
    While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
    became convinced that having a boolean sysctl was insufficient:
    1) To support per-node control of hugepages, I have previously submitted
    patches to add a sysfs attribute related to nr_hugepages. However, with
    a boolean global value and per-mount quota enforcement constraining the
    dynamic pool, adding corresponding control of the dynamic pool on a
    per-node basis seems inconsistent to me.
    2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
    mount points is, arguably, more arduous than it needs to be. Each quota
    would need to be set separately, and the sum would need to be monitored.
    To ease the administration, and to help make the way for per-node
    control of the static & dynamic hugepage pool, I added a separate
    sysctl, nr_overcommit_hugepages. This value serves as a high watermark
    for the overall hugepage pool, while nr_hugepages serves as a low
    watermark. The boolean sysctl can then be removed, as the condition
    	nr_overcommit_hugepages > 0
    indicates the same administrative setting as
    	hugetlb_dynamic_pool == 1
    Quotas still serve as local enforcement of the size of the pool on a
    per-mount basis.
    A few caveats:
    1) There is a race whereby the global surplus huge page counter is
    incremented before a hugepage has allocated. Another process could then
    try grow the pool, and fail to convert a surplus huge page to a normal
    huge page and instead allocate a fresh huge page. I believe this is
    benign, as no memory is leaked (the actual pages are still tracked
    correctly) and the counters won't go out of sync.
    2) Shrinking the static pool while a surplus is in effect will allow the
    number of surplus huge pages to exceed the overcommit value. As long as
    this condition holds, however, no more surplus huge pages will be
    allowed on the system until one of the two sysctls are increased
    sufficiently, or the surplus huge pages go out of use and are freed.
    Successfully tested on x86_64 with the current libhugetlbfs snapshot,
    modified to use the new sysctl.
    Signed-off-by: default avatarNishanth Aravamudan <nacc@us.ibm.com>
    Acked-by: default avatarAdam Litke <agl@us.ibm.com>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Cc: Dave Hansen <haveblue@us.ibm.com>
    Cc: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>