- 12 Jan, 2006 9 commits
-
-
Andi Kleen authored
Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
No functional changes Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Fix off by one when checking if the machine has enougn memory to need IOMMU This caused the IOMMUs to be needlessly enabled for mem=4G Based on a patch from Jon Mason Signed-off-by: jdmason@us.ibm.com Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Much better to deal with these than with the magic numbers. And remove the comment describing the bits - kernel source is no replacement for an architecture manual. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Don't need to do the vmalloc check for the module range because its PML4 is shared with the kernel text. Also removed an unnecessary TLB flush. Pointed out by Jan Beulich Cc: jbeulich@novell.com Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
When we don't know the node a PCI bus is connected to return -1. This matches the generic code. Noticed by Ravikiran G Thirumalai <kiran@scalex86.org> Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
A lot of Opteron BIOS just pass 10 in all SLIT entries (10 is the normalized unit). This is actually worse than the default heuristic because it leads to pci_distance not knowing the difference between local and remote nodes anymore. This messes up some NUMA heuristics in generic code. In this case it's better to fall back to the default heuristic which just does nodea == nodeb ? 10 : 20. This patch does some basic sanity checking on the SLIT and only accepts the SLIT when it passes. Invariants enforced are: - Node to itself shall be 10 - Any other distance shouldn't be 10 - Distances smaller than 10 are illegal Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Jan Beulich authored
Adjust page fault protection error check before considering it to be a vmalloc synchronization candidate. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Jan Beulich authored
This adjusts things so that handlers of the die() notifier will have sufficient information about the trap currently being handled. It also adjusts the notify_die() prototype to (again) match that of i386. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 06 Jan, 2006 2 commits
-
-
Arjan van de Ven authored
x86-64 specific parts to make the .rodata section read only Signed-off-by:
Arjan van de Ven <arjan@infradead.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@muc.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Arjan van de Ven authored
Bug fix required for the .rodata work on x86-64: when change_page_attr() and friends need to break up a 2Mb page into 4Kb pages, it always set the NX bit on the PMD, which causes the cpu to consider the entire 2Mb region to be NX regardless of the actual PTE perms. This is fine in general, with one big exception: the 2Mb page that covers the last part of the kernel .text! The fix is to not invent a new permission for the new PMD entry, but to just inherit the existing one minus the PSE bit. Signed-off-by:
Arjan van de Ven <arjan@infradead.org> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@muc.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 29 Dec, 2005 1 commit
-
-
Ravikiran G Thirumalai authored
Currently, we do not pass the correct start_pfn to e820_hole_size, to calculate holes. Following patch fixes that. The bug results in incorrect number of node_present_pages for each pgdat and causes ugly output in /sys and probably VM inbalances. Signed-off-by:
Alok N Kataria <alokk@calsoftinc.com> Signed-off-by:
Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by:
Andi Kleen <ak@suse.de> Sighed-off-by:
Shair Fultheim <shai@scalex86.org> Sighed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 15 Dec, 2005 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 13 Dec, 2005 2 commits
-
-
Eric Dumazet authored
As reported by Keith Mannthey, there are problems in populate_memnodemap() The bug was that the compute_hash_shift() was returning 31, with incorrect initialization of memnodemap[] To correct the bug, we must use (1UL << shift) instead of (1 << shift) to avoid an integer overflow, and we must check that shift < 64 to avoid an infinite loop. Signed-off-by:
Eric Dumazet <dada1@cosmosbay.com> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
It's illegal because it can sleep. Use a two step lookup scheme instead. First look up the vm_struct, then change the direct mapping, then finally unmap it. That's ok because nobody can change the particular virtual address range as long as the vm_struct is still in the global list. Also added some LinuxDoc documentation to iounmap. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 15 Nov, 2005 11 commits
-
-
Bob Picco authored
Fix up booting with sparse mem enabled. Otherwise it would just cause an early PANIC at boot. Signed-off-by:
Bob Picco <bob.picco@hp.com> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
CONFIG_CHECKING covered some debugging code used in the early times of the port. But it wasn't even SMP safe for quite some time and the bugs it checked for seem to be gone. This patch removes all the code to verify GS at kernel entry. There haven't been any new bugs in this area for a long time. Previously it also covered the sysctl for the page fault tracing. That didn't make much sense because that code was unconditionally compiled in. I made that a boot option now because it is typically only useful at boot. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Magnus Damm authored
The current x86_64 NUMA memory code is inconsequent when it comes to node memory ranges. The exact behaviour varies depending on which config option that is used. setup_node_bootmem() has start and end as arguments and these are used to calculate the size of the node like this: (end - start). This is all fine if end is pointing to the first non-available byte. The problem is that the current x86_64 code sometimes treats it as the last present byte and sometimes as the first non-available byte. The result is that some configurations might lose a page at the end of the range. This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU so they all treat the end variable as the first non-available byte. This is the same way as the single node code. The patch is boot tested on dual x86_64 hardware with the above configurations, but maybe the removed code is needed as some workaround? Signed-off-by:
Magnus Damm <magnus@valinux.co.jp> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Eric Dumazet authored
Compute the highest possible value for memnode_shift, in order to reduce footprint of memnodemap[] to the minimum, thus making all users (phys_to_nid(), kfree()), more cache friendly. Before the patch : Node 0 MemBase 0000000000000000 Limit 00000001ffffffff Node 1 MemBase 0000000200000000 Limit 00000003ffffffff Using 23 for the hash shift. Max adder is 3ffffffff After the patch : Node 0 MemBase 0000000000000000 Limit 00000001ffffffff Node 1 MemBase 0000000200000000 Limit 00000003ffffffff Using 33 for the hash shift. In this case, only 2 bytes of memnodemap[] are used, instead of 2048 Signed-off-by:
Eric Dumazet <dada1@cosmosbay.com> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Minor victory on the continuous quest against all stray extern. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Minor cleanup - remove obsolete extern Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Adding __initdata_* to asm-generic/sections.h Replaces a lot of open coded externs in arch/x86_64/* I had to change __bss_end to __bss_stop to match the other architectures. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Siddha, Suresh B authored
We should zap the low mappings, as soon as possible, so that we can catch kernel bugs more effectively. Previously early boot had NULL mapped and didn't trap on NULL references. This patch introduces boot_level4_pgt, which will always have low identity addresses mapped. Druing boot, all the processors will use this as their level4 pgt. On BP, we will switch to init_level4_pgt as soon as we enter C code and zap the low mappings as soon as we are done with the usage of identity low mapped addresses. On AP's we will zap the low mappings as soon as we jump to C code. Signed-off-by:
Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by:
Ashok Raj <ashok.raj@intel.com> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Not go from the CPU number to an mapping array. Mode number is often used now in fast paths. This also adds a generic numa_node_id to all the topology includes Suggested by Eric Dumazet Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
The VM needs to know about lost memory in zones to accurately balance dirty pages. This patch accounts mem_map in there too, which fixes a constant errror of a few percent. Also some other misc mappings and the kernel text itself are accounted too. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones. As a bit of historical background: when the x86-64 port was originally designed we had some discussion if we should use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or both. Both was ruled out at this point because it was in early 2.4 when VM is still quite shakey and had bad troubles even dealing with one DMA zone. We settled on the 16MB DMA zone mainly because we worried about older soundcards and the floppy. But this has always caused problems since then because device drivers had trouble getting enough DMA able memory. These days the VM works much better and the wide use of NUMA has proven it can deal with many zones successfully. So this patch adds both zones. This helps drivers who need a lot of memory below 4GB because their hardware is not accessing more (graphic drivers - proprietary and free ones, video frame buffer drivers, sound drivers etc.). Previously they could only use IOMMU+16MB GFP_DMA, which was not enough memory. Another common problem is that hardware who has full memory addressing for >4GB misses it for some control structures in memory (like transmit rings or other metadata). They tended to allocate memory in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent, but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory (even on AMD systems the IOMMU tends to be quite small) especially if you have many devices. With the new zone pci_alloc_consistent can just put this stuff into memory below 4GB which works better. One argument was still if the zone should be 4GB or 2GB. The main motivation for 2GB would be an unnamed not so unpopular hardware raid controller (mostly found in older machines from a particular four letter company) who has a strange 2GB restriction in firmware. But that one works ok with swiotlb/IOMMU anyways, so it doesn't really need GFP_DMA32. I chose 4GB to be compatible with IA64 and because it seems to be the most common restriction. The new zone is so far added only for x86-64. For other architectures who don't set up this new zone nothing changes. Architectures can set a compatibility define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32 as GFP_DMA. Otherwise it's a nop because on 32bit architectures it's normally not needed because GFP_NORMAL (=0) is DMA able enough. One problem is still that GFP_DMA means different things on different architectures. e.g. some drivers used to have #ifdef ia64 use GFP_DMA (trusting it to be 4GB) #elif __x86_64__ (use other hacks like the swiotlb because 16MB is not enough) ... . This was quite ugly and is now obsolete. These should be now converted to use GFP_DMA32 unconditionally. I haven't done this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent which will use GFP_DMA32 transparently. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 30 Oct, 2005 1 commit
-
-
Hugh Dickins authored
First step in pushing down the page_table_lock. init_mm.page_table_lock has been used throughout the architectures (usually for ioremap): not to serialize kernel address space allocation (that's usually vmlist_lock), but because pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it. Reverse that: don't lock or unlock init_mm.page_table_lock in any of the architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take and drop it when allocating a new one, to check lest a racing task already did. Similarly no page_table_lock in vmalloc's map_vm_area. Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle user mms, which are converted only by a later patch, for now they have to lock differently according to whether or not it's init_mm. If sources get muddled, there's a danger that an arch source taking init_mm.page_table_lock will be mixed with common source also taking it (or neither take it). So break the rules and make another change, which should break the build for such a mismatch: remove the redundant mm arg from pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13). Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64 used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64 map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free took page_table_lock for no good reason. Signed-off-by:
Hugh Dickins <hugh@veritas.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 10 Oct, 2005 1 commit
-
-
Andi Kleen authored
Noticed by Terence Ripperda Undo wrong change in global_flush_tlb. We need to flush the caches in all cases, not just when pages were reverted. This was a bogus optimization added earlier, but it was wrong. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 30 Sep, 2005 2 commits
-
-
Ravikiran G Thirumalai authored
The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP] is not setup early enough by numa_init_array due to the x86_64 changes in 2.6.14-rc*, and unfortunately set wrongly by the work around code in numa_init_array(). cpu_to_node[0] gets set with 1 early and later gets set properly to 0 during identify_cpu() when all cpus are brought up, but confusing the numa slab in the process. Here is a quick fix for this. The right fix obviously is to have cpu_to_node[bsp] setup early for numa_init_array(). The following patch will fix the problem now, and the code can stay on even when cpu_to_node{BP] gets fixed early correctly. Thanks to Petr for access to his box. Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by:
Alok N Kataria <alokk@calsoftinc.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Ravikiran G Thirumalai authored
Fix the BP node_to_cpumask. 2.6.14-rc* broke the boot cpu bit as the cpu_to_node(0) is now not setup early enough for numa_init_array. cpu_to_node[] is setup much later at srat_detect_node on acpi srat based em64t machines. This seems like a problem on amd machines too, Tested on em64t though. /sys/devices/system/node/node0/cpumap shows up sanely after this patch. Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by:
Shai Fultheim <shai@scalex86.org> Cc: Andi Kleen <ak@muc.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- 12 Sep, 2005 10 commits
-
-
Andi Kleen authored
The nodes are not set online yet at this point. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Also use for_each_node_mask instead of hand crafted loops. Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Jan Beulich authored
Rather than blindly re-enabling interrupts in oops_end(), save their state in oope_begin() and then restore that state. Signed-off-by:
Jan Beulich <jbeulich@novell.com> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
- Report PXMs instead of nodes - Report the correct PXM, not always the one of node 1. - Only warn for the case of a PXM overlapping by itself Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
- Add KERN_INFO to printks (from i386) - Use longs instead of ints to accumulate pages. - Fix broken indenting. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Since this is shared code I had to implement it for i386 too Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
The FLATMEM people added it, but there doesn't seem a good reason because end_pfn is identical. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
Avoids a very dumb loop Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
It's already handled in the main swiotlb code. Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-