Commit 6d2160bf authored by Felix Blyakher's avatar Felix Blyakher
Browse files

Merge branch 'master' of...

Merge branch 'master' of git:// into for-linus
parents f0e0059b b1792e36
......@@ -33,10 +33,12 @@ o Gnu make 3.79.1 # make --version
o binutils 2.12 # ld -v
o util-linux 2.10o # fdformat --version
o module-init-tools 0.9.10 # depmod -V
o e2fsprogs 1.29 # tune2fs
o e2fsprogs 1.41.4 # e2fsck -V
o jfsutils 1.1.3 # fsck.jfs -V
o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs
o xfsprogs 2.6.0 # xfs_db -V
o squashfs-tools 4.0 # mksquashfs -version
o btrfs-progs 0.18 # btrfsck
o pcmciautils 004 # pccardctl -V
o quota-tools 3.09 # quota -V
o PPP 2.4.0 # pppd --version
......@@ -483,17 +483,25 @@ values. To do the latter, you can stick the following in your .emacs file:
(* (max steps 1)
(add-hook 'c-mode-common-hook
(lambda ()
;; Add kernel style
'("linux" (c-offsets-alist
(add-hook 'c-mode-hook
(lambda ()
(let ((filename (buffer-file-name)))
;; Enable kernel mode for the appropriate files
(when (and filename
(string-match "~/src/linux-trees" filename))
(string-match (expand-file-name "~/src/linux-trees")
(setq indent-tabs-mode t)
(c-set-style "linux")
(c-set-offset 'arglist-cont-nonempty
(c-set-style "linux-tabs-only")))))
This will make emacs go better with the kernel coding style for C
files below ~/src/linux-trees.
......@@ -5,7 +5,7 @@
This document describes the DMA API. For a more gentle introduction
phrased in terms of the pci_ equivalents (and actual examples) see
This API is split into two pieces. Part I describes the API and the
corresponding pci_ API. Part II describes the extensions to the API
......@@ -41,6 +41,12 @@ GPL version 2.
<revremark>Added generic platform drivers and offset attribute.</revremark>
......@@ -312,6 +318,16 @@ interested in translating it, please email me
pointed to by addr.
<filename>offset</filename>: The offset, in bytes, that has to be
added to the pointer returned by <function>mmap()</function> to get
to the actual device memory. This is important if the device's memory
is not page aligned. Remember that pointers returned by
<function>mmap()</function> are always page aligned, so it is good
style to always add this offset.
......@@ -594,6 +610,78 @@ framework to set up sysfs files for this region. Simply leave it alone.
<sect1 id="using_uio_pdrv">
<title>Using uio_pdrv for platform devices</title>
In many cases, UIO drivers for platform devices can be handled in a
generic way. In the same place where you define your
<varname>struct platform_device</varname>, you simply also implement
your interrupt handler and fill your
<varname>struct uio_info</varname>. A pointer to this
<varname>struct uio_info</varname> is then used as
<varname>platform_data</varname> for your platform device.
You also need to set up an array of <varname>struct resource</varname>
containing addresses and sizes of your memory mappings. This
information is passed to the driver using the
<varname>.resource</varname> and <varname>.num_resources</varname>
elements of <varname>struct platform_device</varname>.
You now have to set the <varname>.name</varname> element of
<varname>struct platform_device</varname> to
<varname>"uio_pdrv"</varname> to use the generic UIO platform device
driver. This driver will fill the <varname>mem[]</varname> array
according to the resources given, and register the device.
The advantage of this approach is that you only have to edit a file
you need to edit anyway. You do not have to create an extra driver.
<sect1 id="using_uio_pdrv_genirq">
<title>Using uio_pdrv_genirq for platform devices</title>
Especially in embedded devices, you frequently find chips where the
irq pin is tied to its own dedicated interrupt line. In such cases,
where you can be really sure the interrupt is not shared, we can take
the concept of <varname>uio_pdrv</varname> one step further and use a
generic interrupt handler. That's what
<varname>uio_pdrv_genirq</varname> does.
The setup for this driver is the same as described above for
<varname>uio_pdrv</varname>, except that you do not implement an
interrupt handler. The <varname>.handler</varname> element of
<varname>struct uio_info</varname> must remain
<varname>NULL</varname>. The <varname>.irq_flags</varname> element
must not contain <varname>IRQF_SHARED</varname>.
You will set the <varname>.name</varname> element of
<varname>struct platform_device</varname> to
<varname>"uio_pdrv_genirq"</varname> to use this driver.
The generic interrupt handler of <varname>uio_pdrv_genirq</varname>
will simply disable the interrupt line using
<function>disable_irq_nosync()</function>. After doing its work,
userspace can reenable the interrupt by writing 0x00000001 to the UIO
device file. The driver already implements an
<function>irq_control()</function> to make this possible, you must not
implement your own.
Using <varname>uio_pdrv_genirq</varname> not only saves a few lines of
interrupt handler code. You also do not need to know anything about
the chip's internal registers to create the kernel part of the driver.
All you need to know is the irq number of the pin the chip is
connected to.
<chapter id="userspace_driver" xreflabel="Writing a driver in user space">
[ NOTE: The virt_to_bus() and bus_to_virt() functions have been
superseded by the functionality provided by the PCI DMA
interface (see Documentation/DMA-mapping.txt). They continue
superseded by the functionality provided by the PCI DMA interface
(see Documentation/PCI/PCI-DMA-mapping.txt). They continue
to be documented below for historical purposes, but new code
must not use them. --davidm 00/12/12 ]
......@@ -186,8 +186,9 @@ a virtual address mapping (unlike the earlier scheme of virtual address
do not have a corresponding kernel virtual address space mapping) and
low-memory pages.
Note: Please refer to DMA-mapping.txt for a discussion on PCI high mem DMA
aspects and mapping of scatter gather lists, and support for 64 bit PCI.
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
Special handling is required only for cases where i/o needs to happen on
pages at physical memory addresses beyond what the device can support. In these
......@@ -953,14 +954,14 @@ elevator_allow_merge_fn called whenever the block layer determines
results in some sort of conflict internally,
this hook allows it to do that.
elevator_dispatch_fn fills the dispatch queue with ready requests.
elevator_dispatch_fn* fills the dispatch queue with ready requests.
I/O schedulers are free to postpone requests by
not filling the dispatch queue unless @force
is non-zero. Once dispatched, I/O schedulers
are not allowed to manipulate the requests -
they belong to generic dispatch queue.
elevator_add_req_fn called to add a new request into the scheduler
elevator_add_req_fn* called to add a new request into the scheduler
elevator_queue_empty_fn returns true if the merge queue is empty.
Drivers shouldn't use this, but rather check
......@@ -990,7 +991,7 @@ elevator_activate_req_fn Called when device driver first sees a request.
elevator_deactivate_req_fn Called when device driver decides to delay
a request by requeueing it.
elevator_exit_fn Allocate and free any elevator specific storage
for a queue.
Queue sysfs files
This text file will detail the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions are a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.
Files denoted with a RO postfix are readonly and the RW postfix means
hw_sector_size (RO)
This is the hardware sector size of the device, in bytes.
max_hw_sectors_kb (RO)
This is the maximum number of kilobytes supported in a single data transfer.
max_sectors_kb (RW)
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.
nomerges (RW)
This enables the user to disable the lookup logic involved with IO merging
requests in the block layer. Merging may still occur through a direct
1-hit cache, since that comes for (almost) free. The IO scheduler will not
waste cycles doing tree/hash lookups for merges if nomerges is 1. Defaults
to 0, enabling all merges.
nr_requests (RW)
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
read_ahead_kb (RW)
Maximum number of kilobytes to read-ahead for filesystems on this block
rq_affinity (RW)
If this option is enabled, the block layer will migrate request completions
to the CPU that originally submitted the request. For some workloads
this provides a significant reduction in CPU cycles due to caching effects.
scheduler (RW)
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.
Jens Axboe <>, February 2009
Memory Resource Controller(Memcg) Implementation Memo.
Last Updated: 2008/12/15
Base Kernel Version: based on 2.6.28-rc8-mm.
Last Updated: 2009/1/19
Base Kernel Version: based on 2.6.29-rc2.
Because VM is getting complex (one of reasons is memcg...), memcg's behavior
is complex. This is a document for memcg's internal behavior.
......@@ -340,3 +340,23 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
# mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices
and do task move, mkdir, rmdir etc...under this.
9.7 swapoff.
Besides management of swap is one of complicated parts of memcg,
call path of swap-in at swapoff is not same as usual swap-in path..
It's worth to be tested explicitly.
For example, test like following is good.
# mount -t cgroup none /cgroup -t memory
# mkdir /cgroup/test
# echo 40M > /cgroup/test/memory.limit_in_bytes
# echo 0 > /cgroup/test/tasks
Run malloc(100M) program under this. You'll see 60M of swaps.
# move all tasks in /cgroup/test to /cgroup
# /sbin/swapoff -a
# rmdir /test/cgroup
# kill malloc task.
Of course, tmpfs v.s. swapoff test should be tested, too.
......@@ -251,7 +251,7 @@ NFS/RDMA Setup
Instruct the server to listen on the RDMA transport:
$ echo rdma 2050 > /proc/fs/nfsd/portlist
$ echo rdma 20049 > /proc/fs/nfsd/portlist
- On the client system
......@@ -263,7 +263,7 @@ NFS/RDMA Setup
Regardless of how the client was built (module or built-in), use this
command to mount the NFS/RDMA server:
$ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
the "proto" field for the given mount.
......@@ -2027,6 +2027,34 @@ increase the likelihood of this process being killed by the oom-killer. Valid
values are in the range -16 to +15, plus the special value -17, which disables
oom-killing altogether for this process.
The process to be killed in an out-of-memory situation is selected among all others
based on its badness score. This value equals the original memory size of the process
and is then updated according to its CPU time (utime + stime) and the
run time (uptime - start time). The longer it runs the smaller is the score.
Badness score is divided by the square root of the CPU time and then by
the double square root of the run time.
Swapped out tasks are killed first. Half of each child's memory size is added to
the parent's score if they do not share the same memory. Thus forking servers
are the prime candidates to be killed. Having only one 'hungry' child will make
parent less preferable than the child.
/proc/<pid>/oom_score shows process' current badness score.
The following heuristics are then applied:
* if the task was reniced, its score doubles
* superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE
or CAP_SYS_RAWIO) have their score divided by 4
* if oom condition happened in one cpuset and checked task does not belong
to it, its score is divided by 8
* the resulting score is multiplied by two to the power of oom_adj, i.e.
points <<= oom_adj when it is positive and
points >>= -(oom_adj) otherwise
The task with the highest badness score is then selected and its children
are killed, process itself will be killed in an OOM situation when it does
not have children or some of them disabled oom like described above.
2.13 /proc/<pid>/oom_score - Display current oom-killer score
......@@ -12,11 +12,11 @@ file at first.
翻訳団体: JF プロジェクト < >
翻訳日: 2007/12/30
翻訳日: 2009/1/14
翻訳者: Tsugikazu Shibata <tshibata at ab dot jp dot nec dot com>
校正者: 武井伸光さん、<takei at webmasters dot gr dot jp>
かねこさん (Seiji Kaneko) <skaneko at a2 dot mbn dot or dot jp>
......@@ -38,12 +38,15 @@ linux-2.6.24/Documentation/stable_kernel_rules.txt
- ビルドエラー(CONFIG_BROKENになっているものを除く), oops, ハング、デー
タ破壊、現実のセキュリティ問題、その他 "ああ、これはダメだね"という
- 新しい device ID とクオークも受け入れられる。
- どのように競合状態が発生するかの説明も一緒に書かれていない限り、
- いかなる些細な修正も含めることはできない。(スペルの修正、空白のクリー
- 対応するサブシステムメンテナが受け入れたものでなければならない。
- Documentation/SubmittingPatches の規則に従ったものでなければならない。
- パッチ自体か同等の修正が Linus のツリーに既に存在しなければならない。
  Linus のツリーでのコミットID を -stable へのパッチ投稿の際に引用す
-stable ツリーにパッチを送付する手続き-
......@@ -52,8 +55,10 @@ linux-2.6.24/Documentation/stable_kernel_rules.txt
- 送信者はパッチがキューに受け付けられた際には ACK を、却下された場合
には NAK を受け取る。この反応は開発者たちのスケジュールによって、数
- もし受け取られたら、パッチは他の開発者たちのレビューのために
-stable キューに追加される。
- もし受け取られたら、パッチは他の開発者たちと関連するサブシステムの
メンテナーによるレビューのために -stable キューに追加される。
- パッチに のアドレスが付加されているときには、それ
が Linus のツリーに入る時に自動的に stable チームに email される。
- セキュリティパッチはこのエイリアス ( に送られるべ
きではなく、代わりに のアドレスに送られる。
......@@ -3,7 +3,7 @@ Environment variables
Additional options to pass when preprocessing. The preprocessing options
will be used in all cases where kbuild do preprocessing including
will be used in all cases where kbuild does preprocessing including
building C files and assembler files.
......@@ -16,7 +16,7 @@ Additional options to the C compiler.
Set the kbuild verbosity. Can be assinged same values as "V=...".
Set the kbuild verbosity. Can be assigned same values as "V=...".
See make help for the full list.
Setting "V=..." takes precedence over KBUILD_VERBOSE.
......@@ -35,14 +35,14 @@ KBUILD_OUTPUT
Specify the output directory when building the kernel.
The output directory can also be specificed using "O=...".
Setting "O=..." takes precedence over KBUILD_OUTPUT
Setting "O=..." takes precedence over KBUILD_OUTPUT.
Set ARCH to the architecture to be built.
In most cases the name of the architecture is the same as the
directory name found in the arch/ directory.
But some architectures suach as x86 and sparc has aliases.
But some architectures such as x86 and sparc have aliases.
x86: i386 for 32 bit, x86_64 for 64 bit
sparc: sparc for 32 bit, sparc64 for 64 bit
......@@ -63,7 +63,7 @@ CF is often used on the command-line like this:
INSTALL_PATH specifies where to place the updated kernel and system map
images. Default is /boot, but you can set it to other values
images. Default is /boot, but you can set it to other values.
......@@ -90,7 +90,7 @@ INSTALL_MOD_STRIP will used as the options to the strip command.
INSTALL_FW_PATH specify where to install the firmware blobs.
INSTALL_FW_PATH specifies where to install the firmware blobs.
The default value is:
......@@ -99,7 +99,7 @@ The value can be overridden in which case the default value is ignored.
INSTALL_HDR_PATH specify where to install user space headers when
INSTALL_HDR_PATH specifies where to install user space headers when
executing "make headers_*".
The default value is:
......@@ -112,22 +112,23 @@ The value can be overridden in which case the default value is ignored.
KBUILD_MODPOST_WARN can be set to avoid error out in case of undefined
symbols in the final module linking stage.
KBUILD_MODPOST_WARN can be set to avoid errors in case of undefined
symbols in the final module linking stage. It changes such errors
into warnings.
KBUILD_MODPOST_NOFINAL can be set to skip the final link of modules.
This is solely usefull to speed up test compiles.
This is solely useful to speed up test compiles.
For modules use symbols from another modules.
For modules that use symbols from other modules.
See more details in modules.txt.
For tags/TAGS/cscope targets, you can specify more than one archs
to be included in the databases, separated by blankspace. e.g.
For tags/TAGS/cscope targets, you can specify more than one arch
to be included in the databases, separated by blank space. E.g.:
$ make ALLSOURCE_ARCHS="x86 mips arm" tags
......@@ -577,9 +577,6 @@ and is between 256 and 4096 characters. It is defined in the file
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for a example.
cs4232= [HW,OSS]
Format: <io>,<irq>,<dma>,<dma2>,<mpuio>,<mpuirq>
cs89x0_dma= [HW,NET]
Format: <dma>
......@@ -732,10 +729,6 @@ and is between 256 and 4096 characters. It is defined in the file
Default value is 0.
Value can be changed at runtime via /selinux/enforce.
es1371= [HW,OSS]
Format: <spdif>,[<nomix>,[<amplifier>]]
See also header of sound/oss/es1371.c.
ether= [HW,NET] Ethernet cards parameters
This option is obsoleted by the "netdev=" option, which
has equivalent usage. See its documentation for details.
# This creates the demonstration utility "lguest" which runs a Linux guest.
CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include
CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
all: lguest
......@@ -2,13 +2,13 @@
IP-aliases are additional IP-addresses/masks hooked up to a base
interface by adding a colon and a string when running ifconfig.
This string is usually numeric, but this is not a must.
IP-Aliases are avail if CONFIG_INET (`standard' IPv4 networking)
is configured in the kernel.
IP-aliases are an obsolete way to manage multiple IP-addresses/masks
per interface. Newer tools such as iproute2 support multiple
address/prefixes per interface, but aliases are still supported
for backwards compatibility.
An alias is formed by adding a colon and a string when running ifconfig.
This string is usually numeric, but this is not a must.
o Alias creation.
Alias creation is done by 'magic' interface naming: eg. to create a
......@@ -38,16 +38,3 @@ o Relationship with main device
If the base device is shut down the added aliases will be deleted
Please finger or e-mail me:
Juan Jose Ciarlante <>
Updated by Erik Schoenfelder <schoenfr@gaertner.DE>
; local variables:
; mode: indented-text
; mode: auto-fill
; end:
......@@ -51,7 +51,8 @@ Built-in netconsole starts immediately after the TCP stack is
initialized and attempts to bring up the supplied dev at the supplied
The remote host can run either 'netcat -u -l -p <port>' or syslogd.
The remote host can run either 'netcat -u -l -p <port>',
'nc -l -u <port>' or syslogd.
Dynamic reconfiguration:
MPC5200 Device Tree Bindings
(c) 2006-2009 Secret Lab Technologies Ltd
Grant Likely <>
Naming conventions
For mpc5200 on-chip devices, the format for each compatible value is
<chip>-<device>[-<mode>]. The OS should be able to match a device driver
to the device based solely on the compatible value. If two drivers
match on the compatible list; the 'most compatible' driver should be
The split between the MPC5200 and the MPC5200B leaves a bit of a
conundrum. How should the compatible property be set up to provide
maximum compatibility information; but still accurately describe the
chip? For the MPC5200; the answer is easy. Most of the SoC devices
originally appeared on the MPC5200. Since they didn't exist anywhere
else; the 5200 compatible properties will contain only one item;
The 5200B is almost the same as the 5200, but not quite. It fixes
silicon bugs and it adds a small number of enhancements. Most of the
devices either provide exactly the same interface as on the 5200. A few
devices have extra functions but still have a backwards compatible mode.
To express this information as completely as possible, 5200B device trees
should have two items in the compatible list:
compatible = "fsl,mpc5200b-<device>","fsl,mpc5200-<device>";
It is *strongly* recommended that 5200B device trees follow this convention
(instead of only listing the base mpc5200 item).
ie. ethernet on mpc5200: compatible = "fsl,mpc5200-fec";
ethernet on mpc5200b: compatible = "fsl,mpc5200b-fec", "fsl,mpc5200-fec";
Modal devices, like PSCs, also append the configured function to the
end of the compatible field. ie. A PSC in i2s mode would specify
"fsl,mpc5200-psc-i2s", not "fsl,mpc5200-i2s". This convention is chosen to
avoid naming conflicts with non-psc devices providing the same
function. For example, "fsl,mpc5200-spi" and "fsl,mpc5200-psc-spi" describe
the mpc5200 simple spi device and a PSC spi mode respectively.
At the time of writing, exact chip may be either 'fsl,mpc5200' or
The soc node
This node describes the on chip SOC peripherals. Every mpc5200 based
board will have this node, and as such there is a common naming
convention for SOC devices.
Required properties:
name description
---- -----------
ranges Memory range of the internal memory mapped registers.
Should be <0 [baseaddr] 0xc000>
reg Should be <[baseaddr] 0x100>
compatible mpc5200: "fsl,mpc5200-immr"
mpc5200b: "fsl,mpc5200b-immr"
system-frequency 'fsystem' frequency in Hz; XLB, IPB, USB and PCI
clocks are derived from the fsystem clock.
bus-frequency IPB bus frequency in Hz. Clock rate
used by most of the soc devices.