By Philip Copeland | October 20, 2006 03:00 PM EDT
Surely with its army of kernel hackers and proofreaders, the Linux kernel should be close to perfect. The reality is that most kernel hackers focus on only a piece of the kernel - VM, disk I/O, a particular driver, or some part of the network stack - and mistakes are made through oversight or because a certain condition wasn't given its due.
Everyone also has slightly different hardware and support chipsets, so it's not feasible for kernel writers to test on the entire range of available hardware. As a result, kernel writers depend heavily on accurate bug reports sent back upstream after mini-releases.
Because Linux is distributed as highly configurable source code, it needs to be tested with many different options to find combinations that stimulate bugs. Many operating system vendors only certify their product for a limited set of base options and hardware setups.
On the other hand, Linux supports a number of different instruction set architectures, different CPU variations for a given architecture, even mainframe processors (e.g., the S/390). Generally, developers don't own or have the resources to test on all of these different machine types. Furthermore, Linux supports a very large number of hardware devices, and there may be conflicts between different devices that can only be uncovered by widespread use by the public. As an interface between user applications and hardware, the kernel also needs to be tested by running multiple user-mode programs on it. This enables the testing of many combinations of system calls and other loads on the system.
Failure of a kernel in a production system usually has a greater impact than the failure of a user-mode program. If the kernel is flaky, machines can crash, the file system can get corrupted, and users can lose data and the use of their machine until they either reboot or the problem is resolved.
Worse, a buggy kernel can cause incorrect functioning of an otherwise reliable program. This kind of bug is insidious and can be very frustrating to track down.
There are also several things not directly related to the kernel that can go wrong, the most common being mismatched modutils for module loading. Testing kernels is almost always a significant undertaking; however, the reward is a better kernel. (Note: You should report bugs to the linux-kernel mailing list at linux-kernel@vger.kernel.org.)
When testing a kernel, look for the same kinds of bugs you would watch for in your own code. Be on guard against heap corruption, buffer overflows, race conditions, failure to protect a critical memory region, and a missing '=', '+' or '-' in comparison tests.
The most straightforward and methodical way to test this is by writing test tools that try out all of the different system calls, vary their parameters over the acceptable ranges, and ensure that the results returned are also within the documented range. You should also try making system calls with illegal parameters to ensure that an appropriate error code is returned.
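As a minimal sketch of the illegal-parameter half of that approach, the Python snippet below makes a handful of deliberately invalid system calls through the standard os module and checks that each fails with the documented errno. The helper names (expect_errno, closed_fd, run_checks) and the missing path are invented for this illustration; a real test tool would cover far more calls and parameter ranges.

```python
import errno
import os


def expect_errno(expected, func, *args):
    """Call func(*args); return True only if it fails with the expected errno."""
    try:
        func(*args)
    except OSError as exc:
        return exc.errno == expected
    return False  # the call succeeded, which is itself a test failure


def closed_fd():
    """Return a descriptor number that has just been closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.close(write_end)
    return read_end


def run_checks():
    """Run each invalid call and record whether the error code matched."""
    bad_fd = closed_fd()
    missing = "/nonexistent-dir-for-test/file"  # hypothetical path, assumed absent
    results = {
        "read on closed fd gives EBADF": expect_errno(errno.EBADF, os.read, bad_fd, 1),
        "write on closed fd gives EBADF": expect_errno(errno.EBADF, os.write, bad_fd, b"x"),
        "close on closed fd gives EBADF": expect_errno(errno.EBADF, os.close, bad_fd),
        "open on missing path gives ENOENT": expect_errno(
            errno.ENOENT, os.open, missing, os.O_RDONLY
        ),
    }
    # Seeking on a pipe is illegal and should report ESPIPE.
    read_end, write_end = os.pipe()
    try:
        results["lseek on pipe gives ESPIPE"] = expect_errno(
            errno.ESPIPE, os.lseek, read_end, 0, os.SEEK_SET
        )
    finally:
        os.close(read_end)
        os.close(write_end)
    return results


if __name__ == "__main__":
    for name, passed in run_checks().items():
        print("PASS" if passed else "FAIL", name)
```

Any FAIL line would mean the kernel returned a different error, or no error at all, for a call that should have been rejected.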
There are some test tools available for Linux, and the most significant is probably the Linux Test Project suite of programs at http://ltp.sourceforge.net/, which incorporates a comprehensive list of tools commonly used for testing the various Linux components. Most are concerned with checking for expected results against known datasets or with speed benchmarking of various routines. These are important tests, but you should focus more on testing your own unique code, for which no test has been written yet.
The main difficulty with testing for a reported bug is setting up an environment under which the bug can be exposed. Many tests try to artificially generate abnormal situations like heavy system load or heavy memory pressure, which don't represent a realistic situation. Quite a few bugs are found when people start using more extreme hardware configurations, such as 4GB+ of RAM, multiple network interfaces (three or four 4-port Ethernet cards), multiple Fibre Channel cards in a multi-initiator configuration, etc. These tend to be setups that aren't generally tested because they're not commonly found in a developer's environment.
Writing Tests
Let's assume a bug has been reported in your kernel code, and we've determined that it appears under heavy system load with memory pressure (i.e., there are several big processes running). We also suspect that somewhere in the kernel or in our code a memory allocation is assumed to succeed without its return value being checked, or something similar. The bug hasn't shown up in any normal testing, even though there have been three obscure reports about it. Needless to say, there's concern that something potentially harmful is happening.
Test 1: Memory Management System
Problems can arise from artifacts in the kernel memory management system. One well-known situation is memory fragmentation. Linux kernel memory management attempts to satisfy memory requests with contiguous groups of pages, but where it can't, the memory ends up being fragmented. Occasionally, when an application requests another chunk of memory, the request can't be satisfied because there isn't room for a contiguous allocation.
xxxx____xxxx____xxxxxxxx________xxxx____xxxx
(This is an overly simplistic model of the reality)
In the representation above, there are three 4-page gaps and one 8-page gap. If two 8-page requests come in, only one will succeed, even though 12 pages of memory remain free afterward. This may leave the developer scratching her head, wondering why the system reports sufficient memory while the application experiences memory starvation or out-of-memory problems. It's possible to cat /proc/buddyinfo to see how fragmented the currently free pages are. For example, a system that's been up and running for a while may look like this:
Node 0, zone Normal 2892 3014 65 23 1 0 0 1 1 1 0
(each column counts free blocks of 1, 2, 4, 8, 16, ... contiguous pages)
On this system, 2,892 single pages of memory could be allocated immediately, 3,014 pairs exist, but only 65 groups of four pages can be found. If something comes along that needs a lot of higher-order allocations, the available memory will be exhausted quickly, and those allocations may start to fail even though in real terms there's sufficient memory.
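The arithmetic behind that reading can be sketched in a few lines of Python. The snippet parses one /proc/buddyinfo line (the sample from this article is embedded as SAMPLE) and adds up how many free pages sit in blocks of at least a given order. The function names are made up for this illustration, and it assumes the usual layout of one count per order, starting at order 0.

```python
def parse_buddyinfo_line(line):
    """Split one /proc/buddyinfo line into (node, zone, counts per order)."""
    fields = line.replace(",", " ").split()
    # Assumed layout: "Node" <n> "zone" <name> <count order 0> <count order 1> ...
    node = int(fields[1])
    zone = fields[3]
    counts = [int(value) for value in fields[4:]]
    return node, zone, counts


def free_pages_at_or_above(counts, min_order):
    """Pages held in free blocks of at least 2**min_order contiguous pages."""
    return sum(
        count << order  # a block of this order spans 2**order pages
        for order, count in enumerate(counts)
        if order >= min_order
    )


# The sample line quoted in this article.
SAMPLE = "Node 0, zone Normal 2892 3014 65 23 1 0 0 1 1 1 0"

if __name__ == "__main__":
    node, zone, counts = parse_buddyinfo_line(SAMPLE)
    total = free_pages_at_or_above(counts, 0)
    for order in range(len(counts)):
        usable = free_pages_at_or_above(counts, order)
        print(f"order {order:2d}: {usable:6d} of {total} free pages usable")
```

For the sample line, this works out to 10,276 free pages in total, of which only 1,356 are in blocks of four or more contiguous pages.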
That describes what might be happening, but from a debugging point-of-view how can this situation be mimicked? One way is to just play out a normal run of whatever the system does and hope the error happens again. Unfortunately, this could take weeks. Instead, something to artificially generate the fragmentation is needed. This isn't easily done from a userspace program either, so it will be necessary to turn to some kernel module programming to generate the situation.
http://oss.oracle.com/projects/codefragments/src/trunk/fragment-slab/
When run, this module can artificially starve the system of contiguous pairing and provides a means to test an application or kernel code in a memory-hostile environment.
Test 2: Memory Pressure
Another persistent problem is a situation known as memory pressure, which often leads to a system slowing to a dreadful crawl or, worse, locking up under load. One of the fundamental sources for memory pressure is the file-system page-cache usage, along with the buffer_head entries that control them.
Another problem area is the inode and dentry cache entries in the slab cache. Linux struggles to keep both of these under control. User-space processes provide another obvious source of memory use; these are partially handled by the OOM killer subsystem. This is all further complicated by swapping, which the system uses to push out processes or process memory considered 'idle' in an effort to increase available memory. One of the easiest ways of generating memory pressure is to simply reboot the box using the mem= boot-time option to reduce the amount of memory available. Since one of the main sources is file-system page-cache use, a kernel module can be written to artificially generate a load through allocations via the kernel buffer_head allocation/deallocation routines.
http://oss.oracle.com/projects/codefragments/src/trunk/bufferheads
It's possible to artificially increase or decrease the number of buffers used by plugging in values on the command line when the module is loaded or by modifying the values via the /proc interface. These modules aren't complex. The most complex thing about them is the ability to dynamically change values on-the-fly without having to unload/reload the modules.
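While such a test runs, it helps to watch how the page cache and slab usage respond. The following is a minimal monitoring sketch, not part of the modules above: it reads /proc/meminfo and picks out the MemFree, Buffers, Cached and Slab fields. The function names are invented for this illustration, and the snippet assumes a Linux system that exposes those fields.

```python
import os


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def cache_summary(values, keys=("MemFree", "Buffers", "Cached", "Slab")):
    """Pick out the fields most relevant to page-cache and slab pressure."""
    return {key: values[key] for key in keys if key in values}


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters from the running kernel."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        for name, value in cache_summary(read_meminfo()).items():
            print(f"{name:8s} {value:10d} kB")
    else:
        print("/proc/meminfo not available on this system")
```

Sampling these values before and after changing the module's parameters gives a rough picture of how much pressure the test is actually applying.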
The main advantage of using a kernel module to create the test environment for memory fragmentation is that we can create a fairly realistic environment in a few seconds as opposed to having to wait days or weeks for the same situation to occur naturally. Trying to recreate the same situation using a normal user land program would be thwarted by the kernel's memory management system trying to stack all the requests in a manner that defragments the system on-the-fly.
In the second example, the main advantage is that we can do in a few seconds what would take quite a while in user land. (Have you ever tried to create a million files on a file system? It takes a VERY long time.) So while it's not impossible to create these environments from user land programs, the kernel module option offers a means to create extreme conditions quickly and at a low cost.
These are only two examples relating to memory use. This could easily be extended into creating artificially hostile network environments or a crippled I/O system. You simply need to know in advance what type of conditions you want to create for your application to run in.
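For comparison, the user land route for the file-creation case might look like the sketch below, which creates a batch of empty files in a scratch directory and times it. The function name and the batch size are arbitrary choices for this illustration. Scaling the count toward a million shows how long this approach takes on a given file system.

```python
import os
import tempfile
import time


def create_empty_files(directory, count):
    """Create `count` empty files in `directory`; return the elapsed seconds."""
    started = time.perf_counter()
    for index in range(count):
        path = os.path.join(directory, f"f{index:07d}")
        # Each file costs at least one open() and one close() system call.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
    return time.perf_counter() - started


if __name__ == "__main__":
    count = 10_000  # small batch; raise this to see how the cost grows
    with tempfile.TemporaryDirectory() as scratch:
        elapsed = create_empty_files(scratch, count)
        print(f"{count} files created in {elapsed:.2f}s")
```

The scratch directory is removed when the with block exits, so the run leaves nothing behind.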
Copyright © 2006 Ulitzer, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Philip Copeland, senior software developer in Oracle’s Linux Engineering group, has been working with open source software for more than 10 years.