In Memory Core Dump
Authors: Dave Winchell
Jeff Moyer (mainly dave)
Josh Huber (again, mainly dave)
Copyright (C) 1999,2000 Mission Critical Linux, Inc.
Special thanks go out to Werner Almesberger for his work on bootimg, which
allows a Linux system to be booted without leaving protected mode.
Without this bit of code, the in-memory core dump facility would not work on
the Intel platform. Thanks a bunch, Werner!
(Instructions for installing can be found in the INSTALL file in
this directory.)
This crash dump does not rely on the disk subsystems during the crash.
Instead, the dump is saved in memory at crash time and then written to a
file on the subsequent boot.
At crash time, the kernel selects every page that is not in one of the
categories [free, user anon, user shared, file page cache] and compresses
it into pages that are in one of those categories, are not locked, lie
above a certain address, and are a certain distance from the end of
memory. A reboot is then requested with the option to preserve memory.
Early in the boot process, the pages containing the dump are marked
"reserved." Later in the boot process, they are written to a file and
freed. On a 510M machine with little system activity, the compressed dump
is 5M to 25M. The crash file can be found in /var/dumps.
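The selection rule above can be pictured as two predicates over a page.
What follows is a minimal user-space sketch, not code from the patch: the
enum, the struct, the function names, and the min_pfn and tail_guard
parameters are all hypothetical. "A certain distance from the end of
memory" is read here as "at least tail_guard pages below the last page",
which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page categories; the names are illustrative only. */
enum page_kind {
    PAGE_FREE,
    PAGE_USER_ANON,
    PAGE_USER_SHARED,
    PAGE_FILE_CACHE,
    PAGE_OTHER          /* anything else, e.g. kernel data */
};

struct page_info {
    enum page_kind kind;
    bool locked;
};

/* True for the four categories whose contents are not saved in the dump. */
bool kind_is_reusable(enum page_kind kind)
{
    return kind == PAGE_FREE || kind == PAGE_USER_ANON ||
           kind == PAGE_USER_SHARED || kind == PAGE_FILE_CACHE;
}

/* A page is saved in the dump if it is in none of the four categories. */
bool page_is_dump_source(const struct page_info *page)
{
    assert(page != NULL);
    return !kind_is_reusable(page->kind);
}

/*
 * A page may receive compressed dump data if it is in one of the four
 * categories, is not locked, lies above min_pfn, and is more than
 * tail_guard pages below the end of memory (assumed interpretation).
 */
bool page_is_dump_target(const struct page_info *page, size_t pfn,
                         size_t total_pages, size_t min_pfn,
                         size_t tail_guard)
{
    assert(page != NULL);
    if (pfn >= total_pages || pfn <= min_pfn)
        return false;
    if (total_pages - pfn <= tail_guard)
        return false;
    return kind_is_reusable(page->kind) && !page->locked;
}
```

In this model a page is never both a source and a target, so the
compressed dump cannot overwrite data that still has to be saved.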
The SysRq 'c' key forces a crash dump, which is useful on a hung system.
It does not work if the system is stuck at interrupt level.
By default, the machine no longer panics when a kernel Oops is
encountered, as it did in the past. If you would like it to panic, so
that debugging data is collected, you can enable this through sysctl:
echo 1 > /proc/sys/kernel/panic_on_oops
-or-
sysctl -w kernel/panic_on_oops=1
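The same setting could also be changed from a program by writing to the
file directly. The helpers below are a hedged sketch: write_flag_file and
read_flag_file are hypothetical names, not part of the patch, and the
only path taken from this document is /proc/sys/kernel/panic_on_oops.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to a sysctl-style file such as
 * /proc/sys/kernel/panic_on_oops. Returns 0 on success, -1 on failure.
 */
int write_flag_file(const char *path, int enabled)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%d\n", enabled ? 1 : 0) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the flag back. Returns 0 or 1, or -1 if it cannot be read. */
int read_flag_file(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    int value;
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    else
        value = (value != 0);
    fclose(fp);
    return value;
}
```

Called as root, write_flag_file("/proc/sys/kernel/panic_on_oops", 1)
would be expected to have the same effect as the echo command above.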
NOTES ON BOOTING (Intel only)
-----------------------------
Most Intel BIOSes clear memory on reboot, so we decided to use bootimg to
reboot the machine to a new kernel without doing an actual machine
restart. There are a couple of issues with this method:
1) The new kernel must be loaded before the crash occurs (obviously), which
uses a little extra memory. It also means that no crash dump can be taken
before the new kernel is loaded, so panics early in kernel boot are not
captured. Loading the kernel from an initrd would be a solution to this
problem.
2) Because we don't reboot through the BIOS, the video doesn't get reset.
This does cause problems if you panic while in X. Sometimes the machine
reboots (messing up the console), sometimes it just hangs. We're working
on a solution for this, but don't have one yet.
After rebooting the machine, the system may be in an unstable state.
Because of this, we have the init script start before any other services,
save the core, and perform a clean restart. This slightly increases the
core save time, but not by a significant amount.
There do exist Intel machines which don't clear memory on reboot, and if
you would like to see if yours is one of them, set the kernel to reboot
with the parameter "reboot=w". This tells the kernel to do a warm reboot,
instructing the BIOS to skip the memory check. Sometimes this works,
most of the time it doesn't. When you try this, be sure to disable the
'Reboot via bootimg' option in the kernel.
===========================================================================
These are some cryptic notes on which files changed and why (with
reference to the 2.3/2.4 patch).
This is only for the curious.
./init/main.c
Here we call crash_init, which reserves memory needed at panic time.
./kernel/panic.c
panic() modified to call our core dumping mechanism. We also
_never_ sync the disks at panic time, as this can lead to problems
not only with deadlocks, but also with data corruption.
./kernel/crash.c
New file. This is the core of the core dumping code.
./kernel/bootimg.c
New file. This is the code for rebooting Intel machines without going
through the BIOS.
./kernel/zlib.c
./kernel/zlib.h
Stolen code. We need to find a pretty way not to duplicate code.
When we do, this will go away.
./mm/page_alloc.c
./mm/memory.c
Modified to set and clear bits on memory allocations and frees with
regard to pages we want to save during a core dump.
./include/linux/crash.h
Obvious.
./include/linux/bootimg.h
Arch. independent include file for bootimg.
./include/linux/mm.h
Added the flags referred to above for memory alloc/free.
./include/asm-{alpha,ppc,i386}/crash.h
Architecture-specific header file for crash.
./include/asm-i386/bootimg.h
Architecture-specific header file for bootimg. Only works on Intel
for the moment.
./kernel/Makefile
./Makefile
./arch/i386/kernel/Makefile
./arch/alpha/kernel/Makefile
Obvious.
./arch/i386/kernel/crash.c
./arch/alpha/kernel/crash.c
Obvious.
./arch/alpha/kernel/process.c
Added LINUX_REBOOT_CMD_COREDUMP flag to the reboot code as
we do things in a different order when taking a dump than on a
normal reboot.
./arch/alpha/kernel/traps.c
In die_if_kernel() add call to panic()
(do_page_fault -> die_if_kernel -> panic)
./drivers/char/sysrq.c
Sysrq c calls panic to save dump.
./arch/i386/kernel/process.c
Machine_restart calls bootimg instead of going through the normal
boot path. This is how we get away with not clearing memory.
./arch/i386/kernel/smp.c
stop_this_cpu no longer static. We call this from the
crash_halt_or_reboot function.
./arch/i386/kernel/traps.c
In show_registers() add call to panic().
(do_page_fault -> die -> show_registers -> panic)
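The notes for mm/page_alloc.c, mm/memory.c and include/linux/mm.h above
mention bits that are set and cleared on allocation and free. A minimal
user-space model of that bookkeeping might look like the following. The
flag name, the struct, and the function names are hypothetical, and the
direction (set when the kernel allocates a page, cleared when the page is
freed) is an assumption, since the notes do not say which way it goes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real flag names live in include/linux/mm.h. */
#define PG_MODEL_SAVE (1UL << 0)

/* Stand-in for a page descriptor; only the flags word is modelled. */
struct page_model {
    unsigned long flags;
};

/* Assumed behaviour: a page handed to the kernel is marked for saving. */
void model_alloc_kernel_page(struct page_model *page)
{
    assert(page != NULL);
    page->flags |= PG_MODEL_SAVE;
}

/* Assumed behaviour: a freed page no longer needs to be saved. */
void model_free_page(struct page_model *page)
{
    assert(page != NULL);
    page->flags &= ~PG_MODEL_SAVE;
}

/* True if the dump code should save this page. */
bool model_page_wanted(const struct page_model *page)
{
    assert(page != NULL);
    return (page->flags & PG_MODEL_SAVE) != 0;
}
```

Keeping the state in a per-page flag means the dump code only has to test
one bit per page at panic time, when as little as possible should be
computed.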