Signal 8
From Pirates@Home
Signal 8 is a Unix signal, also called SIGFPE, which indicates a floating point exception. In other words, a problem occurred while doing floating point arithmetic.
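To see the mapping between signal number 8 and SIGFPE on your own machine, the short sketch below asks the shell for the name of signal 8. It also shows the exit status a parent shell reports for a process killed by SIGFPE (128 + 8 = 136). It assumes bash on Linux; the function names `signal_name` and `sigfpe_exit_status` are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash on Linux): relate signal number 8 to SIGFPE.

# `kill -l N` prints the name of signal number N without the "SIG" prefix.
signal_name() {
  kill -l "$1"
}

# Start a child shell that sends SIGFPE to itself. When a process is killed
# by signal N, bash reports its exit status as 128 + N.
sigfpe_exit_status() {
  local status=0
  # `ulimit -c 0` only stops the child from leaving a core file behind.
  bash -c 'ulimit -c 0; kill -FPE $$' || status=$?
  echo "$status"
}

echo "signal 8 is SIG$(signal_name 8)"
echo "exit status of a process killed by SIGFPE: $(sigfpe_exit_status)"
```

The shell may also print a short "Floating point exception" message on standard error when the child process dies.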
It may not be obvious at first that an error was caused by Signal 8 (SIGFPE). On Einstein@Home the science application ends with a computation error, error code 38, and the error output may contain the message
APP DEBUG: Application caught signal 8.
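If you have saved a copy of a task's error output to a file, a single grep is enough to check it for this message. This is a minimal sketch: the function name `shows_sigfpe` is hypothetical, and the file is whatever you saved, not a path that BOINC itself defines.

```shell
#!/usr/bin/env bash
# Sketch: check a saved copy of a task's error output for the signal 8 message.
# The file path is supplied by you; it is not a BOINC-defined location.

shows_sigfpe() {
  # -q: no output, only an exit status; -E: extended regex.
  # The trailing group stops "signal 8" from matching "signal 80" and similar.
  grep -qE 'caught signal 8([^0-9]|$)' "$1"
}

if [ $# -gt 0 ]; then
  if shows_sigfpe "$1"; then
    echo "$1: contains 'caught signal 8' (SIGFPE)"
  else
    echo "$1: no signal 8 message found"
  fi
fi
```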
There are many possible causes of a SIGFPE; they can be due to either hardware or software problems.
The kernel preemption settings are the likely reason that the Einstein@Home application fails to run correctly on the XO Laptop.
Hardware Problems
Floating point errors could be caused by overheating, overclocking, or failure of a component.
To avoid overheating, make sure the fans are working and air is flowing freely out of the computer case. It may help to blow out any dust or "dust bunnies" in the case.
Failure of a component may be indicated if floating point errors happen frequently or regardless of which software is running.
Overclocking changes the speed of the CPU clock to make it run faster than the nominal design speed. To see whether overclocking is the issue, return the clock speed and voltages to their nominal values (turn off the overclocking) and see whether the errors stop.
Software Problems
The most likely software cause of a SIGFPE seems to be the Linux kernel preemption settings. Other causes are possible as well, so first try to rule out kernel preemption as the cause.
Linux Kernel Preemption
SIGFPE errors (code 38/Signal 8) have been reported on Einstein@Home for Linux computers whose kernel was built with preemption turned on. This seems to affect kernels from version 2.6.20 up to probably 2.6.27 (or whichever kernel version gets the fix).
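To check whether a kernel release falls inside the range reported above, the sketch below compares version numbers with GNU `sort -V`. The bounds 2.6.20 and 2.6.27 are taken from this article, and the upper bound is only a guess, not a confirmed fix version. The function names `version_le` and `in_affected_range` are illustrative, and GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Sketch: is a kernel release inside the 2.6.20 .. 2.6.27 range named in this
# article? The upper bound is the article's guess. Requires GNU `sort -V`.

# Succeeds if version $1 sorts at or before version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

in_affected_range() {
  local v
  # Drop any local suffix (e.g. "-olpc") and keep only major.minor.patch,
  # so that a stable release such as 2.6.27.5 counts as 2.6.27.
  v=$(printf '%s\n' "${1%%-*}" | cut -d. -f1-3)
  version_le 2.6.20 "$v" && version_le "$v" 2.6.27
}

# Check the release given as an argument, or the running kernel.
release=${1:-$(uname -r)}
if in_affected_range "$release"; then
  echo "$release: inside the reported 2.6.20 to 2.6.27 range"
else
  echo "$release: outside the reported range"
fi
```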
How do you know whether your kernel was built with preemption turned on? Some kernels expose the entire configuration used to build them (this feature is enabled by CONFIG_IKCONFIG_PROC). If your kernel has this feature, you can inspect its .config file through the gzip-compressed file /proc/config.gz. To show only the preemption settings, use the command
zcat /proc/config.gz | grep PREEMPT
If you do not have /proc/config.gz you may still get a hint from `/proc/sys/kernel/version`. On the XO laptop one finds
% cat /proc/sys/kernel/version
#1 PREEMPT Wed Nov 21 00:39:06 EST 2007
which seems to indicate that CONFIG_PREEMPT=y was set when the kernel was built.
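The two checks above can be combined into one helper that prefers /proc/config.gz and falls back to the version string. This is a sketch under the assumptions stated in the text: the " PREEMPT " word in the version string is only a hint, as in the XO example. The function names `preempt_from_version_string` and `kernel_preempt_report` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look for evidence of how the running kernel's preemption was configured.
# /proc/config.gz exists only if the kernel was built with CONFIG_IKCONFIG_PROC=y.

# Interpret the text of /proc/sys/kernel/version (the same string `uname -v` prints).
# A kernel built with CONFIG_PREEMPT=y shows the word " PREEMPT " there, as on the
# XO laptop; treat this only as a hint.
preempt_from_version_string() {
  case "$1" in
    *" PREEMPT "*) echo "CONFIG_PREEMPT=y (likely)" ;;
    *) echo "no PREEMPT tag" ;;
  esac
}

kernel_preempt_report() {
  if [ -r /proc/config.gz ]; then
    # The full build configuration is available: list the PREEMPT options.
    zcat /proc/config.gz | grep PREEMPT
  elif [ -r /proc/sys/kernel/version ]; then
    # Fall back to the hint in the version string.
    preempt_from_version_string "$(cat /proc/sys/kernel/version)"
  else
    echo "no preemption information available" >&2
    return 1
  fi
}
```

Call `kernel_preempt_report` after loading the functions; it prints either the PREEMPT lines of the build configuration or the hint taken from the version string.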
If you build your own kernel you can view the settings from the .config file with the command `grep PREEMPT .config`, which should produce something like:
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
Either turn off preemption entirely or use voluntary preemption. These options are set in the kernel configuration menu under Processor type and features -> Preemption Model. Do not select the low-latency "Preemptible Kernel (Low-Latency Desktop)" option, which sets CONFIG_PREEMPT=y.
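Before building, you can check which Preemption Model a .config selects and flag the setting this article recommends against. The sketch below looks only for the three options shown in the sample output above. The function name `preempt_model` and the default path `.config` in the current directory are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report which Preemption Model a kernel .config selects and warn about
# CONFIG_PREEMPT=y, the setting this article links to the SIGFPE errors.

preempt_model() {
  local cfg=$1
  if grep -q '^CONFIG_PREEMPT=y$' "$cfg"; then
    echo "PREEMPT"      # low-latency preemptible kernel: avoid
  elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y$' "$cfg"; then
    echo "VOLUNTARY"    # voluntary preemption: recommended above
  elif grep -q '^CONFIG_PREEMPT_NONE=y$' "$cfg"; then
    echo "NONE"         # no forced preemption: recommended above
  else
    echo "UNKNOWN"      # none of the three options is set to y
  fi
}

# Check the file given as an argument, or .config in the current directory.
cfg=${1:-.config}
if [ -r "$cfg" ]; then
  model=$(preempt_model "$cfg")
  echo "$cfg: preemption model $model"
  if [ "$model" = "PREEMPT" ]; then
    echo "warning: CONFIG_PREEMPT=y is the setting associated with the SIGFPE errors"
  fi
fi
```

Commented-out lines such as "# CONFIG_PREEMPT is not set" never match because each pattern is anchored at the start of the line.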
See Also
- Computation errors since months (Einstein@Home forums)
- Einstein@Home on OLPC (Einstein@Home forums)
- CONFIG_PREEMPT causes corruption of application's FPU stack
