All about Linux signals - Linux Programming Blog

ajross · on April 2, 2012

Arrgh. How does an article titled "All about Linux signals" not even mention the signalfd() syscall once? Folks: if you're on Linux and need to use signals (i.e. you're reading this article for any practical reason), you need to be using signalfd(). It's not portable to other OSes, but it solves pretty much all the reentrancy and synchronization issues discussed both here and in the article in a very clean way. You get a file descriptor on which you can block cleanly (cf. select/poll/epoll) and from which you read event notifications on received signals.

Seriously, use it. And OP: please update the blog post to at least point people at it. There's no excuse on a modern Linux system for trying to use anything else.

halayli · on April 3, 2012

http://www.linuxprogrammingblog.com/all-about-linux-signals?...

http://www.linuxprogrammingblog.com/code-examples/signalfd

I just wish the FreeBSD and Linux folks would agree on a standard, instead of everyone having to deal with multiple ways of handling async events via different poller implementations and hacks.

krakensden · on April 3, 2012

It is, you just missed it. FTFA:

> First we must block the signals we want to handle with signalfd(2) using sigprocmask(2). This function will be described later. Then we call signalfd(2) to create a file descriptor...
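For anyone who wants to see the shape of it, here is a minimal C sketch of the two steps quoted above (block with sigprocmask(2), then create the descriptor with signalfd(2)). It is Linux-only, it is not the article's code, and the helper names `open_signal_fd` and `read_signal` are made up for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: block `signo` and return a signalfd for it,
 * or -1 on error. */
int open_signal_fd(int signo)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Block normal delivery so the signal stays pending until it is read. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Hypothetical helper: read one signal from the descriptor.
 * Blocks until one is pending; returns its number, or -1 on error. */
int read_signal(int sfd)
{
    struct signalfd_siginfo info;
    ssize_t n = read(sfd, &info, sizeof info);

    if (n != (ssize_t)sizeof info)
        return -1;
    return (int)info.ssi_signo;
}
```

The descriptor can be handed to select/poll/epoll like any other. In a multithreaded program you would want pthread_sigmask() rather than sigprocmask().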

ajross · on April 3, 2012

OK, I call uncle. I can't find that text. It's not on the page titled "waiting for a signal", which seems like the obvious candidate. Nor is it in "signals and threads", which would have been my second choice. Nor "That's not everything!". Enlighten me with a link?

krakensden · on April 3, 2012

I clicked "show full page", but I think it's on page 3

http://www.linuxprogrammingblog.com/all-about-linux-signals?...

calloc · on April 2, 2012

One thing to please keep in mind (knowing that it is called the Linux Programming Blog ...) is that if you build your app around Linux-only signals and/or methods (e.g. signalfd), it will most likely be more difficult to port your code to other platforms such as Solaris/BSD/HP-UX/AIX and the like.

Having ported my fair share of code from open source projects that only ran on Linux to FreeBSD, I feel that too much emphasis is put on using Linux-only stuff, which makes it difficult or extremely hard to port code in a clean manner.

krakensden · on April 3, 2012

On the other hand, signalfd is kind of wonderful, and it's not like FreeBSD is about to suddenly experience a renaissance.

X-Istence · on April 3, 2012

Feeding a troll ... may not be my smartest move yet.

Yes, signalfd() is wonderful and nice, but it makes it a pain in the behind to port your software. If you need to handle signals in an event loop of some sort, take a look at libev, which abstracts away the OS and handles signals in the best way available on the system it runs on.

Not only that, but libev will make use of epoll/kqueue/select/poll as required on the OS it is running on, so you get easier portability with the various different event mechanisms that do exist across systems.

krakensden · on April 3, 2012

I understand that signalfd() is unportable, the question is, why port?

calloc · on April 3, 2012

Many people still want to run FreeBSD. It may not be you, the original developer, who has to port it: it may be the ports maintainer who puts it in the FreeBSD ports tree, or a company that already runs everything on top of FreeBSD (think Yahoo) and wants to use the software. Now they have to take the time to port your Linux-only code, when standard POSIX-compliant code would have made it much simpler, possibly as simple as a ./configure && make.

Why limit your audience to one single operating system?

eropple · on April 3, 2012

OS X, however?

It's not particularly hard to emulate (most of) signalfd()'s behavior by writing a byte to a pipe and checking for it in your main loop.

krakensden · on April 3, 2012

Surely you mean iOS. In which case, I'm not sure why you would be spending much time worrying about signals.

eropple · on April 3, 2012

Er, no, thank you, I very much do not mean iOS. I mean OS X, which, irrespective of your (obnoxiously communicated) opinion of it, is rather popular amongst Unix developers and it tends to be a good idea to write your Unix code to support it.

sciurus · on April 2, 2012

Part 3 of Neil Brown's "Ghosts of Unix past" series has an interesting take on signals. The conclusion is that "[while] signal handlers are perfectly workable for some of the early use cases (e.g. SIGSEGV) it seems that they were pushed beyond their competence very early, thus producing a broken design for which there have been repeated attempts at repair. While it may now be possible to write code that handles signal delivery reliably, it is still very easy to get it wrong. The replacement that we find in signalfd() promises to make event handling significantly easier and so more reliable."

https://lwn.net/Articles/414618/

antirez · on April 2, 2012

Signals are one of the most dangerous features of POSIX, but this guide is awesome. I used this resource while writing the Redis 2.6 "Software Watchdog" feature (documented at the end of this page: http://redis.io/topics/latency). Cool that it was posted here just a few days later.

javert · on April 2, 2012

What exactly makes you say they are dangerous? Genuinely curious. Certain kinds of programmer error?

(I haven't finished reading the guide yet... apologies if it's in there... but I'd say I'm pretty familiar with signals.)

gchpaco · on April 2, 2012

Signals do not interact gracefully with the rest of libc or the kernel, let alone concurrency. Signals can come in and interrupt kernel system calls--this is what EINTR is there for. When in the signal handler itself, the list of libc functions it is safe to call is small; the Single UNIX specification only guarantees less than 120 functions. V7 signals are not reliable and so it is dangerous to be in the signal handler for an extended period of time; but even BSD and SysV reliable signals have a list of caveats as long as my arm. The synchronous (think SIGFPE or SIGSEGV)/asynchronous (think SIGINT) distinction is an additional complication that the interface needed like it needed an additional head. And I haven't even gotten into the interactions between signals and process groups. Basically signals are a primitive, crude and dangerous form of IPC that one is nonetheless obligated to pay attention to.

As a rule of thumb I generally prefer to use signal handlers to set a small amount of global data and nothing else, and have the main interrupt loop notice and deal with the condition. It's possible to use weird siglongjmp things to get to the main loop if you are not there already, but (like longjmp in general) it is kind of weird and bizarre.
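A rough C sketch of that rule of thumb, assuming SIGINT is the signal of interest. The function and variable names are invented for illustration; the only thing the handler touches is a `volatile sig_atomic_t` flag, and the main loop is expected to poll it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only state shared between the handler and the main loop. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;   /* set the flag and nothing else */
}

/* Hypothetical helper: install the flag-setting handler for SIGINT.
 * Returns 0 on success, -1 on error. */
int install_sigint_flag(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical helper for the main loop: returns 1 if SIGINT has
 * arrived since the last call (and clears the flag), otherwise 0.
 * Several signals arriving between two calls are reported once. */
int sigint_seen(void)
{
    if (!got_sigint)
        return 0;
    got_sigint = 0;
    return 1;
}
```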

caf · on April 2, 2012

The best thing to do in a signal handler is often to write() a single byte to a pipe.

Your main loop can then notice that the other end of the pipe is readable (the main loop is normally watching file descriptors for activity anyway).
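In case it helps, a small C sketch of that pattern. The names `self_pipe_init` and `self_pipe_wait` are hypothetical, and using poll() is just one choice; any fd-watching main loop would do. Both pipe ends are made non-blocking so the handler can never stall on a full pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };

static void on_signal(int signo)
{
    int saved_errno = errno;            /* write() may clobber errno */
    unsigned char b = (unsigned char)signo;
    ssize_t r = write(wake_pipe[1], &b, 1);  /* async-signal-safe */

    (void)r;                            /* a full pipe just drops the byte */
    errno = saved_errno;
}

/* Hypothetical helper: create the pipe and install the handler.
 * Returns the readable end (for the main loop to watch) or -1. */
int self_pipe_init(int signo)
{
    struct sigaction sa;

    if (pipe(wake_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(wake_pipe[i], F_GETFL);
        if (fl == -1 || fcntl(wake_pipe[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;
    return wake_pipe[0];
}

/* Hypothetical stand-in for one turn of a main loop: wait up to
 * timeout_ms for a signal byte. Returns the signal number, 0 on
 * timeout, -1 on error. */
int self_pipe_wait(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };

    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        unsigned char b;

        if (rc == -1 && errno == EINTR)
            continue;   /* interrupted by the handler; the byte is in the pipe now */
        if (rc <= 0)
            return rc;
        return read(wake_pipe[0], &b, 1) == 1 ? (int)b : -1;
    }
}
```

Because the byte sits in the pipe until it is read, a signal that lands just before poll() is entered is not lost; poll() simply returns immediately.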

emmelaich · on April 3, 2012

I believe this is referred to as the 'self pipe' trick. Dan Bernstein says he came up with it in 1990.

http://cr.yp.to/docs/selfpipe.html

gchpaco · on April 3, 2012

That's a good point, and what I actually did last time this came up. signalfd on Linux just codifies this convention, but it is (particularly in evented code) an excellent idea.

javert · on April 2, 2012

I've actually done some quite intensive work with signals, but the reason I asked the question was just to make sure I wasn't missing something. Yes, it's very complicated, and if one doesn't have a complete understanding, problems can arise.

> When in the signal handler itself, the list of libc functions it is safe to call is small; the Single UNIX specification only guarantees less than 120 functions.

I don't think this is precisely accurate. I think it's safe to call libc functions anywhere, but the point is that you have to ensure that non-reentrant functions are not called simultaneously by the same thread.

One way to do that is to never call those functions in a signal handler, but if you really know what you're doing (i.e., you know the signal handler is not interrupting the function you want to call), you can call it in the signal handler.

Does this sound right? I'm not being pedantic; I'm actually trying to make sure I have it right in my head.

caf · on April 2, 2012

The problem is that you don't know what other functions the libc function you want to call itself calls under the hood.

For example, take the classic case of printf() - sure, you might be able to guarantee that your signal handler can never interrupt an ongoing printf() call elsewhere, but what if printf() calls malloc() internally, and your signal handler has interrupted a malloc()?

That's why there's a (short) list of async-signal-safe functions in POSIX (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2...)

calloc · on April 2, 2012

There are certain race conditions if you rely on signals to interrupt system calls and the interrupt happens before the syscall is called (after setting the signal, but before the syscall happens).

Since your code could have been anywhere in the execution path you can only do very limited amount of things within the signal handler that won't cause catastrophic issues otherwise.

javert · on April 2, 2012

> certain race conditions if you rely on signals to interrupt system calls and the interrupt happens before the syscall is called

I'm pretty familiar with signals, but I don't really know what the problem is here. Why would you rely on signals to interrupt system calls? Maybe if I know why you would do that, I will see why it's a problem if the signal comes before the syscall happens.

spc476 · on April 3, 2012

At work, we have a time-critical program. In that program, we need to lock a file in order to modify it safely. The files themselves are stored on NFS (don't ask---this was a design decision from long ago). fcntl() (used to lock files) does not include a timeout. We need a timeout because the program is time-critical. NFS is not a local filesystem, but across the network. If the network is flaky, we don't want to hang on locking the file.

So we have to rely upon SIGALRM (via setitimer()) to interrupt the fcntl() call so we don't hang indefinitely.
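To make that concrete, here is a sketch in C of what such a timeout wrapper could look like. This is a guess at the general shape, not the actual code from that program; `lock_with_timeout` is a made-up name. One deliberate choice: the timer repeats, so if SIGALRM happens to fire before fcntl() is entered, a later tick still interrupts the call instead of leaving it hung.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: it exists only so SIGALRM interrupts the syscall. */
static void on_alarm(int signo) { (void)signo; }

/* Hypothetical helper: take a whole-file write lock on fd, giving up
 * after roughly timeout_ms (must be > 0). Returns 0 on success, -1 on
 * failure with errno set (ETIMEDOUT if the call was interrupted; any
 * other handled signal would be reported the same way). */
int lock_with_timeout(int fd, long timeout_ms)
{
    struct sigaction sa, old_sa;
    struct itimerval tick, off;
    struct flock fl;
    int rc, saved_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* no SA_RESTART: we want EINTR */
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    memset(&off, 0, sizeof off);
    tick.it_value.tv_sec = timeout_ms / 1000;
    tick.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    tick.it_interval = tick.it_value;   /* repeat, see note above */
    if (setitimer(ITIMER_REAL, &tick, NULL) == -1) {
        saved_errno = errno;
        sigaction(SIGALRM, &old_sa, NULL);
        errno = saved_errno;
        return -1;
    }

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;             /* l_start = l_len = 0: whole file */

    rc = fcntl(fd, F_SETLKW, &fl);      /* blocks until locked or interrupted */
    saved_errno = errno;

    setitimer(ITIMER_REAL, &off, NULL); /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);  /* restore the previous handler */

    if (rc == -1) {
        errno = (saved_errno == EINTR) ? ETIMEDOUT : saved_errno;
        return -1;
    }
    return 0;
}
```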

javert · on April 3, 2012

Interesting, thanks.

X-Istence · on April 3, 2012

The examples given in the article linked explain this pretty well actually...

For example, let's say you want Ctrl + C to quit the program (bear with me, very simple example), and you have a select() call. You set up your signal handler, which simply sets a global flag that is checked when select() returns (with errno == EINTR), and then start filling in the required structures for the select() call. Before select() is called, though, you receive a SIGINT. So now the flag is set, the handler has done its job, and select() gets called and your program starts waiting on file descriptors.

What you really wanted to happen is that the program would see the flag, clean up nicely and quit. Now the user has to send a second Ctrl + C to interrupt the select() syscall and to have the code executed that checks for errno == EINTR and the global flag that was set in the signal handler.

This is but a simple example of a race condition that can exist. The SIGALRM example given by the other HN user is also an excellent example of where things can go awry unintentionally: unless you write your signal handling with that in mind, you may get results you weren't expecting.
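One standard way to close that window is pselect(2) rather than plain select(): keep SIGINT blocked everywhere except inside the wait, so the flag check and the wait cannot be separated by a signal. Below is a C sketch under that assumption; `setup_sigint` and `read_or_quit` are invented names, and the -2 return value is just a convention picked for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t quit_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    quit_requested = 1;
}

/* Hypothetical helper: install the handler, then block SIGINT.
 * The previous mask (with SIGINT unblocked) is stored in *orig. */
int setup_sigint(sigset_t *orig)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    return sigprocmask(SIG_BLOCK, &block, orig);
}

/* Hypothetical helper: wait until fd is readable, then read from it.
 * Returns the byte count from read(), -1 on error, or -2 if SIGINT
 * arrived first. fd must be below FD_SETSIZE. */
ssize_t read_or_quit(int fd, void *buf, size_t len, const sigset_t *orig)
{
    for (;;) {
        fd_set rfds;
        int rc;

        /* SIGINT is blocked here, so the flag cannot change between
         * this check and the pselect() call. */
        if (quit_requested)
            return -2;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* pselect() swaps in *orig atomically for the duration of the
         * wait, so a pending SIGINT is delivered now and we get EINTR. */
        rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, orig);
        if (rc > 0)
            return read(fd, buf, len);
        if (rc == -1 && errno != EINTR)
            return -1;
        /* EINTR: loop around and re-check the flag */
    }
}
```

With this arrangement a single Ctrl + C is enough even if it lands while the fd sets are being filled in: the signal stays pending until pselect() unblocks it.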

javert · on April 3, 2012

Thanks.

That example doesn't live up to the hype that was mentioned by a previous poster and that had me worried. However, that hype was probably overstated.

For example, "there are certain race conditions if you rely on signals to interrupt system calls and the interrupt happens before the syscall is called" -> seems to imply there are race conditions inherent to the situation, not race conditions that can be introduced by programmer mistake (which is pretty obvious, IMO).

Also "since your code could have been anywhere in the execution path you can only do very limited amount of things within the signal handler that won't cause catastrophic issues otherwise" -> I just don't believe that; you can do lots of things in the signal handler if you know what you're doing.

calloc · on April 3, 2012

Sure, you can do lots of things in the signal handler if you are careful, but that is the point: it is more likely you will make a mistake than not. It is all too easy to think "let me add this to the signal handler, it will make my life easier" and end up with all kinds of weird issues that show up randomly, like a heisenbug...

javert · on April 3, 2012

Wow, I cannot fathom how I could be downvoted on an honest and purely technical question.

antirez · on April 2, 2012

Reentrancy of everything called inside a signal handler, interrupted system calls to handle, complexity in masking/unmasking, reinstalling the signal handler, and so forth. It's a very complex system.

javert · on April 2, 2012

That's what I figured was being referred to. Those things aren't so bad if you really know what you're doing.

noselasd · on April 2, 2012

I'd like some more around the "Interrupting system calls" section. This is a real can of worms. By default, signal() uses BSD semantics, which auto-restart syscalls, as does the SA_RESTART flag when the handler is established with sigaction(). Except not all calls are auto-restarted, e.g. select/poll/sleep.

I'd really like to see a list of syscalls that are not restarted no matter the flags. And the macros that (implicitly) turn on SysV behavior of signal() ...
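For reference, a short C sketch of the two options being discussed: choosing restart behavior explicitly with sigaction(), and retrying by hand on EINTR when SA_RESTART is off (or does not apply to the call). `install_handler` and `read_retry` are hypothetical names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Do-nothing handler; what matters here is how it gets installed. */
static void on_signal(int signo) { (void)signo; }

/* Hypothetical helper: install the handler for signo. With a nonzero
 * `restart`, SA_RESTART is set and restartable syscalls resume after
 * the handler returns; with zero they fail with EINTR instead. */
int install_handler(int signo, int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: read() that retries when a signal interrupts
 * it, for code that cannot rely on SA_RESTART. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Using sigaction() rather than signal() sidesteps the question of which semantics signal() has on a given system, since the flags say exactly what is wanted.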

antirez · on April 2, 2012

Another thing that's not very clear from the document is: if the signal arrives in the course of a non-interruptible system call, when will it be delivered? As soon as the system call ends or anytime in the (near) future? Probably POSIX does not define this behavior, but it's useful for the programmer to know how most operating systems handle this.

Another interesting thing that I had to read in the Linux source code to be sure is that non-fatal signals can't interrupt a write(2) against the filesystem (and you can't have a short write count returned because of a signal). This is an assumption many programs rely on.

calloc · on April 2, 2012

The Linux man page on signals is surprisingly helpful in that regard: http://linux.die.net/man/7/signal

thelastnode · on April 2, 2012

Took me a second to find, but here's the all-on-one-page version: http://www.linuxprogrammingblog.com/all-about-linux-signals?...