Look Ma, No Threads!
Java’s NIO, Node.js, Netty, Tornado, Twisted, Perl’s POE… they all have one thing in common: non-blocking IO.
Node.js is based on V8. V8 is a JavaScript implementation. It comes from a place (the browser) where blocking is anathema. It’s just not permitted. You can’t block the browser. Node.js’ community follows this to its core: thou shalt not block. Ever. It permeates its culture and all the software the community accepts. If something blocks, well, sorry, but it’s going to be rejected by the community.
This is why the approach isn’t as strong in the cultures of Python, Java and even Perl. They’re all willing to accept blocking operations; in fact, it’s the default. They see threads where the Node community sees non-blocking.
epoll? kqueue? select? If your goal is to learn about non-blocking IO, you can start simpler than that. You can start with a basic C program interacting with stdin and stdout. Consider:
    /* first.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char* int_to_binary_ascii( int x )
    {
        static char buff[8*sizeof(int) + 1];
        unsigned int ii, num_bits = (8 * sizeof(unsigned int));

        memset(buff,'\0',sizeof(buff));
        for ( ii = 0; ii < num_bits; ii++ ) {
            /* 1u keeps the shift unsigned; 1 << 31 overflows a signed int */
            buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
        }
        return buff;
    }

    /* Use a terminal escape to clear the line...*/
    void clear_line ()
    {
        printf("%c[2K\r", 27);
    }

    int main (void)
    {
        long ii;

        for ( ii=0;; ++ii ) {
            clear_line();
            printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
            fflush(stdout);
        }
        return 0;
    }
This program is a tight (infinite) loop that prints a counter as an integer and as a string of ones and zeros. It’s mesmerizing to watch the ones march ever more slowly to the left, isn’t it? If you run it, you should see output something like:
2622411: 00000000001010000000001111001011
It will print continuously to the same line, clearing it and printing again. To run this program, let’s use a shell script. Make a file called run.sh:
    set -e
    set -u
    set -x

    trap "stty sane" INT TERM EXIT

    F="$1"
    if [ -e $F ]; then
      SRC=$F
      F=$(basename $F .c)
    else
      SRC=$F.c
    fi

    function compile () {
      gcc -O2 -Werror -Wall -o $2 $1
    }

    compile $SRC $F
    stty raw
    ./$F
    rm $F
This script does an important thing that we’ll need in a moment: it puts the terminal into raw mode. This allows unfiltered (un-cooked or raw) keyboard input to be passed directly to the program. In sane mode, the terminal driver holds your input until you press enter. We don’t want that; we want the characters you type to go to our program immediately, as you type them. We have to use raw mode to make this happen. When you use this script to run our program, you won’t be able to stop it with CTRL+C. You’ll have to go to another terminal, find out its PID and kill it. You can do this with something like:
    kill $(ps aux | grep './[f]irst' | awk '{print $2}')
The reason you won’t be able to use CTRL+C any longer is that bit about ‘raw’ mode. In raw mode, the terminal passes all keyboard input directly to the program. In ‘sane’, or normal, mode, keyboard input like CTRL+C or CTRL+Z is intercepted by the terminal driver before the program ever sees it. In the case of CTRL+C, the driver sends a signal to the foreground process: a SIGINT. For most programs, like ours, a SIGINT will kill the process.
The shell script compiles the program, puts the terminal into raw mode, runs the program and then ensures the terminal is returned to ‘sane’ mode with the ‘trap EXIT’.
The program gives us no way to tell it to exit short of killing it. It would be nice if we could ask it to stop. We could stop it by waiting for some input:
    /* second.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    char* int_to_binary_ascii( int x )
    {
        static char buff[8*sizeof(int) + 1];
        unsigned int ii, num_bits = (8 * sizeof(unsigned int));

        memset(buff,'\0',sizeof(buff));
        for ( ii = 0; ii < num_bits; ii++ ) {
            /* 1u keeps the shift unsigned; 1 << 31 overflows a signed int */
            buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
        }
        return buff;
    }

    /* Use a terminal escape to clear the line...*/
    void clear_line ()
    {
        printf("%c[2K\r", 27);
    }

    int main (void)
    {
        long ii;
        char c;

        for ( ii=0;; ++ii ) {
            clear_line();
            printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
            fflush(stdout);
            /* Blocks here until a key is pressed; only trust c if a byte arrived */
            if ( 1 == read(0,&c, sizeof(char)) && 'q' == c ) {
                break;
            }
        }
        return 0;
    }
This works, kinda. It certainly exits when we press ‘q’, but the program is blocked awaiting our input. We can keep pressing keys, but it doesn’t run without us. How can we get it to run independently of waiting for input, but react if input is present? You put stdin in non-blocking mode:
    /* third.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    char* int_to_binary_ascii( int x )
    {
        static char buff[8*sizeof(int) + 1];
        unsigned int ii, num_bits = (8 * sizeof(unsigned int));

        memset(buff,'\0',sizeof(buff));
        for ( ii = 0; ii < num_bits; ii++ ) {
            /* 1u keeps the shift unsigned; 1 << 31 overflows a signed int */
            buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
        }
        return buff;
    }

    /* Use a terminal escape to clear the line...*/
    void clear_line ()
    {
        printf("%c[2K\r", 27);
    }

    int main (void)
    {
        long ii;
        char c;

        /* Fetch the current flags for stdin and add O_NONBLOCK */
        int flags = fcntl(0,F_GETFL,0);
        flags |= O_NONBLOCK;
        fcntl(0,F_SETFL,flags);

        for ( ii=0;; ++ii ) {
            clear_line();
            printf("% 10ld: %s",ii, int_to_binary_ascii(ii));
            /* Returns immediately: 1 if a key was waiting, -1 otherwise */
            if ( 1 == read(0,&c, sizeof(char)) ) {
                printf("\r\n");
                printf("You pressed: '%c'\r\n",c);
                fflush(stdout);
                if ( 'q' == c ) {
                    break;
                }
            }
            fflush(stdout);
        }
        return 0;
    }
The key lines are the fcntl calls, where we get the current set of flags for stdin (file descriptor 0), ensure O_NONBLOCK is set in the flags and then set the newly configured flags for stdin. When you run the program and type a few characters into it, you should see something like:
    > bash run.sh third.c
    + trap 'stty sane' INT TERM EXIT
    + F=third.c
    + '[' -e third.c ']'
    + SRC=third.c
    ++ basename third.c .c
    + F=third
    + compile third.c third
    + gcc -O2 -Werror -Wall -o third third.c
    + stty raw
    + ./third
    343404: 00000000000001010011110101101100
    You pressed: 'a'
    390059: 00000000000001011111001110101011
    You pressed: 'b'
    416735: 00000000000001100101101111011111
    You pressed: 'c'
    466926: 00000000000001110001111111101110
    You pressed: 'd'
    490793: 00000000000001110111110100101001
    You pressed: 'e'
    529303: 00000000000010000001001110010111
    You pressed: 'f'
That change allows us to call read, and now read won’t block. In the cases where read is called and there is no input, it returns -1. If you read the man page for read you’ll see that it also sets errno, and if we print errno:
    /* fourth.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>

    char* int_to_binary_ascii( int x )
    {
        static char buff[8*sizeof(int) + 1];
        unsigned int ii, num_bits = (8 * sizeof(unsigned int));

        memset(buff,'\0',sizeof(buff));
        for ( ii = 0; ii < num_bits; ii++ ) {
            /* 1u keeps the shift unsigned; 1 << 31 overflows a signed int */
            buff[num_bits - ii - 1] = ( x & (1u << ii ) ) ? '1' : '0';
        }
        return buff;
    }

    /* Use a terminal escape to clear the line...*/
    void clear_line ()
    {
        printf("%c[2K\r", 27);
    }

    int main (void)
    {
        long ii;
        char c;

        int flags = fcntl(0,F_GETFL,0);
        flags |= O_NONBLOCK;
        fcntl(0,F_SETFL,flags);

        for ( ii=0;; ++ii ) {
            clear_line();
            /* errno here is whatever the previous iteration's read left behind */
            printf("% 10ld: %s errno:%d",ii, int_to_binary_ascii(ii), errno);
            if ( 1 == read(0,&c, sizeof(char)) ) {
                printf("\r\n");
                printf("You pressed: '%c'\r\n",c);
                fflush(stdout);
                if ( 'q' == c ) {
                    break;
                }
            }
            fflush(stdout);
        }
        return 0;
    }
On my Mac, it prints 35. If I look in /usr/include/sys/errno.h I see:
    #define EAGAIN      35      /* Resource temporarily unavailable */
    #define EWOULDBLOCK EAGAIN  /* Operation would block */
Which can be taken to mean it would block, if it weren’t in non-blocking mode that is :)
Conclusion
These same techniques can be used on sockets as well as file handles. This technique is the foundation all of these non-blocking frameworks are built on under the hood. If you’re interested in learning more, the next function to study is select (or poll, epoll, or kqueue), which lets you quickly determine which of a set of file handles have input ready and can be read from, or can be written to, without blocking.
Kyle