https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/
Thursday, 2 June 2011 @ 1238 GMT by Mukund Sivaraman
Network servers are traditionally implemented using a separate process or thread per connection. For high performance applications that need to handle a very large number of clients simultaneously, this approach won't work well, because factors such as resource usage and context-switching overhead limit the number of clients that can be handled at once. An alternative approach is to use non-blocking I/O together with a readiness notification method which tells you when you can read or write more data on a socket.
This article is an introduction to Linux's epoll(7) facility, which is the best readiness notification facility in Linux. We will write sample code for a complete TCP server implementation in C. I assume you have C programming experience,
know how to compile and run programs on Linux, and can read manpages of the various C functions that are used.
epoll was introduced in Linux 2.6, and is not available in other UNIX-like operating systems. It provides a facility similar to the select(2) and poll(2) functions:
select(2) can monitor at most FD_SETSIZE descriptors at a time, typically a small number determined at libc's compile time.
poll(2) doesn't have a fixed limit on the number of descriptors it can monitor at a time, but apart from other things, we still have to perform a linear scan of all the passed descriptors every time to check for readiness, which is O(n) and slow.
epoll has no such fixed limits, and does not perform any linear scans. Hence it is able to perform better and handle a larger number of events.
An epoll instance is created by epoll_create(2) or epoll_create1(2) (they take different arguments), which return a descriptor referring to the new epoll instance. epoll_ctl(2) is used to add/remove descriptors to be watched on the epoll instance. To wait for events on the watched set, epoll_wait(2) is used, which blocks until events are available. Please see their manpages for more info.
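To give a feel for how these calls fit together, here is a tiny sketch that watches a single, already-open descriptor fd for readability; the function name wait_for_input and the array size of 64 are arbitrary, and error checking is omitted:

#include <sys/epoll.h>
#include <unistd.h>

/* Sketch only: watch one descriptor and wait for a single batch of events. */
static void
wait_for_input (int fd)
{
  struct epoll_event ev, ready[64];
  int efd, n, i;

  efd = epoll_create1 (0);                  /* create the epoll instance */

  ev.data.fd = fd;                          /* user data: the fd itself */
  ev.events = EPOLLIN;                      /* notify when fd is readable */
  epoll_ctl (efd, EPOLL_CTL_ADD, fd, &ev);  /* start watching fd */

  n = epoll_wait (efd, ready, 64, -1);      /* block until events arrive */
  for (i = 0; i < n; i++)
    {
      /* ready[i].data.fd is now readable */
    }

  close (efd);
}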
When descriptors are added to an epoll instance, they can be added in two modes: level triggered and edge triggered. When you use level triggered mode, and data is available for reading, epoll_wait(2) will always return with ready events. If you don't read the data completely, and call epoll_wait(2) on the epoll instance watching the descriptor again, it will return again with a ready event because data is still available. In edge triggered mode, you will only get a readiness notification once. If you don't read the data fully, and call epoll_wait(2) on the epoll instance watching the descriptor again, it will block because the readiness event was already delivered.
The epoll event structure that you pass to epoll_ctl(2) is shown below. With every descriptor being watched, you can associate an integer or a pointer as user data.
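It is declared in <sys/epoll.h>:

typedef union epoll_data
{
  void     *ptr;
  int       fd;
  uint32_t  u32;
  uint64_t  u64;
} epoll_data_t;

struct epoll_event
{
  uint32_t      events;   /* epoll events (EPOLLIN, EPOLLOUT, EPOLLET, ...) */
  epoll_data_t  data;     /* user data variable */
};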
Let's write code now. We'll implement a tiny TCP server that prints everything sent to the socket on standard output. We'll begin by writing a function create_and_bind() which creates and binds a TCP socket:
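Here is a sketch of such a function; the exact error handling is incidental, and the headers at the top cover everything the snippet needs:

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int
create_and_bind (char *port)
{
  struct addrinfo hints;
  struct addrinfo *result, *rp;
  int s, sfd;

  memset (&hints, 0, sizeof (struct addrinfo));
  hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
  hints.ai_socktype = SOCK_STREAM; /* TCP socket */
  hints.ai_flags = AI_PASSIVE;     /* suitable for bind() and listen() */

  s = getaddrinfo (NULL, port, &hints, &result);
  if (s != 0)
    {
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (s));
      return -1;
    }

  /* Walk the returned addresses until a socket can be created and bound. */
  for (rp = result; rp != NULL; rp = rp->ai_next)
    {
      sfd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      if (sfd == -1)
        continue;

      if (bind (sfd, rp->ai_addr, rp->ai_addrlen) == 0)
        break;                  /* success */

      close (sfd);
    }

  freeaddrinfo (result);

  if (rp == NULL)
    {
      fprintf (stderr, "Could not bind\n");
      return -1;
    }

  return sfd;
}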
create_and_bind() contains a standard code block for a portable way of getting an IPv4 or IPv6 socket. It accepts a port argument as a string, where argv[1] can be passed. The getaddrinfo(3) function returns a bunch of addrinfo structures in result, which are compatible with the hints passed in the hints argument. The addrinfo struct, as documented in getaddrinfo(3), looks like this:
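struct addrinfo
{
  int              ai_flags;      /* e.g. AI_PASSIVE */
  int              ai_family;     /* AF_UNSPEC, AF_INET, AF_INET6, ... */
  int              ai_socktype;   /* e.g. SOCK_STREAM */
  int              ai_protocol;
  socklen_t        ai_addrlen;    /* length of ai_addr */
  struct sockaddr *ai_addr;       /* address to pass to bind(2)/connect(2) */
  char            *ai_canonname;
  struct addrinfo *ai_next;       /* next structure in the list */
};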
We walk through the structures one by one and try creating sockets using them, until we are able to both create and bind a socket. If we were successful, create_and_bind() returns the socket descriptor. If unsuccessful, it returns -1.
Next, let's write a function to make a socket non-blocking. make_socket_non_blocking() sets the O_NONBLOCK flag on the descriptor passed in the sfd argument:
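A sketch of this helper, which uses fcntl(2) to fetch the current descriptor flags and OR in O_NONBLOCK:

#include <fcntl.h>
#include <stdio.h>

static int
make_socket_non_blocking (int sfd)
{
  int flags;

  /* Get the current descriptor flags and add O_NONBLOCK to them. */
  flags = fcntl (sfd, F_GETFL, 0);
  if (flags == -1)
    {
      perror ("fcntl");
      return -1;
    }

  flags |= O_NONBLOCK;
  if (fcntl (sfd, F_SETFL, flags) == -1)
    {
      perror ("fcntl");
      return -1;
    }

  return 0;
}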
Now, on to the main() function of the program which contains the event loop. This is the bulk of the program:
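The listing below is a condensed sketch of that event loop, reusing the two helper functions defined above; MAXEVENTS and the 512-byte read size are arbitrary choices, and error handling is abbreviated:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/epoll.h>

#define MAXEVENTS 64

int
main (int argc, char *argv[])
{
  struct epoll_event event, *events;
  int sfd, efd;

  if (argc != 2)
    {
      fprintf (stderr, "Usage: %s [port]\n", argv[0]);
      exit (EXIT_FAILURE);
    }

  sfd = create_and_bind (argv[1]);
  if (sfd == -1 || make_socket_non_blocking (sfd) == -1)
    abort ();
  if (listen (sfd, SOMAXCONN) == -1)
    abort ();

  efd = epoll_create1 (0);
  if (efd == -1)
    abort ();

  /* Watch the listening socket for input, edge-triggered. */
  event.data.fd = sfd;
  event.events = EPOLLIN | EPOLLET;
  if (epoll_ctl (efd, EPOLL_CTL_ADD, sfd, &event) == -1)
    abort ();

  events = calloc (MAXEVENTS, sizeof event);  /* buffer for returned events */

  while (1)                                   /* the event loop */
    {
      int n, i;

      n = epoll_wait (efd, events, MAXEVENTS, -1);
      for (i = 0; i < n; i++)
        {
          if ((events[i].events & (EPOLLERR | EPOLLHUP))
              || !(events[i].events & EPOLLIN))
            {
              /* Error, or not a read notification: closing the
                 descriptor also removes it from the watched set. */
              fprintf (stderr, "epoll error\n");
              close (events[i].data.fd);
            }
          else if (events[i].data.fd == sfd)
            {
              /* One or more new connections have arrived on the
                 listening socket: accept them all. */
              while (1)
                {
                  int infd = accept (sfd, NULL, NULL);
                  if (infd == -1)
                    break;      /* no more pending connections (or error) */

                  make_socket_non_blocking (infd);
                  event.data.fd = infd;
                  event.events = EPOLLIN | EPOLLET;
                  epoll_ctl (efd, EPOLL_CTL_ADD, infd, &event);
                }
            }
          else
            {
              /* Client data: read everything available now, because
                 edge-triggered mode won't notify us about it again. */
              while (1)
                {
                  char buf[512];
                  ssize_t count = read (events[i].data.fd, buf, sizeof buf);

                  if (count == -1)
                    break;      /* EAGAIN: all available data read (or error) */
                  if (count == 0)
                    {
                      close (events[i].data.fd);  /* EOF: client went away */
                      break;
                    }
                  write (1, buf, count);          /* copy to standard output */
                }
            }
        }
    }

  free (events);
  close (sfd);
  return EXIT_SUCCESS;
}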
main() first calls create_and_bind() which sets up the socket. It then makes the socket non-blocking, and then calls listen(2). It then creates an epoll instance in efd, to which it adds the listening socket sfd to watch for input events in edge-triggered mode.
The outer while loop is the main event loop. It calls epoll_wait(2), where the thread remains blocked waiting for events. When events are available, epoll_wait(2) returns them in the events argument, which is a bunch of epoll_event structures.
The epoll instance in efd is continuously updated in the event loop as we add new incoming connections to watch, and remove existing connections when they die.
When events are available, they can be of three types:
Errors: When an error condition occurs, or the event is not a notification about data available for reading, we simply close the associated descriptor. Closing the descriptor automatically removes it from the watched set of the epoll instance efd.
New connections: When the listening descriptor sfd is ready for reading, it means one or more new connections have arrived. While there are new connections, accept(2) them, print a message about each one, make the incoming socket non-blocking and add it to the watched set of the epoll instance efd.
Client data: When data is available for reading on any of the client descriptors, we use read(2) to read the data in pieces of 512 bytes in an inner while loop. We have to read all the data that is available now, because we won't get further events about it while the descriptor is watched in edge-triggered mode. The data which is read is written to stdout (fd 1) using write(2). If read(2) returns 0, it means EOF and we can close the client's connection. If it returns -1 and errno is set to EAGAIN, it means that all data for this event was read, and we can go back to the main loop.
That's that. It goes around and around in a loop, adding and removing descriptors in the watched set.