Tuesday, December 18, 2012

A threaded server implementation

Hi, today I want to write about a server and a client program I developed under GNU/Linux [source]. I will also write a longer post later about what I learned about server design.

I started out writing a chat server and client, but in the end I decided that anything chat-related is hard to code cleanly in C. I may use the code to build a chat server with C++ someday. Right now the server has a message log, and all incoming messages are recorded there. A message coming from one client is sent to all other clients connected to the server. Each client records the messages coming from the server in its own message log. The server doesn't use select, poll or epoll; instead it uses threads. I chose to assign new threads per connection to practice pthreads.

There are three things worth writing about: the queue struct which works as a message buffer, the client application, and the server.

The queue struct (struct messagelog in the code) looks complicated, but it really isn't. It stores chars, but it doesn't take them one by one: it copies whole strings with memcpy. Its code is not as clear as a normal single-element push/pop queue, but it can push or pop a string of chars with just one or two (if the queue wraps around) memcpy calls, instead of one copy per element. I think one can understand its mechanics by reading the comments in the code and sketching the if/else cases on paper. A small sketch of the idea follows.
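Here is a minimal sketch of that idea, assuming a fixed-size ring buffer of chars. The names (struct ringbuf, rb_push) and the buffer size are illustrative placeholders, not the ones used in the repository:

/* Ring buffer of chars: whole strings are copied in with at most two
 * memcpy calls (two only when the data wraps around the end). */
#include <string.h>
#include <stddef.h>

#define RB_SIZE 4096

struct ringbuf {
    char   buf[RB_SIZE];
    size_t head;   /* next write position */
    size_t tail;   /* next read position  */
    size_t used;   /* bytes currently stored */
};

/* Push len bytes; returns 0 on success, -1 if there is not enough room. */
static int rb_push(struct ringbuf *rb, const char *data, size_t len)
{
    if (len > RB_SIZE - rb->used)
        return -1;

    size_t first = RB_SIZE - rb->head;   /* room before the wrap point */
    if (len <= first) {
        memcpy(rb->buf + rb->head, data, len);        /* one copy        */
    } else {
        memcpy(rb->buf + rb->head, data, first);      /* up to the end   */
        memcpy(rb->buf, data + first, len - first);   /* wrapped remainder */
    }
    rb->head = (rb->head + len) % RB_SIZE;
    rb->used += len;
    return 0;
}

Popping works the same way in reverse: advance tail by the requested length with at most two memcpy calls.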

The client doesn't have much complexity; it is like server code that supports only one connection. The getaddrinfo() function, which replaced gethostbyname(), deserves attention, but I will write a separate post about its usage. The client doesn't have a GUI; stdin and stdout are used simultaneously. One way to fix this is to use termios.h and a little terminal programming knowledge to make input asynchronous. The other way is to write a Qt (or other framework) windowed client. I didn't bother to implement a neat input system since my focus was learning threads and sockets.
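For reference, here is a minimal sketch of how a client can resolve a host and connect over TCP with getaddrinfo(); the function name connect_to is a placeholder and error handling is shortened, so this is not the client's actual code:

#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

static int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one connects. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;   /* -1 if no address worked */
}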

I have many things to say about the server.

First of all, I chose the threaded server model purely for educational purposes. But once I decided on it, I didn't want to do any kind of polling, because polling loops waste CPU on idle servers, and polling on multiple threads/cores just multiplies the amount of wasted CPU time. Polling is of course efficient in busy servers, but since I was using threads I wanted to write a server which uses 0% CPU under 0% network load.

That proved to be challenging and I think I tried all possible combinations. As I said, I will be writing a longer post about server design and the various available choices, so here I will only describe how the current server code works. It uses two threads for each connection: one thread for sending data and one thread for receiving data. All read and write operations are blocking, and the send thread also blocks while waiting for new messages to send. Hence no threads are scheduled if there is no network I/O.

So why did I need two threads? Using asynchronous I/O with the O_ASYNC flag is unnecessarily hard because threads don't mix well with signals, and I don't know the aio.h library well. If you can do asynchronous I/O, you can just set a handler to send data and block on receive. You can also get away with one thread per connection if you are working with a synchronous protocol like HTTP: you block on receive, and when the receive call returns you process the incoming data, prepare the reply, send it, and block on receive again in a constant loop.
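That synchronous one-thread-per-connection loop is simple enough to sketch; this version just echoes data back where a real server would parse a request and build a reply, and the function name serve_connection is only illustrative:

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

void serve_connection(int fd)
{
    char buf[4096];
    ssize_t n;

    for (;;) {
        n = recv(fd, buf, sizeof buf, 0);   /* blocks until data arrives   */
        if (n <= 0)                         /* 0: peer closed, <0: error   */
            break;

        /* A real server would process the request and prepare a reply
         * here; this sketch simply echoes the received data back. */
        if (send(fd, buf, n, 0) <= 0)
            break;
    }
    close(fd);
}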

The main thread just accepts new TCP connections. Once a connection is established, it starts a send and a receive thread. The send thread immediately sends the whole message log (data sent by previously connected clients), then blocks on a condition variable waiting for new data to send. The receive thread just listens for new data from the client. When data arrives, it first writes it to the server message log, which is shared between all threads, and then signals through the condition variable that there is new data ready to be sent to all clients. Each send thread then reads the newly added data (i.e. the delta) and sends it to its client on the other side of the TCP connection. If the connection fails, for example when the client disconnects, the receive call returns an error in the receive thread; the receive thread then cancels the connection's send thread and exits.
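Here is a minimal sketch of that signaling, assuming an append-only shared log guarded by one mutex and one condition variable. The struct and function names are illustrative, not the ones in the repository, and the mutex/condition variable are assumed to be initialized at startup:

#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

#define LOG_SIZE 65536

struct shared_log {
    pthread_mutex_t lock;
    pthread_cond_t  grew;        /* signaled when new data is appended */
    char            data[LOG_SIZE];
    size_t          len;         /* total bytes appended so far */
};

/* Called by a receive thread after recv() returns new data. */
void log_append(struct shared_log *log, const char *msg, size_t n)
{
    pthread_mutex_lock(&log->lock);
    if (log->len + n <= LOG_SIZE) {
        memcpy(log->data + log->len, msg, n);
        log->len += n;
    }
    pthread_cond_broadcast(&log->grew);   /* wake every send thread */
    pthread_mutex_unlock(&log->lock);
}

/* Body of a send thread: sends everything already in the log, then sleeps
 * until the log grows past its private cursor and sends only the delta. */
void send_loop(struct shared_log *log, int client_fd)
{
    size_t sent = 0;                      /* this connection's cursor */

    for (;;) {
        pthread_mutex_lock(&log->lock);
        while (log->len == sent)          /* nothing new: block, 0% CPU */
            pthread_cond_wait(&log->grew, &log->lock);
        size_t delta = log->len - sent;
        pthread_mutex_unlock(&log->lock);

        /* The log is append-only in this sketch, so reading the already
         * appended bytes outside the lock is safe; pthread cleanup
         * handlers for cancellation are omitted. */
        if (send(client_fd, log->data + sent, delta, 0) <= 0)
            break;
        sent += delta;
    }
}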

It looks a little bit more complicated in the code because of locking.

I guess that's all. The code is a little messed up, mostly because I had to change everything every time I changed the design. There are also lots of "//TODO: " comments.

So what did I learn from this project?

On the practical side, I can now understand articles and papers about server design, and I can design client/server communication myself.

But I also learned something more valuable: you cannot design what you don't know. I didn't know anything about network I/O. It looks like file I/O, but it is completely different: unlike files, sockets are written and read on both sides asynchronously. (I mean normal file operations, of course; when you use files in shared memory for IPC, that works pretty much like a socket written and read by multiple clients asynchronously.)

Other than that, I learned basic socket programming. I knew how concurrent programming worked in the Linux kernel and I had written some little test programs with pthreads, but this time I got to use pthreads in user space for real.

What can be improved?

Well, I have to say that I have no intention of adding new features to the current code. The main reason is that I think C is hard to use for anything higher than low-level operations and library/system call access. However, the current base code delivers what it promises and can be extended with C++ code.

Another thing is that I think the current code could be used for many asynchronous networking jobs. If I change things it may become harder to reuse for something else, but its modularity can be improved with function hooks and the like. Structures or classes other than the message log struct could hold the incoming messages and the messages to be sent. If the receive function processed received data through custom functions, and the send function took its outgoing data from custom functions, the code could serve many purposes; a sketch of what such hooks might look like follows.
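A small sketch of such hooks, assuming plain function pointers; the names (on_receive_fn, get_outgoing_fn, struct server_hooks) are hypothetical and not part of the current code:

#include <stddef.h>

/* Called by a receive thread whenever data arrives from a client. */
typedef void (*on_receive_fn)(int client_fd, const char *data, size_t len,
                              void *userdata);

/* Called by a send thread to fetch the next chunk to transmit; returns the
 * number of bytes written into buf, or 0 if there is nothing to send. */
typedef size_t (*get_outgoing_fn)(int client_fd, char *buf, size_t bufsize,
                                  void *userdata);

struct server_hooks {
    on_receive_fn   on_receive;
    get_outgoing_fn get_outgoing;
    void           *userdata;    /* e.g. a chat room, a file index, ... */
};

With something like this, the chat behaviour becomes just one possible pair of callbacks instead of being hard-wired into the threads.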

I guess I decided to turn it into something like a library just now. After that I can make a decent chat server. Or a game server. Or a file sharing server. Whatever I would need.

I will end this post by repeating the pros and cons of this design.

It uses threads, so all access to data processed in the background needs proper locking. Therefore it is not very efficient when all the threads access a single shared data structure, because of the heavy locking: if every thread locks and uses the same single data structure, it is practically procedural code plus a lot of unnecessary locking overhead.

However, if you have independent data structures it can become more efficient. For example, if you use one message log struct per chat room, only the connection threads using that specific room (message log struct) will lock it; the threads serving idle chat rooms are not scheduled at all. Similarly, if you use it as a web server, each thread may lock and use different pages independently. It is also more efficient because you utilize multiple cores automatically. Of course, multiple-core usage can be implemented in event-driven multiplexed servers too when accessing the data in the background.
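A small sketch of the per-room idea, assuming each room owns its own log, mutex and condition variable so that only the threads serving that room contend on its lock; names and sizes are illustrative:

#include <pthread.h>
#include <stddef.h>

struct chat_room {
    pthread_mutex_t lock;      /* taken only by this room's threads   */
    pthread_cond_t  grew;      /* wakes only this room's send threads */
    char            log[65536];
    size_t          len;
};

struct server_state {
    struct chat_room rooms[16];   /* independent: no cross-room locking */
};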

The other price tags that come with threading also apply. You need memory for each thread's stack and per-thread state. I don't see why this is such a big memory overhead (a few papers I read about web server design complain about it): the stack is not committed up front, its pages are allocated as they are actually used, and event-driven server models also grow and shrink their stack all the time. Still, there is some overhead. Secondly, threads mean concurrent access, and that always complicates the code: all access to shared background data must use proper locking mechanisms.
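If the per-thread stack ever became a concern, it can be capped at thread creation time with the standard pthread attribute API; this is a small sketch, and the 64 KiB value and the function name spawn_with_small_stack are arbitrary examples, not what the current code does:

#include <pthread.h>

/* Create a thread with a reduced stack instead of the default
 * (often 8 MiB of virtual address space on Linux). */
int spawn_with_small_stack(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 64 * 1024);
    rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}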

The main advantage of using multiple threads with blocking I/O is the reduced CPU load when idle. Polling functions can also block, but they are mostly used in loops that constantly check for new messages. Of course, under heavy load, multiplexed polling (I mean epoll mainly) results in far fewer system calls and therefore far fewer context switches; in my threaded blocking-I/O approach, threads context-switch on every piece of incoming and outgoing data.

So to wrap it up, server design is a big topic. It is also well defined and well explored, I guess, yet still alive with theoretical discussions and solid benchmarks on whether this or that design fits a particular purpose better. My code let me get a glimpse of it. I don't think the resulting code is really fit for any practical purpose, but it could help people curious about server design or socket programming if I clean it up sometime.

Sometimes it was frustrating to write a server from scratch without any guidance; I went with trial and error mostly. I didn't use a book because I thought I didn't have time to read one like this. (I have to find a job soon, you know.) I guess it would have taken the same amount of time with a book as a guide, but it would have been less frustrating. However, this way was also more fun: with no one to tell me what is wrong or right, I had to think ahead about why certain approaches would be impossible to use. It would also have been a lot easier if I had just used epoll, but as I said, I didn't develop this to be practical. It was very educational on threads and streaming (TCP) socket I/O, so I guess it was a better decision than using epoll.

Sorry about the long and disorganized post. I hope to write a much more detailed and methodical (and still amateur) post on server design in general. Also, if I really turn this code into something like a library, I will write proper documentation with graphs, diagrams, and function and data structure documentation.

See you in the next post. I will be writing about developing kernel modules from now on.
