Redis Internals
How does this dictionary server perform with high throughput and low latency?
In this article, I’ll cover the following to show the internal workings of Redis:
1. An overview of what Redis is and the reasons for its high performance
2. Blocking I/O, non-blocking I/O, and I/O multiplexing with async I/O
Before we deep-dive into the hows and whys, let me define Redis. Redis stands for Remote
Dictionary Server. It is a TCP server providing in-memory data structures like
dictionaries, sets, etc. Redis has many uses, such as caching, session storage, real-time data
stores, and streaming. Many tech organisations use Redis because it delivers high
throughput with low latency.
Upon reading a few posts, I could drill down to the following reasons for its high-
throughput, low-latency performance:
1. Redis stores all of its data in main memory (RAM) and only periodically persists it
to disk. RAM-based access is fast compared to disk-based access.
2. Redis serves all clients from a single-threaded event loop built on I/O multiplexing,
so it spends almost no time blocked on sockets or switching between threads.
3. Redis uses highly efficient data structures like skip lists, simple dynamic strings,
etc.
In this article, we will highlight the second point because that’s a significant
contributor to Redis’s high throughput and low latency.
Background
Suppose we want to design a web server that can handle concurrent clients. The first
possible, brute-force solution for this client-server model works like this:
1. A client creates a connection to the server’s socket. The server listens on that
socket, accepts the connection, and hands it to a dedicated thread, one thread per
connection.
2. Each thread is busy waiting on its connection for the client to send data and
performs disk I/O operations. In this case, the CPU spends the remaining time
waiting for I/O.
In the above case, each per-connection thread is blocked on I/O. It cannot proceed until
it receives data. I/O is always slower than in-memory computation, and both network and
disk I/O happen outside the CPU’s domain, where the thread can do nothing except wait.
This kind of situation is termed blocking I/O.
Note: I/O refers to Input/Output in the computing world and denotes how communication
happens between systems. It is everything that happens outside of the CPU’s domain.
Blocking I/O
With blocking I/O, when a client makes a connection request to the server, the socket
and the thread handling that connection are blocked on socket I/O until some data to
read appears. In actuality, the thread calls the read(2) operation with the socket’s fd
and a buf ; the kernel copies data from the socket file descriptor into the buffer until
it is all read and ready for processing. Until this operation is complete, the thread is
blocked, and the server can do nothing for that client but wait.
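To make this concrete, here is a minimal sketch of the blocking calls involved (illustrative C, not taken from the Redis source; the port and buffer size are arbitrary):
/* Minimal sketch of blocking socket I/O (illustrative, not Redis code) */
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(7000);
    bind(sfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(sfd, 128);
    for (;;) {
        int cfd = accept(sfd, NULL, NULL);       /* blocks until a client connects */
        char buf[4096];
        ssize_t n = read(cfd, buf, sizeof(buf)); /* blocks until the client sends data */
        if (n > 0) write(cfd, buf, n);           /* echo the data back */
        close(cfd);
    }
}
In the one-thread-per-connection model, each accepted cfd would be handed to its own thread, which then sits blocked in read() exactly like this.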
As mentioned above, the one-thread-per-connection approach with blocking I/O is
not ideal for many concurrent connections. How can we make our server cater to a
large number of clients?
Non-Blocking I/O
Now let’s change the approach. Instead of one thread per connection, let’s have a single
thread that accepts connections in a non-blocking way. But how? Luckily, there is a
way to keep this single thread from blocking.
We can make this possible by setting the O_NONBLOCK flag on the socket’s fd . Now, if the
thread calls accept() on that fd and there is no pending connection, the call fails
immediately with EAGAIN or EWOULDBLOCK instead of blocking; this error is what makes the
I/O non-blocking. Upon getting the error, the thread polls again to see if there is any
activity on that fd . When new activity does happen on that socket, accept() returns a
new cfd file descriptor for every new connection. We can enqueue such cfd for all the
newly accepted connections and perform read() on those cfd in the same non-blocking way.
The above solution is polling. It keeps the thread in a busy-wait, continuously checking
the error codes for all fd and cfd and trying again. This burns expensive CPU time on
wasted cycles. The operation is non-blocking, but it is still synchronous.
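A rough sketch of this busy-polling pattern (illustrative only, not Redis code; set_nonblocking and poll_loop are made-up helper names):
/* Non-blocking I/O with busy polling (illustrative, not Redis code) */
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static void set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);      /* mark the fd as non-blocking */
}

/* sfd is a listening socket already set up with socket()/bind()/listen() */
static void poll_loop(int sfd) {
    set_nonblocking(sfd);
    for (;;) {                                   /* busy-wait: spins even when idle */
        int cfd = accept(sfd, NULL, NULL);
        if (cfd == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                continue;                        /* nothing to accept yet, try again */
            break;                               /* real error */
        }
        set_nonblocking(cfd);
        char buf[4096];
        ssize_t n = read(cfd, buf, sizeof(buf)); /* may also fail with EAGAIN */
        if (n > 0) write(cfd, buf, n);
        close(cfd);
    }
}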
I/O multiplexing
We can use another alternative: I/O multiplexing. With I/O multiplexing, we
can monitor multiple fd with a single blocking call. But how?
We can use the select() call, which monitors multiple fd ; each select() call returns
the number of fd that are ready to accept/read/write. When this return value is non-
zero, we still have to explicitly check all the fd to find out which are ready for read/write.
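A sketch of such a select() loop (illustrative only, not Redis code; select_loop, cfds, and ncfds are made-up names): one call blocks for all descriptors, but we still scan every fd afterwards.
/* I/O multiplexing with select() (illustrative, not Redis code) */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* sfd: listening socket; cfds/ncfds: already-accepted client sockets */
static void select_loop(int sfd, int *cfds, int ncfds) {
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sfd, &rfds);
        int maxfd = sfd;
        for (int i = 0; i < ncfds; i++) {
            FD_SET(cfds[i], &rfds);
            if (cfds[i] > maxfd) maxfd = cfds[i];
        }
        /* one blocking call monitoring all fds; returns how many are ready */
        int ready = select(maxfd + 1, &rfds, NULL, NULL, NULL);
        if (ready <= 0) continue;
        if (FD_ISSET(sfd, &rfds))
            accept(sfd, NULL, NULL);             /* a new connection is ready */
        for (int i = 0; i < ncfds; i++) {        /* we must scan every fd ourselves */
            if (FD_ISSET(cfds[i], &rfds)) {
                char buf[4096];
                read(cfds[i], buf, sizeof(buf));
            }
        }
    }
}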
Async I/O
All the above approaches are synchronous and do not solve the problem efficiently. If
we could get the exact fd that are ready instead of only the number of “ready” sockets, we
would not have to busy-wait: the single main thread could execute the operations on
those available and “ready” fd instead of just waiting. CPU cycles would be used in a
useful manner rather than wasted. But how can we achieve this?
Luckily, there is another API, epoll , that solves this problem for good. epoll_wait()
returns only the sockets that are ready, and we can then loop through them and perform
the operations. This is essentially an event-driven system: the main thread registers the
events it cares about and then handles them as they become ready.
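A bare-bones epoll loop, stripped of error handling, looks roughly like this (illustrative only, not Redis code; Redis wraps the same calls inside ae.c and ae_epoll.c, shown later):
/* An epoll-based event loop (illustrative, not Redis code) */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 1024

static void epoll_loop(int sfd) {                /* sfd: non-blocking listening socket */
    int epfd = epoll_create1(0);                 /* create the epoll instance */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev);    /* register interest in readability */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* blocks until at least one registered fd is ready; returns only those */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == sfd) {                     /* listening socket fired: accept and register */
                int cfd = accept(sfd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &cev);
            } else {                             /* a client socket fired: read its data */
                char buf[4096];
                if (read(fd, buf, sizeof(buf)) <= 0) close(fd);
            }
        }
    }
}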
Unfortunately, due to the scope of this article, I won’t be able to cover epoll in detail.
But you can read this amazing and detailed blog by Cindy Sridharan.
Now let’s see how Redis puts this into practice. At a high level, Redis does the following
at startup:
1. Initialises the server’s event loop, server.el
2. Registers the available connection types (TCP socket, TLS, UNIX socket)
3. Binds the port and addr and initialises the socket listeners
4. Registers an accept handler for each listener in the event loop
5. Traverses the “ready” events that epoll enqueues in the event loop and handles them
The source files connection.c , socket.c , anet.c , server.c , ae.c , and ae_epoll.c
are important and relevant to understand socket networking in Redis and event loops.
During startup, Redis calls connTypeInitialize() , which registers each available
connection type. Every connection type registers itself through connTypeRegister() ,
which stores its ConnectionType in the global connTypes[] array:
/* connection.c */
int connTypeRegister(ConnectionType *ct) {
    ConnectionType *tmpct;
    int type;
    ...
    /* the elided code finds a free slot (or returns early if this
     * type is already registered) and then stores the type globally */
    connTypes[type] = ct;
    ...
    return C_OK;
}
/* socket.c: the ConnectionType implementation for plain TCP sockets */
static ConnectionType CT_Socket = {
    ...
    /* create/shutdown/close connection */
    .conn_create = connCreateSocket,
    .conn_create_accepted = connCreateAcceptedSocket,
    ...
/* IO */
.write = connSocketWrite,
.writev = connSocketWritev,
.read = connSocketRead,
.set_write_handler = connSocketSetWriteHandler,
.set_read_handler = connSocketSetReadHandler,
...
/* pending data */
.has_pending_data = NULL,
.process_pending_data = NULL,
};
/* server.c */
void initServer(void) {
    ...
    /* create the main event loop, sized for maxclients plus some reserved fds */
    server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);
    ...
}
aeCreateEventLoop calls aeApiCreate , which malloc s an aeApiState with two fields:
epfd , which holds the epoll file descriptor returned by a call to epoll_create , and
events , an array of struct epoll_event as defined by the Linux epoll API.
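For reference, the aeApiState wrapper in ae_epoll.c is roughly the following (trimmed):
/* ae_epoll.c (trimmed) */
typedef struct aeApiState {
    int epfd;                   /* fd returned by epoll_create() */
    struct epoll_event *events; /* buffer that epoll_wait() fills in */
} aeApiState;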
aeCreateEventLoop only initialises server.el ; it does not yet register any events to
wait on or any handlers for the ready events.
In a later section, we will cover the implementation of ae_epoll.c . But for now,
remember that to register any file event, we need to use aeCreateFileEvent . Here’s
what the code looks like:
/* ae.c */
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData) {
    ...
    /* aeApiAddEvent() registers fd with epoll; proc becomes fe's read/write handler */
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}
Initialisation of Listeners
To accept new connections, the server first needs its socket listeners. After creating the
server’s event loop, main() initialises socket listeners based on the connection type.
For each connectionType , there is a listener. A few examples of connection types are
TCP socket, TLS socket, UNIX socket, etc. The connListener is defined as a struct in
connection.h .
/* server.h */
struct redisServer {
    ...
    connListener listeners[CONN_TYPE_MAX];
    ...
};
/* connection.h */
/* Setup a listener by a connection type */
struct connListener {
int fd[CONFIG_BINDADDR_MAX];
int count;
char **bindaddr;
int bindaddr_count;
int port;
ConnectionType *ct; /* important, it has all the functionality for the conn*/
void *priv;
};
/* server.c */
void initListeners(void) {
    /* Setup listeners from server config for TCP/TLS/Unix */
    int conn_index;
    connListener *listener; /* defined in connection.h */
if (server.port != 0) {
conn_index = connectionIndexByType(CONN_TYPE_SOCKET);
...
listener = &server.listeners[conn_index];
listener->bindaddr = server.bindaddr;
listener->bindaddr_count = server.bindaddr_count;
listener->port = server.port;
listener->ct = connectionByType(CONN_TYPE_SOCKET);
}
...
/* create all the configured listener, and add handler to start to accept */
int listen_fds = 0;
for (int j = 0; j < CONN_TYPE_MAX; j++) {
listener = &server.listeners[j];
        ...
listen_fds += listener->count;
}
...
}
anetTcpServer binds the address to the socket s (via anetListen , shown below) and starts
listening on it. It then returns this s to the caller, listenToPort() , which stores it in
the listener->fd array. If a bind or listen fails, listenToPort() rolls back the listens
that already succeeded:
/* anet.c */
/* binds the addr and listens on the socket fd s */
static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
    if (bind(s,sa,len) == -1) {
        anetSetError(err, "bind: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }
    if (listen(s, backlog) == -1) {
        anetSetError(err, "listen: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }
    return ANET_OK;
}
/* server.c: inside listenToPort() */
if (sfd->fd[sfd->count] == ANET_ERR) {
...
/* Rollback successful listens before exiting */
closeListener(sfd);
return C_ERR;
}
initListeners() then registers an accept handler for each listening fd in the event loop
by calling createSocketAcceptHandler() :
/* Create an event handler for accepting new connections in TCP or TLS domain sockets.
 * This works atomically for all socket fds */
int createSocketAcceptHandler(connListener *sfd, aeFileProc *accept_handler) {
    int j;
    /* for each listening fd, the (elided) body calls aeCreateFileEvent() with
     * AE_READABLE and accept_handler, rolling back the earlier fds on failure */
    ...
}
/* socket.c */
/* the accept handler registered in the event loop by aeCreateFileEvent() */
static void connSocketAcceptHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
    ...
    while(max--) {
        cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);
        if (cfd == ANET_ERR) {
if (errno != EWOULDBLOCK)
serverLog(LL_WARNING,
"Accepting client connection: %s", server.neterr);
return;
}
serverLog(LL_VERBOSE,"Accepted %s:%d", cip, cport);
/* registers read handlers */
acceptCommonHandler(connCreateAcceptedSocket(cfd, NULL),0,cip);
}
}
The function acceptCommonHandler() registers the read handler by calling the following:
/* connection.h*/
/* Register a read handler, to be called when the connection is readable.
* If NULL, the existing handler is removed.
*/
static inline int connSetReadHandler(connection *conn, ConnectionCallbackFunc func) {
return conn->type->set_read_handler(conn, func);
}
/* socket.c */
/* read handler for the accepted connection */
static int connSocketSetReadHandler(connection *conn, ConnectionCallbackFunc func) {
if (func == conn->read_handler) return C_OK;
conn->read_handler = func;
if (!conn->read_handler)
aeDeleteFileEvent(server.el,conn->fd,AE_READABLE);
else
if (aeCreateFileEvent(server.el,conn->fd,
        AE_READABLE,conn->type->ae_handler,conn) == AE_ERR) return C_ERR;
return C_OK;
}
With the read handler registered, the main thread simply runs the event loop: aeMain()
keeps calling aeProcessEvents() until the loop is stopped.
/* ae.c */
void aeMain(aeEventLoop *eventLoop) {
eventLoop->stop = 0;
while (!eventLoop->stop) {
aeProcessEvents(eventLoop, AE_ALL_EVENTS|
AE_CALL_BEFORE_SLEEP|
AE_CALL_AFTER_SLEEP);
}
}
/* Process every pending time event, then every pending file event
 * (that may be registered by time event callbacks just processed).*/
int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;
    ...
    /* the elided body calls aeApiPoll() and then invokes the registered
     * handler for each fired event */
    ...
        processed++;
    }
}
aeProcessEvents() delegates the actual waiting to aeApiPoll() , which asks epoll for the
ready descriptors and records them in eventLoop->fired .
/* ae_epoll.c */
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;
    /* the elided call is epoll_wait(state->epfd, state->events, ...), which
     * blocks until at least one registered fd is ready or the timeout expires */
    ...
    numevents = retval;
for (j = 0; j < numevents; j++) {
struct epoll_event *e = state->events+j;
...
eventLoop->fired[j].fd = e->data.fd;
}
}
...
return numevents;
}
For example, if a client requests a connection, then aeApiPoll will notice it and
populate the eventLoop->fired table with an entry for the listening socket’s descriptor
and the AE_READABLE mask. aeProcessEvents() then looks up the file event registered for
that fd and invokes its handler, i.e. the accept_handler we registered earlier:
aeProcessEvents() {
...
aeFileEvent *fe = &eventLoop->events[fd];
fe->rfileProc(eventLoop,fd,fe->clientData,mask);
...
}
Summary
This article covered blocking I/O, non-blocking I/O, and the Redis event loop.
Redis runs everything in a single thread, handling its sockets in a non-blocking fashion.
First, it initialises the server’s event loop, server.el . Then it binds the server
address and port and initialises the listeners sfd . While initialising, it registers
the listening sockets with accept_handler and the AE_READABLE mask in the event loop by
calling aeCreateFileEvent .
Redis then uses system calls like epoll_create() , epoll_ctl() , and epoll_wait() to get
the ready events. It processes all the events synchronously by triggering their respective
registered handlers and continues this loop until stopped.
References
1. Why Redis is so fast
2. Blocking and non-blocking I/O
3. Thoughts on Redis
4. Epoll Madness