contributed by < secminhr >
Besides excerpting from the specification, you should also comment on and explain it.
HTTP is defined as a stateless protocol, meaning that each request message can be understood in isolation.
RFC 7230 2.3
The original representations of the space character and \r\n in RFC 7230 are SP and CRLF; here we convert them into the representation used in C.
*( header-field \r\n ) means 0 or more ( header-field \r\n )
[ message-body ] means at most 1 message-body
OWS indicates optional whitespace
The presence of a message body in a request is signaled by a Content-Length or Transfer-Encoding header field.
RFC 7230 3.3
The presence of a message body in a response depends on both the request method to which it is responding and the response status code.
RFC 7230 3.3
To determine the length of the message body, see RFC 7230 3.3.3.
The main part of this program can be seen as interactions between the connection queue, greeters, and workers.
The connection queue holds accepted connections; both greeters and workers depend on this queue to put/get connections.
This queue uses two locks, head_lock for dequeue and tail_lock for enqueue, to keep greeters and workers from affecting each other.
This queue also guarantees that enqueue is always successful by keeping a dummy node at head.
A thread will wait for non_empty if size is 0 when dequeuing.
Create a new node with fd, put it into queue q, and wake any threads that are waiting for non_empty.
A size check is not needed since the queue ensures that an enqueue action must succeed.
If size is 0, which indicates that the queue is empty (it has only the dummy node), then wait for non_empty.
Otherwise, dequeue the node next to head (the dummy), get fd from it, and free it.
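The queue described above can be sketched as follows. This is a minimal illustration, not the program's actual code: the field layout and the queue_init helper are assumptions, enqueue reports allocation failure through its return value, and, as a choice of this sketch, size and non_empty are updated under head_lock so that a wake-up cannot be missed between the size check and the wait.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct node {
    int fd;
    struct node *next;
} node_t;

typedef struct {
    node_t *head;              /* always points at the dummy node */
    node_t *tail;              /* last node; equals head when empty */
    pthread_mutex_t head_lock; /* guards head, size and non_empty */
    pthread_mutex_t tail_lock; /* guards tail */
    pthread_cond_t non_empty;
    int size;
} queue_t;

/* Returns 0 on success, -1 if the dummy node cannot be allocated. */
static int queue_init(queue_t *q)
{
    node_t *dummy = calloc(1, sizeof(*dummy));
    if (!dummy)
        return -1;
    q->head = q->tail = dummy;
    q->size = 0;
    pthread_mutex_init(&q->head_lock, NULL);
    pthread_mutex_init(&q->tail_lock, NULL);
    pthread_cond_init(&q->non_empty, NULL);
    return 0;
}

/* Never waits for free space; fails only if malloc fails. */
static int enqueue(queue_t *q, int fd)
{
    node_t *node = malloc(sizeof(*node));
    if (!node)
        return -1;
    node->fd = fd;
    node->next = NULL;

    /* Link the node behind the current tail. */
    pthread_mutex_lock(&q->tail_lock);
    q->tail->next = node;
    q->tail = node;
    pthread_mutex_unlock(&q->tail_lock);

    /* Publish the new element and wake one waiting thread. */
    pthread_mutex_lock(&q->head_lock);
    q->size++;
    pthread_cond_signal(&q->non_empty);
    pthread_mutex_unlock(&q->head_lock);
    return 0;
}

/* Blocks while the queue is empty, then returns the oldest fd. */
static int dequeue(queue_t *q)
{
    pthread_mutex_lock(&q->head_lock);
    while (q->size == 0) /* loop guards against spurious wake-ups */
        pthread_cond_wait(&q->non_empty, &q->head_lock);

    node_t *old_dummy = q->head;
    node_t *first = old_dummy->next; /* node next to the dummy */
    int fd = first->fd;
    q->head = first; /* first becomes the new dummy */
    q->size--;
    pthread_mutex_unlock(&q->head_lock);

    free(old_dummy);
    return fd;
}
```

In this variant the old dummy is freed and the dequeued node takes over as the new dummy, so head never becomes NULL and enqueue never has to special-case an empty queue.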
A greeter is responsible for accepting incoming connections from clients and putting them into the connection queue for workers to handle.
This will be passed into greeter function as its argument.
The important part of greeter_routine is inside its while loop, which continuously accepts incoming connections. The loop can be separated into 3 pieces.
listfd
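One iteration of such an accept loop might look like the sketch below. The helper names set_recv_timeout and greet_once are hypothetical, and the exact split into pieces is an assumption based on the description here (accept from listfd, set the receive timeout that is mentioned later in this note, then hand the fd to the queue).

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Make recv on fd give up after `seconds` without data. Returns 0 or -1. */
static int set_recv_timeout(int fd, int seconds)
{
    struct timeval tv = {.tv_sec = seconds, .tv_usec = 0};
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Accept one connection from listfd and prepare it for the workers.
 * Returns the connection fd, or -1 on failure. The caller is expected
 * to enqueue the returned fd into the connection queue. */
static int greet_once(int listfd, int timeout_s)
{
    int connfd = accept(listfd, NULL, NULL);
    if (connfd < 0)
        return -1;
    if (set_recv_timeout(connfd, timeout_s) < 0) {
        close(connfd);
        return -1;
    }
    return connfd;
}
```

A greeter would call greet_once(listfd, 10) inside its while loop and enqueue every fd that is not -1.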
A worker will dequeue a connection, read request from the connection, parse the request and send the response.
There are some variables and constants being used in the worker function:
- q: connection queue
- connfd: connection fd, where a worker receives requests and sends responses
- msg: buffer that holds the request read from the connection
- MAXMSG: length of msg
- recv_bytes: length of received request (in bytes)
- len: return value of recv
- status: HTTP status to send
- file: the fd of the file to send as response message
- buf: content of the Date header
- st: status of the file to send; we use st to get the size of the file
read request
From the syntax of a request we can see that \r\n\r\n indicates the start of the message body (or the end of the request if no message body is present).
Since the program does not deal with message bodies, having \r\n\r\n is enough.
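As a small illustration of that check, the sketch below scans a receive buffer for the terminator. The function name header_end is hypothetical; it works on a length-delimited buffer, so the data does not need to be NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset just past the first "\r\n\r\n" in buf[0..len),
 * or 0 if the terminator has not been received yet. */
static size_t header_end(const char *buf, size_t len)
{
    static const char term[] = "\r\n\r\n";
    const size_t tlen = sizeof(term) - 1;

    for (size_t i = 0; i + tlen <= len; i++) {
        if (memcmp(buf + i, term, tlen) == 0)
            return i + tlen;
    }
    return 0;
}
```

A reading loop can call this after every recv and stop as soon as the result is non-zero.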
The following code will keep reading from the connection into msg unless msg already holds the end of the request headers or recv somehow cannot read.
The last situation is handled inside if ((len = recv(connfd, msg + recv_bytes, MAXMSG - recv_bytes, 0)) <= 0)
len == 0, according to the manual, indicates that the socket has been shut down; in this case, we close the connection and jump to loopstart to deal with the next connection.
len == -1 indicates that an error has occurred; we then set a different status depending on the error.
Note: The program cannot properly handle a request that exceeds MAXMSG.
This line will parse the request and give out the status to send.
parse_request is supported by some functions and macros; let's break them down one by one.
This structure represents an HTTP request and is pretty straightforward.
These macros are used extensively in the parsing functions.
- TRY_CATCH: evaluates a statement that yields a status_t. If the result is not STATUS_OK, then the function using this will return with the status (as if the function throws the error status). It is wrapped in do {...} while(0) so we can use it like a normal function.
- TRY_CATCH_S: a variant of TRY_CATCH for sep_newline and sep_whitespace. TRY_CATCH_S will throw a STATUS_BAD_REQUEST if STATEMENT returns NULL.

Since we now know that TRY_CATCH and TRY_CATCH_S will throw if an error happens, we can just ignore them to see the normal flow of a function.
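To make the "throwing" behavior concrete, here is a minimal sketch of how two such macros could be written. The enum values and the helpers check_positive and demo are assumptions made for this example only, not the program's real definitions.

```c
#include <stddef.h>

/* Assumed status values, limited to the ones named in this note. */
typedef enum { STATUS_OK, STATUS_BAD_REQUEST, STATUS_NOT_FOUND } status_t;

/* Return early from the enclosing function when STATEMENT fails. */
#define TRY_CATCH(STATEMENT)       \
    do {                           \
        status_t s_ = (STATEMENT); \
        if (s_ != STATUS_OK)       \
            return s_;             \
    } while (0)

/* Same idea for helpers that signal failure by returning NULL. */
#define TRY_CATCH_S(STATEMENT)         \
    do {                               \
        if ((STATEMENT) == NULL)       \
            return STATUS_BAD_REQUEST; \
    } while (0)

/* Hypothetical helper, only here to exercise TRY_CATCH. */
static status_t check_positive(int n)
{
    return n > 0 ? STATUS_OK : STATUS_NOT_FOUND;
}

static status_t demo(int n, const char *token)
{
    TRY_CATCH(check_positive(n)); /* a failing status is returned as-is */
    TRY_CATCH_S(token);           /* NULL becomes STATUS_BAD_REQUEST */
    return STATUS_OK;
}
```

The do {...} while(0) wrapper lets each macro be followed by a semicolon and used safely inside an if/else without braces.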
- strsep_whitespace: separates the string at whitespace and makes *s point to the character after the whitespace.
- strsep_newline: separates the string at a newline (\r\n or \n) and makes *s point to the character after the newline character(s).
- parse_method: sets request->method if the method is GET or HEAD. Any other method is not supported, and the function returns STATUS_BAD_REQUEST in that case.
- parse_path: maps / and /index.html to the full path of index.html and assigns it to request->path; it also sets request->type to TEXT.
  Note: parse_path can currently handle only requests to / and /index.html; a request to any other path results in a STATUS_NOT_FOUND error.
- parse_protocol_version: an unacceptable protocol version results in a STATUS_BAD_REQUEST error.
- parse_initial_line: parses the initial line into *request. It gets the 3 tokens of the line sequentially and passes each one to the corresponding parsing function.
- parse_header
- parse_request: the first line of a request is its initial line, and the following lines are its header fields.
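The flow of parsing an initial line can be sketched as below. This is a simplified stand-in, not the program's code: next_token replaces strsep_whitespace, the request_t layout is reduced to two fields, the path assigned is a placeholder, and accepting exactly HTTP/1.0 and HTTP/1.1 is an assumption of this sketch.

```c
#include <string.h>

typedef enum { STATUS_OK, STATUS_BAD_REQUEST, STATUS_NOT_FOUND } status_t;
typedef enum { GET, HEAD } method_t;

typedef struct {
    method_t method;
    const char *path;
} request_t;

/* Cut the token at the first space and advance *s past it.
 * Returns NULL when no space is left in the string. */
static char *next_token(char **s)
{
    char *start = *s;
    size_t n = strcspn(start, " ");
    if (start[n] == '\0')
        return NULL;
    start[n] = '\0';
    *s = start + n + 1;
    return start;
}

static status_t parse_method(const char *token, request_t *req)
{
    if (strcmp(token, "GET") == 0)
        req->method = GET;
    else if (strcmp(token, "HEAD") == 0)
        req->method = HEAD;
    else
        return STATUS_BAD_REQUEST;
    return STATUS_OK;
}

static status_t parse_path(const char *token, request_t *req)
{
    if (strcmp(token, "/") != 0 && strcmp(token, "/index.html") != 0)
        return STATUS_NOT_FOUND;
    req->path = "index.html"; /* placeholder for the real full path */
    return STATUS_OK;
}

/* line is the initial line without its trailing newline; it is modified. */
static status_t parse_initial_line(char *line, request_t *req)
{
    char *method = next_token(&line);
    if (!method)
        return STATUS_BAD_REQUEST;
    status_t s = parse_method(method, req);
    if (s != STATUS_OK)
        return s;

    char *path = next_token(&line);
    if (!path)
        return STATUS_BAD_REQUEST;
    s = parse_path(path, req);
    if (s != STATUS_OK)
        return s;

    /* Whatever remains is the protocol version. */
    if (strcmp(line, "HTTP/1.1") != 0 && strcmp(line, "HTTP/1.0") != 0)
        return STATUS_BAD_REQUEST;
    return STATUS_OK;
}
```

The explicit status checks here are exactly what TRY_CATCH and TRY_CATCH_S hide in the real parsing functions.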
The response must follow the syntax of a response.
So we can separate sending phase into 3 parts: initial line, headers, and message body.
Here we reuse msg to store the initial line.
status_to_str turns the status enum into its corresponding string reason.
Send Date, Content-Length, and Content-Type headers.
Date
The format of Date is defined in RFC 7231 7.1.1.1.
Since it requires GMT, we use gmtime(&now) to translate the current time to the corresponding GMT, then follow the format rule to translate it into a string using strftime.
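A minimal sketch of that conversion follows. The function name format_http_date is hypothetical, and the format string assumes the default C locale so that day and month names come out in English. Note that gmtime returns a pointer to shared static storage, so a multi-threaded server may prefer the POSIX gmtime_r.

```c
#include <stddef.h>
#include <time.h>

/* Format t as an HTTP date such as "Sun, 06 Nov 1994 08:49:37 GMT".
 * Returns the number of characters written, or 0 on failure
 * (including when buf is too small). */
static size_t format_http_date(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t); /* broken-down time in UTC */
    if (!tm)
        return 0;
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", tm);
}
```

Calling it with time(NULL) and a buffer of 30 bytes or more yields the value to put after "Date: ".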
Content-Length and Content-Type
How these two headers are used depends on whether the request uses the HEAD or GET method.
Responses to the HEAD request method (Section 4.3.2 of [RFC7231]) never include a message body because the associated response header fields (e.g., Transfer-Encoding, Content-Length, etc.), if present, indicate only what their values would have been if the request method had been GET (Section 4.3.1 of [RFC7231]).
That is to say, we only have to send a message body when the request method is GET, and we should indicate that a message body is present by Transfer-Encoding or Content-Length.
In this case, where we use Content-Length, its value should be the length of the message body.
And Content-Type indicates the media type of the message body, which was given in parse_path.
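Getting the Content-Length value from the file status can be sketched as follows. The function name format_entity_headers is hypothetical, and the media type string is passed in by the caller because the mapping from the request type to a string is not shown in this note.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write the Content-Length and Content-Type header lines for the file
 * behind fd into out. Returns snprintf's result, or -1 if fstat fails. */
static int format_entity_headers(int fd, const char *content_type,
                                 char *out, size_t size)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    /* st_size is the file length in bytes, i.e. the body length. */
    return snprintf(out, size,
                    "Content-Length: %lld\r\nContent-Type: %s\r\n",
                    (long long) st.st_size, content_type);
}
```

The same header lines can be sent for both GET and HEAD; only GET is then followed by the file content.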
As we mentioned in the previous section, the program has to send a message body only when the request method is GET, and the message body we want to send is specified by request->path, given by parse_path.
After the response is sent, we have to decide whether to close the connection or keep it (due to default persistent connection behavior in HTTP/1.1).
enqueue is used to keep the connection. We enqueue the connection back to the connection queue so it can be handled later by a worker.
First lay out the design considerations of the program, then analyze the code.
Descriptions are added.
recv non-blocking
Look at the reading part in a worker:
It uses recv to get the content of a request.
Since recv will block if there are no messages available, and connections are persistent, it is easy to imagine all workers being blocked on idle connections at the same time.
Let's see this effect in action by reducing the number of workers.
First, we modify the length of the workers array to 1 and change its filling loop.
Now we can see the effect with wget.
This line will send 2 requests to localhost:9000 within the same connection, with a 9-second delay between the 2 requests (we choose 9 only because the timeout set by a greeter is 10 seconds).
During the delay, open another terminal and send a request immediately.
We can clearly see that the later command, which should receive a response immediately, waits until the previous command is finished, and the reason is the blocking behavior of recv.
See full code on nonblocking_only branch.
To solve this problem, it's intuitive to make the connection fd non-blocking.
We can do that by changing the file status flags of the connection fd in a greeter.
We also add a check in the reading loop for the case where recv fails with EAGAIN.
EAGAIN means there are currently no messages available, so we enqueue the connection back and start the next loop.
This solution manages to solve the problem.
When we apply the above test method to it, the later command gets its response immediately.
However, this solution makes the CPU usage of the program rise dramatically, up to 99%, and the program therefore gets killed by the kernel before long.
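The two changes can be sketched like this. The names set_nonblocking, try_read, and read_result_t are hypothetical; in the worker, the READ_RETRY_LATER case corresponds to enqueuing the connection back.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Add O_NONBLOCK to the file status flags of fd. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

typedef enum {
    READ_DATA,
    READ_RETRY_LATER,
    READ_CLOSED,
    READ_ERROR
} read_result_t;

/* One non-blocking read attempt; *got receives recv's return value. */
static read_result_t try_read(int fd, char *buf, size_t size, ssize_t *got)
{
    *got = recv(fd, buf, size, 0);
    if (*got > 0)
        return READ_DATA;
    if (*got == 0)
        return READ_CLOSED; /* peer shut the connection down */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_RETRY_LATER; /* nothing to read yet: requeue */
    return READ_ERROR;
}
```

Checking both EAGAIN and EWOULDBLOCK is the portable form, since POSIX allows the two to have different values.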
The problem in previous section is that the workers spend most of their time dequeuing, reading, and enqueuing a connection that has no messages since we made the connection fd non-blocking.
Epoll lets us know when a connection has messages, so we can avoid unnecessary recv calls by calling recv only after messages have arrived.
note: after reboot, performance of epoll
The throughput of the epoll solution seems to vary across different machines.
The latency, however, is always lower.
test on
Consider adding a timer first, and leave this problem for later.