# A Camera Streaming Web Server

## :bulb: Abstract

This project presents a lightweight, real-time video streaming system inspired by the concept of a pet camera. The implementation uses a Raspberry Pi running Ubuntu, combined with a standard USB webcam, to provide a cost-effective and accessible monitoring solution. Video capture is handled through the Video4Linux2 (V4L2) kernel interface, which provides robust support for video devices on Linux-based systems. To deliver a continuous live feed to remote users, the system employs HTTP-based MJPEG streaming, transmitting video frames from the server to the client at approximately 10 frames per second. This design offers a practical, low-latency solution for browser-compatible video streaming without the need for specialized client-side software.

## :book: 1. Introduction

### 1.1 Video4Linux

Video4Linux (V4L) is a framework consisting of a collection of device drivers [[1]](https://www.kernel.org/doc/html/v4.14/media/kapi/v4l2-core.html) and an API [[2]](https://www.kernel.org/doc/html/v4.14/media/uapi/v4l/v4l2.html) designed to facilitate real-time video capture on Linux-based systems. It supports a variety of video capture hardware, including USB webcams, streaming media devices, and other video input sources. Video capture devices operate by sampling analog or digital video signals and storing the resulting frames in memory. Through the V4L interface, user-space applications can control the video capture process and retrieve image data directly from the device drivers.

The original V4L API was introduced in the Linux 2.1 kernel series as a unifying framework intended to replace the disparate, independently developed interfaces previously used for TV and radio devices. From the Linux 2.5 kernel onward, this interface was superseded by Video4Linux2 (V4L2), a significantly enhanced and more flexible version of the API [[3]](https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/diff-v4l.html#differences-between-v4l-and-v4l2). V4L2 introduces a standardized and extensible set of controls and data formats, along with a suite of tools and utilities that facilitate the configuration, testing, and debugging of video capture devices on Linux systems [[4]](https://www.mankier.com/package/v4l-utils).

```bash!
# List supported video formats and resolutions of the default video device
v4l2-ctl --list-formats-ext
```

### 1.2 HTTP-based MJPEG streaming

Motion JPEG (MJPEG) is a video compression format in which each frame, or each interlaced field of a digital video sequence, is compressed independently as a JPEG image. Unlike modern video codecs that exploit temporal redundancy between frames, MJPEG treats each frame as a discrete still image. As a result, an MJPEG stream is better understood as a rapidly displayed series of individual pictures than as a seamlessly compressed video. Despite this limitation, MJPEG remains widely used in video capture applications, including digital cameras, IP cameras, and webcams, and is supported by many non-linear video editing systems.

HTTP-based MJPEG streaming transmits a sequence of JPEG images over an open HTTP connection. Each image is sent in a separate HTTP response segment, distinguished by a predefined boundary marker. When a client sends an HTTP GET request for an MJPEG stream, the server responds with a continuous stream of JPEG frames, using the MIME type `multipart/x-mixed-replace;boundary=<boundary-name>`. This MIME type explicitly signals the client to expect multiple parts, each representing a video frame, separated by the specified boundary string. The TCP connection remains open for the duration of the stream, allowing the server to continuously send new frames and the client to receive and render them in real time. On the client side, the MJPEG stream is typically embedded and displayed using a standard HTML `<img>` tag, enabling straightforward browser-based video streaming without additional plugins or codecs.
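To make the wire format concrete, the following minimal C sketch shows how a server might emit one frame as a multipart segment. The helper name `send_mjpeg_part` is hypothetical, and the boundary string `frame` anticipates the choice described later in Section 2.3:

```c
/* Minimal sketch: send one JPEG image as a single part of a
 * multipart/x-mixed-replace stream. Assumes the client has already
 * received the initial "HTTP/1.1 200 OK" response declaring
 * Content-Type: multipart/x-mixed-replace;boundary=frame. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

static int send_mjpeg_part(int client_fd, const unsigned char *jpeg, size_t jpeg_len)
{
    char header[128];
    int n = snprintf(header, sizeof(header),
                     "--frame\r\n"
                     "Content-Type: image/jpeg\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     jpeg_len);

    if (send(client_fd, header, (size_t)n, 0) < 0)   /* boundary + part headers */
        return -1;
    if (send(client_fd, jpeg, jpeg_len, 0) < 0)      /* raw JPEG bytes */
        return -1;
    if (send(client_fd, "\r\n", 2, 0) < 0)           /* line break ending the part */
        return -1;
    return 0;
}
```

Each newly received part replaces the previously displayed one, which is what produces the impression of motion inside a single `<img>` element.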
:::info
***Difference from HTTP Live Streaming***

HTTP Live Streaming (also known as HLS) is an HTTP-based video streaming protocol developed by Apple [[5]](https://developer.apple.com/streaming/). Leveraging the same underlying HTTP protocol that powers the web, HLS enables efficient delivery of multimedia content using standard web servers and content delivery networks (CDNs), making it highly scalable and compatible with a wide range of client devices. One advantage HLS has over some other streaming protocols is adaptive bitrate streaming: the ability to adjust video quality in the middle of a stream as network conditions change.

In a typical HLS workflow, a hardware encoder receives audio and video input, compresses the content using codecs such as HEVC for video and AC-3 for audio, and outputs the stream as a fragmented MPEG-4 file or an MPEG-2 transport stream. A software-based stream segmenter then divides the continuous stream into a sequence of small media files, which are placed on a web server. Alongside the media files, the segmenter generates and maintains an index file, commonly referred to as a playlist, in the M3U8 format. This playlist contains a list of URLs pointing to the segmented media files, along with metadata such as duration and sequence information. Clients begin playback by retrieving the playlist and then sequentially requesting and playing the listed media segments.

![http-live-streaming-Apple](https://hackmd.io/_uploads/ryRdShWNlx.png)
Figure 1. The flow using the HLS protocol [[5]](https://developer.apple.com/streaming/)
:::

### 1.3 Socket Programming

A socket is a communication endpoint used for sending and receiving data across a network. Depending on the underlying transport protocol, sockets are typically categorized into two main types: datagram sockets, which use the User Datagram Protocol (UDP), and stream sockets, which rely on the Transmission Control Protocol (TCP). A third type, known as raw sockets, enables the direct transmission and reception of IP packets without the involvement of higher-level transport protocols, offering greater control over packet structure and content.

The socket API provides a standardized interface that allows programs to perform interprocess communication (IPC) over a network. This communication model is commonly implemented using the client-server architecture, where one process (the server) listens for incoming connections and another process (the client) initiates the communication.

:::info
***Flow of socket programming***

The socket programming model follows a structured sequence of steps that enable reliable communication between client and server applications over a network. The typical flow is as follows (minimal sketches of both sides appear after this box):

1. **Socket Creation - `socket()`**
   The process begins with the `socket()` API, which creates an endpoint for communication. This function returns a socket descriptor, an integer that uniquely identifies the socket within the process. This descriptor will be used in subsequent socket operations.
2. **Binding - `bind()`**
   After creating the socket, the server application uses the `bind()` API to associate the socket with a specific IP address and port number. This step is essential for servers, as it allows them to be reachable by clients over the network. Clients typically omit the bind step, relying on the system to assign an ephemeral port automatically.
3. **Listening - `listen()`**
   The `listen()` API marks a socket as passive, indicating that it is ready to accept incoming connection requests. It must be called after the socket has been created and successfully bound to an address using `bind()`. Once in listening mode, the socket cannot initiate connections but can accept them using `accept()`.
4. **Connection Request - `connect()`**
   On the client side, the application calls the `connect()` API to establish a connection with a listening server. This system call attempts to initiate a TCP handshake with the server socket.
5. **Accepting Connections - `accept()`**
   When a client attempts to connect, the server invokes the `accept()` API to accept the incoming connection request. This call blocks until a connection is established and returns a new socket descriptor for communicating with the specific client. The original listening socket remains open and continues to accept new connection requests.
6. **Data Transmission - `send()`, `recv()`, `read()`, `write()`**
   Once a connection is established, both the client and server can exchange data using data transfer APIs such as `send()`, `recv()`, `read()`, and `write()`. These functions allow for full-duplex communication over the socket.
7. **Connection Termination - `close()`**
   When communication is complete, either the client or the server (or both) can invoke the `close()` API to terminate the connection and release any system resources associated with the socket. Proper closure is essential to avoid resource leaks and ensure graceful disconnection.

![flow-socket-programming](https://hackmd.io/_uploads/ByH9Zyf4ll.gif)
Figure 2. Flow of events for a connection-oriented socket [[6]](https://www.ibm.com/docs/en/i/7.4.0?topic=communications-socket-programming)
:::
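The server side of this flow can be sketched in a few lines of C. Error handling is abbreviated, the reply body is a placeholder, and the port number simply anticipates the 8080 used in Section 2.1:

```c
/* Minimal sketch of the connection-oriented server flow:
 * socket() -> bind() -> listen() -> accept() -> recv()/send() -> close().
 * Step 4 (connect()) happens on the client side; see the next sketch. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);          /* 1. create a TCP socket */

    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR,           /* allow immediate port reuse */
               &opt, sizeof(opt));                            /* (also done in Section 2.1) */

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);                 /* any local interface */
    addr.sin_port = htons(8080);

    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));  /* 2. bind address and port */
    listen(server_fd, 8);                                     /* 3. mark socket passive */

    int client_fd = accept(server_fd, NULL, NULL);            /* 5. block until a client connects */

    char buf[1024];
    ssize_t n = recv(client_fd, buf, sizeof(buf) - 1, 0);     /* 6. read the request */
    if (n > 0) {
        const char *reply = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        send(client_fd, reply, strlen(reply), 0);             /* 6. write a response */
    }

    close(client_fd);                                         /* 7. release resources */
    close(server_fd);
    return 0;
}
```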
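For completeness, the matching client side covers steps 1, 4, 6, and 7; the loopback address and request line are assumptions for illustration:

```c
/* Minimal sketch of the client-side flow: create a socket,
 * connect() to the listening server, exchange data, and close. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* 1. create a TCP socket */

    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);    /* server address (assumed) */

    connect(fd, (struct sockaddr *)&srv, sizeof(srv)); /* 4. initiate TCP handshake */

    const char *req = "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n";
    send(fd, req, strlen(req), 0);                     /* 6. send a request */

    char buf[1024];
    recv(fd, buf, sizeof(buf), 0);                     /* 6. read the response */

    close(fd);                                         /* 7. terminate the connection */
    return 0;
}
```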
___

## :feet: 2. Methodology

### 2.1 Socket Programming Implementation

The communication between the client and server is established using the socket API, specifically through stream sockets (TCP). The server initializes the socket using the `socket()` system call and sets socket options to allow immediate reuse of the port using `setsockopt()` with the `SO_REUSEADDR` flag. It then binds the socket to port 8080 and listens for incoming connections using `bind()` and `listen()`. When a client sends a request, the server accepts it using `accept()` and spawns a new thread via `pthread_create()` to handle the client independently. This multithreaded design allows the server to support multiple simultaneous connections.

The server distinguishes between two types of HTTP requests based on the initial line of the HTTP message:

1. `GET /stream`: initiates a continuous MJPEG stream to the client.
2. All other requests: serve a static HTML file (`View/index.html`) as the user interface.

Data is transmitted using the `send()` system call, and connections are closed gracefully after the transaction using `close()`.

### 2.2 USB Camera Integration

The video input is provided by a standard USB webcam connected to the Raspberry Pi running Ubuntu. The system leverages the Video4Linux2 (V4L2) API to interface with the camera. Initialization begins with opening the device file `/dev/video0` using `open()`. The desired image format is set to MJPEG using the `VIDIOC_S_FMT` ioctl command, with a resolution of 640×480 pixels.

Memory mapping is used to improve performance and reduce data-copying overhead. A buffer is allocated using `VIDIOC_REQBUFS` and mapped into user space with `mmap()`. Each frame is captured by enqueuing and dequeuing buffers using `VIDIOC_QBUF` and `VIDIOC_DQBUF`. To ensure thread safety when accessing the camera buffer, the capture function is guarded with a `pthread_mutex_t`.

This low-level access to the webcam allows precise control over the capture pipeline and ensures compatibility with the MJPEG streaming process described in the next section.
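The sketch below condenses this capture path into a single C program: it sets the MJPEG 640×480 format, requests one memory-mapped buffer, and queues/dequeues it to grab a frame. Error checks and the project's `pthread_mutex_t` guard are omitted for brevity, and the single-buffer count is an illustrative assumption (real pipelines typically request several):

```c
/* Condensed sketch of the V4L2 capture path described above. */
#include <fcntl.h>
#include <linux/videodev2.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);               /* open the capture device */

    struct v4l2_format fmt = {0};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;
    ioctl(fd, VIDIOC_S_FMT, &fmt);                      /* set MJPEG 640x480 */

    struct v4l2_requestbuffers req = {0};
    req.count = 1;                                      /* one mmap'd buffer for illustration */
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);                    /* allocate driver buffers */

    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index = 0;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);                   /* query buffer offset and length */
    void *frame = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, buf.m.offset);   /* map buffer into user space */

    enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_QBUF, &buf);                       /* enqueue the empty buffer */
    ioctl(fd, VIDIOC_STREAMON, &type);                  /* start the capture stream */
    ioctl(fd, VIDIOC_DQBUF, &buf);                      /* dequeue one filled frame */

    printf("captured %u JPEG bytes\n", buf.bytesused);  /* JPEG data now at `frame` */

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    munmap(frame, buf.length);
    close(fd);
    return 0;
}
```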
### 2.3 MJPEG Streaming over HTTP

Live video streaming is implemented using the MJPEG (Motion JPEG) format, in which each video frame is compressed independently as a JPEG image. Upon receiving a `GET /stream` request, the server responds with an HTTP header indicating the MIME type `multipart/x-mixed-replace; boundary=frame`. This MIME type instructs the client (typically a web browser) to expect a continuous stream of JPEG images separated by boundary markers.

Each frame is captured from the USB webcam and immediately sent to the client as a part of the HTTP response. The server constructs a boundary marker with content-type and content-length headers, followed by the raw JPEG data and a line break (the per-part format sketched in Section 1.2). This process is repeated for every captured frame, at approximately 10 frames per second, until the client closes the connection.

___

## :label: 3. References

[1] Video4Linux devices, https://www.kernel.org/doc/html/v4.14/media/kapi/v4l2-core.html
[2] Video4Linux API, https://www.kernel.org/doc/html/v4.14/media/uapi/v4l/v4l2.html
[3] Differences between V4L and V4L2, https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/diff-v4l.html#differences-between-v4l-and-v4l2
[4] Package v4l-utils, https://www.mankier.com/package/v4l-utils
[5] HTTP Live Streaming, https://developer.apple.com/streaming/
[6] Socket programming, https://www.ibm.com/docs/en/i/7.4.0?topic=communications-socket-programming