# The Best Python HTTP clients for 2021

HTTP (Hypertext Transfer Protocol) is the communication protocol for the web. The protocol keeps evolving and has various versions: [HTTP/1.1](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP#http1.1_%E2%80%93_the_standardized_protocol) is the most commonly used, while the newer versions are [HTTP/2](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP#http2_%E2%80%93_a_protocol_for_greater_performance) and [HTTP/3](https://en.wikipedia.org/wiki/HTTP/3).

Exchange of data on the web takes place between a client and a server. When you enter a URL in the web browser, the browser (the client) establishes a connection and initiates an HTTP request, and the server sends an HTTP response back to the client. The response is a web page: a hypertext document containing data in the form of images, videos, and text.

Likewise, an HTTP client is used to programmatically:

* connect to different web applications
* consume an API
* download resources over the network

An HTTP client can send synchronous and asynchronous requests to the web server. A synchronous request means the execution of requests happens one after the other: you make an HTTP request call and wait for the response, and only once you get the response back from the server can you make the next request call. To understand asynchronous requests, imagine you want to send two HTTP requests, request A and request B. You can send request A and request B in parallel; you don't have to wait for the response of the previous request to make the next request call.

In this article, you'll learn about the most popular Python HTTP clients. I'll present a short description of each client, followed by a code snippet making an HTTP request call with it. In the end, we'll compare the HTTP clients and choose the best Python HTTP client.

## urllib

For accessing resources over HTTP, Python 2 had the standard libraries httplib and urllib2. In Python 3, httplib became http.client, and urllib2 was split up into submodules such as urllib.request and urllib.error. http.client is a low-level HTTP client library, and urllib is built on top of it. urllib is a collection of modules for working with URLs; the `urllib.request` module defines functions and classes which help in opening URLs and fetching data.

```python
import urllib.request

response_object = urllib.request.urlopen("http://www.httpbin.org")
response = response_object.read().decode("utf-8")
print(response)
```

The above code snippet makes a GET request to the URL and allows us to read the returned data.
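The response object returned by `urlopen` is an `http.client.HTTPResponse`, so besides the body you can also inspect the status code and the response headers. A minimal sketch:

```python
import urllib.request

# urlopen returns an http.client.HTTPResponse object
response_object = urllib.request.urlopen("http://www.httpbin.org")

print(response_object.status)                     # HTTP status code, e.g. 200
print(response_object.getheader("Content-Type"))  # a single response header
print(response_object.getheaders())               # all headers as (name, value) pairs
```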
Similarly, the following code snippet shows how you can send data to the sample URL http://httpbin.org/post.

```python
import urllib.request
import urllib.parse

url = "http://httpbin.org/post"
args = {"hello": "world"}
data = urllib.parse.urlencode(args)
data = data.encode()
response = urllib.request.urlopen(url, data)
print(response.read().decode("utf-8"))
```

The output of the program would be:

```
{
  "args": {},
  "data": "",
  "files": {},
  "form": {"hello": "world"},
  "headers": {
    "Accept-Encoding": "identity",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.8",
    "X-Amzn-Trace-Id": "Root=1-609eb2ab-68d6aa540dc1791b27caf4f5"
  },
  "json": null,
  "origin": "49.36.37.22",
  "url": "http://httpbin.org/post"
}
```

Through the above code snippets you can see that with `urllib` you can fetch data from a URL (the GET method) and send data to a URL (the POST method). You can read the details about HTTP request methods [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods).

**Limitations of the urllib module**

* It only supports a subset of HTTP request methods.
* It doesn't automatically decode the returned data. When the type of data is known, decoding is easy; otherwise it is a tedious process.
* POST data needs to be encoded before it is posted to the URL.
* There aren't any built-in methods to deal with common HTTP client features like cookies, authentication, sessions, and connection pooling.

An HTTP cookie is a small piece of data that the server sends to the client. The client may store it and send it back to the same server with later HTTP request calls. The server sends cookie data in the `Set-Cookie` response header, while the client sends cookie data back to the server in the `Cookie` request header. urllib offers no support for cookie management, so to create cookies and to extract cookie data from a server response, other standard-library modules such as http.cookies and http.cookiejar need to be used.

```python
import http.cookies

cookie = http.cookies.SimpleCookie()
cookie["sessionId"] = "38afes7a8"
print(cookie.output(header="Cookie:"))
```

The `Cookie` header is sent to the server among the request headers. When you run the above program, it gives the following output.

```
Cookie: sessionId=38afes7a8
```

You see that instead of urllib handling cookies and sending them with the request, you manually had to create the request header value using the http.cookies module.
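A minimal sketch of attaching such a cookie to an actual request: since urllib won't manage it for you, you set the `Cookie` header yourself on a `urllib.request.Request` object. The `sessionId` value is just an illustration, and httpbin's /cookies endpoint simply echoes back the cookies it receives.

```python
import http.cookies
import urllib.request

# Build the Cookie header value by hand
cookie = http.cookies.SimpleCookie()
cookie["sessionId"] = "38afes7a8"  # illustrative value
cookie_header = cookie.output(header="", sep=";").strip()

# Attach the header manually, since urllib does not manage cookies for us
request = urllib.request.Request(
    "http://httpbin.org/cookies",
    headers={"Cookie": cookie_header},
)
response = urllib.request.urlopen(request)
print(response.read().decode("utf-8"))  # httpbin echoes the cookies it received
```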
The `Set-Cookie` response header is sent from the server to the client so that the client can send it back to the server along with later request calls. Below is a way to extract the `Set-Cookie` header from the response sent by the server.

```python
import urllib.request
import http.cookiejar

policy = http.cookiejar.DefaultCookiePolicy(blocked_domains=["ads.net", ".ads.net"])
cookiejar_object = http.cookiejar.CookieJar(policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar_object))
r = opener.open("http://httpbin.org/get")
print(r.read().decode("utf-8"))
```

In the above program we import urllib.request for making the request call and sending the request headers along with the HTTP request. Next, we import http.cookiejar, because urllib.request does not natively handle cookies.

```
policy = http.cookiejar.DefaultCookiePolicy(blocked_domains=["ads.net", ".ads.net"])
```

http.cookiejar is used for automatic handling of HTTP cookies. The DefaultCookiePolicy class is responsible for deciding whether each cookie should be accepted from the server or returned to it. `blocked_domains` is a keyword argument: the sequence of domain names that we never accept cookies from nor return cookies to.

```
cookiejar_object = http.cookiejar.CookieJar(policy)
```

The CookieJar class stores HTTP cookies. It extracts cookies from HTTP responses (i.e. the values of `Set-Cookie` response headers) and sends them back in HTTP requests (i.e. sets the `Cookie` request header).

```
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar_object))
```

Requests that are made using this opener are handled by HTTPCookieProcessor, which manages the CookieJar instance automatically.

```
r = opener.open("http://httpbin.org/get")
print(r.read().decode("utf-8"))
```

The above snippet accesses the URL with all the needed request headers. By calling the read() method of the response object r, we can read the response body.

Performing common HTTP operations like cookie management and basic authentication with urllib is cumbersome and relatively difficult to understand for a newbie; it requires you to work at a lower level and to pull in additional modules for advanced features. urllib is also synchronous, which means that with multiple requests, the total time taken to process them is longer. Due to all the mentioned limitations, urllib is not as popular as the other Python HTTP clients, and all the features it has are also available in them. Hence, we won't be going into further details of urllib.

## urllib3

urllib3 is a user-friendly, powerful HTTP client library. It is used in some popular and widely used open-source projects like requests, pip, and many more. urllib3 offers a higher level of abstraction than urllib, and it is one of the most downloaded Python packages, with 150,116,254 downloads in the last month at the time of writing.

Unlike urllib, urllib3 is not part of the Python standard library. You can install it using the command `pip install urllib3`. The code snippet below shows how you can make GET and POST request calls using urllib3.

```python
import urllib3
import json

manager = urllib3.PoolManager(num_pools=2)

r = manager.request('GET', 'http://httpbin.org/get')
k = manager.request(
    'POST',
    'http://httpbin.org/post',
    fields={"hello": "world"}
)
response = manager.request('GET', 'http://google.com/mail')

print(len(manager.pools))
print(json.loads(r.data.decode("utf-8")))
```

Carefully compare this with the way HTTP request calls are made with urllib. Notice that when you make an HTTP request call using urllib, a new connection is opened for each request, and only once the connection is established can the client send the request to the server. Let us go through the code line by line to understand how the urllib3 HTTP client works.

You first import the urllib3 library, and instead of creating a connection directly, you create an instance of the PoolManager class.

`manager = urllib3.PoolManager(num_pools=2)`

PoolManager allows for arbitrary requests while transparently keeping track of the necessary connection pools. It creates a separate connection pool for every distinct host, and whenever you request the same host again, it re-uses the existing connection. By re-using existing connections, the requests take up fewer resources on the server's end and also get a faster response time on the client's end.

```
r = manager.request('GET', 'http://httpbin.org/get')
k = manager.request(
    'POST',
    'http://httpbin.org/post',
    fields={"hello": "world"}
)
response = manager.request('GET', 'http://google.com/mail')
```

In our example we make 3 HTTP request calls to 2 distinct hosts, http://httpbin.org and http://google.com. The PoolManager would as a result create and use 2 connection pools. Suppose you made one more request call to a different host, http://yahoo.com. Now there are 3 unique hosts, but we initialized `num_pools=2`, so the PoolManager will remove the least recently used connection pool and create a new connection pool for the new host.

The request() method returns an HTTP response object. Its `data` attribute holds the response content, which for httpbin is a JSON string.

```
import json
print(json.loads(r.data.decode("utf-8")))
```

In the above code snippet we import json, which is part of the Python standard library, and `json.loads` converts the string response into a JSON object.

In real-life use cases, when you are working with APIs or doing web scraping at scale, there might be network failures or the server might fail. To keep the application code from breaking in these scenarios, configure a Retry when making HTTP request calls. Retries can be initialized as:

```
from urllib3.util.retry import Retry
retries = Retry(connect=5, read=2, redirect=5)
```

`connect` tells urllib3 how many connection-related errors to retry on; these errors are raised before the request is sent to the server. Likewise, `read` limits retries on errors raised after the request reached the server, and `redirect` limits how many redirects to follow.
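A minimal sketch of how such a Retry object can be plugged in, assuming urllib3's documented `retries` keyword on both the PoolManager constructor and the request() method; the policy values below are purely illustrative.

```python
import urllib3
from urllib3.util.retry import Retry

# Illustrative retry policy: tune the counts for your own use case
retries = Retry(connect=5, read=2, redirect=5)

# Option 1: a default retry policy for every request made via this manager
manager = urllib3.PoolManager(retries=retries)
r = manager.request('GET', 'http://httpbin.org/get')

# Option 2: override the policy for a single request call
r = manager.request('GET', 'http://httpbin.org/get', retries=Retry(connect=3, read=1))

print(r.status)
```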
**Multi-threading/Async**

**Limitations of using urllib3**

* Cookie management

## requests

## grequests

## httpx

## Conclusion