Python urllib Tutorial (2024 Edition)
When third-party libraries are not an option, Python's own urllib is the built-in way to make HTTP requests. It needs no pip install and works right out of the box, which makes it a good fit for quick scripts, embedded devices, teaching demos, or any "standard library only" situation. Production projects typically use requests, but understanding urllib shows you what actually happens underneath an HTTP request.
This tutorial follows 2024 best practices and covers the most commonly used urllib features. It starts from scratch, and each example can be run directly.
1. Get familiar with the four core modules first
urllib is a package made up of four submodules, each with its own job:
- urllib.request: opens URLs, sends requests, and reads responses. It is the entry point to the whole library.
- urllib.error: defines the HTTP and URL exceptions you catch to make your code more robust.
- urllib.parse: handles URL encoding, query-string building, and URL splitting; effectively a URL toolbox.
- urllib.robotparser: parses robots.txt, a tool for well-behaved crawlers (optional, but highly recommended).
Keep these four in mind; everything that follows revolves around them (mainly the first three).
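As a quick taste of the "URL toolbox", here is a minimal sketch that uses urllib.parse on its own. The example.com URL and the parameter names are placeholders chosen for illustration, and nothing here touches the network.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse

# Split a URL into its components (example.com is a placeholder).
parts = urlparse("https://example.com/search?q=python&page=2")
print(parts.scheme)   # https
print(parts.netloc)   # example.com
print(parts.path)     # /search

# Turn the query string into a dict that maps each name to a list of values.
params = parse_qs(parts.query)
print(params)         # {'q': ['python'], 'page': ['2']}

# Build a query string from a dict; spaces become "+".
query = urlencode({"q": "urllib tutorial", "page": 1})
print(query)          # q=urllib+tutorial&page=1

# Percent-encode a single path segment; spaces become "%20".
segment = quote("new york")
print(segment)        # new%20york
```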
2. Let’s start with the simplest GET request
GET is like entering the URL in the browser address bar and pressing Enter. It is the most commonly used HTTP method.
2.1 Three lines of code, with automatic resource cleanup
Open a URL with urlopen() and pair it with a with statement so the network connection is closed automatically and no resources leak. This is the golden rule for working with files and network connections in Python.
Tip: using with is like telling Python, "When I am done with this connection, close the door for me." Don't skip it!
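The pattern described above might look like the following sketch. The helper name fetch_text and the httpbin.org URL in the comment are assumptions made for illustration; the body is assumed to be UTF-8 text.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the response body of `url`, decoded as UTF-8."""
    # The with statement closes the connection even if read() raises.
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (needs network access; the URL is only a placeholder):
# print(fetch_text("https://httpbin.org/get"))
```

The core really is three lines: open, read, decode. Wrapping them in a function just makes the snippet easy to reuse.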
2.2 Put a "browser" cloak on the request
Many websites check the visitor's User-Agent header (the browser identifier). If you leave urllib's default value (Python-urllib/3.x) in place, you may be blocked outright. In that case, construct a Request object and add request headers to it.
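A minimal sketch of that idea follows. The User-Agent string is an illustrative browser-like value, and build_request and fetch_with_user_agent are hypothetical helper names, not part of urllib.

```python
from urllib.request import Request, urlopen

# Illustrative browser-like identifier; substitute any realistic value.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"


def build_request(url):
    """Create a Request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_with_user_agent(url, timeout=10):
    """Send the prepared request and return the body as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Any other header (Accept, Referer, and so on) can be added to the same headers dictionary.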
With a User-Agent set, your request is much less likely to be rejected outright as an automated program.
3. POST request: Submit data to the server
POST is commonly used for logging in, uploading forms, and writing data. The core step is to encode the form data and attach it to the request body.
3.1 Send a traditional form (application/x-www-form-urlencoded)
Most login endpoints use this format, which joins parameters with &, e.g. username=test&password=123. We generate it with urllib.parse.urlencode.
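Here is one way the form submission could be sketched. The field names come from the example above; the helper names and any URL you pass in are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url, fields):
    """Prepare a POST request whose body is URL-encoded form data."""
    # urlencode() returns a str; the request body must be bytes.
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_form(url, fields, timeout=10):
    """Send the form and return the response body as text."""
    with urlopen(build_form_request(url, fields), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, build_form_request(url, {"username": "test", "password": "123"}) produces the body username=test&password=123.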
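Note: urlencode() returns a str, so be sure to encode the result to bytes before sending it; otherwise urlopen raises a TypeError. Set the Content-Type header explicitly as well: urllib falls back to application/x-www-form-urlencoded when a body is present, but stating it keeps the request unambiguous for strict backends.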
4. JSON responses: parse them directly with the built-in library
Almost all of today's APIs return JSON. Python's own json module handles it easily.
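A minimal sketch of fetching and parsing JSON, assuming the endpoint returns a UTF-8 encoded JSON body. The function name fetch_json is a hypothetical helper.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch `url` and return the parsed JSON payload."""
    with urlopen(url, timeout=timeout) as response:
        # json.loads() accepts UTF-8 bytes directly (Python 3.6+),
        # so no separate decode() step is needed.
        return json.loads(response.read())
```

The return value is an ordinary dict or list, so fields can be read with normal indexing.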
You will find that passing the response bytes straight to json.loads() (supported since Python 3.6) is more direct than decoding to a string first and then parsing; it is done in one step.
5. Three frequently used practical techniques
5.1 Use a proxy
On an intranet, or when you need to get past a firewall, you often have to configure a proxy. urllib does this with ProxyHandler.
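The proxy setup might be sketched like this. The address 127.0.0.1:7890 is a placeholder for whatever proxy you actually run, and the helper names are hypothetical.

```python
from urllib.request import ProxyHandler, build_opener

# Placeholder proxy address; replace it with your own proxy.
PROXIES = {
    "http": "http://127.0.0.1:7890",
    "https": "http://127.0.0.1:7890",
}


def make_proxy_opener(proxies):
    """Build an opener that routes matching schemes through the proxy."""
    return build_opener(ProxyHandler(proxies))


def fetch_via_proxy(opener, url, timeout=10):
    """Use the opener directly, leaving the global urlopen() untouched."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


opener = make_proxy_opener(PROXIES)
```

Passing the opener to urllib.request.install_opener() would make every later urlopen() call use the proxy as well.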
If you only want the proxy for specific requests, skip the global install and call opener.open() in place of urlopen().
5.2 Skip SSL certificate verification (testing only!)
The development environment may use a self-signed certificate, causing the SSL handshake to fail. You can temporarily skip verification, but it is absolutely prohibited in production environments.
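One way to skip verification with public ssl APIs is sketched below, strictly for local debugging. The helper names are hypothetical.

```python
import ssl
from urllib.request import urlopen


def make_insecure_context():
    """Return an SSL context that performs NO certificate checks."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_without_verification(url, timeout=10):
    """Fetch `url` while ignoring certificate errors (debugging only)."""
    with urlopen(url, context=make_insecure_context(), timeout=timeout) as response:
        return response.read()
```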
⚠️ Doing so exposes you to man-in-the-middle attacks; use it only for local debugging.
5.3 Timeout setting: don't keep requests waiting
By default, urlopen has no timeout. If the other side never responds, your program can hang indefinitely. Pass the timeout parameter to set the maximum number of seconds to wait.
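A sketch of passing timeout and reacting when it expires. It assumes, as is the case in current CPython, that a connect-phase timeout arrives wrapped in URLError while a read-phase timeout is raised as socket.timeout directly; fetch_or_none is a hypothetical helper name.

```python
import socket
from urllib.error import URLError
from urllib.request import urlopen


def fetch_or_none(url, timeout=5.0):
    """Return the body bytes, or None if the server is too slow."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # Timed out while reading (same class as TimeoutError on 3.10+).
        return None
    except URLError as exc:
        # Timed out while connecting: the timeout is stored in exc.reason.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise
```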
In practice, a timeout of 10 seconds or less is reasonable.
6. Exception handling: make the code more robust
Network requests are full of pitfalls: dropped connections, 404, 500, timeouts... Code without exception handling is a time bomb.
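The failure cases listed above map onto a small set of exception classes. The following sketch shows one possible arrangement; safe_fetch is a hypothetical helper, and printing the error is just a stand-in for whatever logging you prefer.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def safe_fetch(url, timeout=10):
    """Return the body as text, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except HTTPError as exc:
        # The server answered with an error status such as 404 or 500.
        # HTTPError subclasses URLError, so it must be caught first.
        print(f"HTTP error {exc.code}: {exc.reason}")
    except URLError as exc:
        # No usable answer: DNS failure, refused connection, bad scheme...
        print(f"Could not reach the server: {exc.reason}")
    except socket.timeout:
        # The server accepted the connection but responded too slowly.
        print("The request timed out")
    return None
```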
With exception handling done well, your program fails gracefully instead of crashing.
7. Best Practice List for 2024
- Always manage connections with with
- Always set a User-Agent
- Always specify a timeout (3 to 10 seconds is appropriate)
- Parse JSON directly with json.loads()
- Check robots.txt before crawling, and use urllib.robotparser to be a well-behaved crawler
- For complex requirements (such as session persistence or file uploads), switch to requests to save time and trouble
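The robots.txt item in the list can be sketched as follows. To keep the example offline, the rules are supplied inline; in a real crawler you would call set_url() with the site's robots.txt address and then read(). The crawler name MyCrawler/1.0 and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Inline rules standing in for a downloaded robots.txt file.
RULES = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)

# Ask whether our (hypothetical) crawler may fetch each URL.
blocked = parser.can_fetch("MyCrawler/1.0", "https://example.com/private/data")
allowed = parser.can_fetch("MyCrawler/1.0", "https://example.com/public/page")
print(blocked)  # False
print(allowed)  # True
```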
8. urllib vs requests: the difference at a glance
With requests, the same functionality takes far less code.
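For comparison, a JSON fetch with a custom header, a timeout, and status checking could be sketched with requests as below. requests is a third-party package (pip install requests); it is imported inside the function here only so that the snippet still loads on machines where it is not installed. The helper name and header value are assumptions.

```python
def fetch_json_with_requests(url, timeout=10):
    """Fetch `url` and return parsed JSON using the requests library."""
    import requests  # third-party; deferred so the sketch loads without it

    response = requests.get(
        url,
        headers={"User-Agent": "urllib-tutorial/1.0"},  # illustrative value
        timeout=timeout,
    )
    # Raise an exception for 4xx/5xx responses instead of returning them.
    response.raise_for_status()
    return response.json()
```

Header handling, decoding, and JSON parsing each collapse into a single call.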
But once you have studied urllib, reading the requests source code becomes much clearer: the basic ideas behind the two are the same.
9. Hands-on exercise: getting weather information
Below we use urllib to call the free wttr.in weather service, which returns a simplified one-line weather forecast. This exercise ties together GET, header settings, and exception handling.
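One possible version of the exercise is sketched here. It assumes wttr.in's format=3 query option for the one-line output; the User-Agent value and the function names are illustrative choices, not requirements of the service.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_weather_url(city):
    """Build the wttr.in URL for a one-line forecast of `city`."""
    # quote() makes names with spaces safe to place in the URL path.
    return f"https://wttr.in/{quote(city, safe='')}?format=3"


def get_weather(city, timeout=10):
    """Return the one-line forecast, or a short error message."""
    request = Request(
        build_weather_url(city),
        headers={"User-Agent": "urllib-tutorial/1.0"},  # illustrative value
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as exc:
        return f"Server returned HTTP {exc.code}"
    except URLError as exc:
        return f"Could not reach wttr.in: {exc.reason}"
    except socket.timeout:
        return "The request timed out"


def main():
    """Prompt for a city and print its forecast."""
    city = input("City name (e.g. beijing or london): ").strip()
    print(get_weather(city))
```

Add a final main() call at the bottom of the script to make it prompt for a city when run.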
After running it, enter beijing or london to see a one-line weather summary; it is practical and reinforces what you have learned.
Summary
urllib is the "original HTTP toolbox" of the Python world. It may not be fancy, but it is reliable, dependency-free, and available everywhere. Once you master it, you can handle request tasks in any standard-library-only environment and gain a deeper understanding of the HTTP protocol itself.
The advice for 2024 is clear: **use urllib for small scripts and requests for serious projects**. Whichever tool you use, remember to add exception handling, timeouts, and a User-Agent, and your network code is more than halfway there.

