Python urllib Tutorial (2024 Edition)

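When third-party libraries are off the table, Python's bundled urllib is the standard library's main tool for making HTTP requests. It needs no pip install and works out of the box, which makes it a good fit for quick scripts, embedded devices, teaching demos, or anywhere you are limited to the standard library. Production projects usually reach for requests, but understanding urllib gives you a much clearer picture of what an HTTP request involves underneath.

This tutorial follows 2024 best practices and covers the urllib features you will use most. It starts from scratch, keeps things plain, and every example can be run as-is.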

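1. Four core modules: a quick introduction

urllib is a package made up of four submodules, each with its own job:

urllib.request – opens URLs, sends requests, and reads responses; the front door of the whole library.
urllib.error – defines the exceptions raised for HTTP and URL failures, so your code can handle them cleanly.
urllib.parse – encodes, joins, and splits URLs and query parameters; think of it as a URL toolbox.
urllib.robotparser – parses robots.txt so crawlers can follow a site's rules (optional, but strongly recommended).

Keep these four in mind: everything that follows revolves around them, mostly the first three.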
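To make the "URL toolbox" idea concrete, here is a small sketch of urllib.parse that runs without any network access. The example.com URL and the parameter names are made up purely for illustration.

```python
from urllib.parse import urlparse, urlencode, parse_qs, quote

# Hypothetical URL, used only to show how the pieces are split apart
parts = urlparse("https://example.com/search?q=python&page=2")
print(parts.scheme)            # https
print(parts.netloc)            # example.com
print(parts.path)              # /search
print(parse_qs(parts.query))   # {'q': ['python'], 'page': ['2']}

# Build a query string from a dict; spaces and special characters are escaped
query = urlencode({"q": "urllib tutorial", "lang": "zh"})
print(query)                   # q=urllib+tutorial&lang=zh

# Percent-encode a single path segment
print(quote("hello world"))    # hello%20world
```

Nothing here touches the network, so it is a safe way to experiment before sending real requests.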

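2. Starting with the simplest GET request

GET is what happens when you type an address into the browser bar and press Enter. It is the most common HTTP method.

2.1 Done in a few lines, with automatic cleanup

Open a URL with urlopen() and wrap it in a with statement so the network connection is closed automatically and no resources leak. The same rule applies whenever Python works with files or network connections.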

from urllib.request import urlopen

# A public test API that returns a single to-do item
TEST_URL = "https://jsonplaceholder.typicode.com/todos/1"

with urlopen(TEST_URL) as response:
    # Status code and reason phrase (for example, 200 OK)
    print(f"Status: {response.status} | Reason: {response.reason}")

    # Read the Content-Type response header
    content_type = response.getheader('Content-Type')
    print(f"Content-Type: {content_type}")

    # Read the body as raw bytes
    raw_data = response.read()
    # Decode to a string (UTF-8 is right for most APIs; the declared charset is in Content-Type)
    text = raw_data.decode("utf-8")
    print(f"\nResponse body:\n{text}")

Tip: with tells Python to close the connection once you are done with it. Don't leave it out.
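Decoding as UTF-8 works for most APIs, but some pages declare a different charset in their Content-Type header. The sketch below shows one way to honor that declaration. The decode_body helper is a hypothetical name, not part of urllib, and the header object is built by hand so the example runs offline.

```python
from email.message import Message

def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset declared in Content-Type."""
    # get_content_charset() returns None when no charset is declared
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")

# Offline demonstration with a hand-built header object
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=gbk"
print(decode_body("你好".encode("gbk"), demo_headers))
```

With a real response the call would look like decode_body(response.read(), response.headers), since response.headers is a subclass of email.message.Message.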

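2.2 Dressing the request up as a browser

Many sites inspect the User-Agent header, which identifies the client. urllib's default value is Python-urllib/3.x, and some sites reject it outright. To send a different one, build a Request object first and attach headers to it.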

from urllib.request import Request, urlopen

URL = "https://jsonplaceholder.typicode.com/todos/1"

# Create the Request object first
req = Request(URL)

# Add common request headers
req.add_header("User-Agent",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
               "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36")
req.add_header("Accept", "application/json")  # Tell the server we prefer JSON

# Then open it exactly as before
with urlopen(req) as resp:
    print(resp.read().decode("utf-8"))

With a browser-like User-Agent set, the request is much less likely to be rejected as an automated client.
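Before sending anything, it can help to inspect what a Request object will actually carry. This sketch passes headers through the constructor instead of add_header() and then reads them back. The my-script/1.0 User-Agent string is an invented placeholder.

```python
from urllib.request import Request

req = Request(
    "https://jsonplaceholder.typicode.com/todos/1",
    headers={"User-Agent": "my-script/1.0", "Accept": "application/json"},
)

# urllib stores header names via str.capitalize(), so "User-Agent" becomes "User-agent"
print(req.get_header("User-agent"))  # my-script/1.0
print(req.has_header("Accept"))      # True
print(req.header_items())
print(req.get_method())              # GET (no data was attached)
```

The capitalization detail matters only when you read headers back from the Request; HTTP header names are case-insensitive on the wire.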

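3. POST requests: submitting data to the server

POST is typically used for logging in, submitting forms, and writing data. The core steps are to encode the form data and attach it as the request body.

3.1 Sending a classic form (`application/x-www-form-urlencoded`)

Many login endpoints use this format, which joins parameters with &, for example username=test&password=123. urllib.parse.urlencode generates it for us.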

from urllib.request import Request, urlopen
from urllib.parse import urlencode

# Simulated login data
login_data = {
    "username": "test_user",
    "password": "test_pass_123",
    "remember": "true"
}

# Step 1: turn the dict into a URL-encoded string
encoded_str = urlencode(login_data)
# Step 2: convert the string to UTF-8 bytes (urllib rejects str as POST data)
encoded_bytes = encoded_str.encode("utf-8")

# Step 3: create the Request with the method and the data
req = Request(
    url="https://jsonplaceholder.typicode.com/posts",  # POST endpoint for testing
    data=encoded_bytes,
    method="POST"   # Already the default when data is given; spelled out for clarity
)

# Step 4: tell the server the body is form-encoded
req.add_header("Content-Type", "application/x-www-form-urlencoded")
req.add_header("User-Agent", "Chrome/130.0.0.0")

with urlopen(req) as resp:
    print(resp.read().decode("utf-8"))

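Note: the encoded data must be converted to bytes; passing a str raises a TypeError. When data is present and no Content-Type is set, urllib falls back to application/x-www-form-urlencoded, but setting the header explicitly makes the intent clear and avoids surprises with strict backends.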
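Form encoding is not the only body format. Many APIs expect a JSON body instead, and the same Request machinery handles it. The sketch below only builds the request so it can run offline. The build_json_post helper and the payload fields are illustrative assumptions, not something the test API requires.

```python
import json
from urllib.request import Request

def build_json_post(url: str, payload: dict) -> Request:
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")  # the body must be bytes
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_json_post(
    "https://jsonplaceholder.typicode.com/posts",
    {"title": "hello", "userId": 1},  # illustrative payload
)
print(req.get_method())
print(req.data)
```

To send it, pass req to urlopen() inside a with block, exactly as in the form example.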

4. JSON responses: parse them with the built-in library

Most modern APIs return JSON, and Python's built-in json module handles it easily.

import json
from urllib.request import Request, urlopen

def get_todo(todo_id: int) -> dict:
    """Fetch a to-do item by ID and return it as a dict."""
    url = f"https://jsonplaceholder.typicode.com/todos/{todo_id}"
    req = Request(url, headers={"User-Agent": "Chrome/130.0.0.0"})

    with urlopen(req) as resp:
        # urlopen already raises HTTPError for 4xx/5xx; this guards against other non-200 codes
        if resp.status != 200:
            raise Exception(f"Request failed with status {resp.status}")
        # Read the bytes and parse them as JSON directly
        return json.loads(resp.read())

# Try it out
try:
    todo = get_todo(2)
    print(f"Title: {todo['title']}")
    print(f"Completed: {todo['completed']}")
except Exception as e:
    print(f"Something went wrong: {e}")

Because json.loads() accepts bytes as well as str, there is no need to decode to a string first; one call does both.
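A quick offline check makes the point about bytes. The sample body below is made up to resemble a to-do item.

```python
import json

raw = b'{"title": "demo task", "completed": false}'  # made-up sample body

from_bytes = json.loads(raw)                  # parse the bytes directly
from_text = json.loads(raw.decode("utf-8"))   # decode first, then parse

print(from_bytes == from_text)   # True
print(from_bytes["completed"])   # False
```

Both routes give the same dict, so the shorter one is usually the better choice.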

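5. Three techniques you will use often

5.1 Using a proxy

On a corporate network, or whenever traffic has to go out through a proxy, you need to configure one. urllib does this with ProxyHandler.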

from urllib.request import ProxyHandler, build_opener, install_opener, urlopen

# Assume your proxy is listening on 127.0.0.1:7890
PROXY = {
    "http": "http://127.0.0.1:7890",
    "https": "http://127.0.0.1:7890"
}

# Create the proxy handler
proxy_handler = ProxyHandler(PROXY)
# Build an opener
opener = build_opener(proxy_handler)
# Install it globally (every later urlopen call goes through the proxy)
install_opener(opener)

# Test: show the current outbound IP
with urlopen("https://ifconfig.me/ip") as resp:
    print(f"IP via proxy: {resp.read().decode('utf-8').strip()}")

To use the proxy for specific requests only, skip the global install and call opener.open() in place of urlopen().
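The per-request approach might look like the sketch below. The make_proxy_opener helper is a hypothetical name, and the proxy address is the same assumed local port as in the example above. Nothing is sent until open() is called, so the code runs without a proxy present.

```python
from urllib.request import ProxyHandler, build_opener

def make_proxy_opener(proxy_url: str):
    """Return an opener that routes http and https traffic through proxy_url."""
    handler = ProxyHandler({"http": proxy_url, "https": proxy_url})
    return build_opener(handler)

proxied = make_proxy_opener("http://127.0.0.1:7890")

# An empty dict tells urllib to ignore proxy settings from the environment
direct = build_opener(ProxyHandler({}))

# Usage (needs a proxy actually listening on that port):
# with proxied.open("https://ifconfig.me/ip", timeout=10) as resp:
#     print(resp.read().decode("utf-8").strip())
```

Keeping two openers side by side lets one script mix proxied and direct requests without touching global state.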

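5.2 Skipping SSL certificate verification (testing only!)

Development environments often use self-signed certificates, which make the SSL handshake fail. You can skip verification temporarily, but never do this in production.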

import ssl
from urllib.request import urlopen

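# Create a context that does not verify certificates, using only the public ssl API
ctx = ssl.create_default_context()
ctx.check_hostname = False       # must be turned off before verify_mode is relaxed
ctx.verify_mode = ssl.CERT_NONE

# Test against a well-known self-signed site
with urlopen("https://self-signed.badssl.com/", context=ctx) as resp:
    print(f"Request succeeded, status: {resp.status}")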

⚠️ Doing this exposes you to man-in-the-middle attacks. Use it only for local debugging.
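A safer alternative for development, when you have the self-signed certificate or its CA as a file, is to trust that file instead of disabling verification. The sketch below is one possible shape for this. The make_context helper and its insecure flag are hypothetical names.

```python
import ssl
from typing import Optional

def make_context(cafile: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build an SSL context to pass as urlopen(..., context=...)."""
    # With cafile set, certificates signed by that file are trusted
    ctx = ssl.create_default_context(cafile=cafile)
    if insecure:
        # Local debugging only: disables hostname and certificate checks
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx

strict = make_context()
print(strict.verify_mode == ssl.CERT_REQUIRED)  # True
print(strict.check_hostname)                    # True
```

Calling make_context(cafile="dev-ca.pem") with your own certificate file (the filename here is a placeholder) keeps verification on while still accepting the development server.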

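5.3 Timeouts: don't let a request wait forever

By default urlopen has no timeout, so if the other side never responds your program can hang indefinitely. Pass the timeout parameter to set the maximum wait in seconds.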

from urllib.request import urlopen

try:
    with urlopen("https://httpbin.org/delay/10", timeout=5) as resp:
        print(resp.read().decode())
except Exception as e:
    print(f"Timed out or failed: {e}")

In practice, a timeout of 10 seconds or less is a reasonable choice.
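The "no timeout by default" behavior comes from the socket module's global default, which urlopen falls back to when timeout is not passed. This short sketch inspects and temporarily changes that default; it opens no connections.

```python
import socket

# None means sockets block indefinitely unless a timeout is given explicitly
print(socket.getdefaulttimeout())

# A process-wide fallback can be set, though passing timeout= per call is easier to reason about
socket.setdefaulttimeout(10)
print(socket.getdefaulttimeout())  # 10.0

# Restore the original behavior so other code is not affected
socket.setdefaulttimeout(None)
```

The global setting affects every new socket in the process, which is why an explicit timeout on each urlopen call is usually the cleaner habit.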

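6. Exception handling: making your code robust

Network requests fail in many ways: dropped connections, 404s, 500s, timeouts. Code without exception handling is a time bomb.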

from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def safe_get(url: str, timeout: int = 10) -> str | None:
    """GET request with full exception handling."""
    req = Request(url, headers={"User-Agent": "Chrome/130.0.0.0"})

    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as e:
        # The server returned an error status code (4xx, 5xx)
        print(f"HTTP error: {e.code} - {e.reason}")
        # Error responses sometimes carry a body too; e.read() returns it
        return None
    except URLError as e:
        # Network-level failures: DNS lookup failed, connection refused, etc.
        # (socket.timeout is an alias of TimeoutError since Python 3.10)
        if isinstance(e.reason, TimeoutError):
            print(f"Connection timed out (>{timeout}s)")
        else:
            print(f"URL error: {e.reason}")
        return None
    except TimeoutError:
        # A timeout while waiting for or reading the response is raised directly
        print(f"Request timed out (>{timeout}s)")
        return None
    except Exception as e:
        # Catch-all for anything unexpected
        print(f"Unknown error: {e}")
        return None

# A few typical failure scenarios
safe_get("https://jsonplaceholder.typicode.com/todos/999999")  # likely 404
safe_get("https://nonexistent-domain.invalid")                 # URLError
safe_get("https://httpbin.org/delay/15", timeout=5)            # timeout

With exception handling in place, your program fails gracefully instead of crashing.
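Error responses often carry a useful body, such as a JSON message explaining what went wrong, and HTTPError exposes it through read(). The sketch below builds an HTTPError by hand so it runs offline; in real code the exception comes from urlopen. The describe_http_error helper and the sample body are invented for illustration.

```python
import io
from email.message import Message
from urllib.error import HTTPError

def describe_http_error(err: HTTPError, limit: int = 200) -> str:
    """Summarize an HTTPError, including the start of its response body."""
    body = err.read().decode("utf-8", errors="replace")
    return f"{err.code} {err.reason}: {body[:limit]}"

# Hand-built error so the sketch needs no network; urlopen raises the real thing
fake = HTTPError(
    url="https://example.com/missing",
    code=404,
    msg="Not Found",
    hdrs=Message(),
    fp=io.BytesIO(b'{"error": "no such item"}'),
)
print(describe_http_error(fake))
```

Inside an except HTTPError as e: block, describe_http_error(e) would give a one-line summary suitable for a log message.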

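7. Best-practice checklist for 2024

Always manage connections with with
Always set a User-Agent
Always specify a timeout (3 to 10 seconds works well)
Parse JSON directly with json.loads()
Check robots.txt before crawling, and use urllib.robotparser to stay within the rules
For more complex needs (session persistence, file uploads), switching to requests saves effort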
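The robots.txt item deserves a short illustration, since urllib.robotparser has not appeared in code yet. The rules below are made up and parsed from memory so the sketch runs offline, and my-crawler is a placeholder user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules, supplied as lines so no network access is needed
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("my-crawler", "https://example.com/index.html"))      # True
print(parser.can_fetch("my-crawler", "https://example.com/private/a.html"))  # False
```

Against a live site you would instead call parser.set_url() with the site's robots.txt address followed by parser.read(), then check can_fetch() before each request.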

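8. `urllib` vs `requests` at a glance

Feature	urllib (built-in)	requests (third-party)
Installation	None needed	`pip install requests`
Code volume	Relatively verbose	Concise
Features	Covers the basics	Very rich (keep-alive connection pooling, cookie persistence, and more)
Recommended for	Teaching, restricted environments, single-file scripts	Most real-world projects

The same tasks take far less code with requests: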

import requests

# GET + JSON parsing
todo = requests.get("https://jsonplaceholder.typicode.com/todos/1", timeout=10).json()
print(todo["title"])

# POST
resp = requests.post("https://jsonplaceholder.typicode.com/posts",
                     data={"username": "test", "password": "123"},
                     headers={"User-Agent": "Chrome/130.0.0.0"},
                     timeout=10)
print(resp.status_code)

Once you have learned urllib, the requests source code becomes much easier to follow, because the two share the same underlying ideas.

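9. Hands-on exercise: fetching the weather

Here we use urllib to call the free wttr.in weather service, which returns a compact one-line forecast. The exercise ties together GET, request headers, and exception handling.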

from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError
from urllib.parse import quote

def get_weather(city: str) -> str | None:
    """Return a one-line weather summary for a city (format: city: condition temperature)."""
    # quote() escapes spaces and non-ASCII characters; ?format=3 asks for a single line of text
    url = f"https://wttr.in/{quote(city)}?format=3"
    req = Request(url, headers={"User-Agent": "curl/7.68.0"})

    try:
        with urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8").strip()
    except HTTPError as e:
        print(f"The city name may not exist ({e.code})")
        return None
    except URLError as e:
        print(f"Network error: {e.reason}")
        return None

if __name__ == "__main__":
    city_name = input("Enter a city name (pinyin or English): ")
    weather = get_weather(city_name)
    if weather:
        print(f"\n{weather}")

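Run it and enter beijing or london to get a one-line weather summary. It is useful in its own right and reinforces what you have learned.

Summary

urllib is Python's factory-installed HTTP toolbox. It may not be flashy, but it is reliable, dependency-free, and available everywhere. Once you know it, you can handle request tasks in standard-library-only environments and understand HTTP itself more deeply.

The advice for 2024 is straightforward: use urllib for quick scripts and requests for serious projects. Whichever you choose, add exception handling, a timeout, and a User-Agent, and your networking code is most of the way there.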
