Python urllib Tutorial (2024 Edition)

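urllib is the HTTP client package in Python's standard library (built on top of the lower-level http.client module), so it works out of the box with no pip install. That makes it especially handy for quick scripts, embedded or locked-down environments, and teaching. The third-party requests library is the mainstream choice for production code, but knowing urllib helps you understand what a network request does under the hood, and it covers the cases where you can only use the standard library.

This tutorial covers the features most commonly needed in real-world development as of 2024, with code that follows current best practices.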

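Core modules at a glance

These four submodules are the core of the package; everything below builds on them:

urllib.request: the main entry point for sending requests and reading responses
urllib.error: the exceptions raised for HTTP and URL errors
urllib.parse: URL encoding, joining, splitting, and other URL chores
urllib.robotparser: parses robots.txt so you can crawl politely (optional but recommended)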
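urllib.parse only gets a one-line mention here, so below is a quick tour of the helpers used most often. This is a minimal sketch; the example.com URLs are placeholders.

```python
from urllib.parse import parse_qs, quote, urlencode, urljoin, urlparse

# urlparse: split a URL into scheme, netloc, path, query, fragment, ...
parts = urlparse("https://example.com/search/results?q=python+urllib&page=2#top")
print(parts.scheme, parts.netloc, parts.path, parts.fragment)

# parse_qs: decode a query string into a dict of lists ('+' becomes a space)
query = parse_qs(parts.query)
print(query)

# urljoin: resolve a relative link against a base URL, as a browser would
next_page = urljoin("https://example.com/docs/guide/", "../api/v1")
print(next_page)

# urlencode: build a query string from a dict; doseq=True expands list values
qs = urlencode({"q": "python urllib", "tag": ["net", "http"]}, doseq=True)
print(qs)

# quote: percent-encode a path segment (space -> %20, non-ASCII as UTF-8 bytes)
segment = quote("café menu")
print(segment)
```

Note the two space encodings: urlencode uses quote_plus (space becomes +), which suits query strings, while quote produces %20, which is what path segments need.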

Basics: sending GET requests

GET is the most common request method, used for example to fetch API data or scrape simple web pages.
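GET requests usually carry their parameters in the query string. A minimal sketch of building such a URL with urlencode; build_get_url is a hypothetical helper, and the userId/completed filters assume that the JSONPlaceholder test API supports filtering by query parameters.

```python
from urllib.parse import urlencode, urlparse

def build_get_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to base_url (hypothetical helper).

    Assumes base_url has no #fragment.
    """
    # Use '&' if the base URL already has a query string, otherwise start one with '?'
    separator = "&" if urlparse(base_url).query else "?"
    return f"{base_url}{separator}{urlencode(params)}"

url = build_get_url(
    "https://jsonplaceholder.typicode.com/todos",
    {"userId": 1, "completed": "false"},
)
print(url)
# Pass the result to urlopen as usual, e.g. urlopen(url, timeout=10)
```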

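Minimal GET request with safe resource cleanup

When Python code works with network or file resources, always use a with statement (context manager): it closes the connection automatically and prevents resource leaks.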

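```python
from urllib.request import urlopen

# Replace with the endpoint you want to test
TEST_URL = "https://jsonplaceholder.typicode.com/todos/1"

# Always pass a timeout; without one a stalled server can block forever
with urlopen(TEST_URL, timeout=10) as resp:
    # Print the key metadata
    print(f"Status: {resp.status} | Reason: {resp.reason}")
    print("Selected response header:")
    print(f"Content-Type: {resp.getheader('Content-Type')}")

    # Read the raw body as bytes, then decode it with the charset declared
    # in Content-Type, falling back to UTF-8 when none is given
    raw_data = resp.read()
    charset = resp.headers.get_content_charset() or "utf-8"
    text_data = raw_data.decode(charset)
    print(f"\nBody: {text_data}")
```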
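resp.read() with no argument loads the whole body into memory, which is fine for API responses but wasteful for large files. A minimal sketch that streams the body to disk in fixed-size chunks instead; download_file, the my-script/1.0 User-Agent, and the 64 KiB chunk size are assumptions, not urllib defaults. Because urlopen also handles file:// URLs, the function can be tried offline.

```python
from urllib.request import Request, urlopen

def download_file(url: str, dest_path: str, chunk_size: int = 64 * 1024,
                  timeout: float = 30) -> int:
    """Stream url to dest_path chunk by chunk; return the number of bytes written."""
    req = Request(url, headers={"User-Agent": "my-script/1.0"})  # hypothetical UA
    total = 0
    with urlopen(req, timeout=timeout) as resp, open(dest_path, "wb") as out:
        # read(n) returns at most n bytes, and b"" once the body is exhausted
        while chunk := resp.read(chunk_size):
            out.write(chunk)
            total += len(chunk)
    return total
```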

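Custom request headers (a must-learn: avoid being blocked by anti-scraping filters)

By default urllib sends a User-Agent of the form Python-urllib/3.x, and many websites and APIs block that identifier. A urllib.request.Request object makes it easy to set your own headers.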

```python
from urllib.request import Request, urlopen

CUSTOM_URL = "https://jsonplaceholder.typicode.com/todos/1"

# Create the Request object from the URL first
req = Request(CUSTOM_URL)

# Add common headers
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36")
req.add_header("Accept", "application/json")  # tell the server we want JSON back

with urlopen(req, timeout=10) as resp:
    print(resp.read().decode("utf-8"))
```

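Next step: sending POST requests

POST is used to submit data (for example logging in or submitting a form). The core steps are to encode a dict with urllib.parse.urlencode, convert the result to bytes, and pass it to Request as data.

Submitting an `application/x-www-form-urlencoded` form

This is the most traditional POST body format.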

```python
from urllib.request import Request, urlopen
from urllib.parse import urlencode

# Simulated login data
LOGIN_DATA = {
    "username": "test_user",
    "password": "test_pass_123",
    "remember": "true"
}

# 1. Encode the dict as a URL-encoded string
encoded_str = urlencode(LOGIN_DATA)
# 2. Convert the string to UTF-8 bytes (urllib expects bytes, not str, as POST data)
encoded_bytes = encoded_str.encode("utf-8")

# 3. Build the Request; supplying data already implies POST,
#    but stating the method explicitly makes the intent clear
req = Request(
    url="https://jsonplaceholder.typicode.com/posts",  # test POST endpoint
    data=encoded_bytes,
    method="POST"
)

# 4. urllib adds this Content-Type automatically when data is supplied and no
#    Content-Type is set; setting it explicitly does no harm
req.add_header("Content-Type", "application/x-www-form-urlencoded")
req.add_header("User-Agent", "Chrome/130.0.0.0")

with urlopen(req, timeout=10) as resp:
    print(resp.read().decode("utf-8"))
```
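Many APIs expect a JSON body rather than a form. A minimal sketch of the same POST with a JSON payload: serialize with json.dumps, encode to bytes, and set Content-Type yourself, because urllib only fills in the form-encoded type automatically. build_json_post and the User-Agent string are hypothetical.

```python
import json
from urllib.request import Request

def build_json_post(url: str, payload: dict) -> Request:
    """Build a POST Request whose body is payload serialized as UTF-8 JSON."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "my-script/1.0",  # hypothetical UA string
        },
        method="POST",
    )

req = build_json_post(
    "https://jsonplaceholder.typicode.com/posts",
    {"title": "hello", "body": "urllib JSON POST", "userId": 1},
)
print(req.get_method(), req.get_header("Content-type"))

# Send it the usual way:
# with urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```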

Practical scenario: handling JSON responses

Almost every modern API returns JSON, which you can parse with Python's built-in json module.

```python
import json
from urllib.request import Request, urlopen

def get_todo(todo_id: int) -> dict:
    """Reusable wrapper with a simple status check."""
    url = f"https://jsonplaceholder.typicode.com/todos/{todo_id}"
    req = Request(url, headers={"User-Agent": "Chrome/130.0.0.0"})

    with urlopen(req, timeout=10) as resp:
        # urlopen already raises HTTPError for 4xx/5xx responses, so this check
        # only catches other 2xx codes such as 204 No Content
        if resp.status != 200:
            raise RuntimeError(f"Request failed, status code: {resp.status}")
        raw_bytes = resp.read()
        # json.load(resp) also works: it reads straight from the response stream
        return json.loads(raw_bytes)

# Try it out
try:
    todo = get_todo(2)
    print(f"Title: {todo['title']} | Completed: {todo['completed']}")
except Exception as e:
    print(f"Something went wrong: {e}")
```
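As the comment in get_todo notes, json.load() can read straight from the response object: it only needs a .read() method, and json accepts UTF-8 (or UTF-16/32) bytes. A minimal sketch; fetch_json and the User-Agent string are hypothetical.

```python
import json
from urllib.request import Request, urlopen

def fetch_json(url: str, timeout: float = 10):
    """GET a URL and parse the body as JSON directly from the response stream."""
    req = Request(url, headers={"User-Agent": "my-script/1.0", "Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        # json.load calls resp.read(); the bytes are decoded as UTF-8/16/32
        return json.load(resp)
```

To try it without a network, point it at a local file through a file:// URL, for example fetch_json(pathlib.Path("todo.json").resolve().as_uri()), since urlopen handles that scheme too.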

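Frequently used advanced features

1. Setting a proxy

Useful when your traffic has to go through a proxy, for example to get around network restrictions or to reach external APIs from inside a corporate network.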

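```python
from urllib.request import ProxyHandler, build_opener, install_opener, urlopen

# Proxy settings (http and https are configured separately; keep only the one you need)
PROXY_CONFIG = {
    "http": "http://127.0.0.1:7890",  # replace with your proxy address
    "https": "http://127.0.0.1:7890"
}

# Step 1: create the proxy handler
proxy_handler = ProxyHandler(PROXY_CONFIG)
# Step 2: build an opener from the handler
opener = build_opener(proxy_handler)
# Step 3 (optional): install the opener globally so every later urlopen call
# in this process goes through the proxy
install_opener(opener)

# Test
with urlopen("https://ifconfig.me/ip", timeout=10) as resp:
    print(f"IP seen through the proxy: {resp.read().decode('utf-8').strip()}")
```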

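2. Skipping SSL certificate verification (development/testing only!)

Some test environments use self-signed certificates, so a normal request fails with a certificate verification error. In that case you can temporarily disable verification.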

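```python
import ssl
from urllib.request import urlopen

# Build an SSL context that does not verify certificates, using public APIs
# (check_hostname must be turned off before verify_mode can be CERT_NONE)
unverified_context = ssl.create_default_context()
unverified_context.check_hostname = False
unverified_context.verify_mode = ssl.CERT_NONE

# Test (replace with an endpoint that uses a self-signed certificate)
TEST_SELF_SIGNED_URL = "https://self-signed.badssl.com/"
with urlopen(TEST_SELF_SIGNED_URL, context=unverified_context, timeout=10) as resp:
    print(f"Reached the self-signed site, status code: {resp.status}")
```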

⚠️ Never use this in production! It exposes your requests to man-in-the-middle attacks.

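Must-know: error handling

Network requests can fail at any moment (no connection, 404, 500, timeouts), and code without exception handling is fragile.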

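```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def safe_get(url: str, timeout: int = 10) -> str | None:
    """GET request with full exception handling (str | None needs Python 3.10+)."""
    req = Request(url, headers={"User-Agent": "Chrome/130.0.0.0"})

    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as e:
        # HTTP errors, e.g. 404 (Not Found) or 500 (Internal Server Error).
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error: status {e.code} | reason {e.reason}")
        # The error response body is also available:
        # print(e.read().decode("utf-8"))
        return None
    except URLError as e:
        # URL errors, e.g. no network or unknown host;
        # a timeout while connecting is also wrapped in URLError
        if isinstance(e.reason, TimeoutError):
            print(f"Timeout: could not connect within {timeout} seconds")
        else:
            print(f"URL error: {e.reason}")
        return None
    except TimeoutError:
        # A timeout while waiting for or reading the response is raised
        # directly as TimeoutError (socket.timeout), not wrapped in URLError
        print(f"Timeout: no response within {timeout} seconds")
        return None
    except Exception as e:
        # Anything else
        print(f"Unexpected error: {e}")
        return None

# Try a few failure scenarios
safe_get("https://jsonplaceholder.typicode.com/todos/9999999")  # 404
safe_get("https://nonexistent-domain.invalid")  # URLError (.invalid never resolves)
safe_get("https://httpbin.org/delay/15", timeout=5)  # timeout
```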
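Some failures are transient (a dropped connection, a 503 during a deploy), and retrying with a growing delay often helps. A minimal sketch, not part of urllib itself: with_retries and RETRYABLE_STATUS are hypothetical names, and which status codes are worth retrying is an assumption to adjust for your API.

```python
import time
from typing import Callable, TypeVar
from urllib.error import HTTPError, URLError

T = TypeVar("T")

# Assumed set of status codes worth retrying
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def with_retries(func: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call func(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except HTTPError as e:
            # Errors like 404 will not fix themselves: re-raise immediately
            if e.code not in RETRYABLE_STATUS or attempt == attempts:
                raise
        except (URLError, TimeoutError):
            # Network-level failures and timeouts: retry until attempts run out
            if attempt == attempts:
                raise
        # Wait base_delay, then 2x, 4x, ... before the next attempt
        time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")

# Usage sketch (makes a network call, so it is left commented out):
# def fetch() -> bytes:
#     with urlopen("https://httpbin.org/status/503", timeout=10) as resp:
#         return resp.read()
# data = with_retries(fetch, attempts=4, base_delay=0.5)
```

The wrapped function should let exceptions propagate; a function like safe_get that catches everything and returns None gives the retry loop nothing to react to.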

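Best practices for 2024

Always use a with statement (context manager)
Always set a User-Agent (the default Python-urllib/3.x is often blocked)
Set a sensible timeout (unless you call socket.setdefaulttimeout(), urlopen has no timeout by default and can hang indefinitely)
Use the built-in json.loads()/json.dumps() for JSON
Respect robots.txt: in simple cases, urllib.robotparser.RobotFileParser can check whether a URL may be fetched
For complex needs, switch to requests: don't agonize over it, the standard library is the fallback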
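For the robots.txt check, RobotFileParser can also parse rules you already have. A minimal sketch using an inline robots.txt so it runs offline; in real code you would call rp.set_url("https://example.com/robots.txt") and rp.read() instead of parse(), keeping in mind that read() fetches with urllib's default User-Agent and no timeout. The rules and the my-script bot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Inline robots.txt for an offline demo (hypothetical rules)
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for path in ("/public/page", "/private/page"):
    url = f"https://example.com{path}"
    print(url, "allowed" if rp.can_fetch("my-script", url) else "disallowed")

# Seconds to wait between requests, or None if robots.txt does not say
print("crawl delay:", rp.crawl_delay("my-script"))
```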

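Alternatives compared

| Library | Installation | API complexity | Feature coverage | Typical use |
| --- | --- | --- | --- | --- |
| `urllib` | Built in | Moderate | Covers the basics | Teaching, quick scripts, restricted environments |
| `requests` | Requires `pip install` | Very low | Very extensive | The vast majority of production code |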

With requests, the same GET/POST requests are much shorter, which is why it is the recommended everyday choice:

```python
import requests

# GET (decoding and JSON parsing are handled for you)
todo = requests.get("https://jsonplaceholder.typicode.com/todos/1", timeout=10).json()
print(todo["title"])

# POST (data= sends a form-encoded body)
login_resp = requests.post(
    "https://jsonplaceholder.typicode.com/posts",
    data={"username": "test", "password": "123"},
    headers={"User-Agent": "Chrome/130.0.0.0"},
    timeout=10
)
print(login_resp.status_code)
```

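Quick exercise

Use urllib to write a function that fetches a compact "today's weather" summary, and test it against a free public weather API (for example wttr.in, which needs no API key).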

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError
from urllib.parse import quote

def get_simple_weather(city: str) -> str | None:
    """Fetch a one-line weather summary for a city (wttr.in's ?format=3 output)."""
    # URL-encode the city so names with spaces or non-ASCII characters work
    url = f"https://wttr.in/{quote(city)}?format=3"
    req = Request(url, headers={"User-Agent": "curl/7.68.0"})  # wttr.in is friendly to curl-style clients

    try:
        with urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8").strip()
    except HTTPError as e:
        print(f"Unknown city or API error: {e.code}")
        return None
    except URLError as e:
        print(f"Network error: {e.reason}")
        return None
    except TimeoutError:
        print("Timed out waiting for wttr.in")
        return None

# Try it out
if __name__ == "__main__":
    city = input("Enter a city name (pinyin/English): ").strip()
    weather = get_simple_weather(city)
    if weather:
        print(f"\n{weather}")
```

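Summary

urllib is the "low-level key" to networking in Python. It is less polished than third-party libraries, but it is stable, always available, and has no dependencies. Learning it prepares you for special cases and lays the groundwork for more sophisticated networking libraries.

The advice for 2024: use urllib for small scripts and requests for larger projects, and never forget exception handling and timeouts!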
