Downloader Middleware: User-Agent and Cookie Management

📂 Stage: Phase 3, Attack and Defense Drills (Middleware and Anti-Scraping)


1. Custom Middleware

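# middlewares.py
import random


class UserAgentMiddleware:
    """Attach a randomly chosen User-Agent to every outgoing request."""

    # Truncated sample strings; replace them with full User-Agent values.
    USER_AGENTS = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
    ]

    def process_request(self, request, spider):
        # Overwrite any User-Agent set earlier in the middleware chain.
        request.headers['User-Agent'] = random.choice(self.USER_AGENTS)


class CookieMiddleware:
    """Attach a session cookie to every outgoing request."""

    def process_request(self, request, spider):
        # Assumes request.cookies is a dict (Scrapy also accepts a list of dicts).
        # 'xxx' is a placeholder value. With COOKIES_ENABLED (the default), the
        # built-in CookiesMiddleware rebuilds the Cookie header from
        # request.cookies, so setting that header by hand is redundant.
        request.cookies['session_id'] = 'xxx'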
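
Hard-coding the User-Agent pool in the class is fine for a demo, but a project usually wants it configurable. The following is a minimal sketch, assuming a custom setting named USER_AGENT_LIST and a class named SettingsUserAgentMiddleware (both hypothetical names, not part of Scrapy). The from_crawler classmethod and settings.getlist are the standard Scrapy way for a middleware to read settings.

```python
import random


class SettingsUserAgentMiddleware:
    """Rotate User-Agents taken from a project setting (hypothetical name)."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is an assumed custom setting, not a built-in one.
        return cls(crawler.settings.getlist('USER_AGENT_LIST'))

    def process_request(self, request, spider):
        # Leave the request untouched when no User-Agents are configured.
        if self.user_agents:
            request.headers['User-Agent'] = random.choice(self.user_agents)
        return None
```

With this variant, the list lives in settings.py, so it can be changed per project or per run without touching the middleware code.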

3. Enabling the Middleware

# settings.py
# The number is the priority: process_request runs in increasing order,
# so 543 runs before 544.
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.UserAgentMiddleware': 543,
    'myproject.middlewares.CookieMiddleware': 544,
}
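
Scrapy merges this dict with its built-in defaults, in which the stock UserAgentMiddleware sits at priority 500. A custom middleware at 543 runs after it and overwrites the header, so the configuration above works as written. If you would rather remove the built-in one from the chain, mapping its path to None disables it. A sketch, assuming the same myproject package name as above:

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in User-Agent middleware.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # Custom middlewares from this project.
    'myproject.middlewares.UserAgentMiddleware': 543,
    'myproject.middlewares.CookieMiddleware': 544,
}
```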

4. Summary

What middleware is for:
- Modifying requests
- Modifying responses
- Handling exceptions
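
These three uses correspond to three optional hook methods on a downloader middleware. The sketch below is illustrative only: the class name, the Accept-Language header, and the 403 check are assumptions chosen for the example, not part of the original project.

```python
import logging

logger = logging.getLogger(__name__)


class HooksDemoMiddleware:
    """Hypothetical middleware showing the three downloader hooks."""

    def process_request(self, request, spider):
        # Modify the request: add a header only if it is not already set.
        request.headers.setdefault('Accept-Language', 'en-US,en;q=0.9')
        return None  # None lets the request continue down the chain

    def process_response(self, request, response, spider):
        # Inspect (or replace) the response; a response must be returned.
        if response.status == 403:
            logger.warning('Blocked (403): %s', request.url)
        return response

    def process_exception(self, request, exception, spider):
        # Called when the download or a process_request hook raises.
        logger.error('Download failed for %s: %r', request.url, exception)
        return None  # None passes the exception on to other middlewares
```

A middleware only needs to define the hooks it uses; the classes in section 1 define process_request alone.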

Common applications:
- User-Agent rotation
- Cookie management
- Proxy configuration
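
User-Agent rotation and cookie handling are covered in section 1; proxy configuration follows the same pattern. A minimal sketch, assuming a hypothetical RandomProxyMiddleware class and placeholder proxy addresses. The proxy key in request.meta is the one Scrapy's built-in HttpProxyMiddleware reads.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical middleware that assigns a random proxy per request."""

    # Placeholder addresses; replace them with proxies you actually control.
    PROXIES = [
        'http://127.0.0.1:8001',
        'http://127.0.0.1:8002',
    ]

    def process_request(self, request, spider):
        # Keep a proxy the spider already chose; otherwise pick one at random.
        request.meta.setdefault('proxy', random.choice(self.PROXIES))
        return None
```

It is enabled the same way as the other middlewares, by adding its path and a priority to DOWNLOADER_MIDDLEWARES.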

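💡 Remember: middleware is your first line of defense against anti-scraping measures. Learn to use it and you hold the key to disguising your crawler.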


🔗 Further Reading