# Selenium/Playwright Integration: Handling JavaScript-Rendered Pages
📂 Stage: Phase 3, Attack and Defense Drills (Middleware and Anti-Scraping)
# 1. Selenium Integration
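The `SeleniumMiddleware` in this section only takes effect once it is registered as a downloader middleware. A minimal `settings.py` sketch follows; the module path `myproject.middlewares` is a hypothetical placeholder, and the priority 543 is an arbitrary illustrative value, not something the original specifies.

```python
# settings.py (sketch)
# "myproject.middlewares" is a hypothetical module path; adjust it to your project.
# 543 is an arbitrary priority chosen for illustration.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.SeleniumMiddleware": 543,
}
```

A request then opts in to browser rendering by carrying `meta={'use_selenium': True}`, which is the flag the middleware checks in `process_request`.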
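    from scrapy.http import HtmlResponse
    from selenium import webdriver


    class SeleniumMiddleware:
        def __init__(self):
            # One shared browser for the whole crawl; call self.driver.quit() on shutdown
            self.driver = webdriver.Chrome()

        def process_request(self, request, spider):
            # Only requests that opt in via meta are rendered in the browser
            if request.meta.get('use_selenium'):
                self.driver.get(request.url)
                return HtmlResponse(
                    url=request.url,
                    body=self.driver.page_source.encode('utf-8'),
                    encoding='utf-8',
                    request=request,
                )
            # Returning None lets Scrapy download the request normally
            return None

# 2. Playwright Integration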
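The coroutine-based `process_request` in this section relies on Playwright's async API, which is built on asyncio, so Scrapy has to run on Twisted's asyncio reactor. A minimal `settings.py` sketch under that assumption follows; the module path `myproject.middlewares` is hypothetical and the priority 543 is an arbitrary illustrative value.

```python
# settings.py (sketch)
# Run Scrapy on the asyncio-based Twisted reactor so that
# "async def" middleware methods can await asyncio libraries.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# "myproject.middlewares" is a hypothetical module path; 543 is an arbitrary priority.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.PlaywrightMiddleware": 543,
}
```

Requests opt in with `meta={'use_playwright': True}`, the flag the middleware checks.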
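    from playwright.async_api import async_playwright
    from scrapy.http import HtmlResponse


    class PlaywrightMiddleware:
        async def process_request(self, request, spider):
            # Only requests that opt in via meta are rendered in the browser
            if request.meta.get('use_playwright'):
                # Launches a fresh browser per request: simple, but slow
                async with async_playwright() as p:
                    browser = await p.chromium.launch()
                    try:
                        page = await browser.new_page()
                        await page.goto(request.url)
                        content = await page.content()
                    finally:
                        # Close the browser even if navigation fails
                        await browser.close()
                return HtmlResponse(
                    url=request.url,
                    body=content.encode('utf-8'),
                    encoding='utf-8',
                    request=request,
                )
            # Returning None lets Scrapy download the request normally
            return None

# 3. Summary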
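Browser automation:
- Selenium: mature and stable
- Playwright: fast and modern

When to use which:
- Static pages: a plain crawler, no browser
- Dynamic pages: Selenium or Playwright
- Complex interactions: Playwright

💡 Remember: browser automation is slow, but sometimes it is necessary. Prefer plain static crawling, and fall back to a browser only when that is not enough.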
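The "static first, browser only as a fallback" advice can be sketched as a small helper. This is an illustrative sketch, not part of the middlewares above: `fetch_static_first`, the stub fetchers, and the `has_target` check are all hypothetical names, and real code would plug in an actual HTTP download and a Selenium or Playwright render.

```python
from typing import Callable, Tuple


def fetch_static_first(
    url: str,
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    has_target: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (html, source), where source is 'static' or 'browser'."""
    html = fetch_static(url)
    if has_target(html):
        # The cheap download already contains the data we need
        return html, "static"
    # Fall back to the slow browser render only when necessary
    return fetch_rendered(url), "browser"


# Stub fetchers standing in for a plain HTTP download and a browser render.
def fake_static(url: str) -> str:
    # An empty shell, as a JavaScript-rendered page might return
    return "<div id='app'></div>"


def fake_rendered(url: str) -> str:
    return "<div id='app'><span class='price'>42</span></div>"


html, source = fetch_static_first(
    "https://example.com/item",
    fake_static,
    fake_rendered,
    has_target=lambda body: "class='price'" in body,
)
print(source)  # browser
```

Passing the fetchers in as callables keeps the decision logic independent of any particular browser library, so the same helper works whether the fallback is Selenium or Playwright.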