Selectors: Advanced XPath and CSS Selector Syntax

📂 Stage: Stage 1, First Steps (Framework Core)
🔗 Related chapters: Spider in Practice · Item and Item Loader

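1. CSS Selectors

# Basic selectors
response.css('div.item')           # class selector: div elements with class "item"
response.css('div#main')           # ID selector: the div with id "main"
response.css('div > p')            # child: p elements directly inside a div
response.css('div p')              # descendant: p elements at any depth inside a div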

# Attribute selectors
response.css('a[href*="example"]') # href contains "example"
response.css('a[href^="http"]')    # href starts with "http"
response.css('a[href$=".html"]')   # href ends with ".html"

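# Pseudo-classes
response.css('div:first-child')    # a div that is the first child of its parent
response.css('div:nth-child(2)')   # a div that is the second child of its parent
response.css('div:last-child')     # a div that is the last child of its parent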

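# Extracting text and attributes
response.css('h2::text').get()      # text of the first match, or None if nothing matches
response.css('a::attr(href)').get() # href attribute of the first match
response.css('h2::text').getall()   # list with the text of every match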

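2. XPath Selectors

# Basic XPath
response.xpath('//div[@class="item"]')   # div whose class attribute is exactly "item"
response.xpath('//div/p')                # p elements that are direct children of a div
response.xpath('//div[1]')               # the first div child of each parent (XPath counts from 1)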

# Text and attributes
response.xpath('//h2/text()').get()   # first direct text node of an h2
response.xpath('//a/@href').get()     # href attribute of the first a element

# Advanced XPath
response.xpath('//div[contains(@class, "item")]')   # class attribute contains the substring "item"
response.xpath('//div[starts-with(@id, "main")]')   # id starts with "main"
response.xpath('//div[position()=1]')               # same as //div[1]

# Logical operators
response.xpath('//div[@class="item" and @id="first"]')        # both conditions must hold
response.xpath('//div[@class="item" or @class="product"]')    # either condition may hold

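3. Summary

Selector comparison:

CSS: concise and easy to learn
XPath: more powerful and flexible; it can filter on text content and navigate to parents or siblings, which CSS selectors cannot do

Practical advice:
- Simple cases (tag, class, id, attribute): CSS
- Complex cases (conditions on text, position, or ancestors): XPath
- Mixing both: the most flexible, since every selector result accepts further .css() and .xpath() calls

💡 Remember: selectors are the crawler's eyes. Once you can select precisely, you have mastered the core of data extraction.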
