取消

Scrapy爬虫系列3:爬取百度热搜榜单

xxx 发表于 2024-08-282024-08-28T05:10:00+08:00

1 分钟阅读

起始地址 https://top.baidu.com/board?tab=realtime

提取器

  
categories = response.xpath('//*[contains(@class, "category-wrap_iQLoo horizontal_1eKyQ")]')
for category in categories:  
    item = DoubanbooksItem()  # 在每次循环中创建一个新的 item 对象
     item['url'] = response.url
     # 提取 `c-single-text-ellipsis` 类中的文本
    text = category.xpath('.//*[contains(@class, "c-single-text-ellipsis")]/text()').get()
            
      # 提取 `img-wrapper_29V76` 类中的链接
     link = category.xpath('.//*[contains(@class, "img-wrapper_29V76")]/@href').get()

管道使用

  
 text = item['text'].strip()
        link = item['link']
        接收即可。

欢迎评论交流

此博客从手机上发布

Scrapy爬虫系列

本文由作者按照 CC BY 4.0 进行授权

Scrapy爬虫系列3:爬取百度热搜榜单

相关文章

Scrapy爬虫系列1:安装

Scrapy爬虫系列2:爬取豆瓣书籍排行榜

热门标签