How to extract information with BeautifulSoup (using BeautifulSoup's select to scrape the Kugou ranking chart)

提笔话周生 2023-05-13 22:29:06


Goal: use Python to scrape the song data from the Kugou ranking page.

Key snippets:

1. from bs4 import BeautifulSoup

2. soup=BeautifulSoup(wb_data.text,'lxml')

3. soup.select

4. title.get_text()
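To see the four snippets working together without touching the network, here is a minimal sketch. The HTML string is a made-up stand-in for wb_data.text, and it uses the built-in "html.parser" instead of 'lxml' so that nothing beyond bs4 needs to be installed; both of these are assumptions for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for wb_data.text (not real Kugou markup)
html = """
<div class="pc_temp_songlist">
  <ul>
    <li><a href="#">Song A - Singer A</a></li>
    <li><a href="#">Song B - Singer B</a></li>
  </ul>
</div>
"""

# Step 2: parse the HTML (the article uses 'lxml'; "html.parser" ships with Python)
soup = BeautifulSoup(html, "html.parser")

# Step 3: select() takes a CSS selector and returns a list of matching tags
titles = soup.select("div.pc_temp_songlist > ul > li > a")

# Step 4: get_text() returns the text inside each tag as a str
texts = [title.get_text() for title in titles]
print(texts)
```

Because select() always returns a list, the usual pattern is to loop over the result and call get_text() on each element.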

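Judging by how the page behaves, it is a static page: a request that carries headers gets the page data back directly. Fetching is therefore not a problem; the only work left is filtering the returned data and extracting what we need.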
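A rough way to confirm the "static page" claim is to fetch the raw HTML with a browser-like user-agent and check whether a class name you expect is already in the response text. The sketch below assumes that approach; fetch_html and contains_marker are hypothetical helper names, and the timeout value is an arbitrary choice.

```python
import requests

HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page HTML, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def contains_marker(html, marker="pc_temp_songlist"):
    """True if the marker (here a class name) appears in the raw HTML."""
    return marker in html


# Usage (performs a real network request, so it is left commented out):
# html = fetch_html("https://www.kugou.com/yy/rank/home/1-6666.html?from=rank")
# print(contains_marker(html))
```

If the marker is missing from the raw response, the list is probably rendered by JavaScript and plain requests would not be enough.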

The selector copied from the browser for a single song link:

#rankWrap > div.pc_temp_songlist.pc_rank_songlist_short > ul > li:nth-child(1) > a

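This selector matches only one a tag, because li:nth-child(1) restricts it to the first list item. To get all of the a tags, drop that restriction and shorten the selector to the following form: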

titles=soup.select("div.pc_temp_songlist > ul > li > a")
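The difference between the two selectors is easy to check on a small piece of HTML. The markup below is a made-up fragment that only mimics the nesting implied by the selectors, and "html.parser" is used in place of 'lxml' so the sketch runs with bs4 alone; both are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the structure the selectors describe
html = """
<div id="rankWrap">
  <div class="pc_temp_songlist pc_rank_songlist_short">
    <ul>
      <li><a href="#">Song A - Singer A</a></li>
      <li><a href="#">Song B - Singer B</a></li>
      <li><a href="#">Song C - Singer C</a></li>
    </ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Selector as copied from the browser: pinned to the first <li>
copied = "#rankWrap > div.pc_temp_songlist.pc_rank_songlist_short > ul > li:nth-child(1) > a"
# Generalized selector: every <li> under the list
general = "div.pc_temp_songlist > ul > li > a"

first_only = soup.select(copied)
all_links = soup.select(general)
print(len(first_only), len(all_links))
```

The copied selector returns a single link, while the generalized one returns one link per list item.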

For each title in titles, title.get_text() returns the tag's text content as a string (str).

from bs4 import BeautifulSoup
import requests

# Browser-like user-agent so the site returns the normal page
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
}


def get_info(url):
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    ranks = soup.select("span.pc_temp_num")
    titles = soup.select("div.pc_temp_songlist > ul > li > a")
    durations = soup.select("div.pc_temp_songlist > ul > li > span.pc_temp_tips_r > span")
    # zip pairs up the rank, title and duration that belong to the same song
    for rank, title, duration in zip(ranks, titles, durations):
        # The text before "-" is stored as the song and the text after it as the singer;
        # swap the two indices if the page lists the singer first
        data = {
            "rank": rank.get_text().strip(),
            "singer": title.get_text().split("-")[1].strip(),
            "song": title.get_text().split("-")[0].strip(),
            "time": duration.get_text().strip(),
        }
        print(data)


if __name__ == '__main__':
    url = "https://www.kugou.com/yy/rank/home/1-6666.html?from=rank"
    get_info(url)

That is the complete code.
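One fragile spot is split("-"): a title without a hyphen makes index [1] raise IndexError, and a title with extra hyphens gets cut short. A possible alternative, sketched here with a hypothetical helper name split_title, is str.partition, which splits at the first hyphen only and never raises.

```python
def split_title(text):
    """Hypothetical helper: split '<left> - <right>' at the first hyphen only.

    Returns (left, right) with surrounding whitespace removed; right is an
    empty string when the text contains no hyphen.
    """
    left, sep, right = text.partition("-")
    if not sep:
        # No hyphen found: keep the whole text as the left part
        return text.strip(), ""
    return left.strip(), right.strip()


print(split_title("Song A - Singer A"))
print(split_title("No Hyphen Here"))
```

Which half is the song and which is the singer still depends on how the page formats its link text, so that mapping is left to the caller.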
