Implementing a Simple Web Crawler in Python

1. Requirements:

Crawl the flea-market board of the Hefei forum at http://bbs.hefei.cc/forum-69-1.html, together with the subsequent pages of that board, and save the content locally. The code design and the run were completed independently, using the video python视频贴吧爬虫.flv (a Tieba crawler tutorial) as a reference.

2. Code:

import requests


# Spider for the Hefei forum flea-market board
class HefeiForumSpider:
    def __init__(self):
        self.urlBase = "http://bbs.hefei.cc/forum-69-{}.html"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/97.0.4692.99 Safari/537.36"
        }

    # Build the list of page URLs to crawl
    def getURLList(self):
        urlList = []
        for i in range(1, 6):  # pages 1 through 5
            urlList.append(self.urlBase.format(i))
        return urlList

    # Fetch the HTML content of one page
    def getHtmlContent(self, url):
        response = requests.get(url, headers=self.headers)
        # The forum pages are served as GBK
        return response.content.decode(encoding="gbk", errors="ignore")

    # Save the fetched HTML to a local file
    def saveHtml(self, html_str, number):
        savefilename = "第{}页.html".format(number)  # "第N页" means "Page N"
        with open(savefilename, "w", encoding="gbk") as f:
            f.write(html_str)


def main():
    HFS = HefeiForumSpider()
    urlList = HFS.getURLList()
    for url in urlList:
        htmltemp = HFS.getHtmlContent(url)
        number = urlList.index(url) + 1  # file numbering starts at 1
        HFS.saveHtml(htmltemp, number)


if __name__ == "__main__":
    main()
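The decode step above assumes the forum pages are served as GBK. As a small side note that is not part of the original assignment, a more defensive variant can let requests guess the encoding from the response body instead of hard-coding it; the helper name get_html_content below is only for illustration.

import requests

def get_html_content(url, headers=None):
    # Fetch one page and decode it with the encoding detected by requests.
    # apparent_encoding is guessed from the response body, so this keeps
    # working even if the site changes from GBK to UTF-8.
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = response.apparent_encoding
    return response.text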

3. Run results:


After the run, open one of the saved HTML files in a browser.


The page content displays correctly, which indicates the pages were fetched successfully.


4. FAQ:

This example needs the requests package. If the pip download fails, the solution is as follows.

Error when installing requests:

C:\Users\fangel>pip install requests

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000172159CC520>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/requests/

WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x00000172159CCEE0>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/requests/


Cause analysis:

pip cannot download the package from the overseas server (the connection to pypi.org times out).

Solution:

Add the -i option so that pip downloads from a specified domestic mirror:

C:\Users\fangel>pip install requests -i https://mirrors.aliyun.com/pypi/simple/
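To avoid passing -i every time, the same Aliyun mirror can also be written into pip's configuration (pip config requires a reasonably recent pip, roughly version 10 or later):

C:\Users\fangel>pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/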

