Python异步爬虫(aiohttp版)

异步协程不太了解的话可以去看我上篇博客:https://www.cnblogs.com/Red-Sun/p/16934843.html
PS:本博客是个人笔记分享,不需要扫码加群或必须关注什么的(如果外站需要加群或关注的可以直接去我主页查看)
欢迎大家光临ヾ(≧▽≦*)o我的博客首页https://www.cnblogs.com/Red-Sun/

1.requests请求

# -*- coding: utf-8 -*- # @Time    : 2022/12/6 16:03 # @Author  : 红后 # @Email   : not_enabled@163.com # @blog    : https://www.cnblogs.com/Red-Sun # @File    : 实例1.py # @Software: PyCharm import aiohttp, asyncio   async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.request("GET", url=url) as response:         return await response.text(encoding='UTF-8')   async def main():  # 主函数用于异步函数的启动     url = 'https://www.baidu.com'     html = await aiohttp_requests(url)  # await修饰异步函数     print(html)   if __name__ == '__main__':     loop = asyncio.get_event_loop()     loop.run_until_complete(main())  

Python异步爬虫(aiohttp版)

2.session请求

GET:

# -*- coding: utf-8 -*- # @Time    : 2022/12/6 16:33 # @Author  : 红后 # @Email   : not_enabled@163.com # @blog    : https://www.cnblogs.com/Red-Sun # @File    : 实例2.py # @Software: PyCharm import aiohttp, asyncio   async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:  # 声明了一个支持异步的上下文管理器         async with session.get(url) as response:             return await response.text(encoding='UTF-8')   async def main():  # 主函数用于异步函数的启动     url = 'https://www.baidu.com'     html = await aiohttp_requests(url)  # await修饰异步函数     print(html)   if __name__ == '__main__':     loop = asyncio.get_event_loop()     loop.run_until_complete(main()) 

Python异步爬虫(aiohttp版)
其中aiohttp还有post,put, delete...等一系列请求(PS:一般情况下只需要创建一个session,然后使用这个session执行所有的请求。)
PSOT:传参

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         data = {'key': 'value'}         async with session.post(url=url, data=data) as response:             return await response.text(encoding='UTF-8') 

PS:这种传参传递的数据将会被转码,如果不想被转码可以直接提交字符串data=str(data)

附:关于session请求数据修改操作

1.cookies

自定义cookies应该放在ClientSession中,而不是session.get()中

async def aiohttp_requests(url):  # aiohttp的requests函数     cookies = {'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}     async with aiohttp.ClientSession(cookies=cookies) as session:         async with session.get(url) as response:             return await response.text(encoding='UTF-8') 

2.headers

放在自定义的headers跟正常的requests一样放在session.get()中

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         headers = {'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}         async with session.get(url=url, headers=headers) as response:             return await response.text(encoding='UTF-8') 

3.timeout

默认响应时间为5分钟,通过timeout可以重新设定,其放在session.get()中

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         async with session.get(url=url, timeout=60) as response:             return await response.text(encoding='UTF-8') 

4.proxy

当然代理也是支持的在session.get()中配置

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         async with session.get(url=url, proxy="http://some.proxy.com") as response:             return await response.text(encoding='UTF-8') 

需要授权的代理

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         proxy_auth = aiohttp.BasicAuth('user', 'pass')  # 用户,密码         async with session.get(url=url, proxy="http://some.proxy.com", proxy_auth=proxy_auth) as response:             return await response.text(encoding='UTF-8') 

或者

async def aiohttp_requests(url):  # aiohttp的requests函数     async with aiohttp.ClientSession() as session:         async with session.get(url=url, proxy='http://user:pass@some.proxy.com') as response:             return await response.text(encoding='UTF-8') 

报错处理

错误:RuntimeError: Event loop is closed

Python异步爬虫(aiohttp版)
报错原因是使用了asyncio.run(main())来运行程序
看到别个大佬的总结是asyncio.run()会自动关闭循环,并且调用_ProactorBasePipeTransport.__del__报错, 而asyncio.run_until_complete()不会。
第一种解决方法换成如下代码运行

loop = asyncio.get_event_loop() loop.run_until_complete(main())  

第二种重写方法以保证run()的运行

from functools import wraps  from asyncio.proactor_events import _ProactorBasePipeTransport  def silence_event_loop_closed(func):     @wraps(func)     def wrapper(self, *args, **kwargs):         try:             return func(self, *args, **kwargs)         except RuntimeError as e:             if str(e) != 'Event loop is closed':                 raise     return wrapper  _ProactorBasePipeTransport.__del__ = silence_event_loop_closed(_ProactorBasePipeTransport.__del__) 

发表评论

评论已关闭。

相关文章