Binding specific IPs on a multi-IP server when sending requests (requests and scrapy)
Sometimes we buy a multi-IP server (a so-called "station group" server) for specific needs such as hosting sites or SEO. Multi-IP servers generally cost more the more IPs they carry, and some IP ranges are better than others.
For crawling, 200-odd IPs, used sensibly, can already cover a lot of use cases.
This article is my working notes on how to bind a specific IP when sending requests with two Python crawling libraries: requests and scrapy.
1. Get all local IPs
The first step is to find out how many usable IPs the machine has.
| import re

import psutil

def get_local_ips():
    """Get all local IPv4 addresses on interfaces whose name contains "eth"."""
    local_ips = []
    info = psutil.net_if_addrs()
    for k, v in info.items():
        if "eth" in k:
            for item in v:
                # item is an snicaddr tuple; family value 2 is AF_INET (IPv4)
                if item[0] == 2:
                    local_ips.append(item[1])
    print("Number of local IPs:", len(local_ips), local_ips)
    return local_ips

def getNetiAddrInfo():
    """Get IP addresses across all interfaces.
    Adapted from: https://www.programcreek.com/python/example/88702/psutil.net_if_addrs
    """
    ipv4_list = []
    ipv6_list = []
    neti_dict = psutil.net_if_addrs()
    for neti in neti_dict:
        snic_list = neti_dict[neti]
        for snic in snic_list:
            if snic.family.name == 'AF_INET':
                ipv4_list.append(snic.address)
            elif snic.family.name == 'AF_INET6':
                # Drop the zone index (e.g. "%eth0") from IPv6 addresses
                ipv6_list.append(re.sub('%.*$', '', snic.address))
    return list(set(ipv4_list))
|
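psutil reports every address configured on the interfaces, but that alone doesn't prove an address can actually originate traffic. A quick sanity check is to try binding a socket to each candidate. This helper is a sketch of my own (the name filter_bindable is not from the original post):
| import socket

def filter_bindable(ips):
    """Keep only the IPs a local TCP socket can actually bind to."""
    usable = []
    for ip in ips:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((ip, 0))  # port 0: let the OS pick an ephemeral port
            usable.append(ip)
        except OSError:
            pass  # address is not assigned/usable on this host
        finally:
            s.close()
    return usable
|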
2. requests
The key piece is SourceAddressAdapter from requests_toolbelt: mount it on a session, and every request sent through that session goes out from the given source IP.
| import random

from requests_toolbelt import SourceAddressAdapter

def adapter_requests(self):
    """Bind a random local IP to the session."""
    bind_address = random.choice(self.ips)
    print("Request IP:", bind_address)
    new_source = SourceAddressAdapter(bind_address)
    self.session.mount('http://', new_source)
    self.session.mount('https://', new_source)
|
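For a one-off request no class is needed; here is a minimal standalone sketch (the IP is a placeholder, substitute one of your own; SourceAddressAdapter also accepts an (ip, port) tuple if you need to pin the source port):
| import requests
from requests_toolbelt import SourceAddressAdapter

session = requests.Session()
adapter = SourceAddressAdapter('100.100.100.72')  # placeholder IP
session.mount('http://', adapter)
session.mount('https://', adapter)
print(session.get('http://httpbin.org/get').json()['origin'])
|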
3. Testing the requests approach
http://httpbin.org/get echoes the client's IP back in the "origin" field of its JSON response, which makes it easy to check which address a request actually left from.
The full code:
| import random

import psutil
import requests
from requests_toolbelt import SourceAddressAdapter

class SourceAddressRequests(object):
    def __init__(self):
        self.session = requests.session()
        self.ips = []

    @staticmethod
    def get_local_ips():
        """Get all local IPs."""
        local_ips = []
        info = psutil.net_if_addrs()
        for k, v in info.items():
            if "eth" in k:
                for item in v:
                    if item[0] == 2:  # AF_INET (IPv4)
                        local_ips.append(item[1])
        print("Number of local IPs:", len(local_ips), local_ips)
        return local_ips

    def adapter_requests(self):
        """Bind a random local IP to the session."""
        bind_address = random.choice(self.ips)
        print("Request IP:", bind_address)
        new_source = SourceAddressAdapter(bind_address)
        self.session.mount('http://', new_source)
        self.session.mount('https://', new_source)

    def test_requests(self):
        """Send a test request and print the IP httpbin saw."""
        url = "http://httpbin.org/get"
        response = self.session.get(url=url)
        origin = response.json()["origin"]
        print("Detected IP:", origin)

    def main(self):
        self.ips = self.get_local_ips()
        for i in range(5):
            print("Request #{}".format(i + 1))
            self.adapter_requests()
            self.test_requests()

if __name__ == '__main__':
    test = SourceAddressRequests()
    test.main()
|
The test output is below. The IP addresses have been sanitized; run it yourself to see real ones.
| Number of local IPs: 256
Request #1
Request IP: 100.100.100.72
Detected IP: 100.100.100.72
Request #2
Request IP: 100.100.100.52
Detected IP: 100.100.100.52
Request #3
Request IP: 100.100.100.51
Detected IP: 100.100.100.51
Request #4
Request IP: 100.100.100.166
Detected IP: 100.100.100.166
Request #5
Request IP: 100.100.100.75
Detected IP: 100.100.100.75
|
4. scrapy
Create a test scrapy project named demo.
Add a custom downloader middleware in middlewares.py. It relies on the bindaddress key in request.meta, which Scrapy's download handler passes down to Twisted as the local (host, port) address to bind the outgoing connection to.
| import random

class BindAddressMiddleware(object):
    """Bind each outgoing request to a random local IP."""

    def __init__(self, settings):
        self.is_bind_address = settings.get('IS_MORE_NETWORK_CARDS')
        self.ips = settings.get('BIND_ADDRESS')

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        if self.is_bind_address:
            bind_address = random.choice(self.ips)
            if bind_address:
                # (host, port); port 0 lets the OS pick an ephemeral port
                request.meta['bindaddress'] = (bind_address, 0)
                spider.logger.info('Using: %s as bind address' % bind_address)
        return None
|
Configure the two parameters in settings (shown here as custom_settings-style dict entries; get_local_ips is the helper from section 1 and has to be imported wherever you put this):
| "IS_MORE_NETWORK_CARDS": True,
"BIND_ADDRESS": get_local_ips(),
|
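The middleware also has to be enabled before Scrapy will call it. The original post omits this step, so the module path and priority below are assumptions for a project named demo:
| # settings.py -- assumed module path and priority for the demo project
DOWNLOADER_MIDDLEWARES = {
    'demo.middlewares.BindAddressMiddleware': 543,
}
|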
5. Testing scrapy
Test it in a spider:
| import scrapy
import json

class BinSpider(scrapy.Spider):
    name = 'bin'
    allowed_domains = ['httpbin.org']
    start_urls = ['http://httpbin.org/get'] * 5

    def parse(self, response):
        origin = json.loads(response.text)["origin"]
        print("origin is", origin)
|
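A note on the five identical start_urls: they survive deduplication only because the default start_requests() builds its requests with dont_filter=True. If you write start_requests yourself, you need to set that flag explicitly; an equivalent sketch:
| import json
import scrapy

class BinSpider(scrapy.Spider):
    name = 'bin'

    def start_requests(self):
        for _ in range(5):
            # dont_filter=True keeps the dupefilter from collapsing
            # the five identical requests into one
            yield scrapy.Request('http://httpbin.org/get',
                                 callback=self.parse,
                                 dont_filter=True)

    def parse(self, response):
        print("origin is", json.loads(response.text)["origin"])
|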
Test output:
| ...
2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.190 as bind address
2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.151 as bind address
2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.34 as bind address
2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.140 as bind address
2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.79 as bind address
origin is 100.100.100.190
origin is 100.100.100.79
origin is 100.100.100.140
origin is 100.100.100.151
origin is 100.100.100.34
...
|
The results look good; feel free to build on this.
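One possible refinement, my suggestion rather than anything from the original post: random.choice can pick the same IP several times in a row, while a round-robin cycle spreads requests evenly across all addresses:
| import itertools

class RoundRobinBindAddressMiddleware(object):
    """Variant of BindAddressMiddleware that rotates through local IPs."""

    def __init__(self, settings):
        self.is_bind_address = settings.get('IS_MORE_NETWORK_CARDS')
        ips = settings.get('BIND_ADDRESS') or []
        self.ip_cycle = itertools.cycle(ips) if ips else None

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

    def process_request(self, request, spider):
        if self.is_bind_address and self.ip_cycle is not None:
            bind_address = next(self.ip_cycle)
            request.meta['bindaddress'] = (bind_address, 0)
            spider.logger.info('Using: %s as bind address' % bind_address)
        return None
|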
Author: down_dawn
Source: https://blog.csdn.net/qq_42280510/article/details/109646220