
Binding source IPs on a multi-IP server (requests and scrapy)

Sometimes we rent multi-IP servers (so-called station-group servers) for specific needs such as hosting sites or SEO. These servers generally get more expensive the more IPs they carry, and some IP ranges are better than others. For crawling, 200-odd IPs can cover a lot of use cases if used sensibly. This post is my working notes on how to bind a specific source IP when sending requests with two Python crawling libraries: requests and scrapy.

1. Get all local IPs

The first step is to find out how many usable IPs the machine has.
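The snippets below depend on a few third-party packages (psutil here, requests-toolbelt and Scrapy later); install them first (package names as published on PyPI):

    pip install psutil requests requests-toolbelt scrapy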

    import socket

    import psutil


    def get_local_ips():
        """Collect every IPv4 address on the machine's eth* interfaces."""
        local_ips = []
        info = psutil.net_if_addrs()  # {interface name: [snicaddr, ...]}
        for k, v in info.items():
            if "eth" in k:  # only eth* interfaces; adjust for your NIC naming scheme
                for item in v:
                    if item.family == socket.AF_INET:  # IPv4 entries only
                        local_ips.append(item.address)
        print("Local IP count:", len(local_ips), local_ips)
        return local_ips

An alternative that walks every interface rather than just eth*, adapted from the source cited in the docstring:

    import re

    import psutil


    def getNetiAddrInfo():
        """Collect IPv4 addresses from every network interface.
        Adapted from: https://www.programcreek.com/python/example/88702/psutil.net_if_addrs
        """
        ipv4_list = []
        ipv6_list = []
        neti_dict = psutil.net_if_addrs()
        for neti, snic_list in neti_dict.items():
            for snic in snic_list:
                if snic.family.name == 'AF_INET':
                    ipv4_list.append(snic.address)
                elif snic.family.name == 'AF_INET6':
                    # strip the zone index (e.g. "%eth0") from link-local IPv6 addresses
                    ipv6_list.append(re.sub('%.*$', '', snic.address))
        return list(set(ipv4_list))
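Not every address an interface reports is necessarily usable as a source address (for example, addresses on a downed interface). As a sanity check you can try binding a throwaway socket to each candidate; this helper is a small sketch added here, not part of the original notes:

    import socket


    def can_bind(ip):
        """Return True if the OS allows binding a TCP socket to this source IP."""
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.bind((ip, 0))  # port 0: let the OS pick any free port
            return True
        except OSError:
            return False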

2. requests

The key piece is SourceAddressAdapter from requests-toolbelt, which pins the session's source address before sending requests.

    import random

    from requests_toolbelt import SourceAddressAdapter


    def adapter_requests(self):
        """Bind the session to a random local IP."""
        bind_address = random.choice(self.ips)
        print("Bind IP:", bind_address)
        new_source = SourceAddressAdapter(bind_address)
        # mounting replaces the previous adapter for these URL prefixes
        self.session.mount('http://', new_source)
        self.session.mount('https://', new_source)
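If you also need to pin the source port, SourceAddressAdapter accepts an (ip, port) tuple in place of a bare IP string; a minimal sketch (the IP is a placeholder):

    # port 0 keeps the usual behavior of letting the OS pick a free port
    adapter = SourceAddressAdapter(('100.100.100.72', 0))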

3. Testing with requests

Use http://httpbin.org/get to check which IP each request goes out from.

The full code:

    import random
    import socket

    import psutil
    import requests
    from requests_toolbelt import SourceAddressAdapter


    class SourceAddressRequests(object):
        def __init__(self):
            self.session = requests.session()
            self.ips = []

        @staticmethod
        def get_local_ips():
            """Collect every IPv4 address on the machine's eth* interfaces."""
            local_ips = []
            info = psutil.net_if_addrs()
            for k, v in info.items():
                if "eth" in k:
                    for item in v:
                        if item.family == socket.AF_INET:
                            local_ips.append(item.address)
            print("Local IP count:", len(local_ips), local_ips)
            return local_ips

        def adapter_requests(self):
            """Bind the session to a random local IP."""
            bind_address = random.choice(self.ips)
            print("Bind IP:", bind_address)
            new_source = SourceAddressAdapter(bind_address)
            self.session.mount('http://', new_source)
            self.session.mount('https://', new_source)

        def test_requests(self):
            """Send a request and report the IP httpbin saw."""
            url = "http://httpbin.org/get"
            response = self.session.get(url=url)
            origin = response.json()["origin"]
            print("Detected IP:", origin)

        def main(self):
            self.ips = self.get_local_ips()
            for i in range(5):
                print("Request {}".format(i + 1))
                self.adapter_requests()
                self.test_requests()


    if __name__ == '__main__':
        test = SourceAddressRequests()
        test.main()

Test output:

The IP addresses below have been masked; run the test yourself.

    Local IP count: 256
    Request 1
    Bind IP: 100.100.100.72
    Detected IP: 100.100.100.72
    Request 2
    Bind IP: 100.100.100.52
    Detected IP: 100.100.100.52
    Request 3
    Bind IP: 100.100.100.51
    Detected IP: 100.100.100.51
    Request 4
    Bind IP: 100.100.100.166
    Detected IP: 100.100.100.166
    Request 5
    Bind IP: 100.100.100.75
    Detected IP: 100.100.100.75

4. scrapy

Create a test Scrapy project named demo (scrapy startproject demo).

Add a custom downloader middleware in middlewares.py:

    import random


    class BindAddressMiddleware(object):
        """Pick a random local IP and bind each outgoing request to it."""

        def __init__(self, settings):
            self.is_bind_address = settings.get('IS_MORE_NETWORK_CARDS')
            self.ips = settings.get('BIND_ADDRESS')

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.settings)

        def process_request(self, request, spider):
            if self.is_bind_address:
                bind_address = random.choice(self.ips)
                if bind_address:
                    # 'bindaddress' is a Request.meta key honored by Scrapy's
                    # download handler; it takes an (ip, port) tuple, with
                    # port 0 meaning "any free port"
                    request.meta['bindaddress'] = (bind_address, 0)
                    spider.logger.info('Using: %s as bind address' % bind_address)
            return None

Configure the parameters in settings.py (get_local_ips is the helper from section 1):

    "IS_MORE_NETWORK_CARDS": True,
    "BIND_ADDRESS": get_local_ips(),

5. Testing scrapy

Test it in a spider:

    import json

    import scrapy


    class BinSpider(scrapy.Spider):
        name = 'bin'
        allowed_domains = ['httpbin.org']
        # five identical requests so we can watch the bind address rotate
        start_urls = ['http://httpbin.org/get'] * 5

        def parse(self, response):
            origin = json.loads(response.text)["origin"]
            print("origin is", origin)

Test output:

    ...
    2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.190 as bind address
    2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.151 as bind address
    2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.34 as bind address
    2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.140 as bind address
    2020-11-12 16:41:56 [bin] INFO: Using: 100.100.100.79 as bind address
    origin is 100.100.100.190
    origin is 100.100.100.79
    origin is 100.100.100.140
    origin is 100.100.100.151
    origin is 100.100.100.34
    ...

The results look fine (the origins just come back in a different order than the log lines because Scrapy sends requests concurrently); adapt it from here as needed.

Author: down_dawn
Source: https://blog.csdn.net/qq_42280510/article/details/109646220