Python crawler download retries: several ways to retry requests that keep timing out (6 methods)

2023-10-04

Method 1

import requests

headers = {}
url = 'https://www.baidu.com'
try:
    proxies = None
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
except requests.RequestException:
    # logdebug('requests failed one time')
    try:
        proxies = None
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    except requests.RequestException:
        # logdebug('requests failed two times')
        print('requests failed two times')

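Summary: the code is redundant. Every extra retry needs another nested try block, so the line count grows with the number of retries. Logging each failure is easy, though.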

Method 2

import requests

def requestDemo(url):
    headers = {}
    trytimes = 3  # number of attempts
    for i in range(trytimes):
        try:
            proxies = None
            response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
            # Note: the status code here may also be 302 or similar
            if response.status_code == 200:
                break
        except requests.RequestException:
            # logdebug(f'requests failed {i + 1} time')
            print(f'requests failed {i + 1} time')

Summary: the loop is clearly much simpler than the first method, and logging is just as easy.
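The loop above stops on success but does not hand the response back or signal that every attempt failed. A minimal sketch of one way to do both, assuming a hypothetical helper `fetch_with_retries` and a zero-argument callable `fetch` (for example `lambda: requests.get(url, timeout=3)`); neither name comes from the original post, and `flaky` is only a stand-in so the example runs without network access:

```python
def fetch_with_retries(fetch, attempts=3):
    """Call fetch() up to `attempts` times and return the first successful result.

    `fetch` is a hypothetical zero-argument callable, for example
    lambda: requests.get(url, timeout=3).
    """
    last_error = None
    for i in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # in real code, narrow to requests.RequestException
            last_error = exc
            print(f'attempt {i + 1} failed: {exc}')
    # Reached only when every attempt raised.
    raise RuntimeError(f'all {attempts} attempts failed') from last_error


# Stand-in for a flaky request: fails twice, then succeeds.
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'

result = fetch_with_retries(flaky, attempts=3)
print(result, calls['n'])
```

Raising after the last attempt keeps a failure from being mistaken for an empty result; the caller decides whether to catch it.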

Method 3

import requests

def requestDemo(url, times=1):
    headers = {}
    try:
        proxies = None
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
        html = response.text  # .text is a property, not a method
        # todo: normal processing logic goes here
        return html
    except Exception:
        # logdebug(f'requests failed {times} time')
        trytimes = 3  # maximum number of attempts
        if times < trytimes:
            times += 1
            return requestDemo(url, times)
        return 'out of maxtimes'

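Summary: recursion looks more sophisticated, and an error raised in the processing code also triggers a retry. The drawbacks are that it is harder to follow and easier to get wrong, and when the try block wraps a lot of code, every retry repeats all of that work, which hurts running time.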
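One detail worth checking in the recursive version: the failure value `'out of maxtimes'` is a string, just like the HTML returned on success, so a caller cannot tell the two apart by type. A minimal sketch of the same recursion returning `None` on exhaustion instead; `fetch_recursive`, `max_times`, and the stand-in `always_timeout` are hypothetical names used only for illustration, and the `None` return is an assumption, not the original author's choice:

```python
def fetch_recursive(fetch, times=1, max_times=3):
    """Recursive retry: on failure, call again with times + 1 until max_times."""
    try:
        return fetch()
    except Exception:
        if times < max_times:
            return fetch_recursive(fetch, times + 1, max_times)
        # Assumption: None marks exhaustion, replacing the 'out of maxtimes' string.
        return None


# Stand-in for a request that always times out, so the recursion bottoms out.
call_count = {'n': 0}

def always_timeout():
    call_count['n'] += 1
    raise TimeoutError('simulated timeout')

outcome = fetch_recursive(always_timeout)
print(outcome, call_count['n'])
```

The recursion depth is bounded by `max_times`, so with small retry counts there is no risk of hitting Python's recursion limit.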

Method 4

import functools
import requests

# The decorator must be defined before it is applied.
def retry(times):
    def wrapper(func):
        @functools.wraps(func)
        def inner_wrapper(*args, **kwargs):
            i = 0
            while i < times:
                try:
                    print(i)
                    return func(*args, **kwargs)
                except Exception:
                    # Log here; func.__name__ is the name of the decorated function
                    print("logdebug: {}()".format(func.__name__))
                    i += 1
            # All attempts failed: falls through and returns None
        return inner_wrapper
    return wrapper

@retry(3)  # number of attempts: 3
def requestDemo(url):
    headers = {}
    proxies = None
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    html = response.text  # .text is a property, not a method
    # todo: normal processing logic goes here
    return html

Summary: the advantage of a decorator is that it can be reused across many functions, and it is very convenient to use.
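Because the decorator is shared by many functions, it is a natural place to make the retry policy configurable. The sketch below is one possible variant, not the original author's code: the `exceptions` and `delay` parameters are assumptions added for illustration, and `flaky_download` is a stand-in that simulates timeouts so the example runs offline.

```python
import functools
import time

def retry(times, exceptions=(Exception,), delay=0.0):
    """Retry the decorated function up to `times` attempts on the given exceptions."""
    def wrapper(func):
        @functools.wraps(func)
        def inner_wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    print(f'logdebug: {func.__name__}() attempt {attempt} failed: {exc}')
                    if attempt == times:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)  # pause before the next attempt
        return inner_wrapper
    return wrapper


attempts = []

@retry(3, exceptions=(TimeoutError,), delay=0.01)
def flaky_download(url):
    # Stand-in for a real request: fails twice, then returns a fake page.
    attempts.append(url)
    if len(attempts) < 3:
        raise TimeoutError('simulated timeout')
    return f'<html>{url}</html>'

html = flaky_download('https://www.baidu.com')
print(html, len(attempts))
```

Restricting `exceptions` means a programming error such as a `ValueError` in the processing code is raised immediately instead of being retried.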

Method 5

#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import time
import json
from lxml import etree
import warnings

warnings.filterwarnings("ignore")

def get_xiaomi():
    try:
        # for n in range(5):  # retry 5 times
        #     print("attempt " + str(n))
        for a in range(5):  # retry 5 times
            print(a)
            url = "https://www.mi.com/"
            headers = {
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
                "Accept-Encoding": "gzip, deflate, br",
                "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
                "Connection": "keep-alive",
                # "Cookie": "xmuuid=XMGUEST-D80D9CE0-910B-11EA-8EE0-3131E8FF9940; Hm_lvt_c3e3e8b3ea48955284516b186acf0f4e=1588929065; XM_agreement=0; pageid=81190ccc4d52f577; lastsource=www.baidu.com; mstuid=1588929065187_5718; log_code=81190ccc4d52f577-e0f893c4337cbe4d|https%3A%2F%2Fwww.mi.com%2F; Hm_lpvt_c3e3e8b3ea48955284516b186acf0f4e=1588929099; mstz=||1156285732.7|||; xm_vistor=1588929065187_5718_1588929065187-1588929100964",
                "Host": "www.mi.com",
                "Upgrade-Insecure-Requests": "1",
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36"
            }
            response = requests.get(url, headers=headers, timeout=10, verify=False)
            html = etree.HTML(response.text)
            # print(html)
            result = etree.tostring(html)
            # print(result)
            print(result.decode("utf-8"))
            title = html.xpath('//head/title/text()')[0]
            print("title==", title)