How to extract images with Python requests: (Part 1) Getting started with Python web scraping - scraping wallpapers with requests + bs4

1. Environment setup

PyCharm, Python 3.5 or later, requests, BeautifulSoup4, Chrome

2. Analyzing the page with Chrome

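The first task of any web crawler is to request the page resource successfully. So we start by analyzing how the page is requested, and then write the crawler code to match.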

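(1) First, visit http://www.netbian.com/s/huyan/, capture the traffic in the Network tab of the F12 developer tools, and look for the resource that carries the image links. A quick search shows they are in index.htm.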

(2) Verify that the link really points to an image
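
One way to make this check programmatically, rather than by eye, is to look at the first few bytes of the downloaded data. The sketch below is only an illustration: the helper name sniff_image_type and the signature table are assumptions, not part of the original tutorial, and it covers only JPEG, PNG and GIF.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name if data starts with a known image signature, else None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

With requests, the bytes to test would be the content attribute of the response, for example sniff_image_type(res.content). An HTML error page returned in place of an image would yield None.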

3. Sending the request with requests

import requests

# Request the listing page; the headers dict is left empty here
res = requests.get('http://www.netbian.com/s/huyan/', headers={})
# Print the returned HTML
print(res.text)

Result: the HTML is fetched successfully, and the image links can be found in it.
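
The request above passes an empty headers dict and sets no timeout. A slightly more defensive variant might look like the sketch below. The User-Agent value, the 10-second timeout and the helper name fetch_html are illustrative assumptions, not something the original article uses, and whether this particular site needs a browser-like User-Agent is not confirmed here.

```python
import requests

# Hypothetical header values for illustration; not taken from the original article.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def fetch_html(url, timeout=10):
    """Fetch a page and return its decoded text, raising on HTTP errors."""
    res = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    res.raise_for_status()  # turn 4xx/5xx responses into exceptions
    res.encoding = res.apparent_encoding  # guess the encoding from the body
    return res.text
```

Calling fetch_html('http://www.netbian.com/s/huyan/') would then return the same HTML as res.text, but fail loudly on an error status instead of printing an error page.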

4. Extracting the image links and downloading the images

import os

import requests
from bs4 import BeautifulSoup

# Create the output directory; do nothing if it already exists
os.makedirs('./Temp', exist_ok=True)


def download_file(file_url):
    # Download one image and save it under ./Temp, named after the last URL segment
    _res = requests.get(file_url, headers={})
    save_path = './Temp/%s' % file_url.split('/')[-1]
    with open(save_path, 'wb') as f:
        f.write(_res.content)


res = requests.get('http://www.netbian.com/s/huyan/', headers={})
# Let requests guess the encoding from the body so the text decodes correctly
res.encoding = res.apparent_encoding
soup = BeautifulSoup(res.text, 'html.parser')
# Each thumbnail sits in an <li> inside <div class="list">
all_li = soup.find('div', class_='list').find_all('li')
for li in all_li:
    img_src = li.find('img')['src']
    print(img_src)
    # The original code skips this one specific image URL
    if img_src != 'http://img.netbian.com/file/2020/0107/63562ba62a7cd23bea9992db97e07095.jpg':
        download_file(img_src)
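
The download code takes the file name from the last segment of the URL and assumes every src value is an absolute URL. Here is a minimal sketch of how both steps could be made more robust using only the standard library. The helper names to_absolute_url and filename_from_url, and the fallback name "unnamed", are hypothetical and not part of the original code.

```python
import os
from urllib.parse import urljoin, urlparse


def to_absolute_url(page_url, src):
    """Resolve a possibly relative src attribute against the page URL."""
    return urljoin(page_url, src)


def filename_from_url(file_url, default="unnamed"):
    """Take the last path segment as the file name, ignoring any query string."""
    name = os.path.basename(urlparse(file_url).path)
    return name or default
```

For example, to_absolute_url('http://www.netbian.com/s/huyan/', '/file/a.jpg') gives 'http://www.netbian.com/file/a.jpg', while an src that is already absolute is returned unchanged.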

Result:

Conclusion

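At this point we have successfully downloaded wallpaper images with Python's requests library. There is still room for improvement, which later tutorials will address. Those tutorials will also cover more scraping techniques, a detailed explanation of the code above, and ways to optimize it.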

You are welcome to join the QQ group I created: Python Crawler Exchange Group (494976303)
