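I was following the code in a book (《python資料分析入門 從資料擷取到可視化》 by 沈祥壯) to learn web scraping, but got stuck on the error in the title. I tried many things: pip install lxml / pip uninstall lxml, downloading the matching lxml version from the official site and installing it from an absolute path, and so on. None of them fixed it.
Along the way I saw many messages, including this one: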
Requirement already satisfied: lxml in c:\users\許逍遙\appdata\local\programs\python\python37\lib\site-packages (4.4.1)
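The meaning is clear: lxml is already installed for that Python installation, so the problem lies in the PyCharm configuration; most likely the project interpreter PyCharm runs is not the one pip installed lxml into. For the fix, see the article below (mainly the part that reads
“
Attention! This is the key point!
In PyCharm, go to File - Settings - Project Interpreter:
”
):
Problems installing the lxml package in Python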
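To confirm that diagnosis, it can help to run a short check from inside PyCharm and again from the terminal where pip was used, then compare the results. The following is only a minimal sketch using the standard library; the helper name `describe_environment` is made up for illustration and is not part of the referenced article.

```python
import importlib.util
import sys


def describe_environment(package="lxml"):
    """Report which interpreter is running and where `package` would load from.

    `describe_environment` is a hypothetical helper name used for illustration.
    """
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "package": package,
        # None means the package is not importable from this interpreter.
        "location": spec.origin if spec is not None else None,
    }


# Run this in PyCharm and in the terminal, then compare the two outputs.
print(describe_environment())
```

If the two runs print different interpreter paths, or the location is None only inside PyCharm, the project interpreter setting is the likely culprit.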
import requests
from bs4 import BeautifulSoup
url = 'https://book.douban.com/latest'
data = requests.get(url)
#print(data.text)
soup = BeautifulSoup(data.text,'lxml')
books_left = soup.find('ul',{ 'class':'cover-col-4 clearfix' })
books_left = books_left.find_all('li')
books_right = soup.find('ul',{ 'class':'cover-col-4 pl20 clearfix' })
books_right = books_right.find_all('li')
books = list(books_left) + list(books_right)
#print(soup)
img_urls = []
titles = []
ratings = []
authors = []
details = []
for book in books:
    # Cover image URL
    img_url = book.find_all('a')[0].find('img').get('src')
    img_urls.append(img_url)
    # Book title
    title = book.find_all('a')[1].get_text()
    titles.append(title)
    # Star rating
    rating = book.find('p', {'class': 'rating'}).get_text()
    rating = rating.replace('\n', '').replace(' ', '')
    ratings.append(rating)
    # Author and publication info
    author = book.find('p', {'class': 'color-gray'}).get_text()
    author = author.replace('\n', '').replace(' ', '')
    authors.append(author)
    # Book summary
    detail = book.find_all('p')[2].get_text()
    detail = detail.replace('\n', '').replace(' ', '')
    details.append(detail)
print("img_urls: ", img_urls)
print("titles: ", titles)
print("ratings: ", ratings)
print("authors: ", authors)
print("details: ", details)
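While the interpreter setting is being sorted out, one possible stopgap is to choose the parser name at runtime: use lxml when it is importable and otherwise fall back to `html.parser`, the parser BeautifulSoup supports through the standard library. This is a sketch under that assumption, not the book's method; `pick_parser` is a hypothetical helper, and the two parsers can build slightly different trees for malformed HTML.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if that module is importable here, else `fallback`.

    `pick_parser` is a hypothetical helper name used for illustration.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


parser = pick_parser()
print("Using parser:", parser)
# In the script above, the result would replace the hard-coded 'lxml':
# soup = BeautifulSoup(data.text, parser)
```

This only hides the symptom; fixing the project interpreter is still the real solution.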