Learning Web Scraping Without a Network: Scraping Python Job Listings from Zhilian Zhaopin (智聯招聘) into an Excel Spreadsheet

2023-08-05 23:00:21

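Can you practice web scraping without a network connection? Of course you can. You just need to find somewhere with a connection first, open the page you want to scrape, and locate the content you want to extract. In this post I extract Python job listings from Zhilian Zhaopin, including the job title, company name, salary, location, required experience, education, company type, number of openings, and company benefits.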

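1. Preparation before scraping

(1) Go somewhere with a network connection and open the page you want to scrape.

(2) Locate the content you want to extract.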

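(3) Save the page source to a local file. There is no need to save the whole page; it is better to select only the part you need. The Python search results on Zhilian Zhaopin span two pages, and I saved both together in G:/20190720_zhilianzhaopin.html. You can simply copy and paste the parts you need.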
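
If you prefer to save the page with a script instead of copying it by hand, the "download once while online, parse offline later" idea can be sketched with the standard library as follows. This is a minimal sketch, not the method used in this post: `save_page` is a hypothetical helper, and the browser-like `User-Agent` header is an assumption. Search results on job sites are often rendered by JavaScript, in which case the raw HTML returned by `urlopen` may not contain the listings at all, and copying from the browser as described in this step remains the reliable route.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def save_page(url: str, path: str) -> Path:
    """Download url once and store its raw HTML at path for offline use.

    Hypothetical helper: the User-Agent value is an assumption, not
    something the original post specifies.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        raw = resp.read()
    target = Path(path)
    # Keep the original bytes; the parser can detect the encoding later
    target.write_bytes(raw)
    return target
```

Run `save_page(search_url, 'G:/20190720_zhilianzhaopin.html')` while you are online, where `search_url` stands for the address of the page you opened in step (1).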

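(4) Create a new file in Notepad, press Ctrl+V to paste the content you just copied into a skeleton like the one below, and save it with an .html extension using UTF-8 encoding: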

<html>
<head>
<meta charset="utf-8">
<title>Zhilian Zhaopin Python listings</title>
</head>
<body>
Paste the copied content here; several pages can be combined into one HTML file.

</body>
</html>
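
When there are several saved fragments (for example one per results page), step (4) can also be scripted. The sketch below wraps any number of fragment files in the same skeleton; `merge_fragments`, its default title, and the assumption that each fragment is a UTF-8 text file are all illustrative, not part of the original workflow.

```python
from pathlib import Path

SKELETON = """<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def merge_fragments(fragment_paths, out_path, title="Zhilian Zhaopin Python listings"):
    """Put saved HTML fragments inside one <body> and write a single page."""
    # Read each fragment as UTF-8, keeping the order in which they are given
    parts = [Path(p).read_text(encoding="utf-8") for p in fragment_paths]
    # Fragments are substituted as values, so braces inside them are harmless
    page = SKELETON.format(title=title, body="\n".join(parts))
    Path(out_path).write_text(page, encoding="utf-8")
    return page
```

For example, `merge_fragments(['page1.html', 'page2.html'], 'G:/20190720_zhilianzhaopin.html')` would produce a single page like the one built by hand in this step; the two fragment file names are hypothetical.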

(5) Now you can access the local file directly and practice scraping wherever you like, with no network required!
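
Reading the saved copy needs nothing beyond the standard library. `urllib.request.urlopen` accepts `file://` URLs, and `pathlib.Path.as_uri()` builds one from an absolute path, which avoids hand-writing strings such as `file:///G:/20190720_zhilianzhaopin.html`; a plain `open()` call works just as well. A minimal sketch, where `load_local_page` is a hypothetical helper:

```python
from pathlib import Path
from urllib.request import urlopen


def load_local_page(path: str) -> bytes:
    """Return the raw bytes of a saved page through a file:// URL (no network used)."""
    # On Windows, G:/20190720_zhilianzhaopin.html becomes file:///G:/20190720_zhilianzhaopin.html
    url = Path(path).resolve().as_uri()
    with urlopen(url) as resp:
        return resp.read()
```

Passing the returned bytes to `BeautifulSoup(..., "html.parser")` lets the parser detect the encoding, the same way the scraping script in the next section does with its `file:///` URL.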

2. Scraping code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from urllib.request import urlopen

from bs4 import BeautifulSoup
import xlwt

url = 'file:///G:/20190720_zhilianzhaopin.html'  # path to the local copy of the page
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
all_page = []

# Scraping loop: one iteration per job posting
for tag in soup.find_all(attrs={"class": "contentpile__content__wrapper__item clearfix"}):
    # Reset every field so a posting with a missing element gets '' instead of
    # a NameError or a value left over from the previous posting
    gsmc = xzdy = dz = jl = xl = gsxz = zprs = gsfl = ''

    gzmc = tag.span.get('title')
    print('Job title:', gzmc)

    for d in tag.find_all(attrs={"class": "contentpile__content__wrapper__item__info__box__cname__viplevel is_vipLevel"}):
        gsmc = d.get('alt')
        print('Company:', gsmc)

    for p in tag.find_all(attrs={"class": "contentpile__content__wrapper__item__info__box__job__saray"}):
        xzdy = p.get_text()
        print('Salary:', xzdy)

    # Job requirements: location, experience, education
    for ul in tag.find_all(attrs={"class": "contentpile__content__wrapper__item__info__box__job__demand"}):
        items = ul.find_all('li')
        dz = items[0].get_text()
        jl = items[1].get_text().replace("\n", "").replace(" ", "")
        xl = items[-1].get_text()
        print('Location:', dz)
        print('Experience:', jl)
        print('Education:', xl)

    for comdec in tag.find_all(attrs={"class": "contentpile__content__wrapper__item__info__box__job__comdec"}):
        spans = comdec.find_all('span')
        gsxz = spans[0].get_text()
        zprs = spans[-1].get_text()
        print('Company type:', gsxz)
        print('Openings:', zprs)

    for welfare in tag.find_all(attrs={"class": "contentpile__content__wrapper__item__info__box__welfare job_welfare"}):
        gsfl = welfare.get_text()
        print('Benefits:', gsfl)
    print()

    all_page.append([gzmc, gsmc, xzdy, dz, jl, xl, gsxz, zprs, gsfl])

# Write the collected rows to Excel once, after the loop has finished
book = xlwt.Workbook(encoding='utf-8')
sheet = book.add_sheet('Python jobs')
head = ['Job title', 'Company', 'Salary', 'Location', 'Experience',
        'Education', 'Company type', 'Openings', 'Benefits']
for col, title in enumerate(head):
    sheet.write(0, col, title)

# Row 0 holds the header, so data rows start at row 1
for row_idx, row in enumerate(all_page, start=1):
    for col, value in enumerate(row):
        sheet.write(row_idx, col, value)
book.save('D:/Python/智聯招聘python就業公司情況.xls')
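
The nested loop at the end of the script writes a header in row 0 and one posting per row below it. If `xlwt` is not installed, the same layout can be produced with the standard `csv` module; note that this swaps in a different technique (a CSV file rather than a genuine .xls workbook). Writing with the `utf-8-sig` encoding adds a byte-order mark, which Excel generally relies on to open UTF-8 CSV files without garbling the Chinese text. `save_rows_csv` and its English `HEADER` are hypothetical.

```python
import csv

HEADER = ['Job title', 'Company', 'Salary', 'Location', 'Experience',
          'Education', 'Company type', 'Openings', 'Benefits']


def save_rows_csv(rows, path, header=HEADER):
    """Write header + rows to a CSV file that Excel can open directly."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(header)  # row 0: column titles
        writer.writerows(rows)   # rows 1..n: one job posting each
```

Calling `save_rows_csv(all_page, 'python_jobs.csv')` after the scraping loop would replace the `xlwt` section; the output file name is only an example.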

Run output:

[Screenshot of the console output]

Excel output:

[Screenshots of the generated Excel file]