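Preface: With nothing to do over the weekend, I played a few rounds of Honor of Kings (王者榮耀) and wondered how to get hold of the game's hero skins. This post shows how to batch-download Honor of Kings hero skins with a MATLAB crawler.
Keywords: Honor of Kings, web crawler, MATLAB
First, search Baidu for the Honor of Kings hero list page: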
https://pvp.qq.com/web201605/herolist.shtml
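Open Chrome's developer tools (F12) and find the request that loads the hero data.
Double-click herolist.json and open it in Notepad++ to see the details of each hero.
In MATLAB, the urlread function fetches the hero list. The returned data is a char array, so it is converted into MATLAB data with the jsondecode function: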
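url = 'https://pvp.qq.com/web201605/js/herolist.json';
try
    herolist = urlread(url);
catch
    disp('Failed to fetch the hero list, please try again')
    return  % herolist is undefined here, so jsondecode below would fail
end
jsonData = jsondecode(herolist);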
For each hero, extract the ename (hero number), cname (hero name), and skin_name fields:
[row, col] = size(jsonData);
hero_name = cell(row,1);
hero_number = cell(row,1);
hero_skin_number = cell(row,1);
for i = 1:row
    hero_name{i} = jsonData{i,1}.cname;
    hero_number{i} = jsonData{i,1}.ename;
    try
        % skin_name is a '|'-separated list of skin names
        skin_name = strsplit(jsonData{i,1}.skin_name, '|');
        hero_skin_number{i} = length(skin_name);
    catch
        % entries without a skin_name field are counted as one skin
        hero_skin_number{i} = 1;
    end
end
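The extraction step above can also be sketched in Python as a small pure function, which makes the handling of a missing skin_name field easy to check without touching the network. The function name parse_heroes and the sample entries are hypothetical and only mimic the shape of herolist.json; the real file contains more fields.

```python
import json

def parse_heroes(herolist_text):
    """Return (cname, ename, skin_count) tuples from hero list JSON text."""
    heroes = []
    for entry in json.loads(herolist_text):
        # skin_name is a '|'-separated string; some entries may omit it.
        skin_name = entry.get("skin_name")
        skin_count = len(skin_name.split("|")) if skin_name else 1
        heroes.append((entry["cname"], entry["ename"], skin_count))
    return heroes

# Hypothetical sample in the same shape as herolist.json (values are made up).
sample = (
    '[{"ename": 1, "cname": "HeroA", "skin_name": "SkinX|SkinY"},'
    ' {"ename": 2, "cname": "HeroB"}]'
)
print(parse_heroes(sample))
```

Using entry.get instead of try/except is a design choice of this sketch: it treats both a missing and an empty skin_name as a single skin.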
Next, extract the images. Pick any hero, for example the one at:
https://pvp.qq.com/web201605/herodetail/508.shtml
Right-click the image and choose Inspect,
which reveals the image path:
https://game.gtimg.cn/images/yxzj/img201606/heroimg/508/508-smallskin-2.jpg
From this URL, the image path follows the pattern:
https://game.gtimg.cn/images/yxzj/img201606/heroimg/{hero_number}/{hero_number}-smallskin-{hero_skin_number}.jpg
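As a quick illustration of the pattern above, the following Python sketch fills in the two placeholders. The names BASE, small_skin_url, and skin_urls are hypothetical helpers introduced here, not part of the original code.

```python
# Base path taken from the image URL observed in the browser.
BASE = "https://game.gtimg.cn/images/yxzj/img201606"

def small_skin_url(hero_number, skin_index):
    """Build the small-skin image URL for one hero and one skin index."""
    return f"{BASE}/heroimg/{hero_number}/{hero_number}-smallskin-{skin_index}.jpg"

def skin_urls(hero_number, skin_count):
    """Build the URLs for skins 1..skin_count of one hero."""
    return [small_skin_url(hero_number, k) for k in range(1, skin_count + 1)]

print(small_skin_url(508, 2))
```

Calling small_skin_url(508, 2) reproduces the URL found through Inspect for hero 508.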
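The webread function downloads each image, and imwrite then saves it as a JPG file. Note that the code requests the bigskin image rather than the smallskin one, which lives under a parallel path: skin/hero-info/{hero_number}/{hero_number}-bigskin-{k}.jpg.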
onehero_link = strcat('http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/', ...
    num2str(hero_number{i}), '/', num2str(hero_number{i}), '-bigskin-', num2str(k), '.jpg');
try
    image = webread(onehero_link);
    % write the file only when the download succeeded
    imwrite(image, strcat(file_path_name_, num2str(k), '.jpg'));
catch
    disp(['Failed to fetch image for hero ', hero_name{i}, ', please try again'])
end
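The download-and-save step can be sketched in Python with the network call passed in as a parameter, so that a failed download skips the file instead of aborting. Everything here is an illustrative assumption: save_skins and fake_fetch are hypothetical names, and in real use fetch would wrap an HTTP request (for example with requests, whose exceptions derive from OSError).

```python
import os
import tempfile

def save_skins(root, hero_name, skin_count, fetch):
    """Save skins 1..skin_count under root/hero_name; return the paths written."""
    hero_dir = os.path.join(root, hero_name)
    os.makedirs(hero_dir, exist_ok=True)  # one folder per hero
    written = []
    for k in range(1, skin_count + 1):
        try:
            data = fetch(k)  # expected to return the image bytes
        except OSError as err:
            print(f"skin {k} of {hero_name} failed: {err}")
            continue  # skip this skin, keep going with the rest
        path = os.path.join(hero_dir, f"{k}.jpg")
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)
    return written

def fake_fetch(k):
    """Stand-in for a real download; fails on skin 2 to simulate a bad network."""
    if k == 2:
        raise OSError("simulated network error")
    return b"fake image bytes"

with tempfile.TemporaryDirectory() as root:
    paths = save_skins(root, "HeroA", 3, fake_fetch)
    saved_names = [os.path.basename(p) for p in paths]
print(saved_names)
```

Passing fetch in as an argument keeps the file handling testable offline.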
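Both urlread and webread can throw errors when the network connection is poor, so each call is wrapped in try/catch to keep the program from aborting. Each hero's images are saved in a folder named after the hero, created with mkdir.
Complete code: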
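The original code only prints a message when a request fails. One possible extension, which is not part of the original, is to retry a few times before giving up. The sketch below assumes a hypothetical helper with_retries and uses a simulated flaky function in place of a real download.

```python
import time

def with_retries(action, attempts=3, delay=0.0):
    """Call action(); on OSError retry up to `attempts` times in total."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay * attempt)  # wait a little longer each time
    raise last_error

# Simulated download that fails twice and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated timeout")
    return "ok"

result = with_retries(flaky)
print(result, calls["n"])
```

A real download would be wrapped the same way, for example with_retries(lambda: fetch(url), delay=1.0), where fetch is whatever function performs the request.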
url = 'https://pvp.qq.com/web201605/js/herolist.json';
try
    herolist = urlread(url);
catch
    disp('Failed to fetch the hero list, please try again')
    return  % nothing to decode without the hero list
end
jsonData = jsondecode(herolist);

% collect name, number and skin count for every hero
[row, col] = size(jsonData);
hero_name = cell(row,1);
hero_number = cell(row,1);
hero_skin_number = cell(row,1);
for i = 1:row
    hero_name{i} = jsonData{i,1}.cname;
    hero_number{i} = jsonData{i,1}.ename;
    try
        skin_name = strsplit(jsonData{i,1}.skin_name, '|');
        hero_skin_number{i} = length(skin_name);
    catch
        hero_skin_number{i} = 1;
    end
end

% download every skin into a folder named after the hero
savepath = 'C:\Users\xxx\Documents\MATLAB';
for i = 1:row
    file_name = hero_name{i};
    file_path_name = strcat(savepath,'\',file_name);
    file_path_name_ = strcat(file_path_name,'\');
    mkdir(file_path_name_);
    for k = 1:hero_skin_number{i}
        onehero_link = strcat('http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/', ...
            num2str(hero_number{i}), '/', num2str(hero_number{i}), '-bigskin-', num2str(k), '.jpg');
        try
            image = webread(onehero_link);
            % write the file only when the download succeeded
            imwrite(image, strcat(file_path_name_, num2str(k), '.jpg'));
        catch
            disp(['Failed to fetch image for hero ', hero_name{i}, ', please try again'])
        end
    end
end
The final results are shown below.
I'll close this post with my favorite hero, Cheng Yaojin (程咬金).
For reference, the same steps in Python:
import os
import requests

url = 'https://pvp.qq.com/web201605/js/herolist.json'
herolist = requests.get(url)  # fetch the hero list JSON file
herolist_json = herolist.json()  # parse the JSON response once and reuse it
hero_name = [x['cname'] for x in herolist_json]  # hero names
hero_number = [x['ename'] for x in herolist_json]  # hero numbers
hero_skin_number = []
for i in herolist_json:
    try:
        hero_skin_number.append(len(i['skin_name'].split("|")))
    except KeyError:
        # entries without skin_name are counted as one skin
        hero_skin_number.append(1)


# download the images
def downloadPic():
    for name, number, skin_count in zip(hero_name, hero_number, hero_skin_number):
        # create one folder per hero; do not fail if it already exists
        os.makedirs(name, exist_ok=True)
        for k in range(1, skin_count + 1):
            # build the image URL
            onehero_link = ('http://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
                            + str(number) + '/' + str(number) + '-bigskin-' + str(k) + '.jpg')
            print(onehero_link)
            im = requests.get(onehero_link)  # request the image
            if im.status_code == 200:
                # write the image bytes to <hero folder>/<k>.jpg
                with open(os.path.join(name, str(k) + '.jpg'), 'wb') as f:
                    f.write(im.content)


downloadPic()