
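Querying the Weather Forecast with Python

Contents: 1. Implementation Process, 2. Implementation Techniques, 3. Implementation, 4. Summary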

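1. Implementation Process

1.1 Look up the external IP

The external IP can be obtained from this URL: http://ip.dnsexit.com/index.php

1.2 Look up the province and city of the IP

The province and city of an IP can be obtained from this address: http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=json&ip=54.54.194.134

1.3 Look up the weather URL for the city

Given the province and city, find the URL of that city's weather data.

1.4 Look up the weather for the city

Fetch the weather information from the city's weather URL, for example:

http://m.weather.com.cn/data/101280101.html

(The previous three steps all exist to prepare for this one.)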

2. Implementation Techniques

2.1 Fetching and extracting web data

All web data is fetched with Python and then parsed with regular expressions, BeautifulSoup, or json.
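
As a small illustration of the regular-expression option, the sketch below pulls an IPv4 address out of a fetched page. It is written for Python 3 (the listings in this post target Python 2.7), and the function name extract_ipv4 and the sample page text are made up for the example.

```python
import re

# Four dot-separated groups of 1-3 digits; the 0-255 range is checked separately.
IPV4_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def extract_ipv4(text):
    """Return the first plausible IPv4 address found in text, or None."""
    for match in IPV4_PATTERN.finditer(text):
        if all(0 <= int(part) <= 255 for part in match.groups()):
            return match.group(0)
    return None


# Hypothetical response body, used only to exercise the function.
sample_page = "<html><body>\n 54.54.194.134 \n</body></html>"
print(extract_ipv4(sample_page))
```

Checking the numeric range after the match keeps the pattern short and rejects strings such as 999.1.1.1 that merely look like addresses.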

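2.2 Getting the city weather URL

This uses the information on http://www.weather.com.cn/

First get the URL of the city's province, then use the province page to get the URL of the city.

2.3 Getting the weather information

This also uses the information on http://www.weather.com.cn/

The response is parsed with json.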

3. Implementation

3.1 Look up the external IP

This one is simple.

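#!/usr/bin/env python
# coding=utf-8
# Python 2.7.3
# File: GetIP.py
# Get the external (public) IP address
import urllib2

def GetIP():
    # The response body is used directly as the IP address, so strip any
    # surrounding whitespace before it is appended to another URL.
    response = urllib2.urlopen('http://ip.dnsexit.com/index.php')
    htmlStr = response.read()
    return htmlStr.strip()

'''
# Test code
print GetIP()
'''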
           

3.2 Get the province and city of the IP

This one is also simple.

#!/usr/bin/env python
# coding=utf-8
# Python 2.7.3
# File: GetCity.py
# Get the country/province/city of an IP address
import urllib2
import json

'''
Structure of the returned data
{"ret":1,"start":"54.52.163.0","end":"54.57.3.255","country":"美國","province":"紐澤西州","city":"Woodbridge","district":"","isp":"聯通","type":"","desc":""}
'''
def GetCity(ip, city):
    # Fills the 3-element list `city` in place: [country, province, city]
    response = urllib2.urlopen('http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=json&ip=' + ip)
    htmlStr = response.read()
    # json.loads decodes \uXXXX escapes itself; a separate "unicode-escape"
    # decode would also unescape \" and could corrupt the JSON text.
    st = json.loads(htmlStr)
    city[0] = st["country"]
    city[1] = st["province"]
    city[2] = st["city"]

'''
# Test code
city = ["", "", ""]
GetCity("54.54.194.134", city)
print city
'''
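
The parsing step can be tried without a network connection. The sketch below is a Python 3 variant that returns a tuple instead of filling a caller-supplied list; the name parse_location is hypothetical, and the input is a shortened copy of the sample response shown above with the country written as \uXXXX escapes to show that json.loads decodes them on its own.

```python
import json


def parse_location(raw):
    """Return (country, province, city) from the IP lookup's JSON text."""
    info = json.loads(raw)
    # Missing fields fall back to "" so callers can test for empty values.
    return info.get("country", ""), info.get("province", ""), info.get("city", "")


# Shortened sample; the raw string keeps the backslashes for json.loads to decode.
raw = r'{"ret":1,"country":"\u7f8e\u570b","province":"紐澤西州","city":"Woodbridge"}'
country, province, city = parse_location(raw)
print(country, province, city)
```

Returning a tuple keeps the function free of side effects, which makes it easier to test than the in-place list version.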
           

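3.3 Get the weather URL of the city

First get the province information, then look up the city information.

3.3.1 Get the province URL

Look at this page: http://www.weather.com.cn/textFC/hb.shtml

Analyzing the HTML (saved as GetCityID1.html) shows that <div class="lqcontentBoxheader"> contains the province information. Because this HTML is fairly complex and some strings in it are not unique, BeautifulSoup is used here to parse it.

3.3.2 Get the city URL

The previous step yields a province URL, for example http://www.weather.com.cn/textFC/xizang.shtml

Analyzing the HTML (saved as GetCityID2.html) shows that <div class="hanml"> contains the city URL information. BeautifulSoup is used here as well.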
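
To make the idea of "links inside a particular div" concrete, here is a minimal Python 3 sketch. It swaps in the standard library's html.parser for BeautifulSoup so that it runs without extra packages. The class name DivLinkCollector and the sample markup are invented for illustration; the real page is much larger and may be structured differently.

```python
from html.parser import HTMLParser


class DivLinkCollector(HTMLParser):
    """Collect (text, href) pairs for <a> tags inside <div class="...">."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0            # div nesting depth inside the target; 0 = outside
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif self.div_class in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and "href" in attrs:
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


# Invented sample markup: one link outside the target div, one inside it.
sample_html = """
<div class="nav"><a href="/ignored.shtml">Home</a></div>
<div class="hanml">
  <div><a href="http://www.weather.com.cn/weather/101280101.shtml">廣州</a></div>
</div>
"""
collector = DivLinkCollector("hanml")
collector.feed(sample_html)
city_links = dict(collector.links)
print(city_links["廣州"])
```

Tracking the nesting depth is what lets the collector ignore links outside the target div while still handling divs nested inside it.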

The URLs obtained here have the format http://www.weather.com.cn/weather/101280101.shtml and need to be changed to the format http://m.weather.com.cn/data/101280101.html
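
One way to do this rewrite is to extract the numeric city ID and rebuild the URL, which fails loudly when the input does not have the expected shape. This is a Python 3 sketch; the function name to_data_url is an assumption for the example.

```python
import re

# Expected shape of a city page URL; the digits are the city ID.
CITY_PAGE = re.compile(r"^https?://www\.weather\.com\.cn/weather/(\d+)\.shtml$")


def to_data_url(page_url):
    """Convert a city page URL into the matching m.weather.com.cn data URL."""
    match = CITY_PAGE.match(page_url)
    if match is None:
        raise ValueError("unexpected city URL: %r" % page_url)
    return "http://m.weather.com.cn/data/%s.html" % match.group(1)


print(to_data_url("http://www.weather.com.cn/weather/101280101.shtml"))
```

Compared with chained string replacement, the pattern rejects URLs that only partly match, so a layout change on the site surfaces as an error rather than as a wrong URL.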

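3.3.3 Implementation code

This code has one obvious drawback: it runs very slowly. Partly the site returns a lot of data, and partly BeautifulSoup is somewhat slow at parsing it (there is too much HTML). Fetching these URLs dynamically on every run is therefore slow; downloading them once and saving them locally is also a good approach.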
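
The "save locally" idea could look roughly like the following Python 3 sketch, which stores resolved city URLs in a JSON file and only calls the slow lookup on a cache miss. The names load_cache and cached_city_url, the cache file name, and the fake lookup function are all assumptions for the example; in practice the lookup would wrap the two functions in the listing below.

```python
import json
import os
import tempfile


def load_cache(path):
    """Return the cached {"province/city": url} mapping, or {} if absent."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def cached_city_url(path, province, city, lookup):
    """Return the city's URL, calling lookup(province, city) only on a miss."""
    cache = load_cache(path)
    key = province + "/" + city
    if key not in cache:
        cache[key] = lookup(province, city)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(cache, handle, ensure_ascii=False)
    return cache[key]


calls = []


def fake_lookup(province, city):
    # Stand-in for the slow network lookup; records each invocation.
    calls.append((province, city))
    return "http://m.weather.com.cn/data/101280101.html"


cache_path = os.path.join(tempfile.mkdtemp(), "city_urls.json")
first = cached_city_url(cache_path, "廣東", "廣州", fake_lookup)
second = cached_city_url(cache_path, "廣東", "廣州", fake_lookup)
print(first == second, len(calls))  # True 1
```

The second call is served from the file, so the slow lookup runs only once per city.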

#!/usr/bin/env python
# coding=utf-8
# Python 2.7.3
# File: GetCityID.py
# Get the weather URL of a city
import urllib2
from bs4 import BeautifulSoup

def GetProvinceURL(province):
    response = urllib2.urlopen('http://www.weather.com.cn/textFC/hn.shtml')
    htmlByte = response.read()
    htmlStr = htmlByte.decode("utf8")

    # Name the parser explicitly so the result does not depend on
    # which parsers happen to be installed.
    soup2 = BeautifulSoup(htmlStr, "html.parser")
    div = soup2.find("div", class_ = "lqcontentBoxheader")
    lista = div.find_all("a")
    # If no link text matches, the bare site URL is returned unchanged.
    provinceURL = "http://www.weather.com.cn"
    for aItem in lista:
        if aItem.text == province:
            provinceURL = provinceURL + aItem["href"]
            break

    return provinceURL

def GetCityURL(provinceURL, city):
    response = urllib2.urlopen(provinceURL)
    htmlByte = response.read()
    htmlStr = htmlByte.decode("utf8")

    soup2 = BeautifulSoup(htmlStr, "html.parser")
    div = soup2.find("div", class_ = "hanml")
    # Raises IndexError below if the city is not listed on the province page.
    lista = div.find_all("a", text = city)
    # Rewrite the city page URL into the m.weather.com.cn data URL.
    cityURL = lista[0]["href"].replace("www.weather.com.cn/weather", "m.weather.com.cn/data")
    cityURL = cityURL.replace("shtml", "html")
    return cityURL

# Test code; guarded so that importing this module does not run it.
if __name__ == "__main__":
    provinceURL = GetProvinceURL(u"廣東")
    print provinceURL
    cityURL = GetCityURL(provinceURL, u"廣州")
    print cityURL
           

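3.4 Getting the weather data

3.4.1 Parsing the weather data

The data returned by http://m.weather.com.cn/data/101280101.html is in JSON format and needs to be parsed. (Once you have this data, you can display it however you like.)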

{"weatherinfo":{"city":"廣州","city_en":"guangzhou","date_y":"2013年11月29日","date":"","week":"星期五","fchh":"11","cityid":"101280101","temp1":"18℃~5℃","temp2":"20℃~7℃","temp3":"21℃~8℃","temp4":"21℃~9℃","temp5":"22℃~10℃","temp6":"23℃~10℃","tempF1":"64.4℉~41℉","tempF2":"68℉~44.6℉","tempF3":"69.8℉~46.4℉","tempF4":"69.8℉~48.2℉","tempF5":"71.6℉~50℉","tempF6":"73.4℉~50℉","weather1":"晴","weather2":"晴","weather3":"晴","weather4":"晴","weather5":"晴","weather6":"晴","img1":"0","img2":"99","img3":"0","img4":"99","img5":"0","img6":"99","img7":"0","img8":"99","img9":"0","img10":"99","img11":"0","img12":"99","img_single":"0","img_title1":"晴","img_title2":"晴","img_title3":"晴","img_title4":"晴","img_title5":"晴","img_title6":"晴","img_title7":"晴","img_title8":"晴","img_title9":"晴","img_title10":"晴","img_title11":"晴","img_title12":"晴","img_title_single":"晴","wind1":"北風3-4級轉微風","wind2":"微風","wind3":"微風","wind4":"微風","wind5":"微風","wind6":"微風","fx1":"北風","fx2":"微風","fl1":"3-4級轉小于3級","fl2":"小于3級","fl3":"小于3級","fl4":"小于3級","fl5":"小于3級","fl6":"小于3級","index":"較冷","index_d":"建議着大衣、呢外套加毛衣、衛衣等服裝。體弱者宜着厚外套、厚毛衣。因晝夜溫差較大,注意增減衣服。","index48":"較冷","index48_d":"建議着大衣、呢外套加毛衣、衛衣等服裝。體弱者宜着厚外套、厚毛衣。因晝夜溫差較大,注意增減衣服。","index_uv":"中等","index48_uv":"中等","index_xc":"适宜","index_tr":"适宜","index_co":"舒适","st1":"16","st2":"6","st3":"19","st4":"8","st5":"20","st6":"9","index_cl":"不宜","index_ls":"适宜","index_ag":"易發"}}
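
The forecast fields in this response are numbered per day (temp1 to temp6, weather1 to weather6), and each temperature is a string such as "18℃~5℃". As one possible way to turn them into values a program can work with, the Python 3 sketch below parses a trimmed copy of the sample. The names parse_temp_range and daily_forecast are hypothetical, and because the order of the two temperatures is not documented here, the code sorts them rather than assuming which comes first.

```python
import re

# Matches strings such as "18℃~5℃"; negative values are allowed.
TEMP_RANGE = re.compile(r"(-?\d+)℃~(-?\d+)℃")


def parse_temp_range(text):
    """Return (low, high) in degrees Celsius from a string like "18℃~5℃"."""
    match = TEMP_RANGE.fullmatch(text)
    if match is None:
        raise ValueError("unexpected temperature format: %r" % text)
    first, second = int(match.group(1)), int(match.group(2))
    return min(first, second), max(first, second)


def daily_forecast(weatherinfo, days):
    """Collect (day, low, high, weather) from temp1..tempN and weather1..weatherN."""
    rows = []
    for day in range(1, days + 1):
        low, high = parse_temp_range(weatherinfo["temp%d" % day])
        rows.append((day, low, high, weatherinfo["weather%d" % day]))
    return rows


# Trimmed copy of the sample response above (first three days only).
sample = {
    "city": "廣州",
    "temp1": "18℃~5℃", "temp2": "20℃~7℃", "temp3": "21℃~8℃",
    "weather1": "晴", "weather2": "晴", "weather3": "晴",
}
for day, low, high, weather in daily_forecast(sample, days=3):
    print(day, low, high, weather)
```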

#!/usr/bin/env python
# coding=utf-8
# Python 2.7.3
# File: GetCityWeather.py
# Get the weather data of a city
import urllib2
import json

def GetCityWeather(cityURL):
    # Returns the decoded JSON as a dict; the forecast is under "weatherinfo".
    # For a sample response, see the JSON shown in section 3.4.1.
    response = urllib2.urlopen(cityURL)
    htmlByte = response.read()
    htmlStr = htmlByte.decode("utf8")
    st = json.loads(htmlStr)
    return st

'''
# GetCityWeather test code
cityURL = "http://m.weather.com.cn/data/101280101.html"
st = GetCityWeather(cityURL)
ss = st["weatherinfo"]
print ss["city"]
print ss["date_y"]
print ss["week"]
print ss["temp1"]
print ss["weather1"]
'''
'''
# Output
廣州
2013年11月29日
星期五
18℃~5℃
晴
'''
           

3.5 Main program

#!/usr/bin/env python
# coding=utf-8
# Python 2.7.3
import GetIP
import GetCity
import GetCityID
import GetCityWeather


# Step 1: external IP
ip = GetIP.GetIP()
print ip

# Step 2: country/province/city of that IP
city = ["", "", ""]
GetCity.GetCity(ip, city)
print city[0], city[1], city[2]

# Step 3: province page URL, then the city's weather data URL
provinceURL = GetCityID.GetProvinceURL(city[1])
cityURL = GetCityID.GetCityURL(provinceURL, city[2])
print provinceURL
print cityURL

# Step 4: fetch and print the weather
st = GetCityWeather.GetCityWeather(cityURL)
ss = st["weatherinfo"]
print ss["city"]
print ss["date_y"]
print ss["week"]
print ss["temp1"]
print ss["weather1"]
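
The main program passes whatever province and city the IP lookup returns straight to the weather.com.cn lookups. Since that site is the only data source, a guard before the slow lookups can stop early when the IP resolves somewhere the site is unlikely to cover. The Python 3 sketch below is one possible shape for such a guard. The function name require_supported_location is hypothetical, and the spellings of "China" in SUPPORTED_COUNTRIES are an assumption about what the lookup service returns, not something the listings above confirm.

```python
# Assumed spellings of "China" in the lookup's "country" field (unverified).
SUPPORTED_COUNTRIES = {"中国", "中國"}


def require_supported_location(country, province, city):
    """Return (province, city), or raise LookupError if no lookup is possible."""
    if country not in SUPPORTED_COUNTRIES:
        raise LookupError("no weather.com.cn data expected for country %r" % country)
    if not province or not city:
        raise LookupError("IP lookup returned no province or city")
    return province, city


# Values taken from the sample response shown in section 3.2.
try:
    require_supported_location("美國", "紐澤西州", "Woodbridge")
except LookupError as error:
    print("skipping weather lookup:", error)
```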
           

These two calls run very slowly:

provinceURL = GetCityID.GetProvinceURL(city[1])

cityURL = GetCityID.GetCityURL(provinceURL, city[2])

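4. Summary

4.1 Cities outside China may not be found, because the weather data depends on http://www.weather.com.cn/

4.2 Getting the city URL is really too slow. Extracting and saving the URLs in advance would probably be faster.

4.3 Implementing this feature was a way to learn about json.

4.4 There is a lot of useful data online, especially dynamic, large-volume data that you cannot collect by hand. It comes down to whether you can scrape it and whether you can find the pattern in it; both skills matter.

4.5 This article is based on http://blog.csdn.net/x_iya/article/details/8583015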