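The idea is to wrap Google's language-detection AJAX web service: pass in a piece of text and get back its language. I picked this approach up from rssmeme.com. In testing it works quite well, and it can be used to detect the language of microblog messages, such as Chinese, Japanese, or Korean. Note, however, that Google resets the connection when requests arrive too frequently, so this web service is not suited to large volumes of rapid-fire requests.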
Visit the link
<a href="http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=hello+world">http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=hello+world</a>
and you will see that the result returned is a JSON string:
{"responsedata": {"language":"en","isreliable":false,"confidence":0.114892714}, "responsedetails": null, "responsestatus": 200}
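The request URL shown above can be assembled programmatically. The following is a minimal sketch in Python 3 syntax (the listing in this post targets Python 2); `build_detect_url` is a hypothetical helper name, and the Japanese sample string is purely illustrative.

```python
from urllib.parse import urlencode

# Endpoint taken from the link in this post.
DETECT_ENDPOINT = "http://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_url(text):
    """Return the detection URL for `text` (hypothetical helper).

    urlencode percent-encodes the query, turning spaces into '+'
    and non-ASCII characters into their UTF-8 percent escapes.
    """
    return DETECT_ENDPOINT + "?" + urlencode({"v": "1.0", "q": text})


print(build_detect_url("hello world"))
print(build_detect_url("こんにちは世界"))  # illustrative non-ASCII sample
```

The first call reproduces the URL from the link above; the second shows that non-ASCII text needs no manual encoding step in Python 3.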
Remember to include the version parameter, v=1.0; otherwise the following JSON is returned:
{"responsedata": null, "responsedetails": "invalid version", "responsestatus": 400}
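Since a failed request still returns a well-formed JSON envelope, it may be worth checking the status before reading the payload. Here is a minimal sketch in Python 3 syntax, assuming the key names exactly as they appear in the two sample responses above; `unwrap_response` and `DetectError` are hypothetical names, not part of any library.

```python
import json


class DetectError(Exception):
    """Raised when the service reports a failure (hypothetical helper)."""


def unwrap_response(body):
    """Return the 'responsedata' payload, or raise DetectError
    carrying the service's own 'responsedetails' message."""
    envelope = json.loads(body)
    if envelope.get("responsestatus") != 200 or envelope.get("responsedata") is None:
        raise DetectError(envelope.get("responsedetails") or "unknown error")
    return envelope["responsedata"]


# The two sample responses quoted above.
ok = '{"responsedata": {"language":"en","isreliable":false,"confidence":0.114892714}, "responsedetails": null, "responsestatus": 200}'
bad = '{"responsedata": null, "responsedetails": "invalid version", "responsestatus": 400}'

print(unwrap_response(ok)["language"])
try:
    unwrap_response(bad)
except DetectError as err:
    print("request rejected:", err)
```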
For example, suppose the microblog message sent for detection is:
After URL-encoding it and submitting it to Google, the result returned is:
{"responsedata": {"language":"ja","isreliable":true,"confidence":0.88555187}, "responsedetails": null, "responsestatus": 200}
This way, result['responsedata']['language'] gives you the language code.
If that code is not "zh-cn", the message is not in Chinese.
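The check just described can be written as a small function over the payload. This is only a sketch: `is_zh_cn` is a hypothetical name, and the optional `require_reliable` flag is an addition of mine that uses the `isreliable` field from the responses; the post itself only compares the language code.

```python
def is_zh_cn(payload, require_reliable=False):
    """Return True when the payload's language code is 'zh-cn'.

    `payload` is the 'responsedata' dict from the service, or None.
    With require_reliable=True (an assumption, not in the post),
    a match also needs the 'isreliable' field to be true.
    """
    if not payload:
        return False
    if payload.get("language") != "zh-cn":
        return False
    if require_reliable and not payload.get("isreliable", False):
        return False
    return True


# Payload copied from the Japanese example above.
japanese = {"language": "ja", "isreliable": True, "confidence": 0.88555187}
print(is_zh_cn(japanese))  # False
```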
Demo:
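import urllib  # Python 2; on Python 3, urlencode lives in urllib.parse
import httplib2

try:
    # The author's own JSON helper; its source is not shown in this post.
    from base import easyjson
except ImportError:
    pass


class detect(object):
    # Not defined in the original listing; taken from the URL shown earlier in the post.
    google_api_prefix = "http://ajax.googleapis.com/ajax/services/language/detect"

    def __init__(self, httplib2_inst=None):
        """An httplib2 instance can be passed in from outside, which makes it
        easy to configure a proxy externally to get past the firewall."""
        self.http = httplib2_inst or httplib2.Http()

    def post_sentence(self, q):
        """Send the text q to the service and return the parsed JSON response."""
        return self._fetch(
            self.google_api_prefix,
            {'v': "1.0", 'q': q}
        )

    def _fetch(self, url, params):
        """Issue a GET request with URL-encoded params and parse the JSON body."""
        request = url + "?" + urllib.urlencode(params)
        resp, content = self.http.request(request, "GET")
        return easyjson.parse_json_func(content)

    def detectzhcn(self, text):
        """Return True if the input text is detected as zh-cn, otherwise False."""
        data = self.post_sentence(text)['responsedata']
        if data:
            language = data['language']
            if language == 'zh-cn':
                return True
        return False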
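Because Google resets the connection when requests come too frequently, a caller may want to space out its calls. The sketch below enforces a minimum gap between requests; `Throttle` is a hypothetical helper, and the one-second interval is an illustrative guess, not a documented limit of the service.

```python
import time


class Throttle(object):
    """Enforce a minimum delay between successive calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)  # illustrative value, not a documented limit
# Intended use, with a detect instance named detector:
#     for message in messages:
#         throttle.wait()
#         detector.detectzhcn(message)
```

Calling `wait()` before each request keeps the pacing logic out of the detection class itself, so the same `detect` instance can be used with or without throttling.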