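RHTMLForms is not available for newer versions of R. The workaround is to install it from GitHub:
library(devtools)   # install_github() is provided by devtools
install_github("omegahat/RHTMLForms")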
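Submitting a form (the "http://" prefix must not be omitted):
library(RHTMLForms)
library(XML)   # getHTMLLinks(), used below, comes from the XML package
u = "http://www.bing.com"
form = getHTMLFormDescription(u)[[1]]; form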
which gives:
HTML Form: http://cn.bing.com/search
q:
Create a function that submits the form:
bing_search = createFunction(form)
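For a GET form like this one, submitting it amounts to requesting the form's action URL with the q field appended as a query parameter. The sketch below shows that idea in base R. It assumes the form is sent with GET to http://cn.bing.com/search, the action shown in the description above. bing_search_url() is a hypothetical helper and is not part of RHTMLForms. It only builds the URL and fetches nothing:

```r
# Hypothetical helper: build the URL that a GET submission of the Bing form
# would request. Assumes GET and the action URL printed by the form description.
bing_search_url = function(q, action = "http://cn.bing.com/search") {
  # reserved = TRUE also escapes characters such as & and = inside the keyword,
  # so they cannot be mistaken for extra query parameters
  paste0(action, "?q=", URLencode(q, reserved = TRUE))
}

bing_search_url("rstudio")
bing_search_url("r studio")
```

The function produced by createFunction() also sends the request and returns the result page. This sketch only shows which URL such a submission corresponds to.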
Now any search keyword can be submitted through bing_search(). Finally, extract the links from the result page with:
getHTMLLinks(bing_search("rstudio"))
This returns (excerpt):
[36] "http://www.liangchan.net/liangchan/1123.html"
[37] "https://rstudio.org/"
[38] "http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=zh-CN&dl=zh&lp=EN_ZH-CHS&a=https%3a%2f%2frstudio.org%2f"
Of these, elements [13] through [81] are the valid result links.
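Selecting elements by position ([13] to [81]) stops working once the page layout changes. The following is a position-independent filter sketched in base R. It assumes the wanted links are absolute http(s) URLs that do not point back to Bing or Microsoft services. The excluded domains in the pattern are an assumption and may need adjusting. filter_result_links() is a hypothetical helper:

```r
# Hypothetical helper: keep absolute http(s) links whose host is not a
# Bing/Microsoft domain. Which domains count as "noise" is an assumption.
filter_result_links = function(links) {
  absolute = grepl("^https?://", links)
  # extract the host part; non-matching (relative) links are dropped via 'absolute'
  host = sub("^https?://([^/?#]+).*$", "\\1", links)
  own = grepl("(^|\\.)(bing|microsoft|microsofttranslator)\\.com$", host)
  unique(links[absolute & !own])
}

links = c("/search?q=rstudio&FORM=QSRE1",
          "http://www.liangchan.net/liangchan/1123.html",
          "https://rstudio.org/",
          "http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro")
filter_result_links(links)
```

On real results it would be applied as filter_result_links(getHTMLLinks(bing_search("rstudio"))).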
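What if we only want to extract the links we need? An XPath query gives a more precise result, but it also loses a good deal of information (how should that be handled?)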
txt = bing_search("rstudio")   # HTML of the result page
xpq = "//a/@href[starts-with(., '/search?q=rstudio')]"
getHTMLLinks(txt, xpQuery = xpq)
[1] "/search?q=rstudio&qs=ds&intlF=1&FORM=TIPEN1"
[2] "/search?q=rstudio&qs=ds&intlF=&upl=zh-chs&FORM=TIPCN1"
[3] "/search?q=rstudio+%e4%b8%ad%e6%96%87%e4%b9%b1%e7%a0%81&FORM=QSRE1"
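The links matched by this XPath are relative and URL-encoded, so the related search terms they carry are hard to read. One way to recover them is sketched below in base R. It assumes the keyword is always in the q parameter, as in the output above, and that http://cn.bing.com (the host from the form description) is the right base for the absolute URLs. related_terms() is a hypothetical helper:

```r
# Hypothetical helper: pull the q=... value out of each relative search link
# and decode it into a readable search term.
related_terms = function(links) {
  q = sub("^.*[?&]q=([^&]*).*$", "\\1", links)   # the value of the q parameter
  q = gsub("+", " ", q, fixed = TRUE)            # '+' encodes a space in query strings
  vapply(q, URLdecode, character(1), USE.NAMES = FALSE)
}

base = "http://cn.bing.com"   # assumed host, taken from the form description
paths = c("/search?q=rstudio&qs=ds&intlF=1&FORM=TIPEN1",
          "/search?q=rstudio+%e4%b8%ad%e6%96%87%e4%b9%b1%e7%a0%81&FORM=QSRE1")
data.frame(term = related_terms(paths), url = paste0(base, paths))
```

In a UTF-8 locale the second term should print as the Chinese keyword. In other locales the decoded bytes may display differently.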