form submit

RHTMLForms is not available for newer versions of R. The workaround is to install it from GitHub:

library(devtools)   # provides install_github()
install_github("omegahat/RHTMLForms")

To submit a form, first read its description from the page. The http:// prefix of the URL must not be omitted.

library(RHTMLForms)
u = "http://www.bing.com"
form = getHTMLFormDescription(u)[[1]]   # first form on the page
form

This gives:

HTML Form: http://cn.bing.com/search 
q:       

Create a function that submits the form:

bing_search = createFunction(form)
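
To give a rough idea of what such a generated function does with the q field, here is a minimal sketch in base R. It is not the code RHTMLForms produces: build_search_url is a hypothetical helper, the action URL is taken from the form description above, and the sketch assumes the form is submitted as a GET request.

```r
# Hypothetical stand-in for the generated function: it only builds the URL
# of the request and does not contact the server.
build_search_url <- function(q, action = "http://cn.bing.com/search") {
  # Percent-encode the keyword so spaces and reserved characters are safe.
  paste0(action, "?q=", URLencode(q, reserved = TRUE))
}

print(build_search_url("rstudio"))
print(build_search_url("rstudio shiny"))
```

The function returned by createFunction() goes further than this sketch: it also sends the request and returns the resulting page.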
      

bing_search() can now submit any search keyword. Finally, extract the links from the returned page with:

library(XML)   # provides getHTMLLinks()
getHTMLLinks(bing_search("rstudio"))

This returns (excerpt):

[36] "http://www.liangchan.net/liangchan/1123.html"

[37] "https://rstudio.org/"

[38] "http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=zh-CN&dl=zh&lp=EN_ZH-CHS&a=https%3a%2f%2frstudio.org%2f"

Entries [13] through [81] in the middle are the useful links.
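
The index range will differ from one search to the next, so selecting by pattern may be more robust than selecting by position. A minimal sketch in base R, assuming the useful links are the absolute http(s) ones and that the translator links are unwanted; the links vector below is a hypothetical sample, with the first two entries made up for illustration:

```r
# Hypothetical sample of what getHTMLLinks() might return. The first two
# entries are invented; the rest are taken from the outputs in this post.
links <- c(
  "/account/general",
  "javascript:void(0)",
  "http://www.liangchan.net/liangchan/1123.html",
  "https://rstudio.org/",
  "http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=zh-CN&dl=zh&lp=EN_ZH-CHS&a=https%3a%2f%2frstudio.org%2f",
  "/search?q=rstudio&qs=ds&intlF=1&FORM=TIPEN1"
)

# Keep absolute http(s) links only.
is_absolute <- grepl("^https?://", links)

# Flag links that go through the translation service (assumed unwanted here).
is_translator <- grepl("microsofttranslator\\.com", links)

useful_links <- links[is_absolute & !is_translator]
print(useful_links)
```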

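What if we only want to extract the links we actually need? Use XPath. The result is more precise, but a good deal of information is also lost (how to deal with that is still an open question).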

txt = bing_search("rstudio")   # assumption: txt holds the page returned by the search above
xpq = "//a/@href[starts-with(., '/search?q=rstudio')]"
getHTMLLinks(txt, xpQuery = xpq)
      

[1] "/search?q=rstudio&qs=ds&intlF=1&FORM=TIPEN1"

[2] "/search?q=rstudio&qs=ds&intlF=&upl=zh-chs&FORM=TIPCN1"

[3] "/search?q=rstudio+%e4%b8%ad%e6%96%87%e4%b9%b1%e7%a0%81&FORM=QSRE1"
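
One possible way to keep more of the information is to select the a elements themselves rather than only their href attributes, so the link text is still available. The sketch below uses the XML package on a small hypothetical HTML fragment instead of the real result page; the base URL http://cn.bing.com is assumed from the form description above.

```r
library(XML)

# Hypothetical HTML fragment standing in for the search result page.
html <- '<html><body>
  <a href="/search?q=rstudio&amp;FORM=TIPEN1">English results</a>
  <a href="https://rstudio.org/">RStudio</a>
  <a href="/search?q=rstudio+server&amp;FORM=QSRE1">rstudio server</a>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Match the <a> elements whose href starts with the search path.
xp <- "//a[starts-with(@href, '/search?q=rstudio')]"

related <- data.frame(
  text = xpathSApply(doc, xp, xmlValue),            # visible link text
  href = xpathSApply(doc, xp, xmlGetAttr, "href"),  # relative link
  stringsAsFactors = FALSE
)

# Turn the relative links into absolute ones (base URL is an assumption).
related$url <- paste0("http://cn.bing.com", related$href)
print(related)
```

Each row then carries both the link text and a full URL, which can be passed on to further requests.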