Crawler Algorithms

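I have been asked about this many times, so it seems some explanation is needed. This is a brief introduction to building a search engine's crawler and some basic points to watch for.

Put simply, a web crawler is much like the "offline reading" tools you may already use. "Offline" is a bit of a misnomer: the tool still has to connect to the network, otherwise how would it fetch anything?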





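So where do they differ?

1. A web crawler is highly configurable.
2. A web crawler can parse the links in the pages it fetches.
3. A web crawler has a simple storage configuration.
4. A web crawler can intelligently analyze how often a page is updated.
5. A web crawler is quite efficient.

These characteristics are really requirements. So how do you design a crawler, and which steps need attention?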





1. Traversing and recording URLs

larbin does this very well. Traversing URLs is actually quite simple, for example:

cat [what you got] | tr \" \\n | gawk '{print $2}' | pcregrep ^http://

This gives you a list of all the URLs.
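The same idea can be sketched in Python. This is only an illustration, not the code larbin or booso.com uses: the helper name `extract_http_urls` and the sample HTML are made up for the example, and like the shell pipeline it splits on double quotes and keeps only tokens that start with `http://`.

```python
import re


def extract_http_urls(html):
    """Return the absolute http:// URLs found between double quotes, in order, without duplicates."""
    seen = set()
    urls = []
    for token in html.split('"'):      # same role as `tr \" \\n`
        token = token.strip()
        if re.match(r"^http://", token) and token not in seen:   # same role as `pcregrep ^http://`
            seen.add(token)
            urls.append(token)
    return urls


# Hypothetical sample input.
sample = (
    '<a href="http://booso.com/a">a</a> '
    '<a href="/local">b</a> '
    '<a href="http://booso.com/a">dup</a>'
)
print(extract_http_urls(sample))
```

Splitting on quotes is crude (it misses unquoted and single-quoted links), but it shows how little is needed to get a first URL list.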




2. Multi-process vs. multi-thread

Each has its advantages. Today an ordinary PC, such as the one behind booso.com, can easily crawl 5 GB of data a day, roughly 200,000 pages.





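3. Controlling update timing

The dumbest approach has no update-time weighting at all: crawl everything, then turn around and crawl everything again.

Usually the data from each crawl is compared with the data from the previous one. If a page has not changed for 5 consecutive crawls, double the interval between crawls of that page.

If a page has changed on each of 5 consecutive crawls, cut its crawl interval to half of the current value.

Note that efficiency is one of the keys to winning.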
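The doubling and halving rule can be written down as a small scheduler. This is a sketch under assumptions of my own: the class name `PageSchedule`, the 24-hour starting interval, and resetting the streak after each adjustment are not from the text above.

```python
from dataclasses import dataclass


@dataclass
class PageSchedule:
    interval_hours: float = 24.0   # assumed starting interval
    unchanged_streak: int = 0
    changed_streak: int = 0

    def record_crawl(self, changed):
        """Update the streak counters and adjust the interval after 5 identical outcomes in a row."""
        if changed:
            self.changed_streak += 1
            self.unchanged_streak = 0
        else:
            self.unchanged_streak += 1
            self.changed_streak = 0

        if self.unchanged_streak == 5:
            self.interval_hours *= 2      # page is quiet: crawl half as often
            self.unchanged_streak = 0     # assumption: start counting again
        elif self.changed_streak == 5:
            self.interval_hours /= 2      # page is busy: crawl twice as often
            self.changed_streak = 0


schedule = PageSchedule()
for _ in range(5):
    schedule.record_crawl(changed=False)
print(schedule.interval_hours)
```

In practice you would probably also clamp the interval to a minimum and a maximum so that one page cannot starve or flood the crawler.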





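4. How deep should you crawl?

It depends. If you are a big shot with tens of thousands of servers doing the crawling, I suggest you skip this point.

If, like me, you have only one server for crawling, then you should know these statistics:

Page depth : Number of pages : Page importance
0 : 1 : 10
1 : 20 : 8
2 : 600 : 5
3 : 2000 : 2
4 and above : 6000 : generally cannot be computed

So crawling down to level three is about enough. Going deeper has two costs: the data volume grows three to four times, and the importance drops considerably. This is what is called "sowing dragon's teeth and harvesting fleas."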
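A depth limit like this is easiest to enforce with a breadth-first traversal that records the depth at which each page was first seen. The sketch below uses an in-memory dictionary as a stand-in for fetching pages; the function name `crawl_order` and the toy link graph are assumptions for illustration.

```python
from collections import deque


def crawl_order(links, start, max_depth=3):
    """Breadth-first walk over `links` (url -> list of urls), never expanding pages at max_depth."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth_of[url] == max_depth:
            continue                       # at the limit: record the page but do not follow its links
        for nxt in links.get(url, []):
            if nxt not in depth_of:        # first sighting gives the shortest depth
                depth_of[nxt] = depth_of[url] + 1
                queue.append(nxt)
    return depth_of


# Hypothetical site structure.
toy_site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/deep"],
}
print(crawl_order(toy_site, "/", max_depth=3))
```

Because the walk is breadth-first, the shallow and more important pages are always fetched before the deep ones, which matters when the crawl is cut short.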





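5. A crawler generally does not fetch the other side's pages directly; it usually goes out through a proxy. The proxy relieves load: when the remote page has not been updated, fetching the tag in the header is enough and there is no need to transfer the whole page again, which saves a great deal of network bandwidth.

A 304 recorded in the Apache web server log generally means the page was served from cache.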
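The 304 mechanism is the standard HTTP conditional request. The sketch below shows only the bookkeeping, with no network access: I am assuming the "tag" above refers to the `ETag` header, and the function names and the cache record layout are made up for the example.

```python
def conditional_headers(cached):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def body_to_use(status, new_body, cached):
    """On 304 the server sends no body, so the cached copy is reused."""
    if status == 304:
        return cached["body"]
    return new_body


# Hypothetical cache record from an earlier crawl.
cache_entry = {
    "etag": '"abc123"',
    "last_modified": "Wed, 21 Oct 2015 07:28:00 GMT",
    "body": "<html>old copy</html>",
}
print(conditional_headers(cache_entry))
print(body_to_use(304, "", cache_entry))
```

A caching proxy does this for you, but a crawler that stores these two headers per URL gets the same bandwidth saving without one.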





6. When you have a moment, please take a look at robots.txt.
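Python's standard library can do the robots.txt check. The sketch below parses a robots.txt held in a string so that it runs without network access; the rules, the helper name `allowed` and the example.com URLs are placeholders, and the user agent string is borrowed from the crawl command later in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "news.booso.com", "http://example.com/index.html"))
print(allowed(ROBOTS_TXT, "news.booso.com", "http://example.com/private/report.html"))
```

A real crawler would fetch each site's robots.txt once, cache the parsed result, and consult it before every request to that site.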





7. Storage structure

Opinions differ on this. Google uses its GFS system. If you have seven or eight servers, I suggest NFS; if you have seventy or eighty servers, I suggest AFS; if you have only one server, use whatever you like.

Here is a code fragment showing how the news search engine I wrote stores its data:

NAME=`echo $URL |perl -p -e 's/([^\w\-\.\@])/$1 eq "\n" ? "\n":sprintf("%%%2.2x",ord($1))/eg'`

mkdir -p $AUTHOR

newscrawl.pl $URL --user-agent="news.booso.com+(+http://booso.com)" -outfile=$AUTHOR/$NAME
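The Perl one-liner percent-encodes every character of the URL that is not a letter, digit, underscore, `-`, `.` or `@`, so the URL becomes a safe file name. A rough Python equivalent, assuming UTF-8 for any non-ASCII characters (the function name `url_to_filename` is mine):

```python
import re

# Bytes that are NOT in the safe set used by the Perl expression.
_UNSAFE = re.compile(rb"[^A-Za-z0-9_\-.@]")


def url_to_filename(url):
    """Percent-encode every unsafe byte as lowercase %xx so the URL can be used as a file name."""
    raw = url.encode("utf-8")
    encoded = _UNSAFE.sub(lambda m: b"%%%02x" % m.group()[0], raw)
    return encoded.decode("ascii")


print(url_to_filename("http://booso.com/a?b=1"))
```

The encoding is reversible, so the original URL can be recovered from the file name, and grouping files under a per-source directory (the `$AUTHOR` directory above) keeps any one directory from growing too large.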


<%

' BSD 2.0 license,
' http://www.opensource.org/licenses/bsd-license.php
'
' When reposting, please keep the attribution details of the author, bug reporters and bug fixers, including email addresses and site names,
' unless a bug reporter or bug fixer has asked otherwise.
' The code can be tested at http://www.vtalkback.com/site-map
' Version 0.1.2
'------------------------------<initialization>-----------------------------------------------------------
ver="0.1.2"
'script configuration
'debug =0
'Response.CharSet="gb2312";
'current_charset="utf-8"
current_charset="gb2312"    ' must be lowercase
'Url="http://www.vtalkback.com"
'Url="http://www.jwmodel.com"

Url=request("url")

Url=trim(url)

if right(Url,1)="/" then
  Url=left(url,len(url)-1)
end if



first_page=Url

'response.write first_page& " "& url&" "

'response.flush



'first_page=""

none_http_url=right(url,len(url)-len("http://"))     ' the url without the leading http://




root_url_len=instr(none_http_url,"/")



if(root_url_len=0) then

  root_url_len=len(none_http_url)

end if



root_url="http://" & left(none_http_url,root_url_len)
' strip the trailing '/'
if right(root_url,1)="/" then
  root_url=left(root_url,len(root_url)-1)
end if



'response.write root_url & " <br>"

'response.flush



str_depth = request("url_depth")

FinalDepth=CInt(str_depth)

'---------------Depth limit----------------------

'if FinalDepth>2 then
'  FinalDepth=2
'end if

'FinalDepth=1



'response.write "str_depth =" & str_depth &  " <br>"

'response.flush



'------------------------------------------------

LimitUrl=1000

'leave sitemapDate empty if you want sitemapDate=now

sitemapDate=""

'sitemapPriority possible value are from 0.1 to 1.0

sitemapPriority="0.7"

'sitemapChangefreq possible value are: always, hourly, daily, weekly, monthly, yearly, never

sitemapChangefreq="monthly"

'see http://www.time.gov/ for utcOffset

utcOffset=1



Dim objRegExp,objUrlArchive,strHTML,objMatch,crawledUrlArchive,BytesStream,CharsetRegExp,CharsetUrlArchive,oHttp

Set oHttp=Server.CreateObject("WinHttp.WinHttpRequest.5.1")

Server.ScriptTimeout=300

set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")

Set BytesStream   =   Server.CreateObject("ADODB.Stream")   



Set objUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set crawledUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set CharsetUrlArchive=Server.CreateObject("Scripting.Dictionary")

Set objRegExp = New RegExp

objRegExp.IgnoreCase = True

objRegExp.Global = True



'you can change these patterns to suit your needs
'objRegExp.Pattern = "href=(.*?)[\s|>]"
'objRegExp.Pattern = "<!--(.*?)-->|<(\s*)a(\s*)href=(.*?)[" & chr(34) & "](.*?)" & "[" & chr(34)&"]"
objRegExp.Pattern = "<!--(.*?)-->|<(\s*)a(\s*)href=(.*?)[\s|>]"

Set CharsetRegExp = New RegExp

CharsetRegExp.IgnoreCase = True

CharsetRegExp.Global = True

CharsetRegExp.Pattern = "<META(.*?)Content-Type(.*?)>"

'to remove elements from html urls

RemoveText=array("<",">","a href=",chr(34),"'","href=")

'to exclude elements from urls

ExcludeUrl=array("mailto:","javascript:",".css",".ico","file:")



'if you want sitemapDate=now

if sitemapDate="" then filelmdate=now()



sitemapDate=iso8601date(filelmdate,utcOffset)



'------------------------------</initialization>-----------------------------------------------------------

crawl first_page,1



For Depth=0 to FinalDepth-1



  arrUrl=objUrlArchive.Keys
  arrDepth=objUrlArchive.Items
  For LoopUrl= 0 to ubound(arrurl)
    willCrawlUrl=url&"/"&arrUrl(LoopUrl)
    willCrawldepth=arrDepth(LoopUrl)

'    response.write "willCraw="& willCrawlUrl &" depth="&willCrawldepth&" <br>"
'    response.flush

    if crawledUrlArchive.Exists(willCrawlUrl)=false and willCrawldepth < FinalDepth then
'      response.write "Craw="& willCrawlUrl &" depth="&willCrawldepth&" <br>"
'      response.flush
      crawledUrlArchive.add willCrawlUrl,1
      'if ubound(arrurl)>max_url_count then
      '  Exit For
      'end if
      'debugging
      'response.write "<!-- pagefound='"&loopurl&"'-->"
      crawl willCrawlUrl,willCrawldepth+1
      if objUrlArchive.Count-1>LimitUrl then exit for   'if you want to limit the url number
    end if

  Next
  erase arrUrl
  erase arrDepth

Next



' create the xml on the fly

'arrDepth=objUrlArchive.Items

'response.write "<textarea rows=" &chr(34)& "93" &chr(34)& "name=" &chr(34)& "S1" &chr(34)& "cols=" &chr(34)& "138" &chr(34)& ">"

writeHead  ' output the file header



arrCharset=CharsetUrlArchive.items

arrurl=objUrlArchive.Keys

For LoopUrl=0 to ubound(arrurl)

'  response.write "<loc>"&server.htmlencode(url&"/"&arrUrl(LoopUrl))&"</loc>"     ' output the url
'  response.write "<loc>&&&&"&server.urlEncode(url&"/"&arrUrl(LoopUrl))&"</loc>"     ' output the url
  cur_charset=arrCharset(LoopUrl)
  cur_url=arrUrl(LoopUrl)
  writelink cur_url,cur_charset

Next



response.write  Chr(13) & Chr(10)

response.write "</urlset>"

response.write  Chr(13) & Chr(10)



'response.write "</textarea>"

'arrUrl=objUrlArchive.Keys



'response.write "<!-- pagefound='"&ubound(arrurl)+2&"'--> " 

'---------------------------<cleanup>-----------------------------------------------
erase arrUrl
erase arrCharset



'erase arrDepth

objUrlArchive.RemoveAll()

crawledUrlArchive.RemoveAll()

CharsetUrlArchive.RemoveAll()

Set BytesStream   =   Nothing   

set xmlhttp = nothing

set oHttp = nothing

'---------------------------</cleanup>-----------------------------------------------

'*************************************************************************************************************

function writeHead()

  response.ContentType = "text/xml; charset=gb2312"
  response.write "<?xml version='1.0' encoding='gb2312'?>"
  response.write Chr(13) & Chr(10)
  response.write "<!-- generator='http://www.vtalkback.com/sitemap/' ver='" &ver &"'-->"
  response.write  Chr(13) & Chr(10)
  response.write "<!-- pagefound='"&ubound(objUrlArchive.Keys)+2&"'--> "
  response.write  Chr(13) & Chr(10)
  response.write "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84'>"
  response.write  Chr(13) & Chr(10)

  ' the site root is always the first <url> entry
  response.write "<url>"
  response.write  Chr(13) & Chr(10)
  response.write "<loc>"&url&"/</loc>"
  response.write  Chr(13) & Chr(10)
  response.write "<lastmod>"&sitemapDate&"</lastmod>"
  response.write  Chr(13) & Chr(10)
  response.write "<priority>"&sitemapPriority&"</priority>"
  response.write  Chr(13) & Chr(10)
  response.write "<changefreq>"&sitemapChangefreq&"</changefreq>"
  response.write  Chr(13) & Chr(10)
  response.write "</url>"
  response.write  Chr(13) & Chr(10)

end function

'*************************************************************************************************************

Function writeLink(write_url,write_charset)

'  response.write "<loc>"&write_charset&"</loc>"

  response.write "<url>"
  response.write  Chr(13) & Chr(10)

  if write_charset="gb2312" then
'    write_url=num2gb(write_url)
    write_url=urldecode(write_url)
'    response.write write_charset
  else  'utf8
'    response.write write_charset&"aa"
    write_url=urldecode(write_url)  'new_url=url2utf(cur_url) new_url=num2gb(cur_url) new_url=gb2utf(new_url)
  end if
'  write_url=replace(write_url,"&amp;",GB2UTF8("&"))

  response.write "<loc>"&url&"/"&write_url&"</loc>"         ' output the url
  response.write  Chr(13) & Chr(10)
'-------------------------------------------------------------------------------------
  response.write "<lastmod>"&sitemapDate&"</lastmod>"
  response.write  Chr(13) & Chr(10)
  response.write "<priority>"&sitemapPriority&"</priority>"
  response.write  Chr(13) & Chr(10)
  response.write "<changefreq>"&sitemapChangefreq&"</changefreq>"
  response.write  Chr(13) & Chr(10)
  response.write "</url>"
  response.write  Chr(13) & Chr(10)

end function



'***********************************??琛?**************************************************************************

Function crawl(sub_url,crawl_depth) 

  sub_url=urldecode(sub_url)
'----------------<sub_url handling>-------------------------------------------
'--------------bug: redirects were not handled; reported by (http://www.xiaoyezi.com)---------------------------------
'----------------fixed by (http://www.vtalkback.com, [email protected])---------------------------------------
'  sub_url1=sub_url
'  response.write "sub_url="& sub_url&"<br>  "
  sub_url=GetAbsoluteURL(sub_url,1)  ' resolve the real url after any redirects
  if sub_url="" then exit function   ' GetAbsoluteURL returns "" on an HTTP error status

'??if sub_url1<>sub_url then

'????response.write  sub_url1&" "&sub_url

'????response.flush??

'??end if 

'------------</sub_url handling>-------------------------------------------

'------------<sub_dir handling>-------------------------------------------
'  response.write "sub_url="& sub_url&"<br>  "&len(sub_url)&" "& len(url) &"</br>"
'  response.flush
  sub_dir=right(sub_url,len(sub_url) - len(url))  ' http://www.vtalkback.com/blog -> blog
'  response.write "sub_dir="& sub_dir&" <br>"

  if instr(sub_dir,"?")>0 then
    sub_dir=left(sub_dir,InStr(sub_dir,"?")-1)
  end if

  if instr(sub_dir,".")>0 or instr(sub_dir,"?")>0 or instr(sub_dir,"=")>0 or instr(sub_dir,"#")>0 then
    sub_dir=left(sub_dir,InStrRev(sub_dir,"/"))
  end if

  if sub_dir<>"" and right(sub_dir ,1)<>"/" then
    sub_dir=sub_dir&"/"
  end if

??

'??response.write "sub_url="& sub_url&" <br>"

'??response.flush

'------------</sub_dir handling>----------------------------------------------------
'  response.write "sub_url="& sub_url&" <br>"
'  response.flush
  xmlhttp.open "GET", sub_url, false
  xmlhttp.send ""

'------------------------------------<detect page charset>------------------------------------------------
  if XmlHttp.readystate <> 4 then
    exit function
  end if

  htmlText = xmlhttp.responseText
  if htmlText="" then
    exit function
  end if

  For Each CharsetMatch in CharsetRegExp.Execute(htmlText)
    CharsetMatch=lcase(CharsetMatch)
    char_index=instr(CharsetMatch,"charset=")
    if char_index>0 then
      CharsetMatch=right(CharsetMatch,len(CharsetMatch)-char_index-7)
'      CharsetMatch=trim(CharsetMatch)
      char_index=instr(CharsetMatch,chr(34))
      CharsetMatch=left(CharsetMatch,char_index-1) ' strip the closing quote
      current_charset=trim(CharsetMatch)
'      current_charset=CharsetMatch
    end if
  next

'  response.write "--------------" & current_charset &"--------------------- <br>"
'  response.flush
'------------------------------------</detect page charset>------------------------------------------------

??



'-------------------------------<read with encoding>------------------------------------------------------------------------------
'-------------------bug: gb2312 pages were not handled; reported by ([email protected]) (http://www.sijiholiday.com)-------------------
'-------------------------fixed by (http://www.vtalkback.com) ([email protected])-----------------------------------------

'  strHTML=bytes2BSTR(xmlHttp.responseBody)

  BytesStream.Type = 1          ' adTypeBinary
  BytesStream.Mode =3           ' adModeReadWrite
  BytesStream.Open
  BytesStream.Write xmlHttp.responseBody
  BytesStream.Position = 0
  BytesStream.Type = 2          ' adTypeText
  BytesStream.Charset = current_charset
'  strHTML = xmlhttp.responseText
  strHTML=BytesStream.ReadText
  BytesStream.close

'  response.binarywrite   htmlbody
'  response.write(strHtml)
'  response.flush
'----------------------------------</read with encoding>-----------------------------------------------------------------------------



  For Each objMatch in objRegExp.Execute(strHTML)
'    response.write objMatch & "<br>"
'    response.flush

    if left(objMatch,4)<>"<!--" then

      for i=0 to ubound(excludeUrl)
        if instr(objmatch,excludeUrl(i))>0 then objmatch=""
      next

      if objmatch<>"" then
'        response.write "objmatch1="& objMatch& "    <br>"
'        response.write "obj match is   "&right(objMatch,len(objMatch)-1)&"<br>"
'        response.flush

'------------------------<url cleanup>---------------------------------------------------------------
'---------------bug: gb2312 pages were not handled; reported by ([email protected]) (http://www.sijiholiday.com)--------------------
'-------------------------fixed by (http://www.vtalkback.com) ([email protected])-----------------------------------------
'        objMatch=server.htmlencode(objMatch)
'        response.write objMatch & "<br>"
'        response.flush

'        for i=0 to ubound(RemoveText)     ' remove unwanted characters: chr(34), "'"
'          objMatch=replace(lcase(objMatch),lcase(RemoveText(i))," ")
'        next

        objMatch=lcase(objMatch)
        objMatch=replace(objMatch,chr(34)," ") ' replace quotes in the url with spaces
        objMatch=replace(objMatch,"'"," ")
        objMatch=replace(objMatch,">"," ")
'        response.write "objmatch2="& objMatch& "    <br>"

'        str_index=instr(objMatch,chr(34))   ' strip the quote and everything to its left
'        objMatch=right(objMatch,len(objMatch)-str_index)

        str_index=instr(objMatch,"=")     ' strip the first "=" and everything to its left
        objMatch=right(objMatch,len(objMatch)-str_index)

        objMatch=ltrim(objMatch)       ' strip leading whitespace
'        response.write objMatch & "<br>"
'        response.flush

        str_index=instr(objMatch," ")     ' strip the first space and everything to its right
'        response.write objMatch &"  "&str_index &"<br>"
'        response.flush
        if str_index <> 0 then
          objMatch=left(objMatch,str_index-1)
        end if
'        response.write objMatch & str_index &"<br>"
'        response.flush

'        str_index=instr(objMatch,chr(34))   ' strip the quote and everything to its right
'        objMatch=left(objMatch,str_index-1)
'------------------------</url cleanup>---------------------------------------------------------------



'------------------------<root slash handling>---------------------------------------------------------------------------------
'--------------------bug: /url-style links were not handled; reported by ([email protected]) (http://www.gamelee.cn)-----------------------
'-------------------------fixed by (http://www.vtalkback.com) ([email protected])-----------------------------------------

        if left(objMatch,1)="/" then           ' /blog --> http://www.vtalkback.com/blog
          objMatch=root_url & objMatch
        end if

'        response.write objMatch & "<br>"
'        response.flush
'------------------------</root slash handling>-------------------------------------------------------------------------



'--------------------------------<strip root url>----------------------------------------------------------
'        response.write "url2="& url& "    <br>"
'        response.flush

        'in some cases this is better if left(objMatch,len(url))=Url then

        if left(objMatch,len(url))=Url then
          the_url=right(objMatch,len(objMatch) - len(url))
          if the_url<>"" and left(the_url,1)="/" then
            the_url=right(the_url,len(the_url) - 1) ' strip the leading "/"
          end if
          objMatch = the_url

        elseif left(objMatch,len(none_http_url))=none_http_url then
          the_url=right(objMatch,len(objMatch) - len(none_http_url))
          if the_url<>"" and left(the_url,1)="/" then
            the_url=right(the_url,len(the_url) - 1) ' strip the leading "/"
          end if
          objMatch = the_url

        elseif instr(objMatch,"http://")=0 and objmatch<>"" then
          ' relative link: prefix it with the directory of the current page
          the_url=sub_dir&objMatch
          if the_url<>"" and left(the_url,1)="/" then
            the_url=right(the_url,len(the_url) - 1) ' strip the leading "/"
          end if
          objMatch = the_url
'          response.write "subdir="& sub_dir& "    <br>"
'          response.flush

        else  '(out of domain)
          objMatch=""
        end if
'--------------------------------</strip root url>------------------------------------------------------
      end if



      if objmatch<>"" then

'------------------------<& conversion>--------------------------------------------------------------------------------
        objMatch=replace(objMatch,"&","&amp;")      ' & to &amp;
        objMatch=replace(objMatch,"&amp;#","&#")
        objMatch=replace(objMatch,"&amp;amp;","&amp;")

        if right(objMatch,1)="/" then       ' strip the trailing "/"
          objMatch=left(objMatch,len(objMatch)-1)
        end if

'        response.write objMatch & "<br>"
'        response.flush
'------------------------</& conversion>---------------------------------------------------------------------------



'--------------------------------<encoding handling>------------------------------------------------
' Note: if the original page already contains %-encoded urls, the "%" is converted to %25 here

        if current_charset="gb2312" then
'          objMatch=gb2num(objMatch)
'          response.write current_charset&" "& sub_url & "<br>"
'          response.write objMatch & "<br>"
          objMatch= server.urlEncode(objMatch)
'          response.write objMatch &" " & "<br>"
'          response.flush
        else
'          response.write "url2 " & current_charset&" "&sub_url & "<br>"
'          response.write objMatch &" " & "<br>"
'          objMatch= urlDecode(objMatch)
          objMatch= server.urlEncode(objMatch)
'          response.write objMatch &" " & "<br>"
'          response.flush

'          objMatch=server.htmlencode(objMatch)
'          objMatch=encodeURI(objMatch)
'          objMatch=UTF2GB(objMatch)
'          objMatch=gb2num(objMatch)
        end if
'--------------------------------</encoding handling>------------------------------------------------

??????

'        if objMatch<>newMatch then
'          response.write objMatch & "<br>"
'          response.write newMatch & "<br>"
'          response.flush
'        end if

        ' record each new url once, with the depth and charset of the page it was found on
        if objUrlArchive.Exists(objMatch)= false and the_url<>"" then
          objUrlArchive.Add objMatch,crawl_depth
          CharsetUrlArchive.Add objMatch,current_charset
'          response.write objMatch &"  "&sub_url&  "<br>"  ' show the page the url was found on
'          response.flush
'          writelink ObjMatch,current_charset
        end if

      end if
    end if
  Next

End Function



'*************************************************************************************************************



function gb2num(str)

  newStr=""
  for i=1 to len(str)   ' gb2312 handling
    c=mid(str,i,1)
    if asc(c)<0 then
      gb2312Code=ascW(c)
      if gb2312Code <0 then
        gb2312Code =gb2312Code+65536
      end if
      newStr=newStr & "&#" & gb2312Code & ";"
    else
      newStr=newStr&c
    end if
  next
  gb2num=newStr

end function

'*************************************************************************************************************

function url2utf(str)

  url2utf=decodeURI(str)

end function



'*************************************************************************************************************



Function URLDecode(enStr) ' Note: if the original page had %-encoded urls, the "%" is %25 by this point; decoding restores it
  dim deStr
  dim c,i,v
  deStr=""

  for i=1 to len(enStr)
    c=Mid(enStr,i,1)
    if c="%" then
      v=eval("&h"+Mid(enStr,i+1,2))   ' eval evaluates the hex literal
      if v<128 then
        ' single-byte (ASCII) escape
        deStr=deStr&chr(v)
        i=i+2
      else
        if isvalidhex(mid(enstr,i,3)) then  ' first byte is a valid %XX escape
          if isvalidhex(mid(enstr,i+3,3)) then
            ' two %XX escapes form one double-byte character
            v=eval("&h"+Mid(enStr,i+1,2)+Mid(enStr,i+4,2)) '+65536
            deStr=deStr& chr(v)
            i=i+5
          else       ' second byte is a literal character, not a %XX escape
            v=eval("&h"+Mid(enStr,i+1,2)+cstr(hex(asc(Mid(enStr,i+3,1)))))
            deStr=deStr&chr(v)
            i=i+3
          end if
        else
'          destr=destr&c
        end if
      end if
    else
      if c="+" then
        deStr=deStr&" "
      else
        deStr=deStr&c
      end if
    end if
  next
'  response.write  Chr(13) & Chr(10)
'  response.write "enstr="&enStr
'  response.write  Chr(13) & Chr(10)
'  response.write "destr="&deStr
'  response.write  Chr(13) & Chr(10)
'  response.flush

  URLDecode=deStr

end function 

'*************************************************************************************************************

function isvalidhex(str) 

  isvalidhex=true
  str=ucase(str)
  if len(str)<>3 then isvalidhex=false:exit function
  if left(str,1)<>"%" then isvalidhex=false:exit function
  c=mid(str,2,1)
  ' hex digits are 0-9 and A-F (the upper bound was "Z", which accepted non-hex letters)
  if not (((c>="0") and (c<="9")) or ((c>="A") and (c<="F"))) then isvalidhex=false:exit function
  c=mid(str,3,1)
  if not (((c>="0") and (c<="9")) or ((c>="A") and (c<="F"))) then isvalidhex=false:exit function

end function 



'*************************************************************************************************************



function num2gb(str)

  newStr=""
  for i=1 to len(str)
    c=mid(str,i,1)
    if c="&" and mid(str,i+1,1)="#" then
      ' collect the digits of a numeric entity such as &#20013;
      num=""
      for j=i+2 to len(str)
        ch=mid(str,j,1)
        if ch=";" then
          i=j
          exit for
        end if
        num=num &ch
      next
      newStr=newStr & chrW(CLng(num)) 'GB2UTF8(chrW(CLng(num)))
    else
      newStr=newStr & c
    end if
  next
  num2gb=newStr

end function

'*************************************************************************************************************

'Function GB2UTF(Chinese) 

'  For i = 1 to Len (Chinese) 

'   a = Mid(Chinese, i, 1) 

'   GB2UTF = GB2UTF & "&#x" & Hex(Ascw(a)) & ";" 

'  Next 

'End Function 

'*************************************************************************************************************

Function GB2UTF(Chinese) 

  For i = 1 to Len (Chinese) 

   a = Mid(Chinese, i, 1) 

   GB2UTF = GB2UTF &  Ascw(a)

  Next 

End Function 

'*************************************************************************************************************



function UTF2GB(UTFStr) 

    for Dig=1 to len(UTFStr) 

        if mid(UTFStr,Dig,1)="%" then 

            if len(UTFStr) >= Dig+8 then 

                GBStr=GBStr & ConvChinese(mid(UTFStr,Dig,9)) 

                Dig=Dig+8 

            else 

                GBStr=GBStr & mid(UTFStr,Dig,1) 

            end if 

        else 

            GBStr=GBStr & mid(UTFStr,Dig,1) 

        end if 

    next 

    UTF2GB=GBStr 

end function 

'*************************************************************************************************************



Function iso8601date(dLocal,utcOffset)
  Dim d
  ' convert local time into UTC
  d = DateAdd("H",-1 * utcOffset,dLocal)

  ' compose the date (the time part is left commented out)
  iso8601date = Year(d) & "-" & Right("0" & Month(d),2) & "-" & Right("0" & Day(d),2)
'  & "T" & Right("0" & Hour(d),2) & ":" & Right("0" & Minute(d),2) & ":" & Right("0" & Second(d),2) & "Z"

End Function



'*************************************************************************************************************

Function GetAbsoluteURL(sUrl,iStep) ' resolve the real url after any redirects
  Dim bUrl,bDat
  GetAbsoluteURL=sUrl
  If iStep>15 Then
    Response.Write "Recursion nesting exceeded 15 levels" & "<br />"
    exit function
  End If

  If InStr(sUrl,"?")>0 THen
    Dim tmpUrl : tmpUrl=split(sUrl,"?")
    bUrl=tmpUrl(0)
    bDat=tmpUrl(1)
  Else
    bUrl=sUrl
    bDat=""
  End If



'  Response.Write "<p style=""border-top:solid 1px silver;padding:0px;margin:0px;"">"
'  Response.Write "Processing url: " & sUrl & "<br />"

'  if bDat<>"" Then Response.Write "3 &nbsp;&nbsp;>>parameters: " & bDat & "<br />"

  oHttp.Option(6)=0   ' disable automatic redirects
  oHttp.SetTimeouts 5000,5000,30000,5000
'  Response.Write burl&"<br /> "
  oHttp.Open "HEAD",sUrl,False
  On Error Resume Next
  oHttp.Send bDat

'  response.write  oHttp.responseText
'  response.flush
'  Response.Write "  <br /> "

  If Err.Number<>0 Then
'    Response.Write "<font color=""red"">Error: " & Err.Description & "</font><br />"
    Err.Clear
'    GetAbsoluteURL=""
'    Set oHttp=Nothing
'    Response.Write "</p>"
    Exit Function
  End If

'  Response.Write " <br /> "

  On Error Goto 0
'  Response.Write "&nbsp;&nbsp;>>HTTP status: " & oHttp.Status  & "<br />"

  If oHttp.Status<>200 And oHttp.Status<>302 and oHttp.Status<>301 Then
'    Response.Write "<font color=""red"">HTTP error: " & oHttp.StatusText  & "</font><br />"
    Err.Clear
    GetAbsoluteURL=""
'    Set oHttp=Nothing
'    Response.Write "</p>"
    Exit Function
  End If
  Dim sLoca
  On Error Resume Next
  sLoca=oHttp.getResponseHeader("Location")

  If Err.Number<>0 Then
    Err.Clear
    sLoca=""
  End If

  On Error Goto 0

'  Response.Write "  <br /> "

  If sLoca = "" Then
'    Response.Write "&nbsp;&nbsp;>>Content-Type:" & oHttp.getResponseHeader("Content-Type") & "<br />"
'    Response.Write "&nbsp;&nbsp;>>no Location header returned<br />"
'    Set oHttp=Nothing
'    Response.Write " </p>"
    GetAbsoluteURL=sUrl
    Exit Function
  Else

'    Response.Write " &nbsp;&nbsp;>>Content-Type:" & oHttp.getResponseHeader("Content-Type") & "<br />"
'    Response.Write " received Location header: " & sLoca & "<br />"
'    Response.Write " </p>"

    ' build the new URL here

    If InStr(sLoca,"://")<=0 Then
      ' no scheme given: resolve relative to the location of the current URL
      Dim ind : ind=InstrRev(sUrl,"/")
      sUrl=Left(sUrl,ind)
      sLoca=sUrl & sLoca
    End If
    GetAbsoluteURL=GetAbsoluteURL(sLoca,iStep+1)

  End If

End Function



%>

<script language="javaScript" runat="Server">

function UTF8toGB(str){
  return decodeURIComponent(str)
}

function encodeURI(str){
  return encodeURIComponent(str)
}

function decodeURI(str){
  return decodeURIComponent(str)
}

function GB2UTF8(str){
  return encodeURIComponent(str)
}

function convert(str) {
  return string(str.getBytes("UTF-8"),"gb2312");
}



</script>
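Much of the crawl function in the script above is string surgery for turning an href into a same-site URL: handling the leading "/", prefixing relative links with the current directory, and dropping out-of-domain links. For comparison, here is a sketch of the same normalization in Python using the standard library's URL resolver. It is not a translation of the script; the function name `normalize_link` and the fragment-stripping step are assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_link(page_url, href, site_root):
    """Resolve `href` against `page_url`; return None for non-http links or other hosts."""
    absolute = urljoin(page_url, href)            # handles "/x", "x.html", "../x" and full urls
    parts = urlparse(absolute)
    if parts.scheme not in ("http", "https"):     # drops mailto:, javascript:, file:
        return None
    if parts.netloc != urlparse(site_root).netloc:
        return None                               # out of domain
    return absolute.split("#", 1)[0].rstrip("/")  # assumption: ignore fragments and trailing "/"


page = "http://www.vtalkback.com/blog/index.html"
root = "http://www.vtalkback.com"
print(normalize_link(page, "/about", root))
print(normalize_link(page, "post.html", root))
print(normalize_link(page, "mailto:someone@example.com", root))
```

Letting a URL library do the resolution avoids most of the edge cases that the bug-fix comments in the script describe.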

            
