使用PHP做网页采集实例过程总结

2022-08-26 23:59:09

æ¥çåæï¼ââhttp://www.sijitao.net/1511.htmlââ

ç¬¬ä¸æ¥ï¼é¾æ¥æ°æ®åºï¼ååºéè¦æ£æ¥çç½ç«åæ£åã

[code language="php"]
 $conn_1=pg_connect("host=xxx.xxx.xxx.xxx port=5432 dbname=mydb1 user=postgres password=xxxxxx") ;
 $conn_2=pg_connect("host=xxx.xxx.xxx.xxx port=5432 dbname=mydb2 user=postgres password=xxxxxx") ;
 [/code]

ç¬¬äºæ¥ï¼ååºç½é¡µæºç ï¼å¯¹æºç è¿è¡åæ¥å¤çã

[code language="php"]
//è·åç½é¡µæºç 
 //$url='http://www.sijitao.net/' ;
 $str = file_get_contents($url);
 //ä½¿ç¨preg_matchåæ£åè¡¨è¾¾å¼ååºç¼ç 
 $wcharset = preg_match("/<meta.+?charset=[^\w]?([-\w]+)/i",$str,$temp) ? strtolower($temp[1]):"" ;
 //ç¼ç è½¬æ¢
 if($wcharset){
 $str=iconv("$wcharset", "UTF-8", $str) ;
 }[/code]

ç¬¬ä¸æ¥ï¼å¹éå¤çåçæºç åç¬¦ä¸²ï¼å¯¹å¹éçæ°æ®å¥åºã

[code language="php"]
$pat = "/<a(.*?)href=\"($preg)\"(.*?)>(.*?)<\/a>/is";
 preg_match_all($pat, $str, $m);
 $cnt=count($m[2]) ;
 for($i=0;$i<$cnt;$i++){
 if(strip_tags($m[2][$i])){
 $url=strip_tags($m[2][$i]) ;
 $url=$m[2][$i] ;
 }
 if(strip_tags($m[4][$i])){
 $title=strip_tags($m[4][$i]) ;
 }
 else{
 $title="There's Something Errors!" ;
 }
 //ç¼åä»£ç ï¼å¯¹titleåurlè¿è¡å¥åºæä½ã
 }
 }

使用PHP做网页采集实例过程总结

继续阅读

艰难安装LDAP,SSL认证

Apache配置SSLApache配置SSL

《Linux命令行与Shell脚本编程大全第2版.布卢姆》pdf

MySQL的4种隔离级别？出现问题

配置apache支持PHP（win7）

XX系统实施过程问题总结

无组件上传图片到数据库中，最完整解决方案

【MySQL数据库】数据库索引事务1.索引2.事务

neo4j之cypher使用文档

Cloud Studio初体验

NOSQL安全攻击

mybatis_入门程序Mybatis入门

php 去掉字符串的最后一个字符及截取原字符串1,2,3,4,5,6,

登录plsql 报错 the account is locked --用户被锁

php——水印

SequoiaDB巨杉数据库C++驱动概述