Fetching web page content in ASP.NET, and fixing "The server committed a protocol violation. Section=ResponseHeader"

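Fetching web page content in ASP.NET is very convenient, and it also solves the encoding problems that troubled us in classic ASP.

Three classes are needed: WebRequest, WebResponse, and StreamReader.

WebRequest and WebResponse are in the namespace:

System.Net

StreamReader is in the namespace:

System.IO

The Encoding class used below is in System.Text.

Core code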

    WebRequest request = WebRequest.Create("http://www.cftea.com/");
    WebResponse response = request.GetResponse();
    StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312"));

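  • Create is a static method of the WebRequest class; its argument is the URL of the page to fetch.
  • Encoding specifies the character encoding. The Encoding class exposes properties for widely used encodings such as ASCII, UTF32, and UTF8, but it has no gb2312 property, so we call GetEncoding to obtain the gb2312 encoding by name.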
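The name-based lookup described above can be tried without any network access. The following is a minimal sketch, not the original author's code: the class name EncodingDemo and the helper ReadAll are hypothetical, and a MemoryStream stands in for response.GetResponseStream(). It uses "utf-8" so that it runs on any runtime; for the pages discussed here you would pass "gb2312" instead. Note that on .NET Core and .NET 5 or later, "gb2312" is only available after the code pages encoding provider has been registered, whereas the .NET Framework used by classic ASP.NET supports it out of the box.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Reads an entire stream as text, using the encoding registered under encodingName.
    // Hypothetical helper for illustration only.
    public static string ReadAll(Stream stream, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for a response body: the bytes of a UTF-8 encoded page fragment.
        byte[] body = Encoding.UTF8.GetBytes("<title>你好</title>");
        string text = ReadAll(new MemoryStream(body), "utf-8");
        Console.WriteLine(text);
    }
}
```

If the name passed to GetEncoding does not match the encoding the server actually used, the call still succeeds but the decoded text comes out garbled, which is why the encoding has to be chosen per site.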

Examples

    // Requires: using System; using System.IO; using System.Net; using System.Text;

    // Fetches a page with HTTP GET and decodes the body as GB2312.
    // Exceptions (for example WebException) propagate to the caller.
    private static string getContent(string Url)
    {
        // Create the HttpWebRequest
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
        // Timeout in milliseconds (30 seconds)
        request.Timeout = 30000;
        request.Headers.Set("Pragma", "no-cache");

        // The using blocks release the response, stream, and reader even if reading throws
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream streamReceive = response.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(streamReceive, Encoding.GetEncoding("GB2312")))
        {
            return streamReader.ReadToEnd();
        }
    }

    // Fetches a page and returns its text; on failure, returns the exception message instead.
    private string GetUrl(string url)
    {
        try
        {
            WebRequest request = WebRequest.Create(url);
            // The using blocks close and dispose the response and reader
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                return reader.ReadToEnd();
            }
        }
        catch (Exception ex)
        {
            // Note: callers cannot tell this message apart from real page content
            return ex.Message;
        }
    }

    // Sends a form-encoded POST and returns the response text, or an empty string on failure.
    private string GetPostContent(string strUrl)
    {
        string strMsg = string.Empty;
        try
        {
            // Form data to post, in the form "name=value&name2=value2" (empty here)
            string data = "";
            byte[] requestBuffer = Encoding.GetEncoding("gb2312").GetBytes(data);

            WebRequest request = WebRequest.Create(strUrl);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = requestBuffer.Length;

            // The using block closes the request stream when it ends
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(requestBuffer, 0, requestBuffer.Length);
            }

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
            {
                strMsg = reader.ReadToEnd();
            }
        }
        catch
        {
            // Errors are swallowed here; the caller receives an empty string
        }
        return strMsg;
    }
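The POST example leaves the form data empty. As a rough sketch of how that string could be built, the following helper joins name/value pairs into an application/x-www-form-urlencoded body. The class name FormBody, the method Build, and the sample field names are all hypothetical. It assumes System.Web is available (as it is in an ASP.NET project) and uses HttpUtility.UrlEncode with an explicit encoding so that the percent-encoding can match the encoding the target site expects.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;

public static class FormBody
{
    // Joins name/value pairs as "name=value&name2=value2", URL-encoding each part
    // with the given character encoding. Hypothetical helper for illustration only.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(HttpUtility.UrlEncode(field.Key, encoding));
            sb.Append('=');
            sb.Append(HttpUtility.UrlEncode(field.Value, encoding));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up sample fields
        List<KeyValuePair<string, string>> fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", "a b"),
            new KeyValuePair<string, string>("page", "1")
        };
        Console.WriteLine(Build(fields, Encoding.UTF8)); // prints "q=a+b&page=1"
    }
}
```

The result would be assigned to the data variable before it is converted to bytes. For a site that expects gb2312, pass Encoding.GetEncoding("gb2312") instead of Encoding.UTF8.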

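You will often run into the following error; the fix is described below.

The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF

In essence, Microsoft's HTTP client does not tolerate server responses that break the rule, which HTTP inherits from RFC 822, that every header line must end with CRLF.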
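To make the rule concrete, here is a small illustrative check that reports whether a block of raw header text uses only CRLF line endings. It is a sketch for understanding the error message, not a reproduction of how HttpWebRequest parses headers; the class name HeaderCheck, the method UsesCrLfOnly, and the sample responses are made up.

```csharp
using System;

public static class HeaderCheck
{
    // Returns true when every CR is immediately followed by LF
    // and every LF is immediately preceded by CR.
    public static bool UsesCrLfOnly(string rawHeaders)
    {
        for (int i = 0; i < rawHeaders.Length; i++)
        {
            char c = rawHeaders[i];
            if (c == '\r' && (i + 1 >= rawHeaders.Length || rawHeaders[i + 1] != '\n'))
            {
                return false; // bare CR: "CR must be followed by LF"
            }
            if (c == '\n' && (i == 0 || rawHeaders[i - 1] != '\r'))
            {
                return false; // bare LF
            }
        }
        return true;
    }

    public static void Main()
    {
        // Well-formed: every line ends with CRLF
        Console.WriteLine(UsesCrLfOnly("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")); // True
        // Malformed: a stray CR before the final CRLF
        Console.WriteLine(UsesCrLfOnly("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\r\n")); // False
    }
}
```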

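One solution is to add the following inside the <configuration> element of app.config (for a desktop application) or web.config:

    <system.net>
        <settings>
            <httpWebRequest useUnsafeHeaderParsing="true" />
        </settings>
    </system.net>

This lets the system tolerate header lines that end with only CR or only LF.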
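Before relaxing header parsing for the whole application, it can help to confirm that a failed request really is this error. The sketch below assumes, as is the case to my knowledge, that the error surfaces as a WebException whose Status is WebExceptionStatus.ServerProtocolViolation. The class and method names are hypothetical, and the exception is constructed by hand so that the example runs without a misbehaving server.

```csharp
using System;
using System.Net;

public static class ProtocolViolationCheck
{
    // Returns true when the exception reports a server protocol violation.
    // Hypothetical helper for illustration only.
    public static bool IsProtocolViolation(WebException ex)
    {
        return ex != null && ex.Status == WebExceptionStatus.ServerProtocolViolation;
    }

    public static void Main()
    {
        // Simulated exception standing in for one thrown by request.GetResponse()
        WebException simulated = new WebException(
            "The server committed a protocol violation.",
            WebExceptionStatus.ServerProtocolViolation);

        Console.WriteLine(IsProtocolViolation(simulated)); // True
    }
}
```

In real code the check would sit in a catch (WebException ex) block around GetResponse(), for example to log the offending URL.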