
The following articles explain how uploading files over HTTP works:

http://www.cnblogs.com/kaixuan/archive/2008/01/31/1060284.html

Uploading files over HTTP (an overview of RFC 1867, a JSP example, and how to construct the client request)

1. Overview

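The original HTTP protocol had no provision for uploading files. RFC 1867 (http://www.ietf.org/rfc/rfc1867.txt) added this capability. Browsers such as Microsoft IE, Mozilla, and Opera follow this specification to send user-selected files to the server, and server-side programs written in PHP, ASP, JSP, etc. can follow the same specification to extract the files the user sent.

Microsoft IE, Mozilla, and Opera already support this protocol: a special form in a web page is all it takes to send files.

Most HTTP servers, including Tomcat, already support this protocol and can accept uploaded files.

Web programming platforms such as PHP, ASP, and JSP already provide good wrappers for handling uploaded files.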

2. Example: handling file uploads with a servlet (HTTP server: Tomcat 4.1.24)

Step 1. In an HTML page, write a form like the following:

<form enctype="multipart/form-data" action="http://192.168.29.65/upload_file/UploadFile" method="post">
     Upload multiple files:<br>
     <input name="userfile1" type="file"><br>
     <input name="userfile2" type="file"><br>
     <input name="userfile3" type="file"><br>
     <input name="userfile4" type="file"><br>
     text field: <input type="text" name="text" value="text"><br>
     <input type="submit" value="Submit"><input type="reset">
</form>

Step 2. Writing the server-side servlet

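There are now many third-party libraries for HTTP file upload. The Jakarta project itself provides the FileUpload package (http://jakarta.apache.org/commons/fileupload/), which already takes care of file uploads, form-field handling, and efficiency. Struts also uses this package, wrapped once more in Struts' own style. Here we use the FileUpload package directly; for its use within Struts, see the Struts documentation.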

The main code of the servlet that handles the upload is as follows:

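import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.DiskFileUpload;
import org.apache.commons.fileupload.FileItem;

public class UploadFile extends HttpServlet {

    public void doPost( HttpServletRequest request, HttpServletResponse response )
            throws ServletException, IOException {
        response.setContentType( "text/plain" );
        PrintWriter out = response.getWriter();

        DiskFileUpload diskFileUpload = new DiskFileUpload();
        //  Maximum allowed size of the whole request (100 MB)
        diskFileUpload.setSizeMax( 100*1024*1024 );
        //  Items up to this many bytes stay in memory; larger ones are written to disk
        diskFileUpload.setSizeThreshold( 4096 );
        //  Temporary directory for items larger than the threshold
        diskFileUpload.setRepositoryPath( "c:/tmp" );

        try {
            List fileItems = diskFileUpload.parseRequest( request );
            for( Iterator iter = fileItems.iterator(); iter.hasNext(); ) {
                FileItem fileItem = (FileItem) iter.next();
                if( fileItem.isFormField() ) {
                    //  A regular form field
                    out.println( "form field : " + fileItem.getFieldName() + ", " + fileItem.getString() );
                } else if( fileItem.getName() != null && fileItem.getName().length() > 0 ) {
                    //  An uploaded file (file inputs left empty are skipped).
                    //  IE sends the full client path; on a Windows server File.getName() keeps only the last segment.
                    String fileName = new File( fileItem.getName() ).getName();
                    fileItem.write( new File( "c:/uploads/" + fileName ) );
                }
            }
        } catch( Exception e ) {
            //  parseRequest() throws FileUploadException; FileItem.write() is declared to throw Exception
            throw new ServletException( e );
        }
    }
}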

For brevity, details such as renaming files and more thorough error handling are omitted.
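The renaming step that the servlet leaves out could look roughly like the sketch below. It assumes two hypothetical helpers, `UploadNames.baseName` and `UploadNames.uniqueTarget` (they are not part of Commons FileUpload): the first strips any client-side directory from the submitted name, accepting both `/` and `\` because IE may send a full Windows path; the second picks a target name that does not clash with an existing file.

```java
import java.io.File;

// Hypothetical helpers, not part of Commons FileUpload.
class UploadNames {
    // Returns the last path segment of a client-supplied name, treating both
    // '/' and '\' as separators because IE may send a full Windows path.
    static String baseName(String clientName) {
        if (clientName == null) {
            return "";
        }
        int cut = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        return clientName.substring(cut + 1);
    }

    // Returns dir/baseName, or dir/stem(1).ext, dir/stem(2).ext, ... for the
    // first candidate that does not exist yet. The exists() check and the
    // later write are not atomic, so concurrent uploads can still collide.
    static File uniqueTarget(File dir, String baseName) {
        int dot = baseName.lastIndexOf('.');
        String stem = dot > 0 ? baseName.substring(0, dot) : baseName;
        String ext = dot > 0 ? baseName.substring(dot) : "";
        File target = new File(dir, baseName);
        for (int i = 1; target.exists(); i++) {
            target = new File(dir, stem + "(" + i + ")" + ext);
        }
        return target;
    }
}
```

In the servlet, the direct write would then become something like `fileItem.write( UploadNames.uniqueTarget( new File( "c:/uploads" ), UploadNames.baseName( fileItem.getName() ) ) );`, which also works on a non-Windows server where `File.getName()` would not split on backslashes.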

3. Constructing the request sent by the client

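Assume the server-side program that receives the file is located at http://192.168.29.65/upload_file/UploadFile.

Suppose we want to send one binary file, one text field, and one password field. The file is named E:/s and its content is as follows (XXX stands for binary data, such as 01 02 03):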

a
bb
XXX
ccc

The client should send the following to 192.168.29.65:

POST /upload_file/UploadFile HTTP/1.1
Accept: text/plain, */*
Accept-Language: zh-cn
Host: 192.168.29.65:80
Content-Type:multipart/form-data;boundary=---------------------------7d33a816d302b6
User-Agent: Mozilla/4.0 (compatible; OpenOffice.org)
Content-Length: 424
Connection: Keep-Alive

-----------------------------7d33a816d302b6
Content-Disposition: form-data; name="userfile1"; filename="E:/s"
Content-Type: application/octet-stream

a
bb
XXX
ccc
-----------------------------7d33a816d302b6
Content-Disposition: form-data; name="text1"

foo
-----------------------------7d33a816d302b6
Content-Disposition: form-data; name="password1"

bar
-----------------------------7d33a816d302b6--

This content must be sent exactly as shown, byte for byte, including the final CRLF.

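Note: the 424 in Content-Length: 424 is the total length in bytes of the message body, that is, everything after the blank line that ends the headers, including the final CRLF.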
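To see where such a byte count comes from, the body can be assembled programmatically and its length measured. The sketch below uses a hypothetical helper, `MultipartBody`, that writes each part as shown above: a delimiter line, the part headers, a blank line, the content, and finally the closing delimiter. With a three-byte stand-in for XXX, the computed length need not match the article's 424, since the real bytes of E:/s are only sketched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that assembles a multipart/form-data body byte for byte.
class MultipartBody {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartBody(String boundary) {
        this.boundary = boundary;
    }

    // Headers and field values in this example are plain ASCII.
    private void ascii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }

    // Delimiter line: "--" followed by the boundary from the Content-Type header.
    private void delimiter() {
        ascii("--" + boundary + CRLF);
    }

    MultipartBody field(String name, String value) {
        delimiter();
        ascii("Content-Disposition: form-data; name=\"" + name + "\"" + CRLF);
        ascii(CRLF);         // blank line ends the part headers
        ascii(value + CRLF); // this CRLF belongs to the next delimiter
        return this;
    }

    MultipartBody file(String name, String fileName, byte[] content) {
        delimiter();
        ascii("Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        ascii("Content-Type: application/octet-stream" + CRLF);
        ascii(CRLF);
        out.write(content, 0, content.length); // raw bytes, copied unchanged
        ascii(CRLF);
        return this;
    }

    // Closing delimiter: boundary followed by "--", then the final CRLF.
    byte[] finish() {
        ascii("--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }
}
```

For the example above, `new MultipartBody(boundary).file("userfile1", "E:/s", bytesOfS).field("text1", "foo").field("password1", "bar").finish().length` is the number that belongs in Content-Length.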

Note this line:

Content-Type: multipart/form-data; boundary=---------------------------7d33a816d302b6

According to RFC 1867, multipart/form-data is required.

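---------------------------7d33a816d302b6 is the boundary that separates the files and form fields. The 33a816d302b6 part is a number generated on the fly so that the full boundary does not occur in the content of any file or form field. The leading ---------------------------7d is a marker specific to IE; Mozilla uses ---------------------------71. In the body, each delimiter line is "--" followed by the boundary, and the last one additionally ends with "--", which is why those lines have two more leading hyphens than the value in the header.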
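A client-side boundary generator in this spirit can be sketched as follows. The fixed prefix only imitates the IE-style marker described above; IE's real algorithm is not documented here, so `BoundaryGenerator` and its 12-hex-digit random suffix are assumptions. The sketch retries until the boundary occurs in none of the part contents.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.List;

// Hypothetical generator; the prefix imitates the IE-style boundary,
// but this is not IE's actual algorithm.
class BoundaryGenerator {
    private static final String PREFIX = "---------------------------7d";
    private static final SecureRandom RANDOM = new SecureRandom();

    // PREFIX followed by 12 random lowercase hex digits.
    static String candidate() {
        StringBuilder sb = new StringBuilder(PREFIX);
        for (int i = 0; i < 12; i++) {
            sb.append(Character.forDigit(RANDOM.nextInt(16), 16));
        }
        return sb.toString();
    }

    // Draws candidates until one occurs in none of the part contents.
    static String generate(List<byte[]> partContents) {
        while (true) {
            String boundary = candidate();
            boolean clash = false;
            for (byte[] content : partContents) {
                // ISO-8859-1 maps each byte to one char, so contains() acts as a byte search.
                if (new String(content, StandardCharsets.ISO_8859_1).contains(boundary)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                return boundary;
            }
        }
    }
}
```

Random suffixes make a clash extremely unlikely, so the loop almost always returns on the first try; the check is what turns "unlikely" into the guarantee the format requires.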

This example was sent by hand and verified against the servlet above.
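Sending a request "by hand" amounts to writing the request line, the headers, a blank line, and the prebuilt body to a plain TCP socket. The sketch below is an assumption-laden illustration, not the author's tool: `RawUpload` is hypothetical, it sends `Connection: close` instead of Keep-Alive so it can simply read the response until the server closes the connection, and it omits the optional Accept and User-Agent headers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of sending a multipart body over a raw socket.
class RawUpload {
    // Request line and headers; Content-Length must equal the body's byte count.
    static String headers(String host, int port, String path, String boundary, int bodyLength) {
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + ":" + port + "\r\n"
                + "Content-Type: multipart/form-data; boundary=" + boundary + "\r\n"
                + "Content-Length: " + bodyLength + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"; // blank line ends the headers
    }

    // Writes headers and body, then returns the raw response text.
    static String send(String host, int port, String path, String boundary, byte[] body)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream os = socket.getOutputStream();
            os.write(headers(host, port, path, boundary, body.length)
                    .getBytes(StandardCharsets.ISO_8859_1));
            os.write(body);
            os.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                response.write(buf, 0, n);
            }
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }
}
```

Computing Content-Length from the actual byte array, rather than typing it in, avoids the most common mistake when building these requests manually.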

(The closing boundary line above is followed by one more CRLF.)

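Back to the form in section 2: the user can select several files, fill in the other fields, and click the "Submit" button to start uploading them to http://192.168.29.65/upload_file/UploadFile, which is a servlet.

Note the attributes enctype="multipart/form-data", method=post, and type="file". According to RFC 1867, all three are required. multipart/form-data is a new encoding type introduced to make transferring binary files more efficient. For details, see RFC 1867.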

The second article uses a Perl implementation:

http://www.vivtek.com/rfc1867.html

RFC1867 HTTP file upload

RFC1867 is the standard definition of that "Browse..." button that you use to upload files to a Web server. It introduced the INPUT field type="file", which is that button, and also specified a multipart form encoding which is capable of encapsulating files for upload along with all the other fields on an upload form.

It's not easy to find documentation on how to work with this stuff, though. Partly this is because if you're writing a Perl CGI it's really rather easy to work with, and partly it's due to the fact that Microsoft IIS ASP doesn't (exactly) support RFC1867 file upload. So on the one hand the Unixheads think it's too trivial to document, while the ASP script kiddies think that file upload is the exclusive preserve of genius and guru alike. I.e. Bill doesn't think you need to use it.

If that last sounds overly bitter, it's because I just finished up a really horrible job that involved uploading files to an IIS server. It would have been nice had somebody at Microsoft found file upload a sufficiently significant function to design competently. As it is, IIS 5.0 now provides a "Request.BinaryRead" method that gives you the raw request body, and graciously allows you to design your own object to read it. Note that VBS has no (easy) ability to read this binary data.

So let's assume for the time being that you're working with some reasonable non-IIS server. How do you really deal with file upload? It turns out to be easy. First, you design your form so that it will actually do an upload. In short, do this:

<form action=/mycode.cgi method=post enctype=multipart/form-data>
  <input type="file" name="myfile">
</form>

In case you were wondering, the standard encoding type for a form is application/x-www-form-urlencoded, and if you leave the multipart enctype out of your form, then Netscape, for one, will not upload the file, it'll just include the filename. If that's what you actually want, this is pretty useful. (However, the RFC leaves behavior in this situation undefined, so you shouldn't rely on any particular behavior. I haven't looked to see what IE does in this situation. Undoubtedly something different.)
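For comparison, here is a sketch of what an application/x-www-form-urlencoded body would contain if a browser behaved as the author describes for Netscape, sending only the file name (the RFC leaves this case undefined). `UrlEncodedForm` is a hypothetical helper built on `java.net.URLEncoder`; the point is that the body carries `myfile=ad.gif` and none of the file's bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper: encodes fields the way a default-enctype form body is encoded.
class UrlEncodedForm {
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&'); // name=value pairs are joined with '&'
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // spaces become '+'
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }
}
```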

So this much information I already knew going into my horrible project, or at least knew of it. That's why I assumed that the server end was just as simple. And as I mentioned, in Perl it isn't much more difficult than retrieving normal posted data is already. It's just that IIS doesn't support multipart/form-data posts, that's all. Oh, Microsoft has a solution of sorts, called the something-or-other manager, and IIS 5.0 is so powerful that this manager thingy is now included right in the service pack with, gee, at least a kilobyte of documentation.

Yeesh. I'm off-track again, aren't I?

OK, so when this post gets to the server, what does it look like? Well, first of all the Content-type header of the request is set to

multipart/form-data; boundary=[some stuff]

This is how you can ascertain that you're really dealing with a properly encoded upload post. The boundary value is probably of the form --------------------------------1878979834, where the digits are randomly generated. This boundary is a MIME boundary; the sender chooses it so that it does not appear anywhere in the data except between the multiple parts of the data.
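In Java (the language of the servlet earlier in this document), the header check and boundary extraction can be sketched like this. `BoundaryParser.boundaryOf` is a hypothetical helper; it also accepts a quoted boundary value, which RFC 2046 permits, and returns null when the request is not multipart/form-data.

```java
// Hypothetical helper: pulls the boundary parameter out of a Content-Type value.
class BoundaryParser {
    static String boundaryOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        if (!parts[0].trim().equalsIgnoreCase("multipart/form-data")) {
            return null; // not an RFC 1867 upload
        }
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("boundary")) {
                String value = param.substring(eq + 1).trim();
                // The value may be quoted; strip the quotes if so.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null; // multipart/form-data without a boundary is malformed
    }
}
```

Splitting on `;` is safe here because `;` is not among the characters a MIME boundary may contain.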

The data itself appears in blocks that are made up of lines separated by CR/LF pairs. It looks like this, more or less:

-------------------------------18788734234
Content-Disposition: form-data; name="nonfile_field"

value here
-------------------------------18788734234
Content-Disposition: form-data; name="myfile"; filename="ad.gif"
Content-Type: image/gif

[ooh -- file contents!]
-------------------------------18788734234--

As you can see, this post isn't from the form I listed above, because I threw in a non-upload field just to show what it looks like. Anyway, you can see where everything is. Note that you get the originating local filename of the document for free in this format, meaning that you can use this to develop a document management system. Actual implementation is left as an exercise for the reader. I'll write more later on this topic, especially if you ask me any questions. Hint, hint.

So a Perl reader for this guy is simple: you iterate on the lines of the input and break on your boundary. Do things with the parts as you find them. I have an extensive example that you can read and use; it is listed under LINKS below as the Perl/CGI implementation of RFC1867. It works (I'm using it daily) and it's well-documented.
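The same line-oriented approach, sketched in Java rather than Perl (this is not the author's Perl code). Assumptions: the whole body fits in memory and has already been decoded as ISO-8859-1 so that every byte maps to exactly one char; transport padding after delimiter lines is not handled; and a body missing its closing delimiter drops the last part. `SimpleMultipartReader` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory reader for a multipart/form-data body.
class SimpleMultipartReader {
    static final class Part {
        // Header names are case-insensitive, so lookups ignore case.
        final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String content = "";
    }

    static List<Part> parse(String body, String boundary) {
        String delimiter = "--" + boundary;
        String closing = delimiter + "--";
        List<Part> parts = new ArrayList<>();
        List<String> contentLines = new ArrayList<>();
        Part current = null;
        boolean inHeaders = false;
        for (String line : body.split("\r\n", -1)) {
            boolean isClosing = line.equals(closing);
            if (line.equals(delimiter) || isClosing) {
                if (current != null) {
                    // The CRLF before a delimiter belongs to the delimiter,
                    // so content lines are re-joined without a trailing CRLF.
                    current.content = String.join("\r\n", contentLines);
                    parts.add(current);
                }
                if (isClosing) {
                    break;
                }
                current = new Part();
                inHeaders = true;
                contentLines.clear();
            } else if (current == null) {
                continue; // preamble before the first delimiter is ignored
            } else if (inHeaders) {
                if (line.isEmpty()) {
                    inHeaders = false; // blank line ends the part headers
                } else {
                    int colon = line.indexOf(':');
                    if (colon > 0) {
                        current.headers.put(line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
                    }
                }
            } else {
                contentLines.add(line);
            }
        }
        return parts;
    }
}
```

Each part's `Content-Disposition` header then yields the field name and, for files, the original file name the author mentions getting "for free".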

And thus concludes the lesson for today. Go forth and upload files.

LINKS

  • RFC1867 at Ohio State

    An interesting RFC, actually, as it goes into some of the alternatives that the working group rejected in the interest of a clean design.

  • Perl/CGI implementation of RFC1867

    My implementation in Perl. Literately programmed.

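For the relevant specifications, see:

http://tools.ietf.org/html/rfc1867 (RFC 1867: Form-based File Upload in HTML)

http://tools.ietf.org/html/rfc2854 (RFC 2854: The 'text/html' Media Type)

http://tools.ietf.org/html/rfc2388 (RFC 2388: Returning Values from Forms: multipart/form-data)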
