Exploring Content-Disposition: Fixing Garbled Chinese File Names in Downloads

author:Jay's lab
Today I solved the problem of giving a downloaded file a Chinese name: putting Chinese characters directly into Content-Disposition produces garbled text. Following the usual advice online (Content-Disposition plus UTF-8 encoding) fixes it. Still, to get to the root of the problem, I read the official documents to understand the fields of Content-Disposition and what they mean. In short, Content-Disposition can carry the file name, but a Chinese name has to be encoded, because RFC 822 requires messages to be ASCII only. That is where the problem comes from.

The definition of Content-Disposition

Description in Hypertext Transfer Protocol – HTTP/1.1

Content-Disposition is not part of the HTTP standard, but since it is widely implemented, we are documenting its use and risks for implementors.

The Content-Disposition response-header field has been proposed as a means for the origin server to suggest a default filename if the user requests that the content is saved to a file. This usage is derived from the definition of Content-Disposition in RFC 1806.

RFC 1806

the Content-Disposition header field is defined as follows:

disposition := "Content-Disposition" ":"
                   disposition-type
                   *(";" disposition-parm)

    disposition-type := "inline"
                      / "attachment"
                      / extension-token
                      ; values are not case-sensitive

    disposition-parm := filename-parm / parameter

    filename-parm := "filename" "=" value;
           

'extension-token', 'parameter' and 'value' are defined according to [RFC 822] and [RFC 1521].

The first thing to note is the disposition-type. From the definition above, Content-Disposition has two named types, inline and attachment (the grammar also allows extension tokens). According to the document, inline content is displayed automatically, for example an image shown in place, while attachment content is not displayed automatically: in a mail client it may appear as an attachment icon, and in a browser it may trigger a download prompt.

The second is disposition-parm. Its main purpose is to supply a suggested file name (filename-parm), which the client (browser, mail system) uses to save the file if possible. "If possible" covers cases such as an invalid file name or a clash with an existing file, where the client takes some action of its own, such as changing the name.

Finally, look at the value of filename-parm. This value is the file name (the goal of this article is to set it to a Chinese name, and this is where the main pitfall lies).
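The grammar above can be read fairly mechanically. As a rough sketch, a header value can be split into its disposition-type and parameters as shown here. The function name parseDisposition is made up for illustration, and the parsing is deliberately simplified: it splits on every ";" and so would mishandle a quoted value that itself contains a semicolon.

```javascript
// Simplified reader for the RFC 1806 grammar; parseDisposition is a hypothetical helper.
function parseDisposition(headerValue) {
  const [rawType, ...rawParams] = headerValue.split(";");
  const params = {};
  for (const part of rawParams) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // not a name=value pair; skip it
    const name = part.slice(0, eq).trim().toLowerCase();
    let value = part.slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    params[name] = value;
  }
  // disposition-type values are not case-sensitive, so normalize them.
  return { type: rawType.trim().toLowerCase(), params };
}

console.log(parseDisposition('Attachment; filename="report.txt"'));
```

A production parser would tokenize quoted strings properly; this version only illustrates how the type and the filename parameter relate.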

With the above in mind, to send a file to the front end with a Chinese file name, you might return an HTTP response header like this before sending the file:

Content-Disposition: attachment; filename=文件.txt           

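Note that RFC 822 (Standard for ARPA Internet Text Messages) specifies that text messages may contain only ASCII, so this Content-Disposition is invalid. RFC 1521 (Multipurpose Internet Mail Extensions) builds on RFC 822 and extends how content can be encoded, using four mechanisms: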

MIME-Version header

Content-Type header

Content-Transfer-Encoding header

Content-ID and Content-Description header

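Unfortunately, none of these has much to do with putting Chinese into an HTTP response header. After some googling, I found via Stack Overflow a document called RFC 5987 - Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters, and the fog cleared at once: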

By default, message header field parameters in HTTP ([RFC2616]) messages cannot carry characters outside the ISO-8859-1 character set. RFC 2231 defines an encoding mechanism for use in MIME headers. This document specifies an encoding suitable for use in HTTP header fields that is compatible with a profile of the encoding defined in RFC 2231.
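To see that the raw header fits neither limit mentioned so far (ASCII from RFC 822, ISO-8859-1 from the passage just quoted), here is a small check. The helper names isAscii and isLatin1 are invented for this sketch; they test each UTF-16 code unit of the string against the upper bound of the range.

```javascript
// Hypothetical helpers: check every UTF-16 code unit against a range limit.
function isAscii(str) {
  return /^[\x00-\x7F]*$/.test(str);
}

function isLatin1(str) {
  return /^[\x00-\xFF]*$/.test(str);
}

const raw = "attachment; filename=文件.txt";
console.log(isAscii(raw), isLatin1(raw)); // both checks fail for the Chinese name
console.log(isAscii("attachment; filename=file.txt"));
```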

The section "Guidelines for Usage in HTTP Header Field Definitions" in that document gives a general grammar:

foo-header  = "foo" LWSP ":" LWSP token ";" LWSP title-param
title-param = "title" LWSP "=" LWSP value
            / "title*" LWSP "=" LWSP ext-value
ext-value   = charset  "'" [ language ] "'" value-chars
charset     = "UTF-8" / "ISO-8859-1" / mime-charset
value-chars = *( pct-encoded / attr-char )
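To make the ext-value production concrete, here is a small sketch that splits an ext-value into charset, language, and decoded text. The function name decodeExtValue is invented for this example, and it handles only UTF-8, because it relies on decodeURIComponent, which always decodes percent-encoded bytes as UTF-8. The sample value is written in the style of the examples in RFC 5987.

```javascript
// decodeExtValue is a hypothetical helper; it supports only the UTF-8 charset.
function decodeExtValue(extValue) {
  // ext-value = charset "'" [ language ] "'" value-chars
  const match = /^([^']+)'([^']*)'(.*)$/.exec(extValue);
  if (!match) throw new Error("not an ext-value: " + extValue);
  const [, charset, language, valueChars] = match;
  if (charset.toUpperCase() !== "UTF-8") {
    throw new Error("unsupported charset: " + charset);
  }
  // decodeURIComponent interprets the percent-encoded bytes as UTF-8.
  return { charset, language, value: decodeURIComponent(valueChars) };
}

console.log(decodeExtValue("UTF-8''%c2%a3%20and%20%e2%82%ac%20rates"));
```

The language part is optional, which is why two consecutive single quotes usually appear after the charset.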

Convert the previous invalid Content-Disposition to the following:

Content-Disposition: attachment; filename*=UTF-8''%E6%96%87%E4%BB%B6.txt

Here "文件.txt" has been encoded: first UTF-8 encoded, then percent-encoded (pct-encoded). In effect, this is URL encoding.
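The two steps can be sketched with JavaScript's built-in encodeURIComponent, which encodes to UTF-8 and then percent-encodes. One caveat: encodeURIComponent leaves ' ( ) * unescaped, and those are not attr-char in RFC 5987, so this sketch escapes them as well. The name encodeRFC5987Value is just a label chosen for the example.

```javascript
// encodeRFC5987Value is a hypothetical helper name.
function encodeRFC5987Value(str) {
  return encodeURIComponent(str).replace(
    /['()*]/g,
    // Escape the characters that encodeURIComponent keeps but attr-char excludes.
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeRFC5987Value("文件.txt"));
console.log(encodeRFC5987Value("a'b (1)*.txt"));
```

For a name made only of letters, digits, and Chinese characters, the result is the same as plain URL encoding; the extra replacement matters only for names containing quotes, parentheses, or asterisks.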

Usage in Node

Besides using filename*=UTF-8''+value, the Chinese name itself also has to be encoded. In Node it can be written like this:

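const urlencode = require("urlencode"); // assumed: the npm "urlencode" package
// Percent-encode the UTF-8 bytes of the name, then send it as filename* (RFC 5987).
const name = urlencode("文件.txt", "utf-8");
res.setHeader("Content-Disposition", "attachment; filename*=UTF-8''" + name);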

Tested and working in Chrome, Edge, and IE 11.
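As a variation on the snippet above that needs no extra dependency, the following sketch builds the header with built-ins only. It also sends a plain ASCII filename alongside filename*, a fallback that RFC 6266 suggests for clients that do not understand filename*. The helper name buildContentDisposition and the underscore substitution are choices made for this example, not something from the original code.

```javascript
// buildContentDisposition is a hypothetical helper, not part of Node or any library.
function buildContentDisposition(filename) {
  // ASCII-only fallback: replace non-printable-ASCII units, quotes and backslashes.
  const fallback = filename.replace(/[^\x20-\x7E]|["\\]/g, "_");
  // RFC 5987 value: UTF-8 percent-encoding, plus the characters
  // that encodeURIComponent leaves alone but attr-char excludes.
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}

// Usage inside a request handler, where res is an http.ServerResponse:
// res.setHeader("Content-Disposition", buildContentDisposition("文件.txt"));
console.log(buildContentDisposition("文件.txt"));
```

Clients that support filename* should prefer it over the plain filename, so the underscore fallback only shows up in older clients.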

My takeaway after writing this: the IETF has an awful lot of RFCs, and I really got a feel for how HTTP has evolved -_-||