天天看点

RFC 3875 - The Common Gateway Interface (CGI) Version 1.1

通用网关接口 1.1版本

RFC3875。原文链接。
本文状态
本文为互联网社区提供资料,但没有指定任何互联网标准。分发、传播本文是不受限制的。
版权信息
Copyright © The Internet Society (2004).
互联网工程指导委员会的意见
本文不是任何级别的互联网标准的候选者。互联网工程任务组(IETF)否认这篇文章任何层面的适用性,并且指出,它没有在安全性、拥塞控制、与已应用协议的冲突等方面经过IETF的检验。RFC文档的编写者自行决定发布了本文。本文的读者应该小心谨慎地评估本文在应用和部署方面的价值。
摘要

通用网关接口(CGI)是一个简单的接口,它用来在信息服务器下以平台独立的方式运行外部程序、软件或网关程序。当前支持的信息服务器是HTTP服务器。

这个接口自1993年起被www(World-Wide Web)采用。这篇说明定义了美国NCSA机构开发和编纂的’CGI/1.1’ 接口中的“当前实践”(current practic)参数。同样定义了UNIX或其他系统中CGI/1.1接口的使用。

目录
  1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.1. Purpose . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.2. Requirements . . . . . . . . . . . . . . . . . . . . . . 4

    1.3. Specifications . . . . . . . . . . . . . . . . . . . . . 4

    1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . 5

  2. Notational Conventions and Generic Grammar. . . . . . . . . . 5

    2.1. Augmented BNF . . . . . . . . . . . . . . . . . . . . . 5

    2.2. Basic Rules . . . . . . . . . . . . . . . . . . . . . . 6

    2.3. URL Encoding . . . . . . . . . . . . . . . . . . . . . . 7

  3. Invoking the Script . . . . . . . . . . . . . . . . . . . . . 8

    3.1. Server Responsibilities . . . . . . . . . . . . . . . . 8

    3.2. Script Selection . . . . . . . . . . . . . . . . . . . . 9

    3.3. The Script-URI . . . . . . . . . . . . . . . . . . . . . 9

    3.4. Execution . . . . . . . . . . . . . . . . . . . . . . . 10

  4. The CGI Request . . . . . . . . . . . . . . . . . . . . . . . 10

    4.1. Request Meta-Variables . . . . . . . . . . . . . . . . . 10

    4.1.1. AUTH_TYPE. . . . . . . . . . . . . . . . . . . . 11

    4.1.2. CONTENT_LENGTH . . . . . . . . . . . . . . . . . 12

    4.1.3. CONTENT_TYPE . . . . . . . . . . . . . . . . . . 12

    4.1.4. GATEWAY_INTERFACE. . . . . . . . . . . . . . . . 13

    4.1.5. PATH_INFO. . . . . . . . . . . . . . . . . . . . 13

    4.1.6. PATH_TRANSLATED. . . . . . . . . . . . . . . . . 14

    4.1.7. QUERY_STRING . . . . . . . . . . . . . . . . . . 15

    4.1.8. REMOTE_ADDR. . . . . . . . . . . . . . . . . . . 15

    4.1.9. REMOTE_HOST. . . . . . . . . . . . . . . . . . . 16

    4.1.10. REMOTE_IDENT . . . . . . . . . . . . . . . . . . 16

    4.1.11. REMOTE_USER. . . . . . . . . . . . . . . . . . . 16

    4.1.12. REQUEST_METHOD . . . . . . . . . . . . . . . . . 17

    4.1.13. SCRIPT_NAME. . . . . . . . . . . . . . . . . . . 17

    4.1.14. SERVER_NAME. . . . . . . . . . . . . . . . . . . 17

    4.1.15. SERVER_PORT. . . . . . . . . . . . . . . . . . . 18

    4.1.16. SERVER_PROTOCOL. . . . . . . . . . . . . . . . . 18

    4.1.17. SERVER_SOFTWARE. . . . . . . . . . . . . . . . . 19

    4.1.18. Protocol-Specific Meta-Variables . . . . . . . . 19

    4.2. Request Message-Body . . . . . . . . . . . . . . . . . . 20

    4.3. Request Methods . . . . . . . . . . . . . . . . . . . . 20

    4.3.1. GET. . . . . . . . . . . . . . . . . . . . . . . 20

    4.3.2. POST . . . . . . . . . . . . . . . . . . . . . . 21

    4.3.3. HEAD . . . . . . . . . . . . . . . . . . . . . . 21

    4.3.4. Protocol-Specific Methods. . . . . . . . . . . . 21

    4.4. The Script Command Line. . . . . . . . . . . . . . . . . 21

  5. NPH Scripts . . . . . . . . . . . . . . . . . . . . . . . . . 22

    5.1. Identification . . . . . . . . . . . . . . . . . . . . . 22

    5.2. NPH Response . . . . . . . . . . . . . . . . . . . . . . 22

  6. CGI Response. . . . . . . . . . . . . . . . . . . . . . . . . 23

    6.1. Response Handling. . . . . . . . . . . . . . . . . . . . 23

    6.2. Response Types . . . . . . . . . . . . . . . . . . . . . 23

    6.2.1. Document Response. . . . . . . . . . . . . . . . 23

    6.2.2. Local Redirect Response. . . . . . . . . . . . . 24

    6.2.3. Client Redirect Response . . . . . . . . . . . . 24

    6.2.4. Client Redirect Response with Document . . . . . 24

    6.3. Response Header Fields . . . . . . . . . . . . . . . . . 25

    6.3.1. Content-Type . . . . . . . . . . . . . . . . . . 25

    6.3.2. Location . . . . . . . . . . . . . . . . . . . . 26

    6.3.3. Status . . . . . . . . . . . . . . . . . . . . . 26

    6.3.4. Protocol-Specific Header Fields. . . . . . . . . 27

    6.3.5. Extension Header Fields. . . . . . . . . . . . . 27

    6.4. Response Message-Body. . . . . . . . . . . . . . . . . . 28

  7. System Specifications . . . . . . . . . . . . . . . . . . . . 28

    7.1. AmigaDOS . . . . . . . . . . . . . . . . . . . . . . . . 28

    7.2. UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    7.3. EBCDIC/POSIX . . . . . . . . . . . . . . . . . . . . . . 29

  8. Implementation. . . . . . . . . . . . . . . . . . . . . . . . 29

    8.1. Recommendations for Servers. . . . . . . . . . . . . . . 29

    8.2. Recommendations for Scripts. . . . . . . . . . . . . . . 30

  9. Security Considerations . . . . . . . . . . . . . . . . . . . 30

    9.1. Safe Methods . . . . . . . . . . . . . . . . . . . . . . 30

    9.2. Header Fields Containing Sensitive Information . . . . . 31

    9.3. Data Privacy . . . . . . . . . . . . . . . . . . . . . . 31

    9.4. Information Security Model . . . . . . . . . . . . . . . 31

    9.5. Script Interference with the Server. . . . . . . . . . . 31

    9.6. Data Length and Buffering Considerations . . . . . . . . 32

    9.7. Stateless Processing . . . . . . . . . . . . . . . . . . 32

    9.8. Relative Paths . . . . . . . . . . . . . . . . . . . . . 33

    9.9. Non-parsed Header Output . . . . . . . . . . . . . . . . 33

  10. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . 33
  11. References. . . . . . . . . . . . . . . . . . . . . . . . . . 33

    11.1. Normative References. . . . . . . . . . . . . . . . . . 33

    11.2. Informative References. . . . . . . . . . . . . . . . . 34

  12. Authors’ Addresses. . . . . . . . . . . . . . . . . . . . . . 35
  13. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 36
  1. 简介

    1.1. 目的

    通用网关接口(CGI)允许一个HTTP服务器和一个CGI脚本共同承当向客户端请求发送响应的责任。一个客户端请求包含资源定位符(URI)、请求方法,和一些由传输协议提供的附加信息。

    CGI定义了那些抽象的参数,即为人熟知的元变量(meta-variables),这些变量描述了客户端的请求。其次还给出了具体的编程接口,这个接口以平台独立的方式供脚本和HTTP服务器使用。

    服务器负责管理连接、数据传送,以及处理和客户端请求有关的传输层和网络层问题。然而CGI脚本处理应用层问题,如数据存取和文件处理。

    1.2 需求

    这篇文档 中关键字’MUST’, ‘MUST NOT’, ‘REQUIRED’, ‘SHALL’, ‘SHALL NOT’, ‘SHOULD’, ‘SHOULD NOT’, ‘RECOMMENDED’, ‘MAY’ , 'OPTIONAL’已经在RFC2119中做了解释。

    不能满足一个或多个“must”需求的协议实现被称为是不兼容的。满足所有“must”需求和所有“should”需求的实现被称为是无条件兼容的。满足所有“must”需求但不满足所有“should”需求的实现被称为是有条件兼容的。

    1.3 说明

    并非所有的CGI功能和特性都定在这篇说明的主要部分中被定义。如下的短语被用来描述未定义的特性:

    “系统定义的”(system-defined):

    表示这个特性在不同的系统中可能存在差异,但在相同的系统中是一样的即使拥有不同的实现方式。某个系统通常指的是一类操作系统。一些操作系统在本文的第七章中给出了定义。新的操作系统将在新的规范中给出定义,并且不在本文中给出修正。

    1.4 术语

    本规范使用了很多在HTTP/1.1规范中定义的术语。但是,如下的术语和他们的定义或释义并不一致。

    ‘元变量’(meta-variable)

    一个命名的参数,携带了服务器中的信息并将信息传递给脚本。在操作系统中这并不是一个必须的变量,尽管这是一个通常的实现。

    ‘脚本’(script)

    根据本接口被服务器调用的软件。它不必是一个独立的程序,也可以是动态加载或者共享的库,甚至是服务器的一个子程序。

    ‘服务器’(server)

    调用脚本程序以响应客户端请求的一个应用程序。

  2. 符号惯例和通用语法

    All of the mechanisms specified in this document are described in

    both prose and an augmented Backus-Naur Form (BNF) similar to that

    used by RFC 822 [13]. Unless stated otherwise, the elements are

    case-sensitive. This augmented BNF contains the following

    constructs:

name = definition

The name of a rule and its definition are separated by the equals

character (’=’). Whitespace is only significant in that

continuation lines of a definition are indented.

“literal”

Double quotation marks (") surround literal text, except for a

literal quotation mark, which is surrounded by angle-brackets (’<’

and ‘>’).

rule1 | rule2

Alternative rules are separated by a vertical bar (’|’).

(rule1 rule2 rule3)

Elements enclosed in parentheses are treated as a single element.

*rule

A rule preceded by an asterisk (’’) may have zero or more

occurrences. The full form is 'nm rule’ indicating at least n

and at most m occurrences of the rule. n and m are optional

decimal values with default values of 0 and infinity respectively.

[rule]

An element enclosed in square brackets (’[’ and ‘]’) is optional,

and is equivalent to ‘*1 rule’.

N rule

A rule preceded by a decimal number represents exactly N

occurrences of the rule. It is equivalent to ‘N*N rule’.

2.2. Basic Rules

This specification uses a BNF-like grammar defined in terms of

characters. Unlike many specifications which define the bytes

allowed by a protocol, here each literal in the grammar corresponds

to the character it represents. How these characters are represented

in terms of bits and bytes within a system are either system-defined

or specified in the particular context. The single exception is the

rule ‘OCTET’, defined below.

The following rules are used throughout this specification to

describe basic parsing constructs.

alpha         = lowalpha | hialpha
  lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                  "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                  "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                  "y" | "z"
  hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                  "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                  "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                  "Y" | "Z"

  digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                  "8" | "9"
  alphanum      = alpha | digit
  OCTET         = <any 8-bit byte>
  CHAR          = alpha | digit | separator | "!" | "#" | "$" |
                  "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
                  "^" | "_" | "{" | "|" | "}" | "~" | CTL
  CTL           = <any control character>
  SP            = <space character>
  HT            = <horizontal tab character>
  NL            = <newline>
  LWSP          = SP | HT | NL
  separator     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" |
                  "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" |
                  "}" | SP | HT
  token         = 1*<any CHAR except CTLs or separators>
  quoted-string = <"> *qdtext <">
  qdtext        = <any CHAR except <"> and CTLs but including LWSP>
  TEXT          = <any printable character>
           

Note that newline (NL) need not be a single control character, but

can be a sequence of control characters. A system MAY define TEXT to

be a larger set of characters than .

2.3. URL Encoding

Some variables and constructs used here are described as being

‘URL-encoded’. This encoding is described in section 2 of RFC 2396

[2]. In a URL-encoded string an escape sequence consists of a

percent character ("%") followed by two hexadecimal digits, where the

two hexadecimal digits form an octet. An escape sequence represents

the graphic character that has the octet as its code within the

US-ASCII [9] coded character set, if it exists. Currently there is

no provision within the URI syntax to identify which character set

non-ASCII codes represent, so CGI handles this issue on an ad-hoc

basis.

Note that some unsafe (reserved) characters may have different

semantics when encoded. The definition of which characters are

unsafe depends on the context; see section 2 of RFC 2396 [2], updated

by RFC 2732 [7], for an authoritative treatment. These reserved

characters are generally used to provide syntactic structure to the

character string, for example as field separators. In all cases, the

string is first processed with regard to any reserved characters

present, and then the resulting data can be URL-decoded by replacing

“%” escape sequences by their character values.

To encode a character string, all reserved and forbidden characters

are replaced by the corresponding “%” escape sequences. The string

can then be used in assembling a URI. The reserved characters will

vary from context to context, but will always be drawn from this set:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" |
             "," | "[" | "]"
           

The last two characters were added by RFC 2732 [7]. In any

particular context, a sub-set of these characters will be reserved;

the other characters from this set MUST NOT be encoded when a string

is URL-encoded in that context. Other basic rules used to describe

URI syntax are:

hex        = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b"
               | "c" | "d" | "e" | "f"
  escaped    = "%" hex hex
  unreserved = alpha | digit | mark
  mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
           
  1. 调用脚本程序

    3.1 服务器职责

    服务器表现得像一个应用网关。它从客户端获取请求,然后选择一个脚本程序处理请求,将客户端的请求转换为对CGI形式的请求,执行脚本程序,并且转换CGI形式的响应,最后向客户端发送响应。当处理客户端请求的时候,它负责实现任何的协议、传输级认证、安全保证。服务器也可以以一种透明的方式工作,修改请求或响应,目的是提供额外的服务,如媒体类型转换和减少协议。

    服务器必须(MUST)对本规范中必须的请求参数进行翻译和协议转换。此外,服务器还需保证客户端遵照相关的网络协议,即使CGI脚本和本规范并不一致。

    如果服务器使用了安全认证,那它一定不能执行脚本程序,除非客户端请求通过了认证。

    3.2 脚本选择

    服务器根据一个客服端提供的一个一般形式的URI决定哪一个CGI是要执行的脚本。这个URI由“/”分隔的分层路径组成。对于任何特定 的请求,服务器会将整个或者一部分的路径识别为一个单独的脚本,从而在路径层级的特定地方放置这个脚本。余下的路径,如果有的话,将被脚本程序解释为资源标识符或次级资源标识符。

    脚本程序可以根据如下描述的元变量获取路径分隔信息。对不分级URI方案的支持超出了本规范的讨论范围。

    3.3. 脚本URI

    将客户端请求URI匹配到脚本是由特定的服务器实现和配置决定的。服务器可能允许脚本被一系列不同的URI路径层级匹配到,因此允许在处理和生成元变量的时候用路径中的其他成员替代这个URI。服务器可能有以下行为:

    1.保留特定的客户端请求中的URI;或者

    2.为每一个脚本从一系列的可能值中选择一个经典的URI;或者

    3.在这个集合中实现其他的选择。

    自元变量生成后,‘Script-URI’就可以被构造。

    This MUST have the property that if the client had

    accessed this URI instead, then the script would have been executed

    with the same values for the SCRIPT_NAME, PATH_INFO and QUERY_STRING

    meta-variables. The Script-URI has the structure of a generic URI as

    defined in section 3 of RFC 2396 [2], with the exception that object

    parameters and fragment identifiers are not permitted. The various

    components of the Script-URI are defined by some of the

    meta-variables (see below);

    script-URI = <scheme> “: //” <server-name> “:” <server-port>

    <script-path> <extra-path> “?” <query-string>

where is found from SERVER_PROTOCOL, ,

and are the values of the respective

meta-variables. The SCRIPT_NAME and PATH_INFO values, URL-encoded

with “;”, “=” and “?” reserved, give and .

See section 4.1.5 for more information about the PATH_INFO

meta-variable.

The scheme and the protocol are not identical as the scheme

identifies the access method in addition to the application protocol.

For example, a resource accessed using Transport Layer Security (TLS)

[14] would have a request URI with a scheme of https when using the

HTTP protocol [19]. CGI/1.1 provides no generic means for the script

to reconstruct this, and therefore the Script-URI as defined includes

the base protocol used. However, a script MAY make use of

scheme-specific meta-variables to better deduce the URI scheme.

Note that this definition also allows URIs to be constructed which

would invoke the script with any permitted values for the path-info

or query-string, by modifying the appropriate components.

3.4. Execution

The script is invoked in a system-defined manner. Unless specified

otherwise, the file containing the script will be invoked as an

executable program. The server prepares the CGI request as described

in section 4; this comprises the request meta-variables (immediately

available to the script on execution) and request message data. The

request data need not be immediately available to the script; the

script can be executed before all this data has been received by the

server from the client. The response from the script is returned to

the server as described in sections 5 and 6.

In the event of an error condition, the server can interrupt or

terminate script execution at any time and without warning. That

could occur, for example, in the event of a transport failure between

the server and the client; so the script SHOULD be prepared to handle

abnormal termination.

  1. The CGI Request

Information about a request comes from two different sources; the

request meta-variables and any associated message-body.

4.1. Request Meta-Variables

Meta-variables contain data about the request passed from the server

to the script, and are accessed by the script in a system-defined

manner. Meta-variables are identified by case-insensitive names;

there cannot be two different variables whose names differ in case

only. Here they are shown using a canonical representation of

capitals plus underscore ("_"). A particular system can define a

different representation.

meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
                       "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
                       "PATH_INFO" | "PATH_TRANSLATED" |
                       "QUERY_STRING" | "REMOTE_ADDR" |
                       "REMOTE_HOST" | "REMOTE_IDENT" |
                       "REMOTE_USER" | "REQUEST_METHOD" |
                       "SCRIPT_NAME" | "SERVER_NAME" |
                       "SERVER_PORT" | "SERVER_PROTOCOL" |
                       "SERVER_SOFTWARE" | scheme |
                       protocol-var-name | extension-var-name
  protocol-var-name  = ( protocol | scheme ) "_" var-name
  scheme             = alpha *( alpha | digit | "+" | "-" | "." )
  var-name           = token
  extension-var-name = token
           

Meta-variables with the same name as a scheme, and names beginning

with the name of a protocol or scheme (e.g., HTTP_ACCEPT) are also

defined. The number and meaning of these variables may change

independently of this specification. (See also section 4.1.18.)

The server MAY set additional implementation-defined extension meta-

variables, whose names SHOULD be prefixed with “X_”.

This specification does not distinguish between zero-length (NULL)

values and missing values. For example, a script cannot distinguish

between the two requests http://host/script and http://host/script?

as in both cases the QUERY_STRING meta-variable would be NULL.

meta-variable-value = "" | 1*<TEXT, CHAR or tokens of value>
           

An optional meta-variable may be omitted (left unset) if its value is

NULL. Meta-variable values MUST be considered case-sensitive except

as noted otherwise. The representation of the characters in the

meta-variables is system-defined; the server MUST convert values to

that representation.

4.1.1. AUTH_TYPE

The AUTH_TYPE variable identifies any mechanism used by the server to

authenticate the user. It contains a case-insensitive value defined

by the client protocol or server implementation.

For HTTP, if the client request required authentication for external

access, then the server MUST set the value of this variable from the

‘auth-scheme’ token in the request Authorization header field.

AUTH_TYPE      = "" | auth-scheme
  auth-scheme    = "Basic" | "Digest" | extension-auth
  extension-auth = token
           

HTTP access authentication schemes are described in RFC 2617 [5].

4.1.2. CONTENT_LENGTH

The CONTENT_LENGTH variable contains the size of the message-body

attached to the request, if any, in decimal number of octets. If no

data is attached, then NULL (or unset).

CONTENT_LENGTH = "" | 1*digit
           

The server MUST set this meta-variable if and only if the request is

accompanied by a message-body entity. The CONTENT_LENGTH value must

reflect the length of the message-body after the server has removed

any transfer-codings or content-codings.

4.1.3. CONTENT_TYPE

If the request includes a message-body, the CONTENT_TYPE variable is

set to the Internet Media Type [6] of the message-body.

CONTENT_TYPE = "" | media-type
  media-type   = type "/" subtype *( ";" parameter )
  type         = token
  subtype      = token
  parameter    = attribute "=" value
  attribute    = token
  value        = token | quoted-string
           

The type, subtype and parameter attribute names are not

case-sensitive. Parameter values may be case sensitive. Media types

and their use in HTTP are described section 3.7 of the HTTP/1.1

specification [4].

There is no default value for this variable. If and only if it is

unset, then the script MAY attempt to determine the media type from

the data received. If the type remains unknown, then the script MAY

choose to assume a type of application/octet-stream or it may reject

the request with an error (as described in section 6.3.3).

Each media-type defines a set of optional and mandatory parameters.

This may include a charset parameter with a case-insensitive value

defining the coded character set for the message-body. If the

charset parameter is omitted, then the default value should be

derived according to whichever of the following rules is the first to

apply:

1. There MAY be a system-defined default charset for some
     media-types.

  2. The default for media-types of type "text" is ISO-8859-1 [4].

  3. Any default defined in the media-type specification.

  4. The default is US-ASCII.
           

The server MUST set this meta-variable if an HTTP Content-Type field

is present in the client request header. If the server receives a

request with an attached entity but no Content-Type header field, it

MAY attempt to determine the correct content type, otherwise it

should omit this meta-variable.

4.1.4. GATEWAY_INTERFACE

The GATEWAY_INTERFACE variable MUST be set to the dialect of CGI

being used by the server to communicate with the script. Syntax:

GATEWAY_INTERFACE = "CGI" "/" 1*digit "." 1*digit
           

Note that the major and minor numbers are treated as separate

integers and hence each may be incremented higher than a single

digit. Thus CGI/2.4 is a lower version than CGI/2.13 which in turn

is lower than CGI/12.3. Leading zeros MUST be ignored by the script

and MUST NOT be generated by the server.

This document defines the 1.1 version of the CGI interface.

4.1.5. PATH_INFO

The PATH_INFO variable specifies a path to be interpreted by the CGI

script. It identifies the resource or sub-resource to be returned by

the CGI script, and is derived from the portion of the URI path

hierarchy following the part that identifies the script itself.

Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot

contain path-segment parameters. A PATH_INFO of “/” represents a

single void path segment.

PATH_INFO = "" | ( "/" path )
  path      = lsegment *( "/" lsegment )
  lsegment  = *lchar
  lchar     = <any TEXT or CTL except "/">
           

The value is considered case-sensitive and the server MUST preserve

the case of the path as presented in the request URI. The server MAY

impose restrictions and limitations on what values it permits for

PATH_INFO, and MAY reject the request with an error if it encounters

any values considered objectionable. That MAY include any requests

that would result in an encoded “/” being decoded into PATH_INFO, as

this might represent a loss of information to the script. Similarly,

treatment of non US-ASCII characters in the path is system-defined.

URL-encoded, the PATH_INFO string forms the extra-path component of

the Script-URI (see section 3.3) which follows the SCRIPT_NAME part

of that path.

4.1.6. PATH_TRANSLATED

The PATH_TRANSLATED variable is derived by taking the PATH_INFO

value, parsing it as a local URI in its own right, and performing any

virtual-to-physical translation appropriate to map it onto the

server’s document repository structure. The set of characters

permitted in the result is system-defined.

PATH_TRANSLATED = *<any character>
           

This is the file location that would be accessed by a request for

<scheme> "://" <server-name> ":" <server-port> <extra-path>
           

where is the scheme for the original client request and

is a URL-encoded version of PATH_INFO, with “;”, “=” and

“?” reserved. For example, a request such as the following:

http://somehost.com/cgi-bin/somescript/this%2eis%2epath%3binfo
           

would result in a PATH_INFO value of

/this.is.the.path;info
           

An internal URI is constructed from the scheme, server location and

the URL-encoded PATH_INFO:

http://somehost.com/this.is.the.path%3binfo
           

This would then be translated to a location in the server’s document

repository, perhaps a filesystem path something like this:

/usr/local/www/htdocs/this.is.the.path;info
           

The value of PATH_TRANSLATED is the result of the translation.

The value is derived in this way irrespective of whether it maps to a

valid repository location. The server MUST preserve the case of the

extra-path segment unless the underlying repository supports case-

insensitive names. If the repository is only case-aware, case-

preserving, or case-blind with regard to document names, the server

is not required to preserve the case of the original segment through

the translation.

The translation algorithm the server uses to derive PATH_TRANSLATED

is implementation-defined; CGI scripts which use this variable may

suffer limited portability.

The server SHOULD set this meta-variable if the request URI includes

a path-info component. If PATH_INFO is NULL, then the

PATH_TRANSLATED variable MUST be set to NULL (or unset).

4.1.7. QUERY_STRING

The QUERY_STRING variable contains a URL-encoded search or parameter

string; it provides information to the CGI script to affect or refine

the document to be returned by the script.

The URL syntax for a search string is described in section 3 of RFC

2396 [2]. The QUERY_STRING value is case-sensitive.

QUERY_STRING = query-string
  query-string = *uric
  uric         = reserved | unreserved | escaped
           

When parsing and decoding the query string, the details of the

parsing, reserved characters and support for non US-ASCII characters

depends on the context. For example, form submission from an HTML

document [18] uses application/x-www-form-urlencoded encoding, in

which the characters “+”, “&” and “=” are reserved, and the ISO

8859-1 encoding may be used for non US-ASCII characters.

The QUERY_STRING value provides the query-string part of the

Script-URI. (See section 3.3).

The server MUST set this variable; if the Script-URI does not include

a query component, the QUERY_STRING MUST be defined as an empty

string ("").

4.1.8. REMOTE_ADDR

The REMOTE_ADDR variable MUST be set to the network address of the

client sending the request to the server.

REMOTE_ADDR  = hostnumber
  hostnumber   = ipv4-address | ipv6-address
  ipv4-address = 1*3digit "." 1*3digit "." 1*3digit "." 1*3digit
  ipv6-address = hexpart [ ":" ipv4-address ]
  hexpart      = hexseq | ( [ hexseq ] "::" [ hexseq ] )
  hexseq       = 1*4hex *( ":" 1*4hex )
           

The format of an IPv6 address is described in RFC 3513 [15].

4.1.9. REMOTE_HOST

The REMOTE_HOST variable contains the fully qualified domain name of

the client sending the request to the server, if available, otherwise

NULL. Fully qualified domain names take the form as described in

section 3.5 of RFC 1034 [17] and section 2.1 of RFC 1123 [12].

Domain names are not case sensitive.

REMOTE_HOST   = "" | hostname | hostnumber
  hostname      = *( domainlabel "." ) toplabel [ "." ]
  domainlabel   = alphanum [ *alphahypdigit alphanum ]
  toplabel      = alpha [ *alphahypdigit alphanum ]
  alphahypdigit = alphanum | "-"
           

The server SHOULD set this variable. If the hostname is not

available for performance reasons or otherwise, the server MAY

substitute the REMOTE_ADDR value.

4.1.10. REMOTE_IDENT

The REMOTE_IDENT variable MAY be used to provide identity information

reported about the connection by an RFC 1413 [20] request to the

remote agent, if available. The server may choose not to support

this feature, or not to request the data for efficiency reasons, or

not to return available identity data.

REMOTE_IDENT = *TEXT
           

The data returned may be used for authentication purposes, but the

level of trust reposed in it should be minimal.

4.1.11. REMOTE_USER

The REMOTE_USER variable provides a user identification string

supplied by client as part of user authentication.

REMOTE_USER = *TEXT
           

If the client request required HTTP Authentication [5] (e.g., the

AUTH_TYPE meta-variable is set to “Basic” or “Digest”), then the

value of the REMOTE_USER meta-variable MUST be set to the user-ID

supplied.

4.1.12. REQUEST_METHOD

The REQUEST_METHOD meta-variable MUST be set to the method which

should be used by the script to process the request, as described in

section 4.3.

REQUEST_METHOD   = method
  method           = "GET" | "POST" | "HEAD" | extension-method
  extension-method = "PUT" | "DELETE" | token
           

The method is case sensitive. The HTTP methods are described in

section 5.1.1 of the HTTP/1.0 specification [1] and section 5.1.1 of

the HTTP/1.1 specification [4].

4.1.13. SCRIPT_NAME

The SCRIPT_NAME variable MUST be set to a URI path (not URL-encoded)

which could identify the CGI script (rather than the script’s

output). The syntax is the same as for PATH_INFO (section 4.1.5)

SCRIPT_NAME = "" | ( "/" path )
           

The leading “/” is not part of the path. It is optional if the path

is NULL; however, the variable MUST still be set in that case.

The SCRIPT_NAME string forms some leading part of the path component

of the Script-URI derived in some implementation-defined manner. No

PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME

value.

4.1.14. SERVER_NAME

The SERVER_NAME variable MUST be set to the name of the server host

to which the client request is directed. It is a case-insensitive

hostname or network address. It forms the host part of the

Script-URI.

SERVER_NAME = server-name
  server-name = hostname | ipv4-address | ( "[" ipv6-address "]" )
           

A deployed server can have more than one possible value for this

variable, where several HTTP virtual hosts share the same IP address.

In that case, the server would use the contents of the request’s Host

header field to select the correct virtual host.

4.1.15. SERVER_PORT

The SERVER_PORT variable MUST be set to the TCP/IP port number on

which this request is received from the client. This value is used

in the port part of the Script-URI.

SERVER_PORT = server-port
  server-port = 1*digit
           

Note that this variable MUST be set, even if the port is the default

port for the scheme and could otherwise be omitted from a URI.

4.1.16. SERVER_PROTOCOL

The SERVER_PROTOCOL variable MUST be set to the name and version of

the application protocol used for this CGI request. This MAY differ

from the protocol version used by the server in its communication

with the client.

SERVER_PROTOCOL   = HTTP-Version | "INCLUDED" | extension-version
  HTTP-Version      = "HTTP" "/" 1*digit "." 1*digit
  extension-version = protocol [ "/" 1*digit "." 1*digit ]
  protocol          = token
           

Here, ‘protocol’ defines the syntax of some of the information

passing between the server and the script (the ‘protocol-specific’

features). It is not case sensitive and is usually presented in

upper case. The protocol is not the same as the scheme part of the

script URI, which defines the overall access mechanism used by the

client to communicate with the server. For example, a request that

reaches the script with a protocol of “HTTP” may have used an “https”

scheme.

A well-known value for SERVER_PROTOCOL which the server MAY use is

“INCLUDED”, which signals that the current document is being included

as part of a composite document, rather than being the direct target

of the client request. The script should treat this as an HTTP/1.0

request.

4.1.17. SERVER_SOFTWARE

The SERVER_SOFTWARE meta-variable MUST be set to the name and version

of the information server software making the CGI request (and

running the gateway). It SHOULD be the same as the server

description reported to the client, if any.

SERVER_SOFTWARE = 1*( product | comment )
  product         = token [ "/" product-version ]
  product-version = token
  comment         = "(" *( ctext | comment ) ")"
  ctext           = <any TEXT excluding "(" and ")">
           

4.1.18. Protocol-Specific Meta-Variables

The server SHOULD set meta-variables specific to the protocol and

scheme for the request. Interpretation of protocol-specific

variables depends on the protocol version in SERVER_PROTOCOL. The

server MAY set a meta-variable with the name of the scheme to a

non-NULL value if the scheme is not the same as the protocol. The

presence of such a variable indicates to a script which scheme is

used by the request.

Meta-variables with names beginning with “HTTP_” contain values read

from the client request header fields, if the protocol used is HTTP.

The HTTP header field name is converted to upper case, has all

occurrences of “-” replaced with “" and has "HTTP” prepended to

give the meta-variable name. The header data can be presented as

sent by the client, or can be rewritten in ways which do not change

its semantics. If multiple header fields with the same field-name

are received then the server MUST rewrite them as a single value

having the same semantics. Similarly, a header field that spans

multiple lines MUST be merged onto a single line. The server MUST,

if necessary, change the representation of the data (for example, the

character set) to be appropriate for a CGI meta-variable.

The server is not required to create meta-variables for all the

header fields that it receives. In particular, it SHOULD remove any

header fields carrying authentication information, such as

‘Authorization’; or that are available to the script in other

variables, such as ‘Content-Length’ and ‘Content-Type’. The server

MAY remove header fields that relate solely to client-side

communication issues, such as ‘Connection’.

4.2. Request Message-Body

Request data is accessed by the script in a system-defined method;

unless defined otherwise, this will be by reading the ‘standard

input’ file descriptor or file handle.

Request-Data   = [ request-body ] [ extension-data ]
  request-body   = <CONTENT_LENGTH>OCTET
  extension-data = *OCTET
           

A request-body is supplied with the request if the CONTENT_LENGTH is

not NULL. The server MUST make at least that many bytes available

for the script to read. The server MAY signal an end-of-file

condition after CONTENT_LENGTH bytes have been read or it MAY supply

extension data. Therefore, the script MUST NOT attempt to read more

than CONTENT_LENGTH bytes, even if more data is available. However,

it is not obliged to read any of the data.

For non-parsed header (NPH) scripts (section 5), the server SHOULD

attempt to ensure that the data supplied to the script is precisely

as supplied by the client and is unaltered by the server.

As transfer-codings are not supported on the request-body, the server

MUST remove any such codings from the message-body, and recalculate

the CONTENT_LENGTH. If this is not possible (for example, because of

large buffering requirements), the server SHOULD reject the client

request. It MAY also remove content-codings from the message-body.

4.3. Request Methods

The Request Method, as supplied in the REQUEST_METHOD meta-variable,

identifies the processing method to be applied by the script in

producing a response. The script author can choose to implement the

methods most appropriate for the particular application. If the

script receives a request with a method it does not support it SHOULD

reject it with an error (see section 6.3.3).

4.3.1. GET

The GET method indicates that the script should produce a document

based on the meta-variable values. By convention, the GET method is

‘safe’ and ‘idempotent’ and SHOULD NOT have the significance of

taking an action other than producing a document.

The meaning of the GET method may be modified and refined by

protocol-specific meta-variables.

4.3.2. POST

The POST method is used to request the script perform processing and

produce a document based on the data in the request message-body, in

addition to meta-variable values. A common use is form submission in

HTML [18], intended to initiate processing by the script that has a

permanent affect, such a change in a database.

The script MUST check the value of the CONTENT_LENGTH variable before

reading the attached message-body, and SHOULD check the CONTENT_TYPE

value before processing it.

4.3.3. HEAD

The HEAD method requests the script to do sufficient processing to

return the response header fields, without providing a response

message-body. The script MUST NOT provide a response message-body

for a HEAD request. If it does, then the server MUST discard the

message-body when reading the response from the script.

4.3.4. Protocol-Specific Methods

The script MAY implement any protocol-specific method, such as

HTTP/1.1 PUT and DELETE; it SHOULD check the value of SERVER_PROTOCOL

when doing so.

The server MAY decide that some methods are not appropriate or

permitted for a script, and may handle the methods itself or return

an error to the client.

4.4. The Script Command Line

Some systems support a method for supplying an array of strings to

the CGI script. This is only used in the case of an ‘indexed’ HTTP

query, which is identified by a ‘GET’ or ‘HEAD’ request with a URI

query string that does not contain any unencoded “=” characters. For

such a request, the server SHOULD treat the query-string as a

search-string and parse it into words, using the rules

search-string = search-word *( "+" search-word )
  search-word   = 1*schar
  schar         = unreserved | escaped | xreserved
  xreserved     = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "," |
                  "$"
           

After parsing, each search-word is URL-decoded, optionally encoded in

a system-defined manner and then added to the command line argument

list.

If the server cannot create any part of the argument list, then the

server MUST NOT generate any command line information. For example,

the number of arguments may be greater than operating system or

server limits, or one of the words may not be representable as an

argument.

The script SHOULD check to see if the QUERY_STRING value contains an

unencoded “=” character, and SHOULD NOT use the command line

arguments if it does.

  1. NPH Scripts

5.1. Identification

The server MAY support NPH (Non-Parsed Header) scripts; these are

scripts to which the server passes all responsibility for response

processing.

This specification provides no mechanism for an NPH script to be

identified on the basis of its output data alone. By convention,

therefore, any particular script can only ever provide output of one

type (NPH or CGI) and hence the script itself is described as an ‘NPH

script’. A server with NPH support MUST provide an implementation-

defined mechanism for identifying NPH scripts, perhaps based on the

name or location of the script.

5.2. NPH Response

There MUST be a system-defined method for the script to send data

back to the server or client; a script MUST always return some data.

Unless defined otherwise, this will be the same as for conventional

CGI scripts.

Currently, NPH scripts are only defined for HTTP client requests. An

(HTTP) NPH script MUST return a complete HTTP response message,

currently described in section 6 of the HTTP specifications [1], [4].

The script MUST use the SERVER_PROTOCOL variable to determine the

appropriate format for a response. It MUST also take account of any

generic or protocol-specific meta-variables in the request as might

be mandated by the particular protocol specification.

The server MUST ensure that the script output is sent to the client

unmodified. Note that this requires the script to use the correct

character set (US-ASCII [9] and ISO 8859-1 [10] for HTTP) in the

header fields. The server SHOULD attempt to ensure that the script

output is sent directly to the client, with minimal internal and no

transport-visible buffering.

Unless the implementation defines otherwise, the script MUST NOT

indicate in its response that the client can send further requests

over the same connection.

  1. CGI Response

6.1. Response Handling

A script MUST always provide a non-empty response, and so there is a

system-defined method for it to send this data back to the server.

Unless defined otherwise, this will be via the ‘standard output’ file

descriptor.

The script MUST check the REQUEST_METHOD variable when processing the

request and preparing its response.

The server MAY implement a timeout period within which data must be

received from the script. If a server implementation defines such a

timeout and receives no data from a script within the timeout period,

the server MAY terminate the script process.

6.2. Response Types

The response comprises a message-header and a message-body, separated

by a blank line. The message-header contains one or more header

fields. The body may be NULL.

generic-response = 1*header-field NL [ response-body ]
           

The script MUST return one of either a document response, a local

redirect response or a client redirect (with optional document)

response. In the response definitions below, the order of header

fields in a response is not significant (despite appearing so in the

BNF). The header fields are defined in section 6.3.

CGI-Response = document-response | local-redir-response |
                 client-redir-response | client-redirdoc-response
           

6.2.1. Document Response

The CGI script can return a document to the user in a document

response, with an optional error code indicating the success status

of the response.

document-response = Content-Type [ Status ] *other-field NL
                      response-body
           

The script MUST return a Content-Type header field. A Status header

field is optional, and status 200 ‘OK’ is assumed if it is omitted.

The server MUST make any appropriate modifications to the script’s

output to ensure that the response to the client complies with the

response protocol version.

6.2.2. Local Redirect Response

The CGI script can return a URI path and query-string

(‘local-pathquery’) for a local resource in a Location header field.

This indicates to the server that it should reprocess the request

using the path specified.

local-redir-response = local-Location NL
           

The script MUST NOT return any other header fields or a message-body,

and the server MUST generate the response that it would have produced

in response to a request containing the URL

scheme "://" server-name ":" server-port local-pathquery
           

6.2.3. Client Redirect Response

The CGI script can return an absolute URI path in a Location header

field, to indicate to the client that it should reprocess the request

using the URI specified.

client-redir-response = client-Location *extension-field NL
           

The script MUST not provide any other header fields, except for

server-defined CGI extension fields. For an HTTP client request, the

server MUST generate a 302 ‘Found’ HTTP response message.

6.2.4. Client Redirect Response with Document

The CGI script can return an absolute URI path in a Location header

field together with an attached document, to indicate to the client

that it should reprocess the request using the URI specified.

client-redirdoc-response = client-Location Status Content-Type
                             *other-field NL response-body
           

The Status header field MUST be supplied and MUST contain a status

value of 302 ‘Found’, or it MAY contain an extension-code, that is,

another valid status code that means client redirection. The server

MUST make any appropriate modifications to the script’s output to

ensure that the response to the client complies with the response

protocol version.

6.3. Response Header Fields

The response header fields are either CGI or extension header fields

to be interpreted by the server, or protocol-specific header fields

to be included in the response returned to the client. At least one

CGI field MUST be supplied; each CGI field MUST NOT appear more than

once in the response. The response header fields have the syntax:

header-field    = CGI-field | other-field
  CGI-field       = Content-Type | Location | Status
  other-field     = protocol-field | extension-field
  protocol-field  = generic-field
  extension-field = generic-field
  generic-field   = field-name ":" [ field-value ] NL
  field-name      = token
  field-value     = *( field-content | LWSP )
  field-content   = *( token | separator | quoted-string )
           

The field-name is not case sensitive. A NULL field value is

equivalent to a field not being sent. Note that each header field in

a CGI-Response MUST be specified on a single line; CGI/1.1 does not

support continuation lines. Whitespace is permitted between the “:”

and the field-value (but not between the field-name and the “:”), and

also between tokens in the field-value.

6.3.1. Content-Type

The Content-Type response field sets the Internet Media Type [6] of

the entity body.

Content-Type = "Content-Type:" media-type NL
           

If an entity body is returned, the script MUST supply a Content-Type

field in the response. If it fails to do so, the server SHOULD NOT

attempt to determine the correct content type. The value SHOULD be

sent unmodified to the client, except for any charset parameter

changes.

Unless it is otherwise system-defined, the default charset assumed by

the client for text media-types is ISO-8859-1 if the protocol is HTTP

and US-ASCII otherwise. Hence the script SHOULD include a charset

parameter. See section 3.4.1 of the HTTP/1.1 specification [4] for a

discussion of this issue.

6.3.2. Location

The Location header field is used to specify to the server that the

script is returning a reference to a document rather than an actual

document (see sections 6.2.3 and 6.2.4). It is either an absolute

URI (optionally with a fragment identifier), indicating that the

client is to fetch the referenced document, or a local URI path

(optionally with a query string), indicating that the server is to

fetch the referenced document and return it to the client as the

response.

Location        = local-Location | client-Location
  client-Location = "Location:" fragment-URI NL
  local-Location  = "Location:" local-pathquery NL
  fragment-URI    = absoluteURI [ "#" fragment ]
  fragment        = *uric
  local-pathquery = abs-path [ "?" query-string ]
  abs-path        = "/" path-segments
  path-segments   = segment *( "/" segment )
  segment         = *pchar
  pchar           = unreserved | escaped | extra
  extra           = ":" | "@" | "&" | "=" | "+" | "$" | ","
           

The syntax of an absoluteURI is incorporated into this document from

that specified in RFC 2396 [2] and RFC 2732 [7]. A valid absoluteURI

always starts with the name of scheme followed by “:”; scheme names

start with a letter and continue with alphanumerics, “+”, “-” or “.”.

The local URI path and query must be an absolute path, and not a

relative path or NULL, and hence must start with a “/”.

Note that any message-body attached to the request (such as for a

POST request) may not be available to the resource that is the target

of the redirect.

6.3.3. Status

The Status header field contains a 3-digit integer result code that

indicates the level of success of the script’s attempt to handle the

request.

Status         = "Status:" status-code SP reason-phrase NL
  status-code    = "200" | "302" | "400" | "501" | extension-code
  extension-code = 3digit
  reason-phrase  = *TEXT
           

Status code 200 ‘OK’ indicates success, and is the default value

assumed for a document response. Status code 302 ‘Found’ is used

with a Location header field and response message-body. Status code

400 ‘Bad Request’ may be used for an unknown request format, such as

a missing CONTENT_TYPE. Status code 501 ‘Not Implemented’ may be

returned by a script if it receives an unsupported REQUEST_METHOD.

Other valid status codes are listed in section 6.1.1 of the HTTP

specifications [1], [4], and also the IANA HTTP Status Code Registry

[8] and MAY be used in addition to or instead of the ones listed

above. The script SHOULD check the value of SERVER_PROTOCOL before

using HTTP/1.1 status codes. The script MAY reject with error 405

‘Method Not Allowed’ HTTP/1.1 requests made using a method it does

not support.

Note that returning an error status code does not have to mean an

error condition with the script itself. For example, a script that

is invoked as an error handler by the server should return the code

appropriate to the server’s error condition.

The reason-phrase is a textual description of the error to be

returned to the client for human consumption.

6.3.4. Protocol-Specific Header Fields

The script MAY return any other header fields that relate to the

response message defined by the specification for the SERVER_PROTOCOL

(HTTP/1.0 [1] or HTTP/1.1 [4]). The server MUST translate the header

data from the CGI header syntax to the HTTP header syntax if these

differ. For example, the character sequence for newline (such as

UNIX’s US-ASCII LF) used by CGI scripts may not be the same as that

used by HTTP (US-ASCII CR followed by LF).

The script MUST NOT return any header fields that relate to

client-side communication issues and could affect the server’s

ability to send the response to the client. The server MAY remove

any such header fields returned by the client. It SHOULD resolve any

conflicts between header fields returned by the script and header

fields that it would otherwise send itself.

6.3.5. Extension Header Fields

There may be additional implementation-defined CGI header fields,

whose field names SHOULD begin with “X-CGI-”. The server MAY ignore

(and delete) any unrecognised header fields with names beginning “X-

CGI-” that are received from the script.

6.4. Response Message-Body

The response message-body is an attached document to be returned to

the client by the server. The server MUST read all the data provided

by the script, until the script signals the end of the message-body

by way of an end-of-file condition. The message-body SHOULD be sent

unmodified to the client, except for HEAD requests or any required

transfer-codings, content-codings or charset conversions.

response-body = *OCTET
           
  1. System Specifications

7.1. AmigaDOS

Meta-Variables

Meta-variables are passed to the script in identically named

environment variables. These are accessed by the DOS library

routine GetVar(). The flags argument SHOULD be 0. Case is

ignored, but upper case is recommended for compatibility with

case-sensitive systems.

The current working directory

The current working directory for the script is set to the

directory containing the script.

Character set

The US-ASCII character set [9] is used for the definition of

meta-variables, header fields and values; the newline (NL)

sequence is LF; servers SHOULD also accept CR LF as a newline.

7.2. UNIX

For UNIX compatible operating systems, the following are defined:

Meta-Variables

Meta-variables are passed to the script in identically named

environment variables. These are accessed by the C library

routine getenv() or variable environ.

The command line

This is accessed using the argc and argv arguments to main(). The

words have any characters which are ‘active’ in the Bourne shell

escaped with a backslash.

The current working directory

The current working directory for the script SHOULD be set to the

directory containing the script.

Character set

The US-ASCII character set [9], excluding NUL, is used for the

definition of meta-variables, header fields and CHAR values; TEXT

values use ISO-8859-1. The PATH_TRANSLATED value can contain any

8-bit byte except NUL. The newline (NL) sequence is LF; servers

should also accept CR LF as a newline.

7.3. EBCDIC/POSIX

For POSIX compatible operating systems using the EBCDIC character

set, the following are defined:

Meta-Variables

Meta-variables are passed to the script in identically named

environment variables. These are accessed by the C library

routine getenv().

The command line

This is accessed using the argc and argv arguments to main(). The

words have any characters which are ‘active’ in the Bourne shell

escaped with a backslash.

The current working directory

The current working directory for the script SHOULD be set to the

directory containing the script.

Character set

The IBM1047 character set [21], excluding NUL, is used for the

definition of meta-variables, header fields, values, TEXT strings

and the PATH_TRANSLATED value. The newline (NL) sequence is LF;

servers should also accept CR LF as a newline.

media-type charset default

The default charset value for text (and other implementation-

defined) media types is IBM1047.

  1. Implementation

8.1. Recommendations for Servers

Although the server and the CGI script need not be consistent in

their handling of URL paths (client URLs and the PATH_INFO data,

respectively), server authors may wish to impose consistency. So the

server implementation should specify its behaviour for the following

cases:

1. define any restrictions on allowed path segments, in particular
     whether non-terminal NULL segments are permitted;

  2. define the behaviour for "." or ".." path segments; i.e.,
     whether they are prohibited, treated as ordinary path segments
     or interpreted in accordance with the relative URL
     specification [2];

  3. define any limits of the implementation, including limits on
     path or search string lengths, and limits on the volume of
     header fields the server will parse.
           

8.2. Recommendations for Scripts

If the script does not intend processing the PATH_INFO data, then it

should reject the request with 404 Not Found if PATH_INFO is not

NULL.

If the output of a form is being processed, check that CONTENT_TYPE

is “application/x-www-form-urlencoded” [18] or “multipart/form-data”

[16]. If CONTENT_TYPE is blank, the script can reject the request

with a 415 ‘Unsupported Media Type’ error, where supported by the

protocol.

When parsing PATH_INFO, PATH_TRANSLATED or SCRIPT_NAME the script

should be careful of void path segments ("//") and special path

segments ("." and “…”). They should either be removed from the path

before use in OS system calls, or the request should be rejected with

404 ‘Not Found’.

When returning header fields, the script should try to send the CGI

header fields as soon as possible, and should send them before any

HTTP header fields. This may help reduce the server’s memory

requirements.

Script authors should be aware that the REMOTE_ADDR and REMOTE_HOST

meta-variables (see sections 4.1.8 and 4.1.9) may not identify the

ultimate source of the request. They identify the client for the

immediate request to the server; that client may be a proxy, gateway,

or other intermediary acting on behalf of the actual source client.

  1. Security Considerations

9.1. Safe Methods

As discussed in the security considerations of the HTTP

specifications [1], [4], the convention has been established that the

GET and HEAD methods should be ‘safe’ and ‘idempotent’ (repeated

requests have the same effect as a single request). See section 9.1

of RFC 2616 [4] for a full discussion.

9.2. Header Fields Containing Sensitive Information

Some HTTP header fields may carry sensitive information which the

server should not pass on to the script unless explicitly configured

to do so. For example, if the server protects the script by using

the Basic authentication scheme, then the client will send an

Authorization header field containing a username and password. The

server validates this information and so it should not pass on the

password via the HTTP_AUTHORIZATION meta-variable without careful

consideration. This also applies to the Proxy-Authorization header

field and the corresponding HTTP_PROXY_AUTHORIZATION meta-variable.

9.3. Data Privacy

Confidential data in a request should be placed in a message-body as

part of a POST request, and not placed in the URI or message headers.

On some systems, the environment used to pass meta-variables to a

script may be visible to other scripts or users. In addition, many

existing servers, proxies and clients will permanently record the URI

where it might be visible to third parties.

9.4. Information Security Model

For a client connection using TLS, the security model applies between

the client and the server, and not between the client and the script.

It is the server’s responsibility to handle the TLS session, and thus

it is the server which is authenticated to the client, not the CGI

script.

This specification provides no mechanism for the script to

authenticate the server which invoked it. There is no enforced

integrity on the CGI request and response messages.

9.5. Script Interference with the Server

The most common implementation of CGI invokes the script as a child

process using the same user and group as the server process. It

should therefore be ensured that the script cannot interfere with the

server process, its configuration, documents or log files.

If the script is executed by calling a function linked in to the

server software (either at compile-time or run-time) then precautions

should be taken to protect the core memory of the server, or to

ensure that untrusted code cannot be executed.

9.6. Data Length and Buffering Considerations

This specification places no limits on the length of the message-body

presented to the script. The script should not assume that

statically allocated buffers of any size are sufficient to contain

the entire submission at one time. Use of a fixed length buffer

without careful overflow checking may result in an attacker

exploiting ‘stack-smashing’ or ‘stack-overflow’ vulnerabilities of

the operating system. The script may spool large submissions to disk

or other buffering media, but a rapid succession of large submissions

may result in denial of service conditions. If the CONTENT_LENGTH of

a message-body is larger than resource considerations allow, scripts

should respond with an error status appropriate for the protocol

version; potentially applicable status codes include 503 ‘Service

Unavailable’ (HTTP/1.0 and HTTP/1.1), 413 ‘Request Entity Too Large’

(HTTP/1.1), and 414 ‘Request-URI Too Large’ (HTTP/1.1).

Similar considerations apply to the server’s handling of the CGI

response from the script. There is no limit on the length of the

header or message-body returned by the script; the server should not

assume that statically allocated buffers of any size are sufficient

to contain the entire response.

9.7. Stateless Processing

The stateless nature of the Web makes each script execution and

resource retrieval independent of all others even when multiple

requests constitute a single conceptual Web transaction. Because of

this, a script should not make any assumptions about the context of

the user-agent submitting a request. In particular, scripts should

examine data obtained from the client and verify that they are valid,

both in form and content, before allowing them to be used for

sensitive purposes such as input to other applications, commands, or

operating system services. These uses include (but are not limited

to) system call arguments, database writes, dynamically evaluated

source code, and input to billing or other secure processes. It is

important that applications be protected from invalid input

regardless of whether the invalidity is the result of user error,

logic error, or malicious action.

Authors of scripts involved in multi-request transactions should be

particularly cautious about validating the state information;

undesirable effects may result from the substitution of dangerous

values for portions of the submission which might otherwise be

presumed safe. Subversion of this type occurs when alterations are

made to data from a prior stage of the transaction that were not

meant to be controlled by the client (e.g., hidden HTML form

elements, cookies, embedded URLs, etc.).

9.8. Relative Paths

The server should be careful of “…” path segments in the request

URI. These should be removed or resolved in the request URI before

it is split into the script-path and extra-path. Alternatively, when

the extra-path is used to find the PATH_TRANSLATED, care should be

taken to avoid the path resolution from providing translated paths

outside an expected path hierarchy.

9.9. Non-parsed Header Output

If a script returns a non-parsed header output, to be interpreted by

the client in its native protocol, then the script must address all

security considerations relating to that protocol.

  1. Acknowledgements

This work is based on the original CGI interface that arose out of

discussions on the ‘www-talk’ mailing list. In particular, Rob

McCool, John Franks, Ari Luotonen, George Phillips and Tony Sanders

deserve special recognition for their efforts in defining and

implementing the early versions of this interface.

This document has also greatly benefited from the comments and

suggestions made Chris Adie, Dave Kristol and Mike Meyer; also David

Morris, Jeremy Madea, Patrick McManus, Adam Donahue, Ross Patterson

and Harald Alvestrand.

  1. References

11.1 Normative References

[1] Berners-Lee, T., Fielding, R. and H. Frystyk, “Hypertext

Transfer Protocol – HTTP/1.0”, RFC 1945, May 1996.

[2] Berners-Lee, T., Fielding, R. and L. Masinter, “Uniform Resource

Identifiers (URI) : Generic Syntax”, RFC 2396, August 1998.

[3] Bradner, S., “Key words for use in RFCs to Indicate Requirements

Levels”, BCP 14, RFC 2119, March 1997.

[4] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,

Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol –

HTTP/1.1”, RFC 2616, June 1999.

[5] Franks, J., Hallam-Baker, P., Hostetler, J., Lawrence, S.,

Leach, P., Luotonen, A., and L. Stewart, “HTTP Authentication:

Basic and Digest Access Authentication”, RFC 2617, June 1999.

[6] Freed, N. and N. Borenstein, “Multipurpose Internet Mail

Extensions (MIME) Part Two: Media Types”, RFC 2046, November

1996.

[7] Hinden, R., Carpenter, B., and L. Masinter, “Format for Literal

IPv6 Addresses in URL’s”, RFC 2732, December 1999.

[8] “HTTP Status Code Registry”,

http://www.iana.org/assignments/http-status-codes, IANA.

[9] “Information Systems – Coded Character Sets – 7-bit American

Standard Code for Information Interchange (7-Bit ASCII)”, ANSI

INCITS.4-1986 (R2002).

[10] “Information technology – 8-bit single-byte coded graphic

character sets – Part 1: Latin alphabet No. 1”, ISO/IEC

8859-1:1998.

11.2. Informative References

[11] Berners-Lee, T., “Universal Resource Identifiers in WWW: A

Unifying Syntax for the Expression of Names and Addresses of

Objects on the Network as used in the World-Wide Web”, RFC 1630,

June 1994.

[12] Braden, R., Ed., “Requirements for Internet Hosts – Application

and Support”, STD 3, RFC 1123, October 1989.

[13] Crocker, D., “Standard for the Format of ARPA Internet Text

Messages”, STD 11, RFC 822, August 1982.

[14] Dierks, T. and C. Allen, “The TLS Protocol Version 1.0”, RFC

2246, January 1999.

[15] Hinden R. and S. Deering, “Internet Protocol Version 6 (IPv6)

Addressing Architecture”, RFC 3513, April 2003.

[16] Masinter, L., “Returning Values from Forms:

multipart/form-data”, RFC 2388, August 1998.

[17] Mockapetris, P., “Domain Names - Concepts and Facilities”, STD

13, RFC 1034, November 1987.

[18] Raggett, D., Le Hors, A., and I. Jacobs, Eds., “HTML 4.01

Specification”, W3C Recommendation December 1999,

http://www.w3.org/TR/html401/.

[19] Rescola, E. “HTTP Over TLS”, RFC 2818, May 2000.

[20] St. Johns, M., “Identification Protocol”, RFC 1413, February

1993.

[21] IBM National Language Support Reference Manual Volume 2,

SE09-8002-01, March 1990.

[22] “The Common Gateway Interface”,

http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, University of Illinois.

  1. Authors’ Addresses

David Robinson

The Apache Software Foundation

EMail: [email protected]

Ken A. L. Coar

The Apache Software Foundation

EMail: [email protected]

  1. Full Copyright Statement

Copyright © The Internet Society (2004). This document is subject

to the rights, licenses and restrictions contained in BCP 78 and at

www.rfc-editor.org, and except as set forth therein, the authors

retain all their rights.

This document and the information contained herein are provided on an

“AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS

OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET

ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,

INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE

INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED

WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any

Intellectual Property Rights or other rights that might be claimed to

pertain to the implementation or use of the technology described in

this document or the extent to which any license under such rights

might or might not be available; nor does it represent that it has

made any independent effort to identify any such rights. Information

on the ISOC’s procedures with respect to rights in ISOC Documents can

be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any

assurances of licenses to be made available, or the result of an

attempt made to obtain a general license or permission for the use of

such proprietary rights by implementers or users of this

specification can be obtained from the IETF on-line IPR repository at

http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any

copyrights, patents or patent applications, or other proprietary

rights that may cover technology that may be required to implement

this standard. Please address the information to the IETF at ietf-

[email protected].

Acknowledgement

Funding for the RFC Editor function is currently provided by the

Internet Society.

继续阅读