Cross-Site Scripting Attacks (XSS)

A cross-site scripting attack is one of the top 5 security attacks carried out on a daily basis across the Internet, and your PHP scripts may not be immune.

Also known as XSS, the attack is a type of code injection made possible by incorrectly validated user data, which usually gets inserted into the page through a web form or an altered hyperlink. The injected code can be any malicious client-side code, such as JavaScript, VBScript, HTML, CSS, Flash, and others. The code is used to save harmful data on the server or perform a malicious action within the user’s browser.

Unfortunately, cross-site scripting attacks occur mostly because developers fail to deliver secure code. Every PHP programmer has the responsibility to understand how attacks can be carried out against their PHP scripts to exploit possible security vulnerabilities. In this article, you’ll find out more about cross-site scripting attacks and how to prevent them in your code.

Learning by Example

Let’s take the following code snippet.

<form action="post.php" method="post">
    <input type="text" name="comment" value="">
    <input type="submit" name="submit" value="Submit">
</form>

Here we have a simple form with a text box for data input and a submit button. Once the form is submitted, it sends the data to post.php for processing. Let’s say all post.php does is output the data like so:

<?php
echo $_POST["comment"];

Without any filtering, an attacker could submit the following through the form, which generates a popup in the browser with the message “hacked”.

<script>alert("hacked")</script>

This example, despite being malicious in nature, does not seem to do much harm. But think about what could happen if the JavaScript code was written to steal a user’s cookie and extract sensitive information from it. There are far worse XSS attacks than a simple alert() call.

Cross-site scripting attacks can be grouped into two major categories, based on how they deliver the malicious payload: non-persistent XSS and persistent XSS. Allow me to discuss each type in detail.

Non-persistent XSS

Also known as a reflected XSS attack, this type does not store the malicious code on the server. Instead, the code gets passed through the server and presented to the victim. It is the more popular of the two delivery methods. The attack is launched from an external source, such as an e-mail message or a third-party website.

Here’s an example of a portion of a simple search result script:

<?php
// Get search results based on the query
echo "You searched for: " . $_GET["query"];

// List search results
...

This example is a very insecure results page where the search query is displayed back to the user. The problem here is that the $_GET["query"] variable isn’t validated or escaped, so an attacker could send the following link to the victim:

http://example.com/search.php?query=<script>alert("hacked")</script>

Without validation, the page would contain:

You searched for: <script>alert("hacked")</script>

Persistent XSS

This type of attack happens when the malicious code has already slipped through the validation process and is stored in a data store. This could be a comment, log file, notification message, or any other section on the website which required user input at one time. Later, when this particular information is presented on the website, the malicious code gets executed.

Let’s use the following example of a rudimentary file-based comment system. Assuming the same form I presented earlier, let’s say the receiving script simply appends the comment to a data file.

<?php
file_put_contents("comments.txt", $_POST["comment"], FILE_APPEND);

Elsewhere the contents of comments.txt are shown to visitors:

<?php
echo file_get_contents("comments.txt");

When a user submits a comment, it gets saved to the data file. Then the entire file (and thus the entire series of comments) is displayed to the readership. If malicious code is submitted, it will be saved and displayed as is, without any validation or escaping.

Preventing Cross-Site Scripting Attacks

Fortunately, as easily as an XSS attack can be carried out against an unprotected website, protecting against one is just as easy. Prevention must always be in your thoughts, though, even before you write a single line of code.

The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never trust data coming from the user or from any other third-party sources. This can’t be emphasized enough. Every bit of data must be validated on input and escaped on output. This is the golden rule of preventing XSS.

In order to implement solid security measures which prevent XSS attacks, we should be mindful of data validation, data sanitization, and output escaping.

Data Validation

Data validation is the process of ensuring that your application is running with correct data. If your PHP script expects an integer for user input, then any other type of data should be discarded. Every piece of user data must be validated when it is received to ensure it is of the correct type, and discarded if it doesn’t pass the validation process.

If you wanted to validate a phone number, for example, you would discard any strings containing letters, because a phone number should consist of digits only. You should also take the length of the string into consideration. If you wanted to be more permissive, you could allow a limited set of special characters such as plus signs, parentheses, and dashes, which are often used in formatting phone numbers specific to your intended locale.

<?php
// validate a US phone number (NNN-NNN-NNNN with an optional leading "1-")
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
    echo $phone . " is valid format.";
}
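
The check above can be wrapped in a small reusable function. The following is only a sketch: the function name isValidUsPhone() is hypothetical, and it assumes the same dash-separated format as the pattern above. It anchors the end of the pattern with \z instead of $, because $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper: accepts NNN-NNN-NNNN or 1-NNN-NNN-NNNN and nothing else.
function isValidUsPhone(string $phone): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    return preg_match('/^(1-)?\d{3}-\d{3}-\d{4}\z/', $phone) === 1;
}

var_dump(isValidUsPhone("555-867-5309"));   // bool(true)
var_dump(isValidUsPhone("1-555-867-5309")); // bool(true)
var_dump(isValidUsPhone("555-867-530a"));   // bool(false)
var_dump(isValidUsPhone("<script>"));       // bool(false)
```

Returning a strict boolean keeps the decision to accept or discard the value in one place, so the calling code never has to work with unvalidated input.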

Data Sanitization

Data sanitization focuses on manipulating the data to make sure it is safe by removing any unwanted bits from the data and normalizing it to the correct form. For example, if you are expecting a plain text string as user input, you may want to remove any HTML markup from it.

<?php
// sanitize HTML from the comment
$comment = strip_tags($_POST["comment"]);

Sometimes, data validation and sanitization/normalization can go hand in hand.

Output Escaping

In order to protect the integrity of displayed/output data, you should escape the data when presenting it to the user. This prevents the browser from applying any unintended meaning to any special sequence of characters that may be found.

<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);
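
To see what escaping does to the payload used earlier, here is a minimal sketch. The wrapper name escapeHtml() is hypothetical, and the sketch assumes the page is served as UTF-8. It passes the flags and encoding explicitly rather than relying on defaults, which have changed between PHP versions.

```php
<?php
// Hypothetical wrapper around htmlspecialchars() with explicit settings.
function escapeHtml(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes, which matters
    // when the value is placed inside an HTML attribute.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$query = '<script>alert("hacked")</script>';
echo "You searched for: " . escapeHtml($query), PHP_EOL;
// You searched for: &lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;
```

The browser renders the escaped string as visible text, so the script tag is displayed rather than executed.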

All Together Now!

To better understand the three aspects of data processing, let’s take another look at the file-based comment system from earlier and modify it to make sure it’s secure. The potential vulnerabilities in the code stem from the fact that $_POST["comment"] is blindly appended to the comments.txt file, which is then displayed directly to the user. To secure it, the $_POST["comment"] value should be validated and sanitized before it is added to the file, and the file’s contents should be escaped when displayed to the user.

<?php
// validate comment
$comment = trim($_POST["comment"]);
// compare against "" directly; empty() would also reject the comment "0"
if ($comment === "") {
    exit("must provide a comment");
}

// sanitize comment
$comment = strip_tags($comment);

// comment is now safe for storage
file_put_contents("comments.txt", $comment, FILE_APPEND);

// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);

The script first validates the incoming comment to make sure a non-zero length string has been provided by the user. After all, a blank comment isn’t very interesting.

Data validation needs to happen within a well-defined context, meaning that if I expect an integer back from the user, then I validate it accordingly by converting the data into an integer and handling it as an integer. If this results in invalid data, then simply discard it and let the user know about it.
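
As a sketch of validating in an integer context, the example below uses PHP’s filter_var() with FILTER_VALIDATE_INT. The function name parsePageNumber() and the range of 1 to 1000 are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: returns the integer, or null when $raw is not an integer in range.
function parsePageNumber($raw): ?int
{
    $page = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 1000],
    ]);
    // filter_var() returns false when validation fails.
    return $page === false ? null : $page;
}

var_dump(parsePageNumber("42"));       // int(42)
var_dump(parsePageNumber("42abc"));    // NULL
var_dump(parsePageNumber("0"));        // NULL
var_dump(parsePageNumber("<script>")); // NULL
```

A null result is the signal to discard the input and tell the user about it, while a non-null result can safely be handled as an integer from that point on.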

Then the script sanitizes the comment by removing any HTML tags it may contain.

And finally, the comments are retrieved, filtered, and displayed.

Generally the htmlspecialchars() function is sufficient for filtering output intended for viewing in a browser. If you’re using a character encoding in your web pages other than ISO-8859-1 or UTF-8, though, then you’ll want to use htmlentities(). Both functions accept the character encoding as an optional third argument. For more information on the two functions, read their respective write-ups in the official PHP documentation.
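
To illustrate how the two functions differ, here is a small sketch. It assumes UTF-8 input and passes the encoding explicitly. The sample string is made up for this example.

```php
<?php
// "\u{00E9}" is the letter e with an acute accent, written as an escape so the
// example does not depend on the encoding of the source file (PHP 7.0+).
$text = "caf\u{00E9} <b>";

// htmlspecialchars() only converts the characters that are special in HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also converts every character that has a named HTML entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;  // the accented letter is left as is; < and > become &lt; and &gt;
echo $entities, PHP_EOL; // caf&eacute; &lt;b&gt;
```

Both results neutralize the markup. The difference lies only in how non-ASCII characters are represented in the output.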

Bear in mind that no single solution exists that is 100% secure on a constantly evolving medium like the Web. Test your validation code thoroughly with the most up-to-date XSS test vectors. Using the test data from the following sources should reveal if your code is still prone to XSS attacks.

  • RSnake XSS cheatsheet (a pretty comprehensive list of XSS vectors you can use to test your code)
  • Zend Framework’s XSS test data
  • XSS cheatsheet (makes use of HTML5 features)
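
A very small test harness along these lines might look like the sketch below. The vectors are illustrative only and are not taken from the lists above. The function name findUnescaped() is hypothetical. The check only looks for raw HTML-significant characters, so it covers output placed in HTML text or quoted attributes and not other contexts such as JavaScript or URLs.

```php
<?php
// Illustrative vectors only; real testing should use the published lists.
$vectors = [
    '<script>alert("hacked")</script>',
    '<img src=x onerror=alert(1)>',
    '"><svg onload=alert(1)>',
    "' onmouseover='alert(1)",
];

// Returns the vectors that still contain a raw <, >, " or ' after escaping.
function findUnescaped(array $vectors, callable $escape): array
{
    $failures = [];
    foreach ($vectors as $vector) {
        $escaped = $escape($vector);
        if (preg_match('/[<>"\']/', $escaped) === 1) {
            $failures[] = $vector;
        }
    }
    return $failures;
}

// The escaping routine under test.
$escape = function (string $s): string {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
};

$failures = findUnescaped($vectors, $escape);
echo count($failures) === 0
    ? "all vectors escaped"
    : "unescaped: " . implode(", ", $failures), PHP_EOL;
```

Swapping a different function in for $escape shows whether that routine lets any of the vectors through unchanged.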

Summary

Hopefully this article gave you a good explanation of what cross-site scripting attacks are and how you can prevent them from happening to your code. Never trust data coming from the user or from any other third-party sources. You can protect yourself by validating the incoming values in a well-defined context, sanitizing the data to protect your code, and escaping output to protect your users. After you’ve written your code, be sure your efforts work correctly by testing the code as thoroughly as you can.
