A cross-site scripting attack is one of the top 5 security
attacks carried out on a daily basis across the Internet, and your PHP
scripts may not be immune.
Also known as XSS, the attack is
basically a type of code injection attack which is made possible by
incorrectly validating user data, which usually gets inserted into the
page through a web form or using an altered hyperlink. The code injected
can be any malicious client-side code, such as JavaScript, VBScript,
HTML, CSS, Flash, and others. The code is used to save harmful data on
the server or perform a malicious action within the user’s browser.
Unfortunately,
cross-site scripting attacks occur mostly because developers fail to
deliver secure code. Every PHP programmer has the
responsibility to understand how attacks can be carried out against
their PHP scripts to exploit possible security vulnerabilities. Reading
this article, you’ll find out more about cross-site scripting attacks
and how to prevent them in your code.
Learning by Example
Let’s take the following code snippet.
<form action="post.php" method="post">
 <input type="text" name="comment" value="">
 <input type="submit" name="submit" value="Submit">
</form>
Here
we have a simple form in which there is a text box for data input and a
submit button. Once the form is submitted, it will submit the data to
post.php
for processing. Let’s say all
post.php
does is output the data like so:
<?php
echo $_POST["comment"];
Without
any filtering, an attacker could submit the following through the form,
which generates a popup in the browser with the message “hacked”.
<script>alert("hacked")</script>
This
example, despite being malicious in nature, does not seem to do
much harm. But think about what could happen if the JavaScript code were
written to steal a user’s cookie and extract sensitive information from
it. There are far worse XSS attacks than a simple
alert()
call.
Cross-site
scripting attacks can be grouped in two major categories, based on how
they deliver the malicious payload: non-persistent XSS, and persistent
XSS. Allow me to discuss each type in detail.
Non-persistent XSS
Also
known as a reflected XSS attack, this type does not store the malicious
code on the server; the code is instead passed through the server and
presented to the victim. It is the more popular of the two delivery
methods. The attack is launched from an external source, such as an
e-mail message or a third-party website.
Here’s an example of a portion of a simple search result script:
<?php
// Get search results based on the query
echo "You searched for: " . $_GET["query"];

// List search results
...
This
example is a very insecure results page where the search query is
displayed back to the user. The problem here is that the
$_GET["query"]
variable isn’t validated or escaped, so an attacker could send the following link to the victim:
http://example.com/search.php?query=<script>alert("hacked")</script>
Without validation or escaping, the page would contain:
You searched for: <script>alert("hacked")</script>
Persistent XSS
This
type of attack happens when the malicious code has already slipped
through the validation process and it is stored in a data store. This
could be a comment, log file, notification message, or any other section
on the website which required user input at one time. Later, when this
particular information is presented on the website, the malicious code
gets executed.
Let’s use the following example for a rudimentary
file-based comment system. Assuming the same form I presented earlier,
let’s say the receiving script simply appends the comment to a data
file.
<?php
file_put_contents("comments.txt", $_POST["comment"], FILE_APPEND);
Elsewhere, the contents of comments.txt are shown to visitors:
<?php
echo file_get_contents("comments.txt");
When
a user submits a comment, it gets saved to the data file. Then the entire
file (thus the entire series of comments) is displayed to the
readership. If malicious code is submitted then it will be saved and
displayed as is without any validation or escaping.
Preventing Cross-Site Scripting Attacks
Fortunately,
as easily as an XSS attack can be carried out against an unprotected
website, protecting against one is just as easy. Prevention must
always be in your thoughts, though, even before you write a single line
of code.
The first rule which needs to be “enforced” in any web environment (be it development, staging, or production) is never trust data coming from the user or from any other third party sources.
This can’t be emphasized enough. Every bit of data must be validated on
input and escaped on output. This is the golden rule of preventing XSS.
In
order to implement solid security measures which prevent XSS attacks,
we should be mindful of data validation, data sanitization, and output
escaping.
Data Validation
Data validation is the process
of ensuring that your application is running with correct data. If your
PHP script expects an integer for user input, then any other type of
data should be discarded. Every piece of user data must be validated when
it is received to ensure it is of the correct type, and discarded if
it doesn’t pass the validation process.
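As a minimal sketch of that idea, the integer case could be handled with PHP's built-in filter_var() and FILTER_VALIDATE_INT. The helper name validate_int() and the 1 to 100 range are assumptions made for illustration, not part of the original scripts.

```php
<?php
// Hypothetical helper: returns the integer when $input is a valid integer
// inside [$min, $max], or null when the input should be discarded.
function validate_int(string $input, int $min, int $max): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT, [
        "options" => ["min_range" => $min, "max_range" => $max],
    ]);

    // filter_var() returns false on failure; compare strictly so that a
    // legitimate 0 is not mistaken for a failure.
    return $value === false ? null : $value;
}

var_dump(validate_int("42", 1, 100));       // int(42)
var_dump(validate_int("<script>", 1, 100)); // NULL
```

Returning null rather than false keeps a valid 0 distinguishable from rejected input when the range allows it.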
If you wanted to validate a
phone number, for example, you would discard any strings containing
letters, because a phone number should consist of digits only. You
should also take the length of the string into consideration. If you
wanted to be more permissive, you could allow a limited set of special
characters such as plus signs, parentheses, and dashes, which are often used in
formatting phone numbers specific to your intended locale.
<?php
// validate a US phone number
if (preg_match('/^((1-)?\d{3}-)\d{3}-\d{4}$/', $phone)) {
    echo $phone . " is valid format.";
}
Data Sanitization
Data
sanitization focuses on manipulating the data to make sure it is safe
by removing any unwanted bits from the data and normalizing it to the
correct form. For example, if you are expecting a plain text string as
user input, you may want to remove any HTML markup from it.
<?php
// sanitize HTML from the comment
$comment = strip_tags($_POST["comment"]);
Sometimes, data validation and sanitization/normalization can go hand in hand.
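One way the two steps might be combined is to normalize a phone number down to its digits first and only then validate what is left. This is a sketch under stated assumptions: the helper name normalize_us_phone() and the rule "10 digits, or 11 digits starting with 1" are illustrative choices, not a complete treatment of US numbers.

```php
<?php
// Hypothetical helper: strips formatting characters, then validates the
// remaining digits. Returns the normalized digits or null if invalid.
function normalize_us_phone(string $phone): ?string
{
    // Sanitize/normalize: drop everything that is not a digit.
    $digits = preg_replace('/\D/', "", $phone);

    // Validate: accept 10 digits, or 11 digits with a leading country code 1.
    $len = strlen($digits);
    if ($len === 10 || ($len === 11 && $digits[0] === "1")) {
        return $digits;
    }
    return null;
}

var_dump(normalize_us_phone("(555) 123-4567")); // string(10) "5551234567"
var_dump(normalize_us_phone("555-1234"));       // NULL
```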
Output Escaping
In order to protect the integrity of displayed/output data, you should
escape the data when presenting it to the user. This prevents the
browser from applying any unintended meaning to any special sequence of
characters that may be found.
<?php
// escape output sent to the browser
echo "You searched for: " . htmlspecialchars($_GET["query"]);
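To make the effect of escaping concrete, here is a small sketch that runs the earlier attack string through a wrapper around htmlspecialchars(). The wrapper name e() is an assumption, as are the explicit ENT_QUOTES and ENT_SUBSTITUTE flags and the UTF-8 encoding; a literal string stands in for $_GET["query"] so the snippet runs on its own.

```php
<?php
// Hypothetical convenience wrapper: escapes a value for use in HTML text
// or inside a quoted attribute. ENT_QUOTES also converts single quotes.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");
}

// Stand-in for $_GET["query"] as sent by the attacker's link.
$query = '<script>alert("hacked")</script>';

echo "You searched for: " . e($query);
// prints: You searched for: &lt;script&gt;alert(&quot;hacked&quot;)&lt;/script&gt;
```

The browser now displays the markup as text instead of executing it.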
All Together Now!
To
better understand the three aspects of data processing, let’s take
another look at the file-based comment system from earlier and modify it
to make sure it’s secure. The potential vulnerabilities in the code
stem from the fact that
$_POST["comment"]
is blindly appended to the
comments.txt
file which is then displayed directly to the user. To secure it, the
$_POST["comment"]
value should be validated and sanitized before it is added to the file,
and the file’s contents should be escaped when displayed to the user.
<?php
// validate comment
$comment = trim($_POST["comment"] ?? "");
if ($comment === "") {
    exit("must provide a comment");
}

// sanitize comment
$comment = strip_tags($comment);

// comment is now safe for storage; end it with a newline so entries stay separate
file_put_contents("comments.txt", $comment . PHP_EOL, FILE_APPEND);

// escape comments before display
$comments = file_get_contents("comments.txt");
echo htmlspecialchars($comments);
The
script first validates the incoming comment to make sure a non-zero
length string has been provided by the user. After all, a blank comment
isn’t very interesting.
Data validation needs to happen within a
well defined context, meaning that if I expect an integer back from the
user, then I validate it accordingly by converting the data into an
integer and handle it as an integer. If this results in invalid data,
then simply discard it and let the user know about it.
Then the script sanitizes the comment by removing any HTML tags it may contain.
And finally, the comments are retrieved, filtered, and displayed.
Generally the
htmlspecialchars()
function is sufficient for escaping output intended for viewing in a
browser. Tell it which character encoding your pages use through its
encoding argument. If you also want every character that has a named HTML entity converted, use
htmlentities()
instead. For more information on the two functions, read their respective write-ups in the official PHP documentation.
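The difference between the two functions is easiest to see side by side. The sample string below is an assumption chosen for illustration; both calls pass ENT_QUOTES and "UTF-8" explicitly rather than relying on defaults, which vary between PHP versions.

```php
<?php
// Sample input containing a non-ASCII character (é, written as an escape
// so the snippet does not depend on the file's own encoding) and markup.
$text = "caf\u{00E9} <b>menu</b>";

// htmlspecialchars() converts only the characters special to HTML.
$special = htmlspecialchars($text, ENT_QUOTES, "UTF-8");

// htmlentities() also converts characters that have a named entity.
$entities = htmlentities($text, ENT_QUOTES, "UTF-8");

echo $special, PHP_EOL;  // café &lt;b&gt;menu&lt;/b&gt;
echo $entities, PHP_EOL; // caf&eacute; &lt;b&gt;menu&lt;/b&gt;
```

Both results are safe to display; they differ only in how the accented character is represented.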
Bear
in mind that no single solution exists that is 100% secure on a
constantly evolving medium like the Web. Test your validation code
thoroughly with the most up to date XSS test vectors. Using the test
data from the following sources should reveal if your code is still
prone to XSS attacks.
- RSnake XSS cheatsheet (a pretty comprehensive list of XSS vectors you can use to test your code)
- Zend Framework’s XSS test data
- XSS cheatsheet (makes use of HTML5 features)
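A test along these lines could be automated with a short script such as the following sketch. The four vectors are simple illustrative samples, not entries copied from the lists above, and the names escape_html() and find_unescaped() are hypothetical. The check covers only HTML text and quoted-attribute contexts.

```php
<?php
// Hypothetical escaping routine under test.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");
}

// Returns the vectors whose escaped form still contains a raw
// angle bracket or quote character.
function find_unescaped(array $vectors): array
{
    $failures = [];
    foreach ($vectors as $vector) {
        if (preg_match('/[<>"\']/', escape_html($vector))) {
            $failures[] = $vector;
        }
    }
    return $failures;
}

// Illustrative sample vectors; swap in a fuller list for real testing.
$vectors = [
    '<script>alert("hacked")</script>',
    '<img src=x onerror=alert(1)>',
    '"><svg onload=alert(1)>',
    "' onmouseover='alert(1)",
];

$failures = find_unescaped($vectors);
echo count($failures) === 0 ? "all vectors escaped" : "unescaped vectors found";
```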
Summary
Hopefully
this article gave you a good explanation of what cross-site scripting
attacks are and how you can prevent them from happening to your code.
Never trust data coming from the user or from any other third party
sources. You can protect yourself by validating the incoming values in a
well defined context, sanitizing the data to protect your code, and
escaping output to protect your users. After you’ve written your code,
be sure your efforts work correctly by testing the code as thoroughly as
you can.