Description
Hello,
I am wondering whether it is possible to get the page source (after JavaScript has been executed) as HTML, the way a browser shows it when I inspect a page with its developer console.
I know that I can convert an HtmlPage to XML like this.
Given this HTML:
<div>
<span id="dynamic"></span>
<script>
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
</script>
</div>
// kotlin example to parse HTML as string to HtmlPage object
val rendered: HtmlPage = WebClient(BrowserVersion.BEST_SUPPORTED).loadHtmlCodeIntoCurrentWindow(htmlFromAboveAsString)
// convert HtmlPage object to XML as string and print it to console
println(rendered.asXml())
This produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head/>
<body>
<div>
<span id="dynamic">
<span>
dynamically added
</span>
</span>
<script>
//<![CDATA[
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
//]]>
</script>
</div>
</body>
</html>
But since this is XML (as the asXml() function promises), the string diverges from what a browser shows during DOM inspection.
Because the purpose of asXml() is to produce valid XML, it adds a prolog at the top that declares the XML version and the character encoding (<?xml version="1.0" encoding="UTF-8"?>). It also wraps the content of script tags in a CDATA block so that text that looks like markup (in my example, <span>dynamically added</span>) does not clash with the XML structure. It potentially makes further changes as well.
A real browser, on the other hand, shows the actual HTML after rendering when I look at it in the developer console, like this:
<html>
<head></head>
<body>
<div>
<span id="dynamic"><span>dynamically added</span></span>
<script>
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
</script>
</div>
</body>
</html>
Actual question:
Is it possible to get the rendered HTML as a string, instead of HTML that has been converted to XML?
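For illustration, the only workaround I can think of is post-processing the asXml() string myself. The following is a minimal sketch that removes just the two differences named above (the XML prolog and the CDATA guard lines inside script tags). stripXmlArtifacts is a hypothetical helper of my own, not part of HtmlUnit, and it assumes the guards appear on lines of their own as in the output above. It does not expand <head/> or restore the original whitespace, so it is not a real answer to the question.

```kotlin
// Hypothetical helper (not HtmlUnit API): cleans up a string produced by asXml().
fun stripXmlArtifacts(xml: String): String {
    // Remove a leading XML prolog such as <?xml version="1.0" encoding="UTF-8"?>
    val withoutProlog = xml.replace(Regex("""^\s*<\?xml[^>]*\?>\s*"""), "")
    // Remove lines that contain only the CDATA guards added inside <script> tags
    return withoutProlog
        .lines()
        .filterNot { it.trim() == "//<![CDATA[" || it.trim() == "//]]>" }
        .joinToString("\n")
}

fun main() {
    // Shortened stand-in for the asXml() output shown above
    val xml = """
        <?xml version="1.0" encoding="UTF-8"?>
        <html>
          <body>
            <script>
        //<![CDATA[
              document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
        //]]>
            </script>
          </body>
        </html>
    """.trimIndent()

    println(stripXmlArtifacts(xml))
}
```

In the example from this issue it would be called as stripXmlArtifacts(rendered.asXml()). Since this is plain string manipulation, it would also strip identical guard lines that a page author wrote deliberately, which is why I would prefer a built-in way.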