HtmlUnit returns UnexpectedPage when Content-Type header is missing #357

@antonireus

We have a couple of system tests using Selenium/HtmlUnit in a legacy system running on Java 7. Because of version restrictions, these tests use HtmlUnit 2.18 with Selenium 2.52.

We are now migrating to Java 11 and have upgraded to HtmlUnit 2.50 and Selenium 3.141.59. After the upgrade, we found an issue with one of the tests.

The problem seems to be that the response from one external service that we use doesn't contain the Content-Type header. Here is the log with the response headers:

DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "HTTP/1.1 200 OK[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "Date: Fri, 11 Jun 2021 12:46:05 GMT[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "Server: Apache[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "X-XSS-Protection: 1; mode=block[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "X-FRAME-OPTIONS: ALLOW-FROM http://intranet.pre.cdti.es[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "Keep-Alive: timeout=3600, max=800[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "Connection: Keep-Alive[\r][\n]"
DEBUG [pool-1-thread-1] (Wire.java 73) - http-outgoing-1 << "Transfer-Encoding: chunked[\r][\n]"

This leads to HtmlUnit creating an UnexpectedPage and unloading the DOM:

DEBUG [pool-1-thread-1] (HtmlPage.java 1254) - Firing Event unload (Current Target: com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument@120eb694);
DEBUG [pool-1-thread-1] (WebWindowImpl.java 131) - setEnclosedPage: com.gargoylesoftware.htmlunit.UnexpectedPage@2bf01f1
DEBUG [pool-1-thread-1] (WebWindowImpl.java 213) - destroyChildren

It seems that with HtmlUnit 2.18, if Content-Type is missing, a default of text/html is assumed. This matches the behaviour of the browsers I tested (Firefox, Chrome).

Is this intended behaviour? Is there any workaround to make HtmlUnit assume text/html when Content-Type is missing?
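To make the workaround I have in mind concrete, here is a minimal sketch of the header-patching logic in plain Java, with no HtmlUnit dependency. The class name `ContentTypeDefaulter` and the method `withDefaultContentType` are hypothetical names of mine, not HtmlUnit API. The idea is only this: if no Content-Type header is present, add one with the value text/html before the response is turned into a page.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HtmlUnit: adds a default Content-Type
// header to a list of response headers when none is present.
final class ContentTypeDefaulter {

    // Assumed default, matching what the browsers in the report appear to do.
    static final String DEFAULT_CONTENT_TYPE = "text/html";

    static List<Map.Entry<String, String>> withDefaultContentType(
            List<Map.Entry<String, String>> headers) {
        // Header names are case-insensitive, so compare ignoring case.
        for (Map.Entry<String, String> header : headers) {
            if ("Content-Type".equalsIgnoreCase(header.getKey())) {
                return headers; // already present, leave the list untouched
            }
        }
        // Copy the list so the caller's headers are not modified.
        List<Map.Entry<String, String>> patched = new ArrayList<>(headers);
        patched.add(new AbstractMap.SimpleImmutableEntry<>(
                "Content-Type", DEFAULT_CONTENT_TYPE));
        return patched;
    }

    public static void main(String[] args) {
        // Two of the headers from the log above; Content-Type is absent.
        List<Map.Entry<String, String>> headers = new ArrayList<>();
        headers.add(new AbstractMap.SimpleImmutableEntry<>("Server", "Apache"));
        headers.add(new AbstractMap.SimpleImmutableEntry<>("Transfer-Encoding", "chunked"));

        for (Map.Entry<String, String> header : withDefaultContentType(headers)) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
    }
}
```

One place where such a patch could plausibly be applied in HtmlUnit is a `WebConnectionWrapper` subclass that rebuilds the response with the patched header list. I have not tested that wiring against 2.50, so treat it as an assumption rather than a confirmed fix.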
