Description
I'm trying to load a WeChat official account URL, https://mp.weixin.qq.com/s/PcUNumU5j2T3UD-3FN66lQ, and I get some exceptions and a strange-looking page.
The Java code appears to add unnecessary HTML tags such as span, and attributes such as "display:none" on a div, which makes the div invisible (please see the attachments below).
My code:
public static String getPageXmlByUrl(String url) {
    if (!isUrl(url)) {
        throw new ServerException(ResultEnum.PARAM_ERROR);
    }
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setActiveXNative(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setDownloadImages(false);
    webClient.getOptions().setWebSocketEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    HtmlPage page = null;
    try {
        page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(2000);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        webClient.close();
    }
    String pageXml = page.asXml();
    return pageXml;
}
Exceptions:
Exceptions.txt
Resulting HTML from the test code:
Strange Page.html.txt
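To make the "display:none" observation easier to check, here is a minimal sketch that lists the opening div tags in a returned page string whose inline style contains display:none. The class name HiddenDivFinder and the method findHiddenDivs are hypothetical names introduced for illustration, not part of any library, and the regex is only a rough diagnostic, not a real HTML parser. It assumes double-quoted style attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
final class HiddenDivFinder {

    // Matches an opening <div ...> tag whose double-quoted inline style
    // contains "display:none" (optional whitespace around the colon).
    // Deliberately rough: a diagnostic aid, not an HTML parser.
    private static final Pattern HIDDEN_DIV = Pattern.compile(
            "<div\\b[^>]*\\bstyle\\s*=\\s*\"[^\"]*display\\s*:\\s*none[^\"]*\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns every matching opening tag, in document order.
    static List<String> findHiddenDivs(String pageXml) {
        List<String> hits = new ArrayList<>();
        if (pageXml == null) {
            return hits; // nothing to scan, e.g. when the page failed to load
        }
        Matcher m = HIDDEN_DIV.matcher(pageXml);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample input; in practice pass the string returned by the code above.
        String sample = "<div id=\"a\" style=\"display: none;\"><span>x</span></div>"
                + "<div id=\"b\">y</div>";
        for (String tag : findHiddenDivs(sample)) {
            System.out.println(tag);
        }
    }
}
```

Running the same check on the raw HTML fetched without JavaScript and on the string returned by the code above would show whether the hidden divs are already in the server response or only appear after script execution.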