
Web Scraping JavaScript With HtmlUnit - "You Are Currently Browsing With JavaScript Turned Off"

I'm trying to scrape this page with HtmlUnit. In the XML, it says 'You are currently browsing with JavaScript turned off which means you can't use our search functionality.' I've b

Solution 1:

Most of the page's content is rendered by JavaScript that runs asynchronously, so you have to wait a bit before the content is available.

String url = "https://doaj.org/search?source=%7B%22query%22%3A%7B%22filtered%22%3A%7B%22filter%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22index.classification.exact%22%3A%22Biology%20(General)%22%7D%7D%5D%7D%7D%2C%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%7D%7D%2C%22sort%22%3A%5B%7B%22created_date%22%3A%7B%22order%22%3A%22desc%22%7D%7D%5D%7D";

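// Imports needed (HtmlUnit 2.x): com.gargoylesoftware.htmlunit.WebClient,
// com.gargoylesoftware.htmlunit.BrowserVersion, com.gargoylesoftware.htmlunit.html.HtmlPage
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
    // JavaScript is enabled by default
    webClient.waitForBackgroundJavaScriptStartingBefore(1_000);

    HtmlPage page = webClient.getPage(url);
    // block until background JavaScript has finished, or at most 10 seconds
    webClient.waitForBackgroundJavaScript(10_000);

    // print the rendered page as plain text
    System.out.println(page.asText());
}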

This works here with version 2.42.0-SNAPSHOT and should also work with the 2.41.0 release.
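
If a fixed 10-second wait turns out to be too long or too short for your case, one alternative is to poll for a condition until a deadline passes. The helper below is only a rough sketch of that idea in plain Java. It is not part of HtmlUnit, and the names PollingWait and waitUntil are made up for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: polls a condition until it holds or a timeout expires.
final class PollingWait {

    private PollingWait() {}

    /**
     * Re-checks the condition every pollMillis until it returns true
     * or timeoutMillis has elapsed.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline reached without the condition holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition for "the rendered content has appeared":
        // here it simply becomes true after about 200 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("ready: " + ready);
    }
}
```

With the snippet above, the condition could be something like () -> !page.asText().contains("JavaScript turned off"), assuming that notice disappears once the scripts have rendered the results. I have not verified that against the page in question, so treat it as a starting point.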
