
How To Apply Javascript To Html Simulating A Browser

I've already searched the Internet for how to 'create' a simple headless browser, because I was interested in how a browser works internally. I'd like to implement a simpl

Solution 1:

If you're looking for a headless browser, I'm sure you're aware of PhantomJS. PhantomJS is a headless browser based on Apple's WebKit browser engine.

You're asking for a lot here. You need:

  1. a JavaScript runtime (such as V8) to run the JavaScript.
  2. a browser engine to parse the HTML and bring the Document Object Model (DOM) it defines to life.

Each of those takes millions of lines of code to implement.
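To make that split concrete: in Node.js the runtime half comes for free (V8, reachable through the built-in vm module), while the DOM half is what you would otherwise have to write yourself. Below is a minimal sketch, assuming a hypothetical toy "document" that only supports getElementById and a plain-string innerHTML; the function names are made up for illustration, and a real DOM (what jsdom provides) is vastly larger.

```javascript
const vm = require('vm');

// Hypothetical toy "DOM": a document that only knows elements by id and
// stores innerHTML as a plain string (no HTML parsing, no rendering).
function createToyDocument(ids) {
  const elements = new Map(ids.map((id) => [id, { id, innerHTML: '' }]));
  return {
    getElementById(id) {
      return elements.get(id) || null;
    },
  };
}

// Runs `js` with a window-like object as the global object, so the script
// can refer to `window` and `document` as bare globals, as in a browser.
function runWithToyDom(js, document) {
  const window = { document };
  window.window = window; // in browsers, window.window === window
  const context = vm.createContext(window); // V8 supplies the JS runtime
  vm.runInContext(js, context);
  return document;
}
```

For example, running `document.getElementById("content").innerHTML = "Hello";` against createToyDocument(['content']) leaves that element's innerHTML set to "Hello". Note that the vm module is not a security boundary, so this is no safer for untrusted scripts than eval.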

My recommendation is to integrate your program with PhantomJS. PhantomJS is a headless web browser and a JavaScript environment. If you're using Scala, start PhantomJS as a child process and send messages to it via standard I/O. Because PhantomJS is driven through its JavaScript API, you'd additionally have to write a JS script that handles the messages coming in on stdin. It's not well documented, but PhantomJS's system module exposes system.stdin and system.stdout for handling those messages.

That's a lot of work, and it needs a lot of extra resources outside the JVM to get it working. Since you're using Scala, a simpler option would be jsoup to parse and modify the HTML document; however, you would have to do the transformations in Scala (or Java) rather than by running the page's JavaScript.

Actually, now that I think about it, you should use jsdom together with Node.js. jsdom implements the DOM API without actually rendering anything, which might be what you need. jsdom is made for Node.js, which is headless. If you want to use both Scala and Node, you can also use Node's standard I/O to send messages to and from the JVM.
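Whichever child process you pick (PhantomJS or Node), the two sides need some framing for the messages they exchange over stdin/stdout. One simple option, assumed here rather than taken from the answer, is newline-delimited JSON: one JSON object per line. A minimal sketch of the Node side follows; encodeMessage, createLineDecoder, serve and the message shape are all hypothetical names chosen for illustration.

```javascript
// Hypothetical framing helpers for newline-delimited JSON (NDJSON) over stdio.
// JSON.stringify escapes newlines inside strings, so an HTML payload that
// spans several lines still encodes to exactly one line.
function encodeMessage(message) {
  return JSON.stringify(message) + '\n';
}

// Returns a function that accepts raw chunks (which may split or join
// messages arbitrarily) and calls onMessage once per complete line.
function createLineDecoder(onMessage) {
  let buffered = '';
  return function push(chunk) {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline).trim(); // trim also drops '\r'
      buffered = buffered.slice(newline + 1);
      if (line !== '') onMessage(JSON.parse(line));
    }
  };
}

// Wires the decoder to this process's stdin/stdout. `handle` is a
// hypothetical callback that turns a request object into a response object.
function serve(handle) {
  // utf8 encoding keeps multi-byte characters intact across chunk boundaries
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', createLineDecoder((request) => {
    process.stdout.write(encodeMessage(handle(request)));
  }));
}
```

Calling serve((req) => ({ id: req.id, ok: true })) would answer each request line with one response line; the JVM side would then write one JSON line per request to the child's stdin and read its stdout line by line. The buffering in createLineDecoder matters because a stream 'data' event is not guaranteed to contain exactly one message.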


Here is a proof of concept of using jsdom to evaluate the JavaScript and modify the HTML. It's a really simple solution, and it's the most resource-efficient one for the given task (and this is a hard task).

I made a gist for you with a very simple proof of concept. To run the gist, do:

git clone https://gist.github.com/c8aef41ee27e5304e94f6a255b048f87.git apply-js-to-html
cd apply-js-to-html
npm install
node example.js

This is the meat of the example:

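const jsdom = require('jsdom');

// Note: jsdom.env() is part of jsdom's old API; jsdom 10 replaced it with
// the JSDOM class, so this code expects an older (pre-10) jsdom.
module.exports = function (html, js) {
    return new Promise((resolve, reject) => {
        jsdom.env(html, (error, window) => {
            if (error) {
                reject(error);
                return; // don't try to evaluate against a missing window
            }
            try {
                // Evaluate the script with `this` bound to the jsdom window.
                (function evalInContext () {
                    'use strict';
                    // Local bindings so the evaluated code can refer to
                    // `document` and `window` as it would in a browser.
                    const document = this.document;
                    const window = this.window;
                    eval(js);
                    resolve(window.document.documentElement.innerHTML);
                }).call(window);
            } catch (e) {
                reject(e);
            }
        });
    });
};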

And here is the module in use:

const apply = require('./index');

const html = `
    <html>
        <head></head>
        <body>
            <p id="content"></p>
        </body>
    </html>
`;

const js = `document.getElementById("content").innerHTML = "Hello";`;

apply(html, js).then(result => {
    console.log('input html: ', html);
    console.log('output html: ', result);
}).catch(err => console.error(err));

And here is the output of the code:

input html:  
    <html><head></head><body><p id="content"></p></body></html>

output html:  <head></head><body><p id="content">Hello</p></body>

jsdom creates a headless window and document environment that doesn't render anything. You can use eval and call it in context, using window as the this value. I've also redeclared document and window as local constants so that the evaluated JS has those variables in scope.
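A variation on the same idea, swapped in here rather than taken from the gist: instead of eval plus redeclared constants, pass window and document to the script as parameters of a function built with the Function constructor. The evaluated code then sees exactly those two names plus real globals, and cannot reach local variables of the surrounding module (such as resolve or html), which direct eval can. A minimal sketch, assuming any window-like object with a document property; runScript is a hypothetical name.

```javascript
// Builds a function whose parameters are `window` and `document`, so the
// script body resolves those names to the objects passed in. Unlike direct
// eval, the body cannot see the caller's local variables.
function runScript(js, window) {
  const script = new Function('window', 'document', js);
  // Use `window` as `this`, like the .call(window) in the jsdom example.
  script.call(window, window, window.document);
  return window;
}
```

Inside the jsdom.env callback you could call runScript(js, window) in place of the evalInContext wrapper and then read window.document.documentElement.innerHTML. Like eval, this is not a sandbox: the script can still reach Node globals such as process.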

This is just a basic POC; you'll have to iron out the details yourself.
