Foreword
This article was first published on my Tumblr blog in 2015. Yes, the devblogging concept was not widespread in those days; I was looking for a nice solution, and Tumblr was a good choice. Now, however, we have many blogging solutions for developers. That's why I am migrating the article here.
There are some things to be aware of, though. This article was published in 2015. Things were quite different back then.
- Major browsers had no headless mode back then (a headless browser runs as a service and does not provide a GUI). In those days, there was a project called PhantomJS, which is pretty much abandoned today and no longer receives updates. Today, however, major browsers provide a headless option. If you see some mumbling about PhantomJS, ignore it.
- The part about QuickJava extension configuration has been removed from this version of the article. The extension was removed from the Firefox add-on registry and no longer exists. It was basically an extension to disable things like Flash, Silverlight, JavaScript, etc. Today, Firefox provides configurations to disable them, but...
- Standard Firefox configurations are kept as is in this article. That's because (i) I am lazy, (ii) I want to preserve my technical past and (iii) I've actually linked this in a Stack Overflow question that still gets reactions today. That's also why disabling CSS (which QuickJava used to handle) is not included in this article. You should figure it out.
I've got to say, though, that I will return to this article one day and edit it properly. Until then, I present you "me" in 2015.
Article
It is handy to run a browser, manipulate the DOM elements on a page and scrape data. However, testing this on a personal computer can be a nightmare. There are a couple of headless browser solutions in Python, but in Selenium there's only one choice, and it seems to be buggy when manipulating DOM elements. So there's another choice, which ships with most Linux distributions: Firefox!
You might find it too slow. However, there are a couple of about:config tricks and extensions for increasing the rendering speed. First, create a Firefox profile instance:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
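As far as I can tell from Selenium's source, FirefoxProfile keeps every set_preference call in a dictionary and writes it into the profile's user.js file as user_pref(...) lines, which Firefox reads on every start. So setting the same key twice is harmless (the last value wins), and the same preferences can be applied to a plain Firefox profile without Selenium. Below is a minimal standard-library sketch of that file format; render_user_js and write_user_js are hypothetical helpers, not part of Selenium.

```python
import json
from pathlib import Path


def render_user_js(prefs: dict) -> str:
    """Turn {pref_name: value} into Firefox user.js lines.

    json.dumps produces the matching literal for each Python type:
    True -> true, 0 -> 0, "about:blank" -> "about:blank" (quoted).
    """
    return "".join(
        f"user_pref({json.dumps(name)}, {json.dumps(value)});\n"
        for name, value in prefs.items()
    )


def write_user_js(profile_dir: str, prefs: dict) -> Path:
    """Write (overwrite) user.js inside an existing profile directory."""
    path = Path(profile_dir) / "user.js"
    path.write_text(render_user_js(prefs), encoding="utf-8")
    return path
```

Because the preferences live in a dict, a key can only appear once in the generated file.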
And these are the built-in Firefox configuration options:
profile.set_preference("network.http.pipelining", True)
profile.set_preference("network.http.proxy.pipelining", True)
profile.set_preference("network.http.pipelining.maxrequests", 8)
profile.set_preference("content.notify.interval", 500000)
profile.set_preference("content.notify.ontimer", True)
profile.set_preference("content.switch.threshold", 250000)
profile.set_preference("browser.cache.memory.capacity", 65536) # Increase the memory cache capacity (value in KB, so 64 MB).
profile.set_preference("browser.startup.homepage", "about:blank")
profile.set_preference("reader.parse-on-load.enabled", False) # Disable Reader View; we won't need it.
profile.set_preference("browser.pocket.enabled", False) # Disable Pocket too!
profile.set_preference("loop.enabled", False)
profile.set_preference("browser.chrome.toolbar_style", 1) # Text on Toolbar instead of icons
profile.set_preference("browser.display.show_image_placeholders", False) # Don't show thumbnails on not loaded images.
profile.set_preference("browser.display.use_document_colors", False) # Don't show document colors.
profile.set_preference("browser.display.use_document_fonts", 0) # Don't load document fonts.
profile.set_preference("browser.display.use_system_colors", True) # Use system colors.
profile.set_preference("browser.formfill.enable", False) # Autofill on forms disabled.
profile.set_preference("browser.helperApps.deleteTempFileOnExit", True) # Delete temporary files on exit.
profile.set_preference("browser.shell.checkDefaultBrowser", False)
profile.set_preference("browser.startup.page", 0) # blank
profile.set_preference("browser.tabs.forceHide", True) # Disable tabs; we won't need them.
profile.set_preference("browser.urlbar.autoFill", False) # Disable autofill on URL bar.
profile.set_preference("browser.urlbar.autocomplete.enabled", False) # Disable autocomplete on URL bar.
profile.set_preference("browser.urlbar.showPopup", False) # Disable list of URLs when typing on URL bar.
profile.set_preference("browser.urlbar.showSearch", False) # Disable search bar.
profile.set_preference("extensions.checkCompatibility", False) # Disable add-on compatibility checks and updates.
profile.set_preference("extensions.checkUpdateSecurity", False)
profile.set_preference("extensions.update.autoUpdateEnabled", False)
profile.set_preference("extensions.update.enabled", False)
profile.set_preference("general.startup.browser", False)
profile.set_preference("plugin.default_plugin_disabled", False)
profile.set_preference("permissions.default.image", 2) # Disable image loading (2 = block all images).
These preferences should make your browser load and render pages faster. Thanks for reading.