In my opinion, Redis is now the Swiss Army knife for any developer writing a scraper. I can't remember a sizeable scraping project I started in the past year that didn't involve Redis somehow. Queueing: Queueing isn't particularly tied to web scraping; it's just a necessary part of it. The chances are…
Pony is a neat new ORM on the block. You likely haven't heard of it, as it doesn't seem like anybody has. It is similar to other ORMs such as SQLAlchemy in the sense that you can define your models and query them using convenient syntax. However, the syntax for…
If you're in any kind of tech circle, then all you will have heard today and yesterday is discussion about Heartbleed. It's a very serious bug in the OpenSSL library. The chances are that if you have a server, you're using OpenSSL somewhere or other. Everybody with public-facing servers using…
Requests is a fantastic library for Python, and one of the most enjoyable libraries I have used to this day. I use it on a daily basis for most of my scraping activities. The chances are you have some convenience functions that you use in all of your scraping projects, but…
If you need to extract data from a web page, then the chances are you first looked for an API. Unfortunately, one isn't always available, and you sometimes have to fall back on web scraping. In this article I'm going to cover a lot of the things that apply to all…
We're going to create a simple Flask webapp. I say simple because it only has three routes; however, this doesn't mean it can't be extremely useful. The webapp will provide the following three functions: Root route - Details for the requester's IP; IP route - Details for…
This is only going to be a short one. If you want to scrape only the main content of a site (we aren't interested in the menu, sidebar, etc.), then you can use the Python port of the readability library. We'll simply fetch the page using…