A Scraper's Toolkit: Redis

In my opinion, Redis is now the Swiss Army knife for any developer writing a scraper. I can't remember a sizeable scraping project I started in the past year that didn't involve Redis somehow. Queueing isn't particularly tied to web scraping; it's just a necessary part of it. The chances are…


Why you should give PonyORM a chance

Pony is a neat new ORM on the block. You likely haven't heard of it; it doesn't seem like anybody has. It is similar to other ORMs such as SQLAlchemy in that you can define your models and query them using convenient syntax. However, the syntax for…


Heartbleed affects clients too

If you're in any kind of tech circle, all you will have heard today and yesterday is discussion of Heartbleed. It's a very serious bug in the OpenSSL library. The chances are that if you have a server, you're using OpenSSL somewhere or other. Everybody with public-facing servers using…


Extending the requests response class

Requests is a fantastic library for Python, one of the most enjoyable libraries I have used to this day. I use it on a daily basis for most of my scraping activities. The chances are you have some convenience functions that you use in all of your scraping projects, but…


Python web scraping resource

If you need to extract data from a web page, the chances are you first looked for its API. Unfortunately, one isn't always available, and you sometimes have to fall back on web scraping. In this article I'm going to cover a lot of the things that apply to all…


Flask GeoIP API in Python

We're going to create a simple Flask webapp. I say simple because it only has three routes; however, this doesn't mean it can't be extremely useful. The webapp will provide the following three functions:

Root route - Details for requester's IP
IP route - Details for…


Scraping content with readability and Python

This is only going to be a short one. If you want to scrape only the main content of a site (ignoring the menu, sidebar, etc.), then you can use the Python port of the readability library. We'll simply fetch the page using…
