Extending the requests response class

Requests is a fantastic library for Python, one of the most enjoyable libraries I have used to this day, and I use it daily for most of my scraping work.

Chances are you have some convenience functions that you use in all of your scraping projects, but you may have just been copying them from project to project and passing your response objects into them as an argument. We can do better.

I'll show you how to add a few simple methods to the Response class, so that you can apply the same technique to your own projects with your own methods.

We'll start by defining a Response class with a few convenience methods. The most important one is doc(): it "caches" the parsed HTML tree on the instance, so the other convenience methods don't re-parse the whole page on every call.

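import inspect

import requests
from lxml import html


class Response(object):
    def doc(self):
        # Parse the HTML once and cache the tree on the instance, so the
        # helper methods below don't re-parse it on every call.
        if not hasattr(self, '_doc'):
            self._doc = html.fromstring(self.text)
        return self._doc

    def page_links(self):
        # Every href on the page. Named page_links rather than links because
        # requests.Response already has a links property (parsed Link headers).
        return self.doc().xpath('//a/@href')

    def images(self, filter_extensions=('jpg', 'jpeg', 'gif', 'png')):
        # Image sources whose URL ends with one of the given extensions.
        return [src for src in self.doc().xpath('//img/@src')
                if src.endswith(tuple(filter_extensions))]

    def title(self):
        # Text of the first <title> element, or None if the page has none.
        title = self.doc().xpath('//title/text()')
        if title:
            return title[0].strip()
        return None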
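To see the parse-once behaviour of doc() without lxml or a network request, here is a minimal sketch that swaps in the standard library's html.parser.HTMLParser for lxml. FakeResponse, _LinkAndTitleParser and the class-level parse_count counter are hypothetical names used only for this demonstration; FakeResponse stands in for requests.Response by exposing a .text attribute.

```python
from html.parser import HTMLParser


class _LinkAndTitleParser(HTMLParser):
    """Hypothetical stand-in for lxml: collects <a href> values and <title> text."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)
        elif tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


class FakeResponse(object):
    """Hypothetical object with a .text attribute, like requests.Response."""

    parse_count = 0  # only here to show how many times parsing happens

    def __init__(self, text):
        self.text = text

    def doc(self):
        # Same pattern as the article: parse on the first call, then reuse.
        if not hasattr(self, '_doc'):
            FakeResponse.parse_count += 1
            parser = _LinkAndTitleParser()
            parser.feed(self.text)
            parser.close()
            self._doc = parser
        return self._doc

    def page_links(self):
        return self.doc().hrefs

    def title(self):
        title = ''.join(self.doc().title_parts).strip()
        return title or None


page = FakeResponse('<html><head><title> Demo </title></head>'
                    '<body><a href="/a">A</a><a href="/b">B</a></body></html>')
print(page.title())              # Demo
print(page.page_links())         # ['/a', '/b']
print(FakeResponse.parse_count)  # 1
```

Two helper calls trigger a single parse. On real pages lxml with XPath is far more capable; the sketch only isolates the caching pattern.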

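Now we need to patch the requests.Response class with the methods defined on our new class. We'll use getmembers() from the inspect module with inspect.isfunction() as the predicate. In Python 3, functions looked up on a class are plain functions rather than unbound methods, so inspect.ismethod() would match nothing here, and each function can be assigned to the target class directly.

for method_name, method in inspect.getmembers(Response, inspect.isfunction):
    setattr(requests.models.Response, method_name, method)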
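This loop is classic monkey patching: attributes set on a class are found through normal attribute lookup, so even response objects created before the patch pick them up. One risk is silently replacing something the class already defines (requests.Response, for instance, already has a links property). Below is a minimal standard-library sketch of a guarded variant; patch_methods, Target, Helpers and Clashing are hypothetical names, with Target standing in for requests.models.Response.

```python
import inspect


def patch_methods(target, source, overwrite=False):
    """Hypothetical helper: copy every plain function on `source` onto `target`.

    Checks for name clashes first, so nothing is patched if any function
    would shadow an attribute `target` already has (unless overwrite=True).
    """
    funcs = inspect.getmembers(source, inspect.isfunction)
    clashes = [name for name, _ in funcs if hasattr(target, name)]
    if clashes and not overwrite:
        raise AttributeError('{} already defines {}; pass overwrite=True to replace'
                             .format(target.__name__, ', '.join(clashes)))
    for name, func in funcs:
        setattr(target, name, func)
    return [name for name, _ in funcs]


class Target(object):
    """Stand-in for requests.models.Response."""

    def __init__(self, text):
        self.text = text

    @property
    def links(self):
        return 'built-in links'


class Helpers(object):
    def shout(self):
        return self.text.upper()


class Clashing(object):
    def links(self):
        return []


existing = Target('hello')              # created before patching
print(patch_methods(Target, Helpers))   # ['shout']
print(existing.shout())                 # HELLO

try:
    patch_methods(Target, Clashing)
except AttributeError as exc:
    print(exc)
print(existing.links)                   # built-in links
```

Checking every name before calling setattr() means a refused patch leaves the target class exactly as it was.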

We're all done. You can now call these convenience methods on any response object, as in the following example:

r = requests.get('http://imgur.com/')
print(r.title())
print(r.images(filter_extensions=['png']))

Now go ahead and make your response objects as powerful as you like. If you're interested in other scraping-related hints and tips, check out my Python web scraping resource.

By Jake Austwick

21-year-old self-taught programmer living in San Diego, California. Proficient in Python and Ruby, and has dabbled in Go. Main interests are web scraping and web applications.
