python-readability

Given an HTML document, it pulls out the main body text and cleans it up.

This is a Python port of a Ruby port of arc90's readability project.

Installation

Installation is easy with pip. Just run:

$ pip install readability-lxml

Usage

>>> import requests
>>> from readability import Document
>>>
>>> response = requests.get('http://example.com')
>>> doc = Document(response.text)
>>> doc.title()
'Example Domain'
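
The title is only part of what the library extracts. The cleaned-up main body text is available too. Below is a minimal sketch, assuming that Document.summary() returns the cleaned main content as an HTML string. The helper name extract_article and the dictionary keys are hypothetical and not part of the library.

```python
def extract_article(html):
    """Return the title and the cleaned main-content HTML of a page.

    Hypothetical helper for illustration; not part of readability itself.
    """
    # Imported inside the function so the helper can be defined
    # before readability-lxml is loaded.
    from readability import Document

    doc = Document(html)
    return {
        # Page title, as in the interactive session above.
        "title": doc.title(),
        # Assumption: summary() returns the main body as an HTML string.
        "summary_html": doc.summary(),
    }
```

Calling extract_article(response.text) with the response from the session above should give a dictionary whose "title" entry matches doc.title().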

Change Log

  • 0.3 Added Document.encoding, positive_keywords and
    negative_keywords
  • 0.4 Added video loading and allowed more images per paragraph
  • 0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and
    3.4
  • 0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3
    and 3.4

Licensing

This code is licensed under the Apache License 2.0.

Thanks to

  • Latest readability.js
  • Ruby port by starrhorne and iterationlabs
  • Python port by gfxmonk
  • Decruft effort to move to lxml
  • "BR to P" fix from readability.js which improves quality for smaller
    texts
  • Contributions from GitHub users.
