Finally looking at Python, to debug a problem

Friday, January 9th, 2009 at 10:24pm

For over two years I have used rss2email to deliver new items from quite a large number of RSS feeds (which deserves a post of its own) into my Inbox. This works extremely well, except for one problem: pizzaburger.

I didn’t know what pizzaburger was, only that it is the title that around half of the items from the FAIL Blog come through with. The other half have a title similar to fail-owned-desert-foundation-fail, but that isn’t the real title either. Two recent posts that arrived with these strange titles are actually titled Pen Trick Fail and Foundation Fail.

At first I thought there might be something screwed up with the RSS feed. However, a quick look showed that the title tags were as they should be. After looking a bit deeper I saw where the strange titles were coming from: the media:title tag inside a media:content tag.
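To make the distinction concrete, here is a minimal Python sketch using only the standard library. The sample item is made up for illustration (it is not copied from the real feed), and the helper names are my own. It shows that the item's own title and the nested media:title are separate elements, and that a lookup which ignores the namespace prefix sees both. That is one way a parser could end up with the wrong title, though it is only an illustration and not a claim about how any particular library works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item, invented for illustration only.
SAMPLE_ITEM = """\
<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Foundation Fail</title>
  <media:content url="http://example.com/image.jpg" medium="image">
    <media:title type="html">fail-owned-desert-foundation-fail</media:title>
  </media:content>
</item>
"""

# ElementTree writes namespaced tags as "{namespace-uri}localname".
MEDIA_NS = "{http://search.yahoo.com/mrss/}"


def item_title(item):
    """Return the item's own <title>, which has no namespace."""
    return item.findtext("title")


def media_titles(item):
    """Return the text of every <media:title> nested inside the item."""
    return [el.text for el in item.iter(MEDIA_NS + "title")]


def titles_by_local_name(item):
    """Return every title found when the namespace is ignored."""
    return [
        el.text
        for el in item.iter()
        if el.tag.rsplit("}", 1)[-1] == "title"
    ]


item = ET.fromstring(SAMPLE_ITEM)
print("title:            ", item_title(item))
print("media:title:      ", media_titles(item))
print("ignoring prefixes:", titles_by_local_name(item))
```

Matching on the full namespaced tag keeps the two apart. Matching on the local name alone returns both, in document order, so whichever one a parser keeps last would win.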

After checking that I was using the latest version of rss2email, my first thought was that rss2email was using media:title instead of title. It looked like it, but I decided to find out for sure. One problem: rss2email is written in Python, which (despite meaning to for the last few years) I had yet to take a decent look at.

Hey, debugging someone else’s code isn’t that bad a place to start…

Half an hour later I had determined that the problem was not in rss2email. It appeared to be in feedparser, a module for parsing all types of feeds, so after checking that I had the latest version I wrote a simple script to confirm it.

Instead of digging into the feedparser code, I first turned to Google and found a blog post about the same problem, which in turn led to a bug report.

I now thought the problem was on its way to being solved, until I saw that the bug had been resolved just over a year ago. So why didn’t the latest version have the fix?

I then spotted the nightly build page, which references version 4.2. That’s greater than 4.1, which is the latest stable release! From there it was a simple matter to download the latest nightly build and use my script to verify that 4.2 contained the fix I needed.

Even though I had resolved a problem that had annoyed me for months, I wasn’t quite happy. Why had the feedparser maintainer sat on 4.2 for so long?

And how did I find Python?

Once I started to get my head around the different syntax there wasn’t anything fundamentally different to other languages. As expected I had the most difficulty with understanding how someone else had structured their code.
