Posted by danielmeyer on October 20, 2009
As I learn Python, I’m writing some little helper scripts. Here’s the latest, a script that prints all the .mp3 links referenced by a page (or pages) to standard output:
# mp3s.py
#
# Purpose: display all .mp3 links in the pages
#          pointed to by the given URLs
#
# Usage: python mp3s.py url [url2 [url3...]]
#
# Author: Daniel Meyer
# Date: Oct 20, 2009
import sys
import urllib2
from HTMLParser import HTMLParser

class LinkFinder(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr, value in attrs:
                if attr == 'href':
                    self.links.append(value)

for url in sys.argv[1:]:
    page = urllib2.urlopen(url)
    linkFinder = LinkFinder()
    linkFinder.feed(page.read())
    linkFinder.close()
    for link in linkFinder.links:
        if link.find('.mp3') != -1:
            print link
Notice lines 15-17 (the __init__ method), which were required to initialize the links data member while still calling the base class constructor.
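The same constructor pattern carries over to Python 3, where the HTMLParser module was renamed html.parser. What follows is a minimal sketch rather than the original script: it feeds the parser an inline HTML string instead of a fetched page, and the find_mp3_links helper and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    def __init__(self):
        # Call the base class constructor first so the parser's own
        # internal state is set up, then add our extra data member.
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            for attr, value in attrs:
                # value is None for a bare attribute such as <a href>.
                if attr == 'href' and value is not None:
                    self.links.append(value)


def find_mp3_links(html):
    """Return every href in html that contains '.mp3' (hypothetical helper)."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return [link for link in finder.links if '.mp3' in link]


# Hypothetical sample markup, for illustration only.
SAMPLE = (
    '<a href="talk1.mp3">One</a> '
    '<a href="notes.html">Notes</a> '
    '<a href="audio/talk2.mp3">Two</a>'
)

if __name__ == '__main__':
    for link in find_mp3_links(SAMPLE):
        print(link)
```

Without the call to the base class constructor, the parser's internal state would never be set up, which is why the override has to do both jobs.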
I’ve found this type of thing helpful when preparing to download conference audio where there are several individual mp3 links – I can then pipe the output through xargs to wget to download ‘em:
python mp3s.py http://www.t4g.org/conference/t4g-2006/ | xargs wget
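One caveat: the script prints each href exactly as it appears in the page, so any relative links would reach wget without a host to fetch them from. A possible way around that, sketched here in Python 3 under the assumption that you resolve links before printing them, is urljoin from the standard library (urlparse.urljoin in Python 2). The absolutize helper and the URLs below are placeholders, not part of the original script.

```python
from urllib.parse import urljoin


def absolutize(page_url, links):
    """Resolve each link against the URL of the page it came from."""
    return [urljoin(page_url, link) for link in links]


# Placeholder URLs, for illustration only.
page = 'http://example.com/conference/2006/'
links = [
    'audio/talk1.mp3',                   # relative to the page
    '/media/talk2.mp3',                  # relative to the site root
    'http://cdn.example.com/talk3.mp3',  # already absolute, left unchanged
]

for url in absolutize(page, links):
    print(url)
```

Links that are already absolute pass through untouched, so the output can be piped to xargs wget just as before.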