As I learn Python, I’m writing some little helper scripts. Here’s the latest, a script that prints all the .mp3 links referenced by a page (or pages) to standard output:
# mp3s.py
#
# Purpose: display all .mp3 links in the pages
# pointed to by the given URLs
#
# Usage: python mp3s.py url [url2 [url3...]]
#
# Author: Daniel Meyer
# Date: Oct 20, 2009
import sys
import urllib2
from HTMLParser import HTMLParser

class LinkFinder(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr, value in attrs:
                if attr == 'href':
                    self.links.append(value)

for url in sys.argv[1:]:
    page = urllib2.urlopen(url)
    linkFinder = LinkFinder()
    linkFinder.feed(page.read())
    linkFinder.close()
    for link in linkFinder.links:
        if link.find('.mp3') != -1:
            print link
Notice the __init__ method of LinkFinder, which was required to initialize the links data member while still calling the base class constructor.
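To see the constructor pattern in isolation, here is a minimal sketch of the same class. It is written for Python 3, which is an assumption on my part (the script above targets Python 2), so the parser comes from html.parser and the base constructor is called through super(). The sample markup is made up for illustration, and the check for an empty href is an addition that the script above does not have.

```python
# Python 3 sketch; the script in the post uses the Python 2 HTMLParser module.
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    def __init__(self):
        # Call the base class constructor first so the parser sets up
        # its own internal state, then add our data member.
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr, value in attrs:
                # Skip href attributes that have no value (value is None).
                if attr == 'href' and value:
                    self.links.append(value)


# Hypothetical sample markup, only for illustration.
sample = '<a href="talk1.mp3">Talk 1</a> <a href="notes.html">Notes</a>'

finder = LinkFinder()
finder.feed(sample)
finder.close()

mp3_links = [link for link in finder.links if '.mp3' in link]
print(mp3_links)
```

If the base constructor call is left out, I would expect feed() to fail, since the parser's internal buffer is never created.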
I’ve found this type of thing helpful when preparing to download conference audio where there are several individual mp3 links – I can then pipe the output through xargs to wget to download ’em:
python mp3s.py http://www.t4g.org/conference/t4g-2006/ | xargs wget
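One caveat with piping to wget: the script prints each href exactly as it appears in the page, so relative links may not download as-is. A possible fix is to resolve each link against the page URL before printing it. The sketch below is Python 3 (an assumption, as the script above is Python 2), the helper name absolute_mp3_links is hypothetical, and the sample hrefs are made up rather than taken from the real page.

```python
from urllib.parse import urljoin


def absolute_mp3_links(page_url, links):
    """Return the .mp3 links from `links`, resolved against `page_url`."""
    return [urljoin(page_url, link) for link in links if '.mp3' in link]


# Hypothetical hrefs, in the form LinkFinder might collect them from a page.
page_url = 'http://www.t4g.org/conference/t4g-2006/'
links = [
    'audio/session1.mp3',               # relative to the page
    '/media/session2.mp3',              # relative to the site root
    'index.html',                       # not an mp3, filtered out
    'http://example.com/session3.mp3',  # already absolute, left unchanged
]

resolved = absolute_mp3_links(page_url, links)
for link in resolved:
    print(link)
```

Links that are already absolute pass through urljoin unchanged, so this should be safe to apply to every href.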