Link getter

As I learn Python, I’m writing some little helper scripts.  Here’s the latest, a script that prints all the .mp3 links referenced by a page (or pages) to standard output:

# mp3s.py
#
# Purpose: display all .mp3 links in the pages
#   pointed to by the given URLs
#
# Usage: python mp3s.py url [url2 [url3...]]
#
# Author: Daniel Meyer
# Date: Oct 20, 2009
import sys
import urllib2
from HTMLParser import HTMLParser

class LinkFinder(HTMLParser):
  def __init__(self):
    HTMLParser.__init__(self)
    self.links = []

  def handle_starttag(self, tag, attrs):
    if tag == 'a':
      for attr, value in attrs:
        if attr == 'href' and value:  # skip an href with no value (value is None)
          self.links.append(value)

for url in sys.argv[1:]:
  page = urllib2.urlopen(url)
  linkFinder = LinkFinder()
  linkFinder.feed(page.read())
  linkFinder.close()

  for link in linkFinder.links:
    if '.mp3' in link:
      print link

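Notice lines 15-17, which were required to initialize the links data member while still calling the base class constructor (since defining __init__ in a subclass replaces the base class's version, which is then no longer called automatically).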
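
For anyone on Python 3, here is a minimal sketch of the same constructor pattern. It assumes the Python 3 module name html.parser (the HTMLParser module was renamed there) and feeds the parser a made-up HTML snippet instead of a fetched page. The None check on the href value is an addition for illustration.

```python
from html.parser import HTMLParser  # Python 3 name for the old HTMLParser module


class LinkFinder(HTMLParser):
    def __init__(self):
        # Call the base class constructor first so the parser's own state is set up.
        super().__init__()
        # Then add the extra data member this subclass needs.
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr, value in attrs:
                # value is None for a bare attribute such as <a href>
                if attr == 'href' and value is not None:
                    self.links.append(value)


# Hypothetical sample markup, standing in for a downloaded page.
finder = LinkFinder()
finder.feed('<a href="talk1.mp3">one</a> <a href="notes.html">two</a>')
finder.close()

mp3_links = [link for link in finder.links if '.mp3' in link]
print(mp3_links)
```

Leaving out the super().__init__() call would skip HTMLParser's own setup, so the parser would not be in a usable state when feed() runs.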

I’ve found this type of thing helpful when preparing to download conference audio where there are several individual mp3 links – I can then pipe the output through xargs to wget to download ’em:

python mp3s.py http://www.t4g.org/conference/t4g-2006/ | xargs wget
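
One caveat with piping into wget: the script prints each href exactly as it appears in the page, so relative links would not be downloadable as-is. A possible fix, sketched here in Python 3 under the assumption that the page uses relative hrefs, is to resolve each link against the page URL with urljoin. The helper name absolutize and the sample link paths are hypothetical.

```python
from urllib.parse import urljoin  # Python 3 location; Python 2 had urlparse.urljoin


def absolutize(page_url, links):
    """Resolve each link against the URL of the page it was found on."""
    return [urljoin(page_url, link) for link in links]


page_url = 'http://www.t4g.org/conference/t4g-2006/'

# Hypothetical hrefs: relative to the page, relative to the site root, already absolute.
links = [
    'audio/session1.mp3',
    '/media/session2.mp3',
    'http://example.com/session3.mp3',
]

absolute = absolutize(page_url, links)
for link in absolute:
    print(link)
```

Links that are already absolute pass through urljoin unchanged, so the same call can be applied to every href.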
