Our Craft

Making it better

Link getter

Posted by danielmeyer on October 20, 2009

As I learn Python, I’m writing some little helper scripts.  Here’s the latest, a script that prints all the .mp3 links referenced by a page (or pages) to standard output:

# mp3s.py
#
# Purpose: display all .mp3 links in the pages
#   pointed to by the given URLs
#
# Usage: python mp3s.py url [url2 [url3...]]
#
# Author: Daniel Meyer
# Date: Oct 20, 2009
import sys
import urllib2
from HTMLParser import HTMLParser

class LinkFinder(HTMLParser):
  def __init__(self):
    HTMLParser.__init__(self)
    self.links = []

  def handle_starttag(self, tag, attrs):
    if tag == 'a':
      for attr, value in attrs:
        if attr == 'href':
          self.links.append(value)

for url in sys.argv[1:]:
  page = urllib2.urlopen(url)
  linkFinder = LinkFinder()
  linkFinder.feed(page.read())
  linkFinder.close()

  for link in linkFinder.links:
    if link.find('.mp3') != -1:
      print link

Notice lines 15-17 of the script (the __init__ method), which were required to initialize the links data member while still calling the base class constructor.
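To see why the base class constructor call matters, here is a minimal sketch. It makes a few assumptions: it is written for Python 3 (where the class lives in html.parser and super() replaces the explicit HTMLParser.__init__(self) call), and ForgetfulFinder and SAMPLE are hypothetical names invented for the illustration, not part of the original script.

```python
from html.parser import HTMLParser  # Python 3 home of the HTMLParser class


class LinkFinder(HTMLParser):
    """Collects the href of every <a> tag, like the class in mp3s.py."""

    def __init__(self):
        super().__init__()  # let HTMLParser set up its own parsing state first
        self.links = []     # then add our own data member

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for attr, value in attrs:
                if attr == 'href':
                    self.links.append(value)


class ForgetfulFinder(LinkFinder):
    """Hypothetical counter-example: overrides __init__ but skips the base call."""

    def __init__(self):
        self.links = []  # HTMLParser.__init__ never runs


# Hypothetical sample markup, just for the demonstration
SAMPLE = '<a href="talk1.mp3">one</a> <a href="notes.html">notes</a>'

finder = LinkFinder()
finder.feed(SAMPLE)
finder.close()
print(finder.links)

try:
    ForgetfulFinder().feed(SAMPLE)
    base_init_needed = False
except AttributeError:
    # feed() relies on state that only the base constructor creates
    base_init_needed = True
print(base_init_needed)
```

In this sketch the parser that skips the base constructor fails as soon as feed() is called, because the parser's internal buffer was never created.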

I’ve found this type of thing helpful when preparing to download conference audio where there are several individual mp3 links – I can then pipe the output through xargs to wget to download ’em:

python mp3s.py http://www.t4g.org/conference/t4g-2006/ | xargs wget
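The pipeline above works when each printed link is a complete URL that wget can fetch by itself. If a page uses relative hrefs, one possible refinement is to resolve them against the page URL before printing. The following is only a sketch under some assumptions: it targets Python 3 (urllib.parse), the helper name mp3_urls and the example.com addresses are hypothetical, and it swaps the script's substring test for a stricter check that the URL path ends in .mp3.

```python
from urllib.parse import urljoin, urlparse


def mp3_urls(base_url, hrefs):
    """Return absolute URLs for the hrefs whose path ends in .mp3."""
    found = []
    for href in hrefs:
        if not href:  # an <a href> with no value arrives as None
            continue
        absolute = urljoin(base_url, href)  # resolves relative links
        # Check only the path, so a query string does not hide the extension
        if urlparse(absolute).path.lower().endswith('.mp3'):
            found.append(absolute)
    return found


# Hypothetical page URL and hrefs, standing in for linkFinder.links
example = mp3_urls(
    'http://example.com/conference/',
    ['audio/session1.mp3', '/media/session2.MP3', 'notes.html', None,
     'http://cdn.example.com/session3.mp3?download=1'],
)
for url in example:
    print(url)
```

In the script, the page URL passed to urlopen would play the role of base_url, and the printed results could be piped to xargs wget in the same way.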
