So, my effort to simplify the maintenance of my website and the
archiving of my blog contents is snowballing, as any good programming
project should. My ultimate objective: to write a set of scripts (I
don't care what language - I would prefer Python, Java, or Perl, in that
order) that extract content (posts and photo gallery links) from my
Typepad blog and archive it to my static website, so that I don't have
to edit any HTML files or manually transfer files. Ideally, I want to
run one program that finds the latest photo galleries on my blog,
extracts all new (or all, period) posts from my blogs, writes HTML files
in the format my static website expects, and even uploads the files
to the site.
Typepad provides an export facility that seems to give me all the
posts ever made to each blog and writes them to a flat file of some
parseable format (I've written a simple parser in python).
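For reference, here is a rough sketch of what such a parser can look
like. It assumes the export uses the Movable Type style layout (entries
separated by a line of eight hyphens, sections within an entry separated
by a line of five hyphens); that layout, the function name parse_export
and the sample data are my assumptions for illustration, not a
description of the parser I actually wrote.

```python
import re

def parse_export(text):
    """Split a Movable Type style export into a list of entry dicts."""
    entries = []
    # Assumed layout: a line of exactly eight hyphens ends each entry.
    for raw_entry in re.split(r"^-{8}[ \t]*$", text, flags=re.M):
        if not raw_entry.strip():
            continue
        # Assumed layout: a line of exactly five hyphens ends each section.
        sections = [s.strip() for s in
                    re.split(r"^-{5}[ \t]*$", raw_entry, flags=re.M)]
        entry = {}
        # The first section holds one "KEY: value" pair per line.
        for line in sections[0].splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        # Later sections look like "LABEL:" followed by multi-line text.
        for section in sections[1:]:
            if not section:
                continue
            label, sep, body = section.partition(":")
            if sep:
                # Only the first section with a given label is kept.
                entry.setdefault(label.strip(), body.strip())
        entries.append(entry)
    return entries

# Made-up sample data in the assumed layout.
SAMPLE_EXPORT = """AUTHOR: me
TITLE: First post
DATE: 01/02/2005 10:00:00 AM
-----
BODY:
Hello world.
-----
--------
AUTHOR: me
TITLE: Second post
DATE: 01/03/2005 11:00:00 AM
-----
BODY:
Another
post.
-----
--------
"""

for post in parse_export(SAMPLE_EXPORT):
    print(post["DATE"], post["TITLE"])
```

Each post comes back as a plain dictionary, which should be easy to feed
into whatever writes the HTML for the static site.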
My Typepad blog displays a list of photo albums along with
thumbnails - I want to add similar links to the "Recent Photos" section
on my static website without manually editing the HTML. One way to
accomplish this would be to open the URL for the blog and write a
script that parses the resulting HTML to extract the thumbnail links -
this is doable in Python using the XML DOM parser, as long as the page
is well-formed XHTML (I don't want to use an event-driven SAX parser).
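A minimal sketch of the DOM approach, using xml.dom.minidom from the
standard library. It assumes the page is well-formed XHTML and that
album links contain "/photos/" in their href; that marker, the function
name find_thumbnails and the sample page are my guesses for
illustration, not something I have checked against the real markup.

```python
from xml.dom import minidom

def find_thumbnails(xhtml, href_marker="/photos/"):
    """Return (album_url, thumbnail_url) pairs found in an XHTML page."""
    doc = minidom.parseString(xhtml)
    results = []
    for anchor in doc.getElementsByTagName("a"):
        href = anchor.getAttribute("href")
        # href_marker is an assumed way to recognise album links.
        if href_marker not in href:
            continue
        for img in anchor.getElementsByTagName("img"):
            results.append((href, img.getAttribute("src")))
    return results

# Made-up page fragment standing in for the real blog HTML.
SAMPLE_PAGE = """<html><body>
<div>
  <a href="http://example.typepad.com/photos/hawaii/"><img
     src="http://example.typepad.com/photos/hawaii/thumb.jpg"
     alt="Hawaii" /></a>
  <a href="http://example.typepad.com/about.html">About</a>
</div>
</body></html>"""

for album_url, thumb_url in find_thumbnails(SAMPLE_PAGE):
    print(album_url, thumb_url)
```

If the page is not well-formed, minidom will raise a parse error rather
than guess, so this only works for as long as the blog templates stay
valid XHTML.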
There are drawbacks to both of these approaches, unfortunately.
The export feature of Typepad is reached by logging in to the
admin tool and manually navigating to the export link. Maybe I
can write a Python script that handles the admin login and then
requests the export URL for each blog, capturing the output for each
to a file.
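Something along these lines might work, using only the standard
library. Everything specific here is a placeholder: LOGIN_URL,
EXPORT_URL, the form field names and the blog ids are hypothetical, and
the real admin tool may well need different fields or extra steps.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical URLs; the real admin tool's addresses are not known here.
LOGIN_URL = "https://www.example.com/admin/login"
EXPORT_URL = "https://www.example.com/admin/export?blog_id={blog_id}"

def build_login_request(username, password):
    """Build the POST request for the login form (field names assumed)."""
    form = urllib.parse.urlencode({"username": username,
                                   "password": password})
    return urllib.request.Request(LOGIN_URL, data=form.encode("utf-8"))

def export_blogs(username, password, blog_ids):
    """Log in once, then save each blog's export to export_<id>.txt."""
    # The cookie jar carries the session cookie from the login request
    # over to the export requests.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    opener.open(build_login_request(username, password)).close()
    for blog_id in blog_ids:
        with opener.open(EXPORT_URL.format(blog_id=blog_id)) as response:
            with open(f"export_{blog_id}.txt", "wb") as out:
                out.write(response.read())

# Example call (not run here, since the URLs above are placeholders):
# export_blogs("me", "secret", ["1234", "5678"])
```

The point of the cookie-aware opener is that the login only has to
happen once per run, no matter how many blogs get exported.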
Parsing the html for my blog to find image thumbnail links is
doable, but not very pretty. If typepad changes the html structure, my
parsing program must change.
In both cases, it would be nice to directly query Typepad for the
posts for each blog, and the current list of photo albums. Typepad
provides some support for a publishing API called Atom, and publishes
blog entries as RSS feeds. I'm trying to determine how I can use one
or both of these to accomplish the objectives described above.
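To get a feel for the feed route, here is a small sketch that pulls
titles, dates and links out of an Atom feed with
xml.etree.ElementTree. It assumes the feed uses the Atom 1.0 namespace;
older feeds use a different namespace, so ATOM would need changing for
those. The function name list_entries and the sample feed are made up
for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace. Older Atom feeds use a different one.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def list_entries(feed_xml):
    """Return a list of {title, updated, link} dicts from an Atom feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for entry in root.findall("atom:entry", ATOM):
        link = ""
        for candidate in entry.findall("atom:link", ATOM):
            # A link without a rel attribute counts as "alternate".
            if candidate.get("rel", "alternate") == "alternate":
                link = candidate.get("href", "")
                break
        posts.append({
            "title": entry.findtext("atom:title", default="",
                                    namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="",
                                      namespaces=ATOM),
            "link": link,
        })
    return posts

# Made-up feed standing in for whatever the blog actually publishes.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>First post</title>
    <link rel="alternate"
          href="http://example.typepad.com/blog/first.html"/>
    <updated>2005-01-02T10:00:00Z</updated>
  </entry>
</feed>"""

for post in list_entries(SAMPLE_FEED):
    print(post["updated"], post["title"], post["link"])
```

A feed usually carries only the most recent entries, so this would
cover new posts rather than a full archive.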
Here are some relevant links to help me decide: