Vinay from the Internet Archive asked me, with reference to
http://meta.wikimedia.org/wiki/Help:Recent_changes :
Hi Sumana,
Is there someone I can contact regarding parsing out the URLs from the stream of recent
changes? The idea being to grab the text of the recent change and extract out anything
that looks like a URL and feed it into a queue at IA's end for archiving.
Looking at the Recent Changes feed, it looks like I'd need to parse the
'diff' page to find any new links, or in the case of 'new' pages, parse
the new page to find all external links. Is there a better way? A live feed that includes
the text that's changed for every article?
Thanks,
Vinay
Vinay, the people in #mediawiki on Freenode IRC, and possibly also on the
mediawiki-api mailing list, should be able to help you.
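
In the meantime, here is a minimal sketch of the "extract anything that looks
like a URL" step Vinay describes. It assumes you already have the old and new
text of a revision in hand; it does not fetch anything from the wiki. The
names URL_PATTERN, extract_urls, and new_urls are hypothetical, and the regex
is a rough approximation, not how MediaWiki itself recognizes external links.

```python
import re

# Assumption: a rough pattern for http(s) URLs. Real wikitext has more edge
# cases (templates, protocol-relative links, other schemes) than this covers.
URL_PATTERN = re.compile(r"https?://[^\s<>\[\]|{}\"']+")


def extract_urls(text):
    """Return the set of URL-like strings found in text."""
    # Trailing punctuation is usually sentence punctuation, not part of the
    # URL. Note this also trims a closing ")" that some URLs legitimately use.
    return {match.rstrip(".,;:)") for match in URL_PATTERN.findall(text)}


def new_urls(old_text, new_text):
    """Return URLs that appear in new_text but not in old_text, sorted."""
    return sorted(extract_urls(new_text) - extract_urls(old_text))


if __name__ == "__main__":
    old = "See [http://example.org/a Example]."
    new = "See [http://example.org/a Example] and http://example.com/b."
    for url in new_urls(old, new):
        print(url)  # candidates to hand to an archiving queue
```

For a newly created page, passing an empty string as old_text treats every
URL on the page as new.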
Thanks all.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation