Edits made after 2018-12-04 are by version 2.5:
- Changes to URLs are verified against the remote site to ensure they are working (a sketch of this kind of live check appears after the list)
- Link checks are real-time; there is no link database. However, links are re-checked over a 24-hour period before the final diff is uploaded.
- Supports many APIs, including the Internet Archive, Memento, WebCite and "Timemap" APIs at individual services (see the availability-API sketch after the list)
- Multiple HTTP status-code checks at the application (WaybackMedic) layer
- Additional time-outs and retries built into the web transfer libraries (the retry loop in the first sketch below illustrates the idea).
- Additional operating-procedure-level checks against network and other errors; the bot is semi-supervised in known trouble areas.
- Multiple redundant API checks using multiple dates to ensure a page really is unavailable
- Accepts API results but then verifies them by inspecting page headers and/or contents (see the verification sketch below)
- The bot is primarily written in Nim (which compiles to C source), with support utilities in Awk. Custom libraries were written, including a string-primitives library for regex, a wiki-template parsing library, an OAuth library (in Awk), a MediaWiki API interface library, and a soft404 detector.
- Due to the nature of the task, running the bot involves a fair amount of supervisory overhead, so it requires operator training; the steps are documented in the source package.
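The live-check, time-out, and retry bullets describe one recurring pattern: fetch the URL with a hard per-attempt time-out, retry only on network errors, and treat a definitive HTTP status as final. Below is a minimal Nim sketch of that pattern using the standard httpclient module; the isLive name and the 30-second/3-attempt figures are illustrative, not WaybackMedic's actual code.

  # Illustrative sketch only, not WaybackMedic's actual code.
  import std/[httpclient, httpcore, os]

  proc isLive(url: string; attempts = 3): bool =
    ## Fetch the URL's headers with a hard per-attempt time-out,
    ## backing off briefly between retries on network errors.
    for attempt in 1 .. attempts:
      let client = newHttpClient(timeout = 30_000)  # 30 s per attempt
      try:
        let resp = client.head(url)
        # A definitive status code needs no retry.
        return resp.code.is2xx or resp.code.is3xx
      except CatchableError:
        if attempt < attempts:
          sleep(5_000)  # wait 5 s before the next attempt
      finally:
        client.close()
    false

  echo isLive("http://example.com")

Note that a 404 returns false immediately; only transport failures (DNS errors, time-outs, resets) trigger a retry, matching the distinction the list draws between application-layer status checks and transfer-library retries.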
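The archive-API and multi-date bullets can be made concrete with the Internet Archive's public availability endpoint (https://archive.org/wayback/available), one of the APIs named above. The sketch below is a minimal illustration; closestSnapshot and the sample dates are invented for the example.

  # Illustrative sketch against the public Wayback availability API.
  import std/[httpclient, json, uri, strformat]

  proc closestSnapshot(url, timestamp: string): string =
    ## Ask the Wayback Machine for the snapshot closest to `timestamp`
    ## (1-14 digits of YYYYMMDDhhmmss); returns "" when none is reported.
    let client = newHttpClient(timeout = 30_000)
    defer: client.close()
    let api = &"https://archive.org/wayback/available?url={encodeUrl(url)}&timestamp={timestamp}"
    let body = client.getContent(api)
    result = parseJson(body){"archived_snapshots", "closest", "url"}.getStr("")

  # Query several widely spaced dates before concluding that a page
  # has no usable snapshot.
  for ts in ["2008", "2014", "2020"]:
    echo closestSnapshot("http://example.com", ts)

An empty result at every date, or disagreement between dates, is the kind of signal the redundant-check bullet refers to.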
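Finally, the verify-after-API bullet: even when an API reports a snapshot, the target may be a soft 404, a page that returns 200 but says "not found". A toy version of that verification follows; looksAlive and the phrase list are invented here, while the real soft404 detector is the custom library mentioned in the list above.

  # Toy soft-404 heuristic; the phrase list is invented for illustration.
  import std/[httpclient, httpcore, strutils]

  proc looksAlive(snapshotUrl: string): bool =
    ## Fetch the page and reject it if the status code or the body
    ## text suggests a soft 404.
    let client = newHttpClient(timeout = 30_000)
    defer: client.close()
    let resp = client.get(snapshotUrl)
    if not resp.code.is2xx:
      return false
    let page = resp.body.toLowerAscii
    for phrase in ["page not found", "404 error", "no longer available"]:
      if phrase in page:
        return false
    true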
About every 2–3 months, the bot creates a new batch of roughly 50,000 to 100,000 articles to process, which takes about 1–2 weeks to complete; it then pauses until the next batch. It typically follows behind IABot, editing the same articles IABot edited during the intervening 2–3 months. This is partly because WaybackMedic started life as a bug fixer for IABot, a task it can still perform as needed, and partly because WaybackMedic has no dead-link checker of its own: it relies on IABot to tag links as dead so it knows which ones might be saved.