AGPL-3.0 License
InternetArchiveBot (IABot)
A Wikipedia bot that fights linkrot.
What is InternetArchiveBot
IABot is a framework-independent PHP bot that authenticates via OAuth. It was written by Cyberpower678 at the request of the global communities and is designed primarily for Wikimedia Foundation wikis. It is a global bot: it relies on wiki-specific extensions of an abstract class, which lets it run on different wikis with different rules. For maximum flexibility, it supports both on-site and off-site configuration values that can be adjusted to suit the operator and/or the wiki community. Its purpose is to address many aspects of linkrot. For large sites, it can run multithreaded with a specified number of workers to finish the job faster. Each worker analyzes its own page and, when done, reports its statistics back to the master.
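A minimal sketch of the master/worker bookkeeping described above, assuming each page is an array with a `links` list and each worker's report is an array of numeric counters. `partitionPages`, `runWorker`, and `mergeStats` are hypothetical names, not IABot functions. The workers run one after another here, so this only shows how work can be dealt out and how a master can total the reports; it does not model IABot's Thread engine.

```php
<?php
// Hypothetical sketch: deal pages to workers round-robin, then merge the
// statistics each worker reports back to the master. Workers run
// sequentially here; real multithreading (IABot's Thread class) is not shown.

/** Split $pages into $workers round-robin batches (at least one batch). */
function partitionPages(array $pages, int $workers): array
{
    $workers = max(1, $workers);
    $batches = array_fill(0, $workers, []);
    foreach (array_values($pages) as $i => $page) {
        $batches[$i % $workers][] = $page;
    }
    return $batches;
}

/** Hypothetical worker: "analyzes" each page and returns its statistics. */
function runWorker(array $batch): array
{
    $stats = ['pagesAnalyzed' => 0, 'linksChecked' => 0];
    foreach ($batch as $page) {
        $stats['pagesAnalyzed']++;
        $stats['linksChecked'] += count($page['links']);
    }
    return $stats;
}

/** Master side: sum the numeric counters reported by every worker. */
function mergeStats(array $reports): array
{
    $total = [];
    foreach ($reports as $report) {
        foreach ($report as $key => $value) {
            $total[$key] = ($total[$key] ?? 0) + $value;
        }
    }
    return $total;
}

$pages = [
    ['title' => 'A', 'links' => ['http://a.example', 'http://b.example']],
    ['title' => 'B', 'links' => ['http://c.example']],
    ['title' => 'C', 'links' => []],
];
$totals = mergeStats(array_map('runWorker', partitionPages($pages, 2)));
// $totals is ['pagesAnalyzed' => 3, 'linksChecked' => 3]
```

Because every report uses the same counter keys, the master can total any number of workers without knowing how the pages were split.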
How it works
IABot has a suite of functions it can perform when it analyzes a page. Since the aim is to address link rot as thoroughly as possible, it analyzes links in many ways by:
IABot's functions are split across several classes according to their purpose. Communication-related functions and wiki configuration values are stored in the API class, database functions in the DB class, miscellaneous core functions in the static Core class, dead-link checking functions in the CheckIfDead class, the thread engine in the Thread class, and the global and wiki-specific parsing functions in the abstract Parser class. All of these except Parser run uniformly on every wiki; because Parser is abstract, each wiki needs a class that extends it. These extensions contain the functions that let the bot operate properly on a given wiki, under that wiki's rules. When the bot starts up, it attempts to load the appropriate extension of the Parser class and initializes it as its parsing class.
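The Parser arrangement can be illustrated with a short PHP sketch. It assumes, purely for illustration, that an extension is named after its wiki (enwiki → `EnwikiParser`); the methods `deadLinkTag` and `tagDeadLink` and the loader `loadParser` are hypothetical and are not IABot's real API.

```php
<?php
// Hypothetical sketch of the pattern described above: shared logic lives in
// an abstract Parser, each wiki supplies a subclass, and startup picks the
// subclass by wiki name. Class and method names are illustrative only.

abstract class Parser
{
    /** Wiki-specific: the markup this wiki uses to tag a dead link. */
    abstract public function deadLinkTag(): string;

    /** Shared: identical on every wiki, delegates the wiki-specific part. */
    public function tagDeadLink(string $citation): string
    {
        return $citation . $this->deadLinkTag();
    }
}

final class EnwikiParser extends Parser
{
    public function deadLinkTag(): string
    {
        return '{{dead link}}';
    }
}

/** Load the Parser extension for $wiki, e.g. "enwiki" -> EnwikiParser. */
function loadParser(string $wiki): Parser
{
    $class = ucfirst($wiki) . 'Parser';
    if (!class_exists($class) || !is_subclass_of($class, Parser::class)) {
        throw new RuntimeException("No Parser extension for wiki '$wiki'");
    }
    return new $class();
}

$parser = loadParser('enwiki');
echo $parser->tagDeadLink('[http://example.com Example]'), PHP_EOL;
// prints: [http://example.com Example]{{dead link}}
```

With the shared logic in the abstract class, supporting another wiki means adding one small subclass; the loader throws rather than silently falling back to a parser with the wrong rules.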
Using Docker
Using Docker is the quickest and easiest way to install InternetArchiveBot. If you expect to run the bot on many wikis, it may be better to split the install into a dedicated execution VM and a dedicated MariaDB VM.
Docker automatically provides IABot with the required PHP and MariaDB environment, but the image does not include Tor support.
The Docker image comes preloaded with Xdebug. PhpStorm is recommended for developing or debugging InternetArchiveBot, since it supports Docker, VCS management, Composer, and Xdebug.
Manual install
Installing manually offers more flexibility but is more complicated to set up. It is the recommended method when deploying to a large wikifarm. To set up IABot:
  1. Decide whether or not to run the DB on a separate host.
  2. Install PHP with the required extensions. You can run php -m to list the installed modules and php -v to check the PHP version.
  3. Optionally, install a Tor package from your host's package manager. If installed, Tor works out of the box and shouldn't require any further setup.
  4. Install your database server on the chosen host.
  5. Install a webserver on the host that will run IABot.
  6. Clone this repo. For the easiest setup, if your webserver serves content from /var/www/html, copy the contents of the repo to /var/www.
  7. If you opt not to go this route, you may symlink or move the html folder of this repo to the webserver's html folder.
  8. Create a file html/setpath.php containing <?php $path='/path/to/src/folder/';
  9. Run composer install.
  10. Copy app/src/ to app/src/
  11. Define your configuration values. If you did steps 7 and 8, you need to set $publicHTMLPath to the path of the webserver's html folder, relative to the location of the config file. Otherwise, you can leave it as is.
  12. Open your webserver in a browser to complete the bot setup.
  13. Once the bot is set up, run it with php deadlink.php
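To work out a relative value for $publicHTMLPath in step 11, a path can be computed from the config file's folder to the webserver's html folder. The PHP below is a hypothetical helper, `relativePath`, not part of IABot. It assumes both arguments are absolute POSIX directory paths without symlinks or `..` segments, and whether IABot expects the trailing slash it adds is an assumption, so check the comments in your config file.

```php
<?php
// Hypothetical helper: path from the directory holding the config file to
// the webserver's html folder, for use as $publicHTMLPath. Assumes absolute
// POSIX paths; it does not resolve symlinks or ".." segments. The trailing
// slash on the result is an assumption about what IABot expects.

function relativePath(string $fromDir, string $toDir): string
{
    // Split into non-empty path segments.
    $from = array_values(array_filter(explode('/', $fromDir), 'strlen'));
    $to   = array_values(array_filter(explode('/', $toDir), 'strlen'));

    // Count the leading segments both paths share.
    $common = 0;
    while ($common < count($from) && $common < count($to)
        && $from[$common] === $to[$common]) {
        $common++;
    }

    // Climb out of the rest of $fromDir, then descend into $toDir.
    $parts = array_merge(
        array_fill(0, count($from) - $common, '..'),
        array_slice($to, $common)
    );
    return $parts === [] ? './' : implode('/', $parts) . '/';
}
```

For example, with the config file in /opt/iabot/app/src and the webserver serving /var/www/html, `relativePath('/opt/iabot/app/src', '/var/www/html')` returns `../../../../var/www/html/`.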
First-time setup
The bot should now be ready to run 🎉
If you can't import first-time.sql into the database, or prefer to perform the setup manually, do the following:
Debug run
As of v2.0, the values on wiki pages for configuring IABot are no longer used. The bot instead is configured with the IABot Management Interface. All global keywords are still used.
If you are running InternetArchiveBot yourself, you can configure it via the on-wiki config page and by creating a new file in the same directory. If someone else runs InternetArchiveBot and you only need to configure it for a particular wiki, you can create a subpage of the bot's userpage called Dead-links.js and configure it there. The configuration values are explained below:
Magic Word Globals
These magic words can be used in the respective configuration options above.
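The list of magic words did not survive in this copy. As a rough sketch of how placeholder substitution of this kind can work, the PHP below expands {word}-style tokens in a configuration string. The brace syntax, the function `expandMagicWords`, and the words `linkcount` and `pagename` are all hypothetical, not IABot's actual names.

```php
<?php
// Hypothetical sketch: expand magic words in a configuration string.
// The {word} delimiters and the names used below are assumptions for
// illustration; the real magic words are those listed for each option.

function expandMagicWords(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $word => $value) {
        $pairs['{' . $word . '}'] = (string) $value;
    }
    // strtr replaces all keys in a single pass, so text inserted for one
    // magic word is never re-scanned for further magic words. Unknown
    // words are left untouched.
    return strtr($template, $pairs);
}

echo expandMagicWords(
    'Rescued {linkcount} links on {pagename}.',
    ['linkcount' => 3, 'pagename' => 'Example']
), PHP_EOL;
// prints: Rescued 3 links on Example.
```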