iabot.toolforge.org
AGPL-3.0 License
Contact
Email: mark@archive.org
What is InternetArchiveBot
InternetArchiveBot (IABot) is a framework-independent OAuth bot written in PHP. Cyberpower678 designed it primarily for WMF wikis at the request of the global communities. It is a global bot that uses wiki-specific functions in an abstract class to run on different wikis with different rules. For maximum flexibility, it features on-site and off-site configuration values that can be altered to suit the operator, the wiki community, or both. Its function is to address many aspects of link rot. For large sites, it can be set to run multi-threaded with a specified number of workers to get the job done faster. Each worker analyzes its own page and reports its statistics back to the master afterwards.
How it works
IABot has a suite of functions it can perform when it analyzes a page. Since the aim is to address link rot as completely as possible, it analyzes links in many ways.
IABot's functions are grouped into several classes according to what they do. Communication-related functions and wiki configuration values are stored in the API class, DB-related functions in the DB class, miscellaneous core functions in a static Core class, dead-link checking functions in a CheckIfDead class, the thread engine in the Thread class, and the global and wiki-specific parsing functions in an abstract Parser class. All but the last can run uniformly on all wikis, whereas the Parser class requires a class extension because it is abstract. Each class extension contains the functions that allow the bot to operate properly on a given wiki under that wiki's rules. When the bot starts up, it attempts to load the proper extension of the Parser class and initializes it as its parsing class.
Installation and Requirements
IABot requires the following to run:
Using Docker
Using Docker is the quickest and easiest way to install InternetArchiveBot. If you expect to run the bot on a multitude of wikis, it may be better to break up the install to a dedicated execution VM and a dedicated MariaDB VM.
Docker automatically provides IABot with the needed PHP and MariaDB environment, but does not come with Tor support. All that is needed is a composer.phar file to install the dependencies.
  1. Clone this repo to your desired directory.
  2. cd into the root folder of your clone.
  3. Run docker-compose up. It will take a few minutes for the containers to come up for the first time.
  4. Install composer and run a composer install. (Composer requires PHP to run)
  5. Navigate to the 'app/src/' folder and rename deadlink.config.docker.inc.php to deadlink.config.local.inc.php
  6. Define your configuration values, leaving the preconfigured values alone.
  7. Go to http://localhost:8080/ to complete bot setup.
  8. When the bot is set up, you can execute the bot from within docker's command line, by running php deadlink.php
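Steps 1 through 5 above can be sketched as a small shell helper. This is an illustrative sketch, not a script shipped with the repo: the clone URL is an assumption based on the repo's GitHub location, `run` and `docker_setup` are hypothetical names, `composer.phar` is assumed to have been downloaded into the repo root already, and `-d` is added only so that `docker-compose up` returns control to the script. By default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of Docker setup steps 1-5. Function names are hypothetical.
# ASSUMPTION: clone URL inferred from the repo's GitHub location.
REPO_URL="${REPO_URL:-https://github.com/internetarchive/internetarchivebot.git}"
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise run it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

docker_setup() {
    target="${1:-internetarchivebot}"
    run git clone "$REPO_URL" "$target" || return 1   # step 1
    run cd "$target" || return 1                      # step 2
    run docker-compose up -d || return 1              # step 3 (detached)
    # step 4: ASSUMPTION: composer.phar already sits in the repo root
    run php composer.phar install || return 1
    # step 5: rename the Docker config template to the local config
    run mv app/src/deadlink.config.docker.inc.php \
           app/src/deadlink.config.local.inc.php
}
```

Calling `docker_setup iabot` prints the five commands for review; `DRY_RUN=0 docker_setup iabot` would execute them. Steps 6 through 8 (editing the configuration, finishing setup in the browser, and running the bot) remain manual.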
Manual install
Manually installing offers more flexibility, but is more complicated to set up. This is the recommended method when deploying to a large wikifarm.
  1. Decide on whether or not to run the DB on a separate host.
  2. Install PHP with the required extensions. You can run php -m to check for installed modules, and php -v to check its version.
  3. You may optionally install a tor package from your host's package manager. Tor will work right out of the box, if installed, and shouldn't require any further setup.
  4. Install your DB server (MariaDB is recommended) on your desired host.
  5. Install your webserver on your host to run IABot.
  6. Clone this repo. For the easiest setup, if your webserver loads content from /var/www/html, you can copy the contents of the repo to /var/www. If you opt not to go this route, follow steps 7 and 8; otherwise skip to step 9.
  7. If you opt not to go this route, you may symlink or move the html folder of this repo to the html folder of the webserver.
  8. Create a setpath.php file in the html folder containing <?php $path='/path/to/src/folder/';
  9. cd into the root folder of your clone.
  10. Install composer and run a composer install.
  11. Navigate to the 'app/src/' folder and copy deadlink.config.inc.php to deadlink.config.local.inc.php
  12. Define your configuration values. If you did steps 7 and 8, you need to define $publicHTMLPath as the path to the html folder of the webserver, relative to the location of the config file. Otherwise, you can just leave it as is.
  13. Open a web browser to your web server to complete bot setup.
  14. When the bot is set up, you can execute the bot by running php deadlink.php
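Two of the steps above lend themselves to small shell helpers: checking `php -m` output for missing modules (step 2) and writing the `setpath.php` file (step 8). The sketch below is illustrative only. `missing_modules` and `write_setpath` are hypothetical names, and the module names passed in the usage example are placeholders, since the list of required extensions is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helpers for the manual install; not part of the repo.

# Reads `php -m` output on stdin and prints every requested module
# that is not listed. The comparison is case-insensitive.
missing_modules() {
    installed="$(tr '[:upper:]' '[:lower:]')"
    for mod in "$@"; do
        want="$(printf '%s' "$mod" | tr '[:upper:]' '[:lower:]')"
        if ! printf '%s\n' "$installed" | grep -qxF "$want"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Writes setpath.php into the webserver's html folder (step 8).
# $1 = html folder, $2 = path to the src folder.
# Note: a path containing a single quote would break the PHP string.
write_setpath() {
    html_dir="$1"
    src_dir="$2"
    # The README's example path ends with a slash, so add one if missing.
    case "$src_dir" in
        */) ;;
        *) src_dir="$src_dir/" ;;
    esac
    printf "<?php \$path='%s';\n" "$src_dir" > "$html_dir/setpath.php"
}
```

For example, `php -m | missing_modules curl mbstring` prints whichever of the two placeholder names is absent, and `write_setpath /var/www/html /path/to/src/folder` produces the one-line file described in step 8.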
Docker and xDebug
The Docker image is preloaded with xDebug. It is recommended to use PHPStorm when developing or debugging InternetArchiveBot. PHPStorm comes with Docker support, as well as VCS management, Composer support, and xDebug support.
Configuration
As of v2.0, the on-wiki pages for configuring IABot are no longer used. The bot is instead configured with the IABot Management Interface. All global keywords are still used.
If you are running InternetArchiveBot yourself, you can configure it via the on-wiki config page and by creating a new deadlink.config.local.inc.php file in the same directory. If someone else is running InternetArchiveBot and you just need to configure it for a particular wiki, you can set up a subpage of the bot's userpage called "Dead-links.js" and configure it there. For example, https://en.wikipedia.org/wiki/User:InternetArchiveBot/Dead-links.js. The configuration values are explained below:
Magic Word Globals
These magic words are available when mentioned in the respective configuration options above.