WikiSummary, the Social Science Summary Database

User:Adam/Creating MediaWiki bots in PHP

So, you're wanting to develop a bot to automate some annoying task on a MediaWiki site. You know that there are tons of user-developed bots on sites like Wikipedia, but you just can't find a good description of how to do it--at least not one for PHP coders like you.

Well, you've come to the right place. Not only will I explain how to do it, but I'll even give you a complete PHP class called BasicBot that does all the hard work for you.

Before we get started, though, you need to make sure you have done your homework. If you haven't already, please read Wikipedia's page on creating a bot. You'll find the page to be vague, especially the PHP section. But it has some important cautions and whatnot.

What You Need

My template will work only in PHP 5 unless you also use this user-contributed patch.

Permission

Please don't deploy a bot on any wiki without permission. Contact the wiki's admins to tell them about the bot you want to create and ask them to grant you an account with "bot" rights.

Snoopy

To write a bot using PHP, you'll need to be able to write PHP scripts that can receive HTTP cookies from your wiki and submit forms. This is very easy to do if you use Snoopy. Snoopy is just a class of functions written for PHP by some helpful strangers. You don't need to do anything with it other than download Snoopy and place Snoopy.class.php in the directory where you want to put the bot we're going to make.

BasicBot.php

Next, you need to download BasicBot, my PHP MediaWiki bot template. It comes as a zipped archive. Unzip it and put BasicBot.php into the same directory where you put Snoopy. The archive also contains a couple other files; we'll worry about those later. You can put them into that directory with Snoopy too, if you want.

Open up BasicBot.php and look near the top. You'll find a few settings you need to change. Put in your site's URL, a username and password that your bot should use, and so on. If you put BasicBot.php into a different directory than Snoopy.class.php, you'll need to change the require_once line right after the settings.

Now, unless you're the type that likes to look through code and figure out how it works, close BasicBot.php. You won't be editing it. Trust me. By not editing the file, you make it very easy to develop multiple bots using this one file as the backend. Each bot will need only a very small PHP file that begins by including BasicBot.php and then adds just a few lines of code. We'll walk through that in a second. But first...

Bot Security

Snoopy and BasicBot.php should be in a very secure place. BasicBot.php holds your bot's password in plain text, and anybody who can load your bot's PHP file in a browser can set the bot running. I don't care how you make it secure, but please do so. Maybe you could put it in a folder that requires HTTP authentication or something.
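If you only ever start the bot from a shell or from cron, one option is to make the script refuse web requests altogether. This is only a sketch: the helper name isCommandLine is mine, not something BasicBot provides, and it assumes you never need to trigger the bot from a browser.

```php
<?php
// Hypothetical guard, not part of BasicBot.php: returns TRUE only when
// PHP was started from the command line (a shell or cron).
function isCommandLine($sapi = PHP_SAPI) {
    return $sapi === 'cli';
}

// Put this at the very top of your bot file, before anything else runs.
if (!isCommandLine()) {
    header('HTTP/1.1 403 Forbidden');
    exit('This bot can only be run from the command line.');
}
```

If you do need to run the bot from a browser, skip this and stick with HTTP authentication on the folder instead.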

Creating Your Bot

Create a new PHP file in the same directory as Snoopy and BasicBot.php. Let's call it MyBot.php for now (but you can call it whatever you want). Place the following code in MyBot.php:

<?php
// Set these if you want to override BasicBot.php
define('USERID','1'); // get this from your preferences page
define('USERNAME','MyBot');
define('PASSWORD','password'); // plain text, no md5 or anything

require_once('BasicBot.php');

$myBot = new BasicBot();

Astute observers will notice that we've defined the login credentials twice, once in MyBot.php and once in BasicBot.php. This is optional. If you use BasicBot.php as the backend for more than one bot, then doing it this way lets individual bots have separate usernames, while BasicBot.php contains default credentials just in case you forget to define a unique username for a particular bot.
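In case you're wondering how a default and a per-bot override can coexist: PHP constants can't be redefined, so the usual idiom is to define a constant only when it doesn't exist yet. I'm assuming BasicBot.php does something along these lines; the default values below are made up for the illustration.

```php
<?php
// In the bot file: the per-bot override comes first.
define('USERNAME', 'MyBot');

// In the shared backend (assumed pattern, not copied from BasicBot.php):
// fall back to a default only if the including script has not already
// defined the constant.
if (!defined('USERNAME')) {
    define('USERNAME', 'DefaultBot'); // hypothetical default
}
if (!defined('PASSWORD')) {
    define('PASSWORD', 'default-password'); // hypothetical default
}

echo USERNAME, "\n"; // the override from the bot file wins
echo PASSWORD, "\n"; // no override was given, so the default applies
```

This is also why the define() lines have to come before the require_once line: once the backend has set a constant, it's too late to change it.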

So we've just created $myBot, an instance of class BasicBot, which is defined in BasicBot.php. Now what? Well, that depends on what you want to do. Before we go into that, this would be a good time to explain what's in BasicBot.php, wouldn't it?

Methods in BasicBot.php

BasicBot--the class defined by BasicBot.php--contains several handy methods for what you might want to do.

Logging Your Bot In

Suppose you want to log your bot in and store its cookies so that you can use them later to edit a page or something. Easy. The wikiConnect method will log you in and store the cookies, which will then be sent automatically next time you do something. Cookies are cached for one hour. wikiConnect returns TRUE if you're logged in, FALSE otherwise. Example:

$myBot->wikiConnect();
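To give you a feel for what "cached for one hour" means in practice, here is a rough sketch of a freshness check on a cookie cache file. It is my own illustration, not code lifted from BasicBot.php; the function name cookiesAreFresh and the idea of keeping the cookies in a single file are assumptions.

```php
<?php
// Hypothetical helper, not taken from BasicBot.php: decides whether a
// cached cookie file is still young enough to reuse.
function cookiesAreFresh($cacheFile, $maxAgeSeconds = 3600, $now = null) {
    if ($now === null) {
        $now = time();
    }
    if (!file_exists($cacheFile)) {
        return false; // nothing cached yet, so a fresh login is needed
    }
    return ($now - filemtime($cacheFile)) < $maxAgeSeconds;
}

// Demo with a throwaway file standing in for the cookie cache.
$cacheFile = tempnam(sys_get_temp_dir(), 'botcookies');
var_dump(cookiesAreFresh($cacheFile));                      // file was just created
var_dump(cookiesAreFresh($cacheFile, 3600, time() + 7200)); // pretend two hours passed
unlink($cacheFile);
```

When a check like this fails, the bot logs in again and caches a fresh set of cookies.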

Editing a Page Automatically

Editing a page with your bot is simple. You'll use the wikiFilter method (which calls the wikiConnect method for you, so you don't need to worry about that). wikiFilter will grab the current contents of a page, pass the content through a callback function that you specify, and then edit the page using the callback's results.

For example, suppose you wanted to slap a {{Needs_Wikification}} template on an article if it doesn't have basic formatting in it. Suppose the article in question was called Project:Sandbox. You would define a callback function (we'll call it "checkWikify"), then use wikiFilter like this:

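function checkWikify($content){
   // Template to add when the page lacks basic formatting
   $args = array( 'template'=>'{{Template:Needs_Wikification}}' );
   if (isWikified($content))
     return $content; // already formatted: hand the page back unchanged
   else
     return addTemplate( $content, $args );
}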


$wikifyBot = new BasicBot();
// Our settings
$source = 'Project:Sandbox';
$callback = 'checkWikify';
$editSummary = 'Add notice: Needs wikification';

// The action
$wikifyBot->wikiFilter($source,$callback,$editSummary);

Note that the callback makes use of addTemplate(), one of the convenience functions defined at the end of BasicBot.php.
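If you'd like to test a callback before pointing it at a live wiki, you can run it against plain strings. The sketch below does that with stand-in helpers so it runs on its own. The sketch* functions are hypothetical; they are not the helpers that ship in BasicBot.php, and the real ones may behave differently.

```php
<?php
// Hypothetical stand-ins so the sketch runs without BasicBot.php.
// They use different names to avoid clashing with the real helpers.
function sketchIsWikified($content) {
    // Treat any internal link or heading markup as "basic formatting".
    return preg_match('/\[\[.+?\]\]|^==.+==\s*$/m', $content) === 1;
}

function sketchAddTemplate($content, $args) {
    // Prepend the template unless the page already carries it.
    if (strpos($content, $args['template']) !== false) {
        return $content;
    }
    return $args['template'] . "\n" . $content;
}

function sketchCheckWikify($content) {
    $args = array('template' => '{{Needs_Wikification}}');
    return sketchIsWikified($content)
        ? $content
        : sketchAddTemplate($content, $args);
}

// The callback is passed around by name, as a string, and invoked like this:
$callback = 'sketchCheckWikify';
echo call_user_func($callback, "Plain text with no markup."), "\n";
echo call_user_func($callback, "See [[Main Page]] for details."), "\n";
```

A callback takes the page text and returns the page text, nothing more. Because the stand-in for adding the template checks whether it's already there, running the callback twice on the same page doesn't stack up duplicate notices.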

Easy enough so far, right? But who needs a bot if you're just going to edit a single page with it? Well, read on.

Editing Several Articles

If you want to edit several articles, you need to tell BasicBot which articles you want edited. There are a few ways to do this.

  • Harvest all the internal links from a regular page
  • Harvest all the links from a special page
  • Provide an array of links to loop through

BasicBot can do all of these, and a bit more. Look in BasicBot.php, and you'll see methods called wikiFilterAll (for harvesting from a regular page), SpecialFilterAll (for harvesting from a special page), CategoryFilterAll (for harvesting all links in a category), FilterRecentChanges (for automating recent changes patrols), and ArrayFilterAll (for looping through an array of links that you provide). As an example, suppose you wanted to harvest links from Special:Deadendlinks. Articles without links in them might lack any wikification at all, right? So let's harvest all the dead end links, then run them through our checkWikify callback. There would be only a couple of changes from the preceding block of code:

$wikifyBot = new BasicBot();
$source = 'Special:Deadendlinks';
$callback = 'checkWikify';
$editSummary = 'Add notice: Needs wikification';
$wikifyBot->SpecialFilterAll($source,$callback,$editSummary);

These "FilterAll" methods are easy to use. They do not require cron, though you could certainly use it. They'll just move through all the articles linked from $source and run each one through your callback, waiting a few seconds between requests to the wiki. You can set the delay at the top of BasicBot.php, or you can just add a fifth argument to the method call. This example changes the delay to 30 seconds between edits (note the empty fourth argument):

$wikifyBot->SpecialFilterAll($source,$callback,$editSummary,"","30");

You might be wondering what the fourth argument is for. Good question. There are examples at the bottom of BasicBot.php that explain all that.
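If it helps to picture what a "FilterAll" run does, here is a stripped-down sketch of the loop: take each page, run it through the callback, and pause before the next one. It is my own illustration rather than BasicBot's code. The names sketchFilterAll and sketchTagIfPlain are hypothetical, the wiki is replaced by an in-memory array, and skipping pages the callback left unchanged is a choice made for this sketch.

```php
<?php
// Hypothetical callback: tag pages that contain no internal links.
function sketchTagIfPlain($content) {
    return strpos($content, '[[') === false
        ? "{{Needs_Wikification}}\n" . $content
        : $content;
}

// Hypothetical "FilterAll"-style loop. $pages maps title => wikitext and
// stands in for the live wiki. Returns the pages that would be edited.
function sketchFilterAll(array $pages, $callback, $delaySeconds = 5) {
    $edited = array();
    foreach ($pages as $title => $content) {
        $newContent = call_user_func($callback, $content);
        if ($newContent !== $content) {
            // A real bot would submit the edit to the wiki here.
            $edited[$title] = $newContent;
        }
        sleep((int) $delaySeconds); // be polite: pause between requests
    }
    return $edited;
}

$pages = array(
    'Page A' => 'no markup here',
    'Page B' => 'already has a [[link]]',
);
// A delay of 0 keeps the demo instant; use a real delay on a real wiki.
$result = sketchFilterAll($pages, 'sketchTagIfPlain', 0);
print_r(array_keys($result));
```

Only Page A ends up in the result, since Page B already contains a link and the callback hands it back untouched.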

Your Very Own Bot

I've almost made it too easy, haven't I? In fact, bundled with BasicBot.php you'll find WikifyBot.php, which is basically the bot we've just constructed with some additional commenting. Now, all you need to do is define whatever callbacks you want, and you're set. For additional examples and tips, check out the information at the end of BasicBot.php. Take a look through the methods in the BasicBot class, too; you should find much of the work already done for you.

So if you haven't already, download BasicBot.zip and you're in business.

Enjoy.