User:BrownHairedGirl/Election links cleanup

This page describes an ongoing series of edits made by User:BrownHairedGirl, as a cleanup exercise. This follows on from an RFC in late 2018 which changed the WP:NC-GAL convention for election and referendum names from "Foo election, YYYY" to "YYYY Foo election".

This brings the election/referendum naming format in line with the convention for other topics: WP:NCEVENTS. I weakly opposed the change (largely because of the disruption it would cause), but I accept the clear consensus to proceed with the renaming. These edits help to implement that consensus.

This is a one-off change, which will:

  • make it easier for editors to maintain the links in future
  • in many cases, make wikicode more readable
  • assist other tasks which process these pages

After a lot of experimenting, I found a way of doing this which allows nearly all the "Foo election, YYYY" links on any given page to be changed in a single edit. This means much less impact on watchlists than doing each type of election as a separate series of edits.

Summary

These edits are performed using WP:AutoWikiBrowser (AWB) with a custom module. (See below: #Custom module).

They replace each wikilink of the form Foo election, YYYY or Foo election YYYY with one of the form YYYY Foo election, with some exceptions.

The edit summary displays the links changed, insofar as AWB's short limit on edit summaries allows (see https://phabricator.wikimedia.org/T199347). This allows tracking and fixing of the very small minority of cases where a bluelink is replaced by a redlink.
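As a rough illustration only (this is not the module's code), the summary-building step could be sketched in Python as below. The function name `build_summary`, the prefix text, the "old → new" layout and the default `max_len` are all assumptions made for the example; in particular, the default is a placeholder and not AWB's actual limit.

```python
def build_summary(changes, max_len=255, prefix="Election links cleanup: "):
    """Build an edit summary listing changed links, within a length limit.

    changes: list of (old_title, new_title) pairs.
    max_len: hypothetical character limit; the real AWB limit is not stated here.
    """
    summary = prefix
    shown = 0
    for old, new in changes:
        # Each change is shown as real wikilinks so redlinks are visible in contribs.
        item = f"[[{old}]] → [[{new}]]"
        sep = "; " if shown else ""
        if len(summary) + len(sep) + len(item) > max_len:
            break  # stop at the first change that no longer fits
        summary += sep + item
        shown += 1
    return summary
```

Changes which do not fit are simply left out of the summary in this sketch, so a page with many changed links would show only the first few.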

Purposes

This run of edits has three primary purposes:

  1. To fix the use in running text of [[Foo election, YYYY]]. It is much more readable to have [[YYYY Foo election]].
  2. To fix the now-pointless redirects of [[Foo election, YYYY|YYYY Foo election]]. The wikicode is much more readable as [[YYYY Foo election]].
  3. To fix the broken links caused by changes in naming format. This is complex, but surprisingly widespread, so I'll try to explain it without too much verbosity by giving two examples of the permutations I have encountered which raise issues requiring standardisation:
    By-elections
    General elections usually involve many links to a single title. In the case of Ireland, there have been 32 general elections to Dáil Éireann since 1918, but 131 by-elections to the Dáil. In the UK, there have been 56 general elections since the UK was established in 1801, but 4,167 by-elections.
    It's relatively easy to use redirects to cover most permutations of general election title: a dozen redirects in each case covers over 99%.
    However, doing that with a large set of target articles gets very problematic. For example a biographical article may contain a long-standing link to "ThisTown by-election, 1927" ... but if the by-election article is now created, it should be at "1927 ThisTown by-election", and all the redlinks will remain red. Alternatively, an editor may encounter the redlink in the biog and mistakenly create the page at the old-style "ThisTown by-election, 1927".
    With UK by-elections, there is a further complication in that the place name may have variations: e.g. Midlothian used to be known for some purposes as Edinburghshire, and there are variants such as "Western CountyName"/"West CountyName".
    So canonicalising the year format significantly reduces the chance that a redlink will remain red after article creation, by removing the major variant in naming format.
    Re-named series
    The development of naming conventions has often led to several changes in naming practice for articles. For example:
    • Editors start creating articles on the local elections to FooBar Council, using the format "FooBar Council election, YYYY". Redlinks are created as appropriate, both from lists of elections and from other articles such as biogs, timelines etc.
    • Other editors conclude that greater specificity is needed, so they rename the articles to "FooBar Borough Council election, YYYY". Redirects are of course automatically created from the old titles .... but that leaves redlinks to the articles which did not exist.
    • Then the WP:NC-GAL renaming happens, and the articles are renamed to "YYYY FooBar Borough Council election". So now we have three naming formats to contend with, giving permutations:
      1. "YYYY FooBar Borough Council election" (the new canonical name)
      2. "FooBar Borough Council election, YYYY"
      3. "FooBar Council election, YYYY"
      4. "YYYY FooBar Council election"
    In some cases, there are even more permutations, e.g. the article currently named 1986 Southwark London Borough Council election could also be titled as "1986 Southwark Council election", "1986 Southwark Borough Council election", "1986 London Borough of Southwark Council election", etc. Allowing for the possibility of years at the end instead of the beginning doubles the number of variants, which means more redlinks; and in practice it quadruples the number of variants, because the links may be written with or without a comma, e.g. "Southwark Council election, 1986" or "Southwark Council election 1986". It's a trivial matter for AWB to pick up both variants and standardise them.

When I started on this, I was initially doing a very restricted set of use cases: e.g. only elections to the European Parliament. But the more examples I encountered, the more I realised that there was no advantage in doing only a sub-set, when each edit could resolve a much wider set of issues in one pass.

So the effect of what I am doing is to fix a set of redirects, some of which may be broken, but where identifying only the broken ones is massively more work than just standardising the lot. AWB just handles text patterns, and can't identify whether a link is red, so unless someone wants to handcode a whole bot which does squillions of system calls to identify only redlinks, this is the neatest way of doing it.

There are some changes (example) of the form [[Foo election, YYYY|alias]] to [[YYYY Foo election|alias]]. This is a mild violation of WP:NOTBROKEN but harmless, and it's quicker to action the change mechanically (albeit unnecessarily) than to spend time calculating whether it would be redundant.
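The two piped-link cases described above (a now-redundant pipe, and a pipe with a genuine alias) can be illustrated with a minimal Python sketch. It is not the AWB module itself: the pattern `LINK` and the function `rewrite_links` are hypothetical names, it handles only a bare four-digit year, and it ignores the exceptions which the module skips.

```python
import re

# Hypothetical pattern: a wikilink whose target ends in "election, YYYY" or
# "election YYYY" (comma optional), optionally followed by "|label".
LINK = re.compile(
    r"\[\[(?P<stem>[^\[\]|]*?\belection),? (?P<year>\d{4})"
    r"(?:\|(?P<label>[^\[\]]*))?\]\]"
)

def rewrite_links(wikitext: str) -> str:
    """Move the year to the front of matching link targets."""
    def fix(m: re.Match) -> str:
        new_title = f"{m['year']} {m['stem']}"
        label = m["label"]
        # No pipe, or a pipe which merely repeats the new title: plain link.
        if label is None or label == new_title:
            return f"[[{new_title}]]"
        # A genuine alias is kept, and only the target is changed.
        return f"[[{new_title}|{label}]]"
    return LINK.sub(fix, wikitext)
```

For example, `rewrite_links("[[Foo election, 2010|2010 Foo election]]")` returns `[[2010 Foo election]]`, while `rewrite_links("[[Foo election, 2010|the vote]]")` returns `[[2010 Foo election|the vote]]`. Because the pattern is purely textual, it treats bluelinks and redlinks identically.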

Custom module

Edits are done using Wikipedia:AutoWikiBrowser (AWB) with a custom module (see WP:AutoWikiBrowser/Custom Modules) which generates a custom edit summary. The code of my module is at User:BrownHairedGirl/Election links cleanup/AWB custom module.

The design goals of the module were to:

  1. On any page, replace each wikilink of the form Foo election, YYYY or Foo election YYYY with a link of the form YYYY Foo election
  2. Be entirely rules-based, knowing nothing about any election.
  3. Display each variant in the edit summary as an actual link, to allow checking for any bluelinks turned red
  4. Skip cases where moving the year to the start of the title would be wrong.
    e.g. Candidates in the Foo election, YYYY should not be changed to YYYY Candidates in the Foo election.

Most of this has been achieved.

  1. The wikilink replacement works reliably and accurately
    • It also handles dates with a month, replacing each wikilink of the form Foo election, Monthname YYYY or Foo election Monthname YYYY with a link of the form Monthname YYYY Foo election
    • It does not handle date ranges such as Foo election, June–July 1907 or Foo election, 1286–1287
  2. No special variants have been needed for any type of election, but it handles only those links which end in "election, YYYY" or "election, Monthname YYYY" (with or without the comma). It does not handle links to "Foo election, YYYY in Place". So e.g. it will ignore United States presidential election, 2012 in Texas and will not convert it to 2012 United States presidential election in Texas.
  3. The edit summary displays the links changed, insofar as AWB's short limit on edit summaries allows (see https://phabricator.wikimedia.org/T199347). This allows easy tracking and fixing of the very small minority of cases where a bluelink is replaced by a redlink. Just look at my contribs, and look for redlinks.
  4. The skip cases list has been developed by monitoring for unintended changes, and adding them to the list. The current list excludes links containing the following phrases: (Boundary|list|in the|at the|elected|returned|results?|candidates?|selection|selected|polls?|polling|opinion|debates?)
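The rules listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the AWB module (which is at the page linked above): the names `MONTHS`, `SKIP`, `TITLE` and `new_title` are hypothetical, and the case-insensitive, whole-word matching of the skip phrases is a guess about how the list is applied.

```python
import re
from typing import Optional

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Skip phrases copied from the list above. Matching them case-insensitively
# and only as whole words is an assumption made for this sketch.
SKIP = re.compile(
    r"\b(Boundary|list|in the|at the|elected|returned|results?|candidates?|"
    r"selection|selected|polls?|polling|opinion|debates?)\b",
    re.IGNORECASE,
)

# A title ending in "election, YYYY" or "election, Monthname YYYY",
# with or without the comma.
TITLE = re.compile(
    rf"(?P<stem>.*\belection),? (?P<date>(?:(?:{MONTHS}) )?\d{{4}})"
)

def new_title(title: str) -> Optional[str]:
    """Return the date-first title, or None if the title should be left alone."""
    if SKIP.search(title):
        return None  # e.g. "Candidates in the Foo election, YYYY"
    m = TITLE.fullmatch(title)
    if m is None:
        return None  # date ranges, "... YYYY in Place", already date-first, etc.
    return f"{m['date']} {m['stem']}"
```

With these definitions, `new_title("Foo election, June 1907")` returns `"June 1907 Foo election"`, whereas `new_title("Foo election, 1286–1287")` and `new_title("United States presidential election, 2012 in Texas")` both return `None`, because the whole title must end with a single date. No election-specific knowledge is used anywhere, in keeping with design goal 2.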