User:Monkbot/task 11: CS1 multiple authors/editors fixes

Task 11 trolls through Category:CS1 maint: Multiple names: authors list and Category:CS1 maint: Multiple names: editors list to replace singular author and editor parameters that hold multiple names with a parameter for each name or with the Vancouver system parameters when appropriate.

description

Module:Citation/CS1 adds pages to Category:CS1 maint: Multiple names: authors list and Category:CS1 maint: Multiple names: editors list when an author or editor parameter value has more than one separator character. Separator characters are commas and semicolons. The test isn't perfect and may 'catch' generational suffixes, html entities, etc. These false positives are relatively rare.
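
The test described above can be illustrated with a minimal C# sketch. This is not the module's own code (the module is written in Lua); the helper name LooksLikeMultipleNames is hypothetical, and the sketch assumes the rule is exactly "more than one comma or semicolon" as stated in the text.

```csharp
using System;
using System.Linq;

// Hypothetical helper (assumption): flags a parameter value that holds
// more than one separator character, where separators are ',' and ';'.
static bool LooksLikeMultipleNames(string value)
{
    int separators = value.Count(c => c == ',' || c == ';');
    return separators > 1;
}

Console.WriteLine(LooksLikeMultipleNames("Smith, John"));               // False: one separator
Console.WriteLine(LooksLikeMultipleNames("Smith, John; Jones, Mary"));  // True: three separators
Console.WriteLine(LooksLikeMultipleNames("King, Martin Luther, Jr."));  // True: false positive (generational suffix)
Console.WriteLine(LooksLikeMultipleNames("Mu&ntilde;oz, Ana"));         // True: false positive (html entity)
```

The last two calls show why generational suffixes and html entities can be 'caught' even though the parameter holds a single name.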

Multiple names in a singular parameter cause Module:Citation/CS1 to produce malformed metadata. The solution is not simply to convert |author= to |authors=, because |authors= does not contribute to the citation's metadata. There are too many possible ways to write author name lists for the module to attempt to parse a name-list into meaningful metadata.

The same diversity of author/editor name-list formats constrains what task 11 can accomplish. Task 11 seeks out a few commonly used name-list formats and attempts to rewrite them using more appropriate parameters.

supported name-list formats

There are a few name-list formats that editors commonly use. In no particular order, these are:

semicolon separated name-lists

These name-lists take the form: |author=name; name; name;.... The semicolon separator makes it relatively easy to create |author1=name |author2=name parameters from the original source.
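
A simplified C# sketch of the semicolon split follows. It is loosely modelled on the semicolon_style function in the script but omits that function's safety checks; the helper name SplitSemicolonList and the sample names are assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (assumption): turns 'name; name; name' into
// enumerated parameters. Values without a semicolon are returned unchanged.
static string SplitSemicolonList(string paramName, string value)
{
    string trimmed = value.TrimEnd(',', ';', ' ');      // drop trailing separators
    if (!trimmed.Contains(";"))
        return "|" + paramName + "=" + value;           // nothing to split

    var result = new StringBuilder();
    int index = 1;
    foreach (string name in Regex.Split(trimmed, @"\s*;\s*"))
    {
        if (name.Trim().Length == 0)
            continue;                                   // ignore empty pieces such as ';;'
        result.Append($"|{paramName}{index}={name.Trim()} ");
        index++;
    }
    return result.ToString().TrimEnd();
}

Console.WriteLine(SplitSemicolonList("author", "Smith, John; Jones, Mary; Brown, Ann;"));
// prints: |author1=Smith, John |author2=Jones, Mary |author3=Brown, Ann
```

Because the semicolon is the only separator consulted, names in 'last, first' order pass through intact.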

comma separated name-lists

There are two forms of this type:

the form |author=first last, first last, .... The comma separators make it relatively easy to create |author1=first last |author2=first last parameters from the original source.
(disabled) the form |author=last, first, last, first, .... As long as there is an even number of comma separators, creating |author1=last, first |author2=last, first from the original source is mostly straightforward. This form is not supported by the bot because it is too susceptible to misinterpretation.

Both of these formats are susceptible to editor inconsistencies – primarily switching from one name format to another within the same source parameter, for example: |author=last, first, first last, first last, .... Task 11 attempts to skip mixed format parameters.
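
The supported 'first last' form and the skipping of mixed formats can be sketched in C# as below. This is a reduced illustration of the idea behind comma_style in the script, not a copy of it; the helper name SplitCommaList is hypothetical. It uses the same space-count guard that the script uses (a name with no spaces, or with more than three, causes the whole value to be left alone).

```csharp
using System;
using System.Text;

// Hypothetical helper (assumption): splits 'first last, first last' into
// enumerated parameters, or returns the original parameter when any piece
// does not look like a 'first last' name.
static string SplitCommaList(string paramName, string value)
{
    string original = "|" + paramName + "=" + value;
    string[] names = value.TrimEnd(',', ' ').Split(',');
    if (names.Length < 2)
        return original;                                // a single name: nothing to do

    var result = new StringBuilder();
    for (int i = 0; i < names.Length; i++)
    {
        string name = names[i].Trim();
        int spaces = name.Split(' ').Length - 1;        // number of spaces in the name
        if (spaces == 0 || spaces > 3)
            return original;                            // probably 'last, first' or a mixed list
        result.Append($"|{paramName}{i + 1}={name} ");
    }
    return result.ToString().TrimEnd();
}

Console.WriteLine(SplitCommaList("author", "John Smith, Mary Jones"));
// prints: |author1=John Smith |author2=Mary Jones
Console.WriteLine(SplitCommaList("author", "Smith, John, Mary Jones"));
// prints: |author=Smith, John, Mary Jones
```

The second call is left untouched because 'Smith' contains no space, which is how a mixed-format list is recognised in this sketch.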

Vancouver style

Because the Vancouver style imposes a consistent format: |author=last I, last I, last I,... it is relatively easy to create |vauthors=last I, last I, last I,... from the original source.
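
A minimal C# sketch of the |author= to |vauthors= conversion is shown below. The name test is the IS_VNAME_1 pattern from the script; everything else (the helper name ToVauthors, splitting on commas only, the sample names) is a simplifying assumption and not the bot's exact logic.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (assumption): converts a comma-separated list of
// Vancouver-style names ('last I') into a |vauthors= parameter. If any
// name fails the check, the original |author= parameter is returned.
static string ToVauthors(string value)
{
    // same pattern as IS_VNAME_1 in the script: surname, whitespace, 1-3 initials, optional Jr
    const string vancName = @"^[\p{L}'\-\s]+?\s+[\p{Lu}\-]{1,3}(?:\s+Jr)?$";

    string original = "|author=" + value;
    string[] names = Regex.Split(value.Trim().TrimEnd(',', '.'), @"\s*,\s*");
    if (names.Length < 2)
        return original;                                // only one name: nothing to convert

    foreach (string name in names)
    {
        if (!Regex.IsMatch(name, vancName))
            return original;                            // not Vancouver style: make no changes
    }
    return "|vauthors=" + string.Join(", ", names);
}

Console.WriteLine(ToVauthors("Smith J, Jones MB, O'Neil K"));
// prints: |vauthors=Smith J, Jones MB, O'Neil K
Console.WriteLine(ToVauthors("John Smith, Mary Jones"));
// prints: |author=John Smith, Mary Jones
```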

name and name

This form is very common. Module:Citation/CS1 does not detect it, but it is equally inappropriate, so task 11 fixes it when possible.
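
As a rough C# illustration of the two-name case, the sketch below splits a value at 'and' or '&' and applies the same space-count guard that the script uses. The helper name SplitAndPair is hypothetical, and the pattern is a simplified assumption: unlike the script's pattern, it does not allow commas inside the names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (assumption): splits 'name and name' or 'name & name'
// into |author1= and |author2=, or returns the original parameter.
static string SplitAndPair(string value)
{
    string original = "|author=" + value;
    Match m = Regex.Match(value,
        @"^([\p{L}\-'\s\.]+?)\s*(?:,\s*and\b|\band\b|&)\s*([\p{L}\-'\s\.]+)$");
    if (!m.Success)
        return original;

    string first = m.Groups[1].Value.Trim();
    string second = m.Groups[2].Value.Trim();
    foreach (string name in new[] { first, second })
    {
        int spaces = name.Split(' ').Length - 1;
        if (spaces == 0 || spaces > 3)
            return original;                            // single word or implausibly long: no fix
    }
    return "|author1=" + first + " |author2=" + second;
}

Console.WriteLine(SplitAndPair("John Smith and Mary Jones"));
// prints: |author1=John Smith |author2=Mary Jones
Console.WriteLine(SplitAndPair("Bono and Cher"));
// prints: |author=Bono and Cher
```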

avoiding errors

Task 11 takes some steps to reduce improper edits but can't avoid them entirely:

sometimes editors include affiliations in author parameters. These can be interpreted as author names (garbage in, garbage out).
the word 'and' in |author=National Aeronautics and Space Administration becomes |author1=National Aeronautics |author2=Space Administration (this particular error is avoided; see below)

Task 11 avoids:

author and editor parameter values that contain digits
names with zero or more than three spaces in comma separated lists (|author=Bono, Leonard Bernstein is avoided because it looks like the name is 'Leonard Bernstein Bono')
templates that have enumerated author and editor parameters
templates that have certain words in the author parameter (journal, national, university, etc.), which may be part of a longer name that contains the important word 'and'
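
Two of these checks, digits and skip words, can be sketched in C# as follows. The helper name ShouldSkip is hypothetical, and the word list is a short excerpt chosen for illustration; the script's IS_SKIP_WORDS list is much longer. As in the script, the match is case insensitive and is a plain substring match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (assumption): decides whether an author value should
// be left alone because it contains digits or a word that suggests a
// corporate or institutional name.
static bool ShouldSkip(string authorValue)
{
    // abbreviated excerpt of the script's skip-word list
    const string skipWords = @"(?i)(?:administration|association|institute|journal|national|society|university)";

    if (Regex.IsMatch(authorValue, @"\d"))
        return true;                                    // values that contain digits are avoided
    return Regex.IsMatch(authorValue, skipWords);       // institutional names are avoided
}

Console.WriteLine(ShouldSkip("National Aeronautics and Space Administration"));  // True
Console.WriteLine(ShouldSkip("John Smith and Mary Jones"));                       // False
Console.WriteLine(ShouldSkip("Smith, J. 2nd ed."));                               // True
```

Because the match is a substring match, a personal name that happens to contain a listed word would also be skipped; erring on the side of skipping is consistent with the goal of avoiding improper edits.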

errors that are not avoided

The conversion process for Vancouver style name-lists does not ensure that these converted name-lists conform completely to the Vancouver style. That is not the purpose of task 11. When the result of an |author= → |vauthors= conversion is malformed, Module:Citation/CS1 will add the article to Category:CS1 errors: Vancouver style from which the errors can be corrected.

ancillary tasks

Task 11 does some housekeeping:

removes empty |author=, |authors=, |last=, |first=, |author-link=, |author-mask= in their singular and enumerated forms
removes empty |editor=, |editors=, |editor-last=, |editor-first=, |editor-link=, |editor-mask= in their singular and enumerated forms
removes empty |display-authors= and |display-editors= because these parameters are related to the author and editor parameters
removes empty |others= because this parameter is vaguely related to the author parameters
removes empty |coauthor= and |coauthors= because these are deprecated
removes extraneous editor annotation from editor parameter values (redundant to the static text supplied by the templates)
removes some pre and post nominals from author and editor names (Dr and PH.D., for example)
replaces some html entities in author names with their unicode equivalents because html entities end with a semicolon which can cause Module:Citation/CS1 to add the article to the category

If these are the only changes to be made to an article, the edit is abandoned.
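
The empty-parameter removal can be illustrated with a small C# sketch. The helper name RemoveEmptyParameters and the sample template are assumptions, and the sketch works on a single template string rather than on a whole article as the script does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (assumption): removes parameters whose names match
// namePattern and that have no value, i.e. the '=' is followed only by
// optional whitespace and then the next '|' or the closing '}'.
static string RemoveEmptyParameters(string template, string namePattern)
{
    string pattern = @"\|\s*(?:" + namePattern + @")\s*=\s*(?=[\|\}])";
    return Regex.Replace(template, pattern, "");
}

string before = "{{cite book |author= |last2= |title=Example |others=}}";
string after = RemoveEmptyParameters(before, @"author\d*|last\d*|others");
Console.WriteLine(after);
// prints: {{cite book |title=Example }}
```

In the script, removals of this kind do not set the changes_made flag, so an article whose only changes are housekeeping ends with Skip set to true and the edit is not saved.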

script

// this script attempts to fix multiple author / editor names in author / editor parameters.
//
// Category:CS1 maint: Multiple names: authors list (1 C, 113,709 P, 45 F on 2016-05-12)
// Category:CS1 maint: Multiple names: editors list (1 C, 8,297 P on 2016-05-12)

//
// Things to watch out for:
//	name suffixes: PhD, LL.D, D.D., KBE and other post-nominals
//		generation suffixes: Jr, II, III, ...
//	name prefixes: Dr

// 'and' and ampersand (&) separators


//---------------------------< F I L E   S C O P E   V A R I A B L E S >--------------------------------------

string IS_VNAME = @"[\p{L}'\-\s]+\s+[\p{Lu}\-]{1,3}\b";			// allow hyphens in the initials to keep otherwise correct vanc from becoming |authorn=
string IS_VNAME_1 =  @"^[\p{L}'\-\s]+?\s+[\p{Lu}\-]{1,3}(?:\s+Jr)?$";		// same, slightly more strict but allows 'Name I Jr'
string IS_COMMA_SEP = @"(?:,\s*\band\b|,\s*&|\band\b|&|,)";


//---------------------------< M A I N >----------------------------------------------------------------------


public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
	{
	Skip = true;
	Summary = "cs1|2 maint: multiple [[Category:CS1 maint: Multiple names: authors list|authors]]/[[Category:CS1 maint: Multiple names: editors list|editors]] fixes;";
	string pattern;		// local variable to hold regex pattern for reuse
	
	string IS_CS1 = @"(?:[Cc]ite[_ ]*(?=(?:(?:AV|av) [Mm]edia(?: notes)?)|article|ar[Xx]iv|blog|book|conference|dictionary|document|(?:DVD|dvd)(?: notes)?|encyclopa?edia|episode|interview|journal|letter|magazine|mailing ?list|manual|map|(?:news(?!group|paper))|paper|podcast|press ?release|report|serial|sign|speech|techreport|thesis|tweet|video|web)|[Cc]itation|[Cc]itar\s+web|Ouvrage|[Cc]ite(?=\s*\|))";

	string IS_AUTHOR = @"(?:author1?|last1?)";
	string IS_EDITOR = @"(?:editor1?|editor1?\-?last1?)";

	string IS_ENUM_AUTHOR = @"(?:author|last)[2-9]\d*";
	string IS_ENUM_EDITOR = @"(?:editor[2-9]\d*\-?last|editor\-?last[2-9]\d*|editor[2-9]\d*)";

	string IS_NAME = @"[\p{L}'\-\s,\.]+";
	string IS_NAME_NO_COMMA = @"[\p{L}'\-\s\.]+";
	string IS_NAME_COMMA = @"[\p{L}'\-\s\.,]+[\p{L}'\-\s\.]+";
	string IS_POST_NOMINAL = @"\s*,?\s*[Pp][Hh]\.?[DdMm]\.?";	// Ph.D, Ph.M
	string IS_PRE_NOMINAL = @"\s*(?:\bDr\b\.?|\bSir\b)";
	string IS_ED_ANNOTATE = @"(?:\([Ee]ditors?\)|[Ee]ditors?|\([Ee]ds?\.?\)|\b[Ee]ds?\.|\b[Ee]ds?\b)";
	
// these words and symbols, when found in |author=, cause the template to be skipped because they are commonly associated with 'and', may be
// an indication of affiliation ('Author, University of Someplace'), or just don't belong.  If not skipped, the result might be a valid
// |author= parameter broken into two or more |authorn= parameters.  This regex is case insensitive.
	string IS_SKIP_WORDS = @"(?i)(?:©|Academy|administration|Agence France[\-\s]Presse|agenc(?:y|ies)|association|Associated Press|auditor|bbc|bishop|bureau|citing|CNN|commission|committee|conservancy|consortium|correspondent|council|cyclone|dept|department|directorate|division|economist|forecasters|global|institute|journal|laboratory|Los Angeles Times|meteorolog(?:ical|y)|MLB\.com|military|museum|national|naval|news|oceanographic|office|producer|projects?|research|Reuters|service|society|special|subgroup|technologies|Telegraph|university|US Army|USA Today|weather)";
	
	bool	changes_made = false;									// set to true when multiple author fixes applied

//---------------------------< H I D E >----------------------------------------------------------------------
// HIDE TEMPLATES: find templates that are not CS1; replace the opening {{ with __0P3N__ and the closing }} with __CL0S3__

	pattern = @"\{\{(?!\s*" + IS_CS1 + @")([^\{\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}

// HIDE single curly braces { and } with __L3F7__ and __R16H7__
	pattern = @"([^\{])\{([^\{])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1__L3F7__$2");
		}

	pattern = @"([^\}])\}([^\}])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1__R16H7__$2");
		}

// HIDE complex wikilinks: [[article title|label]] to __WL1NK_O__article title__P1P3__label__WL1NK_C__
	ArticleText = Regex.Replace(ArticleText, @"\[\[([^\|\]]+)\|([^\]]+)\]\]", "__WL1NK_O__$1__P1P3__$2__WL1NK_C__");

// HIDE simple wikilinks: [[article title]] to __WL1NK_O__article title__WL1NK_C__
	ArticleText = Regex.Replace(ArticleText, @"\[\[([^\]]+)\]\]", "__WL1NK_O__$1__WL1NK_C__");


// Hide semicolons in html comments: <!--Staff writer(s); no by-line.-->
	pattern = @"(\<\!\-\-[^;\>]*);";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1__S3M1C0L0N__");
		}

// Hide Jr generational suffixes
	pattern = @",\s*([SJ])r\.";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "COMMA$1RDOT");	// no numbers or underscores so that we don't have to have special IS_NAME rules
		}
	
	pattern = @",\s*([SJ])r\b";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "COMMA$1R");		// no numbers or underscores so that we don't have to have special IS_NAME rules
		}
	
// hide cs1|2 templates that have author parameters with 'editor' annotation because |author=name (editor) is nonsensical and ambiguous
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\d*\s*=[^\|\}]*?\s*,?\s*" + IS_ED_ANNOTATE + @")";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1");
		}

// hide cs1|2 templates that have enumerated author parameters (greater than 1) with assigned values
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_ENUM_AUTHOR + @"\s*=\s*\w[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}
	
// hide cs1|2 templates that have enumerated editor parameters (greater than 1) with assigned values
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_ENUM_EDITOR + @"\s*=\s*\w[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}

// hide cs1|2 templates that have numbers in the author parameter value
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\d\|\}]*\d[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}
	
// hide cs1|2 templates that contain any of several words.  This to prevent making multiple author parameters from a name that contains 'and'
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*" + IS_SKIP_WORDS + @"[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}
	
//---------------------------< E M P T Y   P A R A M E T E R S >----------------------------------------------

// EMPTY DISPLAYAUTHORS: Remove empty |display-authors= and |displayauthors= parameters.
	ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*display\-?authors\s*=\s*([\|\}])", "$1$2");
	
// EMPTY AUTHORn: Remove empty |authorn= parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*author\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*author\d*\s*=\s*([\|\}])", "$1$2");
		}
	
// EMPTY LASTn: Remove empty |lastn= parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*last\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*last\d*\s*=\s*([\|\}])", "$1$2");
		}

// EMPTY FIRSTn: Remove empty |firstn= parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*first\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*first\d*\s*=\s*([\|\}])", "$1$2");
		}

// EMPTY DISPLAYEDITORS: Remove empty |display-editors= and |displayeditors= parameters.
	ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*display\-?editors\s*=\s*([\|\}])", "$1$2");
	
// EMPTY EDITORn: Remove empty |editorn= parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*\s*=\s*([\|\}])", "$1$2");
		}
	
// EMPTY EDITOR-LASTn: Remove empty |editor-lastn= or |editorn-last= parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*-?last\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*-?last\d*\s*=\s*([\|\}])", "$1$2");
		}

// EMPTY EDITOR-FIRSTn: Remove empty |editor-firstn= or |editorn-first=  parameters.
	while (Regex.Match (ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*-?first\d*\s*=\s*([\|\}])").Success)
		{
		ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*editor\d*-?first\d*\s*=\s*([\|\}])", "$1$2");
		}

// since we're removing empty author/editor parameters, also remove empty author/editor link and mask parameters

// EMPTY LINK and MASK: Remove empty |author-linkn=, |authorn-link=, |author-maskn=, |authorn-mask= parameters and their editor equivalents.
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*(?:author|editor)\d*-?(?:mask|link)\d*\s*=\s*([\|\}])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
		}

// EMPTY AUTHORS and EDITORS: Remove empty |authors= or |editors= parameters.
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*(?:authors|editors)\s*=\s*([\|\}])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
		}

// EMPTY OTHERS: Remove empty |others= parameters because vaguely related
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*others\s*=\s*([\|\}])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
		}

// EMPTY COAUTHOR: Remove empty |coauthor= or |coauthors=  parameters because deprecated
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*coauthors?\s*=\s*([\|\}])";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
		}

//---------------------------< H I D E >----------------------------------------------------------------------
// to be done after removing empty parameters.  If there is a |firstn= parameter in any template we now know that
// it has an assigned value.  This could bugger up the works by mixing wrong |firstn= with |authorn=

// hide cs1|2 templates that contain any |firstn= parameters
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*first\d*[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}
	
// hide cs1|2 templates that contain any |editor-firstn= parameters
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*editor\d*\-first\d*[^\}]*)\}\}";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1__CL0S3__");
		}
	

//---------------------------< M I S C   C L E A N U P >------------------------------------------------------

// replace html entities 
// &nbsp; with space
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*)\s*&nbsp;\s*([^\|\}]*)";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1 $2");
		}

// &ntilde; with ñ (n with tilde)
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*)\s*&ntilde;";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1ñ");
		}

// &ouml; with ö (o with diaeresis)
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*)\s*&ouml;";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1ö");
		}

// &uuml; with ü (u with diaeresis)
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*)\s*&uuml;";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1ü");
		}


// &#39; with apostrophe
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*)\s*&#39;\s*([^\|\}]*)";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1'$2");
		}

// remove post nominals
// PhD, Ph.D., etc
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*?)" + IS_POST_NOMINAL;	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_EDITOR + @"\s*=\s*[^\|\}]*?)" + IS_POST_NOMINAL;	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

// remove pre-nominals
// Dr, Dr., etc
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*?)" + IS_PRE_NOMINAL;	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_EDITOR + @"\s*=\s*[^\|\}]*?)" + IS_PRE_NOMINAL;	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

// remove bold wikimarkup
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[^\|\}]*?)'''";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_EDITOR + @"\s*=\s*[^\|\}]*?)'''";	
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}


// remove trailing 'ed', 'ed.', '(ed)', '(ed.)', etc text from editorn parameters (redundant to static text provided by the templates)
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_EDITOR + @"\d*\s*=[^\|\}]*?)\s*,?\s*" + IS_ED_ANNOTATE;
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
		}

// hide author parameters that have parenthetical annotation which may indicate that the parameter value is misused
// this done after removal of editor annotation from editor parameters because that annotation might be parenthetical
	pattern = @"\{\{(\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\d*\s*=[^\|\}\(]*\([^\|\}\(]*\))";
	while (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N__$1");
		}


//---------------------------< V A N C   S T Y L E >----------------------------------------------------------
//
// There is a weakness here.  The definition of IS_VNAME accepts all letters, even those that are not Latin letters.
// Generally, there are very few occurrences of this kind of name, far fewer than hyphenated or with generational
// suffixes.  No need to worry about it.
//

// authors
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*(" + IS_VNAME + @"[,;][^\|\}]*)";	// captures are prefix and author param value
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = vancouver_style (@"|vauthors=", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});

// editors
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_EDITOR + @"\s*=\s*(" + IS_VNAME + @",[^\|\}]*)";	// captures are prefix and editor param value
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = vancouver_style (@"|veditors=", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed);
			if (true == changed)
				changes_made = true;
			return ret_val;
			});


//---------------------------< T W O   A N D - S E P A R A T E D   V A N C   S T Y L E   N A M E S >----------
// non-standard (shouldn't have 'and' for proper vancouver style)

	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*(" + IS_VNAME + @")\s+(?:\band\b|&)\s*(" + IS_VNAME + @")\s*\.?\s*([\|\}]*)";
	if (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1|vauthors=$2, $3$4");
		changes_made = true;
		}


	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_EDITOR + @"\s*=\s*(" + IS_VNAME + @")\s+(?:\band\b|&)\s*(" + IS_VNAME + @")\s*\.?\s*([\|\}]*)";
	if (Regex.Match (ArticleText, pattern).Success)
		{
		ArticleText = Regex.Replace(ArticleText, pattern, "$1|veditors=$2, $3$4");	// editor names go to |veditors=, not |vauthors=
		changes_made = true;
		}



//---------------------------< C O M M A - S E P A R A T E D   N A M E S >------------------------------------
//
// Name-lists fixed here are 'first last' order.  Commas are not allowed in the names.
//

// authors – first-last order; no commas in names
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*(" + IS_NAME_NO_COMMA + IS_COMMA_SEP + @"[^\|\}]*)";
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = comma_style (@"|author", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});


// editors – first-last order; no commas in names
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_EDITOR + @"\s*=\s*(" + IS_NAME_NO_COMMA + IS_COMMA_SEP + @"[^\|\}]*)";
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = comma_style (@"|editor", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});


//---------------------------< T W O   A N D - S E P A R A T E D   N A M E S >--------------------------------
// two names separated by 'and' with optional punctuation

// replace ', and' and ', &' with ' and '
//	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_AUTHOR + @"\s*=\s*[\p{L}\-'\s,\.]+?)(?:,\s*\band\b|\bAND\b|,\s*&)\s*([\p{L}\-'\s,\.]+)";
//	if (Regex.Match (ArticleText, pattern).Success)
//		{
//		ArticleText = Regex.Replace(ArticleText, pattern, "$1 and $2");
//		}

	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*([\p{L}\-'\s,\.]+?)(?:[;,]\s*\band\b|;\s*&|\band\b|&)\s*([\p{L}\-'\s,\.]+)";
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			string raw_capture = match.Groups[0].Value;						// the captured citation
			string raw_prefix = match.Groups[1].Value;						// citation template up to the start of author param
			string first_author_name = match.Groups[2].Value;				// author parameter value
			string second_author_name = match.Groups[3].Value;				// author parameter value

			int	count = first_author_name.Split(',').Length - 1;			// count the number of commas in author parameter before the 'and'
			if (1 < count)													// if there is more than one
				return raw_capture;											// no fix
				
			count = first_author_name.Trim().Split(' ').Length - 1;			// count the number of spaces in the first name
			if (0 == count || 3 < count)									// if there are none or more than three
				return raw_capture;											// no fix

			count = second_author_name.Trim().Split(' ').Length - 1;		// count the number of spaces in the second name
			if (0 == count || 3 < count)									// if there are none or more than three
				return raw_capture;											// no fix

			changes_made = true;
			return raw_prefix + @"|author1=" + first_author_name + @" |author2=" + second_author_name;
			});
			

//---------------------------< S E M I C O L O N   S E P A R A T E D   N A M E S >----------------------------
// authors
//	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*([^\|\}]*)";	// captures are prefix and author param value
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*(" + IS_NAME + @";[^\|\}]*)";	// captures are prefix and author param value
	
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = semicolon_style (@"|author", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});

// editors
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_EDITOR + @"\s*=\s*([^\|\}]*)";	// captures are prefix and editor param value

	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = semicolon_style (@"|editor", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});


// cleanup: remove trailing 'ed', 'ed.', '(ed)', '(ed.)', etc text from editorn parameters
//	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*\|\s*" + IS_EDITOR + @"\d*\s*=[^\|\}]*?)\s*,?\s*(?:\([Ee]ditors?\)|\([Ee]ds?\.?\)|\b[Ee]ds?\.|\b[Ee]ds?\b)";
//	while (Regex.Match (ArticleText, pattern).Success)
//		{
//		ArticleText = Regex.Replace(ArticleText, pattern, "$1");
//		}


//---------------------------< L F   C O M M A   S E P   N A M E S >------------------------------------------
//
// must do after semicolon style because semicolon style may have names in last-first order
//

// authors – last-first order

//DISABLED – for the bot version because it is too susceptible to misinterpreting the |author= parameter value

/*
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_AUTHOR + @"\s*=\s*(" + IS_NAME_COMMA + IS_COMMA_SEP + @"[^\|\}]*)";
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = lf_comma_style (@"|author", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});

// editors – last-first order
	pattern = @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*" + IS_EDITOR + @"\s*=\s*(" + IS_NAME_COMMA + IS_COMMA_SEP + @"[^\|\}]*)";
	ArticleText = Regex.Replace(ArticleText, pattern,
		delegate(Match match)
			{
			bool changed = false;
			string ret_val = lf_comma_style (@"|editor", match.Groups[0].Value, match.Groups[1].Value, match.Groups[2].Value, out changed); //313
			if (true == changed)
				changes_made = true;
			return ret_val;
			});
*/



//---------------------------< U N H I D E >------------------------------------------------------------------

// UNHIDE: replace COMMAJRDOT with , Jr. (same with COMMASRDOT)
	ArticleText = Regex.Replace(ArticleText, @"COMMA([SJ])RDOT", ", $1r.");

// UNHIDE: replace COMMAJR with , Jr (same with COMMASR)
	ArticleText = Regex.Replace(ArticleText, @"COMMA([SJ])R", ", $1r");

// UNHIDE: replace __S3M1C0L0N__ with ;
	ArticleText = Regex.Replace(ArticleText, @"__S3M1C0L0N__", ";");

// UNHIDE: replace __WL1NK_O__ with [[
	ArticleText = Regex.Replace(ArticleText, @"__WL1NK_O__", "[[");

// UNHIDE: replace __WL1NK_C__ with ]]
	ArticleText = Regex.Replace(ArticleText, @"__WL1NK_C__", "]]");

// UNHIDE: replace __P1P3__ with |
	ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|");

// UNHIDE: replace __L3F7__ with {
	ArticleText = Regex.Replace(ArticleText, @"__L3F7__", "{");

// UNHIDE: replace __R16H7__ with }
	ArticleText = Regex.Replace(ArticleText, @"__R16H7__", "}");

// UNHIDE: replace __0P3N__ with {{
	ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{");

// UNHIDE: replace __CL0S3__ with }}
	ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}");

	Skip = !changes_made;
	return ArticleText;
	}


//===========================< V A N C O U V E R _ S T Y L E >================================================
private string vancouver_style (string param_name, string raw_capture, string raw_prefix, string param_value, out bool changed)
	{
	changed = false;
	string	name_list = "";											// reconstituted author/editor name-list
	string	split_pattern = @"(?:\s*[;,]\s*(?:and|&)?\s*|\s+and\s+)";				// split on commas with or without surrounding spaces

	param_value = param_value.Trim().TrimEnd(',', '.', '\'');		// remove whitespace, commas, periods, apostrophes from end

	if (!Regex.Match (param_value, split_pattern).Success)			// if split_pattern not in parameter value (no commas) ...
		return raw_capture;											// we're done
	
	if (Regex.Match (param_value, @"[\(\[\]\)]").Success)			// if param_value has parentheses, brackets ...
		return raw_capture;											// we're done
	
	string[] substrings = Regex.Split(param_value, split_pattern);	// split author/editor parameter value into individual names
	foreach (string name in substrings)								// for each author/editor name
		{
		if (!Regex.Match (name.Trim(), IS_VNAME_1).Success)			// if an author/editor name does not have the proper format
			return raw_capture;										// make no changes

		name_list = name_list + @", " + name.Trim();				// remake the list to remove extra spaces and 'and' and '&' when they occur
		}

	name_list = param_name + name_list.Trim(',', '.', ' ');			// add parameter name and remove commas and whitespace from end
	changed = true;
	return raw_prefix + name_list + ' ';							// concatenate with the raw_prefix and done
	}
	

//===========================< S E M I C O L O N _ S T Y L E >================================================
private string semicolon_style (string param_name, string raw_capture, string raw_prefix, string param_value, out bool changed)
	{
	changed = false;
	string	name_list = "";											// reconstituted author list |author1=... |author2=...
	string	split_pattern = @"\s*(?:;?\s*\band\b|\bAND\b|;\s*&|&|;\s*)\s*";	// split on semicolons with or without surrounding spaces
	int		i = 1;													// indexer
	int		count;													// used to count number of spaces in name

	param_value = param_value.TrimEnd(',', ';', ' ');				// remove commas, semicolons, and whitespace from end

	if (!Regex.Match (param_value, @";").Success)					// if split_pattern not in author parameter value (no semicolons) ...
		return raw_capture;											// we're done
		
	string[] substrings = Regex.Split(param_value, split_pattern);	// split author parameter value into individual names
	foreach (string name in substrings)								// for each author name
		{
		count = name.Trim().Split(' ').Length - 1;					// count the number of spaces in the name
		if (3 < count)												// if there are more than three (catches vanc style that has a semicolon)
			return raw_capture;										// no fix

		if (Regex.Match (name.Trim(), @"^(?:(?:\p{Lu}\b\.?\s*){2,4}|\p{Lu}\.?)$").Success)		// attempt to identify just initials where author is like 'dos Santos, B. A.'
			return raw_capture;

		name_list = name_list + param_name + i.ToString() + @"=" + name.Trim() + @" ";	// make an individual author parameter
		i++;														// bump the indexer
		}

	changed = true;
	return raw_prefix + name_list;									// concatenate with the raw_prefix and done
	}


//===========================< C O M M A _ S T Y L E >========================================================
private string comma_style (string param_name, string raw_capture, string raw_prefix, string param_value, out bool changed)
	{
	changed = false;
	string name_list = "";
	int		i = 1;													// indexer
	int		count;													// used to count number of spaces in name
	
	if (Regex.Match (param_value, @";").Success)					// if there are semicolons in the parameter value
		return raw_capture;											// we're done
		
	param_value = param_value.TrimEnd(',', ' ');					// remove commas and whitespace from end
	string[] substrings = Regex.Split(param_value, IS_COMMA_SEP);
	
	foreach (string name in substrings)								// for each author/editor name
		{
		count = name.Trim().Split(' ').Length - 1;					// count the number of spaces in the name
		if (0 == count || 3 < count)								// if there are none or more than three
			return raw_capture;										// no fix
		
		if (Regex.Match (name.Trim(), @"^(?:(?:\p{Lu}\b\.?\s*){2,4}|\p{Lu}\.?)$").Success)		// attempt to identify just initials where author is like 'dos Santos, B. A.'
			return raw_capture;

		name_list = name_list + param_name + i.ToString() + @"=" + name.Trim() + @" ";	// make an individual author parameter
		i++;														// bump the indexer
		}
	
	changed = true;
	return raw_prefix + name_list;									// concatenate with the raw_prefix and done
	}


//===========================< L F _ C O M M A _ S T Y L E >==================================================
//
// last-first with comma separators
//

private string lf_comma_style (string param_name, string raw_capture, string raw_prefix, string param_value, out bool changed)
	{
	changed = false;
	string name_list = "";
	string name;
	int		i = 1, index;											// indexer
	int		count;													// used to count number of spaces in name
	
	param_value = param_value.TrimEnd(',', ' ');					// remove commas and whitespace from end
	string[] substrings = Regex.Split(param_value, IS_COMMA_SEP);
	count = substrings.Length;										// get the number of substrings
	
	if (2 == count)													// only one author; nothing to do
		return raw_capture;
	
	if (1 == count % 2)												// this test does not work for |author=last, first, first last & first last
		return raw_capture;
	
	for (i=0, index=1; i < count; i++, index++)						// for each author/editor name
		{
		name = substrings[i++].Trim() + @", " + substrings[i].Trim();
		
		name_list = name_list + param_name + index.ToString() + @"=" + name + @" ";	// make an individual author parameter
		}
	
	changed = true;
	return raw_prefix + name_list;									// concatenate with the raw_prefix and done
	}