News/Actor storage changes on the Wiki Replicas
< News
Tracked in Phabricator
Task T223406
The Actor table was introduced to MediaWiki by T167246 and related work on the Actor migration project. This page is about related changes that have been made to the Wiki Replica views to be compatible with the MediaWiki operational database schema. These changes to the Wiki Replicas were first announced on 2019-05-17 via the cloud-announce mailing list and then made live on 2019-06-03.
What changed?
*_user and *_user_text fields have been removed from the Wiki Replica views. This change was made to keep the replicas ahead of changes being made to the production databases in the near future.
An example error message that could be caused by this change is:
ERROR 1054 (42S22): Unknown column 'ar_user' in 'field list'
The fields that were dropped and their replacements (see T223406 for more details):
TableColumns removedColumns added
archivear_user, ar_user_textar_actor
filearchivefa_user, fa_user_textfa_actor
imageimg_user, img_user_textimg_actor
ipblocksipb_by, ipb_by_textipb_by_actor
logginglog_user, log_user_textlog_actor
oldimageoi_user, oi_user_textoi_actor
recentchangesrc_user, rc_user_textrc_actor
revisionrev_user, rev_user_textrev_actor
The same changes were made to the associated _userindex views such as revision_userindex.
Tables with names ending in _compat will be effectively unchanged, but the hidden joins that preserve the legacy *_user and *_user_text fields will make queries involving those tables slower than direct use of the actor table.
Changes to the revision_userindex view were made to optimize usage of the rev_actor field in T221339. These changes have the side effect of reducing the speed of queries using the revision_user and revision_user_text fields.
What should I do?
The best solution
Every table that had user and user_text type fields has a similar actor field. That field is the id to find in the actor table.
Example: ar_user and ar_user_text from the archive table would be replaced with a join on the actor table using the ar_actor field as actor_id or a separate query, which can be faster in some cases, to obtain the needed actor_name associated with a given archive action.
The list of actor fields follows:
TableActor field
archivear_actor
filearchivefa_actor
imageimg_actor
ipblocksipb_by_actor
logginglog_actor
oldimageoi_actor
recentchangesrc_actor
revisionrev_actor
What about a quick solution if the refactor is too time consuming at the moment
Tables affected by the actor change also have a _compat view (for example archive_compat) which have the actor table joins already present in the view's code and will present just like the affected table did before the changes took place in the MediaWiki codebase. The _compat views will suffer a performance penalty because of those same joins. In some cases, where performance isn't the most important factor, this may be the quickest way to fix a tool or application. Use of the _compat views should not be considered a permanent solution for a long lived tool. These views will very likely be removed in the future.
The actor table seems really slow--so does comment
Tracked in Phabricator
Task T215445
Tracked in Phabricator
Task T224850
The views for actor and comment are burdened by subqueries against 8 other tables in order to ensure that only the correct rows are available. This is being worked on here T215445, but since that's work toward a long-term solution, WMCS has put up a set of sub-views of these tables to make things faster for individual queries (phab:T224850). Each subview only makes a subquery against a single table. This way, if you are looking for an actor row from the logging table, then you can query against the actor_logging view. This view will be more efficient for that case. The actor_logging view will, naturally, not have any rows available that might be referenced instead in the revision table or the archive table.
Example query using the new views (in this case actor_logging), provided by MusikAnimal (talkcontribs):
SELECT * FROM logging_userindex JOIN actor_logging ON actor_id = log_actor WHERE actor_name = 'MusikAnimal'
The specialized views that can be used in this case are:
  • actor_filearchive
  • actor_image
  • actor_ipblocks
  • actor_logging
  • actor_oldimage
  • actor_protected_titles
  • actor_recentchanges
  • actor_revision
  • comment_filearchive
  • comment_image
  • comment_ipblocks
  • comment_logging
  • comment_oldimage
  • comment_protected_titles
  • comment_recentchanges
  • comment_revision
Advanced use cases of specialized views
There are other more clever ways to exploit the fact that these views include a single subquery against the external table in the second part of the name to remove joins or where clauses from your own queries, but we suggest experimenting to ensure you are getting the same results if you try this. An example of this is, if you were just trying to see if user "Foo" had ever made edits to a wiki, you could reasonably get that from SELECT 1 FROM actor_revision WHERE actor_name = 'Foo' because any rows with that actor name would be rows that are visible in the revision table. This would likely be faster than a join query for that particular piece of information.
See also
mw:Actor migration
Last edited on 10 July 2019, at 04:48
Wikitech
Content is available under CC BY-SA 3.0 unless otherwise noted.
Privacy policy
Terms of Use
Desktop
 Home Random Log in  Settings  Donate  About Wikitech  Disclaimers
WatchEdit