[L] Create edit tags to measure multimedia edits to Wikipedia articles
Closed, Resolved | Public

Description

As per T265771, we want to measure the vectors for how multimedia content gets added to Wikipedia articles - via Visual Editor, direct Wikitext editing, or bots. This will help us understand where the most impactful places will be to target our work on media matching, and to be able to measure the success of that work.

This ticket is to create an edit tag for media additions so that we can do those measurements. (Information about whether the additions come from VE, Wikitext, or bots will come from other avenues - see Morten's comment in T266067#6634887).

Acceptance criteria:

  • An edit tag is applied to every edit that adds or removes any type of media in an article, specifically:

Changes in media | Edit tag
User adds one or more images | mw-add-media
User deletes one or more images | mw-remove-media
User adds two images, deletes one image | mw-add-media + mw-remove-media
User updates 1 image to a different one | mw-add-media + mw-remove-media

  • The tag description should be "Edits that add media"/"Edits that remove media"
  • The tag should be a hidden tag
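
The acceptance criteria above can be read as a small decision rule. The following is only a sketch of that rule, not the code from the merged patch: the function name `media_edit_tags` and the idea of comparing two sets of file names are assumptions made for illustration.

```python
from typing import Iterable, Set

ADD_TAG = "mw-add-media"
REMOVE_TAG = "mw-remove-media"


def media_edit_tags(before: Iterable[str], after: Iterable[str]) -> Set[str]:
    """Return the edit tags for an edit, given the media used before and after it.

    `before` and `after` are hypothetical inputs: the file names used by the
    article in the parent revision and in the new revision.
    """
    old, new = set(before), set(after)
    tags: Set[str] = set()
    if new - old:  # at least one file appears that was not there before
        tags.add(ADD_TAG)
    if old - new:  # at least one file that was there is now gone
        tags.add(REMOVE_TAG)
    return tags
```

Under this reading, replacing one image with another yields both tags, because one file name disappears and another appears. An edit that leaves the set of files unchanged gets neither tag.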

Event Timeline

MarkTraceur renamed this task from Create edit tags to differentiate multimedia edits to Wikipedia articles to [L] Create edit tags to differentiate multimedia edits to Wikipedia articles. Nov 18 2020, 5:50 PM

Since the description mentions "those measurements": what, specifically, are we trying to measure?

I have 2 concerns:

  1. Are edit tags a good enough proxy for the information that we desire?
    1. People might upload multiple images in 1 edit. Does that matter?
    2. This will not keep track of image additions that get reverted. Does that matter? If so, do we have a method of accounting for those?
    3. There is a significant chance that we might not reach every single bot owner to request that they add the tag, which could have a massive impact on the numbers. Should we figure out a way to programmatically, always, add a tag for bot edits?
  2. Do the edit tag "categories" give us sufficient detail?
    1. Does 2017_wikitext_editor belong under wikitext or VE? I.e. are we tracking usage of the image selection tool, or the edit process?
    2. Many cases will likely go untracked or get lumped into 'wikitext' or 'bot'. E.g. mobile edit, contenttranslation possibly...
    3. Could there be other use cases in the future that we might want to track? E.g. an image recommendation tool.
    4. If we fail to implement an edit tag for one of these categories, is there still value in having the others? Or are they useful for more than internal comparison?

I simply worry that we'll spend a lot of time implementing this (thing that has many scenarios & edge cases) and that we'll end up finding it doesn't exactly give us the kind of data that answers the questions that we have.

Another idea/question (for @nettrom_WMF, I suspect): can we easily intersect edit tags?
If so, we can probably "simply" write some code that figures out whether any edit includes a new image, and add a generic "Image add" tag, which can then be intersected with the many other existing edit tags (e.g. visualeditor).
This would likely be both simpler to implement (only have to do it once, in a central location where a lot of tool-specific edge cases are no longer relevant) and provide more flexibility to analyze the data.

Since the description mentions "those measurements": what, specifically, are we trying to measure?

We're trying to measure how multimedia content gets added to articles. My impression is that we're doing this to understand if there's a particular area to focus on when building products that can make it easier to add multimedia content, which if I remember correctly is one of the team's KRs for the current fiscal year. I hope @Ramsey-WMF or @CBogen can pitch in to clarify, and we can add that to the task description of this task and the parent task.

I have 2 concerns: [Morten notes that these were originally numbered, but I changed them into bullet points to make my responses less confusing]

  • Are edit tags a good enough proxy for the information that we desire?

I think they will be, and you're asking good questions that will help us make sure that we implement them in a way that works. Since media usage information is not versioned in MediaWiki, the alternative is something like diffing revisions and parsing those diffs to understand if an edit added media.
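
To make the "diffing revisions" alternative concrete, here is a rough sketch of what it could look like on raw wikitext. It is an illustration only: the regular expression is an assumption that only catches plain `[[File:...]]` and `[[Image:...]]` links, and it would miss media added through templates, galleries, or localized namespace names. The helper names are hypothetical.

```python
import re
from typing import Set, Tuple

# Rough pattern for [[File:Name.ext|...]] and [[Image:Name.ext|...]] links.
# Assumption: English namespace names only; templates and galleries are ignored.
FILE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]\n]+)", re.IGNORECASE)


def media_in_wikitext(wikitext: str) -> Set[str]:
    """Collect file names linked from a piece of wikitext."""
    return {name.strip().replace("_", " ") for name in FILE_LINK.findall(wikitext)}


def media_diff(old_text: str, new_text: str) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) file names between two revisions' wikitext."""
    old, new = media_in_wikitext(old_text), media_in_wikitext(new_text)
    return new - old, old - new
```

Running this over every revision pair is the kind of work an edit tag avoids, since the tag records the outcome once, at save time.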

  • People might upload multiple images in 1 edit. Does that matter?

There's an open question mentioned in the task description about whether we should differentiate between uploads and adding media. I think that depends on how uploads are logged by MediaWiki. If a user uploads two files through VE and adds them into an article, does that show up as two file uploads and one edit in the system? Are those uploads tagged in a way that makes them easy to separate from uploads outside of VE? Depending on how easy they are to identify, we might not need to do something specific about them.

  • This will not keep track of image additions that get reverted. Does that matter? If so, do we have a method of accounting for those?

This often comes up in analyses; in this case we might want to know what the revert rate of media edits is. There are standardized ways of identifying reverts, and we now also have the "mw-reverted" tag applied to reverts. There's T266374 that'll look at the similarities and differences between approaches to identifying reverts in analyses.

  • There is a significant chance that we might not reach every single bot owner to request that they add the tag, which could have a massive impact on the numbers. Should we figure out a way to programmatically, always, add a tag for bot edits?

We often use wt:MediaWiki History for this type of analysis, where identifying bot edits is also standardized. In cases where I've been working with edits from other data sources, I've re-implemented the same bot identification approach. So having a separate tag for bot edits is not necessary.

  • Does 2017_wikitext_editor belong under wikitext or VE? I.e. are we tracking usage of the image selection tool, or the edit process?

Good question! I see that the 2017 wikitext editor has the same tool as VE. We might therefore want to track it as VE.

  • Many cases will likely go untracked or get lumped into 'wikitext' or 'bot'. E.g. mobile edit, contenttranslation possibly...

Good point! I think we should consider what additional dimensions we need to deal with here (see also my point below about tag intersections).

  • Could there be other use cases in the future that we might want to track? E.g. an image recommendation tool.

Yes, and maybe the image recommendation tool should have its own edit tag?

Another idea/question (for @nettrom_WMF, I suspect): can we easily intersect edit tags?
If so, we can probably "simply" write some code that figures out whether any edit includes a new image, and add a generic "Image add" tag, which can then be intersected with the many other existing edit tags (e.g. visualeditor).
This would likely be both simpler to implement (only have to do it once, in a central location where a lot of tool-specific edge cases are no longer relevant) and provide more flexibility to analyze the data.

Yes. As mentioned above we often use the MediaWiki History dataset for analyzing edits. In that dataset all tags associated with a given revision are available, so we can make those intersections (relatively) easily. Other datasets often do the same thing (there's an event table that works in a similar fashion). Having a generic "image add" tag would make a lot of sense.
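
As a sketch of what such an intersection could look like once every revision carries its full list of tags: the record layout below (a dict with a `tags` list) is an assumption for illustration and not the actual schema of the MediaWiki History dataset. The tag names `mw-add-media`, `visualeditor` and `mobile edit` are taken from this discussion; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def count_media_adds_by_tag(
    revisions: Iterable[Dict[str, List[str]]],
    media_tag: str = "mw-add-media",
    other_tags: Sequence[str] = ("visualeditor", "mobile edit"),
) -> Dict[str, int]:
    """Count media-adding edits, and how many of them also carry each other tag."""
    counts: Counter = Counter()
    for rev in revisions:
        tags = set(rev["tags"])
        if media_tag not in tags:
            continue  # not a media addition, skip it
        counts["total"] += 1
        for tag in other_tags:
            if tag in tags:  # intersection of the media tag with another tag
                counts[tag] += 1
    return dict(counts)
```

Because the generic tag is applied once, in a central place, any new editing interface that already has its own tag can be added to `other_tags` without further changes to the tagging code.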

CBogen renamed this task from [L] Create edit tags to differentiate multimedia edits to Wikipedia articles to [L] Create edit tags to measure multimedia edits to Wikipedia articles. Nov 19 2020, 9:29 PM
CBogen updated the task description.

Thanks all - I just updated the description of the task based on the previous discussion.

@ppelberg @JTannerWMF - I wanted to let you know that the SD team is planning to implement this ticket and give you the opportunity to let us know if you'd like to do code review or otherwise comment - thanks!

Thank you for the heads up, @CBogen. I do not think we need to do code review for the implementation of this edit tag. Although @Esanders, if you think otherwise please comment as much.

Adding the tag y'all are describing sounds good to us. A resulting question and comment:

Question

  • Are you able to implement this as a hidden tag [i]?
    • I ask this above considering the primary "user"/"consumer" of this tag seems like it will be staff [ii] as opposed to editors. Please correct me if this is not so!

Comment
We'll be curious to know how y'all end up "drawing the boundaries" / "defining" what determines when the add media tag is applied.[ii]


i. "Administrators (those with the editinterface user right) will see an "edit" link to modify the MediaWiki:Tag-<tag name> page. Set the contents of this page to - to hide the tag from all change lists." via https://www.mediawiki.org/wiki/Help:Tags
ii. "...If a user uploads two files through VE and adds them into an article, does that show up as two file uploads and one edit in the system? Are those uploads tagged in a way that makes them easy to separate from uploads outside of VE? Depending on how easy they are to identify, we might not need to do something specific about them." via T266067#6634887

Question

  • Are you able to implement this as a hidden tag [i]?
    • I ask this above considering the primary "user"/"consumer" of this tag seems like it will be staff [ii] as opposed to editors. Please correct me if this is not so!

We think the tag could also be valuable to editors, but defer to @Whatamidoing-WMF on this. What do you think?

Comment
We'll be curious to know how y'all end up "drawing the boundaries" / "defining" what determines when the add media tag is applied.[ii]

ii. "...If a user uploads two files through VE and adds them into an article, does that show up as two file uploads and one edit in the system? Are those uploads tagged in a way that makes them easy to separate from uploads outside of VE? Depending on how easy they are to identify, we might not need to do something specific about them." via T266067#6634887

We've landed on one edit tag regardless of the number of uploads. @nettrom_WMF is doing some investigation into whether there are other ways we can separate uploads/searches from edits when doing analysis.

If editors aren't asking for this right now, then I'd make the tag hidden. It can always be changed later if someone wants it.

Thanks! I updated the acceptance criteria for the task to indicate that it should be a hidden tag.

  • Does 2017_wikitext_editor belong under wikitext or VE? I.e. are we tracking usage of the image selection tool, or the edit process?

Good question! I see that the 2017 wikitext editor has the same tool as VE. We might therefore want to track it as VE.

You won't be able to tell from the edit alone if a tool was used to insert the media, or if it was hand-typed. Even the 2010 wikitext editor has a media insertion tool, and there are probably gadgets that can generate image syntax.

For this reason I don't see any value in recording 2017WTE as "VE". It isn't a visual edit.

If you want to know when users use the media insertion dialog (either in VE or 2017WTE) we already track usage via https://meta.wikimedia.org/wiki/Schema:VisualEditorFeatureUse.

Thanks for catching that! And I totally agree, we can't combine them into one category. If we need to determine specifically what tool they used to add the image, we'll turn to VisualEditorFeatureUse (as you mentioned).

Change 663209 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/core@master] Add change tags for media additions/removals

https://gerrit.wikimedia.org/r/663209

I have a patch in code review that deviates slightly from the task description.
I figured it'd be confusing or misleading to categorize media replacements as "add media" (but we probably also wouldn't want to ignore those), so instead of a singular "add media" tag, I've added 3: one for additions (only), one for removals (only), and another for changes (both additions & removals).
New media (including replacements) thus includes both "add media" and "change media".
Does that work?

That sounds good to me! How do you differentiate between additions, removals, and changes?

I'd also like @nettrom_WMF to verify that this will work, thanks!

I think I'd like to draw up some different scenarios (using images as the example media) and go through what edit tags end up getting applied, just to make sure I've got this right.

Changes in media | Edit tag
User adds one or more images | mw-add-media
User deletes one or more images | mw-remove-media
User adds two images, deletes one image | mw-change-media
User updates 1 image to a different one | mw-change-media

The last row ends up getting the mw-change-media tag because to the system it looks like a delete and an add. I don't think there's a way to get around that. We do have some ability to measure these if it's done using VE, because if the edit session gets sampled the logged events will reflect that the image was changed rather than added.

From my reading of the parent task, we're primarily interested in understanding how images get added, and the add/change media tags will let us track that. As far as I can tell, this is good to go.

Change 663209 merged by jenkins-bot:
[mediawiki/core@master] Add change tags for media additions/removals

https://gerrit.wikimedia.org/r/663209

The patch that introduces this edit tag has been merged, but I still need to switch it on.
What projects will we be tracking these multimedia edits on? All WMF projects? All Wikipedias? English Wikipedia?

All WMF projects, thanks! (unless there’s a reason not to?)

All of these tags have to be stored. It's not a huge deal, but if we have no intention of ever looking at that data in certain projects, we might as well skip that.

Ok, let's start with all Wikipedias. If we expand our tools to other projects we can come back to those.

Change 674882 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):
[operations/mediawiki-config@master] Enable media change tags on wikipedias

https://gerrit.wikimedia.org/r/674882

Change 675148 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[mediawiki/core@master] Revert "Add change tags for media additions/removals"

https://gerrit.wikimedia.org/r/675148

Change 675148 merged by jenkins-bot:
[mediawiki/core@master] Revert "Add change tags for media additions/removals"

https://gerrit.wikimedia.org/r/675148

Change 675149 had a related patch set uploaded (by Urbanecm; author: Urbanecm):
[mediawiki/core@wmf/1.36.0-wmf.36] Revert "Add change tags for media additions/removals"

https://gerrit.wikimedia.org/r/675149

Change 675149 merged by jenkins-bot:
[mediawiki/core@wmf/1.36.0-wmf.36] Revert "Add change tags for media additions/removals"

https://gerrit.wikimedia.org/r/675149

Mentioned in SAL (#wikimedia-operations) [2021-03-26T17:10:52Z] <hashar@deploy1002> Started scap: Revert "Add change tags for media additions/removals" - T266067 T278429

Mentioned in SAL (#wikimedia-operations) [2021-03-26T17:42:32Z] <hashar@deploy1002> Finished scap: Revert "Add change tags for media additions/removals" - T266067 T278429 (duration: 31m 43s)

I've long wondered how often people add |alt= text. (I've been hoping that VisualEditor's design encourages it.) I don't know if measuring that is a goal for your project, but you might consider it.

Note that the core patch was reverted, as it caused T278429.

Change 674882 abandoned by Matthias Mullie:
[operations/mediawiki-config@master] Enable media change tags on wikipedias

Reason:

https://gerrit.wikimedia.org/r/674882

Change 679816 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/core@master] Add change tags for media additions/removals

https://gerrit.wikimedia.org/r/679816

Change 679816 merged by jenkins-bot:

[mediawiki/core@master] Add change tags for media additions/removals

https://gerrit.wikimedia.org/r/679816

I think I'd like to draw up some different scenarios (using images as the example media) and go through what edit tags end up getting applied, just to make sure I've got this right.

Changes in media | Edit tag
User adds one or more images | mw-add-media
User deletes one or more images | mw-remove-media
User adds two images, deletes one image | mw-change-media
User updates 1 image to a different one | mw-change-media

The last row ends up getting the mw-change-media tag because to the system it looks like a delete and an add. I don't think there's a way to get around that. We do have some ability to measure these if it's done using VE, because if the edit session gets sampled the logged events will reflect that the image was changed rather than added.

From my reading of the parent task, we're primarily interested in understanding how images get added, and the add/change media tags will let us track that. As far as I can tell, this is good to go.

@nettrom_WMF
FYI: the new implementation permitted adding multiple tags. Because of this, I decided to drop the mw-change-media tag.
Instead, edits that both add & remove images will have both mw-add-media and mw-remove-media tags.
Edits that only add images will still get mw-add-media, and edits that only remove media will be tagged mw-remove-media.
I suppose this is ok on your end and has nothing but advantages (more granularity). If not, please let me know and I can go back and change things to what we previously discussed.
(I have updated the task description to reflect these changes)

I suppose this is ok on your end and has nothing but advantages (more granularity).

Your supposition is correct, I'm happy with this. Looking forward to having these edit tags available!

Change 674882 restored by Matthias Mullie:

[operations/mediawiki-config@master] Enable media change tags on wikipedias

https://gerrit.wikimedia.org/r/674882

Doesn't seem to be collecting data (see https://en.wikipedia.org/wiki/Special:Tags). Will investigate.

My bad. This defaults to not capturing data unless specifically enabled, so it also requires config to enable it on Wikipedias.
With the revert etc., I forgot that this had not yet been deployed.
I've restored that patch & will seek to deploy it soon.

Change 674882 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable media change tags on wikipedias

https://gerrit.wikimedia.org/r/674882

Mentioned in SAL (#wikimedia-operations) [2021-05-13T18:19:11Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 04eb9d30b069e60004a42fcb128a958a24aee229: Enable media change tags on wikipedias (T266067) (duration: 01m 07s)

Change 690082 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Revert "Enable media change tags on wikipedias"

https://gerrit.wikimedia.org/r/690082

Change 690691 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] Properly enable media change tags on Wikipedias

https://gerrit.wikimedia.org/r/690691

Change 690082 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "Enable media change tags on wikipedias"

https://gerrit.wikimedia.org/r/690082

Mentioned in SAL (#wikimedia-operations) [2021-05-13T20:21:51Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: REVERT: 9dc74e45579c9b868571529171421c4bf7de41fa: Revert "Enable media change tags on wikipedias" (T266067, T282822) (duration: 01m 07s)

I reverted the enabling patch, as it caused T282822: Certain tags are no longer activated by default. I also uploaded a patch that will hopefully enable this feature properly, but it should ideally be CR'ed first.

Thanks for reverting that & putting a new version together - lgtm!

Change 690691 merged by jenkins-bot:

[operations/mediawiki-config@master] Properly enable media change tags on Wikipedias

https://gerrit.wikimedia.org/r/690691

Mentioned in SAL (#wikimedia-operations) [2021-05-19T11:13:13Z] <mlitn@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:690691|Properly enable media change tags on Wikipedias (T266067 T282822)]] - part 1 (duration: 01m 34s)

Mentioned in SAL (#wikimedia-operations) [2021-05-19T11:14:46Z] <mlitn@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:690691|Properly enable media change tags on Wikipedias (T266067 T282822)]] - part 2 (duration: 01m 04s)

Etonkovidova subscribed.

Media add/remove tags are present, e.g. at https://en.wikipedia.org/wiki/Special:Tags. They are listed as Active, and data is being collected.