Commons:Village pump/Proposals/Archive/2019/02

Wikidata RfC related to the inclusion of Wikimedia Commons categories

Everyone is invited to give their opinion on whether Wikidata should allow items that link only to a Wikimedia Commons category. This request for comment was started to help the Structured Data on Wikimedia Commons programme and can be found on Wikidata at "Wikidata:Wikidata:Requests for comment/Allow for Wikidata items to be created that only link to a single Wikimedia Commons category (Wikidata notability discussion)". Note that this request for comment is on Wikidata, so following that link will take you off Wikimedia Commons (as if this website is a wikidrug). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:23, 12 February 2019 (UTC)

Speedy revision deletion tag for overwritten files

CHANGES IMPLEMENTED:

{{Overwritten revdel}} and Category:Overwritten files requiring revision deletion were created. LX (talk, contribs) 17:56, 24 February 2019 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

There is currently no standard way to ask administrators to delete a specific revision of a file (revision deletion) that's been uploaded in violation of Commons:Overwriting existing files. Such revisions are typically copyright violations (and even if they're the overwriter's own work, there is normally no licensing statement for the specific revision), so history splitting is hardly ever a viable option.

One way to request deletion is to write a message on the administrators' noticeboard. That gets the job done, but starting a section on a discussion board that's on a lot of people's watchlists for routine housekeeping isn't exactly a streamlined process. Starting a deletion discussion is also possible, but is even more process heavy and too slow to be fit for this purpose. Another way to do it is {{speedy}} with a custom message, but with overworked administrators, that can lead to more than just individual revisions being deleted. It would be better to have a more specific tag.

I believe that we should have a tag similar to {{Non-free frame revdel}} for this purpose. We could call it something like {{Overwritten revdel}}. It should put files into a separate subcategory of Category:Candidates for revision deletion, e.g. Category:Overwritten files requiring revision deletion. The tag should be limited to cases where the revision to be deleted is completely different from the original revision. Furthermore, the revision to be deleted should either be a copyright violation or lack the information necessary to enable history splitting. LX (talk, contribs) 19:29, 5 February 2019 (UTC)

Votes (overwritten revdel)

 Support as per above discussion. --Yann (talk) 07:58, 6 February 2019 (UTC)

Discussion (overwritten revdel)

 Comment The maintenance category already exists. I even put some files in it half a year ago. We don't need a tag (which is only more cumbersome to remove), someone just needs to watch that category. - Alexis Jazz ping plz 19:36, 5 February 2019 (UTC)
I don't oppose the general idea, but I don't think a tag is needed. Administrators just need to be made aware of the category. - Alexis Jazz ping plz 13:19, 6 February 2019 (UTC)
Sometimes you need to explain the reason for the revdel. Tags can be useful in this case. 4nn1l2 (talk) 13:28, 6 February 2019 (UTC)
For the cases that need that, I think a message could be left in the edit comment or upload comment. But I won't oppose a tag. - Alexis Jazz ping plz 15:21, 6 February 2019 (UTC)
Comment - I would support this. I've always been surprised there's no sort of policy on this. Alexis and I have photoshopped a few images here together (I believe 2?) and only 1 was revdelled, and only because I asked an admin to do it. Anyway, I would support this. –Davey2010Talk 20:30, 5 February 2019 (UTC)
By the way, Media with unacceptable data in old versions has been awaiting some reasonable procedure for some 1½ years. Incnis Mrsi (talk) 20:34, 5 February 2019 (UTC)
Fact of the matter is that if it is not listed on Commons:Admin backlog it is never going to be looked at. I doubt many admins even know those categories exist. --Majora (talk) 04:29, 6 February 2019 (UTC)
 Comment I would also support this.   — Jeff G. please ping or talk to me 07:15, 6 February 2019 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Whitelist for Flickr accounts belonging to federal agencies of the United States

As discussed here, is there a way we can whitelist Flickr accounts belonging to U.S. federal agencies? It seems many of these accounts are defaulted to "All Rights Reserved" despite U.S. copyright law.--TriiipleThreat (talk) 22:50, 12 February 2019 (UTC)

That would be a question for Zhuyifei1999 who maintains the Flickr checking bot. --Majora (talk) 22:56, 12 February 2019 (UTC)
I could add that if there is consensus that these Flickr accounts upload only images made by their employees. --Zhuyifei1999 (talk) 23:52, 12 February 2019 (UTC)
I would  Support such a whitelist.   — Jeff G. please ping or talk to me 23:00, 12 February 2019 (UTC)
FBI top secret info

I also found that this photo was actually taken in area 51. I knew they weren't from here. - Alexis Jazz ping plz 15:57, 13 February 2019 (UTC)

  • If the file is Creative Commons on Flickr, leave that license and review that license. Insert the appropriate PD template (if it's not there yet) in addition to it. The Creative Commons license can be useful in some jurisdictions and might be needed if we find out that for whatever reason PD doesn't apply.
  • If the file is "all rights reserved" and the uploader claims PD-US-expired, PD-old-70 or a variant, tag for human review.
  • If the file is "public domain mark" and the uploader claims PD-US-expired, PD-old-70 or a variant, do not insert any other license and proceed as usual.
  • If the file is "all rights reserved" or "public domain mark", insert the appropriate PD template (if it's not there yet) and remove any other license. Also remove any less precise (like the generic {{PD-USGov}}) templates.
And omit the self-destruct sequence. The cartoons try to teach you otherwise, but it is really not essential. - Alexis Jazz ping plz 04:34, 21 February 2019 (UTC)
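The rules listed above could be sketched roughly as follows. This is only an illustration, not FlickreviewR's actual code: the function name, the `Decision` type and the license labels are hypothetical, `page_templates` is assumed to contain only the license templates on the file page, and `agency_template` stands for whichever precise PD template applies to the whitelisted account. The sketch also assumes the two "uploader claims PD-US-expired, PD-old-70" rules are checked before the general rule, and that an unrecognised Flickr license falls back to human review.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; not taken from the real bot.
EXPIRY_CLAIMS = {"PD-US-expired", "PD-old-70"}


@dataclass
class Decision:
    action: str                                  # "pass" or "human_review"
    add: list = field(default_factory=list)      # templates to insert
    remove: list = field(default_factory=list)   # templates to remove


def review(flickr_license, page_templates, agency_template):
    """flickr_license: 'cc', 'all_rights_reserved' or 'public_domain_mark'."""
    templates = set(page_templates)
    claims_expiry = bool(templates & EXPIRY_CLAIMS)
    missing = [] if agency_template in templates else [agency_template]

    if flickr_license == "cc":
        # Keep and review the CC license; add the PD template next to it.
        return Decision("pass", add=missing)
    if claims_expiry and flickr_license == "all_rights_reserved":
        return Decision("human_review")
    if claims_expiry and flickr_license == "public_domain_mark":
        # Do not insert any other license; proceed as usual.
        return Decision("pass")
    if flickr_license in ("all_rights_reserved", "public_domain_mark"):
        # Insert the precise PD template and drop every other license
        # template, including less precise ones such as PD-USGov.
        remove = sorted(t for t in templates if t != agency_template)
        return Decision("pass", add=missing, remove=remove)
    # Assumption: anything unrecognised goes to a human.
    return Decision("human_review")
```

For example, `review("all_rights_reserved", ["PD-USGov"], "PD-USGov-NASA")` would pass the file, add the precise template and remove the generic one.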
Thanks. That's more complexity than I had expected. Will do this weekend. And self-destruct was just a joke to give some examples of what 'undefined behavior' means :) --Zhuyifei1999 (talk) 06:17, 21 February 2019 (UTC)
I don't think I have time to implement all these myself in these few weeks (sorry, too much IRL stuff going on). Code is here; patches welcome. --Zhuyifei1999 (talk) 22:56, 24 February 2019 (UTC)

@Zhuyifei1999: It doesn't have to be that complicated. Just unconditionally pass the file, and if it doesn't have a tag, just add a PD tag. If it already has a license other than PD, add the PD tag anyway. Skip the ones that already have a PD tag. The bot can always be tweaked later. --TriiipleThreat (talk) 12:13, 25 February 2019 (UTC)

Again, patches welcome. I am currently unable to allocate time for this in these few weeks. --Zhuyifei1999 (talk) 18:37, 25 February 2019 (UTC)
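For comparison, the simpler behaviour suggested by TriiipleThreat could look something like the sketch below. It is not a patch against the bot: the function name is made up, the default `pd_template` is only a placeholder, and recognising PD tags by a "PD-" prefix is an assumption made for illustration.

```python
def simple_review(page_templates, pd_template="PD-USGov"):
    """Return the license templates the page should carry after the bot's pass.

    Hypothetical helper: every file from a whitelisted account passes,
    and a PD tag is added unless one is already present.
    """
    templates = list(page_templates)
    # Assumption: PD templates can be recognised by their "PD-" prefix.
    if any(t.startswith("PD-") for t in templates):
        return templates                  # already has a PD tag: skip
    return templates + [pd_template]      # no tag, or a non-PD license: add PD
```

With this, an untagged file gains the PD tag, a file with only a Creative Commons license keeps it and gains the PD tag, and a file that already has a PD tag is left alone.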

Create a separate list for Flickr accounts that require an additional human review

Note: Zhuyifei1999 (who maintains FlickreviewR) has stated this is technically possible.

Flickr is a rich source of images for us. Because some Flickr users are known for license laundering or have other issues that can't be overcome, Commons created Commons:Questionable Flickr images. Once on this list, tools downright refuse to upload anything from these photographers. And images get deleted blindly because authors are on the list, regardless of the actual reason they were listed.

Unfortunately, this list grew to also include many accounts that accidentally uploaded something that doesn't adhere to our strict rules, or that made some mistakes while also having good, properly licensed own work. For an example, look at https://www.flickr.com/photos/numberstumper/291229304/. It is a perfectly usable and correctly licensed photo of the town hall in Bradford. It can't be uploaded with any tool, because Paul Stumpr has also taken photos of magazines and a cardboard Fred Flintstone. Another example: Russian Orthodox Church in Antwerpen. This photo was deleted because the account is on the "bad authors" list. The account is on the bad authors list because of this incident, and while images from this account appear to require a human review, photos that were taken with a Canon EOS 5D Mark II or iPhone 4 are fine.

If this proposal passes, tools should ignore the new list or merely warn the uploader without blocking the upload. FlickreviewR should tag the images for human review while also still providing its own review (in case the license changes before the human reviewer gets to it). For the accounts that are placed on this list, the admin that adds the account should also provide a description of what the license reviewer needs to look out for with that particular Flickr account. - Alexis Jazz ping plz 23:38, 9 February 2019 (UTC)
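A rough sketch of how an upload tool might treat the two lists under this proposal, using the "warn without blocking" variant. Everything here is hypothetical: the account names, the note text, the `UploadCheck` type and the function name are invented for illustration and do not describe any existing tool.

```python
from dataclasses import dataclass


@dataclass
class UploadCheck:
    allowed: bool              # may the tool upload the file at all?
    needs_human_review: bool   # should the file be tagged for human review?
    note: str = ""             # what the uploader or reviewer should look out for


# Hypothetical data; real lists would live on wiki pages.
BLACKLIST = {"laundering-account"}
REVIEW_LIST = {
    "mixed-account": "Own photos are fine; watch for photos of magazines.",
}


def check_upload(flickr_account):
    if flickr_account in BLACKLIST:
        # Existing behaviour: tools refuse the upload outright.
        return UploadCheck(False, False, "Account is on the questionable images list.")
    if flickr_account in REVIEW_LIST:
        # Proposed behaviour: warn, allow the upload, tag for human review
        # and show the note the admin left when listing the account.
        return UploadCheck(True, True, REVIEW_LIST[flickr_account])
    return UploadCheck(True, False)
```

Keeping the admin's note next to the account is what would let a license reviewer see on the file page what to check for.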

There are now two proposals. I added the second one later. The second proposal is a subset of the first. If the first proposal passes, the second would be redundant. If only the second proposal passes, any changes to how tools handle accounts on the new list will be determined by future discussions/proposals.

Create a list for Flickr accounts that require an additional human review: votes

Create a list, independent from Commons:Questionable Flickr images for Flickr accounts that do have properly licensed images that are within our scope but require a human review.

  • I would  Support this, as blacklisting should be a last resort and not a first resort (as Wikipedias already do with "spam" websites that are useful and educational but get blacklisted because of one assumption of bad faith or one incident). A lot of Flickr accounts have thousands of good images with hundreds of bad ones, most importers won't check the blacklist first, and excluding good content over trivial reasons that could be handled by a handful of volunteers should not be an option. Post Script: You could copy this comment and any other comment I make to the actual proposals village pump when you're going to present your ideas, as I'd rather not "vote" twice. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 09:46, 10 February 2019 (UTC)
Donald Trung's vote copied from User:Alexis Jazz/Proposal incubator per his request. - Alexis Jazz ping plz 16:18, 15 February 2019 (UTC)

Create a list for Flickr accounts that are not all bad (subset of the first proposal): votes

Create a list, independent from Commons:Questionable Flickr images for Flickr accounts that do have properly licensed images that are within our scope but can't be blindly reviewed by the bot because some of their Flickr uploads are problematic. This will lay some groundwork to handle those users differently in the future and better inform license reviewers. If, how and for which user group this list could be used by tools is to be determined by future discussions/proposals. Pinging @Clindberg, Donald Trung, Abzeronow, Jeff G., BevinKacon and Natuur12: the first proposal isn't void, but here's an alternative to implement the second list without deciding yet what to do with it. - Alexis Jazz ping plz 04:53, 21 February 2019 (UTC)

Create a list for Flickr accounts that require an additional human review: discussion

Discuss details for this proposal here.

will just create another backlog, when you don't even need tools to upload from Flickr. If an image is that valuable, it can be uploaded manually. See backlog Category:License review needed.

A fair chunk of files from Flickr that need a human license review is exactly because they were uploaded manually. The full size doesn't get uploaded, the source not linked properly, and that's where the timewasting starts. With this, it's even feasible to show license reviewers on the file page what they need to look out for. They may often not even have to go to Flickr, making for relatively easy license reviews. Also, accounts that are not blacklist-worthy now could be on the new list. For example, Flickr accounts that are known to often share photos of sculptures in non-FoP countries.

Will also create more admin deletion work, when users abuse it.

I think that's a very dim view of Commons users. They're not out to get you. - Alexis Jazz ping plz 21:50, 17 February 2019 (UTC)

Yes, I think that is a lack of COM:AGF. Users may occasionally err on copyright, but most do not deliberately upload copyvios. Most Flickr users are not Marco Verch; creating a list that requires human review would help get the good photos on Commons and keep the bad ones out. Abzeronow (talk) 17:41, 18 February 2019 (UTC)
Flickr abuse is a view held by many. Commons:Village_pump/Proposals/Archive/2018/08#Restrict_usage_of_Flickr2Commons.--BevinKacon (talk) 20:09, 18 February 2019 (UTC)
I see I didn't mention it here, but Marco Verch is an interesting example. I've seen several photos from Marco Verch pass the human license review, because the reviewers assumed that he was on the blacklist for some copyvio/DW/whatever. They didn't know Marco Verch is a bastard. Having separate lists would also help with that after the current blacklist would be sorted out. - Alexis Jazz ping plz 20:42, 18 February 2019 (UTC)
That discussion is more about how some users unknowingly upload bad files through a useful tool. Yes, that tool doesn't detect dupes in batch uploads (but it will in individual uploads). Those are still a small % of all Flickr uploads, and the vast majority of Flickr users act in good faith. Some of the files unsuitable for Commons also have to do with lack of FOP, a matter that could be helped by such a list as Alexis Jazz proposes. It also would help keep the real bad apples (Marco Verch) on the blacklist and make that meaningful. Abzeronow (talk) 05:18, 19 February 2019 (UTC)
There may be options short of requiring review. Maybe just collect such accounts in categories (or be able to run a report), and at some point do some spot checks to see if there is a substantial amount of problem images coming through. Or maybe there would be a way to get a special warning message on the Flickr import interfaces to ask uploaders to double-check that the images appear to originate at that Flickr account, as there have been problems in the past, and see if that helps the rate any. Carl Lindberg (talk) 05:34, 19 February 2019 (UTC)
@Clindberg: I don't disagree, but I believe the license reviewing process for these can be streamlined to the point those reviews will take quite little effort. But either way, categorizing and/or adding a warning to the Flickr import interface will require a separate list for such accounts which is what this proposal attempts to realize. - Alexis Jazz ping plz 11:24, 19 February 2019 (UTC)
@Natuur12, Clindberg, and BevinKacon: uploading images manually is a pain (especially if you want to upload more than two or three of them), users who perform manual uploads often make mistakes (copy-paste the wrong source link, not upload the full size image, enter the wrong license) and when such an image does make it onto Commons, a license reviewer will sometimes erroneously give it a review because they don't know why anyone is on the blacklist, or decline it "because blacklist". I think it's quite essential to differentiate between accounts from which we want absolutely no content and those from which we can accept some content, as well as to provide an easy-to-see reason why an account is on a list for license reviewers. So what exactly is it that you oppose, and how could that be resolved? To limit overriding the blacklist to, say, autopatrolled users is more complicated from a technical point of view (but perhaps not impossible). Or do you oppose the very idea of having two lists instead of lumping everything together on a single list? - Alexis Jazz ping plz 20:18, 20 February 2019 (UTC)
No, I don't oppose a second list; it gives us some options other than a total blacklist which could be very useful. The proposal though was requiring a human review for this second list; I was saying there could be ways to use the second list without creating any additional human reviews, at least for now. But yes, if it turns out there are still problems with uploads from these users, especially after giving uploaders a special warning for them, we could either merge with the black list or simply remove the automated Flickr review and mark them as still needing human review, in the existing queue. We could change how we deal with this list over time. Carl Lindberg (talk) 23:30, 20 February 2019 (UTC)

Add rights from the autopatrollers user group to the rollbackers user group

We currently have 643 rollbackers. When you remove the autopatrollers, patrollers and license reviewers from that list (who are all autopatrolled), 25 oddballs (3.9%) remain. Among those are Rodhullandemu, who should have this right removed this moment; JeffGBot, which is a bot (so also autopatrolled); and NNW, who now uses a new account, NordNordWest (which is autopatrolled). The 22 users (3.4%) that remain haven't contributed in years (or made a few dozen edits at most), so they were never really considered for autopatrol.

Basically, if you're a rollbacker, you're autopatrolled. So why not merge rights? For clarity: this proposal is supposed to make the "autopatrolled" user group redundant for rollbackers.
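The percentages quoted above can be checked with a few lines of arithmetic. The counts are the ones given in the proposal; the variable and function names are only illustrative.

```python
total_rollbackers = 643
not_otherwise_autopatrolled = 25   # the "oddballs" in the proposal
explained = 3                      # Rodhullandemu, JeffGBot and NNW
inactive = not_otherwise_autopatrolled - explained


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(inactive)                                               # 22
print(share(not_otherwise_autopatrolled, total_rollbackers))  # 3.9
print(share(inactive, total_rollbackers))                     # 3.4
```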

Add rights from the autopatrollers user group to the rollbackers user group: votes

Do we want to bot-copy descriptions to captions?

Structured Data on Commons released its first feature last month: media files can now have captions in different languages. Captions are quite close to descriptions, except that they are structured by language. It is technically possible to bot-copy descriptions to captions (e.g., [1], [2] were copied using pywikibot). There is a potential copyright issue here, in that captions are CC-0, which perhaps could be avoided by only copying short descriptions (say, under 200 characters) where they are sufficiently short/simple that they can't be copyrighted (as per WMF legal). Do we want to do that for all files, or are there other concerns that need addressing? Thanks. Mike Peel (talk) 21:15, 8 February 2019 (UTC)

Wait...why...are captions licensed under something different than almost the entire rest of almost every other project? (This might be a stupid question, I'll admit.) GMGtalk 21:20, 8 February 2019 (UTC)
@Keegan (WMF) can probably answer this better than me, but in general I understand it's because facts/short captions can't be copyrighted along the same lines as {{PD-simple}}/{{PD-ineligible}}. It matches what Wikidata uses, and there's some rationale on Wikidata. Thanks. Mike Peel (talk) 21:37, 8 February 2019 (UTC)
Indeed, captions are to be CC0 in order to work with the licensing for the rest of the structured data project, which is based on Wikidata's CC0. Captions are the Commons equivalent of a Wikidata label, meant to be pulled from the API with other data from other structured statements and fields once they are available. More information about why the database is CC0 is available in Mike's link. Keegan (WMF) (talk) 22:36, 8 February 2019 (UTC)
Ah. Stupid question confirmed. I've been poking around WD for a little while now and I honestly hadn't realized it was licensed differently. GMGtalk 23:06, 8 February 2019 (UTC)
Captions are stored on Wikidata and Wikidata is CC0 by design. So they have a different license. --Majora (talk) 21:38, 8 February 2019 (UTC)
Hello Keegan (WMF). I think this statement is misleading. Nothing legally prevents mixing CC0 data (provided that license was validly applied in the first place) with data covered by any other license. Nothing prevents having an API that also provides the license under which each label is covered, and nothing prevents storing labels with different licenses in the same database. The only possible issue arises when someone wants to mix content covered by incompatible licenses. As most works of Wikimedia are under CC-by-sa-3.0 unported, allowing a specific license for each label would end up in no conflict at all for most cases that will be useful for the Wikimedia community and for external partners who do not want to avoid reciprocity of rights on derived works. Cheers, Psychoslave (talk) 09:16, 6 March 2019 (UTC)
No. WMF legal have given no guarantee of immunity against claims of damage from breaking moral rights. The idea that 200 characters cannot have a copyright holds no water. -- (talk) 21:34, 8 February 2019 (UTC)
No thank you. I'm not a big fan of this caption system to begin with. I'm not a big fan of storing data where it can't be controlled directly by the community that deals with it. --Majora (talk) 21:38, 8 February 2019 (UTC)
@Majora: The data is held on Commons (in the commons wikibase installation), not on Wikidata. Thanks. Mike Peel (talk) 21:46, 8 February 2019 (UTC)
Ah. Well they should probably make that more clear to people who haven't been following along with its development since I was under the impression it was stored elsewhere. I'm more neutral on this idea then. --Majora (talk) 21:48, 8 February 2019 (UTC)
@Majora: I have added a section in the FAQ answering this − hope that helps: Commons:File_captions#Where_are_captions_stored?. Jean-Fred (talk) 13:55, 11 February 2019 (UTC)

 Weak support Only if limited to short descriptions. Some descriptions are quite long and detailed, and there's nothing wrong with that - there is a lot of valuable information available for some images. But the captions have a different purpose as the "Commons equivalent of a Wikidata label", as Keegan wrote, the "one-line explanation of what this file represents", as the caption description says. So, even if not considering possible licensing issues, long descriptions should not be copied to captions. With a rule like "only the first sentence (text before the first full stop) and only if this sentence is not longer than 200 characters", I think it might be a possible approach. Gestumblindi (talk) 23:37, 8 February 2019 (UTC)
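The rule suggested above ("only the first sentence, and only if it is not longer than 200 characters") could be sketched like this. It is an illustration rather than proposed bot code: the function name is made up, and splitting on the first full stop is deliberately naive, so abbreviations or decimal numbers would cut the sentence short.

```python
def caption_from_description(description, limit=200):
    """Return the first sentence if it fits within `limit` characters, else None."""
    text = " ".join(description.split())      # collapse runs of whitespace
    if not text:
        return None
    first, stop, _rest = text.partition(".")  # text before the first full stop
    sentence = (first + stop).strip()
    if len(sentence) > limit:
        return None                           # too long: do not copy anything
    return sentence
```

For instance, `caption_from_description("Town hall in Bradford. Photographed from the square.")` returns `"Town hall in Bradford."`, while a single sentence of 250 characters yields `None`.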

I'm just not sure this would work. For example, this caption is comparatively meaningless. This caption is equally so if only the first sentence is provided. GMGtalk 00:52, 9 February 2019 (UTC)
@GreenMeansGo: The first one would need human clean-up (currently in the description, after this proposal both in the description and the caption). The second one is 614 characters long, so the limit of 200 characters I proposed above would mean it wouldn't be copied over. (We could partially copy descriptions, but as you say that probably doesn't help.) Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
  • No. Lots of crap in descriptions, liable to just reproduce the crap. Arbitrary example: File:Terrible Trail, The Meek Cutoff - Flickr - brewbooks.jpg - Jmabel ! talk 00:58, 9 February 2019 (UTC)
    @Jmabel: So how do we clean the descriptions up? The example you've given is ~730 characters long, so over the 200 character cut-off I was suggesting. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
    I didn't spend a bunch of time searching for a precise example before posting, I was just trying to illustrate the sort of thing I meant, but here:
    The only way to clean this up is for someone to do the hard work of cleaning this up. I do that a lot on photos I think are of historical interest or likely to be used; otherwise, frankly, when dealing with other people's photos I usually stick to cleaning up categories and usually don't go near the descriptions unless they are actively inaccurate. (Would normally have fixed that "Elliot Bay" one, but I'm leaving it right now as part of making my point.) - Jmabel ! talk 16:11, 9 February 2019 (UTC)
  •  Comment The idea is not bad in itself, but note that there is a mismatch between notifying the uploader about the license of the structured content and copying content that they did not deliberately put in the structured content area. There is something a little hypocritical in that; it is as if we say: "be aware that the structured data are under CC0, but anyway, and without your agreement, we will copy your CC-BY-SA 3.0 contribution to this CC0 field." Christian Ferrer (talk) 05:48, 9 February 2019 (UTC)
+ I have serious doubts that all the descriptions are simple enough to be exempt from copyright protection, for example File:Gashnian 20170306 18.jpg; I don't think we can move this text, which is originally licensed under BY, to CC0. Christian Ferrer (talk) 06:09, 9 February 2019 (UTC)
@Christian Ferrer: That caption is 646 characters long, so the maximum length of 200 characters I suggested at the start would exclude that one from being copied. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
This text could be limited to the first 200 characters, it wouldn't be less copyrightable IMO. Christian Ferrer (talk) 11:18, 9 February 2019 (UTC)
@Christian Ferrer: My suggestion was that if the description is over 200 characters, then it wouldn't be copied at all. I wasn't suggesting only using parts of descriptions. Thanks. Mike Peel (talk) 11:24, 9 February 2019 (UTC)
@Mike Peel: yes, I understood; I am only saying that a 200-character text may also have a copyright. An example is my own comment that you are currently reading, which is under CC-BY-SA 3.0. Who can say that this comment is CC0? Christian Ferrer (talk) 12:06, 9 February 2019 (UTC)
Though it's true that the people who are writing descriptions are certainly less creative than I'm trying to be right now.... :) Note that I do not oppose, I simply speak. And it's true that a lot of descriptions are very simple, even too simple sometimes, lacking useful info. Christian Ferrer (talk) 13:54, 9 February 2019 (UTC)
  •  Oppose I would like to have a separate description in the structured data, where the old descriptions can be copied. If the Captions should be bot-filled, then with the title of the image. --GPSLeo (talk) 10:05, 9 February 2019 (UTC)
    @GPSLeo: You mean at Commons talk:Structured data? Feel free to start such a discussion. Thanks. Mike Peel (talk) 11:08, 9 February 2019 (UTC)
  •  Oppose A note of nit picking but unfortunate reality here, in follow up to my earlier "no". If I see any of my batch upload projects where this is being done, I plan to mass revert on copyright grounds per COM:L, unless the text is specifically and unambiguously released as CC0 at source. This includes metadata such as titles, descriptions or captions. Anyone populating CC0 captions has the burden of proof to ensure that the text has been correctly released. The claim at the start of this thread that this statement may make copying text of certain lengths into captions okay, is misleading. Keegan (WMF) (talk · contribs) does not write for WMF Legal and does not claim to be a lawyer or a legal academic, so please avoid quoting what they write as if it has legal weight for the WMF, or is a meaningful legal opinion that unpaid volunteers could use to protect themselves from future claims of damages. If anyone wishes the WMF Legal department to publish an opinion that could be taken into a courtroom, then ask them for one in writing. -- (talk) 14:42, 9 February 2019 (UTC)
  •  Oppose concerns (raised above). --Steinsplitter (talk) 14:44, 9 February 2019 (UTC)
  •  Question @Mike Peel: is it possible to insert a caption in wikitext? Like {{Information|Description={{Caption:en}} or whatever. - Alexis Jazz ping plz 16:48, 9 February 2019 (UTC)
  • Regretful  Oppose. I don't think it's legal. It's true that common phrases and similes are not covered by copyright, as they are seen as the building blocks of the language, capable of being reused and repurposed in many different transformative ways. Also in cases where there's a substantial chance there was coincidental independent creation. But, regrettably, neither of those grounds apply here, for a systematic programme of extraction and licence-washing of individuals' contributions that are specific to particular contexts, being taken for the exact same purpose and exact same context; for some contributors cumulatively amounting to tens of thousands or even hundreds of thousands of words. One cannot argue that a taking on such a scale and so systematic is just incidental. If the descriptions are licensed CC-BY-SA one can't just wave that away and claim that they can be reissued CC0. Therefore, regrettably, IMO the existing descriptions cannot be reused unless they have been licensed CC0, or are directly derived from sources that are PD or CC0. Jheald (talk) 22:26, 9 February 2019 (UTC)
@Jheald: You entered 5 tildes instead of 4. - Alexis Jazz ping plz 22:32, 9 February 2019 (UTC)
Thx. Fixed. Jheald (talk) 22:42, 9 February 2019 (UTC)
  • Mike Peel, I'm confused how the captions can be CC0 if created automatically from a non-CC0 licensed work or a public domain work. The CC 0 legal code requires the "affirmer" to "voluntarily elect" to apply CC0 terms to "his or her Copyright and Related Rights in the Work". If the work is already in the public domain (for example, deemed not eligible for copyright) then there is nothing for which CC0 applies and the PD mark would be more appropriate. If one's work is under copyright, nobody else can put the work under CC0 terms on your behalf. -- Colin (talk) 12:35, 10 February 2019 (UTC)
  •  Oppose I'm too worried about the copyright. Even for short descriptions. A single tweet would rarely be copyrightable. But if you collect thousands of tweets from one person, it becomes another story. @Mike Peel: I also find it very worrying that when users enter captions now, they are not informed about the CC0. This means all the captions that have been entered so far are licensed as BY-SA 3.0, not CC0. - Alexis Jazz ping plz 17:24, 10 February 2019 (UTC)
  •  Support However, let's do it after captions are properly marked as CC0. I would also start with short captions with clearly identifiable language. Short text (less than 5-10 words) likely falls under {{PD-text}}, and I do not trust templates like {{En}} to be correct. --Jarekt (talk) 13:35, 12 February 2019 (UTC)
True, though captions are also full of random junk, like the example of adding "bollocks" to portraits of politicians. Nothing to stop it happening. -- (talk) 19:57, 1 April 2019 (UTC)

Sample set

@Mike Peel: Can I suggest processing a batch of, say, 1,000 randomly chosen images, and writing the proposed captions to a gallery page(s) in user space? Then we can all review them, and look for anti-patterns to avoid. You'll need to strip wikicode, for instance - or skip anything that uses it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:15, 9 February 2019 (UTC)

@Pigsonthewing: See User:Mike Peel/Captions for some examples. They are randomly selected, and the code only looks for labels marked with {{En}} and doesn't strip out any wikicode or HTML yet (that work would be done ahead of a bot proposal). Thanks. Mike Peel (talk) 19:55, 9 February 2019 (UTC)
Good idea, interesting. Almost all should be ok, indeed, but a few cases are more debatable, such as File:Distant County Hall - geograph.org.uk - 896433.jpg and the other files coming from the Geograph project. @Clindberg: hello, do you have an opinion about applying a CC0 license to descriptions that were not under CC0 at their origin? Christian Ferrer (talk) 20:42, 9 February 2019 (UTC)
@Christian Ferrer: I don't think there is any way to safely apply CC0 to licensed description text. Short phrases are not copyrightable, but entire sentences could be, depending on the wording. If it's just bare factual information like date and place and subject there is probably no copyright, but I would think it would be possible for some descriptions (even short ones) to be copyrightable. Carl Lindberg (talk) 21:28, 10 February 2019 (UTC)
@Mike Peel: Thank you. Most look good. On images like File:Ishiguro Koreyoshi - Kozuka with Chrysanthemums - Walters 5112783.jpg and File:Zoophytes- 1. 2. Fongie Actinie. (Nouvelle-Irlande.); 3. 4. Fongie à gros tubrcules. (Vanikoro.); 5.- 9. Tubinolie rouge. (Nouv-Zélande.) (NYPL b13624459-1267199).jpg, the |title= should be captured, before, or instead of the description. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:51, 9 February 2019 (UTC)
It's probably worth skipping audio files, too. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:33, 9 February 2019 (UTC)
There are some lines like ''Abstract/medium:'' 1 negative : glass ; 5 x 7 in. or smaller , with no filename. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:51, 9 February 2019 (UTC)
File:Beach handball at the 2018 Summer Youth Olympics – Girls Preliminary Round – RUS-ASA 29 (cropped 2).jpg didn't go well either. (caption: "lang=en") - Alexis Jazz ping plz 21:04, 9 February 2019 (UTC)
@Pigsonthewing and Alexis Jazz: Yes, there are improvements that can be made here. This was code that I wrote in 5 minutes, and as I said it just checks for {{En}} and the character limit I suggested at the start of this discussion. You can see the code at [3]. I can improve it if needed to do what you're suggesting and more - but I am only going to do that for code that I can then actually use to make edits here. Thanks. Mike Peel (talk) 21:45, 9 February 2019 (UTC)
@Mike Peel: I'd say: work on the caption-wikitext-inclusion thing first. There was opposition against captions in general from the start because nobody wants duplicated data and having to fix typos in two places. You have to learn how to walk before you can run. - Alexis Jazz ping plz 21:54, 9 February 2019 (UTC)
@Alexis Jazz: It's up to the structured data on commons team to sort out the ParserFunction/Lua access to the captions, I can't do anything to help with that. On the other hand, we have captions now, so we can start to use them, and that's a good step forward. Copying the description to the captions is a start, then we can figure out including them in {{Information}}, {{Wikidata Infobox}}, and elsewhere, as the next step. Thanks. Mike Peel (talk) 22:06, 9 February 2019 (UTC)
@Mike Peel: I disagree about the order. Get caption-wikitext-inclusion working first so duplicate descriptions/captions can be avoided. If you were to start copying now, it'll just create a mess when caption-wikitext-inclusion becomes available but the descriptions/captions are no longer in sync because users have edited one but not the other. - Alexis Jazz ping plz 22:22, 9 February 2019 (UTC)
Do you have any evidence that doing that is planned? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:32, 9 February 2019 (UTC)
Users edit descriptions all the time. You're saying they will stop doing that if descriptions are copied to captions to avoid them becoming desynchronized? - Alexis Jazz ping plz 17:20, 10 February 2019 (UTC)
No; I'm asking you what evidence you have, that "Get caption-wikitext-inclusion working " is planned. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:51, 13 February 2019 (UTC)
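A minimal sketch of the kind of check described in this thread: take the text of an {{En}} label, apply a character limit, and skip anything that still contains wikicode or HTML. This is not the script linked above; the function name candidate_caption, the regular expressions and the 250-character limit are all assumptions made for illustration, written in Python.

```python
import re

# Hypothetical limit: the thread mentions a character limit but does not state it here.
MAX_CAPTION_LENGTH = 250

# Matches a simple {{en|...}} or {{en|1=...}} template with no nested templates inside.
EN_TEMPLATE = re.compile(r"\{\{\s*en\s*\|(?:\s*1\s*=)?([^{}]*)\}\}", re.IGNORECASE)

# Markers that suggest leftover wikicode or HTML (links, italics/bold, tags).
MARKUP = re.compile(r"\[\[|\]\]|''|<[^>]+>|\[https?://")


def candidate_caption(wikitext, max_length=MAX_CAPTION_LENGTH):
    """Return a plain-text caption candidate, or None if the file should be skipped."""
    match = EN_TEMPLATE.search(wikitext)
    if match is None:
        return None  # no English label found
    # Collapse runs of whitespace so the length check is meaningful.
    text = " ".join(match.group(1).split())
    if not text or len(text) > max_length:
        return None  # empty or over the limit
    if MARKUP.search(text):
        return None  # skip rather than try to strip wikicode
    if "=" in text.split(" ")[0]:
        return None  # looks like a stray parameter such as "lang=en"
    return text
```

Skipping on any sign of markup, rather than stripping it, is the more conservative of the two options raised above; it would also reject lines such as the ''Abstract/medium:'' example. It does nothing about the licensing questions raised in this discussion.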
  • Within the example set are seven files from the Portable Antiquities Scheme; we are currently approaching 500,000 files from there. All the metadata is CC-BY-3.0, and so none may be reused as CC0. -- (talk) 22:43, 9 February 2019 (UTC)
  • Within the set are nine files from the Fleuron project, the batch made for an interesting image processing experiment of 250,000 files. The metadata relies on a Gale database, where the database rights are reserved. Clearly, systematically extracting any part of data and republishing as CC0 would break the expectation of limiting reuse, effectively by creating a new CC0 database with no attribution being preserved back to Gale. -- (talk) 22:57, 9 February 2019 (UTC)
  • Within the set are a significant number of files from Flickr, where the sources are not released as CC0. The licenses at source, such as the frequent default of CC-BY-2.0, must be presumed to apply to the metadata including the given titles and descriptions. Recasting these as CC0 cannot be supported as being compliant with their original releases. -- (talk) 23:10, 9 February 2019 (UTC)
    • I can't reply to any of these as the comments are too vague to let me investigate them, plus I am not a copyright lawyer. Mike Peel (talk) 23:16, 9 February 2019 (UTC)
    • I remain to be convinced that a simple, factual description like "A silver hammered penny of Edward the Confessor, minted in Southwark between 1042 and 1044. Moneyer: Wulfwine." can be copyrighted. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:30, 9 February 2019 (UTC)
      • Looking at its source page at the BM, I suspect that such a caption reflects exactly the sort of knowledge and judgment and choice of expression that represent scholarship that can be protected by copyright, especially if the proposition is to similarly take 10,000 more such descriptions.
        None of the information given "of Edward the Confessor", "minted at Southwark", "between 1042 and 1044", "moneyer: Wulfwine" is obvious just from looking at the image. Even the term "silver hammered penny" represents a judgment, and a choice of description.
        I think you are on quite treacherous ground, to assert that there is no copyright here. Jheald (talk) 00:20, 10 February 2019 (UTC)
  •  Comment Look at Text & Data Mining; this is not exactly the same thing, but it is a somewhat similar topic to the extent that we are talking about structured data. Christian Ferrer (talk) 06:00, 10 February 2019 (UTC)
  • Note that some captions have already been manually copied from the descriptions by users who are not the descriptions' authors. Christian Ferrer (talk) 07:52, 10 February 2019 (UTC)
    • Yeah, this is a good point Christian and gets at the deeper issue here. Even if a bot wouldn't be right for this task, we could easily semi-automate it to allow for rapid-fire manual checks for meaninglessness or other problems. But the real issue is that even if we do neither, it is deeply intuitive to simply copy/paste the existing descriptions, and will likely be done on a large scale, even if it is done piecemeal and manually across the entire project.
      Now I don't claim to be the most experienced user in the history of Commons, but I've been generally aware and even slightly involved in the ongoing discussion about structured data. I had no idea whatsoever that captions were licensed differently than descriptions until this discussion. The reason for that is probably that there is no indication whatsoever either in the upload wizard or on the file pages that these are licensed differently. That's a problem, and on such a copyright savvy project as Commons, it's a little surprising that we've implemented a system where users are shadow-licensing their content with no notification or explanation. The implication of that is that these contributions aren't actually licensed under CC0, because no notification means no license. GMGtalk 13:50, 10 February 2019 (UTC)
      • "WE" have not implemented this. We, the community, have not agreed anything about captions. The rationale that some discussion on a Phabricator task can replace a Commons community consensus is bizarre and simply a convenient fiction to justify a WMF desired change. The problem with literally mass ripping metadata from Wikimedia Commons and pretending that it has no copyright, has always been a foundational and extremely obvious logical conflict with the structured data proposals. But hell, who am I, I just have opinions based on years of creating content on this project, but as I've never been paid for it, my voice can be safely marginalized. -- (talk) 13:59, 10 February 2019 (UTC)
        • Well, even if we didn't implement it, we need to fix it, because without any type of notification whatsoever, the entire enterprise is basically just copyfraud. I mean, it's possible that I'm missing something obvious here, but it seems pretty straightforward. GMGtalk 14:10, 10 February 2019 (UTC)
          • Kind of missing the point. We, the community, do not own it. We do not get to say how it works. It is not ours to "fix", so why try? Frankly apart from being annoyed, I have been given zero incentive to care about this change, or to help with using this badly thought out and badly implemented "feature". The single option I have been given is to hide it from my view rather than remove it until it might be acceptable. Somehow that has been politically spun as being positive.
            Consequently, maybe we need a legal case, or a "WMF copyfraud" bad PR incident, to get the WMF to care when we ask for a change, or we politely suggest that the WMF properly tests major changes before rolling them out on what they think is "their" project. -- (talk) 14:30, 10 February 2019 (UTC)
  • Note that we can also proceed in steps. There are mainly two types of content here: the "own works" and the content coming from external sources. You can begin with the "own works":
1/ send a mass message to all the "own works" uploaders (or to all users), notifying them that we are starting to copy the descriptions to the captions for the files tagged as "own work", that there is a license change for the text coming from the description, and that they can object if they wish; then proceed on an announced date
2/ or process all "own works" without sending any messages, assuming that there are normally no copyright infringements in the "own works" descriptions
3/ create maintenance categories such as Category:Media with captions and/or Category:Media without captions and/or Category:Media from external sources without captions, etc.

These are just ideas; I don't know if this is the right way. Christian Ferrer (talk) 18:08, 10 February 2019 (UTC)

Fix the captions licensing

As we have seen in the proposal above, file captions are licensed as CC0. However, users who enter captions are currently not informed about this. At all. They also copy descriptions to captions, but descriptions often have a different license. In general, we can assume all captions that have been entered so far by users who are familiar with Commons are licensed the same as the proposal I'm writing now and all wikitext here: Creative Commons BY-SA 3.0. (lawyers may want to debate this, but for regular Commons users this is nitpicking) Captions that were entered by users with few edits, sadly no.

The Commons community in general has limited leverage over these development decisions, but we do have some. So here is a proposal, without hurting the developers too much, as they are given a choice. The proposal is this: one of the following should be done:

1. Delete all file captions that have been entered so far from the database. Inform the user clearly that the captions they enter will be released as CC0, similar to the message above the "Publish changes" button when editing wikitext. Also, create a system that will prevent users from mass copy-pasting file descriptions to file captions. Considering the license difference, they will generally need to rewrite the description in their own words for the file caption.. at least.

alternatively

2. Delete all file captions from IP-users and users with few edits, as they may not be sufficiently aware of wikitext licensing. Change the license for the remaining file captions and future captions to Creative Commons BY-SA 3.0.

alternatively!

3. The developers disable captions for now, have WMF legal actually look at the whole thing and enable it again with permission from legal in whatever form legal deems appropriate.

So they have three options. A complicated one if they must have CC0, a more simple one if they switch to BY-SA 3.0 or they battle it out with legal. This vote is not for which option the developers should pick. This proposal is merely saying the developers have to pick one of them. - Alexis Jazz ping plz 22:48, 10 February 2019 (UTC)

Voting (fix the captions licensing)

@Kaldari: Which country's TOO should we follow, then?   — Jeff G. please ping or talk to me 12:56, 8 March 2019 (UTC)
Please write a help page explaining to a novice user precisely how to tell whether a text they are copying into the field has potential copyright or not, and whether they are at risk of a legal claim of damages, especially if all they did was copy a line from an on-wiki CC-BY licensed description. Vague hand-waving of "probably short enough text is probably not copyrightable, not sure because the WMF lawyers will not give me a statement they would be prepared to go to court on..." is not an adequate answer. Your expertise here would be super, rather than relying on a majority vote of unpaid volunteers, as if that can override basic copyright law. Thanks. -- (talk) 17:11, 8 March 2019 (UTC)

Discussion (fix the captions licensing)

  •  Question Wouldn't it just be easier to modify one of the interface displays, say MediaWiki:Wikibasemediainfo-entitytermsforlanguagelistview-caption, to read Captions (Note: All captions written in this box are released under the Creative Commons CC0 1.0 Public Domain Dedication). That way what is happening is clear to everyone who sees that box? --Majora (talk) 22:54, 10 February 2019 (UTC)
    Well I boldly tried it. Unfortunately interface displays apparently can't use wikimarkup. So the small tags actually displayed and the external link did not format. Still a possibility to fiddle with one of the interface messages to make sure people know what they are doing when they use the captions box. --Majora (talk) 23:03, 10 February 2019 (UTC)
    @Majora: I'm not sure how that would look, but perhaps that could be part of the solution. But that message would also need to be translated in many languages. And it would still leave us with the captions that have already been entered. And I'm guessing some users will ignore the message and copy descriptions anyway, so something should be put in place to prevent that. Informing users about the captions license will be needed, whichever path is chosen. - Alexis Jazz ping plz 23:14, 10 February 2019 (UTC)
    Well I'm working on the beta cluster to see if I can get anything that actually looks proper. We can always use the translations of MediaWiki:Wikimedia-copyrightwarning if I can get the actual formatting correct. --Majora (talk) 23:18, 10 February 2019 (UTC)
    Don't think this is going to work, unfortunately. There just aren't enough options to change that would make this doable, and the only option that would be viable doesn't want to display URLs in any readable fashion. Oh well. --Majora (talk) 23:32, 10 February 2019 (UTC)
  • @Abzeronow: the developers can pick any of the three options they like. The third option is to take it to legal. If legal says they don't have to delete anything, well, that's fine. - Alexis Jazz ping plz 23:25, 10 February 2019 (UTC)
    • This might be a dumb question, but why not a fourth option to change license of wikitext to CC-0 instead of CC-BY-SA? Abzeronow (talk) 23:30, 10 February 2019 (UTC)
      • We can't retroactively change a non-CC0 license to CC0 without the permission of the copyright holders. That would be voiding their copyright. --Majora (talk) 23:32, 10 February 2019 (UTC)
        • Also it would be quite the feat. Few wikis have changed their license. English Wikinews has (is now CC BY), and Wikidata is obviously CC0. It's not impossible, but it would only be valid for new contributions. Go figure, we haven't even updated to Creative Commons 4.0 yet. Another issue are imported descriptions, like those from Flickr. - Alexis Jazz ping plz 23:44, 10 February 2019 (UTC)
          • Obviously, Commons should live up to its name and, going forward, change to CC-0. I guess some sort of permission and/or fair use rationale will have to suffice in the meantime though. Abzeronow (talk) 17:09, 11 February 2019 (UTC)
Here's an actual alternative to deleting current captions.

Mass-deliver an electronic letter to the talk pages of everyone who created a file caption before the Creative Commons 0 (Zero) license is launched and inform them that they can opt into releasing their file captions with the Creative Commons 0 (Zero) license or otherwise they will be deleted. This opt in system could also work for Mike Peel's proposal above for {{Own}} files. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:55, 11 February 2019 (UTC)

Alternatively, every file caption added before a certain date could have a "{{Caption before March 2019}}" license template or something similar added to it, stating that the caption is released under a different license. Nah, bad idea, as these file captions can still be edited and changed and others could be added, confusing re-users. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:55, 11 February 2019 (UTC)

"Delete all file captions from IP-users and users with few edits, as they may not be sufficiently aware of wikitext licensing. Change the license for the remaining file captions and future captions to Creative Commons BY-SA 3.0." This is a very odd proposal, this applies to literally all licensing on Wikimedia Commons and it would make no sense to delete their contributions if they are going to be released under the same license as the rest of the website is. This is like saying "let's delete all Wikipedia articles by new users and IP-users because they might not be aware of what license they might be using", not everyone is Marco Verch and I highly doubt that the users with few edits and IP-users thought that they would retain full copyright © for their additions. Also this wouldn't "change" the license but retain it because as far as anyone is concerned all text (including file captions) on Wikimedia Commons is Creative Commons BY-SA 3.0. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 10:15, 11 February 2019 (UTC)
 Info There is already a patch just needing to be reviewed, for adding the license information for structured data to the footer --GPSLeo (talk) 11:22, 11 February 2019 (UTC)

  •  Comment We need to do something. I'm inclined to say we should disable the entire thing for now regardless until we figure out what that is. If there's anything I have a strong opinion about it's that we should abandon the notion that we are currently dealing with captions that are licensed under CC0, because they're not. We don't currently know what the licenses actually are, but presumably some proportion of them are not freely licensed and are creative enough to qualify under copyright protection.
I've tried to think of a few scenarios of "where do we go from here", and I struggle to answer that in any way that doesn't look like manually reviewing all current captions, sending out mass messaging to obtain active CONSENT, and deleting the remainder. But when I look at that level of mess, I struggle to justify anything other than deleting the whole lot under PCP and restarting the project from scratch.
Now that's a massively crappy solution that wastes several weeks worth of work and doesn't really make anybody happy. But...I mean...giving legally appropriate notification of licensing terms is really Commons 101 stuff. In a situation where we're looking to do this structured data thing over the entire foreseeable future, maybe it's not all that bad to call it a good test run, do a thorough post mortem, and learn from our mistakes. GMGtalk 12:45, 11 February 2019 (UTC)
Of course it's a massively crappy solution. When I proposed on the Village Pump that the change was reversed, several voters said the equivalent of "oh, let's run this for a month and see how it goes before voting again". I hesitate to say "I told you so", because that sounds like I won something when actually we are all losing volunteer time, good faith and new users. -- (talk) 13:14, 11 February 2019 (UTC)
@Donald Trung: asking consent afterwards would, while ugly (but this whole operation is never going to get pretty..) be an acceptable solution. I'm not sure it'll be worth it though. For users who have entered many self-written (not copy-pasted..) captions it could be interesting. But to realize all this.. I'm afraid starting from scratch will be better. Cut our losses and get it right next time. - Alexis Jazz ping plz 16:21, 11 February 2019 (UTC)

 Oppose all three options. Go discuss things at Commons_talk:Structured_data#CC0_licensing_mockups first.

— Mike Peel (talk) 12:54, 11 February 2019 (UTC)

@Mike Peel: implementing a license notification after going live is entirely unacceptable (seriously.. why did you think that would be okay?) and doesn't do anything to resolve the license issues around captions that have already been entered. - Alexis Jazz ping plz 16:13, 11 February 2019 (UTC)

@Alexis Jazz: To be honest, I assumed that the notification was there and that I'd just accepted it at some point. My suggestion is to discuss things with the people working on this first, and then put together a proposal, not the other way around. Mike Peel (talk) 16:22, 11 February 2019 (UTC)
@Mike Peel: what those people think doesn't really matter. Believing really hard something isn't copyvio doesn't magically make it public domain. Captions without any CC0 license notification can't be live. Period. They need to fix the legal issues, then they go live. Not the other way around! - Alexis Jazz ping plz 16:29, 11 February 2019 (UTC)
Yeah, they're really two separate issues: how we fix the fact that we aren't providing notification currently (the discussion at COM:SDC), and what to do with the past contributions that were not provided notification (this discussion). Solving one, even if done quickly and smartly, doesn't really address the other at all. GMGtalk 16:32, 11 February 2019 (UTC)
Pinging @WMF Legal: , they should've been involved with this from the start so this ugly, ugly mess could've been avoided. Any caption I in my individual capacity have created on Wikimedia Commons falls (irrevocably) under the Creative Commons 0 (Zero) free license. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 16:36, 11 February 2019 (UTC)
Any caption I in my individual capacity have created on Wikimedia Commons falls (irrevocably) under the site's Creative Commons Attribution-ShareAlike License, version 3.0.   — Jeff G. please ping or talk to me 14:31, 12 February 2019 (UTC)
Well Jeff, I hope you're willing to file a takedown notice over it, because from the looks of things, it seems that much of the community is content to sweep the whole thing under the rug, retroactively license them however we please, and pretend like nothing ever happened. GMGtalk
Good approach. It is very straightforward to issue the WMF with a takedown and it sets a nicely referenceable precedent. -- (talk) 16:00, 12 February 2019 (UTC)
Could @WMF Legal: react at some point here? Personally I don't think a takedown would be a great way to achieve an outcome in this situation when there is still room for taking feedback into account. I'm afraid that if no appropriate action is taken this is how it will end up at some point, which would be – to my mind – really terrible for the image of the movement in general and WMF in particular. Solutions do exist: some were already pointed out above, and we can discuss further to get new ones if needed. --Psychoslave (talk) 09:34, 6 March 2019 (UTC)
A month later. That's your answer. The WMF's official position is "lalala, I can't hear you" and hope this goes away. -- (talk) 13:14, 6 April 2019 (UTC)

Acceptance of files from external sources without a license review

We have an endless number of files from external sources without a license review. Many have been around for many years, and many of the links have died. Thanks to Donald Trung's proposal to archive all external links, this shouldn't be a common sight anymore from now on. But we still have all those old files that were never reviewed. Deleting everything makes no sense. Pretending license reviews are not needed makes no sense. This proposal does not apply to files without a source, files whose source still works, or files whose source can be checked through archive.org or similar means.

Proposal to keep files with a source that is no longer available and that:

was uploaded by a former or current license reviewer, administrator or OTRS-member

or

matches all of the following:
  • Has a similar or identical source to files with a license review or uploaded by the groups above (for "example.com/gallery1", "example.com/gallery2" would be similar)
  • Matches the general style of those files. Locations, subjects, time period, photography/art style, EXIF if present, watermarks. Not every single thing needs to match, but we should be convinced the work is from the same source.
  • Comes from a source with a general waiver or license (no exclusion of specific files, or only exclusion based on criteria that we can tell without having the source available, for example: "the license only applies to photos taken in Italy")
  • Uploaded by a user who is not known for copyfraud.

These files will be marked as "This file was uploaded by a trusted user" and won't require a license review. - Alexis Jazz ping plz 17:41, 13 February 2019 (UTC)
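Since the criteria combine an "or" with a list of conditions that must all hold, the logic can be written out as a small Python sketch. The FileRecord fields and the function name are hypothetical labels for the conditions above, not an existing tool; judging each condition is still a human decision.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical fields, one per condition in the proposal.
    has_source: bool                    # a source was given at upload
    source_checkable: bool              # source still works or can be checked via an archive
    uploader_is_trusted: bool           # former/current license reviewer, administrator or OTRS member
    similar_source_to_reviewed: bool    # similar or identical source to reviewed/trusted files
    matches_general_style: bool         # style, EXIF, watermarks etc. match those files
    source_has_general_license: bool    # general waiver or license at the source
    uploader_known_for_copyfraud: bool


def covered_by_proposal(f):
    """True if the file would be kept without a license review under the proposal."""
    # Out of scope: no source at all, or a source that can still be checked.
    if not f.has_source or f.source_checkable:
        return False
    # First branch of the "or": uploaded by a trusted group.
    if f.uploader_is_trusted:
        return True
    # Second branch: ALL of the listed conditions must hold (AND, not OR).
    return (
        f.similar_source_to_reviewed
        and f.matches_general_style
        and f.source_has_general_license
        and not f.uploader_known_for_copyfraud
    )
```

A result of False only means the proposal does not cover the file; it says nothing about whether the file should be deleted.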

Acceptance of files from external sources without a license review: votes

Acceptance of files from external sources without a license review: discussion

Discuss details for this proposal here.

@Roy17: Category:Chama Ice Cave from farsnews was uploaded in December 2018. For that reason, I didn't include a fixed time period like 3 years because nobody can predict linkrot. - Alexis Jazz ping plz 17:20, 17 February 2019 (UTC)
  • I would  Support this as I had supported the original proposal by Fæ, if my proposal gets accepted this proposal will probably not be needed for newer files as they'll have archived external links, but as this practice hasn't been adopted yet many links have become useless. Though all these users most certainly aren't perfect, it beats deletion. I still think that we should've utilised the Internet Archive Bot years ago. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:42, 17 February 2019 (UTC)
@Donald Trung: Good point, that's why I have an incubator. I'm going to wait for your proposal to pass. After that, this will be more of a "grandfathering" proposal, which I think is more likely to be accepted. Fæ's proposal was too specific and not really worked out. Fæ said to "vote on the principle", but that's generally not what VPP is for. - Alexis Jazz ping plz 21:02, 17 February 2019 (UTC)
  • This looks reasonable. Yann (talk) 04:21, 22 February 2019 (UTC)
  • @Snaevar: I don't even.. What do you mean? What does the Russavia/Jimmy Wales portrait thing have to do with anything? Also, this proposal doesn't overrule the Precautionary principle. If an admin would have committed copyfraud, PRP deletion could still happen. But that would have to be judged on a case-by-case basis. - Alexis Jazz ping plz 14:15, 19 October 2019 (UTC)
  • @BevinKacon: The proposal is restricted to sources with a site-wide license or license restrictions that can be checked without having the actual source available. The general style must also match. In what scenario would a (semi-) automated upload cause a problem? I'm willing to consider excluding them if there is a good reason. - Alexis Jazz ping plz 20:02, 21 October 2019 (UTC)
    No external source is perfect or is in line with Commons policies. I come across derivative works, accidental uploads and FOP problems which have falsely gone through automated license review such as Commons:Deletion requests/File:Smoke from a wildfire (44218945101).jpg. See long history of User talk:Panoramio upload bot. If the license review back log needs reducing, bots are needed, not this.--BevinKacon (talk) 14:50, 26 October 2019 (UTC)
    @BevinKacon: This isn't about the backlog or bots. This is about linkrot. Though going forward, bots would help to prevent this from happening in the first place, but that's useless for what already happened. This proposal does not override COM:PRP, COM:DW or COM:FOP. - Alexis Jazz ping plz 15:02, 26 October 2019 (UTC)
    I think it's a very lengthy set of requirements, when simply a discussion at COM:VP on each case would be enough.--BevinKacon (talk) 15:20, 26 October 2019 (UTC)
    @BevinKacon: There are many cases. COM:VP could easily be flooded, and it wouldn't solve anything because actually reviewing them is impossible because the source links died and we have no "uploaded by a trusted user" status atm. It is a somewhat lengthy set of requirements, but it does cover many cases without the need for discussing every single case. Just check all the boxes and you're done. Those cases that it doesn't cover could be discussed on COM:VP or COM:VPC. In case it wasn't clear: many files from external sources do not have a licensereview template. They are not in the LR queue. They exist in a limbo where they could be deleted if someone notices the linkrot. - Alexis Jazz ping plz 15:38, 26 October 2019 (UTC)
  • @BevinKacon, Ankry, and El Grafo: It's easy to oppose, but do you have a solution? It is a simple fact that we already have tons of files from external sources without a license review, and many of those sources are no longer available. They are merely waiting for someone to notice that at which point they can be deleted. This is currently a slow process, but could very easily be accelerated greatly, I've accidentally done so myself not too long ago. (and undid myself when I noticed what was happening) This can lead to a massive loss of files. And many of those are quite obviously free, there is certainly no "significant doubt" which COM:PRP requires. And Ankry: you've seen COM:ANU#Varaine, right? This is already reality, and this proposal changes nothing about that. - Alexis Jazz ping plz 11:42, 31 October 2019 (UTC)
Let me say the above in a more frightening way: I'm not an admin, yet (with some effort) I have the potential power to delete hundreds of thousands of pictures for silly reasons. That's bad. And others undoubtedly have the same power. That's worse. - Alexis Jazz ping plz 11:47, 31 October 2019 (UTC)
I don't have a one-size-fits-all solution, because there is none. But there must be some middle ground between handling everything on a case by case basis as it has been done in the past and just grandfathering in everything uploaded by a certain group of people. All I've seen here so far are vague statements like "lots of stuff has lost its source". Has anybody actually had a look at where that stuff came from? I've had several cases where the original source seemed lost, but a replacement could be found – typically because the collection the file came from changed their system and did not redirect their old URLs (example). The sensible approach would be a more strategic one, imho. Before giving up and implementing another COM:Grandfathered old files, there should be some effort to fix what's fixable. Step back, analyze the situation and take inventory: Which sites are actually completely gone, which ones have moved, which ones could be replaced easily (keep ID, just change parts of the URL), which ones need to be fixed manually (like in my example)? --El Grafo (talk) 14:52, 31 October 2019 (UTC)
@El Grafo: I'm terrified of naming any examples, because they'll be the first to be headed for the shredder, simply by mentioning them. There is a category with over 10000 files from a news agency. Some are still available, but many are not. If I remember correctly, they should still be available at a different URL but that new URL is a lot of work to find and sometimes may be impossible. Another category has nearly 1000 files, but many of them are in use for infoboxes as they are photos of Wikipedia-notable people. The source website has vanished, only a portion was archived and another portion actually has a license review. The rest is left in limbo. Our current rules don't even allow keeping them, there is no "handling everything on a case by case basis", they will just be deleted even if a hundred people vote keep. - Alexis Jazz ping plz 00:16, 1 November 2019 (UTC)
@Kaldari: What alternative would you suggest? I put it in for uploads that match all the other criteria but still shouldn't be trusted because the uploader is manipulative. For example, Varaine. - Alexis Jazz ping plz 00:52, 8 November 2019 (UTC)
@Alexis Jazz: Crap, I misread the proposal. I thought they were OR criteria, not AND. Vote changed to support. Kaldari (talk) 17:58, 8 November 2019 (UTC)
@Ankry: I'm guessing here, but is your actual problem the line "Uploaded by a user who is not known for copyfraud." which you are interpreting as "we will delete stuff if a user commits copyfraud at some point in the future"? Because that's not what it means! Within the spirit of the proposal this can be reworded. For example, "The uploads of the user around the time period of uploading are not suspected of copyfraud". This is what it means anyway, but the current proposed wording is simpler. (if I write everything in legalese it would look like a EULA) - Alexis Jazz ping plz 14:11, 30 November 2019 (UTC)