Commons:Bots/Requests/Smallbot 6

Smallbot (talk · contribs)

Operator: Smallman12q (talk · contributions)

Bot's tasks for which permission is being sought: An extension of Commons:Bots/Requests/Smallbot 5. User:Bdcousineau has requested that the bot:

  1. replace {{NARA-cooperation}} with {{Gerald R. Ford Presidential Library and Museum-cooperation}} on files in the categories listed at User:Bdcousineau/Sandbox5 and on future files uploaded by NARA;
  2. change the institution to Institution:Gerald R. Ford Presidential Library for those files;
  3. remove the first page, which gives a location/copyright notice, from files such as this one in Category:PDF files in English from the Gerald R. Ford Presidential Library and Museum;
  4. add a custom {{Uncategorized-GFPLM}} template (similar to {{Uncategorized-NARA}}) to contributed media lacking categories.

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): one time run

Maximum edit rate (e.g. edits per minute): 10

Bot flag requested: (Y/N): N

Programming language(s): VBScript (JavaScript, XMLHTTP, MSHTML, XMLDOM, COM). Acrobat 8 IAC/OLE Acrobat IAC & SDK. Source will be made available after cleanup as upload proceeds.

Source: User:Smallbot/source/GFPLM/Smallbot 6

Smallman12q (talk) 03:37, 18 November 2012 (UTC)

Discussion

The only task that may be controversial is the removal of the first page of certain PDFs. Bdcousineau has suggested that the first page, which provides copyright/location information, is not needed, as it makes for a poor thumbnail in categories. Smallman12q (talk) 03:37, 18 November 2012 (UTC)

You say the bot is automatic, but for the first-page removal task you will have to supply a manual selection of PDFs, as not every PDF in the category has a first page that should be removed. Otherwise this all looks straightforward. Test run? --Dschwen (talk) 05:09, 18 November 2012 (UTC)
The first page will only be removed if it contains the text "The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material." This would be done automatically. Smallman12q (talk) 21:41, 18 November 2012 (UTC)
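The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the bot's actual VBScript/Acrobat code: the function names are hypothetical, and it assumes the text of the first page has already been extracted by some PDF tool. Whitespace is normalized because extracted PDF text often wraps lines at arbitrary points.

```python
import re

# Notice text quoted in the discussion above.
COPYRIGHT_NOTICE = (
    "The copyright law of the United States (Title 17, United States Code) "
    "governs the making of photocopies or other reproductions of "
    "copyrighted material."
)


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so extraction line breaks do not break the match."""
    return re.sub(r"\s+", " ", text).strip()


def first_page_is_notice(first_page_text: str) -> bool:
    """Return True only if the page text contains the quoted copyright notice.

    Hypothetical helper: the caller is assumed to pass the already-extracted
    text of page 1; no PDF is read or modified here.
    """
    return _normalize(COPYRIGHT_NOTICE) in _normalize(first_page_text)
```

Under this sketch, a PDF whose first page lacks the notice would be left untouched, which is what allows the task to run without a hand-picked list of files.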

I'm squeamish about removing the front pages. I don't think thumbnails are particularly important for PDFs, so that doesn't seem like a compelling reason to me. And if that really is a problem, maybe we can submit a feature request to ask for thumbnailing an arbitrary page (as was recently done with video, as I understand it). For reusers who print out these documents, or recirculate them electronically, it makes sense to give them the front page so they can decide whether to use it or not. --99of9 (talk) 09:10, 18 November 2012 (UTC)

That is a good point, I agree. --Dschwen (talk) 14:40, 18 November 2012 (UTC)
That was my concern... that's why I'm seeking consensus as to whether these first pages giving the location/copyright notice should be removed for these uploads (and possibly future NARA uploads). Smallman12q (talk) 21:41, 18 November 2012 (UTC)

Please ignore me if I'm not supposed to weigh in, but I like 99of9's suggestion - the copyright page is NOT part of the original record; it was added recently during the digitization process. My intention is that viewers/users are not turned off/away, thinking the historic document is unavailable. Bdcousineau (talk) 00:26, 19 November 2012 (UTC)

Anyone is welcome to weigh in. Thanks for the info. Which suggestion did you like - the thumbnailing feature request? So you're ok with keeping the copyright pages in the documents? --99of9 (talk) 00:37, 19 November 2012 (UTC)

I think we should move this request forward by excluding the PDF first-page removal task for now. The other tasks seem to be fairly uncontroversial, and test runs could be performed on them. --Dschwen (talk) 18:46, 19 November 2012 (UTC)

I'm ok with moving forward by "excluding the PDF 1st page removal task" for now. I like the "thumbnailing an arbitrary page" solution: the copyright page can stay in, I just found it misleading as a first page. I didn't realize there were options other than including it/discarding it. I appreciate learning this for the future. Bdcousineau (talk) 02:01, 20 November 2012 (UTC)

If removing the first page is important in any way, I suggest moving the first page to the end of the document instead. This would be nearly as useful as both keeping it and removing it. --Pere prlpz (talk) 15:20, 21 November 2012 (UTC)

  Tasks 1, 2, and 4 are done. I'll wait for consensus as to how best to modify the PDFs, if at all. Smallman12q (talk) 21:20, 21 November 2012 (UTC)
What happened here? Your bot should check if a modification is necessary before performing it. --Dschwen (talk) 21:25, 21 November 2012 (UTC)
Oops... I didn't check whether it was already there. It'll appear the same, but I can fix it if you want. Smallman12q (talk) 21:41, 21 November 2012 (UTC)
The bot visited the page twice in 20 minutes. Why? Anyhow, I think such errors should be cleaned up. It's no big thing and, as you said, does not impact the appearance of the template, but it could potentially be confusing to editors and it just isn't right. --Dschwen (talk) 23:14, 21 November 2012 (UTC)
  Fixed. Thanks for taking an interest in the upload. You can view the source here. The reason for the double visit is that the bot went through a list of categories one by one, loaded their pages, and added the institution tag. None of the pages originally had an institution tag, so I didn't add a check. However, several of the pages were in multiple categories, so without a check the institution tag was added multiple times. (Note to self: add a check in future.) Cheers. Smallman12q (talk) 01:35, 22 November 2012 (UTC)
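The kind of check discussed here might look something like the sketch below. It is written in Python for illustration only and is not the bot's actual source; the function names, the exact form of the tag, and the choice to prepend it to the page text are all assumptions. It shows two independent safeguards: visiting each page once even if it sits in several categories, and skipping the edit when the tag is already present.

```python
def unique_pages(pages_by_category):
    """Return each page title once, in first-seen order.

    pages_by_category is a hypothetical mapping of category name to a list
    of page titles; a page listed under several categories is kept once.
    """
    seen = set()
    ordered = []
    for titles in pages_by_category.values():
        for title in titles:
            if title not in seen:
                seen.add(title)
                ordered.append(title)
    return ordered


def add_institution_once(wikitext, tag):
    """Add the tag only if it is not already in the page text.

    Prepending is an assumption made for this sketch; the point is that
    running the function twice gives the same result as running it once.
    """
    if tag in wikitext:
        return wikitext  # already tagged: no edit needed
    return tag + "\n" + wikitext
```

Either safeguard alone would have prevented the duplicate tags; using both also avoids loading the same page a second time.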
Has a consensus been reached on what to do with the PDF modification task? If it hasn't by now, we should probably close the request. --Dschwen (talk) 17:13, 19 February 2013 (UTC)
No. For now we upload as is. Future uploads will come directly from the NARA ARC catalog, so this shouldn't be a problem. You can close. Smallman12q (talk) 21:31, 24 February 2013 (UTC)

Closed. Tasks 1, 2, and 4 approved. --Dschwen (talk) 21:46, 24 February 2013 (UTC)