blog.archive.org
Internet Archive Blogs
A blog from the team at archive.org
New Views Stats for the New Year
Posted on December 19, 2018 by Alexis Rossi
We began developing a new system for counting views statistics on archive.org a few years ago. We had received feedback from our partners and users asking for more fine-grained information than the old system could provide. People wanted to know where their views were coming from geographically, and how many came from people vs. robots crawling the site.
The new system will debut in January 2019. In the couple of weeks leading up to that, you may see some inconsistencies in view counts as the new numbers roll out across tens of millions of items.
With the new system you will see changes on both items and collections.
Item page changes
An “item” refers to a media item on archive.org – this is a page that features a book, a concert, a movie, etc. Here are some examples of items: Jerky Turkey, Emma, Gunsmoke.
On item pages the lifetime views will change to a new number.  This new number will be a sum of lifetime views from the legacy system through 2016, plus total views from the new system for the past two years (January 2017 through December 2018). Because we are replacing the 2017 and 2018 views numbers with data from the new system, the lifetime views number for that item may go down. I will explain why this occurs further down in this post where we discuss how the new system differs from the legacy system.
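To make the arithmetic concrete, here is a minimal sketch of how the new lifetime number is assembled. The function name, the per-year dictionaries, and all of the counts are hypothetical and chosen only for illustration; this is not the code archive.org runs.

```python
def lifetime_views(legacy_by_year, new_by_year, cutoff_year=2016):
    """Sum legacy views through cutoff_year plus new-system views after it."""
    legacy_total = sum(v for year, v in legacy_by_year.items() if year <= cutoff_year)
    new_total = sum(v for year, v in new_by_year.items() if year > cutoff_year)
    return legacy_total + new_total


# Hypothetical per-year view counts for a single item.
legacy = {2015: 1000, 2016: 1200, 2017: 900, 2018: 800}  # legacy system
new = {2017: 600, 2018: 500}                             # new system

old_lifetime = sum(legacy.values())         # what the item page showed before
new_lifetime = lifetime_views(legacy, new)  # what it shows after the switch
print(old_lifetime, new_lifetime)
```

In this made-up case the new system counted fewer views for 2017 and 2018 than the legacy system did, so the lifetime number drops even though nothing was lost from the years through 2016.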
Collection page changes
Soon on collection page About tabs (example) you will see 2 separate views graphs. One will be for the old legacy system views through the end of 2018. The other will contain 2 years of views data from the new system (2017 and 2018). Moving forward, only the graph representing the new system will be updated with views numbers. The legacy graph will “freeze” as of December 2018.
Both graphs will be on the page for a limited time, allowing you to compare your collection’s stats between the old and new systems. We will not delete the legacy system data, but it may eventually move to another page. The data from both systems is also available through the views API.
People vs. Robots
The graph for new collection views will additionally contain information about whether the views came from known “robots” or “people.”  Known robots include crawlers from major search engines, like Google or Bing. It is important for these robots to crawl your items – search engines are a major source of traffic to all of the items on archive.org. The robots number here is your assurance that search engines know your items exist and can point users to them.  The robots numbers also include access from our own internal robots (which is generally a very small portion of robots traffic).
One note about robots: they like text-based files more than audio/visual files.  This means that text items on the archive that have a publicly accessible text file (the djvu.txt file) get more views from robots than other types of media in the archive. Search engines don’t just want the metadata about the book – they want the book itself.
“People” are a little harder to define. Our confidence about whether a view comes from a person varies – in some cases we are very sure, and in others it’s more fuzzy, but in all cases we know the view is not from a known robot. So we have chosen to class these all together as “people,” as they are likely to represent access by end users.
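The rule described above (anything that is not a known robot is counted as a person) can be sketched roughly as follows. This is only an illustration: the post does not say how robots are detected, so matching on user-agent substrings is an assumption, and the marker list, function names, and sample user agents are all hypothetical.

```python
from collections import Counter

# Hypothetical markers for known robots; the real list is not published in this post.
KNOWN_ROBOT_MARKERS = ("googlebot", "bingbot")


def classify_view(user_agent):
    """Return "robots" for a known robot, otherwise "people"."""
    ua = user_agent.lower()
    if any(marker in ua for marker in KNOWN_ROBOT_MARKERS):
        return "robots"
    # Everything that is not a known robot is grouped together as "people",
    # even when confidence that it is a human is low.
    return "people"


def tally_views(user_agents):
    """Count views per class for an iterable of user-agent strings."""
    return Counter(classify_view(ua) for ua in user_agents)


sample = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/64.0",
    "SomeUnusualClient/1.0",
]
print(tally_views(sample))
```

Note that the unusual client in the sample falls into "people" simply because it is not on the known-robots list, which mirrors the "fuzzy" cases mentioned above.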
What counts as a view in the new system
How the new system differs from the legacy system
When we designed the new system, we implemented some changes in what counted as a “view,” added some functionality, and repaired some errors that were discovered.  
In some cases, the differences above can lead to drastic changes in views numbers for both items and collections. While this may be disconcerting, we think the new system more accurately reflects end user behavior on archive.org.
If you have questions regarding the new stats system, you may email us at info@archive.org.
Posted in News, Technical | 7 Replies
7 thoughts on “New Views Stats for the New Year”
Thanks, this is something people were asking for in Italy as well. Other than known robots, will all downloads still count towards “human” pageviews? I’m thinking of torrent downloads, for instance, which may show up with unusual user agents.
Thanks for the new stats and the blog post explaining them, Alexis.
I think video producers will appreciate the new stats.
I’ll forward this new explanation to producers who ask about the stats in the Community Media Archive.
Some links on how difficult it is to get good video consumption metrics and many others: https://threader.app/thread/1078003966863200256
Nemo,
Great article and right on the money. Basically any attempt to get a fine-grained assessment of b—s–t, just gives one a fine-grained view of the same b—s–t. Note, I am not saying here that we should ‘not’ try to make the ‘views’ thing better, only that we should not delude ourselves into thinking that this ‘fine-grained’ version is any ‘better’ when in reality it is not and I can give numerous reasons why that is the case. Until we have a better explanation of what those bots are doing and where they are coming from, such assurances that this ‘new’ fine-grained approach will be any better are meaningless. Of course the question then becomes “why change it for something which cannot be shown to be any better?” Just rearranging the chairs on the Titanic when the end result is the same.
Gerry
What happens when bots are used to harvest items for individuals to view off-line? This could never give a reliable measurement of human views, since the item can be viewed by multiple individuals after it is harvested. We would need much more information about who or what is accessing the items, and thinking that what we have here is fine-grained does not in reality make it so.
Gerry
Happy future
Thanks for the new stats and the blog post explaining them, Alexis.
Happy New Year
Sajad,
Cheers