Digging deep in Trove: Success, challenge and uncertainty

Author:

Dr Marie-Louise Ayres

Publication date:

Tuesday, 4 September, 2012

Abstract:

Trove, Australia’s national discovery service, provides access to more than 300 million resources managed by more than 1000 Australian and international organisations, and by members of the public. Trove has been enthusiastically embraced by Australians from all walks of life, and from around the country. More than 50,000 unique visitors search the service every day, and tens of thousands of Australians engage much more actively than by just searching. Trove users correct newspaper text, add content, tags and annotations, and create online lists of Trove resources – all within the Trove community space. Trove has extended the reach of the National Library and Trove contributors far beyond our expectations. But while it is gratifying to bask in public and political acclamation, we must also ask – why is Trove a success? Is Trove as a whole a success, or just some parts of it? And what challenges do the answers to those questions pose to libraries’ traditional roles, priorities and strengths?

My esteemed predecessor, Dr Warwick Cathro, spoke to this meeting in September 2010 about Trove’s user engagement features, and other Library staff have published papers on many aspects of Trove.

In this paper, I will give you a brief overview of why we developed Trove, how the public has responded to the service, whether it has met our wishes and expectations, what I consider to be the key factors in Trove’s success, and what challenges and opportunities this success offers.

The National Library has a long history of developing innovative discovery services to help users discover Australian collections held in cultural and research institutions across the country. Most of these services – dating back to the 1990s – were focused around specific formats, while some were subject based. They included the Register of Australian Archives and Manuscripts, Picture Australia, Music Australia, Australia Dancing, Australian Research Online, the Pandora Web Archive, and the Australian Newspapers service, as well as smaller and more specialised services. We also developed Libraries Australia free, opening up the entire national union catalogue for public searching.

Each of these services relied on national willingness to collaborate, leadership from the National Library itself, commitment to supporting research, and willingness to invest in the service of Australian users. At the time, each of the services was highly innovative, and some – like Picture Australia – were copied around Australia and the world. They were very popular with our users, and became indispensable parts of Australia’s research infrastructure.

However, the format and subject specialisations we developed came at a cost. Users had to search in many different places to find content of interest, and each service offered slightly different search and browse functions to navigate that content. In a Google-oriented world, we needed to make it much easier for our users to find and get material they wanted. We wanted our users to find material they were actively looking for, and to find material in many different formats and from many different contributors to meet their research needs. We wanted them to be curious and explore a wide world of content. Our separate discovery services were not meeting this need.

There was, of course, another, more practical, problem for the National Library. We recently reviewed our long list of software applications developed in-house. As one of our clever young IT people wrote: our ‘keep-writing-new-stuff-and-never-turn-anything-off-approach is catching up with us’.

Certainly our habit of creating new specialist discovery services was catching up with us. All of these services were developed at different times, by slightly different groups of people, using different tools, software, approaches and interfaces. And of course that meant that we were attempting to maintain an ever-growing suite of specialist services in a resourcing environment that was contracting rather than growing.

With so many of our resources devoted to maintaining these services, we could not roll out service improvements in a timely way. We would come up with a good idea for improving one of our services – but we could not afford to implement that improvement in the other services. Requests for enhancement could languish for years, and some services were so old there was no point even trying. We were particularly frustrated that just as the technical world was catching up with our dream of making it possible for our users to really interact with our content and each other, we simply could not make this happen across the full suite of our discovery services.

We started to think about a new framework in late 2006 and worked on this idea – then called the Single Business Discovery Service – throughout 2007. The Library decided to fund the project in the 2008-2009 financial year, and in October 2008 the project officially commenced, with a dedicated project team in place. We launched Trove stage 1 in October 2009, continued development on stages 2 and 3 in 2010, and completed stage 4 – the final formal project stage – in May 2011. In June 2012 we released Trove 5.0, and development continues, although at a slower pace than was the case in the project phase.

So where are we at the beginning of September 2012? I would like to share some key statistics with you:

Trove provides access to more than 300 million resources in all formats, contributed by well over 1000 cultural organisations and vendors. Of these, around 120 million are licensed articles available online to patrons of subscribing libraries. Of the remaining content, nearly 140 million resources are freely available online, with the remainder analogue resources that require additional steps for users to access.
Trove now averages more than 50,000 unique user visits every day, and this average is growing every month. In 2011-2012, there were nearly 15 million unique visits to the Trove website. Total page views grew by a third in the 2011-2012 financial year. Almost 10% of all use is via mobile devices. 75% of all users arrive via Google, thanks to our efforts to have our content indexed by Google on a regular basis.
Trove accounts for 65% of all National Library pageviews, and occupies three-quarters of the Library’s bandwidth. Digitised newspapers – which are by far the most popular content – occupy 60% of the Library’s 42,000 Terabyte digital collection.

So Trove is big!

Web 2.0 and social media functions were integrated into Trove from the start. This means that our users can find what they want, get a hold of it, and interact with it in many ways.

Our standout success is in newspaper correction. Scanning and Optical Character Recognition of old newspapers can only go so far, and at least 35% of the computer-generated text from scanned newspaper images contains errors. By September 2012, Trove users had corrected more than 70 million lines of newspaper text. At an average of 15 seconds per corrected line, and at the lowest Library pay rate, that means that the legion of newspaper volunteers had contributed nearly 300,000 person hours, worth around A$12 million, to improve newspaper content. Three individuals have corrected more than 1 million lines of text each.
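As a rough check on the arithmetic behind these figures, the following minimal sketch recomputes the person hours from the line count and the average correction time quoted above. The hourly pay rate is not stated in this paper, so the sketch only back-derives the rate implied by the quoted A$12 million; the function and variable names are illustrative assumptions.

```python
# Back-of-envelope check of the volunteer-effort figures quoted in the text.
LINES_CORRECTED = 70_000_000   # "more than 70 million lines" (from the text)
SECONDS_PER_LINE = 15          # average time per corrected line (from the text)
QUOTED_VALUE_AUD = 12_000_000  # "around A$12 million" (from the text)


def person_hours(lines: int, seconds_per_line: int) -> float:
    """Total volunteer effort in hours."""
    return lines * seconds_per_line / 3600


def implied_hourly_rate(value_aud: float, hours: float) -> float:
    """Hourly rate implied by the quoted value (derived, not stated in the paper)."""
    return value_aud / hours


hours = person_hours(LINES_CORRECTED, SECONDS_PER_LINE)
rate = implied_hourly_rate(QUOTED_VALUE_AUD, hours)

print(f"{hours:,.0f} person hours")
print(f"A${rate:.2f} per hour implied")
```

On these inputs the total comes to a little under 292,000 hours, consistent with "nearly 300,000 person hours", and the implied rate is roughly A$41 per hour.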

They have also added more than 1.5 million tags and 42,000 comments to resources they find in Trove, and have created more than 25,000 lists on subjects as diverse as quotas on the import of margarine from the 1940s to the 1960s, climate change, Aboriginal art, mining, knitting patterns, and family or local history. Our users talk to and help each other via our Trove Forum. They have posted nearly 2,500 messages in the Forum – some of which have been read by more than 1000 individuals. We have more than 2,600 Twitter followers – five times the number of just over a year ago. In addition, Trove users have added more than 140,000 images to our Trove Flickr pool, which is harvested into Trove.

These are big numbers, but of course we have to look beyond numbers to impact.

One way to look at this is ‘reach’. Trove reaches all parts of the world, and we are often surprised at the discoveries overseas researchers find in our service. For example, an American historian was astonished to find Australian newspaper articles on Lincoln in Trove, visited Australia for the first time to present a conference paper on Lincoln and Australia, visited research archives here and found more leads which have opened up a whole new area of enquiry for him. As he told us, he would never have considered Lincoln’s influence in Australia if he had not found these digitised newspaper articles – simply by Googling.

Reaching international researchers is wonderful, but naturally our main focus is to serve the Australian public.

We believe that Australians—wherever they live and whatever they do—should be able to easily discover and obtain the information they seek and to engage with rich digital content to support their research and lifelong learning.

We need to do more work to analyse the Trove user demographics to see how well we are doing with this aim. But some preliminary work has shown us that Trove has a truly national reach. Trove usage is closely aligned with actual population distribution. This graph shows the percentage of Trove use against the distribution of Australians in each of our states and territories.

We believe that non-metropolitan use is slightly lower than it should be and continue to think hard about how to reach rural and remote communities. This is a core aim of the Australian Government, and of our portfolio department, the Department of Regional Australia, Local Government, Arts and Sport. We have recently discussed ways of helping our Northern Territory library colleagues – who have ongoing contact with remote and often very disadvantaged Indigenous communities – to alert these citizens to Trove and to content that may be of interest to them. For example, we know that many Indigenous elders have been delighted to find photographs of their family members and their country in earlier times, and that in some cases elders have used these materials to connect young Indigenous people with their history and culture.

Another way to look at impact is to look at what our users tell us, each other or the wider community about the ways in which Trove has supported their research. We receive constant compliments from users – academics, private researchers, teachers and students. We hear of research that would hardly have been possible before Trove, of new lines of enquiry, of Australians becoming engaged with their history and culture in new and exciting ways. This is exactly what we hoped would happen. We get very few complaints – but lots of requests for more, more, and more content – especially digitised newspapers.

Another very important aspect of impact is dissemination of Trove data. We made an early decision that Trove should be an open data store to the maximum degree possible. Almost all Trove content is freely available using our Trove Application Programming Interface (or API). So as well as being a content aggregator – where users can search just one service instead of many – Trove is now also a disseminator.

We are already seeing Trove content turn up in the specialist websites of individuals and organisations. Digital historians have developed new tools to ‘mine’ Trove content – especially newspaper content – for many different historical purposes. Some very recent examples of API use include: an experimental search system across digitised US, New Zealand and Australian newspapers; connecting Trove articles with records in a specialist website on Australian performing arts; displaying New Zealand material held in Australian libraries in the Digital NZ service; and extracting a set of records relating to adoption in Australia. We hope to see many more creative uses of Trove data in apps, websites and mashups.

As I think you will agree from the amazing level of Trove use and user engagement, Trove is certainly a success. In 2011, the National Library won the Australian Government’s top prize for Excellence in eGovernment, reflecting the esteem in which the service is held by our peers and by our users. It is worth reflecting on the key factors leading to that success.

The first is that Australian libraries have a thirty-year history of collaborating to efficiently manage, describe and provide access to our combined collections. November 2011 marked 30 years since the launch of the Australian Bibliographic Network, later called Kinetica, and still later called Libraries Australia. 1200 member libraries cooperate to create a high quality national union catalogue. Our National Bibliographic Database contains 23 million bibliographic records and 50 million holdings. Every one of those records and holdings is pushed through for public discovery to Trove – which focuses on the needs of Australians – and to OCLC’s WorldCat, with its international reach.

These 30 years of cooperation have developed a very strong professional community. Australian librarians feel a strong sense of common purpose, and very strong pride in what their collective effort has achieved for the Australian public. They are delighted that the National Bibliographic Database is the backbone of Trove, and feel a real sense of ‘ownership’ of this national flagship.

Our cousins in galleries, museums, archives and historical institutions do not have the long history of cooperation we have enjoyed in libraries, so working with and encouraging them to commit to a national access strategy has taken, and continues to take, much more work. We find that some of these institutions are very willing to be part of national discovery, some espouse willingness but do not commit the resources required to actually make it happen, some suffer from serious capacity constraints, and some see national discovery as a threat to their local institutional ‘brand’. We have a lot of experience with institutions in other sectors, and we are very persistent about encouraging them to see the benefits in contributing to Trove.

The second major success factor is that the National Library has had lots of practice! Each of the specialist services I mentioned earlier involved identifying an audience and user need, interacting extensively with groups representing those users, encouraging other cultural organisations such as art galleries, museums and archives to contribute to a national service for a greater good, and developing services on very lean budgets and in quite short timeframes.

A third major success factor was that technology finally caught up with our dreams. Web 2.0 and social media technologies enable us to offer services we wanted for our users 10 years ago. Because we had many ideas for how such technologies could be deployed, we grasped these opportunities quickly and put them to work for our users.

There is a very real feature of the Australian character which we call ‘having a go’. It is no accident that Australia was the first country to archive websites, was a leader in collection digitisation and digital collection management, has led the world in newspaper digitisation services, and has been a leader in discovery. We did not wait until we were given extra money to solve problems – indeed we have never been given any additional funding for any of these activities. We did not wait until solutions were in place or the market could meet our requirements, and we have forged ahead with our visions even where no other models existed.

And last, of course, were the people who made Trove happen. This is the development team who worked on Trove during its project phase.

Warwick Cathro’s leadership was inspirational and essential, and it was wonderful that the eGovernment award came literally in Warwick’s last week after 33 and 1/3 years spent working at the National Library. Many of the people in this photograph still work on Trove, some have left the Library for new opportunities, and some are working on other Library priorities, including our new digital library infrastructure. Beyond this group, of course, were the cataloguers, the digitisation teams, and the communications and marketing team who assisted with messaging and promotion, and who actually came up with the Trove name. There were the reference staff with real world experience of how users move through the information universe. There were the IT infrastructure staff, making sure the servers and databases kept running all day every day. There were even our financial gurus, keeping a careful eye on the project bottom line. Trove development was governed by a Project Board (of which I was a member), and of course our Corporate Management Group and our Council oversaw the whole project. The number of people who worked full time on Trove’s development was really quite small. But the ‘family’ that nurtured Trove was much larger than the development team.

This is still the case. This year, only around 10 of the Library’s 420 staff work full time on the Trove service. This is probably surprising to many who see Trove’s size, its reach and its success. But beyond that core team, there are many more who contribute in some way to Trove’s ongoing success. I cannot emphasise enough how important it is to have this wide sense of ownership in the Trove enterprise, and pride in its achievements.

We set out with a set of aims for the Trove project, and we have met most of those aims.

Looking back now, there are some things we wanted to achieve that did not work out. Early in the project, we imagined a world in which we or other libraries could build what we called ‘Trove Local’ – a world in which local ‘views’ of Trove could replace individual library catalogues. This was an ambitious vision and one which was beyond both our means and – I think it is fair to say – the appetite of our Australian library community.

We spoke of Trove as being the first place Australians would come to pursue their research, of it being a place where they could find everything relevant to their interests. In other words, we saw Trove very much as a starting place. Now – with 75% of all users arriving from Google, and more and more of our content available in other services – we see Trove as a starting place, a destination and – as more of our content is used elsewhere – a ubiquitous part of the Australian information commons.

Not all of our experiments with social media have been successful. For example, we tried and abandoned a Trove blog and a Trove Facebook presence, when they simply did not justify the resources we expended on them. But we keep our eye on new social media opportunities and are willing to keep trying new opportunities and to stop using them if they do not help our users to interact with our content, with us and with each other.

One area of Trove that we feel is under-loved is the People zone – information about Australian people and organisations. We will be adding significantly to the people content – by loading the Virtual International Authority File and researcher data from Australian universities – and hope to see the People zone refreshed and more useful in the next couple of years.

While our failures have been few, our successes have not always been exactly what we anticipated.

The popularity of the Australian newspapers component of Trove, especially text correction facilities, has far exceeded our expectations. As I’ve mentioned, more than 70 million lines of newspaper text had been corrected by individuals by September 2012. Software developers are keen to develop solutions to improve the text automatically – but those solutions are unproven, we cannot afford to experiment in this way at the moment, and in the meantime our text correctors get much pleasure and pride from their contribution to this national endeavour.

The National Library began its newspaper digitisation program – initially with just one major daily newspaper from each Australian capital city – with no additional funding, but with the strong belief that making this content freely discoverable and searchable would democratise access to Australian history. We could not have anticipated the incredibly varied use of this content, nor the ways in which it has allowed totally new areas of research to be offered. The majority of newspaper use is by family and local historians. Australia has an aging population, many of whom are well-educated and want to pursue research into their own and their community’s pasts. Trove’s newspapers have brought these non-professional historians right into the Library’s orbit and have, in effect, given us a huge new support base which simply did not exist in the past.

It has also opened up a whole new area of library cooperation. The Library is now running a very successful Contribution Model for newspaper digitisation. Libraries and historical societies pay us to digitise, manage and deliver local newspaper content through Trove. In the next financial year alone, we expect to digitise more than 2 million newspaper pages through this contribution model. This model offers the kinds of efficiencies that our Australian Bibliographic Network (ABN) introduced 30 years ago. Instead of every Library doing its own thing, trying to set up digitisation programs, trying to preserve digitised content for the long term, trying to figure out ways of delivering content to audiences with high expectations, we now have a robust model for a national digital newspaper library.

So, the digitised newspapers component of Trove has been a huge success: for our users; as a communications and marketing exercise for the Library; and for the Australian library community.

However, one of the slightly unsettling results of this uptake of digitised newspapers is that it has swamped use of all the other resources that – for decades – our profession has been dedicated to. I need to convey some caveats about the following graphs. As we all know, how we count things makes a big difference. In these graphs, for example, we are actually counting metadata records. This means we are counting newspaper and journal articles as ‘1’ – and we are also counting entire personal archives and big collections of pictures and runs of journals that are described with just a single MARC record as ‘1’. Nevertheless, it is worth looking at this graph of the breakdown of works in Trove.

And then a graph of which content is most popular:

If items and usage are viewed together, the disparity between the volume of particular content types available and the use made of them is quite obvious.

We have 7 million newspaper pages in Trove – and searching on those pages accounts for more than 80% of all use of the service. My background is in Manuscripts, and I love these materials with a great passion. As you can see, they account for a tiny proportion of both content and use. We have put a huge effort into including journal articles – but as you can see, use of the article zone is very low compared with the volume of content that can be discovered.

For most users, Trove really is about the full text of newspapers. The material we have spent so many decades collecting and describing, and for which we have developed truly efficient national cataloguing, inter library loan and discovery systems, is not what primarily attracts our users. They want full content. Now, none of us is going to stop collecting books (whether they are print or e), or acquiring datasets for our users, or working hard to collect unique and rare materials which tell wonderful stories about our past. We must collect the web domain and we must collect born digital material for our future users. But at the National Library, we are certainly moving more of our resources into making this full text available for our users. This year, we are starting a program of reducing overseas collecting – except in our great areas of strength, South-East Asia and the Pacific – and moving most overseas collecting to digital form, in order to redirect our resources to increasing our digitisation capacity. We will continue to digitise our ‘boutique’ collections – pictures, maps, music – but will increasingly emphasise bulk digitisation of books, journals and newspapers.

In some cases, we have been surprised at *who* has used which services and who has not. For example, when we decided to develop Trove so that licensed articles could be found there and used by patrons of subscribing libraries, we thought that our main target audience for this service was registered patrons of the National, state and territory libraries. Our data, patchy though it is, suggests that most take-up of the service has been by patrons of university libraries, while take-up by NSLA library patrons is low and is likely to remain so. This has raised quite a few questions for us that we are considering at the moment.

Similarly, we invested in developing an API (Application Programming Interface) for machine to machine access to almost all Trove content. We knew and indeed hoped that the API would be used for things we could not even imagine. But we had expected that Australian libraries would use this API to make content of interest to their local user base – for example, photographs of towns from one state held and digitised by another state library – available in their local services. Instead, use of the API to date has been by those undertaking sophisticated text mining, or data mashups, and we have had a number of approaches by commercial discovery layer vendors wishing to expose Australian newspaper content in their services. So we are still adjusting our thinking around the API and suspect that its time has not yet quite come.

One issue that was foreseen by some within the Library and not by others was that offering a very large discovery service with advanced user engagement features leads people to ask questions! In 2011-2012, the Trove team answered nearly 3000 questions from Trove users, as well as directing a large number of enquiries about newspaper digitisation plans to our colleagues in the Digitisation team. Around 40% of these enquiries are about how to use Trove – the rest are about its content.

This meant that a significant new task had to be managed within a finite service team. We have managed this challenge by:

Providing as much self-help on the website – and on other channels such as YouTube – as possible. We are refreshing this content at the moment, and hope to prepare a new set of instructional videos this financial year.
Encouraging our users to help each other through the Trove forum. Experienced newspaper correctors, in particular, are generous with the assistance they provide to ‘newbies’.
Looking carefully at common user problems and prioritising solutions in our development plan.
Using the same enquiry software as the rest of the Library, and most other NSLA libraries. This software has now evolved to the point where we are able to efficiently pass enquiries that originate with Trove to other parts of the Library or even to other libraries that may be better able to deal with the enquiry.

Our aim is to maximise one-to-many answers in the form of self-help, minimise one-to-one answers through our enquiry service, and to answer individual questions according to the Library’s service charter.

It would also be true to say that in the heat and rush of the Trove project, we did not really address some important governance and sustainability issues. In the last year or so, going ‘back’ to address a number of these issues has been a major focus. I think you would be surprised at how many hours of discussion by how many people were required to come up with a clear picture of what Trove is, what we want it to be, what our strategic directions should be, what content is in and out of scope for Trove, which users we most want to serve etc. We are now moving from uncertainty to a clearer path and coherent plan for the future.

Of course there are many things that we would like to offer through Trove that our current resource base will not support. We have a list of hundreds of potential contributors to Trove – mostly of unique or difficult to find Australian content. But last year, we managed to add only seven new contributors, although we hope to add many more in 2013. Those seven were important and big collections, but this disparity between potential content and what we can achieve will give you some sense of the constraints we work within. Most non-library potential contributors simply do not have the technical and intellectual capacity to make their content available to aggregators in any streamlined way. We are currently changing our focus from bespoke solutions – and a lot of hand-holding – to thinking about ways in which we can encourage capacity improvement and deploy our resources in more targeted and efficient ways.

Similarly, we have a long, long list of both urgent and desirable new services and service enhancements. They include: rewriting our newspapers component to deal with a much bigger than expected body of content; making the newspapers zone mobile device friendly; improving the functionality of lists so that they can be used in more flexible, specialised and possibly even branded ways; providing more support to include and display newspapers in languages other than English; developing better and cheaper ways of adding new content to the service; improving and building the people component; and making it easy for users to reach potential ordering services for resources they find in the service, among others.

We rode an innovation wave between 2008 and 2011; 2012 has been a year of consolidation; and I expect that 2013 to 2014 will see us begin on a new but more modest wave of innovation. Of course as Trove gets bigger, and attracts more users, more of our capacity – servers, bandwidth, software, people – must be devoted to maintaining performance of the current system, rather than continuing on a sharp innovation curve.

I hope this introduction to the thinking that led to Trove, the results of the project, my thoughts on what was needed to make Trove a success, and where we might head next is helpful to those of you – and I know there are several – who are considering developing national discovery services in a range of formats.

It has been a wonderful adventure so far, has capitalised on years of investment, shaken some of our certainties, opened up Australian history and culture in new ways, given us many headaches – and exercised our organisation’s intelligence to the full.