Monday, 22 November 2010

The UK National Archives Catalogue Day

The annual Catalogue Day was held at Kew on Friday 19 November. About 50 people (my guess) attended, including staff and two clients from outside the UK. Most of the presentations were short, at 20 minutes. Two were more substantive, of which I was only able to attend the first, on Resource Discovery, by Tim Gollins, who is TNA's Head of Digital Preservation and Resource Discovery.

Gollins described himself as a computer/information technology geek, but did a good job of presenting the material in a non-technical way. He explained that he views the resource discovery challenge as the scientific one of connecting clients' needs with the organization's information, projected through a computer interface.

The original TNA online catalogue, PROCAT, was developed in 2000 with eight million entries and has grown to 11 million. Although the interface has changed, the catalogue today is still built on PROCAT and its departmental hierarchy.

In ten years the technology explosion has changed client expectations and presents further challenges:
- the volume of data is increasing. With an average 45 facts per catalogue entry there are 500,000,000 facts in the catalogue, and millions more when other data such as Documents Online are included.
- the user base is more widely distributed. For every document ordered in the reading rooms 221 are now ordered and delivered online.  That's up from about 170 a year ago.
- the construction of the current catalogue system is inflexible. Add-ons and a proliferation of tools lead to confusion for users.
- search systems technology has changed radically in the past decade.
- funding limitations are placing a premium on economies of scale. Paradoxically, a uniformity of approach can lead to increased flexibility for users at lower cost.
- the public policy agenda now emphasizes the "big society" and "transparency".

A catalogue, a means to find a resource, should be as simple and comprehensive as possible and present a coherent and consistent view of the collection reflecting a user viewpoint.

Gollins explained that the aim of their resource discovery initiative is to provide a facilitated discovery browsing approach, much as in online shopping, where information can be filtered based on the terms the user understands. In his view these are subject, place and people.

On subject, Gollins contrasted TNA's objective with Google's, which is to find the one hit sufficient to answer a request in a "good enough" manner. TNA needs to be comprehensive, identifying all relevant resources. For them "good enough" isn't. The approach is through taxonomy, which seems to mean tags.
On place, they are working on map interfaces as a means to browse and filter information. The work is at an early stage.
On people, they are looking to tune the search system to respond to people's names, both for the prominent and not so prominent.
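
A minimal sketch of this shopping-style filtering, assuming each catalogue entry carries tags for subject, place and people, might look like this in Python. The record references, field names and functions are all invented for illustration; nothing here reflects how TNA's system is actually built.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A hypothetical catalogue entry tagged along three facets."""
    reference: str
    subjects: frozenset
    places: frozenset
    people: frozenset


# Invented sample data, not real catalogue records.
CATALOGUE = [
    Entry("EXAMPLE/1", frozenset({"poor law"}),
          frozenset({"Norfolk"}), frozenset({"Mary Smith"})),
    Entry("EXAMPLE/2", frozenset({"poor law", "workhouses"}),
          frozenset({"Kent"}), frozenset({"John Brown"})),
    Entry("EXAMPLE/3", frozenset({"courts"}),
          frozenset({"Kent"}), frozenset({"John Brown", "Ann Lee"})),
]


def filter_entries(entries, subject=None, place=None, person=None):
    """Keep only the entries that match every facet value supplied."""
    results = []
    for entry in entries:
        if subject is not None and subject not in entry.subjects:
            continue
        if place is not None and place not in entry.places:
            continue
        if person is not None and person not in entry.people:
            continue
        results.append(entry)
    return results


def facet_counts(entries, facet):
    """Count how many entries carry each value of one facet,
    as a shopping site does beside its filter checkboxes."""
    counts = Counter()
    for entry in entries:
        counts.update(getattr(entry, facet))
    return counts


if __name__ == "__main__":
    poor_law = filter_entries(CATALOGUE, subject="poor law")
    print([e.reference for e in poor_law])
    # Narrow the same result set by place, as a user would with a second click.
    print([e.reference for e in filter_entries(poor_law, place="Kent")])
    print(facet_counts(CATALOGUE, "places"))
```

Unlike a "one best hit" search, every matching entry is returned, and each further filter only narrows the set the user already has.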

TNA have a substantial cataloguing team working to improve the data on which resource discovery is based and are looking to find ways to enlist users to contribute (big society). They hope to build communities around records and open up data in bulk to allow geeks to create new interfaces and build mashups.

I asked about compatibility with initiatives in archives elsewhere, nationally and internationally. While there is some conversation internationally, TNA believe they are ahead of their counterparts. I would expect that making data available in bulk would be some indication that anything they develop should be amenable to integration on an international scale. It would be unfortunate if the archival community did not eventually work toward an international catalog(ue), just as WorldCat is doing for bibliographic information, so that the client does not have to identify which organization holds the information they seek as a first step in the search.

Several of the shorter presentations referred to the TNA Labs website, in particular the UK History Photo Finder. It has a map interface. I wasn't successful in getting it to work (it is a Labs application, so that kind of glitch is to be expected).

I was very sorry to have to miss the talk "Views from the bottom, Voices from below: Poverty and Punishment, Law and Lunacy" by Paul Carter and Sarah Hutton.

Finally, I'd like to commend TNA for holding this event, and for their other initiatives to communicate two-way with their clients, an example other organizations would be well advised to follow.


Paul Jones said...

John, does this statement mean what I think it means? "For every document ordered in the reading rooms 221 are now ordered and delivered online." Only a small portion of all documents are available online. Does this mean then that those that are available are being viewed, at a rough estimate, thousands of times more often than the average document in the collection? Or am I misunderstanding something? Thanks.

JDR said...

Yes Paul. The most popular collections, those of most interest for family history, have been digitized. You still have to visit to see those of less general interest, like Court Rolls. It's the 20/80 rule in action, only more so.