The end of the card catalogue age
Cataloguers are experts in many of the fields that are integral to the organisation of information in the (so-called) “information age”.
Taxonomies, classification, controlled vocabularies, and facet analysis: we were masters in all of these when IT geeks were still playing with the rubber keys of their ZX Spectrums. We could have been rich and powerful, but we are not. Those IT geeks stopped playing 3D Deathchase and became our bosses.
We are effectively still in the “card catalogue age” …
We were so intent on maintaining legacy card catalogue metadata that we failed to grasp the new opportunities presented to us, instead viewing them as threats. But Michael Gorman’s ‘boogie-woogie Google boys’ (http://www.slc.bc.ca/rda1007.pdf) have won.
After nearly 50 years the end of AACR and MARC is at hand and there is an ever-expanding list of formats and technologies (NLP, RDA, RDF, SKOS, OWL, DCMI etc. etc.) bumping up against ‘traditional’ cataloguing.
I do not believe ‘traditional’ cataloguing will survive as it is, and I remain to be convinced that it will be RDA alone that replaces AACR2. The future is uncertain.
In this post I will look at one example of OEM-UK metadata creation to illustrate how cataloguers can apply our expertise to increase output and semantically enrich metadata (and yes, I did say ‘increase output’) in the post-MARC/AACR world.
The OEM-UK way
The OEM-UK way is to enable research and re-use by producing semantically rich metadata, as efficiently as possible, with the resources we have. We then release it in multiple formats and in multiple systems (e.g. Linked Data, MARC21, CKAN, LMS etc.) with open licences.
We aim to get the maximum bang for our bucks (or in this case, JISC’s bucks!).
We only manually create metadata that is likely to be important to researchers (identified in consultation with our subject and collection experts) and that cannot be auto-generated; this is entered by a non-professional member of project staff using Drupal (see the previous blog post, ‘Cataloguing the Drupal way’, for more details).
I will use the cataloguing of exam papers to illustrate how professional cataloguers contribute to the OEM-UK metadata creation process.
IOE exam papers
The IOE has many tens of thousands of uncatalogued exams ranging in date from the late 19th century through to the 1980s, covering various subjects and levels.
The OEM-UK way is to create semantically rich records, using the available resources, in the time we have, so we set about working out how to represent nearly 5,000 examinations in under 6 weeks using one member of project staff. We started by asking the question:
What are people researching when they consult the exams collection?
Our subject and collection experts told us that researchers are overwhelmingly interested in what subjects were examined, when, and at what level.
Based on this we came up with the list of metadata that we had to capture:
1 – the exam boards (e.g. University of Cambridge Local Examination Syndicate)
2 – the exam level (e.g. Higher School Certificate)
3 – the exam title (e.g. Modern History III (The British Empire))
4 – the subject of the exam (e.g. ‘Modern history’)
5 – the date the exam was sat
Other fields, such as the 245, will be created by combining (or inferring from) other metadata fields. If non-essential metadata can be auto-generated it may well be created; but we are not going to use limited resources creating non-essential metadata just in case it might be of use to someone in the future, even though all our evidence indicates it will not …
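To make the idea of a derived field concrete, here is a minimal Python sketch of building a display title from the captured fields. The function name, the field order and the ". " separator are all assumptions made for illustration; the post does not say how the 245 is actually assembled, and the year in the usage note is a placeholder.

```python
def build_title_statement(title: str, level: str, board: str, date_sat: str) -> str:
    """Join the captured fields into one title string (hypothetical pattern).

    Empty or whitespace-only fields are skipped so that a missing value
    does not leave a dangling separator.
    """
    parts = [title, level, board, date_sat]
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ". ".join(cleaned)


if __name__ == "__main__":
    print(
        build_title_statement(
            "Modern History III (The British Empire)",
            "Higher School Certificate",
            "University of Cambridge Local Examination Syndicate",
            "1935",  # placeholder year, not taken from a real record
        )
    )
```

The point is only that nobody types this field by hand: it falls out of the five items already captured.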
‘Just in case’ metadata
Too often cataloguers get hung up on creating metadata because it might be of use to someone, without any actual evidence indicating it will be. I refer to this as ‘just in case’ metadata.
Two examples of exam paper ‘just in case’ metadata are: Publisher – sometimes the publisher of an exam is not the exam board (and?); and year of publication – sometimes the exam was published the year before it was sat (so?).
Self-evidently, creating less metadata per record enables us to create more records; surfacing more material by dropping ‘just in case’ metadata is, to our minds, a worthy trade-off.
But the OEM-UK way is also about semantic quality …
The semantics, stupid
By not recording ‘just in case’ metadata we can spend more time creating metadata that we believe to be of real use to researchers. I believe OEM-UK records are semantically richer than equivalent records produced using ‘traditional’ cataloguing methods.
We decided we had to capture every exam title and then use a controlled vocabulary to create the subjects if we were going to create (genuinely) semantically rich metadata. But it was a tough task we set ourselves …
Some exam booklets contain over 120 individual examinations covering a huge range of subjects.
The OEM-UK way is to use non-professional staff to create the records (i.e. they cannot subject index – even if we’d had the time!).
In addition, if we created one metadata record for each booklet, the traditional way, we would not be able to list subjects and titles (the records would be enormous).
We decided to break the link between the physical item and the metadata record and instead concentrate on the intrinsically important intellectual unit: the individual examination. We created Drupal records containing title, date the exam was sat, exam board, and exam level for every examination (i.e. if a booklet contained 120 exams it would be represented by 120 records).
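As a rough sketch of the ‘one record per examination’ model, the Python below expands a booklet into individual records. The class and function names are hypothetical, and it assumes for simplicity that every exam in a booklet shares the same board, level and date, which will not always hold for real booklets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExamRecord:
    """One metadata record per examination, not per physical booklet."""
    title: str
    date_sat: str
    exam_board: str
    exam_level: str


def records_for_booklet(
    titles: List[str], date_sat: str, exam_board: str, exam_level: str
) -> List[ExamRecord]:
    # Assumption: board, level and date are common to the whole booklet,
    # so only the title varies from record to record.
    return [ExamRecord(t, date_sat, exam_board, exam_level) for t in titles]


if __name__ == "__main__":
    booklet = records_for_booklet(
        ["Modern History III (The British Empire)", "Latin I"],  # illustrative titles
        "1935",  # placeholder year
        "University of Cambridge Local Examination Syndicate",
        "Higher School Certificate",
    )
    print(len(booklet), "records created from one booklet")
```

A booklet of 120 exams would simply yield a list of 120 such records.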
At first glance it does not seem possible for one member of staff to create nearly 5,000 records in under 6 weeks, and there is no subject indexing. But we had a plan, and professional cataloguers were integral to it …
What are our professional cataloguers doing?
The various tools for auto-generating the input of metadata offered by Drupal facilitate a far quicker ‘data entry’ process than that offered by a traditional LMS. But we also needed to use professional staff before and after the ‘data entry’ (i.e. record creation) stage to enrich the records and to reduce the amount of metadata that needs to be created manually (therefore increasing output).
This ‘strategic’ use of professional staff is at the core of the OEM-UK way. Three examples of this strategic application of professional expertise are:
1 – Creating an OEM-UK exam subject controlled vocabulary (ESCV) – enabling both semantic enrichment of the metadata and speeding up the record creation process. The latter is achieved by enabling the retrospective completion of subject metadata post-data entry by running string matching scripts against the exam titles and ESCV (i.e. project staff do not need to enter the subjects in Drupal, increasing speed and enabling non-professional staff to enter records which will be semantically enriched before release).
2 – Integrating the ESCV into the London Education Thesaurus (LET, the IOE thesaurus used to index our core collections); enabling semantic linking to other material (e.g. between exam papers and textbooks).
3 – Enriching records with extrapolated metadata, creating additional semantically rich links via LET. For example, we have a School Certificate Examination with the title ‘Modern History III (The British Empire)’. We manually enter the title in Drupal (AJAX doing about half the work). But we can then extrapolate all sorts of extra information without the need to manually enter it: School Certificate Examinations were sat at secondary level therefore we can auto populate records with the LET terms ‘School Certificate Examinations’ and ‘Secondary education’. They are obviously examinations so all records get the LET form term ‘Examinations’. Part of the title matches the ESCV term ‘Modern History’ so they get the indexing term ‘Modern history (subject of instruction)’.
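The post-entry enrichment described in these three examples could be sketched in Python along the following lines. This is not the project's actual script: the two lookup tables are tiny hypothetical extracts built from the single example in the text, and the case-insensitive whole-phrase match is an assumed matching rule.

```python
import re
from typing import Dict, List

# Hypothetical ESCV extract: vocabulary term -> LET indexing term.
ESCV_TO_LET: Dict[str, str] = {
    "Modern History": "Modern history (subject of instruction)",
}

# Hypothetical level rules: exam level -> LET terms implied by that level.
LEVEL_TO_LET: Dict[str, List[str]] = {
    "School Certificate Examination": [
        "School Certificate Examinations",
        "Secondary education",
    ],
}

# Every record is an examination, so every record gets this form term.
FORM_TERM = "Examinations"


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so matching is forgiving."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def match_subjects(title: str, vocab: Dict[str, str]) -> List[str]:
    """Return the LET term for each vocabulary term found in the title."""
    norm_title = normalise(title)
    found = []
    for term, let_term in vocab.items():
        # Word boundaries stop 'Modern History' matching inside longer words.
        pattern = r"\b" + re.escape(normalise(term)) + r"\b"
        if re.search(pattern, norm_title):
            found.append(let_term)
    return found


def enrich(title: str, level: str) -> List[str]:
    """Form term, then level-derived terms, then title-matched subjects."""
    terms = [FORM_TERM]
    terms.extend(LEVEL_TO_LET.get(level, []))
    terms.extend(match_subjects(title, ESCV_TO_LET))
    return terms


if __name__ == "__main__":
    for term in enrich(
        "Modern History III (The British Empire)",
        "School Certificate Examination",
    ):
        print(term)
```

Run against the example title, this yields the four terms mentioned above without anyone entering a subject by hand. Titles that match no vocabulary term still get the form and level terms and could be queued for a professional cataloguer to review.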
We are confident that the OEM-UK exam records will be semantically richer than records created using traditional cataloguing methods.
Where are we now?
One full-time member of project staff successfully created records for all 4,700 examinations in a little over 5 weeks; a rate of over 200 a day! We have captured every title and have used ESCV/LET to create semantically rich metadata.
We believe that not recording that a booklet was published by ABC Printers 1 year before the exam was sat is a price well worth paying for the high number of semantically rich records produced by OEM-UK.
We will retro-load the records into our SirsiDynix Symphony LMS soon, and our professional cataloguers are already involved in this process, using their expertise to ensure that the MARC records are of the highest possible quality.