Enriching our bib records using the BL Linked Data Service

As members of the Bloomsbury Library Management System Group (comprising the Bloomsbury Colleges and Senate House Libraries), the IOE Library has been participating in an interesting project to ensure that our MARC21 bibliographic data is in good shape and fit for purpose in the 21st century. This means that any member who migrates to Kuali OLE will be able to do so in the knowledge that their expensively created MARC data is working hard for them. Those who are, at this stage, only looking to add a new discovery front end will gain similar benefits. Our experimentation with next-generation discovery systems such as VuFind has also revealed that they can expose shortcomings in MARC data: for example, if a record's language code is missing, filtering by language will give a misleading result.

We have been using the Linked Open Data service from the British Library to try to enrich our bibliographic records. The first question is: why would you do this? Well, there are a number of reasons, ranging from ensuring that the number of identifiers in the records is maximised (the print and electronic ISBNs, for example) to adding the Dewey classification number or Library of Congress Subject Headings. The latter both assist retrieval and provide a platform from which common points of access might be derived across a consortium of library catalogues, such as the BLMS members.

In terms of methodology, we used the BNB SPARQL endpoint service and created some PHP scripts which fired every BLMS record that had an ISBN at it. This retrieved the target record where exactly one match was found. From that BNB record, we retrieved the ISBN, DDC (082) and LCSH (650) fields. These were then compared with the original source record from the BLMS members. Where a value differed (a potential enrichment), it was recorded in a database for further scrutiny.
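The lookup step above can be sketched as follows. This is a minimal Python sketch rather than the original PHP scripts, and the endpoint URL and the bibo:/dct: predicate names are assumptions about the BNB data model, not confirmed details of the actual queries used:

```python
import json
import urllib.parse
import urllib.request

# Assumed URL for the BNB SPARQL endpoint.
BNB_ENDPOINT = "https://bnb.data.bl.uk/sparql"

def build_query(isbn):
    """Build a SPARQL query for one ISBN.

    The bibo:/dct: predicates here are an assumption about the
    BNB data model; the real property paths may differ.
    """
    return f"""
    PREFIX bibo: <http://purl.org/ontology/bibo/>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?book ?isbn10 ?isbn13 ?subject WHERE {{
        ?book bibo:isbn13 "{isbn}" .
        OPTIONAL {{ ?book bibo:isbn10 ?isbn10 }}
        OPTIONAL {{ ?book bibo:isbn13 ?isbn13 }}
        OPTIONAL {{ ?book dct:subject ?subject }}
    }}
    """

def parse_bindings(payload):
    """Flatten a SPARQL JSON results payload into plain dicts."""
    rows = json.loads(payload)["results"]["bindings"]
    return [{k: v["value"] for k, v in row.items()} for row in rows]

def fetch(isbn):
    """Fire one query at the endpoint (network call)."""
    url = BNB_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_query(isbn),
         "format": "application/sparql-results+json"})
    with urllib.request.urlopen(url) as resp:
        return parse_bindings(resp.read())
```

The harvesting loop then simply calls `fetch()` once per source-record ISBN and inspects how many distinct `?book` matches come back.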

At this stage, the procedure is fairly simplistic in certain ways: it only looks at the first ISBN in the source record, and it discards any result set where more than one match is returned, as potentially unsafe. These are all things that could be improved without too much difficulty. The main problem was that, for some unknown reason, the script kept pausing and not resuming. The answer was to kill it and restart it from the point it had reached. Rather messy, but it did the job!
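The kill-and-restart workaround can be made a little less messy with a simple checkpoint file. This Python sketch is hypothetical (the file name and function shapes are invented for illustration), but it captures both the resume logic and the one-match-only rule described above:

```python
import os

CHECKPOINT = "enrich.checkpoint"

def load_checkpoint():
    """Return the index of the last ISBN processed, or -1 for a fresh run."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return -1

def save_checkpoint(i):
    with open(CHECKPOINT, "w") as f:
        f.write(str(i))

def run(isbns, lookup):
    """Process each ISBN via lookup(), resuming after the last checkpoint.

    Result sets with more than one match are discarded as potentially
    unsafe, mirroring the risk-averse rule in the text.
    """
    enrichments = []
    start = load_checkpoint() + 1
    for i in range(start, len(isbns)):
        matches = lookup(isbns[i])
        if len(matches) == 1:          # exactly one match is safe
            enrichments.append((isbns[i], matches[0]))
        save_checkpoint(i)             # a restart resumes from here
    return enrichments
```

If the process dies, simply rerunning it picks up after the last checkpointed index instead of requiring a manual restart position.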

The results were as follows:

1.3 million records had an ISBN (43% of the dataset). Within these, 299,149 BNB records (23%) were matched and harvested via our fairly risk-averse process. These yielded 473,757 discrete proposed data enrichments spanning 198,205 records. The breakdown by enrichment type is:

ISBN10    43,800
ISBN13    15,127
DDC    97,712
LCSH    317,118

The next stage is to work out how we might validate these enrichments and incorporate them back into our host Library Management Systems. Alternatively, those institutions migrating to Kuali could use this as a pre-migration staging post to optimise their MARC data.
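As a rough sketch of what that incorporation step might look like, the function below merges validated proposals into a record, skipping values already present. The data shapes (a tag-to-values dict standing in for a MARC record) are a hypothetical simplification; real work would go through the LMS or a proper MARC tool:

```python
def apply_enrichments(record, proposals):
    """Merge proposed enrichments into a record, field by field.

    `record` maps a MARC tag (e.g. "650") to a list of existing values;
    `proposals` maps tags to candidate values harvested from the BNB.
    Only values not already present are added, so re-applying the same
    proposals is harmless. Both shapes are illustrative simplifications.
    """
    added = []
    for tag, values in proposals.items():
        existing = record.setdefault(tag, [])
        for value in values:
            if value not in existing:
                existing.append(value)
                added.append((tag, value))
    return added
```

For example, proposing a DDC number and an extra subject heading against a record that already holds one of the headings would add only the two genuinely new values.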

This has been a very useful example of a practical application of linked data, and one which we will continue to explore. The next question is whether the same can be done using the title where there is no ISBN to use as a hook. That can turn into something of a nest of vipers, so perhaps it is best left for another day!

This entry was posted in MARC and OEM-UK.