The VuFind Browse Handler

Founder & Expert Button Pusher of Teaspoon Consulting

I developed the first version of VuFind's browse handler while working at the National Library of Australia. The source code was released as open source and contributed to the VuFind project.

This article gives a brief summary of what browse is all about, and shows how the VuFind implementation of browse hangs together.

Overview

Alphabetic browse provides a hierarchical way of navigating library collections. Here's what you can do with it:

Put it all together, and you get something like this (taken from Villanova's VuFind instance):

[Figure: an alphabetical browse by title]

The major pieces

Alphabetic browse is made possible by several components working together. These are:

The implementation

VuFind's browse code is open source. The authoritative version of the code can be found here: https://github.com/vufind-org/vufind-browse-handler.

Browse indexing

Browse indexing is the process of extracting terms from VuFind's bibliographic and authority indexes, and producing a sorted list of headings. The resulting list is stored in SQLite for convenient access.

Indexing is divided into three major steps:

The sort key is a byte sequence produced by ICU4J. This key determines the ordering of the headings, taking into account punctuation and multi-byte characters.
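To make the idea of a sort key concrete, here is a minimal sketch in Python. It is only a stand-in: the real keys come from ICU4J's collation rules, whereas this hypothetical `sort_key` function strips accents and punctuation and folds case. The point it illustrates is that headings are ordered by comparing derived byte sequences, not the display text.

```python
import unicodedata


def sort_key(heading):
    """Illustrative stand-in for an ICU4J collation key (not the real algorithm)."""
    # Decompose accented characters so base letters and accents separate.
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        # Letters and digits are case-folded; everything else becomes a space.
        kept.append(ch.casefold() if ch.isalnum() else " ")
    # Collapse runs of spaces, then encode to get a comparable byte sequence.
    return " ".join("".join(kept).split()).encode("utf-8")


headings = ["Zola, Émile", "abbey road", "Élan vital", "Abbey, Edward"]
for heading in sorted(headings, key=sort_key):
    print(heading)
```

With these sample headings, "Élan vital" sorts between the "abbey" entries and "Zola, Émile", because its key is the plain bytes `elan vital`.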

The indexing process is summarised in the following diagram:

[Figure: the browse indexing process]

Browse handling

The browse handler is a Solr request handler that combines the results from SQLite, the bibliographic index, and the authority index. It handles each browse request by:

[Figure: the browse handling process]

Frequently asked questions

Why do we need a database and Solr?

In principle you could build a browse straight off your Solr indexes by using the Lucene API directly: you would write a Solr query component that opens the bibliographic index and uses a Lucene TermEnum to seek to the right term of the index being browsed.

The sticking point is sorting: Lucene will give you a single field in sorted order, but for browse we want to sort by one field (with collation, etc.) and display another. There might be ways of getting around this (like prefixing each term with its sort key and stripping it off at runtime), but it's hard to beat the simplicity of SQL here, particularly once you throw forwards/backwards pagination into the mix.
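As a rough sketch of why SQL makes this easy, the Python below uses the standard `sqlite3` module with a hypothetical `headings` table (the table name, column names, and the simplistic case-folded keys are assumptions for illustration, not VuFind's actual schema). Rows are ordered by one column and displayed from another, and paging in either direction is a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: sort on sort_key, display heading.
conn.execute("CREATE TABLE headings (sort_key BLOB PRIMARY KEY, heading TEXT)")
sample = ["Apple", "banana", "Cherry", "date", "Elder", "fig"]
conn.executemany(
    "INSERT INTO headings VALUES (?, ?)",
    [(h.casefold().encode("utf-8"), h) for h in sample],
)


def page_forward(conn, from_key, size):
    """Headings at or after from_key, in ascending key order."""
    cur = conn.execute(
        "SELECT heading FROM headings WHERE sort_key >= ? "
        "ORDER BY sort_key LIMIT ?",
        (from_key, size),
    )
    return [row[0] for row in cur]


def page_backward(conn, from_key, size):
    """Headings just before from_key, returned in ascending key order."""
    cur = conn.execute(
        "SELECT heading FROM headings WHERE sort_key < ? "
        "ORDER BY sort_key DESC LIMIT ?",
        (from_key, size),
    )
    return [row[0] for row in cur][::-1]


print(page_forward(conn, b"c", 2))   # ['Cherry', 'date']
print(page_backward(conn, b"c", 2))  # ['Apple', 'banana']
```

The backward page reads in descending order and is then reversed, so both directions return headings in display order.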

Why SQLite and not something else?

Mainly because it was convenient when I first wrote the code. It works on multiple platforms, is OS-cache-friendly, and is simple to update (just build a new database and throw away the old one).
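The "build a new database and throw away the old one" approach can be sketched as follows. This is an assumption-laden illustration, not how VuFind's own scripts do it: the function name `rebuild_browse_db`, the table layout, and the case-folded keys are all hypothetical. The new file is built alongside the live one and then moved into place with `os.replace`.

```python
import os
import sqlite3
import tempfile


def rebuild_browse_db(live_path, headings):
    """Build a fresh database next to live_path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(live_path))
    # Build in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)
    conn = sqlite3.connect(tmp_path)
    try:
        conn.execute("CREATE TABLE headings (sort_key BLOB, heading TEXT)")
        conn.executemany(
            "INSERT INTO headings VALUES (?, ?)",
            [(h.casefold().encode("utf-8"), h) for h in headings],
        )
        conn.execute("CREATE INDEX headings_key ON headings (sort_key)")
        conn.commit()
    finally:
        conn.close()
    # Replace the old database in a single step; the old file is discarded.
    os.replace(tmp_path, live_path)
```

Calling `rebuild_browse_db("title_browse.db", ["Dune", "Emma"])` twice with different lists leaves only the second set of headings on disk. No incremental update logic is needed.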

Can I change the way headings are sorted?

Sure. You can write your own "Normaliser", which can produce sort keys using whatever scheme you desire. I've prepared an example here that you could start with: https://github.com/vufind-org/vufind-browse-custom-normaliser.
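To give a feel for what a custom scheme might do, here is a small Python sketch of a normaliser that ignores leading English articles. It is purely conceptual: the function name `normalise` and the article list are assumptions, and it does not reflect the interface used by the example project linked above.

```python
# Hypothetical list of leading articles to skip when sorting.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalise(heading):
    """Return a sort key that skips one leading article, if present."""
    # Fold case and collapse whitespace before looking for an article.
    text = " ".join(heading.casefold().split())
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    return text.encode("utf-8")


titles = ["The Hobbit", "Dune", "A Tale of Two Cities"]
print(sorted(titles, key=normalise))
# ['Dune', 'The Hobbit', 'A Tale of Two Cities']
```

Whatever scheme you choose, the same function has to produce the keys at indexing time and at lookup time, otherwise lookups will seek to the wrong place in the list.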

When I try a browse it always takes me to the top of the list!

In April of 2012 I switched the browse code from using a home-grown sort key generator to using ICU4J. This had the benefit of allowing locale-specific sorting rules and proper UTF-8 handling, but changed the sort keys in a way that wasn't backwards compatible.

People occasionally report that a browse jumps them to the top of the list of headings. This is a good sign that the browse indexes were built with a different version of the code from the one now running.

To troubleshoot:

Links

Questions? Hassle me at mark@teaspoon-consulting.com.