Library classification and visualization: some random thoughts

As a strong believer in the tenet “library cataloging must change or die”, I have a special interest in classification.


I have recently done some research on FRSAD as part of a new article, to be published in Chinese, about recent developments in RDA and BIBFRAME. FRSAD is an interesting model: it takes a somewhat different approach from FRBR and FRAD (especially in its two entities “Thema” and “Nomen”), partly because of its goal not to “impose any constraint on the forms that subject authority systems take in particular implementations”. Efforts to integrate FRSAD with FRBR, FRAD, and RDA are ongoing. In August, the ALA/ALCTS/CaMMS/Subject Analysis Committee published a new discussion paper, which was submitted to the RDA Joint Steering Committee as LC’s suggestions on this issue. It will be interesting to see how RDA integrates FRSAD; once it does, RDA will finally be a complete content standard.
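To make the two FRSAD entities a little more concrete, here is a minimal Python sketch, assuming a plain in-memory model. The class and method names (`Thema`, `Nomen`, `Work`, `has_appellation`, `appellations_in`) and the `scheme` field are hypothetical illustrations, not part of any FRSAD, RDA, or BIBFRAME implementation. The point is only that one thema can be known by several nomens from different vocabularies, which is part of what lets FRSAD stay neutral about the form a subject authority system takes.

```python
from dataclasses import dataclass, field


@dataclass
class Nomen:
    """A sign by which a thema is known, e.g. a heading or a class number."""
    value: str
    scheme: str  # hypothetical field: the vocabulary the nomen belongs to


@dataclass
class Thema:
    """Anything used as the subject of a work; it carries no label of its own."""
    nomens: list[Nomen] = field(default_factory=list)

    def has_appellation(self, nomen: Nomen) -> None:
        # FRSAD relationship: thema "has appellation" nomen
        self.nomens.append(nomen)

    def appellations_in(self, scheme: str) -> list[str]:
        # Every nomen for this thema drawn from one particular scheme
        return [n.value for n in self.nomens if n.scheme == scheme]


@dataclass
class Work:
    title: str
    subjects: list[Thema] = field(default_factory=list)  # work "has as subject" thema


# Illustrative data only: one thema known by an LCSH heading and a DDC number
cataloging = Thema()
cataloging.has_appellation(Nomen("Cataloging", "LCSH"))
cataloging.has_appellation(Nomen("025.3", "DDC"))
book = Work("A hypothetical cataloging textbook", subjects=[cataloging])

print(book.subjects[0].appellations_in("DDC"))  # ['025.3']
```

Because the thema itself has no label, a display layer is free to choose whichever nomen suits a given audience.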


However, beyond just mapping different classification schemes and authority systems to RDA (or BIBFRAME), there are more questions to think about in library classification. For example, the linked data movement focuses more on machine agents than on human beings. Even so, presenting library data and metadata effectively to library members and users remains important. Building on this point, I would say that a different bibliographic infrastructure won’t significantly change the way bibliographic data are presented to human users; however, everyone knows that the numbers (or groups of numbers) and odd strings that most libraries use to present classification information are by no means a good representation of a resource’s position in the knowledge universe.


In his book Everything Is Miscellaneous, David Weinberger describes three orders of organization: organizing the physical items themselves, recording information about sources in printed form, and the digital order, which is where we are trying to get to.


The library catalog, as a simulation of the traditional printed catalog, is still largely in the second order, in that we try to describe a resource in a limited number of fields. More specifically, for classification information, we try to reduce the full contents of a resource to numbers and strings that sometimes don’t even make sense to library members, let alone support the user tasks defined by FRSAD (find, identify, select, and explore).


In fact, the library community recognized this issue quite early, and many solutions have been proposed. Some of them emerged in the so-called “Library 2.0” movement that started in 2005.


The most popular solution to this classification conundrum is probably the tag cloud, which often presents user-generated tags. Setting aside the quality issues of folksonomies and the question of what motivates library members to add tags to the library system, a tag cloud can be a useful and even compelling way to present thematic information to members. Tag cloud systems also normally use font size to indicate the frequency of each tag. One really good tag cloud system is LibraryThing (LT), and because of its relatively active user community, the quantity of tags is large enough to be meaningful. However, unlike other kinds of information on the website, the LT tag cloud is not available through its API.
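The font-size convention is easy to sketch. Below is a minimal Python example, assuming we already have a count of how often each tag was applied; the pixel range and the choice of logarithmic rather than linear scaling are my own assumptions, not how LibraryThing or any particular system does it. Log scaling is a common choice because tag frequencies are often heavily skewed, and a linear scale would shrink almost every tag to the minimum size.

```python
import math


def tag_font_sizes(counts: dict[str, int], min_px: int = 12, max_px: int = 36) -> dict[str, int]:
    """Map each tag's usage count to a font size in pixels.

    Counts are scaled on a logarithmic axis so that a handful of very popular
    tags do not push every other tag down to the minimum size.
    """
    if not counts:
        return {}
    # log1p keeps a count of 0 well defined (log1p(0) == 0)
    weights = {tag: math.log1p(n) for tag, n in counts.items()}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All tags equally frequent: give them all the middle size
        middle = round((min_px + max_px) / 2)
        return {tag: middle for tag in counts}
    return {
        tag: round(min_px + (w - lo) / (hi - lo) * (max_px - min_px))
        for tag, w in weights.items()
    }


# Hypothetical tag counts for one book
sizes = tag_font_sizes({"fiction": 100, "history": 10, "poetry": 1})
for tag, px in sizes.items():
    print(f'<span style="font-size:{px}px">{tag}</span>')
```

The most used tag gets `max_px`, the least used gets `min_px`, and everything else falls in between.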


Another well-known Library 2.0 solution is Serials Solutions’ AquaBrowser discovery system, which uses a “word cloud” to broaden library members’ searches. The system normally presents three kinds of relationships between words: association, spelling variation, and translation. It is a good idea except for two things. First, the relationships between words are not always clear to library members; second, in most cases these relationships are not knowledge relationships.
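The first problem, that users cannot tell how the words are related, is partly a presentation issue. Here is a minimal, hypothetical Python sketch, not AquaBrowser's actual implementation. It assumes the system can tag each suggested term with one of the three relationship types, and shows the terms under explicit labels instead of in one undifferentiated cloud. The suggestion data are made up.

```python
from collections import defaultdict

# The three relationship types named above, in display order
RELATIONSHIP_TYPES = ("association", "spelling variation", "translation")


def group_by_relationship(suggestions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (term, relationship) pairs so each group can be shown under its own label."""
    grouped = defaultdict(list)
    for term, relation in suggestions:
        if relation not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {relation!r}")
        grouped[relation].append(term)
    # Keep the fixed display order and drop empty groups
    return {rel: grouped[rel] for rel in RELATIONSHIP_TYPES if rel in grouped}


# Hypothetical suggestions for the query "colour"
suggestions = [
    ("paint", "association"),
    ("color", "spelling variation"),
    ("couleur", "translation"),
    ("light", "association"),
]
for relation, terms in group_by_relationship(suggestions).items():
    print(f"{relation}: {', '.join(terms)}")
```

Labeling does nothing for the second problem, though: a spelling variant is still not a knowledge relationship.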

Still another interesting example in this category is OCLC Classify, which takes a different approach to presenting a resource’s classification information: it displays, in bar charts, all the classification numbers and subject headings that librarians have assigned to the resource in MARC records. In general, then, the scope of this information is still library-centric. However, because practices vary across libraries, you can still find very diverse information here that can help you understand the resource better. I would say it could be a great way to help members (rather than just catalogers) discover resources in WorldCat (and potentially in the library catalog). Unfortunately, this tool is confined to Classify rather than the larger WorldCat, and is largely unknown to library end users.
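Classify's bar charts come down to counting how often each number was assigned across the records for a work. Below is a minimal Python sketch of that counting and a plain-text rendering, assuming we already have the list of class numbers pulled from those records. How to retrieve them from WorldCat is out of scope here, and the numbers below are invented.

```python
from collections import Counter


def classification_bars(assigned_numbers: list[str], width: int = 30) -> list[str]:
    """Render how often each class number was assigned as a text bar chart.

    The most frequently assigned number gets a bar `width` characters long;
    the others are scaled in proportion (minimum one character).
    """
    counts = Counter(assigned_numbers)
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for number, n in counts.most_common():
        bar = "#" * max(1, round(n / top * width))
        lines.append(f"{number:<10} {bar} {n}")
    return lines


# Hypothetical DDC numbers assigned to one work in ten different records
numbers = ["025.3"] * 6 + ["025.32"] * 3 + ["020"]
print("\n".join(classification_bars(numbers)))
```

Sorting by frequency puts the consensus number first while still showing the minority practices that make this view informative.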

After taking an Information Visualization class and reading more and more visualization work, I have the feeling that there are more creative ways for library interfaces to present classification information. For example, the bar chart used by OCLC is pretty neat and similar to what people are doing in visualization. But there are more examples inside and outside the library world that, whether or not they are aimed directly at library members, are (I hope) inspiring for this topic.


One really neat visualization in this area is an interactive multilevel ring chart made by Bepress as a discovery tool for all its “free, full-text scholarly articles”. You can easily see the topical distribution of all the resources and access all the resources that belong to a topic. The interactivity makes it even more interesting: you can zoom in and out of the chart for more or less detailed information. Our university library’s institutional repository (IR) also has a similar chart for local content on its website.
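A multilevel ring (sunburst) chart like this is driven by a simple layout rule. Each topic gets a slice of its parent's arc in proportion to how many items it contains, and its subtopics sit in the next ring out. Here is a minimal Python sketch of that rule, assuming the collection is summarized as a nested dictionary of topic names with item counts at the leaves. The topic names and counts are hypothetical, and I don't know how Bepress actually builds its chart.

```python
def total(node) -> int:
    """Number of items under a node: leaves are ints, inner nodes are dicts."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())


def ring_segments(tree: dict, start: float = 0.0, span: float = 360.0,
                  depth: int = 1, path: tuple = ()):
    """Yield (path, ring, start_angle, end_angle) for every topic in a nested tree.

    Ring 1 is the innermost ring; each child's arc is carved out of its
    parent's arc in proportion to the number of items it holds.
    """
    grand = total(tree)
    angle = start
    for name, child in tree.items():
        child_span = span * total(child) / grand if grand else 0.0
        child_path = path + (name,)
        yield child_path, depth, angle, angle + child_span
        if isinstance(child, dict):
            yield from ring_segments(child, angle, child_span, depth + 1, child_path)
        angle += child_span


# Hypothetical topic tree with item counts at the leaves
topics = {
    "Social and Behavioral Sciences": {"Economics": 30, "Sociology": 20},
    "Life Sciences": {"Ecology": 25, "Genetics": 25},
}
for path, ring, a0, a1 in ring_segments(topics):
    print(f"ring {ring}: {' > '.join(path)}: {a0:.0f} to {a1:.0f} degrees")
```

Zooming in on a topic then amounts to running the same function on that topic's subtree with the full 360 degrees, e.g. `ring_segments(topics["Life Sciences"])`.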

A blog post by Drew Skau offers some great examples of visualizations of naming, categorization, and classification. Most of the examples there are top-level charts of classification schemes, which may not be easily adopted by library catalogs, but these nicely generated graphs can give us better ideas of how to make our classification schemes meaningful and beautiful in other ways.


Another possibility to think about is a “thick description” approach to visualizing a small set of resources. Delayed Gratification, a UK magazine, made a visualization of the topics of all the novels on the 2011 Booker Prize longlist. Again, this may not be an idea that library catalogs can adopt directly; however, it is definitely valuable to think about how to expose more detailed information buried in library collections (such as storylines or themes) and present it to human users, a point closely connected with Linked Data.

The last cool visualization of library collections I want to introduce comes from a paper William Denton published in 2012, which analyzes the collections of Toronto Public Library and San Francisco Public Library. The heatmaps in the paper present the distribution of each library’s collection, as well as a comparison of the two, in a very direct and appealing way. Again, I don’t see this as very meaningful to library members. But it is definitely a great way to present the information to library directors and to acquisitions (and maybe reference) librarians, for whom the information is meaningful and whose work the display can make more efficient.
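Behind a comparison heatmap of this kind is a small table of numbers. Here is a minimal Python sketch, assuming collections are summarized as item counts per class. The LC class letters and counts below are invented, and this is not a reproduction of the method in Denton's paper. Each library's counts are normalized to shares so that collections of different sizes are comparable, and then subtracted class by class.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Each class's fraction of one library's collection."""
    total = sum(counts.values())
    if total == 0:
        return {cls: 0.0 for cls in counts}
    return {cls: n / total for cls, n in counts.items()}


def share_difference(a: dict[str, int], b: dict[str, int]) -> dict[str, float]:
    """Share in library A minus share in library B, for every class either library holds.

    Positive values mean the class is relatively stronger in A; a comparison
    heatmap would color each cell by this number.
    """
    sa, sb = shares(a), shares(b)
    return {cls: sa.get(cls, 0.0) - sb.get(cls, 0.0) for cls in sorted(set(sa) | set(sb))}


# Hypothetical item counts by LC class for two libraries
library_a = {"ND": 300, "PR": 500, "QA": 200}
library_b = {"PR": 300, "QA": 600, "TX": 100}
for cls, diff in share_difference(library_a, library_b).items():
    print(f"{cls}: {diff:+.2f}")
```

Comparing shares rather than raw counts keeps a larger system from looking "stronger" in every class simply because it owns more items.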


Any ideas about this topic? Have you done any visualization in your library? I look forward to hearing from you.
