
A Usability Research Agenda

last modified 2007-03-08

A Usability Research Agenda for Digital Libraries, by Teal Anderson and Sayeed Choudhury, Digital Knowledge Center, Sheridan Libraries, Johns Hopkins University

Abstract

Usability testing is becoming more common in the library community. Digital libraries present issues and opportunities that merit the investigation of usability testing methods with the aim of identifying the most appropriate approaches to digital library usability. These issues and opportunities include: meaningful quantitative measures, the location and diversity of digital library users, partial interface control, realistic vs. controlled test settings, and the balance of user feedback and librarian expertise. The discussion of these issues and opportunities serves as a foundation for a usability research agenda for digital libraries.

Introduction

The growing body of literature on usability testing in physical and digital libraries is encouraging. It not only indicates an increasing awareness within the library community of the importance of user-centered design of digital library resources, but it also helps to spread that awareness. This literature will enable many in the library community to see that usability testing extends the library's patron-oriented mentality into the digital environment. Accounts of usability tests in which library users did not interpret library web site terminology as librarians intended [1,2] may inspire librarians who have yet to consider the usability of their sites to begin to do so. Reports on such methods as card-sorting [3] and task-based think-aloud protocols [4,5] provide the library community with precedents for incorporating these methods into the design and evaluation of its digital resources. We have found such case studies from other libraries helpful to us in our own usability endeavors, which have included focus groups, link name and page content expectation tests, online surveys, field observations, and think-aloud, task-based tests. As librarians gain experience in usability testing, usability expertise will allow the library community to ensure that resources are utilized to full potential.

However, this body of literature will better support the library community in its usability efforts if it includes accounts of usability research in libraries, in addition to the reports of usability testing. We distinguish between usability testing and usability research as follows: The main purpose of usability testing is to assess the effectiveness, ease of use, and subjective satisfaction of a system. The main purpose of usability research is to determine the best methods and practices for usability testing. While usability testing offers the opportunity to observe patrons interacting with the library's digital presence, usability research is an essential complementary activity. Digital library usability research will permit the development of best practices for usability testing within digital libraries. At present, while usability is a "developing theme" [6] for the library community, the literature on usability testing in digital libraries offers a limited exploration of theoretical frameworks [7] or models [8] for digital library usability. The focus has been on the basics of testing; methods and practices have been borrowed from other fields. While importing methods and practices from other disciplines can be useful initially, it is important to seek out the most relevant methods and practices for digital library usability. Through usability research, libraries can transform their usability practices to deal appropriately with the issues and opportunities presented by digital libraries.

In the context of the existing body of literature on digital library usability and of our own usability testing experience, we discuss here a set of issues and opportunities that form the basis of a research agenda. Some of these issues and opportunities require the particular attention of the library community because they are unique to digital libraries. Others, while not unique to digital libraries, are still in need of resolution with respect to digital library usability. In an attempt to begin to close the gap between existing accounts of usability testing and a set of best practices for usability in libraries, we propose a usability research agenda for digital libraries. We anticipate that collaborative efforts will allow the library community to tackle the issues raised in this agenda. We envision that, together, usability testing and research will allow digital libraries to present their resources and services in ways that are easier and more satisfying for library patrons to use.

Unique Issues and Opportunities

Existing usability research neither adequately addresses the unique issues relevant to libraries nor takes advantage of the research opportunities afforded by libraries. Usability research within libraries could provide a solution to these problems.

Issue: Meaningful quantitative measures

As usability is becoming a buzzword in the library world, libraries are relying on general web usability resources to guide their testing procedures. This is true of those who have published digital library usability case studies [9,10], as well as within our own library. We have depended on Nielsen [11], Rubin [12], and Dumas and Redish [13] for guidance in the development of our usability tests. Although this has been useful, we are concerned that not all of the methods suggested in these general web usability resources are relevant for evaluating the usability of digital libraries. Furthermore, and perhaps more importantly, not all of the methods that are needed in digital library usability evaluation are described in these resources. The need to determine what metrics are appropriate for digital library evaluation has been raised before [14], but here we expand on this issue by explaining why some commonly accepted usability metrics are not appropriate for digital library usability. In think-aloud, task-based tests, possible quantitative measures to complement the qualitative data of users' comments include how long it takes users to complete each task and how many tasks they complete successfully.

How long it takes users to complete each task can be measured and analyzed using the mean, median, range, or standard deviation of completion times [15]. These are good metrics for the usability of e-commerce sites, because in those sites it is a problem if it takes too long for the user to accomplish what they came to do. Users are likely to leave and look for another online store where they can make their purchases more efficiently. In a digital library, on the other hand, the task is not to find a book and buy it. The task is to find some information and learn from it. Whether the users are looking for known items or for some resources on a topic, they could come across related items that extend beyond what they were seeking originally and that contribute to their research. Finding these related resources and determining whether they are actually relevant takes additional time, but it may be time well spent. Furthermore, when a participant finds the information requested in a task but continues to work on the task and finds more "right answers," the observers can take two measures: "time to get first correct answer" and "time to get all answers." Someone must then decide which of these numbers to use when averaging this user's "time to complete task" with those of the other participants. Should the shorter time, the longer time, or an average of the two be used? Any of these three numbers fails to convey the richness of what happened when that user worked on that task. Thus, "time to complete task" is not a meaningful metric for digital library usability.
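As a minimal sketch of the descriptive statistics mentioned above, the following Python fragment summarizes completion times for a single task under both candidate measures. The timings are hypothetical values invented for illustration, and the function name `summarize_times` is our own. The sketch only shows how the choice between "time to get first correct answer" and "time to get all answers" changes the summary; it does not resolve which one should be reported.

```python
import statistics


def summarize_times(times):
    """Return mean, median, range, and sample standard deviation (in seconds)."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "range": max(times) - min(times),
        "stdev": statistics.stdev(times),
    }


# Hypothetical observations for one task, one tuple per participant:
# (seconds to first correct answer, seconds to all answers found).
observations = [(40, 40), (55, 130), (62, 62), (35, 95)]

first_answer = summarize_times([first for first, _ in observations])
all_answers = summarize_times([total for _, total in observations])

print("time to first correct answer:", first_answer)
print("time to all answers:", all_answers)
```

Two of the four hypothetical participants kept working after their first correct answer, so the two summaries diverge considerably even though they describe the same sessions.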

Similarly, how many tasks a user completes successfully is another metric that is difficult to employ in digital library usability evaluation. The problem is defining successful completion of a task. If the task is a known-item search, presumably "success" means that the participant found the item in question or correctly determined that the item is not available in that digital library. However, this simple definition of success does not capture the difference between finding the requested item and finding the requested item plus other relevant resources, as librarians hope patrons will do. If the task is an unknown-item search, the definition of "success" is even more difficult to pin down. Does "success" mean that the participant found some items on a given topic? Does it mean that they found all of the items in that library on that topic? Does it mean that they eliminated any search results that they determined to be irrelevant to the task? Does it mean that once they had found all of the relevant items, they did not continue down fruitless paths? Restricting the test to known-item searches when the digital library is used for unknown-item searches is not the right way to deal with the problem of defining successful task completion. "Success" is also difficult to define when there are multiple paths to the correct answer. Should the participant who followed the most direct path be considered more successful than the participant who followed a meandering path but ultimately found the same information? Furthermore, should the participant who completed the task without referring to the help section be considered more successful than the participant who used it several times? To account for these factors without complicating the definition of a successfully completed task, they could be reflected in "time to complete task."
That is, all of these participants would be considered successful, but the ones who meandered or referred to help would take longer to complete the task. However, "time to complete task" is not adequate for representing this level of complexity: fruitless meandering and use of help would be confounded by serendipitous discovery, or fruitful meandering. Thus, new metrics need to be created that capture meaningful aspects of digital library usability tests, because, on their own, "time to complete task" and "number of successfully completed tasks" are not adequate.

New metrics can be created by measuring factors that have not been measured before, and/or by combining simpler, existing metrics. One new metric that has been devised by the latter method is that of "search efficacy." Kelly and Cool [16] use the ratio of "number of documents saved" to "number of documents viewed" to measure "search efficacy." There may be other compound measures to which "time to complete task" or "number of successfully completed tasks" will contribute. Other factors which could be considered in devising new metrics include: the minimum necessary number of pages viewed, the actual number of pages viewed, the number of times help pages are accessed, the number of new resources discovered or viewed. These possibilities should be explored via a reconsideration of the possible and useful measurements in task-based usability tests.
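The ratio described above can be sketched in a few lines of Python. The `TaskLog` record, its field names, and the `extra_pages` helper are hypothetical, introduced only for illustration. Only the saved-to-viewed ratio comes from the measure attributed to Kelly and Cool [16]; the page-count difference is one assumed way of combining two of the factors listed above, not an established metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskLog:
    """Counts recorded for one participant on one task (hypothetical schema)."""
    documents_viewed: int
    documents_saved: int
    pages_viewed: int
    minimum_pages: int   # fewest pages needed to complete the task
    help_accesses: int


def search_efficacy(log: TaskLog) -> Optional[float]:
    """Ratio of documents saved to documents viewed; None if nothing was viewed."""
    if log.documents_viewed == 0:
        return None
    return log.documents_saved / log.documents_viewed


def extra_pages(log: TaskLog) -> int:
    """Pages viewed beyond the minimum necessary (an illustrative measure)."""
    return log.pages_viewed - log.minimum_pages


log = TaskLog(documents_viewed=8, documents_saved=2,
              pages_viewed=12, minimum_pages=5, help_accesses=1)
print(search_efficacy(log), extra_pages(log))
```

Note that a count such as `extra_pages` cannot by itself distinguish fruitless meandering from serendipitous discovery, so it would need to be read alongside the qualitative record.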

Some may contend that the analysis of qualitative data, namely the logs of participant comments and actions, is sufficient. This is good data to collect, but it is not sufficient. Quantitative and qualitative data should be compared for the presence of converging evidence. If similar patterns can be seen in both, the conclusions that are drawn can be much stronger than if the evidence diverges or if there is only one type of evidence from which to draw. In addition, on its own, qualitative data can be misused; for example, too great a focus can be placed on one comment or action sequence because it is particularly salient. An incident may have been the exception rather than the rule. Quantitative measures help keep the test analysis objective by putting individual incidents in the context of the whole test iteration. Some have contended that qualitative data should be the focus because "complicated quantitative techniques" [17] are expensive. Quantitative techniques do not need to be complicated, time consuming, or expensive. To encourage librarians to use quantitative techniques, the digital library usability research community should provide a suggested set of metrics, with step-by-step instructions for collecting and analyzing them.

Issue and opportunity: Location of user population

For digital libraries affiliated with physical institutions, easy access to nearby users is an opportunity. Physical libraries offer usability research opportunities by drawing in a steady flow of members of their actual user population. Every day, a diverse group of library users comes into the library to use the library's physical and digital resources. We have taken advantage of this proximity of users. Our library is centrally located on campus, and students come not only for library materials and information, but also to study and socialize. We have no trouble recruiting students to participate in usability tests. However, testing only these convenient participants leaves out other members of our intended audience, such as faculty and staff who access the digital library remotely, or who do not use the library resources that are available to them. Whether they are in their dorm rooms or on-campus offices, or on another campus, either across town or on another continent, not all of our library's users regularly walk through our physical doors. It is impossible to know how representative of the whole user population those who enter the physical library are. Thus, to have a complete picture of the usability of digital library resources, remote users must be included in the usability evaluation. For digital libraries not affiliated with physical institutions, usability testing with remote users is even more critical.

An area that deserves more attention is which of the many possible approaches to testing with remote users are best for evaluating digital library usability. One option is to hold test sessions at multiple locations. This can work well for a library that serves several campuses, if space is available for testing on each campus, and if the users agree to come to the nearest campus for a test session. The drawbacks include the travel expense and the challenges of coordinating a test without sufficient familiarity with the location. Another option is to have the user install remote observation software, to send them the test documents via email, and to hear their comments over the phone while seeing them work through the tasks. This method has been endorsed by OCLC [18]. While this approach avoids the cost of holding sessions at multiple locations, it, too, has disadvantages. First, some participants may be unable or unwilling to download the remote observation software, whether because they are inexperienced in using computers, or because they do not have permission to install software on the computer from which they access the web. Second, the data collected in this manner is not nearly as rich as that which can be gathered in person. The observers do not see what else the participants are doing besides what is visible on the screen. If participants take notes on a piece of paper or cover part of the screen with their hands, the observer will not know about it unless they happen to mention it. Third, if participants are not accustomed to holding a phone while using the web, this could interfere with their performance. A third approach to testing with remote users is to catch them when they are already gathered in one place. For a more specialized user population, such as the users of an astronomy digital library, usability testing could be held at a meeting of the professional association of that population. 
This kind of opportunistic testing has some of the same drawbacks as testing at multiple locations, except when the conference happens to be held in the same city as the headquarters of the digital library. A fourth strategy for testing with remote users is to develop partnerships with other libraries and to exchange the duties of testing remote digital libraries with local users. This approach may be especially useful for testing digital libraries internationally. These strategies are not unique to digital libraries, but a better understanding of their advantages and disadvantages would help librarians select the best methods for including remote users in their usability evaluations.

Issue and Opportunity: The Diversity of Library Users

A digital library's users may be diverse along a number of different parameters. These may include a user's field of study, level of study, academic role, experience with computers, experience with libraries, language(s), and cultural background. In addition, library users arrive with various goals and expectations. This diversity presents a number of questions for designing usability tests. First, which of these parameters are relevant to digital libraries and thus require consideration of different user groups? That is, does it matter whether test participants include a freshman neuroscience major, a musicology faculty member, and an electrical engineering graduate student, or would a usability test be equally informative if all of the test participants were electrical engineering graduate students? In our testing experiences, level of study seems to be relevant. For example, undergraduate students in lower-division courses tend to report searching for books and articles on their course syllabi, undergraduate students in upper-division courses and graduate students talk more about researching topics, and faculty talk about searching for works by familiar authors. Furthermore, participants who are unfamiliar with the subject area of the task tend to accept their search results as sufficient. Participants who are familiar with the subject area, however, tend to refine their searches until they find the scope of results they expect. Second, how important is it to include participants from all of these groups? Some test designers have focused their testing on one particular group, such as undergraduates who have little experience using the digital library [19]. This approach, which excludes the faculty and graduate student user groups, seems only to allow narrow conclusions to be drawn from such a test, as only the reactions of undergraduates have been observed. 
If there are a large number of user groups, including all groups in the same iteration of testing may not be feasible. A possible solution is to test with participants from some of the user groups in the first iteration, and from other groups in the second iteration, and so on. In this case, should the selection of groups be randomized or matched? These are common practices for between-subjects designs in experimental psychology, but we have not seen them employed in library usability literature. Another possible solution is to lump similar groups together and consider them as one user group. However, this may result in the exclusion of an important aspect of user diversity. Third, what are the consequences of advertising the tests across all of the user groups but scheduling them on a first-come, first-served basis? The resulting test sample may be diverse along several parameters, but it is a convenience sample. An investigation of the effect of defining user groups and selecting participants by a number of different parameters is needed.
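To make the randomized option concrete, the following Python sketch spreads user groups across test iterations at random. It illustrates randomization only, not matching, and makes no claim about which practice is preferable. The function name and the group labels are hypothetical, and the `seed` parameter is included only so that an assignment can be reproduced.

```python
import random


def assign_groups_to_iterations(groups, n_iterations, seed=None):
    """Randomly distribute user groups across iterations as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(groups)
    rng.shuffle(shuffled)
    # Deal the shuffled groups out round-robin, one slice per iteration.
    return [shuffled[i::n_iterations] for i in range(n_iterations)]


# Illustrative group labels; a real study would define its own groups.
groups = [
    "lower-division undergraduates",
    "upper-division undergraduates",
    "graduate students",
    "faculty",
    "staff",
]

plan = assign_groups_to_iterations(groups, n_iterations=2, seed=1)
for number, iteration in enumerate(plan, start=1):
    print(f"iteration {number}: {iteration}")
```

Each group is tested in exactly one iteration, and the number of groups per iteration differs by at most one.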

Once the user groups are established, another question is whether to give all of the groups the same tasks, or to give each group a set of tasks designed to reflect what users in that group would typically use the digital library to do. For example, we might ask the participants who are affiliated with arts and sciences departments to look for articles on the history of linguistics, and participants from engineering departments to look for articles on polymers. Closer approximations to participants' actual search terms would reduce cases where participants indicate that they would "never use the digital library to do that." However, designing different tasks for different users creates two problems. First, there is the challenge of ensuring that the tasks are comparable. Second, it opens the door to another problematic approach: if it is better to use two different tasks for two different groups, to approximate their real tasks, is it not even better to design different tasks for each participant, so that the tasks fit their individual profiles and are representative of what that participant uses the digital library to do? In fact, if designing tasks for each participant is good, is it not better to allow them to define their own tasks, since, after all, they are more familiar with what they have used the digital library to do than the testing team is? The results of user-defined tasks are particularly difficult to measure and compare.

Issue: Partial control and partial testing

An issue particular to digital libraries which contain collections owned or managed by other entities is that of partial control. For example, libraries often subscribe to many online article indexes, databases, and online journals. For the users who reach these resources through a library web site, the interfaces of these resources, which are largely outside the library's control, may be perceived as part of the library web site. Whether or not users recognize where the library ends and an outside resource begins, they will visit pages on both sides of that line in order to accomplish many of the goals they have in using the digital library. A full understanding of the usability of the library necessitates observing users as they navigate back and forth between a digital library and its outside resources. If a task is designed to prompt this sort of use, several challenges arise. First, the user might get off-track and have to be guided back to the digital library. Policies for when and how this should occur must be incorporated into the design of the test, to ensure consistent test conduct. The next challenge is to decide what to do with the data collected while the user is using an outside resource. Should tasks be designed so that the user can complete the task "successfully" regardless of the usability of the outside resource, or should that part of their experience be factored into the whole picture of that task? Some have suggested that the ability to use a library web site should be treated separately from the ability to use its resources [20]; they suggest testing up to the point where the library ends and not beyond that point. Perhaps it is acceptable to test up to that point in a usability test with limited scope and purpose. However, in a test where the goal is a broad understanding of use of the digital library, it seems that the actions beyond the border of the library should be taken into account. 
Thus, partial control does not necessitate partial testing.

If an interface outside the library's control is found to have usability problems, the challenge is convincing those who control the interface that they need to address the usability problems in their resource. One approach is to raise the problems publicly; such critiques have turned up on the Library User Interface Issues mailing list [21]. Apparently, this approach is effective: those who control the interfaces have been known to respond with assurances that they will conduct their own testing and/or fix the problem identified by the library. Another approach is to contact the group that controls the interface directly.

Broader Issues

Some research questions that pertain to web usability in general are also relevant with respect to digital library usability. Discovering the contexts of use of digital libraries and finding a workable balance between the influence of users and "experts" are two topics of this kind.

When the development of an interface is informed by an understanding of the context(s) in which it is used, the interface can be designed to fit into that context. Field observations allow for a far greater understanding of the context of a digital library's use than do controlled, task-based tests in a lab setting. The tension between the more realistic setting of the field observation and the more reliable lab test is not unique to digital library usability. Yet it is an unsolved problem that hampers digital library usability. This tension has been recognized by some [22], but others present it as a settled matter, asserting that usability testing should be held in a quiet location that permits the "undivided attention of the user interacting with the product" [23]. This conclusion is hasty, because it is not known whether digital libraries are always used in quiet locations that permit users to devote their full attention to them. A test in a quiet lab may be optimal from the test facilitator's perspective, because the test can proceed without interruption and there are no distractions to hinder good observations and accurate notes of what the test participant is doing and saying. However, the results of such a test may not mean very much if that user normally accesses the digital library while multi-tasking in a noisy environment with frequent interruptions. Furthermore, participants may be eager to get the "right answer" while they are being observed, and thus more likely to use help in a lab setting than in the real world. On the other hand, the divergent goals of participants in a field observation present a challenge for gathering useful quantitative data. Explorations of how to combine the best aspects of field and lab settings to produce environments that are equivalent to the settings in which digital libraries are used but allow for the rigor of the usability lab will help resolve this problem.

The balance between the influence of users and "experts" is another challenge that, while not unique to digital library usability, is an issue that needs to be resolved within that field. Once, a member of our library's web redesign team asked what we would do if the results of our usability testing indicated that we ought to design the site differently than we knew was the "right" way to design it. This is a delicate area. When librarians conduct usability tests, they need to keep in mind that the users may not be trained in searching for information, but they are the "expert" while they are participating in a usability test. The librarian has to hold back from helping the user complete the tasks, so that the user's expertise is seen in the interaction with the interface. Usability testing is undertaken to escape the trap of the "we know best" philosophy [24]. Yet when it comes to interpreting the test results and deciding what changes to make to the digital library interface based on these results, the librarian must resume the role of the expert, applying principles of both library science and usability. There must be a balance between allowing the results of usability testing to drive design, and accounting for what the librarian already knows about the content of this digital library, the experiences of other users who weren't tested in this iteration, and libraries in general.

Digital Library Usability Research Agenda

In our libraries, the word "usability" is no longer unfamiliar to librarians. Several of them have worked with us to develop and conduct a variety of tests, and several more have served as test participants. However, the iterations of usability testing we have done for our homepage, our main navigation bar, our catalog, our intranet, and our content management system, as well as for an online journal project and online collections of sheet music and medieval texts, are only the beginning. Usability testing will continue, but it will occur in the context of a larger research agenda, focused on addressing library-specific usability issues and opportunities, with the aim of formulating a set of best practices for digital library usability testing.

These issues and opportunities will be the heart of our research agenda:

  • Quantitative methods for digital library usability
  • Location of user population and test participants
  • Diversity of user population and test participants
  • Testing part vs. whole digital library
  • Test environment (natural vs. lab settings)
  • Balance between user feedback and librarian expertise

The purpose of this agenda is not to reinvent the usability wheel for libraries, but to take into account the usability research that is being applied to other web sites and to use what is relevant for libraries, as well as to determine where the usability needs of digital libraries differ, and to meet those needs. We invite the library community to join us in pursuing these goals.

References

1. Kathleen Collins and Jose Aguinaga, "Learning as We Go: Arizona State University West Library's Usability Experience," in Usability Assessment of Library-Related Web Sites: Methods and Case Studies, ed. Nicole Campbell (Chicago: American Library Association, 2001), 16-29.

2. Louise McGillis and Elaine G. Toms, "Usability of the Academic Library Web Site: Implications for Design," College & Research Libraries 62 (July 2001): 355-67.

3. Angi Faiks and Nancy Hyland, "Gaining user insight: A case study illustrating the card sort technique," College & Research Libraries 61 (July 2000): 349-57.

4. Brenda Battleson, Austin Booth, and Jane Weintrop, "Usability Testing of an Academic Library Web Site: A Case Study," Journal of Academic Librarianship 27 (May 2001): 188-98.

5. Janet Chisman, Karen Diller, and Sharon Walbridge, "Usability testing: A case study," College & Research Libraries 60 (November 1999): 552-69.

6. George Buchanan, "Report on the Sixth European Conference on Digital Libraries," D-Lib Magazine 8 (October 2002). Available online at: http://www.dlib.org/dlib/october02/buchanan/10buchanan.html

7. Robert J. Sandusky, "Digital Library Attributes: Framing Usability Research," in Usability of Digital Libraries: A workshop at JCDL 2002 (n.p., [2002]), 35-38.

8. Kyunghye Kim, "A Model-based Approach to Usability Evaluation for Digital Libraries," in Usability of Digital Libraries: A workshop at JCDL 2002 (n.p., [2002]), 33-34.

9. Susan Augustine and Courtney Green, "Discovering How Students Search a Library Web Site: A Usability Case Study," College & Research Libraries 63 (July 2002): 354-65.

10. Battleson, Booth, and Weintrop, "Usability Testing of an Academic Library Web Site."

11. Jakob Nielsen, Usability Engineering (San Diego: Academic Press, 1993).

12. Jeffrey Rubin, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests (New York: John Wiley, 1994).

13. Joseph S. Dumas and Janice C. Redish, A Practical Guide to Usability Testing (Exeter: Intellect, 1999).

14. Ingeborg T. Sølvberg et al., "Report of Breakout Group on Metrics and Testbeds," part of Fourth DELOS Workshop. Evaluation of Digital Libraries: Testbeds, Measurements, and Metrics (June 2002). Available online at: http://www.sztaki.hu/conferences/deval/presentations/Breakout_metrics.doc

15. Rubin, Handbook of Usability Testing, 260-63.

16. Diane Kelly and Colleen Cool, "The effects of topic familiarity on information search behavior," in Proceedings of the Second ACM/IEEE Joint Conference on Digital Libraries (New York: Association for Computing Machinery, 2002), 74-75.

17. Elaina Norlin and CM! Winters, Usability Testing for Library Web Sites: A Hands-On Guide (Chicago: American Library Association, 2002), 44.

18. Online Computer Library Center, "Remote Usability Testing," HCI at OCLC ([Dublin, OH]: OCLC, 2003). Available online at: http://www.oclc.org/usability/remotetesting/

19. Battleson, Booth, and Weintrop, "Usability Testing of an Academic Library Web Site," 190.

20. McGillis and Toms, "Usability of the Academic Library Web Site: Implications for Design."

21. LUII: Library User Interface Issues, http://www.cochran.sbc.edu/luii/

22. Christine Borgman et al., "Report on Breakout Group: Evaluating Digital Library Users and Interfaces," part of Fourth DELOS Workshop. Evaluation of Digital Libraries: Testbeds, Measurements, and Metrics (June 2002). Available online at: http://www.sztaki.hu/conferences/deval/presentations/breakout_grp_summary.doc

23. Norlin and Winters, Usability Testing for Library Web Sites, 3.

24. Norlin and Winters, Usability Testing for Library Web Sites, viii.
