CAPM Workshop Report
The report from the Johns Hopkins University (JHU) workshop on automated document scanning systems, with a focus on CAPM.
On May 14, 2001, a group of library practitioners and administrators, economists, engineers and scholars gathered at The Johns Hopkins University (JHU) for a workshop on automated document scanning systems, with a focus on CAPM. A list of participants appears in Table 1. The first half of the workshop involved presentations by the JHU-University of Colorado at Boulder (UC) research team covering the system's design, a cost analysis of the system, and the results of a multiattribute survey used to assess patron benefits associated with CAPM. Discussions of the results by the workshop participants followed each presentation. In the second half of the workshop, the participants focused on four broad questions that capture the significant issues in the development of CAPM. The questions were:
- What library services or cost savings might CAPM provide?
- How ought the value of new services associated with CAPM be assessed?
- What are the issues in implementing the system?
- What are the most important remaining research needs?
Structured Discussion
A structured discussion method was used to obtain input from the workshop participants regarding specific issues under each of the four questions. The method used was the Nominal Group technique (Delbecq et al., 1975), which is designed to generate ideas efficiently and to ensure wide participation by all group members in the discussions. The method ensures that each person has an opportunity to offer his or her thoughts and that the discussion does not become dominated by a few individuals. We have received favorable feedback from participants in other workshops that used the method (e.g., Beim and Hobbs, 1997; Hobbs and Horn, 1997).
In the discussion, the group was presented one question at a time. Each participant was then asked to write down brief responses to the question, which were collected and compiled into lists. The lists were projected onto a screen and discussed one at a time in a round-robin format. Through the discussions each list was edited and revised. Each list appears in the results sections of this report. After each discussion, participants were asked to name and rank on paper the issues within each list they felt were most important. The structure of the lists and the exact method of voting differed slightly for each question; details are provided in the sections below. The votes were collected and analyzed to determine if there are certain issues that were clearly seen as important by a majority of the participants. A discussion of the analysis follows.
Results--Question 1: Services
As can be seen in Table 2, the issues provided in response to Question 1 were clustered into eight general topics. Workshop attendees were asked to name the five most important topics and then rank them. In addition, most of these topics contain lists of subtopics. The topic "access" was discussed extensively, resulting in a long list of subtopics. Given the interest in the issue of access, participants were also asked to rank what they felt were its five most important subtopics. Voting on subtopics was not done for any of the other general topics.
Table 2: Responses to Question 1
| Cost (especially if archived electronically, and done collaboratively) |
| Institutional cooperation |
| Time |
| Access |
| · Encourage use |
| · Full text search |
| · Table of contents indexing (browsability); linking of materials via artificial intelligence (machine learning, auto doc analysis) |
| · Access to un-indexed, less accessible older materials; make them more useful, influential |
| · Access to fragile or otherwise restricted materials |
| · Easier browsing |
| · Registry of images |
| · OCR implications for audio books, Braille |
| · Spillover: better cataloging improves access to other materials |
| · Widen the audience for collections |
| Distance learning |
| Facilitate analysis of usage statistics (understand users and needs better) |
| Less boring work for employees remaining |
| Preservation |
Given the implications that CAPM has for access and the extensive discussion of this issue during the workshop, it is not surprising that participants overwhelmingly designated improved access as the most important library service that CAPM can provide (see Table 3). Nearly all first-place votes (22 of 23) went to access. The voting also shows that participants saw some importance in cost savings and institutional cooperation. However, it is clear that improved access is the most significant service that CAPM might offer.
Table 3: Highest Ranked Topics from Question 1
| Topic | Weighted Sum of Votes¹ | First Place Votes |
| Access | 114 | 22 |
| Institutional Cooperation | 66 | 1 |
While the results for the main topics of Question 1 are clear, the results for the access subtopics are not; participants were more divided in their opinions on the subtopics. The four highest-ranked subtopics (see Table 4) were encouraging use, full text searching, table of contents indexing, and easier browsing. Each of these subtopics received several first- and second-place ranks.
Table 4: Highest Ranked Access Subtopics
| Subtopic | Weighted Sum of Votes | First Place Votes |
| Table of contents indexing (browsability); linking of materials via artificial intelligence (machine learning, auto doc analysis) | 68 | 6 |
| Encourage use | 66 | 7 |
| Full text search | 59 | 2 |
| Easier browsing | 48 | 3 |
Results--Question 2: Valuation
Question 2 focused on the matter of assessing the value of the services CAPM will provide.
The structure of the list (see Table 5) and, hence, the structure of the voting
differed from that of Question 1. The suggestions given for Question 2 were
clustered into only three categories: (1) user focused valuation, (2) comparative
valuation, and (3) prototype testing. "User focused valuation" pertains to
benefits that would be realized directly by end users, i.e., library patrons.
As can be seen from Table 5, the subtopics include convenience to the user,
timeliness, and ease of use, among others. "Comparative valuation" refers
to the possibility of assessing CAPM's value by comparing its performance
to other document delivery systems and projects such as JSTOR. "Prototype
testing" would involve implementing the system and examining various outcomes
such as the extent of system use by library patrons.
Participants were asked to provide the 5 most important issues out of the entire list of
41 topics. As with the voting for Question 1, they were asked to rank their
five choices in order of importance.
With so many issues from which participants could choose, the likelihood of one
or two clear winners emerging is diminished. However, a few points from the
list did garner more votes than the rest (see Table 6). User satisfaction,
topic B on the list, received some emphasis in the voting. Valuation by this
method could be done with a user satisfaction survey. Comparing CAPM to current
systems (Topic M) under the comparative valuation heading was also seen as
an important way to assess the value of CAPM. Finally, many participants felt
that assessing the value added to library collections through, for example,
measuring increases in utilization, is a useful way to value the benefits
that CAPM can bestow.
Although the participants were not asked to vote on the three general categories, the
manner in which participants allocated their votes across the categories can
be examined. Such information might provide some insight into the relative
importance of the three categories. As Table 7 shows, participants emphasized
user-focused valuation and comparative valuation more heavily than they emphasized
prototype testing. While the three categories are roughly the same size in
terms of the number of subtopics, the total weighted votes for "user focused
valuation" is more than double that of "prototype testing", while the total
weighted votes for "comparative valuation" is nearly three times that of "prototype
testing".
Table 5: Responses to Question 2
| User Focused Valuation | Prototype, "small-scale" |
| User Satisfaction | Charging students with real choices (e.g., current time for delivery) |
| Convenience to Users | Not spending too much time on simulating value, but implement on a pilot basis and test with actual cost-benefit ratio |
| Timeliness | Test willingness to pay by access fee depending on time of day |
| "My" time versus cost of enhanced system (consideration of scholarly methods for conducting research) | Ask other universities regarding their (potential) payments for access to CAPM materials (e.g., per book, other models) |
| Ease for user | Does use result in higher quality scholarship and instruction? |
| Does system encourage students/faculty to pursue different questions (how to track this change?) | Measure second (and subsequent) requests for previously low-use titles in remote storage |
| Value of new research methods from searching across scanned materials | Need to determine (possible) increased use of materials |
| Assessment of existing OCR capabilities (error...); what nontext material was lost/not captured | Percent decrease/increase in use of off-site materials |
| False hits (comparison with present systems) | Use/re-use of digitized materials |
| Comparative valuation | Cost of managing/tracking digital resources (especially once released to users) |
| Comparison to current systems (need to differentiate between fixed and operating costs in relation to implementation and maintenance costs) | Enhanced scanning capabilities |
| Learning from other projects (e.g., JSTOR, Making of America) | Other Ideas |
| Compare waiting time for materials in CAPM-enabled versus traditional off-site shelving | How does this align with the University's goals and objectives? |
| More comparison of cost data (current cost models, current staffing models, ILL cost figures) | Comparison of physical impact of human vs. machine page turning |
| Costs compare with costs to build and equip new buildings to house physical collections | Consider uncertainties/possible outcomes |
| Consider both self-service and closed stack environments for comparison | Support of university innovation |
| Value added to collection (increased utilization, etc.) | Quality dimensions such as image color, image pixel count, and accuracy of OCR |
| Comparison to next best alternatives (e.g., existing robotic retrieval, scanning off-shore) | Reviewing enhancements to wider institution and mission of institution |
| Comparison to commercial digital providers (cut a deal with copyright holders?) | Measurement of incorrect "picks" by CAPM robot versus human pickers and determine costs of these inaccuracies |
| Abe Charnes model for sharing costs of collections used by non-overlapping constituencies |
Table 6: Highest Ranked Topics for Question 2
| Topic | Weighted Sum of Votes | First Place Votes |
| User Satisfaction | 33 | 4 |
| Comparison to Current Systems | 39 | 4 |
| Value Added to Collections (e.g., increased utilization) | 28 | 2 |
Note: A total of 20 responses was received for Question 2.
Table 7: Voting Across Topic Headings for Question 2
| Topic Heading | Sum of weighted Votes |
| User Focused Valuation | 97 |
| Comparative Valuation | 124 |
| Prototype Testing | 43 |
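The comparison of topic headings in the Question 2 discussion (user-focused valuation at more than double, and comparative valuation at nearly three times, the weighted votes of prototype testing) can be checked directly from the Table 7 totals. The sketch below only reproduces that arithmetic; the dictionary and the helper name `ratios_to` are illustrative and not part of the workshop materials.

```python
# Sum of weighted votes per topic heading for Question 2, copied from Table 7.
TABLE_7 = {
    "User Focused Valuation": 97,
    "Comparative Valuation": 124,
    "Prototype Testing": 43,
}


def ratios_to(baseline, totals):
    """Express each heading's total as a multiple of the baseline heading's total."""
    base = totals[baseline]
    return {heading: votes / base for heading, votes in totals.items()}


ratios = ratios_to("Prototype Testing", TABLE_7)
for heading, ratio in ratios.items():
    print(f"{heading}: {TABLE_7[heading]} weighted votes ({ratio:.2f}x prototype testing)")
# User Focused Valuation: 97 weighted votes (2.26x prototype testing)
# Comparative Valuation: 124 weighted votes (2.88x prototype testing)
# Prototype Testing: 43 weighted votes (1.00x prototype testing)
```

The ratios of about 2.26 and 2.88 are what the text summarizes as "more than double" and "nearly three times".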
Results--Question 3: Implementation issues
The round-robin discussion and voting on Question 3 topics were omitted from the workshop due to a lack of time. However, the first step in the Nominal Group technique was conducted, and the group provided its ideas regarding Question 3. Table 8 contains the list of responses.
Table 8: Responses to Question 3
| System Design | All registered JHU students should have automatic access; non-JHU users can perhaps subscribe |
| Shelve books into separate containers with space separation | Institutional subscriptions following the JSTOR model |
| Place course reserve materials in easy access area | Where and How to Deploy |
| Scan entire books, provide all page images, search by keyword, return relevant images | Potential area of deployment is new library facilities (no existing facility comparison problem) |
| Real-time user manipulation for "remote-control" of item | Need to field test local and consortium implementation to determine relative merits and cost implications |
| Need to determine criteria for material selection applicable for CAPM | Multi-institution repositories where the participants are of a similar type (e.g., research versus liberal arts, etc.) |
| Consider partially versus fully automated systems | Implement at a wide level such as at the state level (one institution is too small) |
| How will article level request be considered (e.g., will entire volume be scanned)? | Should implement at JHU and begin selling access to other institutions |
| Maintenance of persistent digital files | Implement over the WWW |
| Need better measures for scanning, OCR, presentation processing, markup | Use local implementation to fully assess the cost-effectiveness |
| If partially automated, how do humans and robots prevent mis-shelving (who checks on whom)? | Can be viable commercially for valuable research collections; may not be able to compete with other full-scale systematic digitization approaches |
| Keeping users informed about previously scanned materials | Obstacles and Problems |
| Number of robotic systems (relative to scale) | Obstacles include costs, cost uncertainties |
| CAPM will increase access to low use volumes; what about access to high use volumes? | Proprietary attitudes on the part of institutions |
| Authentication required for subscription-based models | Administrative resistance |
| Browsing is essential | Initial user resistance |
| Need to adapt to different quality (e.g., brittle, fragile books) | Malfunctioning technology |
| Should re-shelve books in a holding area for fast future access | Lack of large spaces for repositories |
| Must define tolerable error rates for scanning | Technology (especially page turner) |
| Must be able to adjust the resolution of scanner to the needs of the page image | Copyright issue |
| Management Structure | Licensing issues from users |
| Institution should absorb costs; can increase overall library fee | Quality of OCR (e.g., foreign scripts, strange fonts) |
| Could charge per use; but other institutions may begin limiting access if JHU (or others) already have electronic files | Bias toward scannable materials |
| Cost sharing could be proportional based on use levels and amount of materials contributed | Extension of traditional storage management, but prohibitive setup and operating costs |
| Access: be careful not to institutionalize the "haves" and "have nots" | Scalability (how to meet potential demand) |
| Subscription model could work if a registry of digital files is available | Displacing people (staff) is an issue |
| Use at-cost subscription rates | Vision of no human intervention may be an obstacle |
| Once materials are digitized, anyone who is interested should have access | Copyright issue is the largest obstacle; set charges based on publisher profits |
| Should be internationally accessible | Current design of shelving facilities is an obstacle |
| Could share use of type-specific page-turners | Lack of page-turner is significant issue to resolve |
The responses to Question 3 were clustered into four categories. "System Design" refers to details of how the system itself will operate, including issues such as how to shelve the books and how much control users should have over the scanning process. "Management Structure" refers to how the system will be paid for and who will have access. Issues related to the location and scale of implementation fall under "Where and How to Deploy". Finally, the last category, "Obstacles and Problems", covers issues that may be problematic in the implementation or management of the CAPM system.
Results--Question 4: Research Needs
Question 4 addressed the issue of future research needs. The list produced for Question 4 was similar to that for Question 2 in that it comprised many topics. There were seven clusters for Question 4:
- implications for the use of library collections
- costs
- the connections between valuation results and actual human behavior
- copyright issues
- inter-organizational cooperation
- preservation
- technology issues
Table 9 shows the entire list of 48 topics, while Table 10 presents the results of the voting.
Table 9: Question 4 Responses
| Collection Use | Off the shelf Digital Rights Management systems (evaluation of their capability) |
| How many used more than 1x? | Cooperation |
| Use of titles before/after digitization | Inter-organizational cooperation, business model (fee structure, how to make self-financing) |
| Survey of how materials were used (searched, used for research--not just hits) | Effect of fees on affordable delivery |
| Costs | Would publishers pay for "reputable" archiving function? |
| Relax existing building constraint, other shelving facilities | Capturing economies of scale/national system, market |
| Controlled field testing | Preservation |
| Compare between various modes of service (and compare with traditional modes) | Long term preservation needs of scanned items (interfaces, storage, creation of appropriate formats and associated metadata; develop plan) |
| Cost of CAPM features versus alternatives: e.g., robotic picking versus human | Risk of items getting permanently lost |
| Scanning proactively versus reactively | Are there preservation benefits? |
| Would comprehensive off-shore scanning beat any CAPM | Technology |
| Valuation/Human Behavior | Restrictions for Robot |
| Look for examples showing that prediction of willingness to pay accords with subsequent reality | System should be designed to be "distributed" (so multiple institutions): standards, organizational issues |
| Change in demand for services as a result of CAPM | Full book scanning versus automatic index/ToC scanning |
| User wants/needs (specifically--e.g., full text versus ToC) | What percentage of materials not appropriate for handling/scanning? (need to know page turning technology first) |
| Why do people select what they do? | Flow of materials |
| What disciplines most likely to use? | Bottlenecks or processes that slow down production |
| How CAPM would affect student and faculty choice of research areas? | Delivery formats |
| Types of collections that would most benefit; segmenting the market, focusing on certain markets where problems are the least and values are the highest--especially at early stage | Build and test real page-turner (what percentage could be served by present commercial off-the-shelf turners?) |
| How should serials be handled? | Time/Efficiency Studies (actual turnaround/quality) |
| Copyrights | Can we do this for nonbook materials? |
| Model of negotiation of copyright material (Universities might band together and make first move) | Page turning/browsing versus real time; Value of ToC information |
| Intellectual rights | Consider multistage deployment |
| Estimate upper limit on copyright fee at publisher's profit per book sold | What level of human intervention is optimal from the point of view of quality? |
One technology issue was clearly seen as crucial by the participants of the workshop: building and testing a robotic page-turner. The robotic page-turner is the final technology needed to make the CAPM system fully automated. The ability to successfully develop a page-turner has important implications for the future of CAPM, since it is likely to speed up the system and make it more readily available for use. The importance of this issue was clearly emphasized in the voting, with over half the responses (12 of 23) ranking it as most important. The group suggested that without the robotic page-turner, CAPM is little different from existing retrieval systems such as the AS/R system at California State University, Northridge. The addition of the page-turner, however, would make CAPM fundamentally different, since it would make the system fully automated.
Participants identified several specific issues to consider in the development of the page-turner. Most obvious, and perhaps most important, is the ability of a robot to safely handle the pages of books. The rate at which the page-turner can operate is another important matter. There is likely a tradeoff between page-turner speed and preservation of the books; this tradeoff will have to be considered in the design of the robot. Determining acceptable error rates (i.e., missed pages) is another matter that will be crucial in the design of the robotic page-turner. Participants also discussed cost issues associated with the page-turner and were interested in how the costs and performance of commercially available page-turners would compare to the costs and performance of a model designed specifically for CAPM.
Table 10: Results of Question 4 Voting
| Topic | Total Weighted Votes | First Place Votes |
| Building and Testing Robotic Page-Turner | 107 | 12 |
| Model of Negotiation for Copyrights | 53 | 2 |
| Model for Inter-Organizational Cooperation | 47 | 0 |
| Long-term Preservation Needs of Digital Files | 41 | 1 |
Note: A total of 23 responses was received for Question 4.
Other topics received some emphasis as well. Copyright limitations could impede the ability of a system like CAPM to reach its full potential, and attendees at the workshop discussed this matter extensively. It was noted that the value of CAPM would be enhanced if the digital files for scanned publications could be retained. If these files could be kept, requests for items already digitized could be processed faster, providing more convenient access to the user. However, the ability to save these files hinges on whether copyright restrictions apply. It is possible that arrangements could be made with copyright holders, but the nature of such arrangements is unclear. Conducting research to identify a model of copyright negotiation was, therefore, selected widely in the voting.
Two other issues received some emphasis in the voting for Question 4. The first relates to developing a model for inter-organizational cooperation. It is hoped that CAPM could eventually foster the establishment of off-site shelving consortia. Developing a business model for organizing the cooperation needed to manage such consortia was cited in the voting as an important research need. The other relates to the long-term preservation needs of digitized information. There are technological and management challenges in preserving the long-term usability of digital information, and developing systems to meet those challenges was seen as a significant matter warranting research attention.
The Future of CAPM: Conclusions from the Workshop
Several conclusions can be drawn from the input received at the workshop. First, access is the key issue in thinking about the services that CAPM can provide. CAPM might affect access in various ways, depending on how it is implemented and used. Given the growing popularity of digital technologies, CAPM might encourage more use of library collections: patrons who prefer online information may begin to use sources they previously ignored. CAPM can also improve access relative to traditional libraries because full text searching and indexing will introduce the ability to browse the stacks of off-site shelving facilities. Thus, moving volumes from a library to an off-site, CAPM-enhanced facility might actually make them more accessible to users. Determining which attributes of the system affect access, and how, is an important next step. Given the emphasis placed on this issue, CAPM's implications for user access to library collections should be a major focus of future work.
Another issue emphasized at the workshop relates to the robotic page-turner. Without the page-turner, CAPM is not fully automated, and many of the benefits it could offer may go unrealized. Development of the robotic page-turner will be one of the next steps in the implementation of CAPM. In developing the page-turner, issues such as safe handling of pages, speed, and error rates must be carefully considered.
The need to assess the value of the benefits that CAPM will introduce received a great deal of attention during the workshop. This is certainly a central issue in the development and implementation of CAPM. Though the participants were divided in their opinions about this matter, as evidenced by the diversity of the rankings in response to Question 2, some conclusions can be drawn. First, there must be a significant focus on library users in any assessment of CAPM benefits. A user satisfaction survey after the system is implemented is one possible method. Currently, there are plans at JHU to further use multiattribute methodologies to measure possible benefits before CAPM is implemented. Thus far, a contingent valuation survey has been used to assess CAPM's potential benefits to patrons. The survey elicited willingness-to-pay values for various configurations of CAPM attributes. However, it can be argued that money is not the most appropriate medium for assessing user values for library services (see Saracevic and Kantor, 1997a, 1997b). The multiattribute methods used in future valuation work will ask users to trade off various attributes of library services themselves, rather than stating preferences in terms of money. The findings of such surveys will augment the monetary-based information already gathered and will enrich our understanding of how users value the services that CAPM will offer.
The second conclusion that can be drawn from the input received about valuation is that comparison with other systems should be used to assess CAPM's value. There are currently many systems or projects, both automated and manual, for retrieving and delivering printed information. Interlibrary loan, online services such as Uncover and JSTOR, and the AS/R system at California State University, Northridge, are some examples. Assessing how CAPM's services, costs, and benefits compare to those of some of these systems would offer insight into the value of CAPM.
Of course, all these issues speak to the future research needs of the CAPM system, but the participants of the workshop also offered other specific research issues. First and foremost is the development of the page-turner, which was discussed above. Another issue that manifested itself in several ways was the role that copyright might play in the implementation of CAPM. The interplay of digital technologies and copyright is a widely discussed matter today. How the matter is resolved could become a hindrance to the implementation of CAPM, or it could become a significant aid. Involvement in resolving the copyright-technology issue could be a worthwhile part of future work within the CAPM project. Also, since fostering library consortia is part of the long-term vision for CAPM, research into effective organizational structures for such consortia has implications for the CAPM system. When such consortia begin to receive more in-depth consideration, investigations into cooperative structures should be done. Efficiently designed consortia can offer cost savings and increased patron access to larger collections.
The results from the workshop show that there is great interest in CAPM, which is indicative of the growing interest in digital technologies within the library community in general. The input received from the workshop will prove valuable in the continuing development of the CAPM system. Proposals have already been submitted to begin development of the page-turner and to further assess the benefits library patrons might derive from CAPM. More work will be done to address some of the other important matters, such as CAPM's impact on access and copyright issues.
Citations
G.K. Beim and B.F. Hobbs, "Event Tree Analysis of Lock Closure Risks," J. Water Resources Planning & Management, 123(3), May 1997, 169-178.
A. Delbecq, A. Van de Ven, and D. Gustafson, Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes, Scott Foresman and Co., Glenview, IL, 1975.
B.F. Hobbs and G.T.F. Horn, "Building Public Confidence in Energy Planning: A Multimethod MCDM Approach to Demand-Side Planning at BC Gas," Energy Policy, 25(3), Feb. 1997, 357-375.
T. Saracevic and P.B. Kantor, "Studying the Value of Library and Information Services. Part I. Establishing a Theoretical Framework," J. American Society for Information Science, 48(6), 1997a, 527-542.
T. Saracevic and P.B. Kantor, "Studying the Value of Library and Information Services. Part II. Methodology and Taxonomy," J. American Society for Information Science, 48(6), 1997b, 543-563.
¹ The weighted sum of votes is calculated using the following formula:
sum = (# of 1st place votes)(5) + (# of 2nd place votes)(4) + (# of 3rd place votes)(3) + (# of 4th place votes)(2) + (# of 5th place votes)(1).
For Question 4, participants were asked to rank seven issues; therefore, the formula is: sum = (# of 1st place votes)(7) + (# of 2nd place votes)(6) + ... + (# of 7th place votes)(1).
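The weighted sum in note 1 is a Borda-style positional score: with k ranks, a topic in position r on a ballot earns k - r + 1 points. A minimal Python sketch of the tally follows, assuming each ballot is recorded as a list of topic names in rank order; the ballot format and the function name `weighted_sums` are hypothetical, not taken from the workshop's materials. The usage lines check the Access row of Table 3: with 23 ballots worth at most 5 points each, a total of 114 with 22 first-place votes is only possible if the remaining ballot ranked access second.

```python
from collections import Counter


def weighted_sums(ballots, ranks=5):
    """Tally ranked ballots into (weighted_sum, first_place_votes) Counters.

    Hypothetical ballot format: each ballot is a list of topic names in rank
    order (first place first). A topic in 1-based position r earns
    ranks - r + 1 points, so with ranks=5 first place earns 5 and fifth place
    earns 1; with ranks=7 (as for Question 4) first place earns 7.
    """
    totals = Counter()
    firsts = Counter()
    for ballot in ballots:
        if len(ballot) > ranks:
            raise ValueError("ballot ranks more topics than allowed")
        for position, topic in enumerate(ballot, start=1):
            totals[topic] += ranks - position + 1
        if ballot:
            firsts[ballot[0]] += 1
    return totals, firsts


# Illustrative ballots: only the position of "Access" matters for this check.
ballots = [["Access"]] * 22 + [["Cost", "Access"]]
totals, firsts = weighted_sums(ballots)
print(totals["Access"], firsts["Access"])  # 114 22
```

Passing `ranks=7` applies the Question 4 version of the formula without any other change.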