Botanical Cyberinfrastructure: Issues, Challenges, Opportunities, and Initiatives
Beaman, Reed S.; Cellinese, Nico; Heidorn, P. Bryan; Guo, Youjun; Green, Ashley M.; Thiers, Barbara.
HERBIS: Integrating digital imaging and label data capture for herbaria.
The HERBIS project (www.herbis.org) integrates automated document capture technology into a suite of tools that tightly couple digital imaging and label data capture from herbarium specimens. Only about five million of the more than one hundred million specimens in herbaria worldwide are available electronically through portals such as the Global Biodiversity Information Facility. Our proof-of-concept research centers on automated metadata capture directly from specimen images using Optical Character Recognition (OCR), Natural Handwriting Recognition (NHR), and Natural Language Processing (NLP). Results to date include image processing tools implemented as web services (including embedded third-party OCR software), accompanied by cross-platform web services that others involved in herbarium digitization can use. Searchable text extracted through OCR and the specimen images are available through a project web interface. We are currently investigating methods to statistically evaluate OCR success and are establishing training sets for the NLP implementation. We are in the early stages of integrating third-party NHR software. Wrapping image processing, image-to-text conversion, and data markup capabilities into distributed, interoperable web services yields greater efficiency, portability, and scalability. The HERBIS project is a collaboration among Yale University, the University of Illinois at Urbana-Champaign, and the New York Botanical Garden, funded through the National Science Foundation.
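The abstract does not say which statistics are used to evaluate OCR success. As an illustration only, one widely used measure is the character error rate: the edit distance between a hand-keyed reference transcription of a label and the OCR output, divided by the length of the reference. The sketch below assumes that measure; the function names and the sample label text are hypothetical and are not taken from the HERBIS software.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string
    into the other."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical label text: a hand-keyed reference and an OCR
    # result in which the letter "l" was misread as the digit "1".
    keyed = "Quercus alba L."
    ocr = "Quercus a1ba L."
    print(f"CER = {character_error_rate(keyed, ocr):.3f}")
```

Averaging this rate over a sample of labels that have been transcribed by hand would give one simple summary of OCR quality for a collection; other measures, such as word-level error rates, could be substituted in the same way.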
1 - Yale University, Peabody Museum of Natural History, Biodiversity Informatics, PO Box 208118, New Haven, Connecticut, 06520-8118, USA
2 - Yale University, Peabody Museum of Natural History, Botany Division, PO Box 208118, New Haven, Connecticut, 06520-8118, USA
3 - University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science, 501 East Daniel St. MC-493, Champaign, Illinois, 61820-6212, USA
4 - New York Botanical Garden, Institute of Systematic Botany, 200th Street & Southern Boulevard, Bronx, New York, 10458-5126, USA
specimen data capture
Presentation Type: Symposium or Colloquium Presentation
Location: 206/Performing Arts Center
Date: Monday, July 31st, 2006
Time: 10:15 AM