Digital Libraries

Research Goal

To develop techniques that enhance information retrieval and knowledge management for large digital collections. Our work includes portal-building initiatives across a wide variety of domains and in multiple languages, which serve to test collection building, search, visualization, and analysis techniques.

 

For related projects, see our OOHAY! page.

Acknowledgements

We would like to thank the many collaborators who have been a part of our digital library initiatives and recognize their significant contributions to the work including:

  • CITIDEL (Computing and Information Technology Interactive Digital Educational Library), Ed Fox and Rao Shen from Virginia Tech, and Dr. Lillian Cassel, for their contributions to the GetSmart project.
  • National Center for Supercomputing Applications (NCSA) for allowing us access to their supercomputer and file storage systems.
  • Compendex, Inspec, NCM and BIOSIS for the use of bibliographic information from their databases to serve as a testbed for application development.

Approach and Methodology

Testbed:

  • 110,000+ Internet home pages from the entertainment section of Yahoo!
  • 1M citations and abstracts from the Compendex and Inspec databases

Techniques:

  • Automatic Indexing: stop wording and algorithmic index phrase formation.
  • Concept Space: index phrase co-occurrence information is used to generate an automatic thesaurus for search term suggestions.
  • Kohonen Self-Organizing Map (SOM): algorithms construct single- and multi-layer self-organizing maps for information categorization and visualization.
  • Automatic summarization: key sentences are extracted from a document to help users assess relevance and quickly digest available information.
  • Meta-Search and Meta-Search Based Collection Techniques: available online resources are intelligently combined to create focused collections and increase search coverage.
  • Concept Mapping: personal knowledge models are expressed in concept maps to support learning from digital libraries. 
  • GetSmart: this system brings together basic curriculum functions, concept mapping, and advanced information retrieval techniques in a web-based learning environment. One important goal is to support knowledge accumulation and educational processes using matching and merging algorithms.
  • Multilingual Issues: explored practically and theoretically in the development of portals in several languages.
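
To make the first two techniques above more concrete, here is a minimal Python sketch of automatic indexing followed by a co-occurrence based concept space. It is an illustration under stated assumptions, not the lab's implementation: the stop-word list, the function names (index_phrases, build_concept_space, suggest), and the scoring rule are all hypothetical. In particular, the score used here is a plain conditional co-occurrence ratio (documents containing both phrases divided by documents containing the query phrase), which stands in for whatever weighting the actual systems apply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a production system would use a far larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def index_phrases(text, max_len=2):
    """Return the set of candidate index phrases found in one document.

    Stop words split the token stream into runs; every 1..max_len word
    n-gram inside a run is kept as a phrase. (Simplification: punctuation
    is stripped but does not split runs.)
    """
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    phrases, run = set(), []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if tok and tok not in STOP_WORDS:
            run.append(tok)
            continue
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                phrases.add(" ".join(run[i:i + n]))
        run = []
    return phrases


def build_concept_space(docs, max_len=2):
    """Count, per phrase and per phrase pair, how many documents contain them."""
    doc_freq, pair_freq = Counter(), Counter()
    for text in docs:
        phrases = sorted(index_phrases(text, max_len))  # sorted => stable pair keys
        doc_freq.update(phrases)
        pair_freq.update(combinations(phrases, 2))
    return doc_freq, pair_freq


def suggest(term, doc_freq, pair_freq, top_k=3):
    """Suggest related phrases for `term`, ranked by an asymmetric score.

    Assumed score: (docs containing both phrases) / (docs containing `term`).
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for (a, b), n in pair_freq.items():
        if term == a:
            scores[b] = n / doc_freq[term]
        elif term == b:
            scores[a] = n / doc_freq[term]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

A typical use would be to call build_concept_space on a list of abstracts and then call suggest with a user's query phrase to obtain candidate search terms. Because the score divides by the frequency of the query phrase only, the relation is asymmetric: a rare phrase can point strongly to a common one without the reverse being true.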

Funding

Funding for this research was received from the following sources:

DUE-0226344 10/1/2002-09/30/2004
NSF National SMETE Digital Library $699,996
"An Active Object-based Digital Library for Microeconomics Education"
 
IIS-0302353 2/1/03 - 5/31/04
National Science Foundation $92,965
"SGER: DGPort: Intelligent Web Searching for Digital Government Research"
 
DUE-0121741, Program 7444. 9/1/01-8/31/03
NSF National SMETE Digital Library $398,956
"Intelligent Collection Services for and about Educators and Students: Logging, Spidering, Analysis and Visualization"
 
CTS-0204375 12/20/01 - 12/31/02
NSF/NSE/SGER $99,980
"NanoPort: Intelligent Web Searching for Nanoscale Science and Engineering"
 
IIS-9800696 8/18/98 - 7/31/02
NSF/CISE/CSS $274,164
"An Intelligent CSCW Workbench: Analysis, Visualization, and Agents"
 
IIS-9817473 5/1/99 - 4/31/2002
National Science Foundation $500,000
"DLI – Phase 2: High Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management"
 
IRI 94-11318COOP 9/1/94 - 8/31/99
University of Illinois/sub-NSF $678,041
"Building the Interspace: Digital Library Infrastructure for a University Engineering Community"
 
N6601-97-C-8535 7/1/97 - 6/30/00
University of Illinois/sub-DARPA $1,078,991
"The Interspace Prototype: An Analysis Environment based on Scalable Semantics"
 
IRI9525790 9/1/95 - 8/31/98
NSF/CISE/IRIS $200,755
"Concept-based Categorization and Search on Internet: A Machine Learning, Parallel Computing"
 
IRI9411330 3/1/96 - 4/30/98
University of California at Santa Barbara/sub-NSF $49,220
"Supplement to Alexandria DLI Project: A Semantic Interoperability Experiment for Spatially-Oriented Multimedia Data"

Team Members

Dr. Hsinchun Chen hchen@ai.eller.arizona.edu
Wingyan Chung  
Guanpi Lai (Greg)  
Gondy Leroy  
Chienting Lin  
Byron Marshall  
Daniel McDonald  
Thian-Huat Ong  
Jialun Qin  
Yilu Shou  
Fiona Sung  
Wei Xi  
Jie Xu  

Publications

  1. B. Marshall, H. Chen, T. Madhusudan, "Matching Knowledge Elements in Concept Maps using a Similarity Flooding Algorithm," Decision Support Systems, Vol. 42, No. 3, Dec 2006, Pages 1290-1306.
  2. Byron Marshall, Dan McDonald, Hsinchun Chen, Wingyan Chung, "EBizPort: Collecting and Analyzing Business Intelligence Information," Journal of the American Society for Information Science and Technology (JASIST), Special Issue on Document Search Interface Design for Large-scale Collections and Intelligent Access, Volume 55, Issue 10, (2004), 873-891.
  3. Wingyan Chung, Yiwen Zhang, Zan Huang, Gang Wang, Thian-Huat Ong, Hsinchun Chen. "Internet for Information Science and Technology," JASIST, Special Issue on Information Seeking, Volume 55, Issue 9, (2004), 818-831.
  4. Byron Marshall, Therani Madhusudan, "Element Matching in Concept Maps," Proceedings of the Fourth ACM and IEEE Joint Conference on Digital Libraries (JCDL-2004), June 7-11, Tucson, AZ.
  5. Byron Marshall, Yiwen Zhang, Hsinchun Chen, Ann Lally, Rao Shen, Edward Fox and Lillian Cassel. "Knowledge Management and E-Learning: the GetSmart Experience," Presented at the Third ACM and IEEE Joint Conference on Digital Libraries (JCDL-2003), Houston, May 2003.
  6. Chun Q. Yin, L. Dwayne Nickels, Charles Zhi-kai Chen, T. Gavin Ng, Hsinchun Chen. "DGPort: A Web Portal for Digital Government." Presented at dg.o2003, Digital Government Conference, May 18-21, 2003, Boston, MA.
  7. H. Chen, A. Lally, B. Zhu and M. Chau. "HelpfulMed: Intelligent Searching for Medical Information over the Internet." Journal of the American Society for Information Science and Technology (JASIST), Volume 54, Issue 7, (2003), 683-694.
  8. K. Tolle and H. Chen, "Comparing Noun Phrasing Techniques for Use with Medical Digital Library Tools," Journal of the American Society for Information Science and Technology (JASIST), Special Issue on Digital Libraries, Volume 51, Number 4, (2000) Pages 352-37.
  9. D. G. Roussinov and H. Chen. "Information Navigation on the Web by Clustering and Summarizing Query Results." Information Processing and Management, Volume 37, Number 6, (2001) 789-816.
  10. C. C. Yang, J. Yen, and H. Chen. "Intelligent Internet Searching Engine based on Hybrid Simulated Annealing." Decision Support Systems 28 (2000) 269-277.
  11. K. M. Tolle, H. Chen, and H. Chow. "Estimating Drug/Plasma Concentration Levels by Applying Neural Networks to Pharmacokinetic Data Sets." Decision Support Systems, Special Issue on Decision Support for Health Care in a New Information Age, Volume 30, Number 2, Pages 139-152, 2000.
  12. L. Houston, H. Chen, B. R. Schatz, R. R. Sewell, K. M. Tolle, T. E. Doszkocs, S. M. Hubbard, and D. T. Ng. "Exploring the Use of Concept Spaces to Improve Medical Information Retrieval," Decision Support Systems, Special Issue on Decision Support for Health Care in a New Information Age, Volume 30, Number 2, Pages 171-186, 2000.
  13. C. Lin, H. Chen and J. F. Nunamaker, "Verifying the Proximity Hypothesis for Self-Organizing Maps," Journal of Management Information Systems, Volume 16, Number 3, Pages 57-70, 2000.
  14. H. Chen, "Introduction to the Special Topic Issue: Part 2, Towards Building a Global Digital Library," Journal of the American Society for Information Science, Special Issue on Digital Libraries, Volume 51, Number 4, Pages 311-312, 2000.
  15. H. Chen. "Digital Libraries," Journal of the American Society for Information Science, Special Issue on Digital Libraries, Volume 51, Number 3, 2000.
  16. G. Leroy, K. M. Tolle, and H. Chen, "Customizable and Ontology-Enhanced Medical Information Retrieval Interfaces," presented at the IMIA WG6 Triennial Conference on Natural Language and Medical Concept Representation, Phoenix, Arizona, December, 1999.
  17. H. Chen, "Semantic Research for Digital Libraries," D-Lib Magazine, Volume 5, Number 10/11, October/November 1999.
  18. B. R. Schatz, W. Mischo, T. Cole, A. Bishop, S. Harum, E. Johnson, L. Neumann, H. Chen, and D. T. Ng, "Federated Search of Scientific Literature," IEEE Computer, Special Issue on Digital Libraries, Volume 32, Number 2, Pages 51-59, February, 1999.
  19. B. R. Schatz and H. Chen, "Digital Libraries: Technological Advances and Social Impacts," IEEE Computer, Special Issue on Digital Libraries, Volume 32, Number 2, Pages 45-50, February, 1999.
  20. H. Chen, J. Martinez, A. Kirchhoff, T. D. Ng, and B. R. Schatz. "Alleviating Search Uncertainty Through Concept Associations: Automatic Indexing, Co-occurrence Analysis, and Parallel Computing," Journal of the American Society for Information Science, Special Issue on "Management of Imprecision and Uncertainty in Information Retrieval and Database Management Systems," Volume 49, Number 3, Pages 206-216, 1998.
  21. H. Chen, "Artificial Intelligence Techniques for Emerging Information Systems Applications: Trailblazing Path to Semantic Interoperability," Journal of the American Society for Information Science, Volume 49, Number 7, Pages 579-581, 1998.
  22. H. Chen, A. L. Houston, R. R. Sewell, and B. R. Schatz, "Internet Browsing and Searching: User Evaluation of Category Map and Concept Space Techniques." Journal of the American Society for Information Science, Special Issue on "AI Techniques for Emerging Information Systems Applications," Volume 49, Number 7, Pages 582-603, 1998.
  23. H. Chen, Y. Zhang, and A. Houston. "Semantic Indexing and Searching Using a Hopfield Net," Journal of Information Science, Volume 24, Number 1, Pages 3-18, 1998.
  24. H. Chen, J. Martinez, T. D. Ng, and B. R. Schatz, "A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System," Journal of the American Society for Information Science, Volume 48, Number 1, Pages 17-31, January, 1997.
  25. H. Chen, B. R. Schatz, T. D. Ng, J. P. Martinez, A. J. Kirchhoff, C. Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Section on "Digital Libraries: Representation and Retrieval," Volume 18, Number 8, Pages 771-782, August, 1996.
  26. B. Schatz, B. Mischo, T. Cole, J. Hardin, A. Bishop, and H. Chen, "Federating Diverse Collections of Scientific Literature," IEEE Computer Special Issue on "Building Large-scale Digital Libraries," Volume 29, Number 5, Pages 28-36, May, 1996.
  27. B. Schatz and H. Chen, "Building Large-Scale Digital Libraries," IEEE Computer, Special Issue on "Building Large-scale Digital Libraries," Volume 29, Number 5, Pages 22-27, May, 1996.
  28. H. Chen, C. Schuffels, and R. Orwig, "Internet Categorization and Search: A Machine Learning Approach," Journal of Visual Communication and Image Representation, Special Issue on "Digital Libraries," Volume 7, Number 1, Pages 88-102, 1996.
  29. H. Chen and T. Ng, "An Algorithmic Approach to Concept Exploration in a Large Knowledge Network (Automatic Thesaurus Consultation): Symbolic Branch-and-bound Search vs. Connectionist Hopfield Net Activation," Journal of the American Society for Information Science, Volume 46, Number 5, Pages 348-369, June 1995.
  30. H. Chen. "Machine Learning for Information Retrieval: Neural Networks, Symbolic Learning, and Genetic Algorithms." Journal of the American Society for Information Science, Volume 46, Number 3, Pages 194-216, April 1995.
  31. H. Chen, B. Schatz, T. Yim, and D. Fye, "Automatic Thesaurus Generation for an Electronic Community System," Journal of the American Society for Information Science, Volume 46, Number 3, Pages 175-193, April 1995.
  32. H. Chen and J. Kim, "GANNET: A Machine Learning Approach to Document Retrieval," Journal of Management Information Systems, Volume 11, Number 3, Pages 9-43, Winter 1994/95.
  33. H. Chen and B. Schatz, "Semantic Retrieval for the NCSA Mosaic," Proceedings of the Second International World Wide Web Conference '94, Chicago, Illinois, October 17-20, 1994.
  34. H. Chen, "Collaborative Systems: Solving the Vocabulary Problem," IEEE Computer, Special Issue on Computer-Supported Cooperative Work, Volume 27, Number 5, Pages 58-66, May 1994.
  35. H. Chen, K. J. Lynch, K. Basu, and T. Ng. "Generating, Integrating, and Activating Thesauri for Concept-Based Document Retrieval," IEEE Expert, Special Series on Artificial Intelligence in Text-Based Information Systems, Volume 8, Number 2, Pages 25-34, April 1993.
  36. E. Carmel, S. F. Crawford, and H. Chen, "Browsing in Hypertext: A Cognitive Study," IEEE Transactions on Systems, Man, and Cybernetics, Volume 22, Number 5, Pages 865-884, 1992.
  37. H. Chen. "Knowledge-Based Document Retrieval: Framework and Design," Journal of Information Science: Principles & Practice, Volume 18, Number 3, Pages 293-314, June 1992.
  38. H. Chen and V. Dhar, "Cognitive Process as a Basis for Intelligent Retrieval Systems Design," Information Processing and Management, Volume 27, Number 5, Pages 405-432, 1991.
  39. H. Chen and V. Dhar, "User Misconceptions of Information Retrieval Systems," International Journal of Man-Machine Studies, Volume 32, Number 6, Pages 673-692, 1990.