While small-scale search engines in specific domains and languages are increasingly desired by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is therefore highly desirable. To study the research issues involved, we designed and implemented SpidersRUs, a toolkit for multilingual search engine creation. The toolkit consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. This study demonstrates that the proposed architecture can be used to develop search engines effectively and efficiently in different languages, such as Chinese, Spanish, Japanese, and Arabic.
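To make the module breakdown concrete, the following is a minimal, hypothetical Java sketch of how a Spider, an Indexer, an Index Structure, and a Search module can fit together. It is not the SpidersRUs source code. The class and method names (`MiniSearchEngine`, `crawl`, `index`, `search`, `tokenize`) are illustrative assumptions, the "Web" is an in-memory link graph instead of HTTP fetching, and retrieval is a simple Boolean AND over an inverted index because the source does not describe the toolkit's ranking method. The tokenizer splits on Unicode letter/digit boundaries, which suits space-delimited languages such as Spanish but does not segment Chinese or Japanese text.

```java
import java.util.*;

// Hypothetical sketch of a Spider -> Indexer -> Index Structure -> Search pipeline.
public class MiniSearchEngine {

    /** Spider module (assumed): breadth-first traversal of an in-memory link graph standing in for the Web. */
    public static List<String> crawl(Map<String, List<String>> links, String seed, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // enqueue each page only once
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    /** Index Structure (assumed): inverted index mapping each term to the sorted set of pages containing it. */
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexer module: tokenizes a page and records which page each term occurs in. */
    public void index(String url, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
        }
    }

    /** Search module: returns pages containing every query term (Boolean AND). */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with pages matching earlier terms
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    /** Splits on any run of characters that are not Unicode letters or digits, then lowercases. */
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c", "d"),
                "c", List.of("a"));
        Map<String, String> pages = Map.of(
                "a", "Medical search engine for Chinese users",
                "b", "Spanish search engine",
                "c", "Medical information portal",
                "d", "Arabic news archive");
        MiniSearchEngine engine = new MiniSearchEngine();
        for (String url : crawl(links, "a", 10)) {
            engine.index(url, pages.getOrDefault(url, ""));
        }
        System.out.println(engine.search("medical search")); // [a]
        System.out.println(engine.search("search engine"));  // [a, b]
    }
}
```

Keeping the tokenizer as a single method makes it the natural place to swap in language-specific processing, which is where a multilingual toolkit has to differ from an English-only one. The sorted `TreeSet` postings are only there to make result order deterministic in this sketch.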
We would like to thank Chia-Jung Hsu for his contribution to this project. We would also like to thank the other members of the Artificial Intelligence Lab at the University of Arizona who tested the toolkit and shared their ideas and comments with us.
Approach and Methodology
In this study, we reviewed the related literature and suggested criteria for an ideal search tool. We then proposed an architecture for a multilingual search engine building tool and implemented it in the Java programming language. The tool consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. We also conducted a case study in which we used the tool to develop a medical search engine in Chinese, demonstrating the effectiveness and efficiency of the toolkit.
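Building a Chinese search engine raises an indexing problem that English does not: Chinese is written without spaces between words, so a whitespace- or punctuation-based tokenizer turns a whole phrase into a single term. The source does not say how SpidersRUs handles this. One common segmentation-free approach, shown here purely as an assumed illustration, is to index overlapping character bigrams, so that 乳腺癌 ("breast cancer") is indexed as 乳腺 and 腺癌. The class name `BigramTokenizer` and the choice of scripts treated as unsegmented (Han, Hiragana, Katakana) are hypothetical. The source file is kept ASCII-only by writing Chinese characters as `\u` escapes.

```java
import java.util.*;

// Hypothetical tokenizer: overlapping bigrams for unsegmented scripts, whole words elsewhere.
public class BigramTokenizer {

    // Scripts assumed (for this sketch) to be written without spaces between words.
    private static final Set<Character.UnicodeScript> UNSEGMENTED = EnumSet.of(
            Character.UnicodeScript.HAN,
            Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA);

    static boolean isUnsegmented(int codePoint) {
        return UNSEGMENTED.contains(Character.UnicodeScript.of(codePoint));
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isUnsegmented(cps[i])) {
                // Collect a maximal run of unsegmented characters.
                int start = i;
                while (i < cps.length && isUnsegmented(cps[i])) i++;
                if (i - start == 1) {
                    tokens.add(new String(cps, start, 1)); // a lone character is its own token
                } else {
                    for (int j = start; j < i - 1; j++) {
                        tokens.add(new String(cps, j, 2)); // overlapping character bigrams
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Collect a space-delimited word (e.g. Latin or Arabic script) and lowercase it.
                int start = i;
                while (i < cps.length && Character.isLetterOrDigit(cps[i]) && !isUnsegmented(cps[i])) i++;
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                i++; // skip spaces and punctuation, including full-width Chinese punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Chinese "breast cancer" followed by a Latin-script term.
        String query = "\u4e73\u817a\u764c HIV";
        System.out.println(tokenize(query).size()); // 3
    }
}
```

Applying the same bigram tokenizer to both documents and queries lets a query match any page containing the same character pairs, at the cost of a larger index and some false matches across word boundaries; a dictionary-based word segmenter is the usual alternative.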
Funding for this research was received from the following sources:
| Award | Program | Project Title | Period | Amount | Principal Investigator |
|---|---|---|---|---|---|
| IIS-9817473 | NSF Digital Library Initiative-2 | High-performance Digital Library Systems: From Information Retrieval to Knowledge Management | April 1999 – March 2002 | | |
| | NSF National SMETE Digital Library | Intelligent Collection Services for and about Educators and Students: Logging, Spidering, Analysis and Visualization | | $92,965 | Dr. Hsinchun Chen |
- Chau, M., Qin, J., Zhou, Y., Tseng, C., and Chen, H., "SpidersRUs: Automated Development of Vertical Search Engines in Different Domains and Languages," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'05), Denver, Colorado, USA, June 7-11, 2005.
- Qin, J., Zhou, Y., and Chau, M., "Building Domain-Specific Web Collections for Scientific Digital Libraries: A Meta-Search Enhanced Focused Crawling Method," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'04), Tucson, Arizona, USA, June 7-11, 2004, pp. 135-141.
- Chau, M., Huang, Z., and Chen, H., "Teaching Key Topics in Computer Science and Information Systems through a Search Engine Project," ACM Journal on Educational Resources in Computing (JERIC), 3(3), 1-14, 2003.
- Chen, H., Fan, H., Chau, M., and Zeng, D., "Testing a Cancer Meta Spider," International Journal of Human-Computer Studies (IJHCS), 59(5), 755-776, 2003.
- Chau, M. and Chen, H., "Comparison of Three Vertical Search Spiders," IEEE Computer, 36(5), 56-62, 2003.
- Chen, H., Lally, A. M., Zhu, B., and Chau, M., "HelpfulMed: Intelligent Searching for Medical Information over the Internet," Journal of the American Society for Information Science and Technology (JASIST), 54(7), 683-694, 2003.
- Chau, M., Zeng, D., Chen, H., Huang, M., and Hendriawan, D., "Design and Evaluation of a Multi-agent Collaborative Web Mining System," Decision Support Systems (DSS), Special Issue on Web Retrieval and Mining, 35(1), 167-183, 2003.
- Chen, H., Chau, M., and Zeng, D., "CI Spider: A Tool for Competitive Intelligence on the Web," Decision Support Systems (DSS), 34(1), 1-17, 2002.
- Chen, H., Fan, H., Chau, M., and Zeng, D., "MetaSpider: Meta-Searching and Categorization on the Web," Journal of the American Society for Information Science and Technology (JASIST), 52(13), 1134-1147, 2001.
- Chau, M., "Spidering and Filtering Web Pages for Vertical Search Engines," in Proceedings of The Americas Conference on Information Systems, AMCIS 2002 Doctoral Consortium, Dallas, Texas, August 8-11, 2002.
- Chau, M., Chen, H., Qin, J., Zhou, Y., Qin, Y., Sung, W. K., and McDonald, D., "Comparison of Two Approaches to Building a Vertical Search Tool: A Case Study in the Nanotechnology Domain," in Proceedings of The Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'02), Portland, Oregon, USA, July 14-18, 2002, pp. 135-144.
- Chau, M., Zeng, D., and Chen, H., "Personalized Spiders for Web Search and Analysis," in Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'01), Roanoke, Virginia, USA, June 24-28, 2001, pp. 79-87.
- Chen, H., Chung, Y., Ramsey, M., and Yang, C., "A Smart Itsy Bitsy Spider for the Web," Journal of the American Society for Information Science, Special Issue on AI Techniques for Emerging Information Systems Applications, 49(7), 604-618, 1998.
- Chen, H., Chung, Y., Ramsey, M., and Yang, C., "An Intelligent Personal Spider (Agent) for Dynamic Internet/Intranet Searching," Decision Support Systems (DSS), 23, 41-58, 1998.