Dark Web and GeoPolitical Web Research

 

The Dark Web Project

Find out more about the Dark Web project:

Research Goal

The AI Lab Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomenon via a computational, data-centric approach. We aim to collect "ALL" web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual worlds, etc. We have developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in our research. The approaches and methods developed in this project contribute to advancing the field of Intelligence and Security Informatics (ISI). Such advances will help related stakeholders perform terrorism research and facilitate international security and peace. It is our belief that we (the US and its allies) are facing the dire danger of losing "The War on Terror" in cyberspace (especially when many young people are being recruited, incited, infected, and radicalized on the web), and we would like to help in our small (computational) way.

Funding

We thank the following agencies for providing research funding support.

Defense Threat Reduction Agency, WMD Intent Identification and Interaction Analysis Using the Dark Web (HDTRA1-09-1-0058) (July 2009 - July 2012)
 
Air Force Research Lab, Dark Web WMD-Terrorism Study (Subcontract No. FA8650-02) (July 2009 - July 2012)
 

National Science Foundation (NSF) (September 2003 – August 2010)

* (CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences (NSF #CNS-0709338)
* (EXP-LA) Explosives and IEDs in the Dark Web: Discovery, Categorization, and Analysis (NSF #CBET-0730908)
* (SGER) Multilingual Online Stylometric Authorship Identification: An Exploratory Study (NSF #IIS-0646942)
* (ITR, Digital Government) COPLINK Center for Intelligence and Security Informatics Research (partial support) (NSF #EIA-0326348)

 
Library of Congress, Capture of Multimedia, Multilingual Open Source Web-based At-Risk Content (July 2005 – June 2008)
 
DHS / CNRI, BorderSafe Initiative (partial support) (October 2003 - September 2005)

Acknowledgements

We thank the following academic partners and colleagues for their support, help, and comments. Many of our terrorism research colleagues have taught us much about the significance and intricacy of this important domain. They also help guide us in the development of our scientific, computational approach.

  • Officers and domain experts of Tucson Police Department, Arizona Department of Customs and Border Protection, and San Diego Automated Regional Justice Information System (ARJIS) Program
  • Dr.  Marc Sageman
  • Dr. Edna Reid
  • Dr. Johnny Ryan, The Institute of International and European Affairs (IIEA)
  • Rick Eaton, Simon Wiesenthal Center
  • Dr. Joshua Sinai, The Analysis Corporation
  • Dr. Shlomo Argamon, Illinois Institute of Technology
  • Chip Ellis, Memorial Institute for the Prevention of Terrorism (MIPT)
  • Rex Hudson, Library of Congress
  • Dr. Chris Yang, Drexel University
  • Dr. Gabriel Weimann, University of Haifa, Israel
  • Dr. Mark Last, Ben-Gurion University, Israel
  • Drs. Henrik Larsen and Nasrullah Memon, Aalborg University, Denmark
  • Dr. Katrina von Knop, George Marshall Center, Germany
  • Dr. Jau-Hwang Wang and Robert Chang,  Central Police University, Taiwan
  • Dr. Ee peng Lim, Singapore Management University, Singapore
  • Dr. Feiyue Wang, Chinese Academy of Sciences, China
  • Dr. Michael Chau, Hong Kong University

There has been significant interest from various intelligence, justice, and defense agencies in our computational methodologies, tools, and systems. However, we do not perform (security) clearance-level work, nor do we conduct targeted cyberspace crime or intelligence investigations. Our research staff members are primarily computer and information scientists from all over the world, with expertise in more than 10 languages. We perform academic research, write papers (see below), and develop computer programs. We sincerely hope that our work can contribute to international security and peace.

Approach and Methodology

Claims: Dr. Gabriel Weimann of the University of Haifa has estimated that there are about 5,000 terrorist web sites as of 2006. Based on our actual spidering experience over the past 5 years, we believe there are about 50,000 sites of extremist and terrorist content as of 2007, including web sites, forums, blogs, social networking sites, video sites, and virtual world sites (e.g., Second Life). The largest increase in 2006-2007 is in various new Web 2.0 sites (forums, videos, blogs, virtual worlds, etc.) in different languages (i.e., for home-grown groups, particularly in Europe). We have found significant terrorism content in more than 15 languages.

Testbed: We collect (using computer programs) various web contents every 2 to 3 months; we started spidering in 2002. Currently we collect the complete contents of only about 1,000 sites, in Arabic, Spanish, and English. We also have partial contents of about another 10,000 sites. In total, our collection is about 2 TB in size, with close to 500,000,000 pages/files/postings from more than 10,000 sites.

We believe our Dark Web collection is the largest open-source extremist and terrorist collection in the academic world. (We have no way of knowing what the intelligence, justice, and defense agencies are doing.) Researchers can have graded access to our collection by contacting our research center.  

Web sites

Our web site collection consists of the complete contents of about 1,000 sites, in various static (html, pdf, Word) and dynamic (PHP, JSP, CGI) formats. We collect every single page, link, and attachment within these sites. We also collect partial information from about 10,000 related (linked) sites. Some large well-known sites contain more than 10,000 pages/files in 10+ languages (in selected pages).

Forums

We collect the complete contents (authors, headings, postings, threads, time-tags, etc.) of about 300 terrorist forums. We also perform periodic updates. Some large radical sites include more than 30,000 members with close to 1,000,000 messages posted. See a recent poster summarizing our capabilities in analyzing forums.

We have also developed the Dark Web Forum Portal, which provided search access to several international jihadist “Dark Web” forums collected by the Artificial Intelligence Lab at the University of Arizona. Users could search, view, translate, and download messages (by forum member name, thread title, topic, keyword, etc.). Preliminary social network analysis visualization was also available. The Portal is no longer up and running, but the forum data, in a plain text version, is available through the Data Infrastructure Building Blocks for Intelligence and Security Informatics (DIBBs for ISI) project: see the DIBBs project page for a pointer to the portal.

Blogs, social networking sites, and virtual worlds

We have identified and extracted many smaller, transient blogs and social networking sites (transient meaning that the sites appear and disappear very quickly), mostly hosted by terrorist sympathizers and “wannabes.” We have also identified more than 30 (self-proclaimed) terrorist or extremist groups in virtual world sites. (However, we are still unsure whether they are “real” terrorists/extremists or just playing the roles in virtual games.)

Videos and multimedia content

Terrorist sites are extremely rich in content, with heavy usage of multimedia formats. We have identified and extracted about 1,000,000 images and 15,000 videos from many terrorist sites and specialty multimedia file-hosting third-party servers. More than 50% of our videos are IED (Improvised Explosive Devices) related.

Computational Techniques (Data Mining, Text Mining, and Web Mining)

Our computational tools are grouped into two categories:

  • Collection
  • Analysis and Visualization

I. Collection

Web site spidering 
We have developed various focused spiders/crawlers based on our previous digital library research. Our spiders can access password-protected sites and perform randomized (human-like) fetching. Our spiders are trained to fetch all HTML, PDF, and Word files, links, PHP, CGI, and ASP files, images, audio, and videos in a web site. To ensure freshness, we spider selected web sites every 2 to 3 months.
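The freshness and randomized-fetching ideas above can be illustrated with a small scheduling sketch. This is not the project's crawler: the function names, the 60 to 90 day window (our reading of "every 2 to 3 months"), and the pause lengths are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def next_crawl_date(last_crawl, rng, min_days=60, max_days=90):
    """Pick the next visit roughly 2 to 3 months after the last one (assumed window)."""
    return last_crawl + timedelta(days=rng.randint(min_days, max_days))


def fetch_delays(n_pages, rng, base_seconds=2.0, jitter_seconds=3.0):
    """Return one randomized pause (in seconds) per page to be fetched."""
    return [base_seconds + rng.uniform(0.0, jitter_seconds) for _ in range(n_pages)]


rng = random.Random(42)  # seeded only so the sketch is reproducible
last_crawl = date(2007, 1, 1)
upcoming = next_crawl_date(last_crawl, rng)
delays = fetch_delays(5, rng)
print(upcoming, [round(d, 2) for d in delays])
```

Randomizing both the revisit date and the per-page pause avoids a fixed request rhythm; a real crawler would also need to handle robots rules, errors, and deduplication, which this sketch leaves out.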

Forum spidering 
Our forum spidering tool recognizes more than 15 forum hosting software packages and their formats. We collect the complete forum, including authors, headings, postings, threads, time-tags, etc., which allows us to reconstruct participant interactions. We perform periodic forum spidering and incremental updates based on research needs. We have collected and processed forum contents in Arabic, English, Spanish, French, and Chinese using selected computational linguistics techniques.
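As a rough illustration of how collected forum fields can be used to reconstruct participant interactions, the sketch below turns a list of postings into weighted author-to-author reply edges. The `Post` record and its `reply_to` field are hypothetical; real forum software exposes reply structure in different ways, and the project's actual data model is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    post_id: int
    thread_id: int
    author: str
    reply_to: Optional[int]  # post_id being answered, or None for a thread starter


def reply_edges(posts):
    """Count directed (replier, original author) pairs, ignoring self-replies."""
    author_of = {p.post_id: p.author for p in posts}
    edges = Counter()
    for p in posts:
        if p.reply_to is None or p.reply_to not in author_of:
            continue
        target = author_of[p.reply_to]
        if target != p.author:
            edges[(p.author, target)] += 1
    return edges


# Toy thread with made-up member names.
posts = [
    Post(1, 10, "alice", None),
    Post(2, 10, "bob", 1),
    Post(3, 10, "carol", 1),
    Post(4, 10, "alice", 2),
    Post(5, 10, "bob", 4),
    Post(6, 10, "bob", 2),  # bob answering himself: skipped
]
edges = reply_edges(posts)
print(dict(edges))
```

The resulting edge counts are the kind of input a social network analysis step could consume.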

Multimedia (image, audio, and video) spidering 
We have developed specialized techniques for spidering and collecting multimedia files and attachments from web sites and forums. We plan to perform steganography research to identify encrypted images in our collection and multimedia analysis (video segmentation, image recognition, voice/speech recognition) to identify unique terrorist-generated video contents and styles.

II. Analysis and Visualization

Social network analysis (SNA) 
We have developed various SNA techniques to examine web site and forum posting relationships. We have used various topological metrics (betweenness, degree, etc.) and properties (preferential attachment, growth, etc.) to model terrorist and terrorist site interactions. We have developed  several clustering (e.g., Blockmodeling) and projection (e.g., Multi-Dimensional Scaling, Spring Embedder) techniques to visualize their relationships. Our focus is on understanding “Dark Networks” (unlike traditional “bright” scholarship, email, or computer networks) and their unique properties (e.g., hiding, justice intervention, rival competition, etc.).
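To make the topological metrics named above concrete, here is a minimal pure-Python sketch of degree and betweenness on a small undirected, unweighted graph. It uses Brandes' algorithm as a stand-in; the project's own implementation is not documented here, and the toy graph and node names are invented for illustration.

```python
from collections import deque


def build_adjacency(edge_list):
    """Build a symmetric adjacency map from undirected edges."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Brandes' algorithm: count of shortest paths passing through each node."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}      # shortest-path predecessors
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):         # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


adj = build_adjacency([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])
deg = degree(adj)
btw = betweenness(adj)
print(deg, btw)
```

In this toy graph, B has the highest degree and also lies on the most shortest paths, while D acts as the only bridge to E.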

Content analysis 
We have developed several detailed (terrorism-specific) coding schemes to analyze the contents of terrorist and extremist web sites. Content categories include: recruiting, training, sharing ideology, communication, propaganda, etc. We have also developed computer programs to help automatically identify selected content categories (e.g., web master information, forum availability, etc.).

Web metric analysis 
Web metrics analysis examines the technical sophistication, media richness, and web interactivity of extremist and terrorist web sites. We examine technical features and capabilities (e.g., their ability to use forms, tables, CGI programs, multimedia files, etc.) of such sites to determine their level of “web-savvy-ness.” Web metrics provide a measure of terrorists’/extremists’ capabilities and resources. All terrorist site web metrics are extracted and computed using computer programs.
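A simple way to picture automated web metric extraction is to count the technical features present in a page's markup. The sketch below uses Python's standard `html.parser`; the tag-to-feature mapping is an assumption made for illustration and is far coarser than a real web metrics scheme.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to feature categories.
FEATURE_TAGS = {
    "form": "forms",
    "table": "tables",
    "img": "images",
    "video": "multimedia",
    "audio": "multimedia",
    "embed": "multimedia",
}


class FeatureCounter(HTMLParser):
    """Tally start tags that indicate a feature of interest."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        feature = FEATURE_TAGS.get(tag)  # the parser lowercases tag names
        if feature:
            self.counts[feature] += 1


def web_metrics(html):
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


sample = """
<html><body>
  <form action="/search"><input name="q"></form>
  <table><tr><td><img src="a.png"></td></tr></table>
  <video src="clip.mp4"></video>
  <IMG src="b.png"/>
</body></html>
"""
metrics = web_metrics(sample)
print(metrics)
```

Counts like these could be aggregated per site and compared across sites as a rough sophistication score.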

Sentiment and affect analysis 
Not all sites are equally radical or violent. Sentiment (polarity: positive/negative) and affect (emotion: violence, racism, anger, etc.) analysis allows us to identify radical and violent sites that warrant further study. We also examine how radical ideas become “infectious” based on their contents, their senders, and the senders’ interactions. We rely heavily on recent advances in Opinion Mining (analyzing opinions in short web-based texts). We have also developed selected visualization techniques to examine sentiment/affect changes over time and among people. Our research includes several probabilistic multilingual affect lexicons and selected dimension reduction and projection (e.g., Principal Component Analysis) techniques.
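Lexicon-based affect scoring, in its simplest form, can be sketched as follows. The tiny English lexicon and its weights are invented for illustration only; they are not taken from the project's probabilistic multilingual lexicons, and the length normalization is one plausible choice among several.

```python
import re

# Hypothetical affect lexicon: category -> {word: intensity weight in [0, 1]}.
AFFECT_LEXICON = {
    "anger": {"furious": 0.9, "outraged": 0.8, "angry": 0.7},
    "violence": {"attack": 0.8, "destroy": 0.7},
}


def affect_scores(text, lexicon=AFFECT_LEXICON):
    """Sum matched word weights per category, normalized by message length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in lexicon}
    return {
        category: sum(words.get(tok, 0.0) for tok in tokens) / len(tokens)
        for category, words in lexicon.items()
    }


scores = affect_scores("They were furious and vowed to attack")
print(scores)
```

Per-message scores of this kind can then be tracked over time or per author to look for changes in affect intensity.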

Authorship analysis and Writeprint 
Grounded in authorship analysis research, we have developed the (cyber) Writeprint technique to uniquely identify anonymous senders based on the signatures associated with their forum messages. We expand the lexical and syntactic features of traditional authorship analysis to include system (e.g., font size, color, web links) and semantic (e.g., violence, racism) features of relevance to online texts of extremists and terrorists. We have also developed advanced Inkblob and Writeprint visualizations to help visually identify web signatures. Our Writeprint technique has been developed for Arabic, English, and Chinese. The Arabic Writeprint consists of more than 400 features, all automatically extracted from online messages using computer programs. Writeprint can achieve an accuracy level of 95%.
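The idea of extracting lexical and syntactic style features automatically can be illustrated with a deliberately small sketch. This is not the Writeprint feature set or classifier: the 16 features, the function-word list, and the cosine comparison are all assumptions chosen to keep the example short.

```python
import math
import re

# Illustrative feature inventory (hypothetical, far smaller than a real feature set).
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "is", "it")
PUNCTUATION = ",.!?;:"


def style_vector(text):
    """Return lexical features followed by syntactic (function word, punctuation) rates."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    lexical = [
        sum(len(w) for w in words) / n_words,  # average word length
        len(set(words)) / n_words,             # vocabulary richness (type/token)
    ]
    syntactic = [words.count(fw) / n_words for fw in FUNCTION_WORDS]
    syntactic += [text.count(p) / n_chars for p in PUNCTUATION]
    return lexical + syntactic


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


msg_a = "It is the first of many posts, and that is fine."
msg_b = "Wow!!! Great thread!!! Thanks!!!"
vec_a = style_vector(msg_a)
vec_b = style_vector(msg_b)
print(round(cosine_similarity(vec_a, vec_b), 3))
```

In practice features would be scaled and fed to a trained classifier; raw cosine similarity on unscaled features is shown only to make the vector representation tangible.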

Video analysis 
A significant portion of our videos are IED-related. Based on previous terrorism ontology research, we have developed a unique coding scheme to analyze terrorist-generated videos based on the contents, production characteristics, and metadata associated with the videos. We have also developed a semi-automated tool to allow human analysts to quickly and accurately analyze and code these videos.

IEDs in Dark Web analysis 
We have conducted several systematic studies to identify IED-related content generated by terrorist and insurgency groups in the Dark Web. A small number of sites are responsible for distributing a large percentage of IED-related web pages, forum postings, training materials, explosive videos, etc. We have developed unique signatures for those IED sites based on their contents, linkages, and multimedia file characteristics. Much of the content needs to be analyzed by military analysts. Training materials also need to be developed for troops before their deployment (“seeing the battlefield from your enemies’ eyes”).

The GeoPolitical Web Project

The Geopolitical Web project is a research effort (expanded from Dark Web research) with the ultimate goal of developing computational approaches for monitoring public opinion in regions of conflict, assessing country risk indicators in the social media of fragile or weakening states, and correlating these risk signals with commonly accepted quantitative geopolitical risk assessments. Country risk – the likelihood that a state will weaken or fail – and the methods of assessing it continue to be of serious concern to the international community. Country risk has traditionally been assessed by monitoring economic and financial indicators.  However, social media (such as forums, blogs, and websites) are now important transporters of citizens’ daily conversations and opinions, and as such may carry discernible indicators of risk, but they have been as yet little-used for this task.  We have developed a generic framework (see figure below) for identifying appropriate data sources and conducting analyses. The ultimate goal of our system is to collect data and conduct analysis related to assessing country risk. The proposed system consists of four main components:  information identification and data collection, data representation, analytic approaches, and visualization and analysis tools.  The GeoPolitical Web system currently includes forum data from 14 countries, 6 languages, 70 forums, 650K authors, 3.3M threads, 26M messages. Languages included are: English (12), Arabic (52), French (3), Indonesian (4), Pashto (1), and Urdu (2). Our system supports large-scale automated social media collection and update and multiple forum search, visualization, and machine translation via a common web interface.

Team Members (Selected)

  • Dr. Hsinchun Chen, hchen@eller.arizona.edu
  • Cathy Larson
     
  • Alumni Team Members
    • Ahmed Abbasi
    • Enrique Arevelo
    • Victor Benjamin
    • Alfonso A. Bonillas
    • Wingyan Chung
    • Carrie Fang
    • Edward Huang
    • Kira Joslin
    • Guanpi Lai (Greg)
    • Dan McDonald
    • Jialun Qin
    • Wei Xi
    • Jennifer Jie Xu
    • Lijun Yan
    • Rob Schumaker
    • Danning Hu
    • Dr. Yilu Zhou
    • Arab Salim
    • Edna Reid
    • Lu Tseng
    • David Zimbra

Publications

Press and Media

Dark Web research has been featured in many national, international, and local press and media, including: National Science Foundation press, Associated Press, BBC, Fox News, National Public Radio, Science News, Discover Magazine, Information Outlook, Wired Magazine, The Bulletin (Australian), Australian Broadcasting Corporation, Arizona Daily Star, East Valley Tribune, Phoenix ABC Channel 15, and Tucson Channels 4, 6, and 9. See Recognition for links to these and other stories. Our research has been recognized for its contribution to national security.

As an NSF-funded research project, our research team has generated significant findings and publications in major computer science and information systems journals and conferences. However, we have taken great care not to reveal sensitive group information or technical implementation details (specifics). We hope our research will help educate the next generation of cyber/Internet savvy analysts and agents in the intelligence, justice, and defense communities.

A Few Words about Civil Liberties and Human Rights: The Dark Web project is NOT like Total Information Awareness (TIA) (at least we try very hard not to be like it). This is not a secretive government project conducted by spooks. We perform scientific, longitudinal, hypothesis-guided terrorism research like other terrorism researchers (who have done such research for 30+ years). However, we are clearly more computationally oriented, unlike traditional terrorism research that relies on sociology-, communications-, and policy-based methodologies. Our contents are open source in nature (similar to Google’s contents) and our major research targets are international Jihadist groups, not regular citizens. Our researchers are primarily computer and information scientists from all over the world. We develop computer algorithms, tools, and systems. Our research goal is to study and understand the international extremism and terrorism phenomena. Some people may refer to this as understanding the “root cause of terrorism.”

 

Books (Monograph, Edited Volumes, and Proceedings)

Intelligence and Security Informatics (ISI) related; Dark Web research included.

  • C. Yang, M. C.-L. Chau, J.-H. Wang, and H. Chen. Security Informatics (Annals of Information Systems vol. 9), Springer, 2010.
  • H. Chen, D. Zeng, and P. Yan. Infectious Disease Informatics: Syndromic Surveillance for Public Health and Biodefense, Springer, 2010.
  • H. Chen and C. Yang (Eds.), Intelligence and Security Informatics: Techniques and Applications, Springer, 2008.
  • H. Chen, E. Reid, J. Sinai, A. Silke, and B. Ganor (Eds.), Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security, Springer, 2008.
  • H. Chen, T. S. Raghu, R. Ramesh, A. Vinze, and D. Zeng (Eds.), Handbooks in Information Systems -- National Security, Elsevier Scientific, 2007.
  • C. Yang, D. Zeng, M. Chau, K. Chang, Q. Yang, X. Cheng, J. Wang, F. Wang, and H. Chen (Eds.), Intelligence and Security Informatics, Proceedings of the Pacific-Asia Workshop, PAISI 2007, Lecture Notes in Computer Science (LNCS 4430), Springer-Verlag, 2007.
  • S. Mehrotra, D. Zeng, H. Chen, B. Thuraisingham, and F. Wang (Eds.), Intelligence and Security Informatics, Proceedings of the IEEE International Conference on Intelligence and Security Informatics, ISI 2006, Lecture Notes in Computer Science (LNCS 3975), Springer-Verlag, 2006.
  • H. Chen, F. Wang, C. Yang, D. Zeng, M. Chau, and K. Chang (Eds.), Intelligence and Security Informatics, Proceedings of the Workshop on Intelligence and Security Informatics, WISI 2006, Lecture Notes in Computer Science (LNCS 3917), Springer-Verlag, 2006. 
  • H. Chen, Intelligence and Security Informatics for International Security: Information Sharing and Data Mining, Springer, 2006.
  • P. Kantor, G. Muresan, F. Roberts, D. Zeng, F. Wang, H. Chen, and R. Merkle (Eds.), Intelligence and Security Informatics, Proceedings of the IEEE International Conference on Intelligence and Security Informatics, ISI 2005, Lecture Notes in Computer Science (LNCS 3495), Springer-Verlag, 2005.
  • H. Chen, R. Moore, D. Zeng, and J. Leavitt (Eds.), Intelligence and Security Informatics, Proceedings of the Second Symposium on Intelligence and Security Informatics, ISI 2004, Lecture Notes in Computer Science (LNCS 3073), Springer-Verlag, 2004.
  • H. Chen, R. Miranda, D. Zeng, T. Madhusudan, C. Demchak, and J. Schroeder (Eds.), Intelligence and Security Informatics, Proceedings of the First NSF/NIJ Symposium on Intelligence and Security Informatics, ISI 2003, Lecture Notes in Computer Science (LNCS 2665), Springer-Verlag, 2003.

 

Journal Articles (Published and Forthcoming)

2014

  • V. Benjamin, D. Zimbra, and H. Chen, “Bridging the Virtual and Real: The Relationships between Web Content, Linkage, and Geographical Proximity of Social Movements,” Journal of the American Society for Information Science and Technology, Volume 65, Number 11, Pages 2210-2222, 2014.

2013

  • Y. Zhang, D. Yang, and H. Chen, “Examining Gender Emotional Differences in Web Forum Communication,” Decision Support Systems, Volume 55, Number 3, Pages 851-860, 2013.
  • L. Fan, Y. Zhang, Y. Dang, and H. Chen, “Analyzing Sentiments in Web 2.0 Social Media Data in Chinese: Experiments on Business and Marketing Related Chinese Web Forums,” Information Technology and Management, Volume 14, Number 3, Pages 231-242, 2013.

2012

  • T. Fu, A. Abbasi, D. Zeng, and H. Chen, “Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers,” ACM Transactions on Information Systems, Volume 30, Number 4, November, 2012.
  • M. Yang, M. Kiang, H. Chen, and Y. Li, “Artificial Immune System for Illicit Content Identification in Social Media,” Journal of the American Society for Information Science and Technology, Volume 63, Number 2, Pages 256-269, 2012.

2011

  • H. Chen and Y. Zhang, “AI, Virtual Worlds, and Massively Multiplayer Online Games,” IEEE Intelligent Systems, Volume 26, Number 1, Pages 80-82, January/February, 2011.
  • Y. Zhang, Y. Dang, and H. Chen, “Gender Classification for Web Forums,” IEEE Transactions on Systems, Man, and Cybernetics, Volume 41, Number 4, Pages 668-677, July 2011.

2010

  • D. Zimbra, A. Abbasi, and H. Chen, “A Cyber-archeology Approach to Social Movement Research: Framework and Case Study,” Journal of Computer-Mediated Communication, forthcoming, 2010.
  • S. Kaza and H. Chen, “Mapping Ontologies using WordNet and Mutual Information: An Experiment in Public Safety Information Sharing,” Decision Support Systems, forthcoming, 2010.
  • G. Wang and H. Chen, “A Hierarchical Naïve Bayes Model for Approximate Identity Matching,” Decision Support Systems, forthcoming, 2010.
  • H. Chen, Y. Zhang, and Y. Dang, “Intelligence and Security Informatics,” Encyclopedia of Library and Information Sciences, forthcoming, 2010.
  • A. Abbasi, H. Chen, and Z. Zhang, “Selecting Attributes for Sentiment Classification using Feature Relation Networks,” IEEE Transactions on Knowledge and Data Engineering, forthcoming, 2010.
  • S. Kaza and H. Chen, “Identifying High-Status Nodes in Knowledge Networks,” Annals of Information Systems, Volume 12, Pages 91-108, 2010.
  • C. Huang, T. J. Fu, and H. Chen, “Text-based Video Content Classification for Online Video-Sharing Sites,” Journal of the American Society for Information Science and Technology, Volume 61, Number 5, Pages 891-906, 2010.
  • T. J. Fu, A. Abbasi, and H. Chen, “A Focused Crawler for Dark Web Forums,” Journal of the American Society for Information Science and Technology, Volume 61, Number 6, Pages 1213-1231, 2010.
  • H. Lu, D. Zeng, and H. Chen, “Prospective Infectious Disease Outbreak Detection Using Markov Switching Models,” IEEE Transactions on Knowledge and Data Engineering, Volume 22, Number 4, Pages 565-577, 2010.
  • H. Chen and D. Zimbra, “AI and Opinion Mining,” IEEE Intelligent Systems, Volume 25, Number 3, Pages 74-76, May/June, 2010.
  • A. Abbasi, Z. Zhang, D. Zimbra, H. Chen, and J. F. Nunamaker, “Detecting Fake Websites: The Contribution of Statistical Learning Theory,” MIS Quarterly, Volume 34, Number 3, Pages 435-461, September 2010.

2009

  • S. Kaza, J. Xu, B. Marshall, and H. Chen, “Topological Analysis of Criminal Activity Networks: Enhancing Transportation Security,” IEEE Transactions on Intelligent Transportation Systems, Volume 10, Number 1, Pages 83-91, 2009.
  • D. Hu, S. Kaza, and H. Chen, “Identifying Significant Facilitators of Dark Network Evolution,” Journal of the American Society for Information Science and Technology, Volume 60, Number 4, Pages 655-665, 2009.
  • A. M. Perez, D. Zeng, C. Tseng, H. Chen, Z. Whedbee, D. Paton, and M. C. Thurmond, “A Web-based System for Near Real-time Surveillance and Space-time Cluster Analysis of Foot-and-mouth Disease and Other Animal Diseases,” Preventive Veterinary Medicine, Volume 91, Number 1, Pages 39-45, 2009.
  • H.-M. Lu, D. Zeng, and H. Chen, "Multilingual Chief Complaint Classification for Syndromic Surveillance: An Experiment with Chinese Chief Complaints," International Journal of Medical Informatics, Volume 78, Number 5, Pages 308-320, 2009.
  • A. Abbasi and H. Chen, “A Comparison of Tools for Detecting Fake Websites,” IEEE Computer, Volume 42, Number 10, Pages 78-86, October 2009.
  • H. Chen and D. Zeng, “AI for Global Disease Surveillance,” IEEE Intelligent Systems, Volume 24, Number 6, Pages 66-69, November/December, 2009.
  • D. Zeng, H. Chen, Z. Cao, F. Wang, X. Zheng, and Q. Wang, “Disease Surveillance Based on Spatial Contact Networks: A Case Study of Beijing 2003 SARS Epidemic,” IEEE Intelligent Systems, Volume 24, Number 6, Pages 77-82, November/December, 2009.
  • Y. Chen, A. Abbasi, and H. Chen, “Framing Social Movement Identity with Cyber-Artifacts: A Case Study of the International Falun Gong Movement,” Annals of Information Systems, Volume 9, Pages 1-24, 2009.
  • Y. Dang, Y. Zhang, H. Chen, P. Hu, S. Brown and C. Larson, "Arizona Literature Mapper: An Integrated Approach to Monitor and Analyze Global Bioterrorism Research Literature," Journal of the American Society for Information Science and Technology, Volume 60, Number 7, Pages 1466-1485, July 2009.

2008

  • J. Xu, D. Hu, and H. Chen, “Dynamics of Terrorist Networks: Understanding the Survival Mechanisms of Global Salafi Jihad,” Journal of Homeland Security and Emergency Management, Volume 6, Number 1, 2009.
  • R. Schumaker and H. Chen, “Interaction Analysis of the ALICE Chatterbot: A Two-Study Investigation of Dialog and Domain Questioning,” IEEE Transactions on Systems, Man, and Cybernetics, Volume 40, Number 1, Pages 40-51, 2010.
  • A. Abbasi and H. Chen, "CyberGate: A System and Design Framework for Text Analysis of Computer Mediated Communication," MIS Quarterly (MISQ), Vol. 32 No. 4 (December 2008, Special Issue on Design Science Research), pgs. 811-837.
  • A. Abbasi, H. Chen, S. Thoms, T. Fu, "Affect Analysis of Web Forums and Blogs Using Correlation Ensembles," IEEE Transactions on Knowledge and Data Engineering, Volume 20, Number 9, Pages 1168-1180, September 2008.
  • H. Chen, W. Chung, J. Qin, E. Reid, M. Sageman, and G. Weimann, "Uncovering the Dark Web: A Case Study of Jihad on the Web," Journal of the American Society for Information Science and Technology, Volume 59, Number 8, Pages 1347-1359, 2008.
  • A. Abbasi, H. Chen, H. A. Salem. "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Transactions on Information Systems, Vol. 26, No. 3, Article 12, June 2008.
  • Abbasi, A. and Chen, H., "Writeprints: A Stylometric Approach to Identity-Level Identification and Similarity Detection in Cyberspace," ACM Transactions on Information Systems, Vol. 26, No. 2, Article 7, March 2008 (29 pgs).

2007

  • E. Reid and H. Chen. "Mapping the Contemporary Terrorism Research Domain". International Journal of Human-Computer Studies, 65, Pages 42-56, 2007.
  • J. Qin, Y. Zhou, E. Reid, G. Lai, and H. Chen. "Analyzing Terror Campaigns on the Internet: Technical Sophistication, Content Richness, and Web Interactivity," International Journal of Human-Computer Studies, 65, Pages 71-84, 2007.
  • E. Reid and H. Chen. "Internet-Savvy U.S. and Middle Eastern Extremist Groups." Mobilization: An International Quarterly, 12(2), pp. 177-192, 2007.
  • R. Schumaker and H. Chen, "Leveraging Question Answer Technology to Address Terrorism Inquiry," Decision Support Systems, Volume 43, Number 4, Pages 1419-1430, 2007.
  • T. S. Raghu and H. Chen, "Cyberinfrastructure for Homeland Security: Advances in Information Sharing, Data Mining, and Collaboration Systems," Decision Support Systems, Volume 43, Number 4, Pages 1321-1323, 2007.

2006

  • Reid, E., and Chen, H., "Extremist Social Movements Groups and Their Online Digital Libraries," Information Outlook, Volume 10, Number 6, Pages 57-65, June 2006.
  • Chen, H., Xu, J., "Intelligence and Security Informatics for National Security: A Knowledge Discovery Perspective," Annual Review of Information Science and Technology (ARIST), Volume 40, Pages 229-289, 2006.
  • J. Li, R. Zheng, and H. Chen, “From Fingerprint to Writeprint,” Communications of the ACM, Volume 49, Number 4, Pages 76-82, April 2006.
  • R. Zheng, J. Li, H. Chen, and Z. Huang, “A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques,” Journal of the American Society for Information Science and Technology, Volume 57, Number 3, Pages 378-393, 2006.

2005

  • H. Chen and F. Wang, "Artificial Intelligence for Homeland Security," IEEE Intelligent Systems Special Issue on Artificial Intelligence for National and Homeland Security, pp. 12-16, September/October 2005.
  • Abbasi, A., and Chen, H. "Applying Authorship Analysis to Extremist-Group Web Forum Messages," IEEE Intelligent Systems, Special Issue on Artificial Intelligence for National and Homeland Security, pp. 67-75, September/October 2005.
  • Y. Zhou, E. Reid, J. Qin, G. Lai, and H. Chen. “U.S. Domestic Extremist Groups on the Web: Link and Content Analysis,” IEEE Intelligent Systems, Special Issue on Artificial Intelligence for National and Homeland Security, pp. 44-51, September/October 2005.

 

Conference Papers

2013

  • V. A. Benjamin, W. Chung, A. Abbasi, J. Chuang, C. A. Larson, and H. Chen, “Evaluating text visualization: An experiment in authorship analysis,” ISI 2013: 16-20, Proceedings of 2013 IEEE International Conference on Intelligence and Security Informatics, Seattle, Washington, June 2013.

2012

  • H. Chen, “Dark Web: Exploring and Mining the Dark Side of the Web,” International Conference on Formal Concept Analysis (ICFCA), Belgium, May 7-10, 2012.
  • H. Chen, “From Dark Web to GeoPolitical Web: Exploring the Value of Social Media Informatics,” European Intelligence and Security Informatics Conference (EISIC), Denmark, August 20-24, 2012.
  • D. Zimbra and H. Chen, “Scalable Sentiment Classification across Multiple Dark Web Forums,” IEEE International Conference on Intelligence and Security Informatics, ISI 2012, Washington, DC, June 2012.
  • M. Yang and H. Chen, “Partially Supervised Learning for Radical Opinion Identification in Hate Group,” Proceedings of 2012 IEEE International Conference on Intelligence and Security Informatics, ISI 2012, Washington, DC, June 2012.
  • J. Woo and H. Chen, “An Event-Driven SIR Model for Topic Diffusion in Web Forums,” Proceedings of 2012 IEEE International Conference on Intelligence and Security Informatics, ISI 2012, Washington, DC, June 2012.
  • P. Hu, D. Wan, Y. Dang, C. Larson, and H. Chen, “Evaluating an Integrated Forum Portal for Terrorist Surveillance and Analysis,” Proceedings of 2012 IEEE International Conference on Intelligence and Security Informatics, ISI 2012, Washington, DC, June 2012.

2011

  • H. Chen, C. Larson, T. Elhourani, D. Zimbra, and D. Ware, “The GeoPolitical Web: Assessing Societal Risk in an Uncertain World,” Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, Beijing, China, July 2011.
  • H. Chen, D. Denning, N. Roberts, C. Larson, X. Yu, and C. Huang, “The Dark Web Forum Portal: From Multi-lingual to Video,” Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, Beijing, China, July 2011.
  • S. Zeng, M. Lin, and H. Chen, “Dynamic User-level Affect Analysis in Social Media: Modeling Violence in the Dark Web,” Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, Beijing, China, July 2011.
  • J. Woo, J. Son, and H. Chen, “An SIR Model for Violent Topic Diffusion in Social Media,” Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, Beijing, China, July 2011.

2010

  • Y. Zhang, S. Zeng, C. Huang, L. Fan, X. Yu, Y. Dang, C. Larson, D. Denning, N. Roberts, and H. Chen, “Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences,” Proceedings of 2010 IEEE International Conference on Intelligence and Security Informatics, ISI 2010, Vancouver, Canada, May 2010.
  • D. Zimbra and H. Chen, “Comparing the Virtual Linkage Intensity and Real World Proximity of Social Movements,” Proceedings of 2010 IEEE International Conference on Intelligence and Security Informatics, ISI 2010, Vancouver, Canada, May 2010.

2009

  • Y. Zhang, Y. Dang, and H. Chen, "Gender Difference Analysis of Political Web Forums: An Experiment on an International Islamic Women's Forum," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Dallas, Texas, June 8-11, 2009).
  • Y. Zhang, S. Zeng, L. Fan, Y. Dang, C. Larson, and H. Chen, "Dark Web Forums Portal: Searching and Analyzing Jihadist Forums," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Dallas, Texas, June 8-11, 2009).
  • H. Chen, "IEDs in the Dark Web: Lexicon Expansion and Genre Classification," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Dallas, Texas, June 8-11, 2009).
  • T. Fu, C. Huang, and H. Chen, "Identification of Extremist Videos in Online Video Sharing Sites," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Dallas, Texas, June 8-11, 2009).

2008

  • H. Chen, “Nuclear Threat Detection via the Nuclear Web and Dark Web: Framework and Preliminary Study,” Proceedings of the First European Conference on Intelligence and Security Informatics, EuroISI 2008, Esbjerg, Denmark, December 2008.
  • C. Mielke and H. Chen, “Mapping Dark Web Geolocation,” Proceedings of the First European Conference on Intelligence and Security Informatics, EuroISI 2008, Esbjerg, Denmark, December 2008.
  • C. Mielke and H. Chen, “Botnets, and the CyberCriminal Underground,” Proceedings of 2008 IEEE International Conference on Intelligence and Security Informatics, ISI 2008, Taipei, Taiwan, June 2008.
  • T. Fu and H. Chen, “Analysis of Cyberactivism: A Case Study of Online Free Tibet Activities,” Proceedings of 2008 IEEE International Conference on Intelligence and Security Informatics, ISI 2008, Taipei, Taiwan, June 2008.
  • Y. Chen, A. Abbasi, and H. Chen, “Developing Ideological Networks Using Social Network Analysis and Writeprints: A Case Study of the International Falun Gong Movement,” Proceedings of 2008 IEEE International Conference on Intelligence and Security Informatics, ISI 2008, Taipei, Taiwan, June 2008.
  • Chen, H., and the Dark Web Team (2008). "IEDs in the Dark Web: Genre Classification of Improvised Explosive Device Web Pages," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Taipei, Taiwan, June 17-20, 2008). Springer Lecture Notes in Computer Science.
  • Chen, H., and the Dark Web Team (2008). "Discovery of Improvised Explosive Device Content in the Dark Web," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Taipei, Taiwan, June 17-20, 2008). Springer Lecture Notes in Computer Science.
  • Chen, H., and the Dark Web Team (2008). "Sentiment and Affect Analysis of Dark Web Forums: Measuring Radicalization on the Internet," in Proceedings of the IEEE International Intelligence and Security Informatics Conference (Taipei, Taiwan, June 17-20, 2008). Springer Lecture Notes in Computer Science.
  • H. Chen, “Cyber Terrorism in Web 2.0: An Exploratory Study of International Jihadist Groups,” Proceedings of 2008 IEEE International Conference on Intelligence and Security Informatics, ISI 2008, Taipei, Taiwan, June 2008.
  • H. Chen, S. Thoms, T. Fu. "Cyber Extremism in Web 2.0: An Exploratory Study of International Jihadist Groups," in Proceedings of the 2008 IEEE Intelligence and Security Informatics Conference, Taiwan, June 17-20, 2008.

2007

  • H. Chen. "Interaction Coherence Analysis for Dark Web Forums," in Proceedings of the 2007 IEEE Intelligence and Security Informatics Conference, New Brunswick, NJ, May 23-24, 2007, pp. 342-349.
  • H. Chen. "Categorization and Analysis of Text in Computer Mediated Communication Archives Using Visualization," in Proceedings of the 2007 Joint Conference on Digital Libraries (JCDL), Vancouver, BC, Canada, June 18-23, 2007, pp. 11-18.

2006

  • H. Chen, "Visualizing Authorship for Identification," In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.
  • H. Chen, "A Framework for Exploring Gray Web Forums: Analysis of Forum-Based Communities in Taiwan," In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.
  • Y. Zhou, J. Qin, G. Lai, E. Reid, and H. Chen, "Exploring the Dark Side of the Web: Collection and Analysis of U.S. Extremist Online Forums," In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.
  • E. Reid, and H. Chen, "Content Analysis of Jihadi Extremist Groups' Videos," In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.
  • J. Xu, H. Chen, Y. Zhou, and J. Qin, "On the Topology of the Dark Web of Terrorist Groups," In Proceedings of the Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics (ISI 2006), San Diego, CA, USA, May 23-24, 2006.

2005

  • Zhou, Y., Qin, J., Lai, G., Reid, E., and Chen, H., "Building Knowledge Management System for Researching Terrorist Groups on the Web," Proceedings of the AIS Americas Conference on Information Systems (AMCIS 2005), Omaha, NE, USA, August 11-14, 2005.
  • “Mapping the Contemporary Terrorism Research Domain: Researchers, Publications, and Institutions Analysis,” ISI Conference 2005, Atlanta, GA, May, 2005.
  • Chen, H. 2005. "Applying Authorship Analysis to Arabic Web Content." ISI Conference 2005, Atlanta, GA, May, 2005.
  • Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., and Chen, H., "Collecting and Analyzing the Presence of Terrorists on the Web: A Case Study of Jihad Websites," IEEE International Conference on Intelligence and Security Informatics (ISI 2005), Atlanta, Georgia, 2005.
  • D. McDonald, H. Chen, and R. Schumaker, "Transforming Open-Source Documents to Terror Networks: The Arizona TerrorNet," American Association for Artificial Intelligence Conference Spring Symposia (AAAI-2005), March 2005. Stanford, CA.

2004

  • Chen, H., Qin, J., Reid, E., Chung, W., Zhou, Y., Xi, W., Lai, G., Bonillas, A. and Sageman, M., "The Dark Web Portal: Collecting and Analyzing the Presence of Domestic and International Terrorist Groups on the Web," Proceedings of the 7th International Conference on Intelligent Transportation Systems (ITSC), Washington D.C., October 3-6, 2004.
  • E. Reid, J. Qin, W. Chung, J. Xu, Y. Zhou, R. Schumaker, M. Sageman, H. Chen, "Terrorism Knowledge Discovery Project: A Knowledge Discovery Approach to Addressing the Threats of Terrorism," Proceedings of the Second Symposium on Intelligence and Security Informatics, June 10-11, 2004, Tucson, AZ, 2004, pp. 125-145.

2003

  • H. Chen, "The Terrorism Knowledge Portal: Advanced Methodologies for Collecting and Analyzing Information from the Dark Web and Terrorism Research Resources," presented at the Sandia National Laboratories, August 14, 2003.

 

Special Poster Presentations

 

Presentations in Seminars or Conferences (PowerPoint)

Password-protected; please send request via email and provide a brief explanation of your interest.

  • “Affect and Sentiment Analysis of Web Forums,” July, 2007.
  • “Large-scale Forum Analysis of Selected Radical Sites,” March, 2007.
  • “Explosives and IEDs in the Dark Web: Discovery, Categorization, and Analysis,” February, 2007.
  • “ClearGuidance.com Analysis,” September, 2006.
  • “Writeprints and Ink Blots: Visualizing Authorship for Identification and Authentication,” Tucson, August, 2005.
  • “Data Mining & Webometric Analysis of Terrorist/Extremist Groups’ Digital Artifacts,” Singapore, August 2005.
  • “Applying Authorship Identification to Web Forums: Analysis of English and Arabic Extremist Group Postings,” Tucson, April, 2005.
  • “Content and Link Analysis of Domestic and International Terrorism Websites,” Tucson, AZ, March 23, 2005.
  • “Advanced Methodology for Collecting and Analyzing Information from the ‘Dark Web’,” Tucson, AZ, Feb 10, 2005.
  • “Multilingual Authorship Analysis for Web Content: A Comparison of English and Arabic Language Models,” Tucson, December, 2004.