Cybersecurity is an important challenge today, as corporations, governments, and individuals are increasingly the victims of cyber-attacks and hacking. These attacks exploit weaknesses in both technical infrastructure and human behavior. Understanding the motivations and incentives of individuals and institutions, both as attackers and as defenders, will help create a more secure and trustworthy cyberspace. Developing “methods to model adversaries” is one of the critical but unfulfilled research needs identified in the “Trustworthy Cyberspace” report by the National Science and Technology Council (2011).
Demand for knowledge and tools to conduct cyber-crime has grown so widespread that entire international virtual communities and black markets have emerged across the Internet to facilitate trade between cyber-criminals. Black market participants offer expertise, snippets of code, or fully developed applications in exchange for other virtual goods or financial gain.
Despite their relevance to a society that relies heavily on cyberspace, cyber-criminal communities and their activities remain largely unexplored. Existing web and social media content presents a rich opportunity for research, because virtual communities often maintain large stores of data that lend themselves to many forms of computational analysis. Treated as research data, the discussions and interactions within these communities enable high-impact, data-driven research: researchers can empirically test hypotheses and discover previously undocumented phenomena. At the same time, online anonymity, multilingual content, hacker community culture, and the sheer volume of messages contributed by diverse cyber citizens make cyber content analysis an essential yet demanding research endeavor.
To address these challenges, we began by identifying several categories of information necessary for cybersecurity investigation. Our research focuses on hacker community information (the actors) and honeypot information (malware output), supplemented by further malware analysis and selected information from emerging peer-to-peer (P2P) networks. We identify and collect data sources for each information category to support our hacker community analysis.
Researchers in the AI Lab are developing an integrated and scalable computational framework for social media collection and analytics in support of cyber attacker community analysis. Our team addresses important social science research questions about hacker skills, community structure and ecosystem, content and artifacts, and cultural differences. We have developed techniques for automatically collecting multilingual hacker forums and Internet Relay Chat (IRC) from international (U.S., Russian, and Chinese) hacker communities. We have also deployed scalable honeypot platforms to collect malware in the wild and to generate feature representations for malware attribution. These efforts draw on our experience in social media analytics from our previous NSF-funded Dark Web research, including topic and sentiment analysis, temporal extraction, and social network analysis.
Manual collection methods are used to gather emerging cybersecurity research, news, and other security vectors, guided by our social science and security analysis research questions. Collected data is cleaned and transformed for use in various analyses. Additional hacker and malware signatures (e.g., programming languages used, attack targets, source code used) and geopolitical information (e.g., locations) are identified to support hacker community analysis. Several types of social science and security analysis allow us to gain new perspectives and knowledge from the acquired data: hacker signature analysis (profiles), cyber-crime attribution (linking malware to actors), hacker community structure (and skills), and cultural metrics identification (for U.S., Russian, and Chinese groups). Our research also supports time-event extraction, covert hacker community content collection, and vulnerability threat assessment.
This integrated computational framework and its associated algorithms and software allow researchers and practitioners to: (1) detect, classify, measure, and track the formation, development, and spread of topics, ideas, and concepts in cyber attacker social media communication; (2) identify important and influential cyber-criminals and their interests, intent, sentiment, and opinions in online discourse; and (3) deduce and recognize hacker identities, online profiles and styles, communication genres, and interaction patterns.
In this SBE/TTP project, we are developing open source tools, a large longitudinal research testbed, and a web-based Hacker Research Portal to support cyber attacker community investigation and research. These resources have been introduced to the interdisciplinary community of social science, computing, and cybersecurity researchers and practitioners through conference presentations and through azsecure-data.org, a repository of data, analytics, and tools for intelligence and security informatics research.
The PI, Dr. Hsinchun Chen, is a leader in security informatics research. His NSF-funded projects, COPLINK for crime data mining and Dark Web for open source terrorism social media analytics, have both been highly successful. The Hacker Web research team includes experts in hacker community research (Dr. Tom Holt, School of Criminal Justice, Michigan State University, with funding from the National Institute of Justice), cybersecurity and autonomic computing research (Dr. Salim Hariri, Department of Electrical and Computer Engineering, University of Arizona, with funding from NSF and the Department of Defense (DoD)), and hacker community sociology research (Dr. Ron Breiger, Department of Sociology, University of Arizona, with funding from NSF and DoD).
The figure below illustrates the research framework: