Data Infrastructure for Intelligence and Security Informatics
Introduction
This NSF-funded Data Infrastructure Building Blocks pilot demonstration project is intended to address a significant gap in the availability of open source research data for researchers in intelligence and security informatics (ISI).
The many unintended and illicit uses of the Internet have created significant cybersecurity challenges; recent examples of hacking and cybercrime abound and do not need to be enumerated here. Academics in the fields of computational science, information science, social sciences, engineering, and many other areas have been called upon to enhance our collective ability to fight terrorism, cybercrime, and other security concerns.
To help strengthen cybersecurity research, the Artificial Intelligence Lab responded to the National Science Foundation’s Data Infrastructure Building Blocks (DIBBs) solicitation. In October 2014, the Lab was awarded $1,499,531 for a three-year Pilot Demonstration Project to make significant data and analysis tools available to the ISI community.
AZSecure: Data Science Testbed for Security Researchers
NEW DATA AVAILABLE NOW: The AZSecure Data Science Testbed for Security Researchers can be accessed at the DIBBs website (http://azsecure-data.org/).
If you are an ISI researcher, we would welcome your feedback. Please contact us at ailab@eller.arizona.edu. We need to hear from you!
Project Activities
The primary focus of the project is on data collection, data management, and access, with a view toward identifying and collecting data that will be of the highest interest to the ISI community and providing the data to the community in the easiest, most useful, and most direct way possible.
Initial data collection activities will also follow good data management practices.
ISI researchers use a great deal of data. Open sources include social media and publications such as Twitter, forums, blogs, news, and journals; private or closed sources range from shipping records to satellite data and from network data to call data. The kind of data the project collects is therefore limited only by what can be acquired.
Data will be added to the DIBBs portal in three phases. In the first phase, we will make the AI Lab’s own data available. This will include select data sets from the Dark Web and GeoPolitical Web portals, as well as data sets from our active Hacker Web research. Data will be made available as TXT files and can be found through a web-enabled catalog. In the second phase, we will make available data from our partners (University of Texas at Dallas and University of Virginia). These include several data sets relating to electricity, airlines, sensors, and phishing. In the third phase, we will solicit and make available data from external researchers with an interest in depositing their data or otherwise collaborating.
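Because the data sets are distributed as TXT files, a researcher might begin by taking a quick inventory of a downloaded data set before any analysis. The following is only a minimal sketch under stated assumptions: it assumes the files have already been saved to a local directory and are UTF-8 plain text, and the function name `summarize_txt_dataset` and the directory layout are hypothetical, not part of the portal.

```python
from pathlib import Path


def summarize_txt_dataset(root):
    """Return one summary dict per .txt file found under root.

    Assumption: `root` is a local directory holding a downloaded data set;
    nothing here reflects the portal's actual file layout.
    """
    summaries = []
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        summaries.append({
            "file": path.name,
            "lines": len(text.splitlines()),
            "characters": len(text),
        })
    return summaries


if __name__ == "__main__":
    # Hypothetical local path; replace with wherever the files were saved.
    for row in summarize_txt_dataset("downloaded_dataset"):
        print(row["file"], row["lines"], row["characters"])
```

An inventory like this only reports file names and sizes; the internal record format of each data set would still need to be checked against its catalog entry.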
The project timeline for making data available is as follows:
- Phase I: Initial data load and simple web-based catalog (by 3/30/2014)
- Phase II: Partner data, initial data load (by 4/30/2015)
- Phase III: External data, initial data load (by 7/30/2015)
The data collection process is iterative and ongoing throughout the project period; as we and our partners continue to collect and update data, it will be continuously added to the portal. In later phases of the project, we will develop and make available analysis tools that can be used with data from the portal.
Funding
We thank the National Science Foundation for providing funding support: National Science Foundation, "CIF21 DIBBs: DIBBs for Intelligence and Security Informatics Research Community: Pilot Demonstration Project," $1,499,531 (NSF ACI-1443019), October 15, 2014 - September 30, 2017.
Our project abstract is available on the NSF page for the award, "CIF21 DIBBs: DIBBs for Intelligence and Security Informatics Research Community."
See more about the NSF funding for DIBBs on the NSF DIBBs press release page, "Laying the Groundwork for Data-driven Science" (at http://www.nsf.gov/news/news_summ.jsp?cntn_id=132880).
Team Members
- Hsinchun Chen, MIS
- Mark Patton, MIS
- Cathy Larson, MIS (retired)
- Riley McIsaac, MIS
- Victor Benjamin, MIS (graduated)
- Andy Pressman, MIS
- John Grisham, MIS (graduated)
Project Collaborators
- Ahmed Abbasi, University of Virginia
- Paul Hu, University of Utah
- Bhavani Thuraisingham, University of Texas at Dallas
- Chris Yang, Drexel University
Selected Publications
- V. Benjamin and H. Chen. "Time-to-event Modeling for Predicting Hacker IRC Community Participant Trajectory." Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Sept. 24-26, 2014, The Hague, Netherlands.
- W. Li and H. Chen. "Identifying Top Sellers In Underground Economy Using Deep Learning-based Sentiment Analysis." Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Sept. 24-26, 2014, The Hague, Netherlands.
- W. Li, A. Abbasi, S. Hu, V. Benjamin, and H. Chen. "Modeling Interactions in Web Forums." Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Sept. 24-26, 2014, The Hague, Netherlands.
- M. Patton, E. Gross, R. Chinn, S. Forbis, L. Walker, and H. Chen. "Uninvited Connections: A Study of Vulnerable Devices on the Internet of Things (IoT)." Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Sept. 24-26, 2014, The Hague, Netherlands.
- V. Benjamin, D. Zimbra, and H. Chen. "Bridging the Virtual and Real: The Relationships between Web Content, Linkage, and Geographical Proximity of Social Movements." Journal of the American Society for Information Science and Technology, 65(11), November 2014.
- Y. Zhang, Y. Dang, and H. Chen. "Research note: Examining gender emotional differences in Web forum communication." Decision Support Systems, 55(3), 2013.
- T. Fu, A. Abbasi, D. Zeng, and H. Chen. "Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers." ACM Transactions on Information Systems, 30(4), 2012.
- H. Chen. Exploring and Mining the Dark Side of the Web. New York: Springer-Verlag, 2012.
- A. Abbasi, Z. Zhang, D. Zimbra, H. Chen, and J. F. Nunamaker. "Detecting Fake Websites: The Contribution of Statistical Learning Theory." MIS Quarterly, 34(3), pp. 435-461, September 2010. [Winner, MISQ Best Paper, 2010.]
Conferences, Workshops, and Competitions
Conferences & Workshops
- WiCyS 2017 (Women in CyberSecurity), Tucson, AZ, USA, March 31 - April 1, 2017
- ISI 2016 (IEEE International Conference on Intelligence and Security Informatics), Tucson, Arizona, USA, September 27-30, 2016
- IEEE ICDM Workshop on Intelligence and Security Informatics, Atlantic City, New Jersey, USA, November 14, 2015
- PAKDD 2015 (Pacific Asia Knowledge Discovery and Data Mining), Ho Chi Minh City, Viet Nam, May 19, 2015
- PAISI 2015, Ho Chi Minh City, Viet Nam, May 19, 2015 (Proceedings available through Springer)
- IEEE ISI, Baltimore, Maryland, USA, May 27-29, 2015
- DEF CON 23, Las Vegas, Nevada, USA
- European ISI (EISIC) 2015, Manchester, United Kingdom, September 7-9, 2015
Competitions
- Cyber Security Awareness Week Conference - Run by students at New York University, features Capture the Flag and Embedded Systems competitions (information for 2015 forthcoming); generally held in November
- DEF CON Capture The Flag - An archive of previous challenges is also available.
- Hack Arizona - A hackathon in partnership with the University of Arizona Libraries, held in March
- National Collegiate Cyber Defense Competition - Helps students assess their depth of understanding and operational competency in network security and infrastructure, held in the fall following regional qualifying rounds
- Robocalls: Humanity Strikes Back - An FTC-sponsored competition at DEF CON that challenges participants to create a tool people can use to block unwanted robocalls or forward them automatically to a honeypot.