Digital Archaeology Project to Use Big Data
When we think about some of the most pressing questions facing society today, on topics ranging from immigration to inequality to overpopulation, it might be helpful to consider how humans have handled similar issues throughout history, suggest researchers at the University of Arizona.
After all, these are not altogether new challenges. Evidence of inequality in the U.S. Southwest, for example, dates back at least as far as A.D. 800, when large agricultural villages began to flourish, notes UA anthropologist Barbara Mills.
To help provide researchers, scholars and the general public with a "deep history" understanding of some of the grand challenges facing society, Mills and fellow UA researcher Sudha Ram are leading an interdisciplinary National Science Foundation-funded project to build an online system that pulls together and synthesizes archaeological data spanning several centuries of U.S. history.
The project, called cyberSW, focuses specifically on pre-Hispanic archaeological data from the American Southwest (Arizona, Utah, New Mexico and Colorado) from A.D. 800 to the 1500s, shortly after the arrival of the Spanish in the region.
Mills and Ram, along with collaborators Matthew Peeples of Arizona State University, Scott Ortman of the University of Colorado, Boulder, and Jeffery Clark of the Tucson-based Archaeology Southwest, were awarded $1.7 million through the NSF's RIDIR program, which stands for Resource Implementations for Data Intensive Research in the Social, Behavioral and Economic Sciences. Nearly $1.1 million of that will go to the UA.
The three-year project was one of four proposals chosen for funding in the third year of the RIDIR program, which supports projects that enable new types of data-intensive research.
Under the leadership of Mills and Ram, cyberSW investigators will work to pull together massive amounts of data, from hundreds of disparate datasets on Southwest archaeology, into one centralized online system. Visitors to the website will be able not only to easily access data that is currently decentralized, but also to view relationships among those data to better inform their research and understanding of the Southwest's rich history and its possible implications for modern challenges.
"This is not just a database," said Ram, professor of management information systems and director of the INSITE: Center for Business Intelligence and Analytics in the UA's Eller College of Management. "We're building a knowledge discovery system that integrates multiple archaeological databases and various artifacts and objects, and we're trying to figure out relationships that aren't already known. People can query the website and it'll show them various data and how they're related to each other, and they'll be able to run large-scale network analysis and statistical analysis that will support various stakeholders, including researchers and students, as well as the public."
The Southwest, with its long history of human occupation, is an area especially rich in archaeological data.
"There has been a history of research in the Southwest for over a century now, and preservation is so good in the Southwest," said Mills, a professor of anthropology in the UA's College of Social and Behavioral Sciences. "We have these existing databases, but they're not synthesized; they're not pulled together."
As it stands, hundreds of datasets, with information on millions of objects, exist independently, and there is no one federal repository for data uncovered in archaeological excavations. Instead, repositories are managed at the state level, and many of these repositories do not have digitally available data.
The cyberSW project will be a valuable tool for researchers who now must scour extensive archives, access multiple databases or even collect new data for their work, Mills said.
"It will streamline a lot of people's analysis because they won't have to reinvent or replicate what other people have done," she said. "Right now, these different databases sit in all these locations and they're not being used together. We have data repositories, but it's still not synthesis. The power of using them together is going to be so much better."
When completed, the cyberSW system will house, among other things, documents, images, maps and population data, along with tools and tutorials for conducting network analysis through the site. There will also be a citizen science component, in which members of the public will eventually be able to contribute to the site's content.
"The idea is to make this a living system, so other people are able to contribute relevant, related information to it," Ram said.
The site also will be highly visual.
"Someone might be interested in Hopi yellow ware, a particular kind of pottery, and they may want to know where it occurs and get a distribution map of that. We want to be able to see what that distribution looks like," Mills said.
The hope is that the system can provide a long-range historical perspective on issues of concern in modern society, Mills said.
"The way we have envisioned it is to really be able to address broad social science questions. So it's not just archaeological data and questions; we're using this deep history to be able to look at social science questions over long periods of time," Mills said.
"We have decadal- and centennial-scale data we can use to look at long-term changes, so we're interested in broad social science questions like the evolution of inequality," she said. "There are more people migrating in the last five years in human history than ever before. What happens when people migrate into an area? What happens when people aggregate and become urban? How do societies persist? Why do some societies persist and others collapse?"
The finished system will be organized as a graph database. Unlike relational databases, which store data in tables of rows and columns, graph databases store data as nodes connected by edges, drawing on network science to put the relationships between data points at the center.
"It's a totally different way of storing, querying and understanding data," Ram said. "You really need to interrelate all these different datasets to each other, and that's when the answers to the grand challenge questions start coming. Otherwise, you just see a small part of the answer."
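To illustrate the difference Ram describes, here is a minimal sketch in Python, not the cyberSW design. It assumes a toy graph in which site nodes are linked to pottery-type nodes. The site names and "ware_X" are hypothetical placeholders, and the function names are invented for illustration. One query answers the kind of distribution question Mills mentions for Hopi yellow ware. A second query derives a site-to-site network from shared pottery types, one simple form of the network analysis the project plans to support.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each pair links a site node to a pottery-type node.
# Site names and "ware_X" are placeholders, not real archaeological records.
EDGES = [
    ("site_A", "Hopi yellow ware"),
    ("site_B", "Hopi yellow ware"),
    ("site_B", "ware_X"),
    ("site_C", "ware_X"),
]


def build_graph(edges):
    """Store the data as an undirected adjacency map: node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def sites_with(graph, ware):
    """Distribution query: which sites are directly linked to a given pottery type?"""
    # .get() avoids creating an empty entry for an unknown ware type.
    return sorted(graph.get(ware, set()))


def shared_ware_network(graph, sites):
    """Project site-ware links onto site-site edges weighted by shared pottery types."""
    weights = {}
    for s1, s2 in combinations(sorted(sites), 2):
        shared = graph[s1] & graph[s2]
        if shared:
            weights[(s1, s2)] = len(shared)
    return weights


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(sites_with(g, "Hopi yellow ware"))  # ['site_A', 'site_B']
    print(shared_ware_network(g, ["site_A", "site_B", "site_C"]))
    # {('site_A', 'site_B'): 1, ('site_B', 'site_C'): 1}
```

In a relational layout, the second query would typically need a self-join on a site-ware table. In the graph form, it is a walk from one site through a shared pottery type to another site. That difference is the shift in emphasis toward relationships that the article describes.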