The Institute for Higher Education Policy (IHEP) has released the third paper in its “Protecting Students, Advancing Data” series, which promotes safeguarding student data while ensuring that students and their families have transparent, relevant information to make informed college decisions.
As part of IHEP’s Postsecondary Data Collaborative initiative, the brief, “Postsecondary Data Infrastructure: What is Possible Today,” introduces a “Five Safes” framework to guide secure data practices across higher education. This week’s brief examines how institutions can use “safe projects,” “safe people,” “safe settings,” “safe data” and “safe outputs” to secure data access and use, and explores examples of how government and other agencies control data use and analysis.
“When strengthened, the postsecondary data infrastructure could improve how students and parents view institutions and programs, giving them better information when making decisions,” said Dr. Amy O’Hara, author of the report and a research professor in the Massive Data Institute at Georgetown University. “The infrastructure could also facilitate new channels of discovery, enabling data joins and cross-school, cross-cohort and longitudinal analyses that measure student outcomes to see what works, ultimately improving outcomes for students.”
IHEP’s brief points out that, for any data infrastructure to become functional, there must be trust among data providers, intermediaries and users. It notes that the methods by which data are shared and analyzed are “strikingly similar” across higher education and domains such as health care, defense, housing and human services.
Even so, the report adds that a “robust data infrastructure must have strong controls in place” to mitigate any potential risks.
The first of the “Five Safes” that O’Hara introduces in the brief is “Safe Projects.” Such projects require “governance protocols to control project requests, review and approval processes, and may require institutional board or ethics board review and approval,” the brief said. Incorporating “clear and thorough” data use agreements (DUAs) is similarly crucial for establishing an understanding of acceptable data uses, linkages and scope of analysis.
Institutions can additionally ensure that data users are “Safe People” by screening and training those who will work with the student data, the brief said, noting that researchers today must meet varying requirements to access certain data systems. Screenings or credentialing could include proof of research competence or mandatory training, institutional affiliation, background checks or fingerprinting, for instance.
“In the future, a user’s vetting and approval by one organization could carry over to other associated organizations,” the brief said. “This will require durable credentials and agreed upon standards and training.”
In the brief, O’Hara notes that having “Safe Settings” — the data user’s interface and environment — is the most important control factor for guiding data practices, regulating data inputs, computation and outputs. With “safe settings,” data users can create “Safe Data.”
“The practices for both impose restrictions on what an analyst can use, what an analyst can do, the analyst’s computing environment and the analyst’s physical location,” the brief said.
The fifth safe, “Safe Outputs,” ensures student privacy by reducing the risk of a student or individual being re-identified in the data results. To create “safe outputs,” institutions can round, aggregate or suppress data results “to obscure unique observations in tables, figures or maps.”
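The aggregate, suppress and round steps described above can be illustrated with a minimal Python sketch. The function name `safe_counts`, the minimum cell size of 10 and the rounding base of 5 are assumptions chosen for illustration; the brief does not specify thresholds or an implementation.

```python
from collections import Counter


def safe_counts(records, key, min_cell=10, round_to=5):
    """Aggregate records by `key`, suppress small cells, round the rest.

    min_cell and round_to are illustrative assumptions, not values
    taken from the brief.
    """
    counts = Counter(r[key] for r in records)  # aggregate to group counts
    result = {}
    for group, n in counts.items():
        if n < min_cell:
            result[group] = None  # suppress cells that are too small
        else:
            # round to the nearest multiple of round_to (halves round up)
            result[group] = (n + round_to // 2) // round_to * round_to
    return result


# Hypothetical records: 12 biology, 3 art and 23 computer science students.
students = (
    [{"major": "bio"}] * 12 + [{"major": "art"}] * 3 + [{"major": "cs"}] * 23
)
print(safe_counts(students, "major"))
# {'bio': 10, 'art': None, 'cs': 25}
```

Here the three art students are hidden entirely because such a small group could be singled out, while the larger groups are reported only as rounded totals.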
Further, data users may change data by “swapping or noise injection,” which could be “changing the ages or races of individuals in a sparsely populated area or changing income dollar amounts by a small amount,” the brief explained.
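As a rough illustration of swapping and noise injection, the Python sketch below shifts dollar amounts by a small random value and shuffles one attribute among the records of a small group. The function names, the field names and the 50-dollar limit are hypothetical, and this is a simplified sketch rather than the method any agency actually uses.

```python
import random


def inject_noise(amounts, max_shift, rng):
    """Shift each amount by a random integer in [-max_shift, max_shift]."""
    return [a + rng.randint(-max_shift, max_shift) for a in amounts]


def swap_attribute(records, field, rng):
    """Return copies of the records with the values of `field` shuffled.

    The overall distribution of the field is preserved, but a value is no
    longer tied to the record it originally came from.
    """
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


# Hypothetical records from a sparsely populated area.
rng = random.Random(0)  # fixed seed only so the example is repeatable
people = [
    {"id": 1, "age": 19, "income": 21000},
    {"id": 2, "age": 34, "income": 48000},
    {"id": 3, "age": 52, "income": 35500},
]
noisy_incomes = inject_noise([p["income"] for p in people], 50, rng)
swapped = swap_attribute(people, "age", rng)
```

Both functions return new data and leave the original records unchanged, so the protected version can be released while the source data stay inside the secure setting.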
O’Hara said that future techniques for safeguarding outputs must recognize that some student data sets may require more privacy than others, or that only some student characteristics may need to be protected to avoid re-identification.
The brief identifies several examples of institutions using data intermediaries to support safe and secure data use and analyses.
A partnership between the Santa Clara County Office of Education and the University of California at Santa Cruz, for instance, has led to the creation of the Silicon Valley Regional Data Trust. The trust comprises data from the 66 school districts and from the juvenile probation and health and human services agencies in San Mateo, Santa Clara and Santa Cruz counties.
Real-time data results can then inform administrative and intervention decisions.
The brief also cites the Observational Health Data Sciences and Informatics (OHDSI) program, based at Columbia University, as an example of standardizing data models to promote statistical comparisons. The collaborative serves as an intermediary for academia, government and industry, supporting confidential statistics, epidemiology, informatics and clinical sciences research across 20 countries.
IHEP’s president, Dr. Michelle Asha Cooper, said that giving students timely information on college outcomes and upholding student privacy is not an “either/or proposition.”
“Students have a right to information about college outcomes,” Cooper said. “They also have a right to trust that their data are protected, secured and used responsibly.”
Tiffany Pennamon can be reached at email@example.com. You can follow her on Twitter @tiffanypennamon.