The DISTANCE model for collaborative research: distributing analytic effort using scrambled data sets

Data sharing is encouraged to fulfill the ethical responsibility to transform research data into public health knowledge, but it carries risks of improper disclosure and potential harm from the release of individually identifiable data. The study objective was to develop and implement a novel method for scientific collaboration and data sharing that distributes the analytic burden while protecting patient privacy. A procedure was developed in which an investigator who is external to an analytic coordinating center (ACC) can conduct original research following a protocol governed by a Publications and Presentations (P&P) Committee. The collaborating investigator submits a study proposal and, if it is approved, develops the analytic specifications using existing data dictionaries and templates. An original data set is prepared according to the specifications, and the external investigator is provided with a complete but de-identified and shuffled data set that retains all key data fields but obfuscates individually identifiable data and patterns; this “scrambled data set” provides a “sandbox” in which the external investigator can develop and test analytic code. The analytic code is then run against the original data at the ACC to generate output, which the external investigator uses in preparing a manuscript for journal submission. The method has been used successfully with collaborators to produce many published papers and conference reports. By distributing the analytic burden, this method can facilitate collaboration and expand analytic capacity, resulting in more science for less money.
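
The abstract does not say how the scrambling is performed. As a rough illustration only, and not the authors' procedure, one way to obtain a data set that keeps every analytic field but breaks row-level patterns is to drop direct identifiers and then permute each remaining column independently. The function name `scramble_columns`, the `patient_id` field, and the sample records below are all hypothetical.

```python
import random


def scramble_columns(rows, drop_fields=("patient_id",), seed=None):
    """Return new rows with identifier fields removed and every remaining
    column shuffled independently of the others.

    Assumption: this is only one possible reading of "scrambled"; the
    source abstract does not describe the actual algorithm.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    # Keep every field except the (hypothetical) direct identifiers.
    fields = [name for name in rows[0] if name not in drop_fields]
    shuffled = {}
    for name in fields:
        values = [row[name] for row in rows]
        rng.shuffle(values)  # permute this column on its own
        shuffled[name] = values
    # Reassemble rows; values in one row no longer belong to one person.
    return [
        {name: shuffled[name][i] for name in fields}
        for i in range(len(rows))
    ]


# Made-up records, for illustration only.
original = [
    {"patient_id": "A1", "age": 54, "sex": "F", "hba1c": 7.1},
    {"patient_id": "B2", "age": 61, "sex": "M", "hba1c": 8.4},
    {"patient_id": "C3", "age": 47, "sex": "F", "hba1c": 6.9},
    {"patient_id": "D4", "age": 70, "sex": "M", "hba1c": 9.2},
]

scrambled = scramble_columns(original, seed=0)
```

Under this sketch the scrambled rows have the same field names, types, and per-column value sets as the original, so code written against them should run unchanged on the real data held at the ACC. Any results computed from the scrambled rows are meaningless, because relationships between columns are destroyed; with very small data sets or rare values, column shuffling alone may not be sufficient to prevent re-identification.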

Authors: Moffet HH; Warton EM; Parker MM; Liu JY; Lyles CR; Karter AJ

Inf Secur Comput Fraud. 2014;2(3):33-38.
