Biobanking relies on data sharing to build statistical power and provide sufficient sample heterogeneity. Samples also need to be collected using standard principles to ensure direct comparability, and data needs to be stored in a manner that is readily accessible to researchers. However, from the perspective of the general public, there is an expectation that data remain private and confidential, while still benefiting society. Murtagh et al. (2016) discuss an integrated approach that can meet the needs of researchers as well as the expectations of the general public.¹
To meet these needs, researchers have combined four tools to offer an open-source mechanism to securely analyze harmonized research participant–level data without it ever needing to leave the host site. The tools are:
- DataSHaPER: a structured method for harmonizing data using a series of steps, progressing from initial definition of a research question through to the generation and validation of a harmonized data set
- Opal: open-source software for data storage
- Mica: open-source software for data exploration
- DataSHIELD: open-source software for data analysis
Together, Opal, Mica and DataSHIELD deny users access to research participant–level data, while providing full and flexible access to the information held within. Researchers developed these tools in conjunction with the five-year Biobank Standardisation and Harmonisation for Research Excellence in the European Union (BioSHaRE-EU) project. BioSHaRE-EU involved 15 European, population-based cohort studies and successfully facilitated data harmonization and standardization for pooled analysis to investigate several common diseases and traits relevant to public health. Murtagh et al. note that although BioSHaRE-EU focused on federated analysis, the tools could also be used in a single central repository.
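The idea of analyzing data without it leaving the host site can be illustrated with a small sketch. The code below is not DataSHIELD and does not use its API; it is a minimal illustration, assuming each site is willing to release only three aggregate numbers (count, sum and sum of squares). The names `SiteSummary`, `summarize_site` and `pool`, and the cohort values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a host site releases; no individual rows."""
    n: int
    total: float
    total_sq: float


def summarize_site(values: Sequence[float]) -> SiteSummary:
    """Runs inside the host site, next to the participant-level data."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries: Sequence[SiteSummary]) -> Tuple[float, float]:
    """Runs on the analyst's side; returns pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Hypothetical measurements held at two cohorts (made-up numbers).
cohort_a = [5.0, 6.0, 7.0]
cohort_b = [4.0, 8.0]

# Only the summaries cross the site boundary, never the raw lists.
pooled_mean, pooled_variance = pool(
    [summarize_site(cohort_a), summarize_site(cohort_b)]
)
```

The pooled mean and variance here equal what an analyst would obtain from the combined raw values, even though `pool` only ever sees per-site aggregates. A real deployment would also need safeguards that this sketch omits, such as refusing to release summaries for very small groups.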
Using BioSHaRE-EU as a case study, Murtagh et al. observed participants and conducted individual and group interviews between December 2011 and November 2015 to assess the usability of the tools from the point of view of all 15 of the cohorts involved. They asked participants about their experience in using, developing and/or implementing one or more of the four tools, and their thoughts on the usability and appropriateness of the tools for sharing and translating data from multiple cohort and biobank studies. From participant responses, the authors identified criteria that most effectively facilitate access and optimize the value of studies while protecting research participants. Murtagh et al. thus found that systems governing data access need to:
- Protect participant information and/or identities
- Meet participants’ expectations of the way their data is used
- Ensure review systems are transparent and not compromised by the specific interests of one particular group of stakeholders
- Facilitate timely and efficient data access procedures
Finally, Murtagh et al. highlight that interfacing across multiple governance systems is a challenge, and that journals increasingly require access to the data on which a research paper is based. Integrated approaches to data sharing are therefore essential to the future of biobanking.
1. Murtagh, M.J., et al. (2016) “International data sharing in practice: New technologies meet old governance,” Biopreservation and Biobanking, 14(3) (pp. 231–240), doi: 10.1089/bio.2016.0002.