Gone are the days when biobanks were single, stand-alone collections, comprising nothing more than tissue specimens. Modern biobanks frequently belong to a network of similar biorepositories with interconnected data streams to facilitate research access and sharing. Quinlan et al. (2015) reviewed some of the informatics challenges associated with setting up an interconnected biobanking network, describing the initial process for the United Kingdom’s Breast Cancer Campaign Tissue Bank (BCCTB).1
Describing modern biobanking as a distribution network that needs an informatics structure and policy in place to operate efficiently, Quinlan and coauthors outline the issues they faced when first setting up the BCCTB network among four existing tissue banks. Challenges for the new system included managing raw data (from unprocessed samples, allied clinical data, and cataloging, processing and storage conditions) and dealing with ethical issues regarding data return.
In addition to the issues listed, BCCTB managers soon appreciated that in order to realize the potential contained within the distributed tissue banks, they needed to create a unified and harmonized network. This system would have to not only cope with data storage across four main collection centers, but also enable researchers to access the data and samples contained therein. This meant creating a virtual network that could centralize queries while recognizing the independence and unique attributes of the individual repositories.
In addressing the bioinformatics needs across the BCCTB partners, the researchers realized that locations within regional National Health Service (NHS) clinical divisions constrained individual data collection systems. This meant integrating both manual and digital data handling systems within one central framework. Although the BCCTB was a stand-alone biobank, the informatics system had to allow efficient networking among the existing four repositories to enable research on samples and associated data.
Quinlan et al. thus describe setting up a central framework informatics system that combines a virtual network, centralized access and remote locations. They used University of Dundee informatics expertise to build a federated system that searches data held at each contributing biobank. The team opted to keep data in each local database and supply the network members with a plug-in giving access to a central system.
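The federated arrangement described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java implementation; the `Node` and `CentralPortal` classes, the bank names, and the record fields are all hypothetical. The point it shows is that records stay in each local store while the central system only forwards queries and merges the answers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A member biobank that keeps its records locally (hypothetical model)."""
    name: str
    records: list = field(default_factory=list)

    def search(self, **criteria):
        # Each node answers the query from its own local data.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]


class CentralPortal:
    """Holds no sample data itself; it forwards queries to the nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, **criteria):
        results = []
        for node in self.nodes:
            for record in node.search(**criteria):
                # Tag each hit with its source so a researcher can see
                # which repository holds the sample.
                results.append({"node": node.name, **record})
        return results


# Hypothetical banks and records, for illustration only.
bank_a = Node("Bank A", [{"sample_id": "A1", "tissue": "tumour"},
                         {"sample_id": "A2", "tissue": "normal"}])
bank_b = Node("Bank B", [{"sample_id": "B1", "tissue": "tumour"}])

portal = CentralPortal([bank_a, bank_b])
hits = portal.search(tissue="tumour")
```

Here `hits` contains one matching record from each bank, each labeled with the node it came from, while the banks' own record lists are never copied into the portal.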
First, they set up data collection standard operating procedures (SOPs) in consultation with all centers involved in the BCCTB. These SOPs described the minimum information to accompany biosamples as required by researchers that would still support the local biobanks. The SOPs also had to allow development of a network accessed through a single Web portal, since standardized data recording would facilitate better use of clinical details by researchers, who could query and search all the samples held within the BCCTB network.
The informatics developers built the central system on PostgreSQL as the database engine, with the DataNucleus implementation of Java Data Objects (JDO) and Java for scripting requirements. They ran the system on an Apache Tomcat installation. At the member institutions, users uploaded data into the biobank’s Node, which then communicated with the BCCTB network.
Following consultation with the individual institutions, the team provided three methods for data input:
- manual entry via spreadsheet
- direct entry via web interface
- entry using JavaScript Object Notation (JSON) via the Web portal
From here, the Nodes were responsible for “mapping”: reconciling the inconsistent data fields unique to each institution. A “centralized data dictionary” that encompassed all variations of the terms in use made it possible to search across the member repositories in a unified way.
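As a rough illustration of what such mapping might involve, the sketch below translates local field names and term variants into one central vocabulary. It is Python rather than the authors' Java, and every field name, term, and node identifier in it is a hypothetical stand-in, since the summary does not list the actual dictionary entries.

```python
# Hypothetical central data dictionary: local term variant -> central term.
CENTRAL_TERMS = {
    "tumour": "tumour", "tumor": "tumour", "malignant": "tumour",
    "normal": "normal", "healthy": "normal",
}

# Hypothetical per-node field maps: local field name -> central field name.
FIELD_MAPS = {
    "bank_a": {"ID": "sample_id", "SampleType": "tissue_type"},
    "bank_b": {"specimen_no": "sample_id", "tissue": "tissue_type"},
}


def to_central(node, local_record):
    """Translate one local record into the central vocabulary."""
    mapped = {}
    for local_field, value in local_record.items():
        central_field = FIELD_MAPS[node].get(local_field)
        if central_field is None:
            continue  # fields with no central equivalent stay local
        if central_field == "tissue_type":
            # Normalize spelling and synonyms to a single central term.
            value = CENTRAL_TERMS[value.strip().lower()]
        mapped[central_field] = value
    return mapped


a = to_central("bank_a", {"ID": "A1", "SampleType": "Tumor", "Freezer": "F3"})
b = to_central("bank_b", {"specimen_no": "B7", "tissue": "malignant"})
```

Both records end up with the same field names and the same `tissue_type` value, so a single central query can match samples that were recorded differently at each site.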
Once the system was running, the informatics team found that direct entry using JSON performed better than uploading data from spreadsheets. JSON entry enabled automatic data updates, whereas spreadsheets required weekly backups to the central portal. The authors note that this constraint remains an ongoing challenge to smooth data flows.
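To make the JSON route concrete, here is a small Python sketch of how a submission might be decoded and checked against a minimum data set of the kind the SOPs define. The required fields and the record layout are assumptions made for illustration; the summary does not give the actual SOP fields or the portal's JSON schema.

```python
import json

# Hypothetical minimum data set; the real SOP fields are not given here.
REQUIRED_FIELDS = {"sample_id", "tissue_type", "collection_date"}


def parse_submission(payload):
    """Decode a JSON payload and separate complete from incomplete records."""
    accepted, rejected = [], []
    for record in json.loads(payload):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            # Report which required fields are absent for this sample.
            rejected.append((record.get("sample_id"), sorted(missing)))
        else:
            accepted.append(record)
    return accepted, rejected


payload = json.dumps([
    {"sample_id": "A1", "tissue_type": "tumour", "collection_date": "2014-03-01"},
    {"sample_id": "A2", "tissue_type": "normal"},
])
accepted, rejected = parse_submission(payload)
```

Because a structured payload can be validated and loaded by a program without manual handling, a check like this can run on every submission, which is one plausible reason automated JSON updates fit the network better than periodic spreadsheet uploads.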
Another problem encountered involved the time taken to run searches through the central portal. Developers overcame this problem by having the central server cache results from searches, then querying individual databases for up-to-date results before presenting them to users.
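One possible reading of this caching approach is sketched below in Python: the central server keeps the results of earlier searches for a quick response, and a full search re-queries every node and refreshes the stored copy. The `CachingPortal` class and its `preview` and `search` methods are hypothetical names, and the authors' actual cache design may differ.

```python
class CachingPortal:
    """Central server that remembers earlier search results (illustrative)."""

    def __init__(self, nodes):
        # nodes: mapping of node name -> callable(query) returning a list of hits
        self.nodes = nodes
        self._cache = {}

    def preview(self, query):
        """Return cached hits at once, or None if the query is new."""
        return self._cache.get(query)

    def search(self, query):
        """Ask every node for current data, refresh the cache, return the hits."""
        fresh = []
        for name, run_query in self.nodes.items():
            fresh.extend((name, hit) for hit in run_query(query))
        self._cache[query] = fresh
        return fresh


# A hypothetical node whose holdings change between searches.
bank_a_samples = ["A1"]
portal = CachingPortal({"bank_a": lambda query: list(bank_a_samples)})

first = portal.search("tumour")
bank_a_samples.append("A2")        # the node gains a sample
stale = portal.preview("tumour")   # cached view, does not yet include A2
fresh = portal.search("tumour")    # live query picks up the new sample
```

The cached preview is fast but can lag behind the nodes, which is why the live query runs before results are treated as current.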
In presenting this overview of the informatics challenges experienced in setting up the BCCTB network, Quinlan et al. suggest that in the near future, technological capability will be a limiting factor in biobank operation, possibly preventing biobanks from reaching their full potential in the research world. In their opinion, issues that need to be overcome include networking among individual repositories, standardizing access for researchers, and ensuring a consistent technological environment at both clinical collection and sample storage sites. In order to meet donor and research expectations, biobanking needs to maximize data and sample utilization through improved access for researchers. The report authors believe that in the future, information technology systems will factor in accreditation, forming part of the evaluation for biobank assessment.
1. Quinlan, P.R. et al. (2015) “The informatics challenges facing biobanks: A perspective from a United Kingdom biobanking network,” Biopreservation and Biobanking, pp. 363-70, doi: 10.1089/bio.2014.0099