The importance of biobanking for the development of personalized medicine is well established, and many biobanks now also function as digital repositories. Accumulating population samples and their associated data drives research into how disease pathogenesis varies among individuals, with the aim of discovering tailored therapeutic strategies. However, the key bottleneck is data: without the huge data sets generated by emerging technologies such as genomics and proteomics, researchers have no way to explore the patterns of pathogenesis from which personalized treatments arise.
Data acquisition, management and dissemination have therefore become central aspects of biobank development and administration. Not only are data sets becoming larger and more unwieldy, but they must also remain linked to the relevant sample and be easily transmissible among institutions for collaborative projects. Furthermore, different disciplines require different types of biosamples, and their experimental methods frequently produce file formats that are difficult to integrate into an overall data analysis. Data storage, management and sharing systems need to take all of these factors into account.
Izzo et al. (2014) describe a novel data model developed to manage a biobank digital repository.1 They customized the system and tested its functionality using the Biobanking Integrating Tissue-omics (BIT)–Institute Giannina Gaslini (IGG) biobank, which collects pediatric tissue, blood and neuroblastic tumor samples from across Italy.
First, the researchers built a novel data model using JavaScript Object Notation (JSON), which provides users with a graphical interface and allows flexible searches for metadata management and querying. They incorporated the JSON data model into XTENS, a web-based digital repository comprising a web portal, an internal database and a data grid storage element, with iRODS (Integrated Rule-Oriented Data System) serving as middleware to manage the data grid. By customizing the XTENS portal, the team reconfigured the interface to make patient, sample and data management easier. The input functionality allows users to manage multiple sample types and to add data arising from numerous studies, which are termed “events” in the programming syntax. An overall SysAdmin function defines user roles to restrict access within the system, protecting patient privacy and confidentiality as well as maintaining security.
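The summary does not reproduce the XTENS schema itself, but the general idea of an extensible JSON data model can be sketched. The Python below is a minimal illustration under stated assumptions: a hypothetical data-type definition (the `Tissue` type and its field names are invented for illustration, not taken from XTENS) declares which metadata fields an event may carry, and a hypothetical `validate_event` function checks an event's metadata against that definition. In a design like this, supporting a new sample or data type means adding a new JSON definition rather than altering database tables.

```python
import json
from datetime import date

# Hypothetical data-type definition for a JSON-based extensible data model.
# The type name and field names are illustrative only; they are NOT the
# actual XTENS schema.
TISSUE_TYPE = json.loads("""
{
  "name": "Tissue",
  "fields": [
    {"name": "anatomical_site", "type": "text",  "required": true},
    {"name": "weight_mg",       "type": "float", "required": false},
    {"name": "arrival_date",    "type": "date",  "required": true}
  ]
}
""")


def _is_date(value):
    """Accept ISO 8601 date strings such as '2013-05-02'."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Map each field type used in a definition to a checker function.
CHECKS = {
    "text": lambda v: isinstance(v, str),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "date": _is_date,
}


def validate_event(data_type, metadata):
    """Return a list of problems found in an event's metadata (empty if valid)."""
    errors = []
    known = {field["name"] for field in data_type["fields"]}
    for field in data_type["fields"]:
        name = field["name"]
        if name not in metadata:
            if field["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if not CHECKS[field["type"]](metadata[name]):
            errors.append(f"wrong type for field: {name}")
    # Flag fields that the data type does not declare.
    for name in metadata:
        if name not in known:
            errors.append(f"unknown field: {name}")
    return errors


# Example event: metadata for one tissue sample (values are invented).
event = {
    "type": "Tissue",
    "metadata": {"anatomical_site": "adrenal gland", "arrival_date": "2013-05-02"},
}
print(validate_event(TISSUE_TYPE, event["metadata"]))  # prints []
```

Keeping the field definitions as data rather than code is the point of the sketch: a biobank adding, say, a new omics data type would only register another definition of the same shape.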
Beyond data input, the system supports efficient interrogation through user-defined queries built in a flexible search interface, maximizing metadata extraction. It also facilitates biospecimen tracking and regular updating as new clinical details arise. Furthermore, because of the data grid system, researchers can easily retrieve large files, such as whole diagnostic images and entire genomic sequences, to view remotely.
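A user-defined metadata query of this kind can likewise be sketched. The Python below is a hypothetical illustration, not the XTENS query engine: events are plain dictionaries linked to a sample identifier, and a query is a list of (field, operator, value) criteria combined with AND. The `run_query` function, the `sample_id` key and all example values are assumptions made for this sketch.

```python
import operator

# Comparison operators a user may pick in this sketch's search interface.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def run_query(events, event_type, criteria):
    """Return events of event_type whose metadata satisfies every criterion.

    criteria is a list of (field, op, value) triples combined with AND;
    an event lacking a queried field does not match.
    """
    results = []
    for event in events:
        if event["type"] != event_type:
            continue
        metadata = event["metadata"]
        if all(field in metadata and OPS[op](metadata[field], value)
               for field, op, value in criteria):
            results.append(event)
    return results


# Hypothetical events, each linked back to the sample it describes.
events = [
    {"sample_id": "S1", "type": "Tissue",
     "metadata": {"anatomical_site": "adrenal gland", "weight_mg": 120}},
    {"sample_id": "S2", "type": "Tissue",
     "metadata": {"anatomical_site": "adrenal gland", "weight_mg": 40}},
    {"sample_id": "S3", "type": "Blood",
     "metadata": {"volume_ml": 5}},
]

hits = run_query(events, "Tissue",
                 [("anatomical_site", "=", "adrenal gland"),
                  ("weight_mg", ">=", 100)])
print([e["sample_id"] for e in hits])  # prints ['S1']
```

Carrying the sample identifier on every event is what lets a query result be traced back to the physical biospecimen; a real repository would run such queries against its database rather than an in-memory list.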
According to the authors, their system performed well in managing the BIT-Gaslini biobank as a digital repository, running data queries at an acceptable speed. Izzo et al. note that performance could be improved with further customization according to user needs. Although the system has not yet been shared outside the IGG, the team plans to test it across the European biobank network using an anonymized system. They also note that the chosen framework allows the strongest security measures to be implemented if needed.
In conclusion, although the authors do not claim that their method constitutes a standard, they believe their XTENS-based tool has the potential to deliver an effective and highly flexible system for managing digital repositories in collaborative biobanking.
Reference
1. Izzo, M. et al. (2014) “A digital repository with an extensible data model for biobanking and genomic analysis management,” BMC Genomics 15(Suppl 3):S3, doi:10.1186/1471-2164-15-S3-S3.