As research in personalized and translational medicine evolves, access to suitable biosamples and relevant allied data is often a limiting factor. For this reason, researchers are increasingly turning to multi-institution biobanks for materials. Jacobson et al. (2015) explore federated networks as an alternative to the more common centralized management systems used to facilitate biosample sharing and data exchange.[1]
Focusing on the implementation of a text information extraction system (TIES) developed by four major cancer centers for recording data, the authors examine how this system has enabled a federated network that manages data and biospecimen sharing while maintaining compliance with regulatory and privacy requirements. Once TIES was in place, the institutions used it to establish the TIES Cancer Research Network (TCRN), which oversees research activities that use the data and samples held in each center’s biobank.
TIES, written in Java, is an open-source, computer-based tool that the TCRN uses to upload biosample and allied clinical data to its network. It uses natural language processing to create biosample records from text-based electronic medical information. This tool provides de-identified data linked with the biospecimens; these data are available for searching by qualified users across many institutions. One of the benefits described by Jacobson et al. is that this approach allows complex searches to identify patient cohorts that would otherwise be unreachable.
The TCRN operates as a federated network, meaning that member institutions retain their independence and keep their data and biosamples on site. According to Jacobson et al., federation avoids some of the potential drawbacks of centralization, including failure of the main hub and increasing complexity as the number of member institutions rises. The network itself provides the connectivity through which members exchange data. An executive committee comprising representatives from each site provides oversight and management, with policies and processes subcommittees giving a focused outlook on biobanking issues.
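The federated arrangement described above can be illustrated with a minimal sketch. It is not the TIES implementation: the `Site` class, the `federated_search` function, the site names and the sample report texts are all hypothetical, chosen only to show each site keeping its own records while one query is sent to every member, and to show that an unreachable site does not stop the others from answering.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical member institution that keeps its records locally."""
    name: str
    records: list = field(default_factory=list)  # made-up de-identified texts
    online: bool = True

    def search(self, term):
        # Each site answers the query from its own local store.
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        term = term.lower()
        return [r for r in self.records if term in r.lower()]


def federated_search(sites, term):
    """Send the same query to every site and collect the results per site."""
    hits, unavailable = {}, []
    for site in sites:
        try:
            hits[site.name] = site.search(term)
        except ConnectionError:
            # One site failing does not block the rest of the network.
            unavailable.append(site.name)
    return hits, unavailable


sites = [
    Site("Site A", ["Invasive ductal carcinoma, breast", "Benign nevus"]),
    Site("Site B", ["Ductal carcinoma in situ"], online=False),
    Site("Site C", ["Colon adenocarcinoma", "Ductal carcinoma, grade 2"]),
]
hits, unavailable = federated_search(sites, "ductal carcinoma")
```

In this sketch there is no central copy of the records, so the only shared component is the query function itself; that mirrors, in a very simplified way, the point that federation avoids dependence on a single hub.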
Governance ensures compliance with regulatory requirements, although individual members retain the power to make decisions relevant to institutional operation. Each obtains a waiver from its own institutional review board, which allows unhindered data reuse and sharing among TCRN investigators. Each member follows standard operating procedures, governance regulations and other guidelines to ensure quality control and consistency of biosamples and data available to researchers.
Although the member institutions, rather than a centralized repository, hold the data, access is fully available through TIES. The system provides three primary data stores: private, containing health information; research, containing de-identified annotated texts; and collaborative, containing study metadata. Security controls set user levels with access restrictions depending on research roles and study requirements. In addition, three levels of processing de-identify all text to the HIPAA Safe Harbor standard to ensure donor confidentiality.
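As a rough illustration of role-based restrictions over the three data stores, the following sketch maps user roles to the stores they may read. The store names come from the description above; the role names, the permissions assigned to them and the `can_access` function are assumptions made for this example and do not describe the actual TIES security model.

```python
from enum import Enum


class Store(Enum):
    PRIVATE = "private"              # health information
    RESEARCH = "research"            # de-identified annotated texts
    COLLABORATIVE = "collaborative"  # study metadata


# Hypothetical roles and permissions, chosen only for illustration.
ROLE_ACCESS = {
    "data_steward": {Store.PRIVATE, Store.RESEARCH, Store.COLLABORATIVE},
    "researcher": {Store.RESEARCH, Store.COLLABORATIVE},
    "guest": {Store.COLLABORATIVE},
}


def can_access(role, store):
    """Return True if the role is allowed to read the given store."""
    # Unknown roles get an empty permission set, so access is denied by default.
    return store in ROLE_ACCESS.get(role, set())
```

For example, `can_access("researcher", Store.PRIVATE)` returns `False` under these assumed permissions, while `can_access("researcher", Store.RESEARCH)` returns `True`.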
Currently, the TCRN supports searching across 5.8 million cases and 2.5 million patients. The federated network in place gives scientists in translational cancer research valuable access to a diverse collection of biospecimens and allied data. Jacobson et al. describe the TCRN as providing a “unique infrastructure for accelerating translational cancer research in the era of personalized medicine.”
1. Jacobson, R.S., et al. (2015) “A federated network for translational cancer research using clinical data and biospecimens,” Cancer Research, 75 (pp. 5194–5201).