In 2008, the Korean National Institute of Health (KNIH) organized a team of experts to drive the Korean Biobank Project (KBP), which set out to build the National Biobank of Korea (NBK) and its network. The network consists of 17 biobanks in university hospitals, with the NBK, which is associated with KNIH, at the center. Each biobank can enter and manage human biodata using the Biobank Information Management System (BIMS) operated by the NBK. The BIMS stores clinical data on each patient as well as sample handling and management information. Despite this central database, the participating biobanks cannot easily share information, because the BIMS biodata forms are limited to 18 items. As a result, the system lacks common formats and cannot interpret information from other systems. To work around this, the biobanks share information using spreadsheets, which have inherent problems of their own: they are prone to data entry errors, and multiple researchers cannot access a given spreadsheet simultaneously. Park, Cho and Kim (2016) developed an integrated database to analyze the raw data from 15 of the participating biobanks in Korea [1].
Their study used a three-step data analysis:
- Specification: defining metadata (e.g., data name, data type, value type, specimen name, specimen type, tube, unit and reference value)
- Classification: separating detailed items into domains and concepts using high, middle and low classifications
- Standardization: linking to international standard codes to clarify item meanings
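The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the field names follow the metadata listed under Specification, while the synonym table, the taxonomy entries and the `STD-0001` code are hypothetical placeholders rather than real international standard codes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ItemMetadata:
    # Step 1, Specification: metadata fields named in the summary.
    data_name: str
    data_type: str
    value_type: str
    specimen_name: str
    specimen_type: str
    tube: str
    unit: str
    reference_value: str
    # Step 2, Classification: high, middle and low class labels.
    high_class: Optional[str] = None
    middle_class: Optional[str] = None
    low_class: Optional[str] = None
    # Step 3, Standardization: linked standard code, if any.
    standard_code: Optional[str] = None


# Hypothetical lookup tables, invented for this sketch.
SYNONYMS = {"fbs": "glucose", "fasting glucose": "glucose"}
TAXONOMY = {"glucose": ("Laboratory test", "Blood chemistry", "Glucose")}
STANDARD_CODES = {"glucose": "STD-0001"}  # placeholder, not a real code


def canonical_name(raw_name: str) -> str:
    """Normalize a raw item name and collapse known synonyms."""
    key = raw_name.strip().lower()
    return SYNONYMS.get(key, key)


def classify(item: ItemMetadata) -> ItemMetadata:
    """Attach high, middle and low classes when the item is known."""
    levels = TAXONOMY.get(canonical_name(item.data_name))
    if levels is None:
        return item
    high, middle, low = levels
    return replace(item, high_class=high, middle_class=middle, low_class=low)


def standardize(item: ItemMetadata) -> ItemMetadata:
    """Link a standard code; unknown items keep standard_code=None."""
    code = STANDARD_CODES.get(canonical_name(item.data_name))
    return replace(item, standard_code=code)


raw_item = ItemMetadata(
    data_name=" FBS ", data_type="numeric", value_type="float",
    specimen_name="serum", specimen_type="blood", tube="SST",
    unit="mg/dL", reference_value="70-99",
)
processed_item = standardize(classify(raw_item))
print(processed_item)
```

Because both "FBS" and "fasting glucose" resolve to the same canonical name, they end up with the same classification and code, which is the kind of synonym handling that standardization is meant to provide.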
Park et al. provided each of the 17 KNIH-affiliated biobanks with a password-protected USB drive, but only 15 of the 17 provided data. In total, Park et al. received 7,197,252 raw data items. They developed the database using MySQL Server 5.6. The database lets users view the primary keys of all of the taxonomy tables containing t_clinical_item and t_specimen_item when they store metadata for each item, so the raw data are readily accessible. Park et al. refined the metadata into 1,796 clinical items and 1,792 specimen items, and classified them into 15 high-, 163 middle- and 3,588 low-class items. They also linked international standard codes to 69.9% of the clinical items and 71.7% of the specimen items. As a result, data queries ran nine times faster.
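To make the table layout more concrete, the sketch below shows one possible reading of it, in which an item table holds a key into a three-level taxonomy. It is an assumption-laden illustration, not the published schema: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL Server 5.6, and every table and column name other than `t_clinical_item` is hypothetical. A `t_specimen_item` table could be modeled the same way.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a three-level taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Hypothetical taxonomy tables: high, middle and low classes.
        CREATE TABLE t_high_class (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE t_middle_class (
            id INTEGER PRIMARY KEY,
            high_id INTEGER NOT NULL REFERENCES t_high_class(id),
            name TEXT NOT NULL
        );
        CREATE TABLE t_low_class (
            id INTEGER PRIMARY KEY,
            middle_id INTEGER NOT NULL REFERENCES t_middle_class(id),
            name TEXT NOT NULL
        );
        -- Item table named in the summary; its columns are assumed.
        CREATE TABLE t_clinical_item (
            id INTEGER PRIMARY KEY,
            low_id INTEGER NOT NULL REFERENCES t_low_class(id),
            data_name TEXT NOT NULL,
            unit TEXT,
            standard_code TEXT  -- NULL when no standard code is linked
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.execute("INSERT INTO t_high_class VALUES (1, 'Laboratory test')")
    conn.execute("INSERT INTO t_middle_class VALUES (1, 1, 'Blood chemistry')")
    conn.execute("INSERT INTO t_low_class VALUES (1, 1, 'Glucose')")
    conn.execute(
        "INSERT INTO t_clinical_item VALUES (1, 1, 'glucose', 'mg/dL', NULL)"
    )
    conn.commit()
    return conn


def item_path(conn: sqlite3.Connection, data_name: str):
    """Return (high, middle, low, data_name) for an item, or None."""
    return conn.execute(
        """
        SELECT h.name, m.name, l.name, c.data_name
        FROM t_clinical_item AS c
        JOIN t_low_class AS l ON l.id = c.low_id
        JOIN t_middle_class AS m ON m.id = l.middle_id
        JOIN t_high_class AS h ON h.id = m.high_id
        WHERE c.data_name = ?
        """,
        (data_name,),
    ).fetchone()


conn = build_demo_db()
print(item_path(conn, "glucose"))
```

Storing the taxonomy key on each item means a single join chain recovers an item's full classification, instead of matching free-text labels across spreadsheets.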
The authors suggest that their database could solve two biobanking problems: data sharing (i.e., information exchange) and the synonym problem that arises when the same information is expressed in multiple ways.
1. Park, H.S., Cho, H., & Kim, H.S. (2016). “Development of an integrated biospecimen database among the regional biobanks in Korea.” Healthcare Informatics Research, 22(2), 129–141.