Data analysis has guided scientists through their research and hypothesis testing for centuries, helping them make decisions that changed the world. The mathematician John Tukey described data analysis in his 1962 paper The Future of Data Analysis as, “amongst other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.” As Tukey imagined the future of data analysis, he could not have predicted the advancements in modern computing or the monumental effect they would have on scientific data analysis.
Although data analytics in its earliest forms has been around since the 19th century, its practical use and progress have tracked the rise of computers and the improvement in their performance. It is hard to believe that sixty years ago the box plot was considered a revolutionary idea. Since then, data analysis has evolved to span parametric and non-parametric statistics, and data analytics now includes advanced artificial intelligence techniques such as machine learning.
While data analytics has become an essential element for identifying new scientific discoveries, it requires that data is digitized and processes are digitalized. Data needs to exist in a form that makes it accessible and actionable. Procedures also need to be digital so that it is clear how the data was created in the first place. Often this requires creating a digital twin of the physical lab to mirror physical processes, and rethinking those processes to optimize them for a digital workflow. To take advantage of all the benefits of the latest data analytics tools, organizations need to make all data and associated scientific metadata available. Data analytics can help researchers shorten the timeline to producing results, integrate historic information to make better decisions based on past experiences, and predict outcomes through in silico processes.
The first step to taking advantage of the benefits offered by data analytics is deploying the right data analytics solutions. Today’s data analytics solutions range from options that let organizations develop their own solutions by coding to tools that let them create solutions through a friendly user interface. There are also data analytics tools on the market that let users select from a library of preconfigured ways to analyze and visualize data, or that can be configured for the scientist’s specific needs, like the tools available from RStudio. Thermo Fisher has developed an R™ package that eases access to data through the OData API. This allows scientists to execute analyses using one of the most widely used statistical software tools for data analysis.
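To make the idea of reaching data through an OData API more concrete, here is a minimal sketch in Python of how an OData query URL can be composed from the standard system query options `$filter`, `$select` and `$top`. It is not the Thermo Fisher R package and does not reproduce its interface. The host name, the `SAMPLE` entity set and the `Name` and `RIN` properties are hypothetical placeholders chosen for illustration.

```python
from urllib.parse import quote, urlencode


def build_odata_url(base_url, entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData query URL from standard system query options."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if top is not None:
        options["$top"] = str(top)

    url = f"{base_url.rstrip('/')}/{quote(entity_set)}"
    if options:
        # Leave "$" and "," readable; percent-encode spaces and other characters.
        url += "?" + urlencode(options, quote_via=quote, safe="$,")
    return url


# Hypothetical service root, entity set and property names (assumptions).
example_url = build_odata_url(
    "https://lims.example.com/odata",
    "SAMPLE",
    filter_expr="RIN ge 7",
    select=["Name", "RIN"],
    top=50,
)
print(example_url)
```

A client would then issue an authenticated GET request against a URL like this and read the returned records into a table for analysis. The actual entity names and authentication scheme depend on the deployment.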
Data analytics solutions take many forms, including descriptive visualizations, predictive modelling, business intelligence tools and high-performance data infrastructure. Depending on the volume, velocity and variety of the data they handle, these solutions are often described as “Big Data Solutions.”
Digitized Data Versus Digitalized Science
To get the most from data analytics solutions, the information should be digitized. Digitized data are optimized for access and consumption, allowing for more efficient implementations of data analytics solutions. It is important to understand the distinction between digitizing data and optimizing scientific processes to work in a digital world, which is known as digitalization. Thermo Fisher Scientific’s team has worked with multiple labs to digitalize their workflows and processes using Thermo Fisher™ Platform for Science™ software. Now, through the partnership with RStudio™, Thermo Fisher can enable companies to make data accessible, to develop data analytics solutions that expand the functionality of their Platform for Science software workflows, and to extract insights from their data.
The PDF Image Extractor Shiny application is an example of how technology can be used to take a physical record and convert it into digital information.
The connection with the Platform for Science software aids in structuring this information for later processing and analysis. In summary, images are extracted from PDF documents, with the option to extract data from these images using optical character recognition (OCR). For instance, sample names can be parsed for automatic matching, and values such as the RNA integrity number (RIN) can be captured. Finally, the information is sent to Platform for Science software for validation and publishing. This is the first step in digital transformation: connecting data and making it accessible and actionable.
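The parsing step described above can be sketched as follows. This is a minimal illustration in Python, not the Shiny application's code. It assumes the OCR step has already produced plain text, and that each result line looks like `<sample name> RIN: <value>`. That layout, the sample names and the function name `parse_rin_lines` are assumptions made for the example.

```python
import re

# Assumed line layout: "<sample name> RIN: <value>". Real OCR output will differ.
RIN_LINE = re.compile(
    r"^\s*(?P<sample>[A-Za-z0-9_-]+)\s+RIN\s*[:=]?\s*(?P<rin>\d{1,2}(?:[.,]\d)?)\s*$",
    re.IGNORECASE,
)


def parse_rin_lines(ocr_text):
    """Return one record per line that carries a plausible RIN value."""
    records = []
    for line in ocr_text.splitlines():
        match = RIN_LINE.match(line)
        if not match:
            continue  # headers, footers and unrelated text are skipped
        # Accept a decimal comma, which OCR and some locales produce.
        rin = float(match.group("rin").replace(",", "."))
        if 1.0 <= rin <= 10.0:  # RIN is reported on a 1 to 10 scale
            records.append({"sample": match.group("sample"), "rin": rin})
    return records


# Made-up OCR text for illustration only.
ocr_text = """
Sample report
S-001  RIN: 8.4
S-002  RIN 6,9
S-003  RIN: 42
Page 1 of 1
"""
records = parse_rin_lines(ocr_text)
print(records)
```

Rejecting out-of-range values at this stage keeps obvious OCR misreads from being sent on for validation. Records that pass can then be matched against existing sample names.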
Digitalization optimizes laboratory processes and provides a workflow in the software that mirrors physical laboratory activities. By reducing the time required to perform calculations and run assays, laboratories can run more efficiently and effectively. The NGS Library Preparation application allows scientists to use information stored in Platform for Science software to get their NGS libraries ready ahead of time in an interactive way. Scientists can quickly access different plates containing samples and drag and drop the elements into an empty virtual plate. Then, plates containing adapters can be accessed, and these adapters can be added to the selected samples by dragging and dropping as well. Finally, the plate can be sent back to Platform for Science software and used to continue the next generation sequencing workflow.
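One way to picture the virtual plate behind such a drag-and-drop interface is as a small data structure that maps well positions to a sample and an adapter. The sketch below is a hypothetical Python model, not the application's implementation. The `Plate` class, the 8 by 12 layout and the sample and adapter identifiers are assumptions for illustration.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase


@dataclass
class Plate:
    """A virtual plate: wells addressed as A1..H12 by default (assumed layout)."""
    rows: int = 8
    cols: int = 12
    wells: dict = field(default_factory=dict)

    def well_ids(self):
        return [
            f"{ascii_uppercase[r]}{c}"
            for r in range(self.rows)
            for c in range(1, self.cols + 1)
        ]

    def place_sample(self, well, sample):
        """Equivalent of dropping a sample onto an empty well."""
        if well not in self.well_ids():
            raise ValueError(f"{well} is not a well on this plate")
        if well in self.wells:
            raise ValueError(f"{well} is already occupied")
        self.wells[well] = {"sample": sample, "adapter": None}

    def add_adapter(self, well, adapter):
        """Equivalent of dropping an adapter onto a well that holds a sample."""
        if well not in self.wells:
            raise KeyError(f"{well} has no sample yet")
        self.wells[well]["adapter"] = adapter


# Hypothetical sample and adapter identifiers.
plate = Plate()
plate.place_sample("A1", "S-001")
plate.place_sample("A2", "S-002")
plate.add_adapter("A1", "IDX-01")
print(plate.wells)
```

Keeping the rules (valid well, empty well, sample before adapter) in the model means the interface only has to translate drag-and-drop events into these two calls.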
Once data is digitized and processes are digitalized, data analytics becomes possible. Data analytics solutions cover a wide range of applications that can change how science and business are done by accelerating the discovery of new insights and the decision-making process that surrounds it. Information structured in Platform for Science software can be accessed by Shiny applications to perform different levels of data analytics.
In this example, the Shiny application uses descriptive analytics to interactively visualize and study the relationships between different variables, using a heatmap when studying the activity of multiple compounds. Data combined from different experiments can be visualized at the same time or, if desired, data from specific experiments can be isolated. Polyserial correlations between each explanatory variable and the binary activity variable are presented in a table, and box plots showing the distribution of each variable at different activity levels complement the analysis.
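The kind of summary described above can be sketched in a few lines of Python. Note the substitution: the application reports polyserial correlations, while this sketch computes the simpler point-biserial correlation (the Pearson correlation between a numeric variable and a 0/1 label), which is easier to show without a statistics library. It also computes the five-number summary that a box plot draws for each activity level. The variable values are made-up illustration data, not results.

```python
from math import sqrt
from statistics import mean, quantiles


def point_biserial(values, active):
    """Pearson correlation between a numeric variable and a 0/1 activity label."""
    if len(values) != len(active):
        raise ValueError("values and active must have the same length")
    mx, my = mean(values), mean(active)
    sxy = sum((x - mx) * (y - my) for x, y in zip(values, active))
    sxx = sum((x - mx) ** 2 for x in values)
    syy = sum((y - my) ** 2 for y in active)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Made-up explanatory variable and binary activity labels (illustration only).
descriptor = [1.0, 2.0, 3.0, 4.0, 2.5, 3.5]
active = [0, 0, 1, 1, 0, 1]

r = point_biserial(descriptor, active)
by_level = {
    level: five_number_summary([x for x, a in zip(descriptor, active) if a == level])
    for level in (0, 1)
}
print(round(r, 3), by_level)
```

A table of such coefficients, one per explanatory variable, gives a quick ranking of which variables separate active from inactive compounds, and the per-level summaries show how much the two distributions overlap.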
As new technologies proliferate, new data analytics solutions will be created. Keeping up with this pace of change will determine success for scientific organizations, as those embracing cutting-edge technologies will be able to reach their goals efficiently and effectively and accelerate discovery. Whether your laboratory is taking its first steps in digitizing data or optimizing its scientific processes to work in a digital world, Thermo Fisher has solutions to help guide your transformation. Visit our website to learn more.