
Why is Data Growth a Big Problem for Science?

by Catherine Chiang – October 17, 2017

In recent years, advances in scientific research have enabled scientists to collect and generate far more data than ever before, but the scientific community’s ability to store, protect, and manage this data has not kept pace.

Too Much Data, Too Quickly

Rapid breakthroughs in scientific technology have enabled the generation of huge quantities of data at far faster rates and lower costs than just a decade ago. Meanwhile, the field’s ability to store and manage this data still lags behind.

A prominent example is DNA sequencing. The first time the human genome was sequenced, it took 13 years and between $50 million and $1 billion to complete; now, sequencing a human genome takes 1 to 2 days and costs below $1,500.

A 2015 PLoS Biology study predicts that by 2025, between 100 million and 2 billion human genomes could have been sequenced, with data storage demands of 2 to 40 exabytes. That’s more than the projected storage needs of YouTube and Twitter.
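As a rough sanity check on those figures, the sketch below divides each storage estimate by the matching genome count to get the implied storage per genome. It assumes decimal (SI) units, where one exabyte is 10^18 bytes, and it assumes that the low estimate pairs with the low genome count and the high with the high; the function name is illustrative and does not come from the study.

```python
# Back-of-the-envelope check of the projected storage figures.
# Assumption: decimal (SI) units, 1 EB = 10**18 bytes, 1 GB = 10**9 bytes.
EXABYTE = 10**18
GIGABYTE = 10**9


def bytes_per_genome(total_exabytes, genome_count):
    """Implied average storage per genome, in bytes."""
    return total_exabytes * EXABYTE / genome_count


# Assumption: the low storage estimate pairs with the low genome count,
# and the high estimate with the high count.
low_case = bytes_per_genome(2, 100_000_000)
high_case = bytes_per_genome(40, 2_000_000_000)

print(f"Low scenario:  {low_case / GIGABYTE:.0f} GB per genome")
print(f"High scenario: {high_case / GIGABYTE:.0f} GB per genome")
```

Under these assumptions, both ends of the range work out to the same per-genome figure, which suggests the spread in the projection comes from the number of genomes sequenced rather than from the size of each one.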

The falling cost of DNA sequencing means that more researchers and labs can afford to sequence DNA, resulting in more data being generated from disparate sources and stored in silos. Without a way to aggregate and analyze all of this data, scientists cannot take full advantage of today’s wealth of genomic information.

In recent years, advancements in technologies such as electron microscopy and flow cytometry have resulted in similar explosions in data growth.

Challenges of Scientific Data Management

Effective management of scientific data enables and aids research, but the scientific community struggles to store, protect, and manage its growing data.

Long-term preservation of data would enable scientists to access the results of previous studies and conduct ongoing, robust research, but data loss is prevalent due to the challenges of scientific data management. Often, data is not preserved after the completion of a study, data is too difficult to find, or data is too difficult to access because it is stored on older media.

A 2013 study found that the odds of obtaining a dataset decline by 17% every year and that more than 80% of datasets over 20 years old are no longer available. This prevents scientists from tapping the potential gold mine of information gained from past studies.
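To get a feel for how quickly a 17% yearly decline compounds, here is a minimal sketch. It assumes the decline is constant from year to year, which is a simplification, and the function name is illustrative. Note that it tracks odds relative to the starting odds, not the probability of availability, so it is not expected to reproduce the 80% figure, which depends on a baseline the article does not give.

```python
# Illustrative only: compound a constant 17% yearly decline in the odds
# of obtaining a dataset. The constant rate is an assumption.
def relative_odds(years, annual_decline=0.17):
    """Odds after `years`, as a fraction of the odds at year zero."""
    return (1 - annual_decline) ** years


for years in (1, 5, 10, 20):
    print(f"After {years:2d} years: {relative_odds(years):.1%} of the initial odds")
```

Under this simplified model, the odds after two decades are a small fraction of where they started, which fits the article's point that older datasets are rarely recoverable.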

“My team and I realized that the additional cost of storing data represents about 1/1 000 of the global budget. Thus, publication of new articles based on use of archives in the ensuing 5 years represents a profit of 10%. We basically have research that costs almost nothing. Without any data storage strategy, we completely miss out on potential discoveries and low-cost research. Once data has been properly stored, however, its cost is practically zero,” said Cristinel Diaconu, research director at the Centre National de la Recherche Scientifique.

Scientists need not only more data storage, but more computing power and effective modes of moving their data to where it’s needed. Unfortunately, the costs of processing power and storage can be prohibitively high.

“The cost of computing is threatening to become a limiting factor in biological research,” said Folker Meyer, a computational biologist at Argonne National Laboratory in Illinois, who estimates that computing costs ten times more than research. “That’s a complete reversal of what it used to be.”

In addition, it’s essential that data management platforms preserve rich metadata to make it easier for scientists to retrieve the data they need from the growing sea of scientific data.

Scientists need cost-effective, streamlined data storage and management solutions built to handle petabytes and more of data.

Curious if Igneous can help you manage your scientific data? Talk to us!
