Steven Newhouse, Head of Technical Services at EMBL’s European Bioinformatics Institute, one of the world’s leading hubs for life science data, recently sat down for an interview about how big data is changing the bioinformatics industry.
“Our main activities are to collect data from all over the world and then store it, process it, and then to extract knowledge out of that data and finally distribute the knowledge and the raw data back to the community,” said Newhouse.
“Our big data infrastructure is distributed across three databases where we store over 150 petabytes of data (one petabyte is equivalent to 20 million four-drawer filing cabinets full of data), so it’s a very large amount of data that we store on behalf of the global community.”
“One of the big changes over the last few years is the adoption of what’s called cloud-native applications. This is the ability to build applications that scale well on cloud infrastructures and are able to operate on big data. This is one of the focuses of our strategy to move to this cloud-native world.”
“Our stored data is increasing by 40 percent each year and with that, the amount of analysis that we need to do is increasing. We’re investing in our data centre capacity, as well as our networking to connect to data centres, and also connecting our data to public cloud providers. This will allow us to increase our analysis capacity to help undertake the work that we need to do.”
“We’re building out these cloud-native applications to run on our big data infrastructure. What we do here will provide a best-practices model that can be adopted elsewhere in the life sciences community and show a hybrid-cloud approach that builds on our internal big data infrastructure.”