IRB Barcelona scientists develop the Bioteque to collect biological data

There is a lot of research going on around the world, and that means a lot of data.

On a personal level, we have all watched computer hard drives grow steadily in capacity to keep up with ever more information, larger images and so on. Many people now own an external drive with a storage capacity of 1 or 2 terabytes (TB).

To show the scale of the problem, the European Bioinformatics Institute (EMBL-EBI) has gone from managing 40 petabytes of data to working with 250 petabytes in just six years. A petabyte is 1,024 TB, so 250 petabytes is the equivalent of 256,000 drives of 1 TB each.

The rapid development of different disciplines in biological and biomedical research (such as genomics, proteomics and transcriptomics) in recent decades has led to exponential growth in the amount of biological data available.

About the Bioteque developed by IRB Barcelona scientists

Scientists led by Patrick Aloy, ICREA Researcher and Head of the Laboratory of Structural Bioinformatics and Network Biology at IRB Barcelona, have developed a computational tool to coordinate, integrate and simplify this data. The result is a knowledge graph that provides information on how different biological entities relate to each other, including over 30 million functional interactions.

Bioteque works by integrating different levels of biological complexity. For two related genes, for example, it can report whether they interact physically, whether they are active in the same type of cell, and whether they are linked to the same disease. It can also predict a cell type’s sensitivity or resistance to a particular drug.
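
As a rough illustration of the kind of gene-pair lookup described above, the following Python sketch builds a toy knowledge graph from a handful of triples and compares two genes. Every entity name, relation label and function here (`build_graph`, `compare_genes`) is hypothetical; none of it is taken from Bioteque's data or interface.

```python
from collections import defaultdict

# Hypothetical toy triples (source, relation, target); not real Bioteque data.
TRIPLES = [
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "expressed_in", "CELL_X"),
    ("GENE_B", "expressed_in", "CELL_X"),
    ("GENE_A", "associated_with", "DISEASE_1"),
    ("GENE_B", "associated_with", "DISEASE_2"),
]


def build_graph(triples):
    """Index triples as graph[node][relation] -> set of neighbouring nodes."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, relation, target in triples:
        graph[source][relation].add(target)
        if relation == "interacts_with":
            # A physical interaction is symmetric, so store it in both directions.
            graph[target][relation].add(source)
    return graph


def compare_genes(graph, gene_a, gene_b):
    """Summarise how two genes relate across three kinds of evidence."""
    return {
        "interact_physically": gene_b in graph[gene_a]["interacts_with"],
        "shared_cell_types": graph[gene_a]["expressed_in"]
        & graph[gene_b]["expressed_in"],
        "shared_diseases": graph[gene_a]["associated_with"]
        & graph[gene_b]["associated_with"],
    }


graph = build_graph(TRIPLES)
report = compare_genes(graph, "GENE_A", "GENE_B")
print(report)
```

In this toy data the two genes interact and share a cell type but have no disease in common, which shows how a single graph can answer several different questions about the same pair.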

“This resource that we have developed is one of the first aimed at standardizing biological information, and it is the only one that addresses such diversity and amount of data. It allows for easy and consistent access to practically all currently available biological knowledge, and has tremendous potential to accelerate biomedical research,” Aloy said.

Nearly 1,000 descriptors for 12 biological entities

The information in Bioteque has been organized into 12 types of biological entities, such as genes, diseases, tissues and cells. For each of these entities, the tool takes into account a series of descriptors or characteristics: for a gene, for example, its pattern of mutations, the profile of physical interactions of the resulting proteins, its expression in different cell types, or its relationship to different diseases. Across the 12 biological entities, the system covers about 1,000 types of descriptors.

“We worked with information from 150 different databases, so first we had to harmonize it, i.e. put it all in the same ‘language.’ And then we transformed that knowledge into numerical descriptors that could be interpreted by algorithms. That way we could exploit these networks and connections computationally,” said Adria Fernandez, first author of the article and a doctoral student in the same lab.
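
The two steps in the quote, harmonizing the sources and then turning the result into numbers, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the method used by the Bioteque team: the records, the identifier mapping `TO_COMMON` and the binary descriptor vectors are all assumptions made for the example.

```python
import math

# Hypothetical records from two sources that name the same genes differently.
SOURCE_1 = [("geneA", "DISEASE_1"), ("geneB", "DISEASE_1"), ("geneB", "DISEASE_2")]
SOURCE_2 = [("ID:001", "DISEASE_2"), ("ID:003", "DISEASE_3")]

# Hypothetical mapping from source-specific identifiers to one shared vocabulary.
TO_COMMON = {
    "geneA": "GENE_A",
    "geneB": "GENE_B",
    "ID:001": "GENE_A",
    "ID:003": "GENE_C",
}


def harmonize(*sources):
    """Merge all sources into one set of (gene, disease) edges with common IDs."""
    edges = set()
    for records in sources:
        for gene, disease in records:
            edges.add((TO_COMMON[gene], disease))
    return edges


def descriptors(edges):
    """Turn each gene into a binary vector with one position per disease."""
    diseases = sorted({disease for _, disease in edges})
    genes = sorted({gene for gene, _ in edges})
    vectors = {
        gene: [1.0 if (gene, disease) in edges else 0.0 for disease in diseases]
        for gene in genes
    }
    return vectors, diseases


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


edges = harmonize(SOURCE_1, SOURCE_2)
vectors, columns = descriptors(edges)
print(columns)
print(vectors)
print(cosine(vectors["GENE_A"], vectors["GENE_B"]))
```

Once every gene is a vector in the same space, an algorithm can compare genes by similarity without needing to know which database each fact came from.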

Bioteque will be expanded periodically with new databases as they are published. The tool, databases and algorithms are open access.