(Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, USA, 10065)
ABSTRACT: Biomedical research has benefited greatly from the development and improvement of biotechnology, and these technologies have produced huge amounts of biological data. It has become very important to find better approaches to integrate and apply these complex data to support and drive biomedical studies. Computational biology emerged for this purpose, covering and fusing multiple scientific domains, including computer science, biological science, medicine, chemistry, pharmacy, mathematics, and statistics. This commentary briefly introduces the field of computational biology, explains why it is in high demand, and discusses its current situation and opportunities.
KEY WORDS: computational biology; biological data; computational biology talents; current situation
What is computational biology
Most readers are probably more familiar with bioinformatics, which is essentially the acquisition of data and the application of software to support biological studies. Computational biology is a broader discipline than bioinformatics, as it also includes the development of innovative, integrative, and sophisticated computational methods and tools that not only facilitate but also drive biological research. The journal Nature defines computational biology as: “an interdisciplinary field that applies and develops computational methods to analyse large collections of biological data to make new predictions or discover new biology”[1]. MIT uses a number of keywords to describe the field of computational biology: “gene expression and regulation; DNA, RNA, protein sequence, structure, and interactions; molecular evolution; protein design; network and systems biology; machine learning; quantitative and analytical modeling”[2]. The field has been emerging since the early 2000s: in 2002, Science and Nature each published an article titled after the field, “Computational Biology”[3] and “Computational systems biology”[4] respectively.
Computational biology is like an ecosystem that integrates computer science and biological science to the mutual benefit of both domains. Biological studies often raise questions and encounter difficulties that need computational support. Computational work can assist or even drive biological discoveries in a more efficient, analytical, practical, and novel manner. Biological studies then validate, refine, improve, and respond to the computational work. Finally, successful applications of computational methods to biological studies demonstrate the usefulness of computational biology and attract further effort to the field.
Education in computational biology usually involves multiple departments, mainly spanning computer science and biological science, but also including mathematics, statistics, medicine, chemistry, and pharmacy. This training produces individuals who are as much at ease with programming, algorithm design, database development, and mathematical and statistical analysis as they are with biology, biochemistry, genomics, and pharmaceutics. The multidisciplinary curriculum of computational biology may seem extremely tough and challenging, but the field is well recognized and fast emerging in scientific research, with the potential to impact and benefit the whole spectrum of biomedical science[5].
Why is computational biology in demand
In the big picture, we are now experiencing explosive growth in biomedical and healthcare data. These data are of many types, including genomics, transcriptomics, proteomics, interactions, structures, and pathways, as well as text-based data such as medical records and clinical trial reports. They can be extracted from different sources (e.g. human tissues, human populations), examined under different conditions (e.g. healthy, disease, therapeutic, environmental), and generated by different technologies (e.g. sequencing, imaging). The volume and complexity of these data make them very hard to understand and interpret. Novel, state-of-the-art computational approaches and instrumentation are required to integrate and analyze these data efficiently, and thus to identify meaningful patterns within them. This situation also provides opportunities to design, develop, and validate new computational methods and tools that explore biological data better.
Zooming in on a single laboratory, suppose the biologists want to discover disease-causing genes and mutations, so they spend a lot of time and money preparing and running whole-genome sequencing and RNA sequencing on the patients of interest. They receive the data from the sequencing center or company along with a preliminary analysis report. However, the report might not be comprehensive or exciting, which may disappoint the biologists and prevent the discovery from moving forward. Many other state-of-the-art analysis tools, prediction tools, and public databases could be applied to the sequencing data, but with limited computational biology expertise, the discovery depends heavily on the report provided by the sequencing center or company. It is a great pity to waste projects, funds, and time in this way. Therefore, people who know both computation and biology are badly needed to mine the data and carry out more comprehensive analyses.
Current situation, difficulties, and suggestions
In the US, Europe, Japan, and Singapore, computational biology education has been offered from the undergraduate to the PhD level for almost two decades, producing many professionals in the field. However, in China, to the best of my knowledge, no university offers undergraduate education in computational biology. We have some graduate programs in computational biology, but most of the graduate students were previously educated in either biological science or computational science only. Therefore, they have to gradually pick up skills from the other field while working.
On the one hand, in biomedical research, there is increasing demand to incorporate computational work to support biomedical studies. People in this field are mostly trained in biological science and medicine, with little experience in computation and programming. They usually learn some basic tools as needed to support their ongoing studies, and lack the overall knowledge and expertise in computational science needed to create new interdisciplinary ideas and discoveries. On the other hand, the computational and AI industry is now moving fast and deep into the biomedical and healthcare industry. However, most people in the computational and AI industry come purely from a computer science background and have very limited knowledge of biology or medicine. A good and practical computational product for biomedical purposes relies heavily on a good understanding and a rational integration of biological data; otherwise the product may be overfitted, impractical, or misleading, or may suffer from other problems. Hence, the field of computational biology in China is currently a mix of people from different backgrounds, and lacks individuals with integrated knowledge and expertise in both domains who can initiate, manage, supervise, and evaluate projects that bridge computer science and biological science.
Therefore, we need to implement and promote such multidisciplinary undergraduate education in universities in China. It would be very difficult to ask high school graduates to choose computational biology as their university major, because the field means little to them before they enter university. Alternatively, we could organize seminars to introduce computational biology to all first-year students, and allow those who are genuinely interested either to switch their main major to computational biology or to take it as a second major in their second or third year. After completing their undergraduate education, they could pursue a master's or PhD degree in computational biology, specializing in certain biomedical studies (e.g. cancer genomics, drug design, bioimaging), certain computational studies (e.g. software development, database development, machine learning), or both. These graduates could also join biomedical, pharmaceutical, or AI companies that have openings for people with a computational biology background.
Conclusions
In the next decade, the volume of biological data will be far larger, the complexity of biological data will be much higher, computational power will improve further, and high-performance computers and cloud servers will become widely available. Therefore, individuals with computational biology expertise will have much more room and many more opportunities to make groundbreaking discoveries that others cannot[5-6].
Despite the importance of computational biology to scientific research described in this commentary, two requirements have not changed: close collaboration with biologists and clinicians, and experimental validation, in a real biological system, of the hypotheses or predictions made by computational analyses.