Phylogenetic analysis using Machine learning




Copyright © 2021 TutorsIndia. All rights reserved.

Phylogenetic Analysis Using Machine Learning

Dr. Nancy Agnes, Head, Technical Operations, Tutorsindia, info@tutorsindia.com

I. INTRODUCTION:

Interpreting phylogenetic trees is an essential yet challenging part of evolutionary studies, and the evolutionary study of organisms lies at the core of biological research. An inferred phylogeny is then subjected to a range of downstream analyses that are essential for further genomic research (Azouri 2021). Phylogenetic analysis draws on several methods for interpreting data. Recently, researchers have begun to study the use of machine learning for inferring phylogenetic trees.

II. PHYLOGENETIC ANALYSIS:

Phylogenetic analysis is the study of the evolutionary history of a species or a group of organisms. The evolutionary relationships among species or organisms that share a common ancestor are represented by a branching diagram called a phylogenetic tree, which can be either rooted or unrooted. Phylogenetic analysis can also be used to study the relationships among characteristics of organisms, including genes and proteins.
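
As a minimal sketch of the rooted and unrooted distinction, a tree can be held as nested tuples and written out in Newick notation. The taxon names and the helper functions `to_newick` and `leaves` are illustrative assumptions, not part of the studies discussed here. By a common convention, an unrooted tree is written with three subtrees at the top level, whereas a rooted binary tree has two.

```python
def to_newick(node):
    """Serialise a nested-tuple tree as Newick text (without the final ';')."""
    if isinstance(node, str):  # a leaf is just a taxon name
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Return the taxon names of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


# Hypothetical four-taxon example.
rooted = (("human", "chimp"), ("mouse", "rat"))   # root with two children
unrooted = (("human", "chimp"), "mouse", "rat")   # top-level trifurcation

print(to_newick(rooted) + ";")    # ((human,chimp),(mouse,rat));
print(to_newick(unrooted) + ";")  # ((human,chimp),mouse,rat);
```

Both trees group human with chimp; only the rooted one additionally states where the common ancestor of all four taxa sits.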

The applications of phylogenetic analysis are numerous. They include reconstruction of the ancestral genes from which extant genes are derived, the study of human disease and epidemiology, interpretation of the evolution of ecological and behavioural traits, and estimation of historical biogeographic relationships.


III. CURRENTLY AVAILABLE METHODS FOR INFERENCE:

Previously, phylogenetic analysis relied on morphological features to assess similarities among species. This has changed drastically over time: analyses now use information extracted from DNA, RNA or protein sequences. Generating a phylogenetic tree typically involves aligning the sequences. The most widely used approach is alignment-based: two sequences are stacked so that their common symbols and substrings line up. Comparing sequences in this way helps to identify patterns of shared ancestry between species (Munjal 2019). However, exploiting large-scale molecular data poses significant challenges. One of the most difficult tasks is to develop effective techniques for handling missing data.
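
To make the idea of stacked sequences concrete, the sketch below computes a simple pairwise distance from an alignment: the proportion of compared columns in which two sequences differ. The toy sequences, the taxon names and the functions `p_distance` and `distance_matrix` are assumptions made for illustration. Columns containing a gap are skipped, and a pair with no comparable column yields `None`, which is one way missing entries in a distance matrix can arise.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped. Returns None
    when no column can be compared, i.e. the distance is missing.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else None


def distance_matrix(alignment):
    """Pairwise p-distances for a {name: aligned sequence} mapping."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Hypothetical toy alignment; real alignments are far longer.
alignment = {
    "human": "ACGTACGT",
    "chimp": "ACGTACGA",
    "mouse": "ACG-TCGA",
}
matrix = distance_matrix(alignment)
print(matrix[("human", "chimp")])  # 0.125 (1 difference over 8 sites)
```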

Maximum likelihood and Markov chain Monte Carlo (MCMC) methods, which rely on probabilistic models of sequence evolution, are highly reliable statistical methods for reconstructing gene and species trees. Even so, many of these approaches do not scale to phylogenomic datasets of hundreds or thousands of genes and taxa. A fast and efficient method is therefore needed (Bhattacharjee 2020).
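
The article does not name a specific model of sequence evolution. As one illustrative assumption, the Jukes-Cantor (JC69) model is among the simplest: it treats all nucleotide substitutions as equally likely and converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function name `jc69_distance` is hypothetical.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        # Under JC69 the expected proportion of differences saturates at 0.75.
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jc69_distance(0.125), 4))  # 0.1367
```

The corrected distance is slightly larger than p because the model accounts for repeated substitutions at the same site that a direct comparison cannot see.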

IV. APPLICATION OF MACHINE LEARNING:


Machine learning has found various applications in technology-driven research, and one of them is the inference of phylogenetic trees. In a recent study, researchers used machine learning to select the model best suited to the most common inference task: phylogenetic tree reconstruction for a given collection of sequences (Abadi 2020).

Another study analysed plant diversity trends to date in detail and showed that using machine learning to forecast future diversity could be highly beneficial. The authors applied machine learning approaches to phylogenetic diversity in vascular plants (Park 2020).

Bhattacharjee and Bayzid demonstrated, for the first time, the potential and feasibility of using deep learning techniques to impute missing entries in distance matrices. The study evaluated both matrix factorization (MF) and autoencoder (AE) approaches and aimed to develop improved models for better results. It showed that both methods are reliable and can be applied to large-scale datasets. It also highlighted the ability of these techniques, compared with heuristic-based techniques, to automatically learn complicated inter-variable associations. This research can also serve as a model for applying machine learning methods to phylogenetic analysis (Bhattacharjee 2020).
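
The following is a minimal sketch of matrix-factorization imputation for an incomplete distance matrix. It is not the implementation from the cited study: it uses plain stochastic gradient descent, and the function name `impute_mf`, the hyperparameters (`rank`, `steps`, `lr`, `reg`) and the example matrix are all assumptions chosen for illustration. Missing distances are marked with `None`, two sets of low-rank factors are fitted to the observed off-diagonal entries, and each missing entry is filled with the average of its two predicted orientations so that the result stays symmetric.

```python
import random


def impute_mf(dist, rank=2, steps=2000, lr=0.01, reg=0.001, seed=0):
    """Fill None entries of a symmetric distance matrix via low-rank factors."""
    n = len(dist)
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    # Train only on observed off-diagonal entries.
    observed = [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and dist[i][j] is not None
    ]
    for _ in range(steps):
        for i, j in observed:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = dist[i][j] - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    filled = [row[:] for row in dist]
    for i in range(n):
        for j in range(n):
            if filled[i][j] is None:
                a = sum(U[i][k] * V[j][k] for k in range(rank))
                b = sum(U[j][k] * V[i][k] for k in range(rank))
                # Average both orientations and clip at zero.
                filled[i][j] = max(0.0, (a + b) / 2)
    return filled


# Hypothetical 5-taxon matrix with one missing pair (taxa 0 and 4).
D = [
    [0.0, 0.2, 0.5, 0.6, None],
    [0.2, 0.0, 0.5, 0.6, 0.7],
    [0.5, 0.5, 0.0, 0.3, 0.6],
    [0.6, 0.6, 0.3, 0.0, 0.7],
    [None, 0.7, 0.6, 0.7, 0.0],
]
filled = impute_mf(D)
print(round(filled[0][4], 2))  # imputed distance between taxa 0 and 4
```

The completed matrix could then be passed to any distance-based tree-building method. How well the imputed values match the true distances depends on the data and the hyperparameters, so this sketch makes no accuracy claim.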

In another study, a machine learning framework was developed to rank neighbouring trees according to their propensity to increase the likelihood. The authors used multiple features and applied machine learning to improve an existing tree-search heuristic. The study suggested specific ways to apply machine learning algorithms in phylogenetic analysis and presented a methodology that can significantly speed up tree-search algorithms without sacrificing accuracy (Azouri 2021).
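
The ranking idea can be sketched as follows. This is an assumption-laden illustration, not the framework from the cited study: the two features, the weights and the linear scorer `predicted_gain` are all made up, and a real system would use a model trained on many tree-search steps. Each neighbouring tree is scored from cheap features, and only the top-ranked candidates are passed on to the expensive likelihood computation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    features: tuple  # hypothetical cheap-to-compute features of the move


# Hypothetical weights standing in for a trained model.
WEIGHTS = (0.8, -0.5)


def predicted_gain(candidate):
    """Predicted likelihood improvement of a neighbouring tree (stand-in scorer)."""
    return sum(w * f for w, f in zip(WEIGHTS, candidate.features))


def top_k(candidates, k):
    """Keep only the k candidates with the highest predicted gain."""
    return sorted(candidates, key=predicted_gain, reverse=True)[:k]


neighbours = [
    Candidate("A", (1.0, 0.2)),
    Candidate("B", (0.3, 0.9)),
    Candidate("C", (0.6, 0.1)),
    Candidate("D", (0.9, 1.0)),
]
shortlist = top_k(neighbours, 2)
print([c.name for c in shortlist])  # ['A', 'C']
```

Any speed-up comes from evaluating the full likelihood only for the shortlist rather than for every neighbour.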

A recent review focused on the application of machine learning techniques to the analysis of human microbiome data. It gave insight into the many advantages that machine learning offers over classical methods. The techniques most commonly covered in the review were Support Vector Machines, Random Forest, k-NN and Logistic Regression. The review suggested how machine learning can contribute to new models for predicting classifications in microbiology, inferring host phenotypes to predict disease, and characterizing state-specific microbial signatures from microbial communities (Marcos-Zambrano 2021).
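
Of the techniques listed, k-NN is the simplest to show in a few lines. The sketch below classifies a sample by majority vote among its nearest labelled neighbours. The three-taxon relative-abundance profiles, the labels and the function name `knn_predict` are invented for illustration and are not taken from the review.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query profile by majority vote among its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical relative abundances of three taxa per sample.
train = [
    ((0.60, 0.30, 0.10), "healthy"),
    ((0.55, 0.35, 0.10), "healthy"),
    ((0.65, 0.25, 0.10), "healthy"),
    ((0.20, 0.30, 0.50), "disease"),
    ((0.25, 0.25, 0.50), "disease"),
    ((0.15, 0.35, 0.50), "disease"),
]
print(knn_predict(train, (0.58, 0.32, 0.10)))  # healthy
```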

V. FUTURE SCOPE:

Recent research emphasizes the potential of artificial intelligence and machine learning for inferring phylogenetic trees. These studies highlight the ability of machine learning to increase both the scale of the datasets analysed and the degree of sophistication of evolutionary models (Azouri 2021). Machine learning is therefore likely to attract considerable interest in the near future and to contribute to efficient phylogenetic analysis in biological research.

METHODS | DOMAIN | PURPOSE | REFERENCES
Machine learning & phylogenetic analysis | TSS (The Substitution Score), ISS (Internal substitution score) | To predict the pathogenicity of human mtDNA variants | (Akpınar 2020)
Machine learning & phylogenetic analysis | ModelTeller (computational methodology) | To examine the accuracy of phylogenetic analyses using machine learning | (Abadi 2020)
Machine learning & phylogenetic analysis | Random Forest (RF) based learning and NeoPLE (prediction approach) | To highlight the use of candidate trees and establish a model that describes the relationship between likelihood and extracted features by exploiting the deep neighbor information of each individual tree | (Ling 2020)


REFERENCES:

1. Abadi, S., Avram, O., Rosset, S., Pupko, T., & Mayrose, I. (2020). ModelTeller: model selection for optimal phylogenetic reconstruction using machine learning. Molecular Biology and Evolution, 37(11), 3338-3352.
2. Akpınar, B. A., Carlson, P. O., Paavilainen, V. O., & Dunn, C. D. (2020). A novel phylogenetic analysis and machine learning predict pathogenicity of human mtDNA variants. bioRxiv.
3. Azouri, D., Abadi, S., Mansour, Y., et al. (2021). Harnessing machine learning to guide phylogenetic-tree search algorithms. Nature Communications, 12, 1983.
4. Bhattacharjee, A., & Bayzid, M. S. (2020). Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics, 21(1), 497.
5. Ling, C., Cheng, W., Zhang, H., Zhu, H., & Zhang, H. (2020). Deep neighbor information learning from evolution trees for phylogenetic likelihood estimates. IEEE Access, 8, 220692-220702. doi: 10.1109/ACCESS.2020.3043150
6. Hillis, D. M. (1997). Phylogenetic analysis. Current Biology, 7, R129-R131. doi: 10.1016/S0960-9822(97)70070-8
7. Marcos-Zambrano, L. J., Karaduzovic-Hadziabdic, K., Loncar Turukalo, T., Przymus, P., Trajkovik, V., Aasmets, O., Berland, M., Gruca, A., Hasic, J., Hron, K., & Klammsteinner, T. (2021). Application of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment. Frontiers in Microbiology, 12, 313.
8. Munjal, G., Hanmandlu, M., & Srivastava, S. (2019). Phylogenetics algorithms and applications. In Hu, Y. C., Tiwari, S., Mishra, K., & Trivedi, M. (Eds.), Ambient Communications and Computer Systems. Advances in Intelligent Systems and Computing, vol. 904. Springer, Singapore.
9. Park, D. S., Willis, C. G., Xi, Z., Kartesz, J. T., Davis, C. C., & Worthington, S. (2020). Machine learning predicts large scale declines in native phylogenetic diversity. New Phytologist, 227(5), 1544-1566.