Coloquio Queretano del IMUNAM - Juriquilla
From best matches to gene families: How to use paralogs in phylogenomics
Peter F. Stadler, University of Leipzig
Tuesday, August 6, 2019
Room A2 of the CAC, 17:00 hours
Best match graphs (BMGs) arise naturally as the first processing intermediate in algorithms for orthology detection. Let T be a phylogenetic (gene) tree and sigma an assignment of the leaves of T to species. The best match graph (G,sigma) is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionarily) closest relatives of x compared to all other genes contained in the species sigma(y). I will give two alternative characterizations of BMGs and show that a minimally resolved tree that explains a BMG can be reconstructed in cubic time. The symmetric part of a BMG represents the empirical estimate for the orthology relation on the gene set as inferred from a reciprocal best match heuristic.
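The definition of a best match can be made concrete with a small sketch. It assumes that "closest relative" is measured by the last common ancestor in T: y is a best match of x if no other gene from the species sigma(y) shares a lower common ancestor with x. The tree encoding (a child-to-parent dictionary) and the names `ancestors`, `lca_depth`, and `best_match_graph` are illustrative choices, not taken from the talk.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, v itself included."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca_depth(parent, x, y):
    """Depth (distance from the root) of the last common ancestor of x and y."""
    anc_y = set(ancestors(parent, y))
    # Walking up from x, the first vertex that is also an ancestor of y is the LCA.
    lca = next(v for v in ancestors(parent, x) if v in anc_y)
    return len(ancestors(parent, lca)) - 1


def best_match_graph(parent, sigma):
    """Arcs (x, y) such that y is a best match of x.

    parent: child -> parent map of the rooted gene tree (assumed encoding)
    sigma:  leaf (gene) -> species map
    """
    arcs = set()
    for x in sigma:
        # Group the genes of every other species by how deep their LCA with x is.
        by_species = {}
        for y in sigma:
            if sigma[y] != sigma[x]:
                depth = lca_depth(parent, x, y)
                by_species.setdefault(sigma[y], []).append((depth, y))
        # Within each species, keep all genes whose LCA with x is lowest in T.
        for candidates in by_species.values():
            deepest = max(depth for depth, _ in candidates)
            arcs.update((x, y) for depth, y in candidates if depth == deepest)
    return arcs


# Toy gene tree ((a1, b1)u, a2)r with genes a1, a2 in species A and b1 in species B.
parent = {"a1": "u", "b1": "u", "u": "r", "a2": "r"}
sigma = {"a1": "A", "a2": "A", "b1": "B"}
bmg = best_match_graph(parent, sigma)
```

In this toy tree, a2 has b1 as its best match in species B, but b1 prefers a1, so the arc from a2 to b1 is not reciprocated. This straightforward version recomputes ancestor paths for every pair of genes and is meant only to illustrate the definition.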
BMGs are therefore close relatives of cographs, which describe perfect duplication/speciation scenarios. Whenever a BMG deviates from a cograph structure, the reciprocal best match heuristic has produced incorrect orthology assignments. A reasonable approach is therefore to correct the data by editing the BMG into its nearest cograph. Cographs, in turn, are equivalent to event-labeled gene trees that identify duplication and speciation events. These trees also impose constraints on the species tree and the possible reconciliation maps. Taken together, it is therefore possible to start from reciprocal best matches of the proteomes of a set of species and eventually arrive at the phylogenetic tree of these taxa without the use of a conventional tree reconstruction method. In fact, an analysis of the workflow shows that it only makes use of gene duplication events, while sets of 1-1 orthologs do not contribute at all. In this sense the approach is orthogonal to classical phylogenetic methods.
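The cograph condition can be checked on small examples with a brute-force sketch. It relies on the standard characterization of cographs as graphs without an induced path on four vertices (P4). The helper names `symmetric_part` and `find_induced_p4` are hypothetical, and the exhaustive search is only an illustration: it is neither the editing procedure mentioned in the abstract nor an efficient recognition algorithm.

```python
from itertools import permutations


def symmetric_part(arcs):
    """Undirected edges {x, y} for which both arcs (x, y) and (y, x) are present."""
    return {frozenset((x, y)) for (x, y) in arcs if (y, x) in arcs}


def find_induced_p4(vertices, edges):
    """Return an induced path a-b-c-d if one exists, else None.

    Brute force over all ordered 4-tuples; fine for toy examples only.
    """
    def adjacent(u, v):
        return frozenset((u, v)) in edges

    for a, b, c, d in permutations(sorted(vertices), 4):
        path_present = adjacent(a, b) and adjacent(b, c) and adjacent(c, d)
        no_chords = not (adjacent(a, c) or adjacent(a, d) or adjacent(b, d))
        if path_present and no_chords:
            return (a, b, c, d)
    return None


# Reciprocal best matches forming the path w - x - y - z (hypothetical data).
arcs = {("w", "x"), ("x", "w"), ("x", "y"), ("y", "x"), ("y", "z"), ("z", "y")}
edges = symmetric_part(arcs)
witness = find_induced_p4({"w", "x", "y", "z"}, edges)
is_cograph = witness is None
```

Here the reciprocal best matches form a path on four genes, so the check reports a P4 and the graph is not a cograph. In the setting of the abstract, this would flag the reciprocal best match data as containing at least one incorrect orthology assignment.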