Speaker: Shibu Yooseph , Celera/Applied Biosystems
Haplotyping as Perfect Phylogeny: A Direct Approach
Abstract: A full Haplotype Map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected. In this talk we explore the algorithmic implications of the "no-recombination in long blocks" observation, for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, justifies a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny.
We consider the following algorithmic problem, called Perfect Phylogeny Haplotyping problem (PPH), which is as follows - given n genotypes, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? We present a simple and efficient algorithm that solves this problem, and produces a linear-space data-structure to represent all of the solutions. In addition, we prove a non-trivial upper bound on the number of PPH solutions, showing that that number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed. (This is joint work with Vineet Bafna, Dan Gusfield, and Giuseppe Lancia)
Short Bio: Shibu Yooseph obtained his PhD in Computer and Information Science from Univ. of Pennsylvania in 1997. His thesis advisor was Prof. Tandy Warnow and his thesis focused on algorithms for phylogeny construction. From 1997-98 he was a DIMACS postdoctoral fellow; there he worked on resource allocation problems in networks. From 1998-2000, he worked as a research associate with Prof. Michael Waterman at the University of Southern California on gene family evolution. He has been at Celera/Applied Biosystems since 2000, working in functional genomics and proteomics.
Hosts: Jon Doyle, Computer Science, NCSU