Sharma Thankachan
Bio
Sharma V. Thankachan is an associate professor in the Department of Computer Science at NC State University. His research focuses on string algorithms, compressed data structures, and their applications in areas such as computational biology.
Prior to joining NC State in 2022, Thankachan was an assistant professor at the University of Central Florida. He completed postdoctoral research at the Georgia Institute of Technology and the University of Waterloo. He earned his Ph.D. in computer science from Louisiana State University and a B.Tech. in electrical and electronics engineering from the National Institute of Technology Calicut, India.
Thankachan has advised several doctoral students, including:
- Mano Prakash Parthasarathi
- Paul Macnichol
- Oliver Chubet (co-advised with Donald Sheehy)
His doctoral alumni include:
- Dr. Paniz Abedin, Assistant Professor of Computer Science at Florida Polytechnic University
- Dr. Daniel Gibney, Assistant Professor of Computer Science at The University of Texas at Dallas
- Dr. Sahar Hooshmand, Assistant Professor of Computer Science at California State University, Dominguez Hills
Education
Ph.D., Computer Science, Louisiana State University, Baton Rouge, 2014
B.Tech., Electrical and Electronics Engineering, National Institute of Technology Calicut, India, 2006
Area(s) of Expertise
Algorithms and Theory of Computation
Publications
- Special Section: 12th International Computational Advances in Bio and Medical Sciences (ICCABS 2023), Journal of Computational Biology (2025)
- Editorial: Special Section on Computational Advances in Bio and Medical Sciences, IEEE Transactions on Computational Biology and Bioinformatics (2025)
- Non-overlapping indexing in BWT-runs bounded space, Theoretical Computer Science (2025)
- Bounded-Ratio Gapped String Indexing, Lecture Notes in Computer Science (2024)
- Text Indexing for Faster Gapped Pattern Matching, Algorithms (2024)
- Contextual Pattern Matching in Less Space, Data Compression Conference (DCC) (2023)
- Non-overlapping Indexing in BWT-Runs Bounded Space, Lecture Notes in Computer Science (2023)
- On the Hardness of Sequence Alignment on De Bruijn Graphs, Journal of Computational Biology (2022)
- Ranked Document Retrieval in External Memory, ACM Transactions on Algorithms (2022)
Grants
This project aims to address the following question: how can the combined information of a pan-genome collection be modeled succinctly (and in a biologically meaningful way) so that genomic analysis on that representation is both easy to compute and accurate? Pan-genome collections may be represented as high-scoring Multiple Sequence Alignment (MSA) data, indexed text data, or the more popular graph-based representations (pan-genome graphs). These models need to support read mapping queries efficiently. This research will lead to a new class of string/graph algorithms for the analysis of pan-genomic data.
Being able to store, search, and analyze massive data sets efficiently is one of today's pressing challenges. This project will study a collection of problems in text compression and indexing that have tremendous current relevance, owing to a characteristic prevalent in many modern text data sets: high repetitiveness. This characteristic makes the data highly compressible using some specialized schemes. However, the theoretical understanding of those schemes is still at a nascent stage. We will address some of the fundamental open problems on the effectiveness of several schemes that are popular in practice.
Honors and Awards
- National Science Foundation Faculty Early Career Development (CAREER) Award - 2022