Speaker: Prasenjit Mitra, Stanford University
Resolving Semantic Heterogeneity and Enabling Interoperation Among Information Sources
Abstract: The Internet has provided easy access to very large amounts of information. The ability to compose information from multiple sources is crucial for deriving more knowledge from them. A major roadblock to composing information from distributed, autonomous information sources is their semantic heterogeneity. Interoperation provides a scalable alternative to integration, in which the responsibility for maintenance as sources change falls on the integrator.
In this talk, I will outline algorithms that resolve semantic heterogeneity among information sources semi-automatically, using techniques from information retrieval and natural language processing. The algorithms form the basis for an articulation generator, a tool for establishing rules that express semantic correspondence among information sources. I will also briefly outline an algebra and show how the composition of information from multiple sources can be optimized. Finally, I will sketch how our tool has the potential to be used in several fields, such as bioinformatics and e-commerce, where data from diverse sources need to be composed.
Short Bio: Prasenjit Mitra is a Ph.D. candidate in Electrical Engineering at Stanford University. He obtained a Master of Science degree in Computer Science from the University of Texas at Austin in 1994 and a Bachelor of Technology with Honours in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur. His primary research interests are in enabling interoperation among heterogeneous database systems and its applications in fields such as electronic commerce and bioinformatics. He also has broad interests in database systems, data mining, knowledge discovery, and information retrieval.
Host: J. Doyle and M. Singh, Computer Science, NCSU