CSC 442 - Introduction to Data Science
Catalog Description: Overview of data structures, the data lifecycle, and statistical inference. Data management, queries, data cleaning, and data wrangling. Classification and prediction methods, including linear regression, logistic regression, k-nearest neighbors, and classification and regression trees. Association analysis. Clustering methods. Emphasis on analyzing data, using and developing software tools, and comparing methods.
Contact Hours:
- Lecture: 3 hours
Co-requisites: None
Restrictions: None
Coordinator: Dr. Rada Chirkova
Textbook: Data Mining with R
Course Outcomes:
Our goal is to help you build skills in handling and analyzing data "end to end." To mirror data science working environments, some of your assignments and in-class work will be done in multidisciplinary teams. By the end of this course, you should be able to:
- Use software tools to do data cleaning and data wrangling
- Use software tools to view and query stored data in a variety of formats
- Use software tools to implement k-nearest neighbors, tree-based methods, multiple regression, and logistic regression for classification and prediction
- Develop association rules using the Apriori algorithm
- Use software tools to implement k-means and hierarchical clustering
- Evaluate and compare the performance of classification algorithms
Topics:
- What Is Data Science?; Syllabus Review; Course Project; Course Software
- Statistics and Data Structures Boot Camps
- SQL, Business Intelligence, NoSQL
- Geospatial Analytics
- Introduction to R
- Visualization
- Prediction and Classification: Performance Metrics, k-Nearest Neighbors, Tree-Based Methods, Multiple Regression, Logistic Regression
- Using R with large data sets
- Large-scale data analysis
- Cluster Analysis: Distance Metrics, k-Means Clustering, Hierarchical Clustering
- Association Analysis
- Time Series