CSC 442 – Introduction to Data Science

Catalog Description:

Overview of data structures, data lifecycle, statistical inference. Data management, queries, data cleaning, data wrangling. Classification and prediction methods, including linear regression, logistic regression, k-nearest neighbors, classification and regression trees. Association analysis. Clustering methods. Emphasis on analyzing data, use and development of software tools, and comparing methods.

Contact Hours:

  • Lecture: 3 hours

Prerequisites: [MA305 or MA405] and [ST305 or ST312 or ST370 or ST372]
Co-requisites: None
Restrictions: None
Coordinator: Dr. Rada Chirkova
Textbook: Data Mining with R

Course Outcomes:

Our goal is to help you gain skills in handling and analyzing data from ‘end to end.’ To mirror data science working environments, some of your assignments and in-class work will be done in multidisciplinary teams. By the end of this course, we want you to be able to:

  • Use software tools to do data cleaning and data wrangling
  • Use software tools to view and query stored data in a variety of formats
  • Use software tools to implement k-nearest neighbors, tree-based methods, multiple regression, and logistic regression for classification
  • Develop association rules using the Apriori algorithm
  • Use software tools to implement k-means and hierarchical clustering
  • Evaluate and compare the performance of classification algorithms

Topics:

  • What is Data Science?, Syllabus Review, Course Project, Course Software
  • Statistics and Data Structures Boot Camps
  • SQL, Business Intelligence, NoSQL
  • Geospatial Analytics
  • Introduction to R
  • Visualization
  • Prediction and Classification: Performance metrics, k-nearest neighbors, tree-based methods, multiple regression, logistic regression
  • Using R with large data sets
  • Large-scale data analysis
  • Cluster Analysis: Distance metrics, k-means clustering, hierarchical clustering
  • Association Analysis
  • Time Series