MATH 3080 Foundations of Data Science
- Division: Natural Science and Math
- Department: Mathematics
- Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0
- Prerequisites: Math 1210 and (either Math 2040 or Math 3040) with a C or better in each course
- Semesters Offered: Spring
- Semester Approved: Fall 2020
- Five-Year Review Semester: Fall 2025
- End Semester: Spring 2026
- Optimum Class Size: 20
- Maximum Class Size: 25
Course Description
Students will get an introduction to Python programming, data analysis tools, and the statistics necessary to acquire, clean, analyze, explore, and visualize real-life data sets. Using statistics, students will learn to make data-driven inferences and decisions and to communicate those results effectively.
Justification
Data collection and analysis are ubiquitous and fast becoming a prerequisite to economic success for businesses. This course provides a subset of the tools necessary to leverage data for prediction. It will support the bachelor’s degree in software engineering by providing relevant mathematics coursework.
Student Learning Outcomes
- Students will acquire data through web scraping and data APIs.
- Students will clean and reshape messy datasets.
- Students will learn to use statistical software to deploy statistical methods including generalized linear regression, cluster analysis, and classification.
- Students will apply dimensionality reduction and perform basic analysis of network data.
- Students will evaluate outcomes, make decisions based on data, and effectively communicate those results.
- Students will understand and be able to apply the theoretical foundations underlying the methods applied throughout the course.
Course Content
This course will include:
- Introduction to data analysis tools in Python
- Descriptive statistics
- Data structures with NumPy and pandas
- Introductory hypothesis testing and statistical inference
- Web scraping and data acquisition via APIs
- Generalized linear regression
- Classification methods, including logistic regression, k-nearest neighbors, decision trees, support vector machines, and neural networks
- Data visualization
- Clustering methods
- Dimensionality reduction, including principal component analysis
- Network analysis
- Rating, ranking, and elections
- Cleaning and reformatting messy datasets using regular expressions or dedicated tools such as OpenRefine
- Natural language processing
- Ethics of big data
This course supports a learning environment where perspectives are recognized, respected, and seen as a source of strength.
Key Performance Indicators
Student learning will be evaluated through:
- Attendance / Participation: 0 to 15%
- Class Group Activities: 10 to 15%
- Computer Projects: 20 to 50%
- Quizzes: 0 to 20%
- Homework: 5 to 25%
- Midterm Exams / Tests: 20 to 40%
- Final Exam: 15 to 35%
Representative Text and/or Supplies
- McKinney, W. (current edition). Python for data analysis: Data wrangling with pandas, NumPy, and IPython. Sebastopol, CA: O'Reilly Media.
- Géron, A. (current edition). Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Beijing; Boston; Farnham; Sebastopol; Tokyo: O'Reilly.
- A computer and statistical software are required for this course. Free software such as Python or R is recommended, but subscription software (e.g., SAS, SPSS) may be used at the discretion of the instructor.
Pedagogy Statement
- Instructional Mediums: Lecture, Hybrid