Questions?

Contact SACS enrollment management.

sacsenrollment@mmc.edu

**MS Data Science Curriculum**

Our professional program will prepare you to analyze big data and communicate your findings to influence decisions in any industry. In interactive classes, you will learn every aspect of data science. The curriculum includes programming languages, data infrastructure, data collection, data engineering, machine learning, statistical inference, and data visualization. In the comprehensive capstone course, you will apply your data science education to a real-world business problem in a domain that interests you.

**The MS Data Science curriculum encompasses all important aspects of data science, including:**

- Programming languages (Python, R, SAS, SQL)
- Mainstream computer programming for data science
- Statistical inference and decision modeling
- Big data management and analytics
- Artificial intelligence and computational machine learning
- NLP and text analytics
- Predictive modeling and analytics
- Visualization and unstructured data analysis
- Data conscientiousness
- Ethics of data science

**Degree Requirements**

14 courses, 42 graduate credits

Students will gain a common background in data science through thirteen core courses. The degree culminates in a comprehensive, real-world, industry-style capstone oriented toward the student’s domain of interest.

**Courses**

Courses are offered on a Fall, Spring, and Summer semester schedule. The Fall and Spring Semesters each include three concurrent courses, and each class meets once per week for three hours. The Summer Semester has one course. The program takes two years to complete.

3 credit hours, Spring, Summer. Pre-requisite(s): None.

Introduction to computer programming for data science using Python, R, and SAS.

- Introduction to Python. Python syntax to write basic computer programs; Using the interpreter; Built-in and user-defined functions; Introduction to object-oriented programming in Python.
- Introduction to R. Simple graphing; R Basics: variables, strings, vectors; Data Structures: arrays, matrices, lists, dataframes; Programming Fundamentals: conditions and loops, functions, objects and classes, debugging.
- Introduction to SAS Programming. The SAS Operating Environment; SAS Programming Essentials: SAS Program Structure, SAS Program Syntax; Getting Data In and Out of SAS; Printing and Displaying Data; Introduction to SAS Graphics.

There are no pre-requisites for this course. Students are expected to have a working familiarity with the discipline of data science and analytics and a general knowledge of the impact of Big Data on businesses and corporations. All students should have a working knowledge of Microsoft Office and be comfortable accessing and using the Internet.

3 credit hours, Spring, Summer. Pre-requisite(s): None.

Using Excel, JavaScript, Python, SAS, SQL, and R to develop Data Conscientiousness: the ability to recognize immediately the data-organization issues that must be addressed to tackle a specific problem. Developing skills with preprocessing, scrubbing, and cleaning tools (“search and rescue” operations), data imputation and handling of missing values, checking adherence to data standards, and the rest of the time-consuming, dirty work of data projects. Linking structured and unstructured data sources and recognizing how to reshape data into the computer-friendly format (i.e., rows and columns) required by analytical and statistical methods. A gentle introduction to statistics that covers the distinction between observations and variables, along with the different scales of measurement, so as to avoid nonsensical analytical results.
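As an illustration only, the sketch below uses plain Python (rather than the full set of tools the course names) to show two of the tasks described above: reshaping "long" records into one row per subject with one column per variable, and filling missing values by mean imputation. The records and the helper names `to_wide` and `mean_impute` are hypothetical, and mean imputation is just one simple strategy among many.

```python
from statistics import mean

# Hypothetical "long" records: one (subject, variable, value) triple each,
# with None marking a missing measurement.
long_records = [
    ("s1", "age", 34), ("s1", "income", 52000),
    ("s2", "age", None), ("s2", "income", 61000),
    ("s3", "age", 29), ("s3", "income", None),
]

def to_wide(records):
    """Reshape (subject, variable, value) triples into one row per subject."""
    rows = {}
    for subject, variable, value in records:
        rows.setdefault(subject, {})[variable] = value
    return rows

def mean_impute(rows, variable):
    """Replace missing values of one variable with the mean of its observed values."""
    observed = [r[variable] for r in rows.values() if r.get(variable) is not None]
    fill = mean(observed)
    for r in rows.values():
        if r.get(variable) is None:
            r[variable] = fill
    return rows

wide = to_wide(long_records)
for column in ("age", "income"):
    mean_impute(wide, column)

for subject, row in wide.items():
    print(subject, row["age"], row["income"])
```

Here the missing age for `s2` becomes 31.5 (the mean of 34 and 29), and the missing income for `s3` becomes 56500.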

There are no pre-requisites for this course. Students are expected to have a working familiarity with the discipline of data science and analytics and a general knowledge of the impact of Big Data on businesses and corporations. All students should have a working knowledge of Microsoft Office and be comfortable accessing and using the Internet.

3 credit hours, Fall, Spring. Pre-requisite(s): Undergraduate Calculus or Elementary Statistics.

Techniques, drawing on linear algebra, discrete mathematics, probability, and calculus, for building and interpreting mathematical models of real-world phenomena in and across multiple disciplines, with an emphasis on applications in data science and data engineering. Introduction to statistical methods used to solve data problems. Topics include sampling and experimental design, group comparisons, parametric statistical models, multivariate data visualization, multiple linear regression, and classification. Students will gain hands-on experience implementing a range of commonly used statistical methods on numerous real-world datasets.

3 credit hours, Spring, Summer. Pre-requisite(s): MSDS 510, 515.

The concepts and structures used to store, analyze, manage, and present (visualize) information and navigation using Python, SQL, SAS, and QGIS. Topics include information analysis and organizational methods, and metadata concepts and applications. Students will learn to identify the disparate data sources needed to analyze a given real-world problem, since data from a single source is typically not adequate for the required analysis. Students will pull data from these sources, import it into SAS, and use several SAS procedures to detect invalid data; format, validate, and clean the data; and impute missing values. This prepares the data for statistical analysis and decision modeling in SAS.

- Python Lists, Sets, Strings, Tuples, and Dictionaries; Reading and manipulating CSV files, and the NumPy library; Introduction to the pandas Series and DataFrame abstractions as the central data structures for data analysis, along with tutorials on using functions such as groupby, merge, and pivot tables effectively.
- Introduction to Databases and basic SQL; Using string patterns and ranges to search data and to sort and group data in result sets; Working with multiple tables in a relational database using join operations; Using Python to connect to databases and then create tables, load data, query data using SQL, and analyze data using Python.
- Introduction to the Data Step in SAS; Processing Data in Groups; Manipulating Data with Functions; Data Extraction and Preparation; Concatenating, Merging, and Interleaving Tables; Using SQL in SAS to query and join tables.
- Preparing comprehensive plans to manage spatial and non-spatial health-related data; building versioned enterprise databases; and knowing how to implement best practices for managing databases for health projects and organizations.
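The Python-and-SQL workflow in the second item above (connect to a database, create tables, load data, query with a join, then analyze in Python) can be sketched with Python's built-in `sqlite3` module. This is a minimal example under stated assumptions: an in-memory SQLite database stands in for whatever database system the course actually uses, and the `patients` and `clinics` tables and their rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; tables and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create two related tables.
cur.execute("CREATE TABLE clinics (id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, clinic_id INTEGER)")

# Load data using parameterized inserts.
cur.executemany("INSERT INTO clinics VALUES (?, ?)", [(1, "North"), (2, "South")])
cur.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "Ada", 1), (2, "Ben", 1), (3, "Cy", 2)],
)
conn.commit()

# Join the tables, then group and sort the result set in SQL.
query = """
    SELECT c.region, COUNT(p.id) AS n_patients
    FROM patients AS p
    JOIN clinics AS c ON p.clinic_id = c.id
    GROUP BY c.region
    ORDER BY n_patients DESC
"""
results = cur.execute(query).fetchall()

# Continue the analysis in Python on the returned rows.
total = sum(n for _, n in results)
for region, n in results:
    print(f"{region}: {n} of {total} ({n / total:.0%})")

conn.close()
```

Keeping the join and grouping in SQL and only the final arithmetic in Python mirrors the division of labor the item describes: the database does set-based work, and Python handles the follow-up analysis.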

3 credit hours, Spring, Summer. Pre-requisite(s): MSDS 510, (MSDS 520 or MSBD 520).

Regression Models and Analysis of Variance (SAS, R). Confidence Intervals; Parameter Estimation; Fitting Distributions; Hypothesis Testing; Goodness of Fit; Summarizing Data; Comparing Two Samples; ANOVA; Categorical Data; Least Squares Method.
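To make the Least Squares Method concrete, here is a small plain-Python sketch (the course itself lists SAS and R) of the closed-form ordinary least squares fit for a single predictor: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), where Sxy and Sxx are sums of products of deviations from the means. The data points and the helper name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (closed-form solution)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

For these points the fit is a slope of 1.99 and an intercept of 0.05, and, as always for an OLS fit with an intercept, the residuals sum to zero (up to floating-point rounding).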

3 credit hours, Fall, Spring. Pre-requisite(s): MSDS 525.

This course covers other useful mainstream programming languages for data science beyond Python, R, SQL, and SAS. These additional languages supplement the ability to crunch numbers and equip the data scientist with well-rounded programming skills. The languages covered vary with industry popularity, and not all of them are covered in detail; examples include Java, Scala, Julia, MATLAB, JavaScript, and Go, along with frameworks such as TensorFlow and Spark.

3 credit hours, Fall, Summer. Pre-requisite(s): (MSDS 530 or MSBD 530), (MSDS 535 or MSBD 540).

Introduction to machine learning with business applications. Survey of machine learning techniques, including traditional statistical methods, resampling techniques, model selection and regularization, tree-based methods, principal components analysis, cluster analysis, artificial neural networks, and deep learning. Students implement machine learning models with open-source software for data science. They explore data and learn from data, finding underlying patterns useful for data reduction, feature analysis, prediction, and classification.

3 credit hours, Fall, Summer. Pre-requisite(s): MSDS 530, 535.

(SAS, Python, SQL, MapReduce/Hadoop). An overview of modern data science: the practice of obtaining, storing, modeling, manipulating, analyzing, and interpreting data. Emerging big data processing frameworks. NoSQL storage solutions. Memory-resident databases and graph databases. The ability to initiate and design highly scalable systems that can accept, store, and analyze large volumes of unstructured data in batch mode and/or in real time. Organization, administration, and governance of large volumes of both structured and unstructured data.

3 credit hours, Fall, Spring. Pre-requisite(s): MSDS 550.

(Python, SAS, R). A comprehensive review of text analytics and natural language processing with a focus on recent developments in computational linguistics and machine learning. Students work with unstructured and semi-structured text from online sources, document collections, and databases. Using methods of artificial intelligence and machine learning, students learn how to parse text into numeric vectors and to convert higher-dimensional vectors into lower-dimensional vectors for subsequent analysis and modeling. Applications include speech recognition, semantic processing, text classification, relevance search, recommendation systems, sentiment analysis, and topic modeling. This is a project-based course with extensive programming assignments.
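One of the simplest ways to "parse text into numeric vectors," as described above, is a bag-of-words representation: build a vocabulary and count how often each word appears in each document. The plain-Python sketch below is illustrative only; the documents, the tokenizing regular expression, and the helper names `tokenize` and `bag_of_words` are assumptions, and the course's methods (including dimensionality reduction) go well beyond raw counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for words absent from this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = ["The service was great.", "Great food, great service!", "The food was cold."]
vocab, vectors = bag_of_words(docs)
print(vocab)
for v in vectors:
    print(v)
```

Each document becomes a vector with one entry per vocabulary word; with a realistic vocabulary these vectors are long and sparse, which is one reason techniques for reducing them to lower-dimensional vectors matter in practice.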

3 credit hours, Fall, Spring. Pre-requisite(s): MSDS 550 or MSBD 550.

Tools and techniques for building statistical or machine learning models to make predictions based on data. Topics include NLP and text analytics, time series, experimentation, and optimization.

3 credit hours, Fall, Spring. Pre-requisite(s): MSDS 520 or MSBD 520.

Data visualization tools and technologies (including SAS Visual Analytics, R and ggplot2, and Tableau) essential for analyzing massive amounts of disparate information and making data-driven decisions.

3 credit hours, Spring, Summer. Pre-requisite(s): MSDS 530 or MSBD 530.

Analysis of the ethical issues, algorithmic challenges, and policy decisions (and the social implications of these decisions) that arise when addressing real-world problems through the lens of data science, and of the choices we make at each stage of the data analysis pipeline, from data collection and storage to understanding feedback loops in analysis.

3 credit hours, Spring, Summer. Pre-requisite(s): MSDS 565, 575.

The research process for investigating information needs, creation, organization, flow, retrieval, and use. Stages include research definition, research questions and objectives, data collection and management, data analysis, and data interpretation. Techniques include observation, interviews, questionnaires, and transaction-log analysis.

3 credit hours, Fall, Spring. Pre-requisite(s): MSDS 580.

A comprehensive, real-world, industry-style capstone oriented toward the student’s domain of interest. Projects include formulating a question to be answered by the data; collecting, cleaning, and processing the data; choosing and applying a suitable model and/or analytic method; and communicating the results to a non-technical audience.