SAS-Meharry Academic Specialization
in Data Wrangling

Live classes
Engage in group discussions with professors and peers.

100% online
Hands-on learning from anywhere

4 courses
That align with the master’s programs if you decide to continue on that path.

$12,984
Tuition for specialization. For students starting in Fall 2023. Tuition is reviewed each year. Additional fees will apply.

The SAS Academic Specialization in Data Wrangling is a certificate offered through the SACS extension and special program. Students will develop a mastery of the fundamental concepts, problems, principles, techniques, and issues that make up the field of information organization and management. They will use Excel, SAS, Python, SQL, QGIS, and R to prepare data from disparate sources for statistical analysis and decision modeling. After completing the four-module program, students will receive graduate credit for four courses at Meharry Medical College.

Students who successfully complete this specialization will:
  • Be skilled in the use of a range of general techniques and specific software tools for the storage, description, categorization, linking and discovery of resources.
  • Be familiar with the nature and purpose of a range of standard schemata, frameworks, languages, vocabularies, formats, models and rule-sets.
  • Have knowledge of appropriate criteria on which to base choices among available approaches, techniques, tools and standards.
  • Appreciate that effective organization of resources is a necessary condition for proper access to, and engagement with, those resources.
  • Understand the statistical difference between observations and variables, along with knowledge of the different scales of measurement.

The SAS Academic Specialization in Data Wrangling is divided into four modules.

Introduction to the basic foundations of computer programming for data science, using Python, R, and SAS as problem solving tools.

  • Introduction to Python.
    Python syntax to write basic computer programs; Using the interpreter; Built-in and user-defined functions; Introduction to object-oriented programming in Python.
  • Introduction to R.
    Simple graphing; R Basics: variables, strings, vectors; Data Structures: arrays, matrices, lists, dataframes; Programming Fundamentals: conditions and loops, functions, objects and classes, debugging.
  • Introduction to SAS Programming.
    The SAS Operating Environment; Understanding Data and the quality characteristics it exhibits; SAS Programming Essentials: SAS Program Structure, SAS Program Syntax; Getting Data In and Out of SAS; Printing and Displaying Data; Introduction to SAS Graphics.

In this module you will use Excel, SAS, and R to develop Data Conscientiousness by:

  • Learning to immediately recognize the issues involved in data organization that will need to be addressed to tackle a specific problem.
  • Developing skills in all of the preprocessing, scrubbing, cleaning tools (“search and rescue” operations), data imputation and handling of missing values, checking for adherence to data standards, and all of the rest of the time-consuming and dirty work of data projects.
  • Linking structured and unstructured data sources, and recognizing how to reshape data to get it into a computer-friendly format (i.e., rows and columns) required by analytical and statistical methods.
  • Gaining a gentle introduction to statistics to enable understanding of the statistical difference between observations and variables, along with knowledge of the different scales of measurement so as not to end up with nonsensical analytical results.
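As a rough illustration of the reshaping and missing-value work described above, here is a minimal sketch. It is written in Python with only the standard library for brevity, although the module itself uses Excel, SAS, and R. The patient records, the column names, and the choice of mean imputation are all assumptions made for the example, not part of the course materials.

```python
from statistics import mean

# Hypothetical "wide" records: one row per patient, one column per visit.
# None marks a missing measurement.
wide_rows = [
    {"patient": "A", "visit1": 120.0, "visit2": None},
    {"patient": "B", "visit1": 130.0, "visit2": 140.0},
]


def to_long(rows, id_key, value_keys):
    """Reshape wide rows into one observation per (id, variable) pair."""
    return [
        {id_key: row[id_key], "variable": key, "value": row[key]}
        for row in rows
        for key in value_keys
    ]


def impute_mean(observations):
    """Fill missing values with the mean of the observed values of the same variable.

    Assumes every variable has at least one observed value.
    """
    observed = {}
    for obs in observations:
        if obs["value"] is not None:
            observed.setdefault(obs["variable"], []).append(obs["value"])
    means = {var: mean(vals) for var, vals in observed.items()}
    return [
        {**obs, "value": means[obs["variable"]] if obs["value"] is None else obs["value"]}
        for obs in observations
    ]


long_rows = to_long(wide_rows, "patient", ["visit1", "visit2"])
clean_rows = impute_mean(long_rows)
```

In the long layout each row is a single observation and each column is a single variable, which is the rows-and-columns shape most statistical procedures expect. Mean imputation is only one of several possible strategies and is used here because it is the simplest to show.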

There are two options for Module 3: Mathematical and Statistical Foundations for Data Science or Introduction to Biostatistics.

The Mathematical and Statistical Foundations for Data Science option is intended to provide students with an overview of the fundamental statistical methods and concepts used in the design and analysis of studies of real-world phenomena across multiple disciplines. Topics include multivariate data visualization, probability, sampling distributions, group comparisons, parametric statistical models, estimation, significance, confidence levels, analysis of variance, and simple linear regression. Students who successfully complete the course will gain hands-on experience in applying a range of commonly used statistical methods to a broad set of real-world problems. The course provides hands-on training with SAS applications to prepare students for real-life data collection and analysis.


The Introduction to Biostatistics option provides a conceptual foundation for the study of biostatistical methods and builds statistical intuition. It covers principles of inference and the analysis of real-world data, with descriptions and derivations of statistical methods as well as demonstrations of these methods using SAS. Topics include basic concepts in analysis, presentation of data, and statistical aspects of study design, with emphasis on probability, commonly used sampling distributions, parametric and nonparametric hypothesis tests, point estimation, confidence intervals, analysis of two-way contingency tables, analysis of variance, and simple linear regression. Special emphasis will be given to the application of statistical methods in public health, medicine, and the health sciences. The course provides hands-on training with SAS applications to prepare students for real-life data collection and analysis.
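Both Module 3 options list simple linear regression among their topics. The following is a minimal sketch of an ordinary least-squares fit, written in Python for illustration only, since the courses themselves use SAS. The function name and the dose and response values are hypothetical.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Return the least-squares (slope, intercept) for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data that lie exactly on the line y = 1 + 2x.
dose = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]
slope, intercept = simple_linear_regression(dose, response)
```

Because the sample points fall exactly on a line, the fit recovers a slope of 2 and an intercept of 1. Real data would leave residuals, which is where the estimation and significance topics listed above come in.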

Module 4 covers the concepts and structures used to store, analyze, manage, and present (visualize) information and navigation using Python, SQL, SAS, and QGIS. Topics include information analysis and organizational methods, and metadata concepts and applications. Students will identify the disparate data sources needed to perform analysis for a given real-world problem, since data from a single source is typically not adequate. Students will pull data from these sources and import it into SAS, then use several SAS procedures to detect invalid data; format, validate, and clean the data; and impute missing values. This prepares the data for statistical analysis and decision modeling in SAS.

  • Python Lists, Sets, Strings, Tuples, and Dictionaries; Reading and manipulating CSV files, and the NumPy library; Introduction to the pandas Series and DataFrame abstractions as the central data structures for data analysis, along with tutorials on how to use functions such as Groupby, Merge, and Pivot Tables effectively.
  • Introduction to Databases and basic SQL; Using string patterns and ranges to search data and to sort and group data in result sets; Working with multiple tables in a relational database using Join Operations; Using Python to connect to databases and then creating tables, loading data, querying data using SQL, and analyzing data using Python.
  • Introduction to Data Step in SAS; Processing Data in Groups; Manipulating Data with Functions; Data Extraction and Preparation, Concatenating, Merging and Interleaving Tables; Using SQL in SAS to query and join tables.
  • Using QGIS to prepare comprehensive plans to manage spatial and non-spatial data, including health-related data; building versioned enterprise databases; and knowing how to implement best practices for managing databases for various projects, including health projects and organizations.
  • Data Management/Reporting Project, using SAS as the development platform, to reinforce practical understanding of data manipulation, analysis and reporting.
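To give a flavor of the database topics listed above (connecting from Python, creating tables, loading data, and querying with a join), here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, columns, and values are hypothetical, and the course may use a different database system.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create two related tables (hypothetical schema).
conn.execute("CREATE TABLE clinics (clinic_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, clinic_id INTEGER, patients INTEGER)"
)

# Load data with parameterized inserts.
conn.executemany("INSERT INTO clinics VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)", [(1, 1, 10), (2, 1, 15), (3, 2, 7)]
)

# Join the tables, then group and sort the result set.
rows = conn.execute(
    """
    SELECT c.name, SUM(v.patients) AS total_patients
    FROM clinics AS c
    JOIN visits AS v ON v.clinic_id = c.clinic_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()
```

Here rows holds one (name, total_patients) tuple per clinic. The same join-and-group pattern carries over to SQL run inside SAS, although the surrounding syntax differs.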


Students who successfully complete Modules 1, 2, 3 and 4 will receive graduate credit at Meharry Medical College for the following four credit hour courses.

  • MSDS 510 Computer Programming Foundations for Data Science
  • MSDS 515 Data Conscientiousness
  • MSDS 520 Mathematical and Statistical Foundations for Data Science or
    MSBD 520 Introduction to Biostatistics
  • MSDS 525 Data Management Foundations for Data Science

All students who complete the SAS-Meharry Academic Specialization in Data Wrangling will receive this badge from Meharry and SAS as their certificate.