SAS-Meharry Academic Specialization
in Data Wrangling

The SAS Academic Specialization in Data Wrangling is a certificate offered through the SACS extension and special program. Students will develop a mastery of the fundamental concepts, problems, principles, techniques, and issues that make up the field of information organization and management. They will use Excel, SAS, Python, SQL, QGIS, and R to prepare data from disparate sources for statistical analysis and decision modeling. After completing the three-module program, students will receive graduate credit for three courses at Meharry Medical College.

Students who successfully complete this specialization will:
  • Be skilled in the use of a range of general techniques and specific software tools for the storage, description, categorization, linking and discovery of resources.
  • Be familiar with the nature and purpose of a range of standard schemata, frameworks, languages, vocabularies, formats, models and rule-sets.
  • Have knowledge of appropriate criteria on which to base choices among available approaches, techniques, tools and standards.
  • Appreciate that effective organization of resources is a necessary condition for proper access to, and engagement with, those resources.
  • Understand the statistical difference between observations and variables, along with knowledge of the different scales of measurement.
The SAS Academic Specialization in Data Wrangling is divided into three modules.

Module 1 introduces the basic foundations of computer programming for data science, using Python, R, and SAS as problem-solving tools.

  • Introduction to Python.
    Python syntax to write basic computer programs; Using the interpreter; Built-in and user-defined functions; Introduction to object-oriented programming in Python.
  • Introduction to R.
    Simple graphing; R Basics: variables, strings, vectors; Data Structures: arrays, matrices, lists, dataframes; Programming Fundamentals: conditions and loops, functions, objects and classes, debugging.
  • Introduction to SAS Programming.
    The SAS Operating Environment; Understanding Data and the quality characteristics it exhibits; SAS Programming Essentials: SAS Program Structure, SAS Program Syntax; Getting Data In and Out of SAS; Printing and Displaying Data; Introduction to SAS Graphics.

In Module 2 you will use Excel, SAS, and R to develop Data Conscientiousness by:

  • Learning to immediately recognize the issues involved in data organization that will need to be addressed to tackle a specific problem.
  • Developing skills in preprocessing, scrubbing, and cleaning data (“search and rescue” operations), imputing and handling missing values, checking adherence to data standards, and the rest of the time-consuming, dirty work of data projects.
  • Linking structured and unstructured data sources, and recognizing how to reshape data to get it into a computer-friendly format (i.e., rows and columns) required by analytical and statistical methods.
  • Gaining a gentle introduction to statistics, including the statistical difference between observations and variables and the different scales of measurement, so as not to end up with nonsensical analytical results.
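
As a rough illustration of the reshaping and imputation skills listed above, the following Python sketch turns a wide table into rows and columns where each row is one observation and each column is one variable, then fills a missing value with the median. The patient records, the column names, and the helper functions `to_long` and `impute_median` are hypothetical and are not course material; median imputation is only one of several possible strategies.

```python
from statistics import median

# Hypothetical wide-format records: one row per patient, one column per visit.
# None marks a missing measurement.
wide_rows = [
    {"patient_id": "P1", "visit_1": 120, "visit_2": 118},
    {"patient_id": "P2", "visit_1": None, "visit_2": 131},
    {"patient_id": "P3", "visit_1": 140, "visit_2": 135},
]


def to_long(rows, id_key, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            long_rows.append(
                {id_key: row[id_key], "visit": column, value_name: value}
            )
    return long_rows


def impute_median(rows, value_name):
    """Replace missing values with the median of the observed values.

    An "imputed" flag is added so filled-in values stay distinguishable.
    """
    observed = [r[value_name] for r in rows if r[value_name] is not None]
    fill = median(observed)
    cleaned = []
    for r in rows:
        missing = r[value_name] is None
        new_row = dict(r)
        new_row[value_name] = fill if missing else r[value_name]
        new_row["imputed"] = missing
        cleaned.append(new_row)
    return cleaned


long_rows = to_long(wide_rows, "patient_id", "systolic_bp")
clean_rows = impute_median(long_rows, "systolic_bp")
for row in clean_rows:
    print(row)
```

Keeping the `imputed` flag alongside the filled value lets a later analysis include or exclude imputed observations explicitly.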

Module 3 covers the concepts and structures used to store, analyze, manage and present (visualize) information and navigation using Python, SQL, SAS, and QGIS. Topics will include information analysis and organizational methods, and metadata concepts and applications. Students will identify the disparate data sources needed to perform analysis for a given real-world problem. Typically, data from a single source will not be adequate to perform the required analysis. Students will pull data from the disparate data sources, import it into SAS, and then use several SAS procedures to detect invalid data; format, validate, and clean the data; and impute missing values. This will prepare the data for statistical analysis and decision modeling in SAS.

  • Python Lists, Sets, Strings, Tuples, and Dictionaries; Reading and manipulating CSV files, and the NumPy library; Introduction to the pandas Series and DataFrame abstractions as the central data structures for data analysis, along with tutorials on how to use functions such as Groupby, Merge, and Pivot Tables effectively.
  • Introduction to Databases and basic SQL; Using string patterns and ranges to search data and to sort and group data in result sets; Working with multiple tables in a relational database using Join Operations; Using Python to connect to databases and then creating tables, loading data, querying data using SQL, and analyzing data using Python.
  • Introduction to the DATA Step in SAS; Processing Data in Groups; Manipulating Data with Functions; Data Extraction and Preparation; Concatenating, Merging, and Interleaving Tables; Using SQL in SAS to query and join tables.
  • Using QGIS to prepare comprehensive plans to manage spatial and non-spatial data, including health-related data; building versioned enterprise databases; and knowing how to implement best practices for managing databases for various projects, including health projects and organizations.
  • Data Management/Reporting Project, using SAS as the development platform, to reinforce practical understanding of data manipulation, analysis and reporting.
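
The database topics above (connecting from Python, creating tables, loading data, joining, grouping, and then analyzing the result in Python) can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions: the `clinics` and `visits` tables, their columns, and all of the rows are hypothetical, and SQLite is used here only because it needs no server; the module itself may use a different database.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Create two hypothetical tables standing in for separate data sources.
conn.executescript(
    """
    CREATE TABLE clinics (clinic_id INTEGER PRIMARY KEY, county TEXT);
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, clinic_id INTEGER, cost REAL);
    """
)

# Load data with parameterized inserts.
conn.executemany(
    "INSERT INTO clinics VALUES (?, ?)",
    [(1, "Davidson"), (2, "Shelby"), (3, "Davidson")],
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 80.0), (4, 3, 70.0)],
)

# Join the two tables, then group and sort the result set.
query = """
    SELECT c.county, COUNT(*) AS n_visits, SUM(v.cost) AS total_cost
    FROM visits AS v
    JOIN clinics AS c ON c.clinic_id = v.clinic_id
    GROUP BY c.county
    ORDER BY c.county
"""
summary = conn.execute(query).fetchall()
conn.close()

# Continue the analysis in Python: average cost per visit by county.
avg_cost = {county: total / n for county, n, total in summary}
print(summary)
print(avg_cost)
```

The `?` placeholders keep the data separate from the SQL text, which is the usual way to load rows safely from Python.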

You will receive help identifying the disparate data sources needed to perform analysis for a given real-world problem. Typically, data from a single source will not be adequate to perform the required analysis. You will pull data from the disparate data sources, import it into SAS, and use several SAS procedures to detect invalid data; format, validate, and clean the data; and impute missing values.

Students who successfully complete Modules 1, 2 and 3 will receive graduate credit at Meharry Medical College for the following three credit hour courses.

  • DS 510 Computer Programming Foundations for Data Science (for successfully completing Module 1)
  • DS 515 Data Conscientiousness (for successfully completing Module 2)
  • DS 525 Data Management Foundations for Data Science (for successfully completing Module 3)

All students who complete the SAS-Meharry Academic Specialization in Data Wrangling will receive this badge from Meharry and SAS as their certificate.
