Biomedical Data Science Ph.D. Curriculum

The curriculum for the Biomedical Data Science Ph.D. program combines mathematics, computational science, biostatistics, biomedical informatics and computer programming. Your program will begin with foundation and core courses that provide a thorough education in biomedical data science. You will then take biomedical data science electives and research seminar courses before beginning your dissertation.

Highlights include:

  • Mathematical and statistical theory
  • Design and analysis of algorithms
  • Advanced scientific computing
  • Distributed algorithms and optimization
  • Advanced biostatistics
  • Big data management and analytics
  • AI and computational ML
  • Computational software engineering
  • Predictive modeling and analytics
  • Visualization and unstructured data analysis
  • Privacy and security in health care
  • Ethical, legal and societal issues in health care

Course Schedule Overview

Courses are generally delivered on a Fall-Spring semester basis, so the estimated time to completion is four years. Where feasible and applicable, special topics, directed research, and dissertation hours may be available during summer semesters. In some summer semesters, the schedule may allow variable hours of each, such as one hour of MSBD 880V.

Foundation Courses (15 hours) **

Leveling Courses

** Note: Students with conditional admission may be subject to additional leveling and preliminary courses, as determined by the admissions committee, in order to attain regular admission status.

For students with conditional admission who do not have the required computing background

3 credit hours. Fall. Pre-requisite(s): Computer programming in an object-oriented programming language, MSDS 510, or equivalent.

Fundamental data structures and algorithms and the tradeoffs between different implementations. Theoretical analysis, implementation, and application. Lists, stacks, queues, heaps, dictionaries, maps, hashing, trees and balanced trees, sets, and graphs. Searching and sorting algorithms.

3 credit hours. Fall, Spring. Pre-requisite(s): CSDS 240 or equivalent.

Basic concepts necessary to design and implement database systems that are free of update anomalies. Extensive use of SQL.

3 credit hours. Fall. Pre-requisite(s): CSDS 240.

Algorithm design analysis, problem solving strategies, proof techniques, complexity analysis, upper and lower bounds, sorting and searching, graph algorithms, geometric algorithms, probabilistic algorithms, intractability and NP-completeness, transformations, and approximation algorithms.

For students with conditional admission who do not have the required math background

3 credit hours. Fall, Spring. Pre-requisite(s): Elementary Statistics.

Principles of biostatistics and the analysis of clinical and epidemiological data. Descriptions and derivations of statistical methods as well as demonstrations of these methods using SAS. Topics include basic analysis methods, elementary concepts, statistical models and applications of probability, commonly used sampling distributions, parametric and nonparametric one- and two-sample tests, confidence intervals, applications of analysis of two-way contingency table data, simple linear regression, and simple analysis of variance.

3 credit hours. Pre-requisite(s): Instructor approval. 

Principles of biostatistics focusing on statistical modeling approaches to the analysis of continuous, categorical, and survival data. Regression modeling including the links between regression and analysis of variance (parameterization), multiple regression, indicator variables, use of contrasts, multiple comparison procedures and regression diagnostics. The course will generalize these modeling concepts to different types of outcome data including categorical outcomes (i.e., logistic and log-linear modeling) and survival outcomes (i.e., proportional hazards analysis). Students are taught to conduct the relevant analysis using SAS and R.

For students with regular admission, or after successful completion of the leveling courses listed above, if needed

3 credit hours. Fall. Pre-requisite(s): Instructor approval.

Fundamental concepts and methods in bioinformatics. Covers a wide range of topics, including sequence homology searching and motif finding, gene finding and genome annotation, protein structure analysis and modeling, genomics and SNP analysis, and DNA, RNA, and protein databases. (This is also a leveling course for students without a background in biology.)

3 credit hours. Fall. Pre-requisite(s): Instructor approval.

This course introduces important topics in computational structural biology: fold recognition, protein structure classification, homology modeling, protein-protein docking, hierarchical docking, assembly modeling using experimental data from multiple sources, prediction of protein-protein networks, genome structures, and others. (This is also a leveling course for students without a background in biology.)

3 credit hours. Fall. Pre-requisite(s): Instructor approval. 

The basic principles and methods of epidemiology and their applications to public health and medicine; measures of disease frequency and association; epidemiologic study designs; sources of bias and error; screening; and applications to public health. (This is also a leveling course for students without a background in public health.)

3 credit hours. Fall, Spring. Pre-requisite(s): MSBD 710, CSDS 240.

Introduction to systems development for computational science. Design, develop, and deploy a set of software components to produce a scalable, reliable, and reproducible experimental system for scientific investigation. Use a variety of approaches to software development team organization, and select techniques that are appropriate in different circumstances.

3 credit hours. Fall, Spring. Pre-requisite(s): MSDS 515, MSBD 520 or equivalent.

(Basics of R, MATLAB). This course covers the fundamental mathematical background for statistical theory. Probability spaces as models for phenomena with statistical regularity. Discrete spaces (binomial, hypergeometric, Poisson). Continuous spaces (normal, exponential) and densities. Random variables, expectation, independence, conditional probability. Topics also include multivariate distributions and special distributions, statistical inference, maximum likelihood methods, sufficiency, tests of hypotheses, inference about normal models, nonparametric statistics, and Bayesian statistics.

Core Courses (36 hours)

3 credit hours. Fall, Spring. Pre-requisite(s): CSDS 340.

Deep dive into recent advances in AI in health care, focusing in particular on deep learning approaches for health care problems. Foundations of neural networks. Cutting-edge deep learning models in the context of a variety of health care data including image, text, multimodal and time-series data. Advanced topics on open challenges of integrating AI in a societal application such as health care, including interpretability, robustness, privacy and fairness.

3 credit hours. Spring. Pre-requisite(s): MSBD 710, MSDS 525, 530.

Introduction to machine learning with business applications. Survey of machine learning techniques, including traditional statistical methods, resampling techniques, model selection and regularization, tree-based methods, principal components analysis, cluster analysis, artificial neural networks, and deep learning. Students implement machine learning models with open-source software for data science. They explore data and learn from data, finding underlying patterns useful for data reduction, feature analysis, prediction, and classification.

3 credit hours. Fall, Spring. Pre-requisite(s): MSDS 525, 530.

(SAS, Python, SQL, MapReduce/Hadoop). An overview of modern data science: the practice of obtaining, storing, modeling, manipulating, analyzing, and interpreting data. Emerging big data processing frameworks. NoSQL storage solutions. Memory resident databases and graph databases. Ability to initiate and design highly scalable systems that can accept, store, and analyze large volumes of unstructured data in batch mode and/or real time. Organization, administration and governance of large volumes of both structured and unstructured data.

3 credit hours. Fall, Spring. Pre-requisite(s): MSDS 540, 545.

Tools and techniques for building statistical or machine learning models to make predictions based on data. NLP and Text Analytics, Time Series, Experimentation and Optimization (Python, SAS, R).

3 credit hours. Fall. Pre-requisite(s): MSBD 710.

(SAS Visual Analytics, Tableau). Data visualization tools and technologies essential to analyze massive, disparate amounts of information and make data-driven decisions. Information and geographic visualization of health data. Hands-on experience in planning, creating and using compelling multimedia visualizations such as online maps, responsive graphs, interactive animations and GIS dashboards. Use of different visualizations to support various research activities, including hypothesis formulation, data synthesis, analysis, and exploration, as well as to communicate and share health information. Application of usability and user experience (UX) principles to evaluate the extent to which various visualizations meet expectations.

3 credit hours. Fall, Spring. Pre-requisite(s): MSBD 525 or 710, or equivalent.

Utilize current statistical techniques to assess and analyze biomedical and public health related data. Read and critique the use of such techniques in published research. Review of linear models, matrix algebra, and multiple analysis of variance. Introduction to random effects models, understanding and computing power for the GLM, GLM assumption diagnostics, transformations, polynomial regression, coding schemes for regression, multicollinearity. Determine what analytical approaches are appropriate under different research scenarios.

3 credit hours. Fall, Spring. Pre-requisite(s): MSBD 535.

Study of Monte Carlo methods, a diverse class of algorithms that rely on repeated random sampling to compute the solution to problems whose solution space is too large to explore systematically or whose systemic behavior is too complex to model. Introduction to important principles of Monte Carlo techniques and their power. Bayesian analysis and Markov chain Monte Carlo samplers, slice sampling, multi-grid Monte Carlo, Hamiltonian Monte Carlo, parallel tempering and multi-nested methods, and streaming methods such as particle filters/sequential Monte Carlo. Related topics in stochastic optimization and inference such as genetic algorithms, simulated annealing, probabilistic Gaussian models, and Gaussian processes. Applications to Bayesian inference and machine learning. Python or R for all programming assignments and projects.

3 credit hours. Fall, Spring. Pre-requisite(s): MSBD 725.

Study of biomedical imaging and diagnostics concepts and methods, including mathematical treatment of tensor data structures, image processing, and methods of analysis. Typical data sets and studies may include radiology and pathology, e.g., CT, PET, SPECT, MRI, microscopy, or ultrasound, and hyperspectral data. Computational studies may be performed in R, Julia, or Python. Upon completion of the course, students should be able to apply AI and ML methods (from prior courses) to various biomedical diagnostic imaging tasks.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

Epidemiology is a discipline that is essential for understanding and solving public health problems. It is a study of advanced analytical methods, tools, and study designs used to investigate disease transmission, chronic illness, and other public health phenomena. It provides a means of assessing the magnitude of public health problems and the success of interventions designed to control them. This course introduces students to the principles of essential issues in epidemiologic methodology. The focus is on how and why a given method, design, or approach might help us explain population health. The emphasis is on the strengths, limitations, and potential alternatives for a given approach. The origins, use, and potential of both classic and cutting-edge methods will be introduced.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

Examination of case studies. Introduction to health care law and ethics, making ethical decisions, contracts, medical records and informed consent, privacy law and HIPAA.

3 credit hours. Fall. Pre-requisite(s): None.

Security issues related to the safeguarding of sensitive personal and corporate information against inadvertent disclosure. Policy and societal questions concerning the value of security and privacy regulations, the real-world effects of data breaches on individuals and businesses, and the balancing of interests among individuals, government, and enterprises. Current and proposed laws and regulations that govern information security and privacy. Private sector regulatory efforts and self-help measures. Emerging technologies that may affect security and privacy concerns. Issues related to the development of enterprise data security programs, policies, and procedures that take into account the requirements of all relevant constituencies, e.g., technical, business, and legal.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

Candidacy Exam to demonstrate advanced knowledge of content and materials of the six required classes.

Electives (6 hours)

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

This course covers network and graph theory for biomedical data analytics. The representative power of graphs is used to understand and model networks of biomedical data for applications such as protein interaction networks, drug repositioning, and genomics. The course begins with a brief overview of graph theory to quantify the structure and interactions of networks, then discusses methods and algorithms for analyzing biomedical network data. Finally, a range of applications is studied through real-world biomedical data sets.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

This course introduces the fundamentals of biomedical signal processing along with its applications in wearable sensor devices. Topics include biomedical signal acquisition; techniques for processing the captured signals, including time-domain approaches for event detection; time-varying signal processing for understanding the dynamical aspects of complex biomedical systems; and the application of machine learning algorithms to build predictive models for early insights into diseases.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

The course introduces students to recent developments and advanced state-of-the-art methods in machine learning using deep learning, and presents the mathematical, statistical and computational challenges of building stable representations for high-dimensional data, such as images, text, and electronic health records. It aims to help students become familiar with several deep learning methods and to code them efficiently in Python using the current PyTorch package.

3 credit hours. Fall, Spring. Pre-requisite(s): Instructor approval.

Special topics of interest may be offered on demand based upon faculty and student Ph.D. research opportunities or needs.

Research Seminar (6 hours total; at least one hour in each)

Variable hours per semester may be offered (1–3 hours).

The directed reading and research course provides students an opportunity to delve into a special topic of interest related to biomedical data science selected by the student under the guidance of a faculty member. The student and faculty member meet weekly to discuss the readings; the student will be required to write a comprehensive review paper on the semester’s reading.

Variable hours per semester may be offered (1–3 hours).

The course provides doctoral students with advanced research skills and strategies for conducting a literature review leading to a dissertation. Through this course, students will produce an extensive and integrative literature review related to their dissertation topic. Students will search, retrieve, summarize, and synthesize relevant studies to produce a comprehensive literature review.

Variable hours per semester may be offered (1–3 hours).

This course provides the student with the opportunity to concisely describe a biomedical data science research problem and methodology. It covers preparation and defense of the dissertation proposal, which clearly articulates the problem to be investigated in the field of biomedical data science, the literature review, and what would need to be done to complete the dissertation. The student must successfully defend the proposal before a Dissertation Committee, which will determine whether the student proceeds to complete the dissertation.

Dissertation and Defense (12 hours)

12 credit hours. Fall. Pre-requisite(s): MSBD 880 Proposal Manuscript and Defense.

Variable hours may be offered.

Completion of the Ph.D. dissertation is the culmination of the doctoral degree in this graduate program. The research topic of the dissertation must be related to the Biomedical Data Science Ph.D. program.