Student Capstone Projects

The capstone is the culminating project for each student in the M.S. Data Science and M.S. Biomedical Data Science programs. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Fall 2023 Capstone Projects

Andrea Hannah
M.S. Biomedical Data Science

Applying Machine Learning to Ovarian Cancer Predicting Biomarkers

Identifying biomarkers that predict a patient’s risk for ovarian cancer is a key factor in the fight to improve survival rates. Ovarian cancer is a group of diseases that originate in the ovaries, fallopian tubes or peritoneum. It is most treatable at its earliest stages, so early screening and diagnosis are key to successfully treating or curing the disease. This study will use heatmap visualization, the Pearson correlation coefficient, scatterplot visualizations, logistic regression, and existing literature to determine the most important biomarkers in comparison with elevated CA125 levels. The variables of importance identified include Age, Menopause, Human Epididymis Protein 4 (HE4), Alkaline Phosphatase (ALP), and Calcium. Preliminary analysis shows that the variables of interest, except HE4, correspond with elevated CA125 levels and would be biomarkers to pay closer attention to in predicting ovarian cancer with machine learning models. To optimize performance of the prediction model, removal of the non-biomarkers, Age and Menopause, is necessary. Menopause is a nominal category that could still decrease performance even if it is cleaned and converted to numeric form.
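
For readers unfamiliar with the Pearson correlation coefficient named in this abstract, the following is a minimal sketch of how it is computed for two numeric variables. It is an illustration only, not the project's code; the function name `pearson_r` and the toy values are assumptions made for this example and are not study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Toy, made-up values (NOT study data): a perfectly linear relationship
toy_marker = [1.0, 2.0, 3.0, 4.0]
toy_ca125 = [10.0, 20.0, 30.0, 40.0]
r = pearson_r(toy_marker, toy_ca125)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.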

Tara Linney
M.S. Data Science

Assessing Water Quality in Schools Around the World

Water quality is an important issue to address in schools around the world. Access to clean and safe water for drinking, hygiene, and waste purposes is essential for the health and well-being of students. Issues with water quality can pose serious risks to the health of students, potentially leading to illnesses that hinder a student’s educational outcomes. A fair amount of research exists on water quality in schools, but there has yet to be a study that compares water quality across countries located on different continents. This research proposal seeks to address this global issue by conducting a systematic assessment of water quality in schools located within a variety of specific regions around the world. Data from the WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation and Hygiene (JMP) will be used to find and assess water quality over time.

Aleesa Mann
M.S. Data Science

The Past and Future of Global Human Rights Discourse: Analysis and Predictive Modeling Using UN Roll Call Data

The United Nations (UN) is a global intergovernmental organization that convenes member states on issues of international peace and security. While its declarations and activities are non-binding, one of its important actions is to adopt, by vote or by consensus, resolutions that reflect the opinion of a majority of member states in the UN’s General Assembly or subsidiary bodies. In this way, the UN plays an important and highly visible role in setting the tone for global policy discourse. In this paper, we look at the UN’s historical position on human rights issues via roll call votes and archival data from the UN Digital Library. Through analysis, we will provide an overview of thematic trends and voting patterns regarding human rights resolutions put before the UN. This information will then be used to develop a predictive model for voting outcomes on future resolutions put before the UN. Understanding these patterns and approximating future voting outcomes can provide critical insights to inform diplomatic and international policy strategies by political actors across the world.

Gina Robinson
M.S. Data Science

DEFIN’D: Examining the Efficacy of Data-Driven Digital Recruitment Strategies for Clinical Trials in Attracting Candidates from Diverse Backgrounds

Clinical trials play a vital role in advancing medical research and enhancing patient outcomes. Nevertheless, the recruitment of diverse participants for these trials remains a significant challenge. The objective of this study is to assess the efficacy of digital recruitment strategies in attracting candidates from diverse backgrounds for clinical trials.  To achieve this, the study will conduct a comprehensive review of existing literature on digital recruitment efforts in clinical trials, with a particular emphasis on diversity considerations. Additionally, a pre-collected dataset consisting of diverse digital recruitment campaigns and their outcomes will be utilized, supplemented with data from the US census.  The analysis will primarily focus on key metrics such as the number of recruited participants, demographic information, recruitment channels and outreach, participant engagement, and participant retention rates. By examining these data points, the study aims to identify trends and patterns pertaining to the effectiveness of digital recruitment strategies in enrolling participants from diverse backgrounds.  Preliminary findings indicate that digital recruitment strategies have the potential to reach a broader audience and attract participants from diverse backgrounds compared to traditional recruitment methods. However, several factors were found to influence the effectiveness of these strategies, including the selection of appropriate digital platforms, targeted messaging, and cultural sensitivities.  By identifying the strengths and limitations of digital recruitment strategies, this study aims to provide valuable insights and recommendations for optimizing future clinical trial recruitment efforts. The findings will inform researchers, pharmaceutical companies, and clinical trial coordinators on the best practices for designing inclusive digital recruitment campaigns that effectively engage candidates from diverse backgrounds.

Cyruss Tsurgeon
M.S. Biomedical Data Science

Cell Census: Unlocking the Power of Artificial Intelligence for Accurate Cell Quantification

Cell counting is a fundamental task in various biological and medical research fields, providing crucial information about cellular populations in a variety of contexts. The addition of fluorescence microscopy has revolutionized cell imaging by enabling visualization of specific cell types and components with high precision and sensitivity. It thereby provides advanced techniques for distinguishing individual cells or cellular features, segregating clusters of cells by type, and even labeling distinct cells in culture or tissue sections. However, manual counting of cells in situ is a time-consuming and subjective process that is prone to human error and cannot be performed from microscope images themselves. To overcome these limitations, researchers have turned to deep learning techniques, leveraging their ability to learn intricate patterns and relationships in large datasets. In this paper, we present a comprehensive approach for automated cell counting using deep learning algorithms applied to fluorescence microscopy images. We propose a novel framework that combines convolutional neural networks (CNNs) with advanced image processing techniques and statistical methods, enabling accurate and efficient cell quantification. Our method uses annotated data to train the network and subsequently employs it for automated cell counting in unseen microscopy images. We demonstrate the effectiveness and robustness of our approach through extensive experiments on diverse datasets, showcasing improved performance compared to existing methods. The proposed deep learning-based automated cell counting technique holds immense potential for accelerating research and advancing our understanding of various biological processes, while also serving as a valuable tool for diagnostic and therapeutic applications in clinical settings.
In addition, we demonstrate the application of our model in various contexts including medical diagnosis, drug discovery, biological research, and environmental monitoring.  With this research, we provide a foundation for future investigations in biomedical image analysis, offering new insights into the applications of deep learning in computer vision for medicine and healthcare.

Ebony Weems, Ph.D.
M.S. Biomedical Data Science

Exploring Health Disparities Among Older Americans (65+) Residing in Food Deserts: A Multifaceted Analysis

The Administration on Aging reports that 1 in 6 people living in the United States is 65 years old or older. This represents 55.7 million people, a 38 percent increase in this population since 2010. Older adults are a vulnerable population due to age-related health concerns and potential limitations in mobility, income, and access to resources. Understanding and addressing health disparities among older Americans is crucial to ensuring their well-being and quality of life. This research study will examine the relationship between food insecurity and health outcomes among older adults in food deserts, including the prevalence of chronic conditions such as obesity, diabetes, hypertension, and cardiovascular disease. A comprehensive content analysis and quantitative analysis will be done to examine the impact of food access on health outcomes, explore socioeconomic factors, and propose interventions. The National Health and Nutrition Examination Survey (NHANES) 2017-2018 will be used to analyze data on various health indicators, dietary habits, and nutritional status in older adults. The Food Access Research Atlas (FARA) will also be used to map food access and proximity to grocery stores, farmers’ markets, and other food retail outlets. Results from this research will contribute to the existing knowledge, raise awareness, inform policymakers, and provide insights to improve the health outcomes of older Americans residing in food deserts.

Clarence White, Ph.D.
M.S. Data Science

Evaluating Factors That Contribute to Substance Use and Co-occurring Mental Health Disorders

Substance abuse continues to be a heavy social and medical burden. Many misused drugs can alter a person’s thinking and judgment, leading to health risks, including addiction, impaired driving, and infectious diseases. Substance use disorder (SUD) affects more than 8% of people in the United States at some point in their lives. Prescription opioids, marijuana, psychostimulants such as cocaine and methamphetamine, and alcohol are the most commonly abused substances in the United States. As active addiction grows more serious, its social impact on the community expands in a multitude of ways. Abused drugs act to increase dopamine in the reward regions of the brain. A protein called the dopamine transporter helps to clear the released dopamine and restore dopamine homeostasis. Additionally, individuals who experience a substance use disorder during their lives may also experience a co-occurring mental health disorder, or vice versa.

To investigate this in a large and diverse cohort, we will analyze the association between SUDs and other mental health disorders using All of Us data and risk prediction models. To this end, we will conduct a longitudinal cohort study of participants with SUD and co-occurring mental health disorders. This study will evaluate the likelihood that an individual with a mental health disorder (depression, anxiety or bipolar disorder) will have a co-occurring substance use disorder (alcohol, opioid, cannabis or cocaine). The data will be evaluated using logistic regression and adjusted odds ratios with 95% confidence intervals. Preliminary analysis of the adjusted odds ratios showed that patients with severe depression, anxiety, or bipolar disorder were more likely to use alcohol, opioids, cannabis, and cocaine, with the exception that cocaine use decreased as anxiety increased. Compared with White participants, Black and Hispanic participants were more likely to use cannabis and cocaine, but less likely to use alcohol and opioids. Additionally, diabetes, heart failure, and HIV were positively associated with opioid and cannabis use, but negatively associated with increased use of alcohol.
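
As background on the adjusted odds ratio mentioned above: in a logistic regression, the adjusted odds ratio for a predictor is commonly obtained by exponentiating its fitted coefficient, and a 95% Wald confidence interval by exponentiating the coefficient plus or minus about 1.96 standard errors. The sketch below illustrates that arithmetic only. The function name `odds_ratio_ci` and the example inputs are hypothetical and are not taken from the study or from All of Us data.

```python
import math

def odds_ratio_ci(beta, se, z=1.959964):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval.

    beta: fitted log-odds coefficient for the predictor
    se:   standard error of that coefficient
    z:    normal quantile (1.959964 corresponds to a 95% interval)
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper

# Hypothetical inputs for illustration only (NOT results from the study)
example_or, example_lo, example_hi = odds_ratio_ci(beta=0.5, se=0.2)
```

An odds ratio above 1 with a confidence interval that excludes 1 suggests a positive association after adjusting for the other covariates in the model; an interval that includes 1 is consistent with no association.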