Student Capstone Projects

The capstone is the culminating project for each student in the M.S. Data Science and M.S. Biomedical Data Science programs. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Spring 2024 Capstone Projects

  • Saul Ashley

    M.S. Biomedical Data Science

    Development of Anxiety in Breast Cancer Patients Undergoing Therapy: A Preliminary Study Using NIH All of Us Data

    Sources report that 1 in 8 women in the United States will be diagnosed with breast cancer in their lifetime. Furthermore, the long-term mental health outcomes of radiation therapy and chemotherapy, both common practices in oncology, are a significant concern for cancer patients. Research shows that the global prevalence of anxiety among cancer patients is 17%–69%, and the global prevalence of anxiety among the general population is 31.9%, implying that anxiety may be more prevalent among cancer patients. More specifically, Living Beyond Breast Cancer (LBBC) states that more than 40% of people diagnosed with breast cancer experience anxiety. Even survivors may face long-term psychological effects such as anxiety, depression, and a fear of cancer recurrence, which can impact their day-to-day lives. Finally, the American Cancer Society (ACS) states that some cancer treatments can cause cognitive effects in current patients and survivors, such as “chemo brain” or brain fog, which may lead to difficulties with concentration, memory, and multitasking. Using NIH All of Us data and various regression techniques, this study seeks to further examine the associations between cancer therapies and procedures and anxiety in breast cancer patients.

  • Chris Brown

    M.S. Data Science

    Antiphospholipid Syndrome: Unraveling Adverse Outcomes in Pregnancy

    Antiphospholipid syndrome (APS) refers to the clinical association between antiphospholipid antibodies and a hypercoagulable state, which increases the risk of blood clot formation within blood vessels. APS is more prevalent in women than in men. Research shows that women with APS face an elevated risk of adverse pregnancy outcomes, particularly during the fetal period (ten or more weeks of gestation). These outcomes include preeclampsia, characterized by high blood pressure and proteinuria (excess protein in urine), recurrent early pregnancy loss, fetal demise, and intrauterine growth restriction. APS-related pregnancy losses tend to occur later in pregnancy compared to sporadic or recurrent miscarriages, which typically happen earlier in the pre-embryonic or embryonic period. Factors such as placental insufficiency, hypertensive disorders of pregnancy, thrombophilia, and underlying autoimmune conditions play a role. This research aims to study the complex interplay of these factors to improve outcomes for affected women. Notably, APS is more prevalent among underserved communities.

  • Charleston Lee

    M.S. Data Science

    Investigating Criminal References in Rap Lyrics: A Data Science Approach

    Rap music has long served as a platform for artists to express their realities, often exploring themes of crime and street life. This study employs data science methodologies to identify and analyze potential references to criminal activities within rap lyrics. Leveraging natural language processing techniques, this study analyzes a curated dataset of rap lyrics spanning the years 2000-2023, sourced from a diverse range of music labels, certifications, and artists, encompassing modern slang associated with criminal behavior while ensuring ethical data collection practices. Through sentiment analysis, topic modeling, and named entity recognition, the aim of this study is to quantify and contextualize the prevalence of criminal references in rap songs during this time period. Additionally, this study investigates the relationship between lyrical content and commercial success by examining the impact of identified themes on the performance of songs, measured through RIAA certifications from 2000-2023. Furthermore, based on these linguistic insights, a chatbot is developed that is equipped with a comprehensive understanding of contemporary slang terminologies related to crime. This chatbot enables interactive engagement and discourse on pertinent subjects within the rap genre, facilitating broader discussions about cultural representations in music and industry influences. This interdisciplinary approach not only advances data science methodologies but also provides valuable insights into the portrayal of societal realities within artistic expression and its reception in the music industry across different labels and time periods.

  • Lexius Lynch

    M.S. Data Science

    Exploratory Analysis of Alzheimer’s Disease: Unraveling the Complexities of Single-Cell RNA Sequence Data

    Single-cell biology is a field that focuses on understanding human health and disease at the cellular level, with a particular emphasis on precision medicine. Identifying the specific cell types involved in major brain disorders is a critical area of research. However, the complex cellular architecture of the brain, which consists of a diverse set of cell types, makes it challenging to determine the primary pathological cell type for a particular disease. Recent studies have used single-cell RNA sequencing (scRNA-seq) and expression-weighted cell type enrichment to identify specific neuronal cell types associated with brain disorders such as Alzheimer’s disease. scRNA-seq is a powerful technology that allows the analysis of a large number of individual cells. These studies have revealed statistically significant enrichment of certain neuronal cell types in the context of these disorders, providing valuable insights into the differentially expressed genes and cell signaling pathways critical to understanding the variants associated with brain diseases.

  • Kristen Oguno

    M.S. Data Science

    Building a Machine Learning Model to Evaluate Risk Factors Associated with Poly-cystic Ovarian Syndrome

    Poly-cystic Ovarian Syndrome (PCOS) is a common, yet often undiagnosed, health condition affecting 8-13% of women globally. Its effects center on hormonal imbalances and metabolism, causing problems with the ovaries. The exact cause remains unknown, but PCOS is associated with an increased risk of diabetes, heart disease, and other complications. Early diagnosis is crucial for effective management and prevention of these issues. Leveraging machine learning (ML) and data science, our study focuses on developing a robust diagnostic model for PCOS that does not require ultrasonography. Statistical and machine learning methods such as Recursive Feature Elimination (RFE), Logistic Regression, and Random Forest were used to identify key predictors of PCOS diagnosis. Notably, results revealed that women with fewer than 5 cycle days per month were more likely to develop PCOS, contradicting the assumption that PCOS causes excessive bleeding. Cystic acne, skin discoloration, and excess hair growth were identified as notable precursors to PCOS. Anti-Müllerian hormone was a significant biomarker for PCOS development. To address disparities in access to diagnostic tools, we propose integrating Anti-Müllerian hormone testing into routine blood work for all women to enable earlier PCOS detection. Implementing these recommendations could revolutionize PCOS management by facilitating early intervention and mitigating downstream health complications. Further research is needed to fully understand the mechanisms underlying PCOS development.