Student Capstone Projects

The capstone is the culminating project for each student in the M.S. Data Science and M.S. Biomedical Data Science programs. These comprehensive, industry-style projects are oriented toward each student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Spring 2024 Capstone Projects

  • Saul Ashley

    M.S. Biomedical Data Science

    Development of Anxiety in Breast Cancer Patients Undergoing Therapy: A Preliminary Study Using NIH All of Us Data

    Sources report that 1 in 8 women in the United States will be diagnosed with breast cancer in her lifetime. Furthermore, the long-term mental health outcomes of radiation therapy and chemotherapy, common practices in the field of oncology, are a significant concern for cancer patients. Research shows that the global prevalence of anxiety among cancer patients is 17%–69%, and the global prevalence of anxiety among the general population is 31.9%, implying that anxiety may be more prevalent among cancer patients. More specifically, Living Beyond Breast Cancer (LBBC) states that more than 40% of people diagnosed with breast cancer experience anxiety. Even survivors may face long-term psychological effects such as anxiety, depression, and a fear of cancer recurrence, which can impact their day-to-day life. Finally, the American Cancer Society (ACS) states that some cancer treatments can cause cognitive effects in current patients and survivors, such as “chemo brain” or brain fog, which may lead to difficulties with concentration, memory, and multitasking. Using NIH All of Us data and various regression techniques, this study seeks to further highlight and examine the associations between cancer therapies and procedures and anxiety in breast cancer patients.

  • Chris Brown

    M.S. Data Science

    Antiphospholipid Syndrome: Unraveling Adverse Outcomes in Pregnancy

    Antiphospholipid syndrome (APS) refers to the clinical association between antiphospholipid antibodies and a hypercoagulable state, which increases the risk of blood clot formation within blood vessels. APS is more prevalent in women than in men. Research shows that women with APS face an elevated risk of adverse pregnancy outcomes, particularly during the fetal period (ten or more weeks of gestation). These outcomes include preeclampsia, characterized by high blood pressure and proteinuria (excess protein in urine), recurrent early pregnancy loss, fetal demise, and intrauterine growth restriction. APS-related pregnancy losses tend to occur later in pregnancy compared to sporadic or recurrent miscarriages, which typically happen earlier in the pre-embryonic or embryonic period. Factors such as placental insufficiency, hypertensive disorders of pregnancy, thrombophilia, and underlying autoimmune conditions play a role. This research aims to study the complex interplay of these factors to improve outcomes for affected women. Notably, APS is more prevalent among underserved communities.

  • Brittany City

    M.S. Data Science

    A Technology Career Recommendation System Based on Personality, Skills, and Interests

    With the rapid expansion of technology, there is an increased demand for individuals with technology skills, leading to an interest in technology careers. Despite the abundance of technology job opportunities, many individuals struggle to identify which technology career field is best suited for their skills, interests, and personality. This lack of clarity can lead to high job turnover rates, low job satisfaction, and lack of productivity in the workplace. The research proposal aims to develop a career recommendation system based on key personalities, skills, and interests using machine learning algorithms to suggest viable technology career decisions. The research will analyze and develop a predictive model and recommendation system based on the personalities, technical skills, and interests collected through a survey.

  • Andrea Hannah

    M.S. Biomedical Data Science

    Applying Machine Learning to Ovarian Cancer Predicting Biomarkers

    Identifying biomarkers that predict a patient’s risk for ovarian cancer is a key factor in the fight to improve survival rates. Ovarian cancer is a group of diseases that originate in the ovaries, fallopian tubes, or peritoneum. It is most treatable at its earliest stages. Therefore, early screening and diagnosis are key to successfully treating or curing the disease. This study will use heatmap visualization, the Pearson correlation coefficient, scatterplot visualizations, logistic regression, and existing literature to determine the most important biomarkers in comparison with elevated CA125 levels. Variables of importance identified include Age, Menopause, Human Epididymis Protein 4 (HE4), Alkaline Phosphatase (ALP), and Calcium. Preliminary analysis shows that the variables of interest, except HE4, correspond with elevated CA125 levels and would be biomarkers to pay closer attention to in predicting ovarian cancer with machine learning models. To optimize performance of the prediction model, removal of the non-biomarkers, Age and Menopause, is necessary. Menopause is a nominal category that could still decrease performance even if it is cleaned and converted to numeric form.

  • Charleston Lee

    M.S. Data Science

    Investigating Criminal References in Rap Lyrics: A Data Science Approach

    Rap music has long served as a platform for artists to express their realities, often exploring themes of crime and street life. This study employs data science methodologies to identify and analyze potential references to criminal activities within rap lyrics. Leveraging natural language processing techniques, this study analyzes a curated dataset of rap lyrics spanning the years 2000-2023, sourced from a diverse range of music labels, certifications, and artists, encompassing modern slang associated with criminal behavior while ensuring ethical data collection practices. Through sentiment analysis, topic modeling, and named entity recognition, the aim of this study is to quantify and contextualize the prevalence of criminal references in rap songs during this time period. Additionally, this study investigates the relationship between lyrical content and commercial success by examining the impact of identified themes on the performance of songs, measured through RIAA certifications from 2000-2023. Furthermore, based on these linguistic insights, a chatbot is developed that is equipped with a comprehensive understanding of contemporary slang terminologies related to crime. This chatbot enables interactive engagement and discourse on pertinent subjects within the rap genre, facilitating broader discussions about cultural representations in music and industry influences. This interdisciplinary approach not only advances data science methodologies but also provides valuable insights into the portrayal of societal realities within artistic expression and its reception in the music industry across different labels and time periods.