Kristen Oguno

M.S. Data Science

Building a Machine Learning Model to Evaluate Risk Factors Associated with Polycystic Ovarian Syndrome

Polycystic Ovarian Syndrome (PCOS) is a common yet often undiagnosed health condition affecting 8-13% of women globally. Its effects center on hormonal imbalances and metabolism, which in turn cause problems with the ovaries. The exact cause remains unknown, but PCOS is associated with an increased risk of diabetes, heart disease, and other complications, so early diagnosis is crucial for effective management and prevention. Using machine learning (ML) and data science, our study focuses on developing a robust diagnostic model for PCOS that does not require ultrasonography. Methods including Recursive Feature Elimination (RFE), Logistic Regression, and Random Forest were used to identify key predictors of a PCOS diagnosis. Notably, the results showed that women with fewer than 5 cycle days per month were more likely to develop PCOS, contradicting the assumption that PCOS causes excessive bleeding. Cystic acne, skin discoloration, and excess hair growth were identified as notable precursors to PCOS, and Anti-Müllerian hormone was a significant biomarker for PCOS development. To address disparities in access to diagnostic tools, we propose integrating Anti-Müllerian hormone testing into routine blood work for all women to enable earlier PCOS detection. Implementing this recommendation could revolutionize PCOS management by facilitating early intervention and mitigating downstream health complications. Further research is needed to fully understand the mechanisms underlying PCOS development.
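As an illustration of how the three methods named in the abstract can fit together, the following is a minimal sketch using scikit-learn, not the study's actual pipeline. The data are synthetic stand-ins generated with make_classification, and the feature names, the number of features kept, and all hyperparameters are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a clinical table (ASSUMPTION: not the study's data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
# Hypothetical column names; real columns would be clinical measurements.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features for the logistic regression; fit the scaler on training data only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Recursive Feature Elimination: repeatedly drop the weakest feature
# until 4 remain (ASSUMPTION: 4 is an arbitrary choice for the sketch).
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X_train_s, y_train)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

# Logistic regression trained on the RFE-selected features.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_s[:, selector.support_], y_train)
log_reg_accuracy = log_reg.score(X_test_s[:, selector.support_], y_test)

# Random forest on all features, used here to rank feature importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
forest_accuracy = forest.score(X_test, y_test)
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print("RFE-selected features:", selected)
print("Logistic regression accuracy:", round(log_reg_accuracy, 3))
print("Random forest accuracy:", round(forest_accuracy, 3))
print("Top features by forest importance:", [name for name, _ in ranked[:4]])
```

Comparing the RFE selection with the random forest ranking is one way to check whether two different methods agree on the key predictors; with real clinical data, cross-validation and metrics beyond accuracy (for example sensitivity and specificity) would likely be more informative.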