Taylor machine learning project identifies diseases using X-ray images


Data science methods such as machine learning have become important in helping clinicians diagnose disease. Shara Taylor, an M.S. Data Science candidate, recently gained hands-on experience using X-ray images to distinguish COVID-19 from other illnesses affecting the lungs.
The project was part of the MSDS 550 Computational Machine Learning course taught by Dr. Bishnu Sarker, assistant professor of computer science and data science.
Using 2,000 images from the COVID-19 Radiography Database, Taylor built a Convolutional Neural Network (CNN) that can distinguish between COVID-19, viral pneumonia, lung infections and normal lungs.
“It was really an interesting project,” says Taylor. “It is amazing that you can build an algorithm that can work at such a granular level and diagnose a disease.”
The project called for a supervised machine learning approach. She wrote the code in Python, using the Matplotlib and NumPy libraries along with Torchvision for the machine learning work.
“I set up a training and a test folder,” says Taylor. “The X-ray images in the test folder were labeled as either COVID-19, viral pneumonia, lung infections, or normal lungs.”
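For readers curious what a training and test folder setup can look like, here is a minimal sketch in Python. It is not Taylor's code: the folder names, the 20 percent test fraction, the `.png` file pattern, and the helper `split_into_folders` are all assumptions made for illustration. It copies images that are already sorted by diagnosis into separate training and test folders, with one subfolder per label.

```python
import random
import shutil
from pathlib import Path

# The four labels come from the article; these folder names are hypothetical.
CLASSES = ["covid19", "viral_pneumonia", "lung_infection", "normal"]


def split_into_folders(source_dir, out_dir, test_fraction=0.2, seed=0):
    """Copy source_dir/<class>/*.png into out_dir/train/<class>/ and out_dir/test/<class>/.

    Returns a dict mapping (split, class) to the number of images copied.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for cls in CLASSES:
        images = sorted((Path(source_dir) / cls).glob("*.png"))
        rng.shuffle(images)
        n_test = int(len(images) * test_fraction)
        # The first n_test shuffled images are held out; the rest are for training.
        splits = {"test": images[:n_test], "train": images[n_test:]}
        for split, files in splits.items():
            target = Path(out_dir) / split / cls
            target.mkdir(parents=True, exist_ok=True)
            for image in files:
                shutil.copy(image, target / image.name)
            counts[(split, cls)] = len(files)
    return counts
```

Because each image lands in exactly one of the two folders, the model is later evaluated only on X-rays it did not see during training.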

Taylor then trained the algorithm to identify the COVID-19 images in the training folder.
“I compared those images in the training folder to images in the test folder. I actually used different images in the test folder, to see if it would recognize images it had not seen before and was able to do so with 86 percent accuracy,” says Taylor.
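An accuracy figure like the one Taylor describes is typically the share of held-out images whose predicted label matches the true label. The sketch below shows that calculation in plain Python. The function name `accuracy` and the five example labels are made up for illustration and are not data from her project.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only: four of the five predictions match.
actual = ["covid19", "normal", "viral_pneumonia", "normal", "lung_infection"]
predicted = ["covid19", "normal", "viral_pneumonia", "covid19", "lung_infection"]
print(accuracy(predicted, actual))  # 0.8
```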
The project was her first experience with machine learning.
“I was surprised to learn that machine learning is just a few lines of code,” says Taylor. “Everything else is preparing the data and setting the parameters.”
When Taylor started the M.S. Data Science program, she was interested in both database management and data analysis. Her course projects have not only given her practical experience but have also helped her hone her data science interests.
“I am now much more interested in the actual pre-processing, data wrangling, cleaning and statistical analysis,” says Taylor. “The projects with that work have been the most interesting to me.”