AI for X-Ray
The AI4XRAY project focused on providing second opinions to clinicians reading X-rays, benefiting junior doctors and resource-limited clinics. Key engineering efforts included ingesting and extracting 1.5 million DICOM images, building a custom annotation suite, and managing model training, tracking, and deployment.
AI for X-Ray: My Journey in Engineering Medical Imaging AI
From 2020 to 2023, I had the opportunity to work on an incredible project, AI for X-Ray at Unumed ApS, where we built an AI pipeline to assist medical professionals who use X-ray images for diagnostic purposes. My role was largely focused on the engineering side: I led the data ingestion pipeline, the custom annotation tooling, and AI model deployment. In this post, I want to walk you through the challenges we faced and the technical solutions we developed, as well as the technology stack we used to power this highly specialized system.
Raw Data Ingestion and Extraction of 1.5 Million DICOM Images
The first major hurdle was the sheer scale of the medical image data: 1.5 million DICOM (.dcm) images. These images were stored on a secure remote server, and a key concern was communicating with it efficiently and securely without downloading all of the files at once. For this we chose Golang, a language known for its high-performance networking. Go allowed us to implement an SSH-based communication layer that maintained a manifest of every DICOM file on the server without transferring files unnecessarily, which in turn let us adopt a lazy batch-download process, pulling files only when they were needed for model training or annotation.
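The Go tool itself isn't part of this post, but the manifest-plus-lazy-download idea is simple enough to sketch. Below is a minimal, illustrative Python version using paramiko; the host, paths, and batch size are placeholders rather than the actual implementation.

```python
# Illustrative sketch only: the real ingestion tool was written in Go, and all
# names here (host, user, paths, batch size) are hypothetical.
import os
import posixpath
import paramiko

HOST, USER, REMOTE_DIR = "pacs.example.org", "ingest", "/data/dicom"

def build_manifest(sftp, remote_dir):
    """List remote .dcm files without downloading them."""
    manifest = []
    for entry in sftp.listdir_attr(remote_dir):
        if entry.filename.lower().endswith(".dcm"):
            manifest.append({
                "path": posixpath.join(remote_dir, entry.filename),
                "size": entry.st_size,
                "mtime": entry.st_mtime,
            })
    return manifest

def fetch_batch(sftp, manifest, local_dir, batch_size=32):
    """Lazily pull only the files needed right now."""
    for record in manifest[:batch_size]:
        local_path = os.path.join(local_dir, posixpath.basename(record["path"]))
        sftp.get(record["path"], local_path)
        yield local_path

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER)
sftp = client.open_sftp()

manifest = build_manifest(sftp, REMOTE_DIR)
for path in fetch_batch(sftp, manifest, "/tmp/dicom"):
    print("downloaded", path)
```

The important property is that the manifest is cheap to refresh, while actual file transfer is deferred until a batch is genuinely needed.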
For database management, we opted for Postgres, which served as a reliable relational database to store metadata for these DICOM files. Postgres’s ability to handle complex queries and transactions efficiently made it an ideal choice for keeping track of millions of images.
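To make the metadata side concrete, here is a rough sketch of the kind of table the manifest could feed. The actual schema isn't part of this post, so every column name below is hypothetical:

```python
# Hypothetical metadata table for the DICOM manifest; the real schema is not
# shown in the post, so the columns here are purely illustrative.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS dicom_files (
    id            BIGSERIAL PRIMARY KEY,
    remote_path   TEXT UNIQUE NOT NULL,    -- location on the secure server
    size_bytes    BIGINT,
    modality      TEXT,                    -- e.g. 'CR', 'DX'
    study_uid     TEXT,
    downloaded_at TIMESTAMPTZ,             -- NULL until lazily fetched
    annotated     BOOLEAN DEFAULT FALSE
);
"""

with psycopg2.connect("dbname=ai4xray") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```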
Custom Annotation Suite with Complex Label Hierarchy
Given the nuanced nature of medical imaging, we needed a sophisticated tool to annotate these DICOM images with a complex, hierarchical label structure. This was critical because medical professionals might need to label images across various axes such as anatomical location, type of pathology, and severity. We built a custom annotation suite using React for the frontend and FastAPI for the backend, all tied together with Postgres for data storage.
The React frontend allowed for real-time feedback, enabling medical professionals to quickly label images and navigate the complex label hierarchy without friction. FastAPI was a natural choice for our backend due to its speed and ease of use, especially when handling a large number of asynchronous requests from multiple annotators. The integration with Postgres ensured that we could store and retrieve these hierarchical labels in a performant manner, enabling efficient data curation.
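As a rough illustration of how a hierarchical label vocabulary can be served and written back, here is a minimal FastAPI sketch backed by a self-referencing Postgres table. The routes, tables, and fields are assumptions for the sake of the example, not the project's actual API:

```python
# Illustrative FastAPI sketch: a self-referencing label table captures the
# anatomical-location / pathology / severity hierarchy described above.
# All names (tables, routes, fields) are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel
import asyncpg

app = FastAPI()

class Annotation(BaseModel):
    image_id: int
    label_id: int        # leaf node in the label tree
    annotator: str

@app.on_event("startup")
async def connect():
    app.state.pool = await asyncpg.create_pool(dsn="postgresql://localhost/ai4xray")

@app.get("/labels/{parent_id}/children")
async def label_children(parent_id: int):
    # Children of one node in the hierarchy (labels.parent_id references labels.id).
    rows = await app.state.pool.fetch(
        "SELECT id, name FROM labels WHERE parent_id = $1", parent_id
    )
    return [dict(r) for r in rows]

@app.post("/annotations")
async def create_annotation(ann: Annotation):
    await app.state.pool.execute(
        "INSERT INTO annotations (image_id, label_id, annotator) VALUES ($1, $2, $3)",
        ann.image_id, ann.label_id, ann.annotator,
    )
    return {"status": "ok"}
```

Storing the hierarchy as an adjacency list, with each label row pointing at its parent, keeps the vocabulary easy to extend as clinicians refine the taxonomy.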
Model Training, Tracking, and Deployment
Once the data was processed and labeled, the next challenge was training machine learning models to perform diagnostic tasks. We chose PyTorch for model development due to its flexibility and strong support for complex neural network architectures, particularly for image-based tasks.
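To make that concrete, here is a minimal sketch of the kind of setup this involves: fine-tuning an ImageNet-pretrained backbone for multi-label classification of findings. The backbone, label count, and hyperparameters are illustrative, not the models we actually trained:

```python
# Illustrative PyTorch sketch: fine-tune a pretrained ResNet for multi-label
# X-ray classification. NUM_FINDINGS and all hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

NUM_FINDINGS = 14  # hypothetical number of findings in the label hierarchy

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_FINDINGS)

criterion = nn.BCEWithLogitsLoss()        # multi-label: each finding is independent
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_one_epoch(loader, device="cuda"):
    model.to(device).train()
    for images, targets in loader:        # targets: float tensor of shape (B, NUM_FINDINGS)
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```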
Model tracking and experimentation were handled using MLflow, allowing us to version, compare, and keep track of numerous model iterations and hyperparameter configurations. This was especially important given the experimental nature of AI in medical imaging; each new model iteration had to be rigorously tracked and evaluated for regulatory and accuracy requirements.
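A tracked run then looks roughly like the following; the experiment name, run name, and logged metric are placeholders rather than our real configuration:

```python
# Illustrative MLflow usage: log parameters and metrics for every run so
# iterations can be compared and reproduced later. All names are hypothetical.
import mlflow

mlflow.set_experiment("ai4xray-chest-findings")

with mlflow.start_run(run_name="resnet50-baseline"):
    mlflow.log_params({"backbone": "resnet50", "lr": 1e-4, "epochs": 20})
    for epoch in range(20):
        val_auc = 0.5 + 0.01 * epoch       # stand-in for a real validation metric
        mlflow.log_metric("val_auc", val_auc, step=epoch)
```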
For orchestrating our data pipeline and model training processes, we used Dagster. It provided a robust framework for scheduling and managing complex workflows, from data ingestion and annotation through model training and validation. Finally, for model deployment, we used BentoML to containerize and serve our models in production. BentoML gave us an efficient framework for packaging and deploying machine learning models in a scalable way, enabling us to serve predictions to healthcare applications reliably.
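As a sketch of what the orchestration layer looks like, the pipeline stages map naturally onto Dagster's software-defined assets. The asset names and bodies below are hypothetical placeholders, not our actual pipeline code:

```python
# Illustrative Dagster sketch: the pipeline stages described above modelled as
# software-defined assets. Asset names and return values are placeholders.
from dagster import asset, Definitions, define_asset_job

@asset
def dicom_manifest():
    """Manifest of DICOM files available on the secure remote server."""
    return [{"path": "/data/dicom/example.dcm"}]   # placeholder

@asset
def annotated_dataset(dicom_manifest):
    """Join downloaded images with labels exported from the annotation suite."""
    return [{"image": row["path"], "labels": []} for row in dicom_manifest]

@asset
def trained_model(annotated_dataset):
    """Train and return a model artifact (training loop omitted in this sketch)."""
    return {"weights": "model.pt", "n_examples": len(annotated_dataset)}

defs = Definitions(
    assets=[dicom_manifest, annotated_dataset, trained_model],
    jobs=[define_asset_job("xray_pipeline")],
)
```

On the serving side, BentoML then packages the trained model artifact and its dependencies into a container exposing a prediction API; I've left that code out here, since the exact service definition depends on the BentoML version in use.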
The stack
Golang
Used for secure SSH communication and lazy batch download of 1.5M DICOM images. Golang's performance in network I/O and concurrency made it ideal for efficiently maintaining a manifest of DICOM files without overloading the server.
React
A powerful front-end framework used to build the annotation suite, providing a seamless user experience for annotators dealing with complex label hierarchies.
dbt
Reproducible data transformations written as SQL models and integrated with Dagster; the standard tool for treating SQL as code.
Dagster
A modern orchestration framework used to manage the end-to-end data pipeline, ensuring that each step, from data ingestion to model training, was executed in a coordinated and scalable manner.
PyTorch
The deep learning library chosen for model development due to its flexibility and widespread adoption in the AI community, especially for image-based tasks.
MLflow
Integrated to manage experiment tracking and versioning, allowing us to keep track of various model iterations in a reproducible manner.
Final thoughts
Working on the AI for X-Ray project has been a valuable learning experience, full of both challenges and important milestones. Although the models we worked on have not yet been deployed to clinicians, the foundational engineering work we completed was a crucial step toward that goal. From managing large-scale medical image data to building a custom annotation tool and setting up a reliable pipeline for model development, each piece of the system has brought us closer to creating a solution that could one day assist healthcare professionals. While there's still work to be done, I'm proud of the progress we've made and look forward to seeing how this groundwork can contribute to future advancements in AI for healthcare.