Biomedical research is shifting from hypothesis-driven approaches toward data-driven discovery, fueled by the large and complex datasets that high-throughput technologies generate. Analyzing these datasets and extracting meaningful insights from them requires researchers to be proficient in advanced computational methods such as Artificial Intelligence (AI) and Machine Learning (ML). Cloud computing, in turn, offers flexible, cost-effective, and powerful solutions for data storage, analysis, and collaboration without requiring individual institutions to maintain their own infrastructure. However, unlocking the full potential of cloud-based AI/ML in biomedical research hinges on equipping researchers with the necessary skills and knowledge.
Recognizing this gap, the National Institute of General Medical Sciences (NIGMS) launched the NIGMS Sandbox initiative, which aims to create a repository of cloud-based learning modules on diverse biomedical data science topics. The proposed module, "Practical Data-Centric AI/ML for Biomedical Researchers," aligns with the NIGMS’s vision of expanding the skilled workforce capable of harnessing cloud computing and AI/ML. The module addresses the challenge of upskilling biomedical researchers, equipping them with these skills to foster innovation and accelerate scientific discovery. Because the module runs on the NIGMS Sandbox and a cloud platform, it is broadly accessible to researchers regardless of their institutional resources, which fosters a more inclusive research landscape.
The curriculum prioritizes practical, data-centric techniques so that researchers can immediately apply what they learn to real-world problems. We pay special attention to critical upstream tasks, such as data preparation and cleaning, that are key to successful AI/ML applications.
We aim to equip participants with the competencies and skills needed to make biomedical data FAIR (Findable, Accessible, Interoperable, and Reusable) and AI/ML-ready. Our goal is to build awareness and establish practices among our trainees so that their data are collected and prepared to support AI/ML applications, with attention to:
- use of data and metadata standards to make data FAIR,
- presentation and labeling of data, including noise, uncertainty, and missing data issues, and
- ethical and social considerations and collaborative team science.
The module also uses a blend of instructional videos, interactive demonstrations, and hands-on exercises to facilitate self-directed learning and knowledge retention. This blend is intended to accommodate diverse learning styles and keep participants engaged.
After successfully completing this module, learners will be able to:
- Identify core concepts, ethical aspects and applications of AI/ML in biomedical research
- Adapt existing Python scripts for a machine learning task
- Prepare machine-learning-ready data, build models, and deploy them in real-world settings
The cloud-based sandbox module is supported by the NIH National Institute of General Medical Sciences (3T32GM142603-03S1).
Submodule 1 - Introduction
Learn core concepts, diverse applications, introductory algorithms, ethical considerations, and data challenges.
- Lecture
- Live Demo
- Exercise
Submodule 2 - FAIR Data Principles, Data-Centric AI/ML, and Responsible AI/ML
Learn FAIR principles for responsible data management, evaluate data quality and AI/ML readiness, and understand fairness, transparency, and accountability in AI/ML development and deployment.
- Lecture
- Live Demo
- Exercise
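As a rough illustration of what checking a dataset's metadata for FAIR-style completeness might look like, the sketch below flags required fields that are absent or empty. The field list, the record, and the function name `missing_metadata` are hypothetical and chosen only for illustration; they are not taken from the module or from any particular metadata standard.

```python
# Hypothetical list of metadata fields a project might require.
# This is an illustrative assumption, not an official FAIR checklist.
REQUIRED_FIELDS = ("identifier", "title", "license", "format", "variables")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in required if not record.get(field)]


# Hypothetical metadata record with an empty license field.
record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example cohort lab values",
    "license": "",
    "format": "text/csv",
    "variables": ["age", "glucose"],
}

gaps = missing_metadata(record)
print(gaps)
```

A check like this only tests for the presence of fields; whether the values follow a community standard would need to be reviewed separately.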
Submodule 3 - Data Preparation
Learn practical data cleaning techniques, as well as feature engineering, feature scaling, and feature selection techniques.
- Lecture
- Live Demo
- Data Cleaning
- Feature Engineering
- Feature Scaling
- Feature Selection
- Exercise
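To make two of the data preparation steps named above concrete, here is a minimal sketch of median imputation (a data cleaning step) followed by z-score standardization (a feature scaling step). It uses only the Python standard library; the helper names and the glucose values are hypothetical, and libraries such as pandas or scikit-learn provide equivalent functionality.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical glucose readings (mg/dL) with one missing entry.
glucose = [90.0, None, 110.0, 100.0]
cleaned = impute_median(glucose)
scaled = standardize(cleaned)
print(cleaned)
print(scaled)
```

Median imputation is only one of several options for missing data; which one is appropriate depends on why the values are missing.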
Submodule 4 - Model Building, Evaluation, Interpretation, and Deployment
Explore different AI/ML model types and model evaluation techniques, delve into interpretability methods, and learn best practices for model deployment.
- Lecture
- Live Demo
- Exercise
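As one example of a model evaluation technique, the sketch below computes sensitivity, specificity, precision, and accuracy from binary predictions. The labels and predictions are made up for illustration, and the function names are hypothetical rather than part of the module's materials.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true, y_pred):
    """Return common binary classification metrics as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical ground truth and model predictions (1 = disease present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for biomedical data, where classes are often imbalanced and accuracy alone can be misleading.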
Submodule 5 - AI/ML for Biomedical Applications
Survey different types of AI/ML algorithms and their suitability for biomedical data, and explore real-world examples of AI/ML in various areas of biomedicine.