Practical Data-Centric AI/ML for Biomedical Researchers

Introduction
Learning Objectives
Sponsors
License
Schedule

Introduction

The landscape of biomedical research is shifting from hypothesis-driven approaches toward data-driven discovery, fueled by the large and complex datasets generated by high-throughput technologies. Analyzing these datasets and extracting meaningful insights from them requires proficiency in advanced computational methods such as Artificial Intelligence (AI) and Machine Learning (ML). Cloud computing offers flexible, cost-effective, and powerful resources for data storage, analysis, and collaboration, without requiring each institution to build and maintain its own infrastructure. However, unlocking the full potential of cloud-based AI/ML in biomedical research depends on equipping researchers with the necessary skills and knowledge.

Recognizing this gap, the National Institute of General Medical Sciences (NIGMS) launched the NIGMS Sandbox initiative to create a repository of cloud-based learning modules covering diverse biomedical data science topics. The proposed module, "Practical Data-Centric AI/ML for Biomedical Researchers," supports NIGMS's vision of expanding the workforce skilled in cloud computing and AI/ML. It addresses the challenge of upskilling biomedical researchers, with the aim of fostering innovation and accelerating scientific discovery. Because the module is delivered through the NIGMS Sandbox on a cloud platform, it is broadly accessible to researchers regardless of their institutional resources, contributing to a more inclusive research landscape.

The curriculum prioritizes practical, data-centric techniques so that researchers can apply what they learn directly to real-world problems. We pay special attention to critical upstream tasks, such as data preparation, data cleaning, and feature engineering, that are key to successful AI/ML applications.
We aim to equip participants with the competencies and skills needed to make biomedical data FAIR (Findable, Accessible, Interoperable, and Reusable) and AI/ML-ready. Our goal is to build awareness and good practices among trainees so that their data are collected and prepared to support AI/ML applications, with attention to:

use of data and metadata standards to make data FAIR,
presentation and labeling of data, including noise, uncertainty, and missing data issues, and
ethical and social considerations and collaborative team science.

The module combines instructional videos, interactive demonstrations, and hands-on exercises to support self-directed learning and knowledge retention. This mix is intended to accommodate different learning styles and to make the learning experience engaging and effective for all participants.

Learning Objectives

After successfully completing this module, learners will be able to:

Identify core concepts, ethical aspects and applications of AI/ML in biomedical research
Adapt existing Python scripts for a machine learning task
Prepare machine-learning-ready data, build models, and deploy them in real-world settings

Submodules

Submodule 1 - Introduction

Learn core concepts, diverse applications, introductory algorithms, ethical considerations, and data challenges.

Lecture
- Introduction to AI/ML (Quiz)
Live Demo
- Introduction to NumPy
- Introduction to Pandas
Exercise
- NumPy Exercise (Solution)
- Pandas Exercise (Solution)
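As a rough illustration of the kind of operations the NumPy and Pandas introductions build toward, here is a minimal sketch using hypothetical toy values (not data from the module): vectorized, axis-aware arithmetic on a NumPy array, and a grouped summary of a labeled Pandas DataFrame. The variable names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy measurements: expression values for 3 genes (rows) x 4 samples (columns)
expr = np.array([
    [5.0, 6.0, 7.0, 8.0],
    [1.0, 1.5, 2.0, 2.5],
    [10.0, 9.0, 11.0, 10.0],
])

# NumPy: vectorized operations, no explicit Python loops
gene_means = expr.mean(axis=1)   # one mean per gene (aggregates across columns)
log_expr = np.log2(expr + 1.0)   # element-wise log transform, same shape as expr

# Pandas: labeled tabular data with per-group summaries
samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "condition": ["control", "control", "treated", "treated"],
    "gene_A": expr[0],
})
condition_means = samples.groupby("condition")["gene_A"].mean()

print(gene_means)
print(condition_means)
```

Here `axis=1` aggregates across columns and yields one value per row (gene); `axis=0` would instead yield one value per sample.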

Submodule 2 - FAIR Data Principles, Data-Centric AI/ML, and Responsible AI/ML

Learn FAIR principles for responsible data management, evaluate data quality and AI/ML readiness, and understand fairness, transparency, and accountability in AI/ML development and deployment.
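One simple way to start evaluating data quality and AI/ML readiness is to tabulate per-column indicators (data type, fraction of missing values, number of distinct values) and to count duplicated records. The sketch below assumes a small hypothetical patient table; `readiness_report` is an illustrative helper, not part of any standard or of the module's materials.

```python
import numpy as np
import pandas as pd

def readiness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize simple per-column quality indicators (illustrative, not a standard)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),         # storage type of each column
        "missing_fraction": df.isna().mean(),   # share of missing entries per column
        "n_unique": df.nunique(dropna=True),    # distinct non-missing values per column
    })

# Hypothetical patient table with one missing age and one duplicated record
patients = pd.DataFrame({
    "patient_id": ["P1", "P2", "P2", "P3"],
    "age": [34, 51, 51, np.nan],
    "diagnosis": ["healthy", "disease", "disease", "healthy"],
})

report = readiness_report(patients)
n_duplicate_rows = int(patients.duplicated().sum())  # rows identical to an earlier row

print(report)
print("duplicate rows:", n_duplicate_rows)
```

Checks like these do not make data FAIR or AI/ML-ready on their own, but they surface issues such as missing values, duplicates, and unexpected types that should be resolved before modeling.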

Submodule 3 - Data Preparation

Learn practical data cleaning techniques, as well as feature engineering, feature scaling, and feature selection techniques.

Lecture
- Data Collection and Data Preparation (Quiz)
- Feature Engineering, Scaling and Selection (Quiz)
Live Demo
- Data Cleaning
- Feature Engineering
- Feature Scaling
  - Numerical Data
  - Data With Outliers
- Feature Selection
Exercise
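The distinction between scaling plain numerical data and scaling data with outliers can be sketched in a few lines of NumPy. Standardization (z-scores) relies on the mean and standard deviation, both of which a single extreme value can distort; robust scaling uses the median and interquartile range (IQR) instead. The values below are hypothetical, and the demos may use library implementations (for example, scikit-learn's `StandardScaler` and `RobustScaler`) rather than hand-written functions like these.

```python
import numpy as np

def standard_scale(x: np.ndarray) -> np.ndarray:
    """z-score: subtract the mean, divide by the (population) standard deviation."""
    return (x - x.mean()) / x.std()

def robust_scale(x: np.ndarray) -> np.ndarray:
    """Subtract the median, divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)

# Hypothetical lab values with one extreme outlier (last entry)
values = np.array([4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 100.0])

z = standard_scale(values)
r = robust_scale(values)

print(np.round(z, 2))
print(np.round(r, 2))
```

With the outlier present, z-scoring squeezes the six typical values into a narrow band (roughly -0.46 to -0.37), while robust scaling keeps them spread between about -1.33 and 0.67.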

Submodule 4 - Model Building, Evaluation, Interpretation, and Deployment

Explore different types of AI/ML models and model evaluation techniques, delve into interpretability methods, and learn best practices for model deployment.

Lecture
- AI/ML Models and Model Evaluation (Quiz)
- Model Tuning, Interpretation and Deployment (Quiz)
Live Demo
- Model Building and Evaluation
- Model Tuning, Interpretation, Deployment
Exercise
- Exploratory Analysis and Model Prediction with Biomedical Data (Solution)
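When evaluating a classifier on imbalanced biomedical data, accuracy alone can be misleading. The NumPy sketch below computes confusion-matrix counts and the derived accuracy, precision, recall, and F1 score for binary labels. The labels and both "models" are hypothetical, and `binary_metrics` is an illustrative helper; scikit-learn's `sklearn.metrics` module provides equivalent, better-tested functions.

```python
import numpy as np

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Confusion-matrix counts and derived metrics for labels in {0, 1}."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical imbalanced test set: 2 positives (disease) among 10 samples
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
# A trivial baseline that always predicts "healthy" (0)
y_always_negative = np.zeros_like(y_true)
# A model that finds one positive case and raises one false alarm
y_model = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

baseline = binary_metrics(y_true, y_always_negative)
model = binary_metrics(y_true, y_model)
print(baseline)
print(model)
```

Both predictions reach 0.8 accuracy, but the always-negative baseline has a recall of 0.0 (it misses every positive case), while the second model has precision, recall, and F1 of 0.5.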

Submodule 5 - AI/ML for Biomedical Applications

Learn about different types of AI/ML algorithms and their suitability for biomedical data, and explore real-world examples of AI/ML in various areas of biomedicine.

Lecture
- AI/ML Applications in Biomedicine (Quiz)
- Introduction to Deep Learning (Quiz)
Live Demo
- Pfam protein sequence classification using TensorFlow and Keras
Exercise
- Protein 3D structure prediction using AlphaFold2
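Before protein sequences can be fed to a TensorFlow/Keras classifier, they have to be converted into fixed-size numeric arrays. One common approach, assumed here and not necessarily the one the Pfam demo uses (integer encoding followed by an embedding layer is another option), is one-hot encoding over the 20 standard amino acids, with truncation and zero-padding to a fixed length. `one_hot_encode`, `max_len`, and the example fragments are hypothetical.

```python
import numpy as np

# The 20 standard amino acids; this ordering is an arbitrary choice
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequences, max_len):
    """Encode sequences as a (n_sequences, max_len, 20) float32 array.

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded at the end. Non-standard residues (e.g. X, B, Z, U)
    are left as all-zero rows.
    """
    encoded = np.zeros((len(sequences), max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, aa in enumerate(seq[:max_len]):
            k = AA_TO_INDEX.get(aa)
            if k is not None:
                encoded[i, j, k] = 1.0
    return encoded

# Hypothetical short fragments (not real Pfam entries)
batch = one_hot_encode(["MKV", "ACDXW", "GGGGGGGG"], max_len=6)
print(batch.shape)
```

The resulting array of shape `(n_sequences, max_len, 20)` is the kind of input a Keras convolutional or recurrent layer can consume.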
Course Summary
Course Survey

udel-cbcb/AI_ML_Sandbox_Module
