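How to Develop Open Research Dataset: Examples of Medical Images
Data is one of the pillars supporting the rapid progress of artificial intelligence and machine learning. Thanks to efforts from academia and industry, a wide variety of datasets already exists. However, because research questions are broad and keep evolving, a steady supply of new standard datasets is always needed to support new research.
Jiancheng Yang (杨健程), PhD student, Shanghai Jiao Tong University
His research focuses on medical image analysis, 3D computer vision, and trustworthy machine learning. He has published more than 10 papers as first or co-first author in top journals and conferences, including Cancer Research, EBioMedicine, CVPR, MICCAI, and NeurIPS. He serves as a reviewer for more than 10 academic journals and conferences, has ranked near the top in several international AI challenges, and was a lead organizer of the MICCAI 2020 rib fracture challenge.
Homepage: https://jiancheng-yang.com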
1 . How to Develop Open Research Dataset: Examples of Medical Images. Jiancheng Yang (杨健程), Shanghai Jiao Tong University, Jan 26, 2021
2 . Biography
• BEng’11-15, MEng’15-18, PhD’18- @SJTU
• Diplôme d'ingénieur (Master)’14-16 @IMT, FR
• Visiting research fellow’20-21 @Harvard (remotely)
• Incoming visiting researcher’21-22 @EPFL, CH
Medical Image Analysis (Clinical Science, Methodology, Data & Benchmark), 3D Vision, Trustworthy ML
Introduction – Reasons – Steps – Examples – Keys
3 . Open Data Makes a Difference
Deep learning research is driven by datasets!
• Accelerate Research
• Benchmarking
• Quantitative
• Practicality
• …
4 . Contents
• Introduction: Open Data Makes a Difference
• Reasons Why You Should Develop New Datasets
• Steps to Develop New Datasets
• Examples of Medical Images
  • RibFrac Dataset
  • MICCAI 2020 RibFrac Challenge
  • MedMNIST Dataset
• Keys to the Success
5 . Why You Should Develop New Datasets
• Asking new research questions
  • No existing solution? How about developing a new one?
• Improving your own applications
  • Extending existing datasets for your own purpose
• Benchmarking existing methods
  • Which method performs best?
• Building influence to advance your career
  • Datasets and benchmarks are generally highly cited
• Understanding the pitfalls of existing materials and methods
  • Are existing methods good enough for real-world applications?
  • Do existing datasets cover the different aspects of model performance (e.g., subtle details, domain generalization, model calibration, …)?
6 . Steps to Develop New Datasets
I. Finding Research Questions
II. Data Collection
III. Annotation
IV. Quality Control
V. Benchmarking & Evaluation
7 . Examples of Medical Images
• RibFrac Dataset
• MICCAI 2020 RibFrac Challenge
• MedMNIST Dataset
8 . Deep-Learning-Assisted Detection and Segmentation of Rib Fractures from CT Scans: Development and Validation of FracNet. Liang Jin*, Jiancheng Yang*, Kaiming Kuang, Bingbing Ni, et al. EBioMedicine 2020. https://m3dv.github.io/FracNet/
9 . RibFrac Dataset
10 . Network Architecture of FracNet
11 . Model Performance
12 . Human-computer collaboration
23 . MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis. Jiancheng Yang, Rui Shi, Bingbing Ni. ISBI 2021. https://medmnist.github.io/
24 . Motivation
MedMNIST Classification Decathlon: Educational. Standardized. Diverse. Lightweight.
• Massive Data Formats: DICOM, NII, nrrd, …
• Massive Data Modalities: X-Ray, CT, OCT, DR, …
• Various Licenses
• Various Resolutions
• 2D or 3D
• Non-Standardized Pre-Processing
• Various Data Sizes
25 . MedMNIST Overview
26 . MedMNIST Overview
Name | Data Modality | Tasks (# Classes/Labels) | # Training | # Validation | # Test
PathMNIST | Pathology | Multi-Class (9) | 89,996 | 10,004 | 7,180
ChestMNIST | Chest X-ray | Multi-Label (14), Binary-Class (2) | 78,468 | 11,219 | 22,433
DermaMNIST | Dermatoscope | Multi-Class (7) | 7,007 | 1,003 | 2,005
OCTMNIST | OCT | Multi-Class (4) | 97,477 | 10,832 | 1,000
PneumoniaMNIST | Chest X-ray | Binary-Class (2) | 4,708 | 524 | 624
RetinaMNIST | Fundus Camera | Ordinal Regression (5) | 1,080 | 120 | 400
BreastMNIST | Breast Ultrasound | Binary-Class (2) | 546 | 78 | 156
OrganMNIST_Axial | Abdominal CT | Multi-Class (11) | 34,581 | 6,491 | 17,778
OrganMNIST_Coronal | Abdominal CT | Multi-Class (11) | 13,000 | 2,392 | 8,268
OrganMNIST_Sagittal | Abdominal CT | Multi-Class (11) | 13,940 | 2,452 | 8,829
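The split sizes in the overview can be checked with a few lines of Python. The following is only a minimal sketch: the numbers are copied from the slide, while the `SPLITS` dictionary and the `split_summary` helper are illustrative names and not part of any MedMNIST tooling.

```python
# Split sizes copied from the MedMNIST overview slide: (training, validation, test).
# SPLITS and split_summary are illustrative names, not part of the MedMNIST package.
SPLITS = {
    "PathMNIST": (89_996, 10_004, 7_180),
    "ChestMNIST": (78_468, 11_219, 22_433),
    "DermaMNIST": (7_007, 1_003, 2_005),
    "OCTMNIST": (97_477, 10_832, 1_000),
    "PneumoniaMNIST": (4_708, 524, 624),
    "RetinaMNIST": (1_080, 120, 400),
    "BreastMNIST": (546, 78, 156),
    "OrganMNIST_Axial": (34_581, 6_491, 17_778),
    "OrganMNIST_Coronal": (13_000, 2_392, 8_268),
    "OrganMNIST_Sagittal": (13_940, 2_452, 8_829),
}


def split_summary(name):
    """Return (total, (train_frac, val_frac, test_frac)) for one subset."""
    train, val, test = SPLITS[name]
    total = train + val + test
    fractions = tuple(round(n / total, 3) for n in (train, val, test))
    return total, fractions


# Print one summary line per subset.
for name in SPLITS:
    total, fractions = split_summary(name)
    print(f"{name:20s} total={total:>7,d} train/val/test={fractions}")
```

A summary like this makes the spread in dataset scale visible at a glance, from a few hundred images to roughly a hundred thousand.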
27 . Benchmarking AutoML Algorithms
• Standard ResNets with Early-Stopping Strategy
• AutoML Tools
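The slide does not spell out the early-stopping rule used for the ResNet baselines. As a rough, framework-agnostic sketch of the general idea, the function below assumes a higher-is-better validation score recorded once per epoch and a `patience` threshold; both the function name and the default patience are assumptions made for illustration, not the benchmark's actual settings.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_scores: Sequence[float], patience: int = 3) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a higher-is-better validation score.

    Training stops at the first epoch where the score has not improved
    for `patience` consecutive epochs. Hypothetical helper for illustration.
    """
    if not val_scores:
        raise ValueError("val_scores must not be empty")
    best_epoch = 0
    best_score = val_scores[0]
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # New best score: remember where it happened.
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never triggered: training ran through all recorded epochs.
    return best_epoch, len(val_scores) - 1


# Made-up validation scores, one per epoch.
scores = [0.70, 0.75, 0.74, 0.73, 0.72, 0.80]
print(early_stopping_epoch(scores, patience=3))
print(early_stopping_epoch(scores, patience=10))
```

In practice the model checkpoint from `best_epoch` would be the one evaluated on the test split, so the test set plays no role in choosing when to stop.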
28 . Benchmarking AutoML Algorithms
29 . Keys to the Success
I. Finding Research Questions
5 Steps:
I. Finding Research Questions
II. Data Collection
III. Annotation
IV. Quality Control
V. Benchmarking & Evaluation