How to Develop Open Research Dataset: Examples of Medical Images

Published by 白玉兰开源 (Baiyulan Open Source)

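Data is one of the pillars behind the rapid progress of artificial intelligence and machine learning. Thanks to efforts in academia and industry, a wide variety of datasets already exists. However, because research questions are broad and keep evolving, a steady supply of new standard datasets is always needed to support new research.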

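Jiancheng Yang (杨健程), PhD student, Shanghai Jiao Tong University

His research focuses on medical image analysis, 3D computer vision, and trustworthy machine learning. He has published more than 10 papers as (co-)first author in top journals and conferences, including Cancer Research, EBioMedicine, CVPR, MICCAI, and NeurIPS. He serves as a reviewer for more than 10 academic journals and conferences, has ranked among the top entries in several international AI challenges, and was a main organizer of the MICCAI 2020 rib fracture challenge.
Homepage: https://jiancheng-yang.com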

1 . How to Develop Open Research Dataset: Examples of Medical Images
Jiancheng Yang (杨健程), Shanghai Jiao Tong University
Jan 26, 2021

2 . Biography
• BEng '11-15, MEng '15-18, PhD '18- @SJTU
• Diplôme d'ingénieur (Master) '14-16 @IMT, FR
• Visiting research fellow '20-21 @Harvard (remotely)
• Incoming visiting researcher '21-22 @EPFL, CH
Research interests: Medical Image Analysis (Clinical Science, Methodology, Data & Benchmark), 3D Vision, Trustworthy ML
Introduction – Reasons – Steps – Examples – Keys

3 . Open Data Makes a Difference
Deep learning research is driven by datasets!
• Accelerate Research
• Benchmarking
• Quantitative
• Practicality
• …

4 . Contents
• Introduction: Open Data Makes a Difference
• Reasons Why You Should Develop New Datasets
• Steps to Develop New Datasets
• Examples of Medical Images
  • RibFrac Dataset
  • MICCAI 2020 RibFrac Challenge
  • MedMNIST Dataset
• Keys to the Success

5 . Why You Should Develop New Datasets
• Asking new research questions
  • No existing solution, how about developing a new one?
• Improving your own applications
  • Extending existing datasets for your own purpose
• Benchmarking existing methods
  • Which method is best-performing?
• Building influence to advance your career
  • Datasets and benchmarks are generally highly-cited
• Understanding the pitfalls of existing materials and methods
  • Are existing methods good enough for real-world applications?
  • Are existing datasets enough for different aspects of model performance (e.g., subtle details, domain generalization, model calibration, …)?

6 . Steps to Develop New Datasets
I. Finding Research Questions
II. Data Collection
III. Annotation
IV. Quality Control
V. Benchmarking & Evaluation

7 . Examples of Medical Images
• RibFrac Dataset
• MICCAI 2020 RibFrac Challenge
• MedMNIST Dataset

8 . Deep-Learning-Assisted Detection and Segmentation of Rib Fractures from CT Scans: Development and Validation of FracNet Liang Jin*, Jiancheng Yang*, Kaiming Kuang, Bingbing Ni, et al. EBioMedicine 2020 https://m3dv.github.io/FracNet/

9 . RibFrac Dataset

10 . Network Architecture of FracNet

11 . Model Performance

12 . Human-computer collaboration

23 . MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis
Jiancheng Yang, Rui Shi, Bingbing Ni. ISBI 2021
https://medmnist.github.io/

24 . Motivation
MedMNIST Classification Decathlon: Educational. Standardized. Diverse. Lightweight.
• Massive Data Formats: DICOM, NII, nrrd, …
• Massive Data Modalities: X-Ray, CT, OCT, DR, …
• Various Licenses
• Various Resolution
• 2D or 3D
• Non-Standardized Pre-Processing
• Various Data Sizes
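To make the "Standardized" and "Lightweight" goals concrete, here is a minimal sketch of one possible standardization step: bringing 2D grayscale images of arbitrary resolution and intensity range to a fixed small size with 8-bit values. The 28×28 target size, the nearest-neighbor resampling, and the function names `resize_nearest` and `to_uint8` are assumptions made for illustration; the slide does not describe the actual MedMNIST pre-processing.

```python
# Illustrative sketch only: the target size, the resampling method, and the
# function names are assumptions, not the pre-processing used by MedMNIST.

def resize_nearest(image, out_h=28, out_w=28):
    """Resize a 2D image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_uint8(image):
    """Linearly rescale intensities to integers in the range 0..255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return [[0 for _ in row] for row in image]
    return [[int(round((v - lo) * 255 / (hi - lo))) for v in row] for row in image]


def standardize(image, size=28):
    """Resize to size x size, then rescale intensities to 8 bits."""
    return to_uint8(resize_nearest(image, size, size))
```

A source image of any resolution then yields the same fixed-size output, so downstream code no longer depends on the original format or scale.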

25 . MedMNIST Overview

26 . MedMNIST Overview
Name                 Data Modality      Tasks (# Classes/Labels)              # Training  # Validation  # Test
PathMNIST            Pathology          Multi-Class (9)                       89,996      10,004        7,180
ChestMNIST           Chest X-ray        Multi-Label (14), Binary-Class (2)    78,468      11,219        22,433
DermaMNIST           Dermatoscope       Multi-Class (7)                       7,007       1,003         2,005
OCTMNIST             OCT                Multi-Class (4)                       97,477      10,832        1,000
PneumoniaMNIST       Chest X-ray        Binary-Class (2)                      4,708       524           624
RetinaMNIST          Fundus Camera      Ordinal Regression (5)                1,080       120           400
BreastMNIST          Breast Ultrasound  Binary-Class (2)                      546         78            156
OrganMNIST_Axial     Abdominal CT       Multi-Class (11)                      34,581      6,491         17,778
OrganMNIST_Coronal   Abdominal CT       Multi-Class (11)                      13,000      2,392         8,268
OrganMNIST_Sagittal  Abdominal CT       Multi-Class (11)                      13,940      2,452         8,829
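The range of data sizes on the MedMNIST Overview slide is easier to appreciate with a little arithmetic. The sketch below copies the training, validation, and test counts from that slide and computes each subset's total size and split fractions. The names `SPLITS`, `split_fractions`, and `summarize` are illustrative and do not come from the MedMNIST code base.

```python
# Split sizes (training, validation, test) copied from the MedMNIST Overview
# slide. Helper names are hypothetical, chosen for this example only.
SPLITS = {
    "PathMNIST": (89996, 10004, 7180),
    "ChestMNIST": (78468, 11219, 22433),
    "DermaMNIST": (7007, 1003, 2005),
    "OCTMNIST": (97477, 10832, 1000),
    "PneumoniaMNIST": (4708, 524, 624),
    "RetinaMNIST": (1080, 120, 400),
    "BreastMNIST": (546, 78, 156),
    "OrganMNIST_Axial": (34581, 6491, 17778),
    "OrganMNIST_Coronal": (13000, 2392, 8268),
    "OrganMNIST_Sagittal": (13940, 2452, 8829),
}


def split_fractions(train, val, test):
    """Return the (train, val, test) fractions, rounded to 3 decimals."""
    total = train + val + test
    return tuple(round(n / total, 3) for n in (train, val, test))


def summarize(splits):
    """Return (name, total size, split fractions) rows, smallest subset first."""
    rows = [(name, sum(sizes), split_fractions(*sizes)) for name, sizes in splits.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, total, fractions in summarize(SPLITS):
        print(f"{name:<20} {total:>8} {fractions}")
```

Sorted this way, the subsets span more than two orders of magnitude, from a few hundred images to over a hundred thousand, which matches the "Various Data Sizes" point of the motivation slide.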

27 . Benchmarking AutoML Algorithms
• Standard ResNets with Early-Stopping Strategy
• AutoML Tools
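The early-stopping strategy mentioned for the ResNet baselines can be sketched generically as follows. The slide gives no patience value or validation metric, so `patience=3` and the "higher is better" convention are assumptions, and `early_stopping_epoch` is a hypothetical helper rather than code from the benchmark.

```python
# Generic early-stopping sketch. The patience value and the assumption that a
# higher validation score is better are illustrative, not taken from the talk.

def early_stopping_epoch(val_scores, patience=3):
    """Scan per-epoch validation scores and decide where training would stop.

    Returns (best_epoch, stop_epoch): the epoch with the best score seen so
    far, and the epoch at which training stops because the score has not
    improved for `patience` consecutive epochs. If that never happens,
    stop_epoch is the last epoch.
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores) - 1


if __name__ == "__main__":
    scores = [0.70, 0.75, 0.80, 0.79, 0.78, 0.77, 0.90]
    best, stop = early_stopping_epoch(scores, patience=3)
    print(f"keep the checkpoint from epoch {best}, stop at epoch {stop}")
```

In a real training loop, the model checkpoint saved at `best_epoch` would then be the one evaluated on the test split.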

28 . Benchmarking AutoML Algorithms

29 . Keys to the Success
5 Steps:
I. Finding Research Questions
II. Data Collection
III. Annotation
IV. Quality Control
V. Benchmarking & Evaluation
