Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest

Overview and extended description: AI is expected to be the engine of technological advancements in the healthcare industry, especially in the areas of radiology and image processing. The purpose of this session is to demonstrate how we can build a AI-based Radiologist system using Apache Spark and Analytics Zoo to detect pneumonia and other diseases from chest x-ray images. The dataset, released by the NIH, contains around 110,00 X-ray images of around 30,000 unique patients, annotated with up to 14 different thoracic pathology labels. Stanford University developed a state-of-the-art model using CNN and exceeds average radiologist performance on the F1 metric. This talk focuses on how we can build a multi-label image classification model in a distributed Apache Spark infrastructure, and demonstrate how to build complex image transformations and deep learning pipelines using BigDL and Analytics Zoo with scalability and ease of use. Some practical image pre-processing procedures and evaluation metrics are introduced. We will also discuss runtime configuration, near-linear scalability for training and model serving, and other general performance topics.

1.WIFI SSID:SparkAISummit | Password: UnifiedAnalytics

2.Using Deep Learning on Apache Spark to diagnose thoracic pathology from chest X-rays Yuhao Yang, Intel Bala Chandrasekaran, Dell EMC #UnifiedAnalytics #SparkAISummit

3.Agenda • Problem statement • Analytics Zoo • Chest Xray Model architecture • Results and Observations #UnifiedAnalytics #SparkAISummit 3

4.Predicting diseases in Chest X-rays • Develop a ML pipeline in Apache Spark and train a deep learning model to predict disease in Chest X-rays ▪ An integrated ML pipeline with Analytics Zoo on Apache Spark ▪ Demonstrate feature engineering and transfer learning APIs in Analytics Zoo ▪ Use Spark worker nodes to train at scale • CheXNet – Developed at Stanford University, CheXNet is a model for identifying thoracic pathologies from the NIH ChestXray14 dataset – #UnifiedAnalytics #SparkAISummit 4

5.X-ray dataset from NIH 70000 60000 • 112,120 images from over 30000 patients 50000 Frequency • Multi label (14 diseases) 40000 00000013_005.png,Emphysema | Infiltration | Pleural_Thickening 30000 00000013_006.png,Effusion|Infiltration 00000013_007.png,Infiltration 20000 00000013_008.png,No Finding 10000 • Unbalanced datasets 0 Nodule No Findings Atelectasis Mass Hernia Fibrosis Cardiomegaly Effusion Infiltration Edema Emphysema Pneumonia Pleural Thickening Consolidation Pneumothorax • Close to 50% of the images have ‘No findings’ • Infiltration get the most positive samples (19894) and Hernia get the least positive samples (227) #UnifiedAnalytics #SparkAISummit 5

6.Analytics Zoo

7. Analytics Zoo Distributed TensorFlow, Keras and BigDL on Apache Spark Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, Reference Use Cases etc. Image classification, object detection, text classification, recommendations, sequence- Built-In Deep Learning Models to-sequence, anomaly detection, etc. Feature transformations for Feature Engineering • Image, text, 3D imaging, time series, speech, etc. • Distributed TensorFlow and Keras on Spark High-Level Pipeline APIs • Native deep learning support in Spark DataFrames and ML Pipelines • Model serving API for model serving/inference pipelines Backends Spark, TensorFlow, Keras, BigDL, OpenVINO, MKL-DNN, etc. #UnifiedAnalytics #SparkAISummit 7

8. Bringing Deep Learning To Big Data Platform • Distributed deep learning framework for Apache Spark* • Make deep learning more accessible to big data users and data scientists – Write deep learning applications as standard Spark programs – Run on existing Spark/Hadoop clusters (no changes needed) • Feature parity with popular deep learning frameworks – E.g., Caffe, Torch, Tensorflow, etc. • High performance (on CPU) – Built-in Intel MKL and multi-threaded programming • Efficient scale-out – Leveraging Spark for distributed training & inference #UnifiedAnalytics #SparkAISummit 8

9.Architecture • Training using synchronous mini-batch SGD • Distributed training iterations run as standard Spark jobs 9

10.Parameter Synchronization local gradient local gradient local gradient 1 2 n 1 2 n 1 2 n ∑ ∑ ∑ gradient 1 1 gradient 2 2 gradient n n update update update weight 1 1 weight 2 2 weight n n Task 1 Task 2 Task n “Parameter Synchronization” Job #UnifiedAnalytics #SparkAISummit 10

11.Integrated Pipeline for Machine Learning and Deep Learning ImageReader Image Pre- Image processing Classification fit Images Images Images as Image Classification in HDFS DataFrame RDD Model

12.Building the X-ray Model

13.Let’s build the model Read the X-ray images Feature Engineering Define the model Define optimizer Train the model #UnifiedAnalytics #SparkAISummit 13

14. Read the X-ray images as Spark DataFrames Read the X-ray • Initialize NNContext and load X-ray images into images DataFrames using NNImageReader Feature Engineering from zoo.pipeline.nnframes import NNImageReader imageDF = NNImageReader.readImages(image_path, sc, resizeH=256, resizeW=256, image_codec=1) Define the model • Process loaded X-ray images and add labels (another DataFrame) using Spark transformations Define optimizer trainingDF = imageDF.join(labelDF, on="Image_Index", how="inner") Training the model #UnifiedAnalytics #SparkAISummit 14

15. Feature Engineering – Image Pre-processing Read the X-ray In-built APIs for feature engineering using ChainedPreprocessing API images ImageCenterCrop() ImageCenterCrop(224, 224) Feature Engineering ImageFlip() Random flip and brightness Define the model ImageChannelNormalize() ImageChannelNormalize() Define optimizer ImageMatToTensor() To Tensor() Training the model #UnifiedAnalytics #SparkAISummit 15

16.Transfer Learning using ResNet-50 trained with ImageNet Patient A Condition A Condition B Condition C Condition D Condition E Add a new Remove the existing input layer classification layer and add new classification layer #UnifiedAnalytics #SparkAISummit 16

17. Defining the model with Transfer Learning APIs Read the X-ray • Load a pre-trained model using Net.load_bigdl. The model is images trained with ImageNet dataset – Inception Feature Engineering – ResNet 50 – DenseNet • Remove the final softmax layer of ResNet-50 Define the model • Add new input (for resized x-ray images) and output layer (to predict the 14 diseases). Activation function is Sigmoid Define optimizer • Avoid overfitting – Regularization Training the model – Dropout #UnifiedAnalytics #SparkAISummit 17

18. Defining the model with Transfer Learning APIs Read the X-ray def get_resnet_model(model_path, label_length): images full_model = Net.load_bigdl(model_path) model = full_model.new_graph(["pool5"]) inputNode = Input(name="input", shape=(3, 224, 224)) Feature Engineering resnet = model.to_keras()(inputNode) flatten = GlobalAveragePooling2D(dim_ordering='th')(resnet) Define the model dropout = Dropout(0.2)(flatten) logits = Dense(label_length, W_regularizer=L2Regularizer Define optimizer (1e-1), b_regularizer=L2Regularizer(1e-1), activation="sigmoid")(dropout) lrModel = Model(inputNode, logits) Training the model return lrModel #UnifiedAnalytics #SparkAISummit 18

19. Define the Optimizer Read the X-ray images • Evaluated two optimizers: SGD and Adam Optimizer • Learning rate scheduler is implemented in two Feature Engineering phases: – Warmup + Plateau schedule Define the model – Warmup: Gradually increase the learning rate for 5 epochs Define optimizer – Plateau: Plateau("Loss", factor=0.1, patience=1, mode="min", epsilon=0.01, cooldown=0, min_lr=1e- Training the model 15) #UnifiedAnalytics #SparkAISummit 19

20. Train the model using ML Pipelines Read the X-ray • Analytics Zoo API NNEstimator to build the model images • .fit() produces a neural network model which is a Transformer • You can now run .predict() on the model for inference Feature Engineering • AUC-RoC is used to measure the accuracy of the model. Spark ML pipeline API BinaryClassificationEvaluator to determine the Define the model AUC-ROC for each disease. estimator = NNEstimator(xray_model, BinaryCrossEntropy(), transformer) .setBatchSize(batch_size).setMaxEpoch(num_epoch) Define optimizer .setFeaturesCol("image").setCachingSample(False). .setValidation(EveryEpoch(), validationDF, [AUC()], batch_size) .setOptimMethod(optim_method) Xray_nnmodel = Training the model #UnifiedAnalytics #SparkAISummit 20

21.Results and observations #UnifiedAnalytics #SparkAISummit 21

22.Spark parameter recommendations • Number of executors Number of compute nodes • Number of executor cores Number of physical cores minus two • Executor memory Number of cores per executor * (Memory required for each core + data partition) #UnifiedAnalytics #SparkAISummit 22

23.Impact on Learning Rate Scheduler 0.00012 0.9 0.8 • Adam (with Learning Average AUC (Higher is better) 0.0001 Rate Scheduler) 0.7 outperforms SGD 0.00008 0.6 Learning Rate – Warmup 0.5 0.00006 – Plateau 0.4 • Learning rate scheduler 0.00004 0.3 helps is covering the 0.2 0.00002 model much faster 0.1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1E-7 Epochs Learning rate Average AUC w/ Learning Rate Scheduler Average AUC w/o Learning Rate Scheduler #UnifiedAnalytics #SparkAISummit 23

24.Base model comparison 0.95 0.9 AUC (Higher is better) 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 Inception-V3 DenseNet-121 ResNet-50 #UnifiedAnalytics #SparkAISummit 24

25.Batch size and Scalability 0.85 • Two factors: Average AUC (Higher is better) – Batches per thread (mini-batch): Number of 0.8 images in a iteration per worker – Global batch size: Batches per thread * 0.75 Number of workers 0.7 • Batch size is a critical parameter for model accuracy as well as for 0.65 distributed performance 0.6 – Increasing the batch size increases parallelism 0.55 – Increasing the batch size may lead to a loss in 1 2 3 4 5 6 7 8 generalization performance especially Batch per thread #UnifiedAnalytics #SparkAISummit 25

26.Scalability 3x speed up from 4 nodes to 16 nodes. ~ 2.5 hours to train the model 3.5 Speed up (Higher is better) bz=2048 3 bz=1536 2.5 2 bz=1024 1.5 bz=512 1 0.5 0 4 8 12 16 Number of nodes * 15 epochs and 32 cores per executor #UnifiedAnalytics #SparkAISummit 26

27.Future Work • Continue to improve the model performance and scalability – Implement LARS and other advanced learning rate scheduler for large scale training • Increase the feature set – Incorporate patient profile information • Develop performance characteristic and sizing guidelines for image classification problems #UnifiedAnalytics #SparkAISummit 27

28.Links Code and white paper available at ImageProcessing-Examples #UnifiedAnalytics #SparkAISummit 28

29.Acknowledgements (Alphabetical) Mehmood Abd, Michael Bennett, Sajan Govindan, Jenwei Hsieh, Andrew Kipp, Dharmesh Patel, Leela Uppuluri, and Luke Wilson #UnifiedAnalytics #SparkAISummit 29