Building Real-Time Data Pipeline


1.Building Real-Time Data Pipeline For Diabetes Medication Recommender System Using Databricks. Arivoli Tirouvingadame, Data Platform Engineer, Qventus; Jayaradha Natarajan, Sr. Data Engineer, Change Healthcare. #DevSAIS17

2.$whoami
o Jayaradha Natarajan: Sr. Data Engineer, Change Healthcare; Organizer, Data Riders meetup group
o Arivoli Tirouvingadame: Data Platform Engineer, Qventus; Open Source Committer; Organizer, Data Riders meetup group

3.AI/ML in Healthcare “AI will be ubiquitous in healthcare by 2025”

4. Healthcare Data: Patient Visit, Prescription, Lab, IoT Sensors, R&D

5.“We are in the early days of AI assisting Physicians better prescribe medication”

6. - Current: 1 in 11 adults is diabetic - By 2040: the diabetic population is expected to be twice the population of the USA

7.A day in the life of a diabetes patient: Problem, Challenge, Symptoms

8.How can we prescribe Diabetes medication better in near real-time?

9.Solution - Use a Big Data pipeline to collect the patient's blood glucose level and medication before/after food, and predict better medication in near real-time.
o Data Collection: Collect sensor data (wearable devices)
o Model: Model data using ML algorithms
o Predict: Predict medication and alert the patient's mobile device

10. Glucose Monitors
o Non-meter test strips
o Hospital glucose meters
o Blood testing with meters using test strips
o Noninvasive meters
o Continuous glucose monitors

11.Data ingestion
o Typically, raw data can be structured, semi-structured, or unstructured, with or without errors
o IoT devices (Continuous Glucose Monitors) produce structured data, with or without errors
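As a minimal sketch of handling structured sensor records that may contain errors, the snippet below parses one JSON reading and returns None when the record is malformed. The field names (patient_id, timestamp, glucose_mg_dl), the JSON encoding, and the mg/dL unit are assumptions for illustration; the slides do not specify a record format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlucoseReading:
    patient_id: str
    timestamp: int         # epoch seconds (assumed)
    glucose_mg_dl: float   # blood glucose in mg/dL (assumed unit)


def parse_reading(line: str) -> Optional[GlucoseReading]:
    """Return a GlucoseReading, or None if the record is malformed."""
    try:
        raw = json.loads(line)
        return GlucoseReading(
            patient_id=str(raw["patient_id"]),
            timestamp=int(raw["timestamp"]),
            glucose_mg_dl=float(raw["glucose_mg_dl"]),
        )
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, a missing field, or a non-numeric value.
        return None
```

Returning None rather than raising lets a streaming job route bad records to a separate error store instead of stopping.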

12.Data Storage and Cleansing: Sensor Data -> Raw Data Storage -> Data Cleansing -> Cleansed Data Storage (blood glucose level, calorie intake, age) -> Model Storage -> Recommendation/Score storage

13.Data Cleansing and modeling
o Data cleansing uses statistical analysis tools to read and audit data against a list of pre-defined constraints.
o Flow: Streaming Data -> Range check -> Validate Data -> Split Data -> Training data / Test data
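The range check and the training/test split can be sketched in plain Python as follows. The plausibility bounds (20 to 600 mg/dL), the 80/20 split, and the fixed seed are assumptions chosen for illustration; the slides do not state the actual constraints.

```python
import random
from typing import List, Sequence, Tuple

# Assumed plausibility bounds in mg/dL; not taken from the slides.
MIN_GLUCOSE_MG_DL = 20.0
MAX_GLUCOSE_MG_DL = 600.0


def in_range(value: float) -> bool:
    """Range check: True if the reading lies within the assumed bounds."""
    return MIN_GLUCOSE_MG_DL <= value <= MAX_GLUCOSE_MG_DL


def cleanse(values: Sequence[float]) -> List[float]:
    """Drop readings that fail the range check, preserving order."""
    return [v for v in values if in_range(v)]


def split(
    values: Sequence[float], train_fraction: float = 0.8, seed: int = 42
) -> Tuple[List[float], List[float]]:
    """Shuffle reproducibly, then split into (training, test) lists."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed keeps the split reproducible between runs, which makes model comparisons easier to interpret.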


15.Reference Architecture (diagram labels): Train, Transformation/Cleansing, EMR, Raw Data, Clean Data, Model

16.Reference Architecture (diagram labels): Train, Transformation/Cleansing, EMR, Raw Data, Clean Data, Model, Prediction

17.Architecture components
o Kafka: Receive sensor data in real time from wearable devices
o Apache Spark: Ingest raw data through Kafka, process it with Structured Streaming (data verification, validation, cleansing, enrichment, etc.), and store it in S3 buckets
o MLlib: Process the data stored in S3 buckets with machine learning libraries, so that insulin intake can be recommended
o AWS: Deploy the model and other related services on EC2, EMR, etc.
o Mobile or Web App: Notify patients with medication recommendations
o D3/Tableau: Visualize via charts/dashboards
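The verification, enrichment, and storage stages listed above can be illustrated with a small plain-Python stand-in. This is not Spark or Kafka code: the source is any iterable of dictionaries, the "bucket" is an in-memory dict, and the field names and key layout are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
REQUIRED_FIELDS = ("patient_id", "timestamp", "glucose_mg_dl")  # hypothetical schema


def verify(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only records that carry every required field."""
    for record in records:
        if all(field in record for field in REQUIRED_FIELDS):
            yield record


def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Add a UTC date derived from the epoch-seconds timestamp."""
    for record in records:
        moment = datetime.fromtimestamp(int(record["timestamp"]), tz=timezone.utc)
        yield {**record, "date": moment.strftime("%Y-%m-%d")}


def store(records: Iterable[Record], bucket: Dict[str, List[Record]]) -> None:
    """Append each record under a partition-style key (hypothetical layout)."""
    for record in records:
        key = f"cleansed/date={record['date']}/patient_id={record['patient_id']}"
        bucket.setdefault(key, []).append(record)


def run_pipeline(source: Iterable[Record], bucket: Dict[str, List[Record]]) -> None:
    """Chain the stages: verify -> enrich -> store."""
    store(enrich(verify(source)), bucket)
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors how a streaming job processes data as it arrives.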

18.Pain points
o Maintaining multiple root accounts for Dev, Pre-Prod, and Prod environments is expensive
o HIPAA-compliant services must be chosen carefully (most serverless technologies are not HIPAA compliant)
o A secure network has to be built from scratch and maintained (for example, using Terraform, CloudFormation, etc.)
o End-to-end encryption: data-in-flight and data-at-rest encryption

19.HIPAA Challenges
o HIPAA requires healthcare data to be protected.
o Ensure the confidentiality, integrity, and availability of Protected Health Information (PHI) created, received, maintained, or transmitted.
o Protect against any reasonably anticipated threats and hazards to the security or integrity of PHI.
o Protect against reasonably anticipated uses or disclosures of PHI not permitted by the Privacy Rule.
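One narrow piece of the integrity requirement can be illustrated with a keyed hash: a record is tagged when it is written, and the tag is checked when it is read back, so tampering is detectable. This is only a sketch using Python's standard hmac module; the record fields and the key are placeholders, and the slides do not say which integrity mechanism the authors used.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_record(record: Dict[str, Any], key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: Dict[str, Any], tag: str, key: bytes) -> bool:
    """Return True only if the tag matches the record under the given key."""
    return hmac.compare_digest(sign_record(record, key), tag)
```

Sorting the keys before hashing keeps the tag stable regardless of field order, and compare_digest avoids leaking information through comparison timing.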


21.Databricks – Kinesis Connector: Kinesis -> Structured Streaming -> Spark ML -> AWS Lambda -> API Gateway
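The last two hops of this flow can be sketched as a Lambda handler behind an API Gateway proxy integration. The slide does not say what the Lambda function does, so this assumes it serves stored recommendations by patient ID; the patient_id path parameter, the in-memory RECOMMENDATIONS table, and its placeholder values are all hypothetical.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical stand-in for wherever predictions are stored; values are placeholders.
RECOMMENDATIONS: Dict[str, Dict[str, Any]] = {
    "p1": {"medication": "placeholder-medication", "score": 0.5},
}


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the shape API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event: Dict[str, Any], context: Optional[Any]) -> Dict[str, Any]:
    """Look up the recommendation for the patient named in the request path."""
    # pathParameters may be absent or null in a proxy event.
    patient_id = (event.get("pathParameters") or {}).get("patient_id")
    recommendation = RECOMMENDATIONS.get(patient_id)
    if recommendation is None:
        return _response(404, {"error": "recommendation not found"})
    return _response(200, {"patient_id": patient_id, **recommendation})
```

In a real deployment the lookup would read from the prediction store rather than a module-level dictionary.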

22.Databricks – Kafka Connector: Kafka -> Connector -> Raw Data -> Spark to clean data -> Cleansed Data -> Spark ML (Train) -> Model -> Prediction

23.Deployment
o Hybrid only or single tenant
o Selected HIPAA-eligible AWS services under a BAA
o Databricks auxiliary services (web app and cluster management software) would be in a Databricks-owned AWS account and run on a dedicated VPC instance.
o Spark clusters would continue to be deployed to the customer's AWS account and on dedicated instances.
o End-to-end encryption: data-in-flight and data-at-rest encryption
o Logging and monitoring
o Audit


25.Mobile App




29.Future directions
o Health: Extend the approach to other medication-management solutions, including emergency medication management
o Wellness: Predict calorie intake
o Fitness: Predict the workouts needed