Building Real-Time Data Pipeline

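Building a real-time data pipeline for a diabetes medication recommender system using Databricks. This talk explains how a real-time big data pipeline and recommendation engine can be used to suggest insulin intake for diabetes patients in near real-time. Based on a patient's calorie intake and blood glucose level, and the datasets derived from them, an insulin dosage can be recommended, which helps the patient avoid overdosing or underdosing. The healthcare industry needs well-designed medication recommender systems, and there is a growing trend of assisting doctors by recommending medication based on a patient's historical data. This also helps promote a doctor-friendly, hospital-free environment for users worldwide.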

1.Building Real-Time Data Pipeline For Diabetes Medication Recommender System Using Databricks. Arivoli Tirouvingadame, Data Platform Engineer, Qventus; Jayaradha Natarajan, Sr. Data Engineer, Change Healthcare. #DevSAIS17

2.$whoami • Jayaradha Natarajan: Sr. Data Engineer, Change Healthcare; www.github.com/jayaradha; Organizer, Data Riders meetup group (www.meetup.com/datariders) • Arivoli Tirouvingadame: Data Platform Engineer, Qventus; Open Source Committer (http://www.github.com/olisource, https://l10n.gnome.org/teams/ta/); Organizer, Data Riders meetup group (www.meetup.com/datariders)

3.AI/ML in Healthcare “AI will be ubiquitous in healthcare by 2025” https://www.techemergence.com/machine-learning-in-healthcare-executive-consensus/

4. Patient Visit Prescription Lab Healthcare Data IoT Sensors R&D

5.“We are in the early days of AI assisting Physicians better prescribe medication” https://www.truveris.com/resources/ai-in-healthcare-helping-physicians-better-prescribe-treatments

6. - Current: 1 in 11 adults is diabetic - By 2040: the diabetic population is expected to be twice the population of the USA https://www.alchemyfoodtech.com/copy-of-diabetes-epidemic

7.A day in the life … of a diabetes patient: Problem, Challenge, Symptoms

8.How can we prescribe Diabetes medication better in near real-time?

9.Solution - Use a Big Data pipeline to collect the patient's blood glucose level and medication before/after food, and predict better medication in near real-time. Stages: Data Collection (collect sensor data from wearable devices) -> Model (model the data using ML algorithms) -> Predict (predict medication and alert the patient's mobile device)
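The collect, predict and alert flow on this slide can be sketched in a few lines of plain Python. This is only an illustration of how the stages hand data to each other: the `Reading` and `Recommendation` types, the `run_pipeline` function and the `placeholder_model` are all hypothetical names, and the placeholder model returns a constant rather than anything clinically meaningful.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Reading:
    """One sensor reading (hypothetical fields, not the talk's schema)."""
    patient_id: str
    glucose_mg_dl: float
    after_meal: bool


@dataclass
class Recommendation:
    """Output of the predict stage for one reading."""
    patient_id: str
    value: float


def run_pipeline(
    readings: Iterable[Reading],
    model: Callable[[Reading], float],
    notify: Callable[[Recommendation], None],
) -> List[Recommendation]:
    """Collect readings, score each with the model, and alert via notify."""
    sent = []
    for reading in readings:
        rec = Recommendation(reading.patient_id, model(reading))
        notify(rec)  # stands in for a push to the patient's mobile device
        sent.append(rec)
    return sent


def placeholder_model(reading: Reading) -> float:
    """Assumption: a stub that always returns 0.0; a trained model goes here."""
    return 0.0


outbox: List[Recommendation] = []
results = run_pipeline(
    [Reading("p1", 140.0, True), Reading("p2", 95.0, False)],
    placeholder_model,
    outbox.append,
)
print(len(results), outbox[0].patient_id)
```

Passing the model and the notifier as callables keeps the stages independent, so a trained model or a real notification service could be swapped in without touching the loop.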

10.Glucose Monitors: non-meter test strips; hospital glucose meters; blood testing with meters using test strips; noninvasive meters; continuous glucose monitors

11.Data ingestion o Typically, raw data can be structured, semi-structured, or unstructured, with or without errors o IoT devices (Continuous Glucose Monitors) produce structured data, with or without errors
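As a rough illustration of ingesting structured device records that may contain errors, the sketch below parses newline-delimited JSON and separates usable records from malformed ones. The JSON format and the field names (`patient_id`, `glucose_mg_dl`, `ts`) are assumptions made for the example; the slides do not specify the device payload.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical schema: the fields every usable record must carry.
REQUIRED_FIELDS = ("patient_id", "glucose_mg_dl", "ts")


def ingest(lines: List[str]) -> Tuple[List[Dict], List[str]]:
    """Split raw lines into parsed records and rejected raw lines."""
    good, bad = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(line)  # not valid JSON, e.g. a truncated message
            continue
        if isinstance(record, dict) and all(f in record for f in REQUIRED_FIELDS):
            good.append(record)
        else:
            bad.append(line)  # valid JSON but missing a required field
    return good, bad


raw = [
    '{"patient_id": "p1", "glucose_mg_dl": 110, "ts": "2017-10-25T08:00:00Z"}',
    '{"patient_id": "p1", "glucose_mg_dl": ',                # truncated message
    '{"patient_id": "p2", "ts": "2017-10-25T08:05:00Z"}',    # missing reading
]
good, bad = ingest(raw)
print(len(good), len(bad))
```

Keeping the rejected lines instead of dropping them makes it possible to audit how often devices send bad data.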

12.Data Storage and Cleansing (diagram): Sensor Data (blood glucose level, calorie intake, age) -> Raw Data Storage -> Data Cleansing -> Cleansed Data Storage -> Model Storage -> Recommendation/Score Storage

13.Data Cleansing and modeling o Data cleansing uses statistical analysis tools to read and audit data based on a list of pre-defined constraints. Flow: Streaming Data -> Range check -> Validate Data -> Split -> Training data / Test data
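The range check, validation and split steps on this slide could look roughly like the following. It is a minimal sketch in plain Python rather than the Spark job from the talk; the `CONSTRAINTS` limits, the field names and the 80/20 split are assumptions for illustration, and real limits would have to come from device specifications or clinicians.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical pre-defined constraints: field -> (min, max) plausible value.
CONSTRAINTS = {
    "glucose_mg_dl": (20.0, 600.0),
    "calories": (0.0, 5000.0),
}


def is_valid(record: Dict) -> bool:
    """True when every constrained field is present, numeric and in range."""
    for field, (low, high) in CONSTRAINTS.items():
        value = record.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def split(records: List[Dict], test_fraction: float = 0.2,
          seed: int = 42) -> Tuple[List[Dict], List[Dict]]:
    """Shuffle deterministically and return (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


stream = [{"glucose_mg_dl": 90.0 + i, "calories": 400.0} for i in range(10)]
stream.append({"glucose_mg_dl": -5.0, "calories": 400.0})  # fails range check
stream.append({"glucose_mg_dl": 120.0})                     # missing field

clean = [r for r in stream if is_valid(r)]
train, test = split(clean)
print(len(clean), len(train), len(test))
```

Fixing the shuffle seed keeps the split reproducible, which makes it easier to compare models trained on different days.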

14.ARCHITECTURE

15.Reference Architecture (diagram): Raw Data -> Transformation/Cleansing -> Clean Data -> Train -> Model; the diagram also labels EMR

16.Reference Architecture (diagram): Raw Data -> Transformation/Cleansing -> Clean Data -> Train -> Model -> Prediction; the diagram also labels EMR

17.Architecture components o Kafka: Get sensor data in real time from wearable devices o Apache Spark: Ingest raw data through Kafka, use Structured Streaming for data verification, validation, cleansing, enrichment, etc., and store the results in S3 buckets o MLlib: Process the data stored in S3 buckets with machine learning libraries so that insulin intake can be recommended o AWS: Deploy the model and other related services on EC2, EMR, etc. o Mobile or Web App: Notify patients with medication recommendations o D3/Tableau: Visualize via charts/dashboards
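To make the train-then-recommend step concrete, here is a toy stand-in for the MLlib stage: a one-feature least-squares fit written in plain Python. The slides do not say which algorithm or features the speakers used, so the choice of linear regression, the function names and the numbers are all assumptions; the data is synthetic and has no clinical meaning.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic, made-up training pairs: (glucose reading, recorded dose units).
glucose = [100.0, 150.0, 200.0, 250.0]
dose_units = [0.0, 1.0, 2.0, 3.0]

model = fit_line(glucose, dose_units)
print(round(predict(model, 175.0), 2))
```

In the architecture described on the slide, the training data would be read from S3 and the fit would be done by a Spark ML estimator; only the train and predict shape carries over from this sketch.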

18.Pain points o Maintaining multiple root accounts for Dev, Pre-Prod and Prod environments is expensive o Choosing HIPAA-compliant services (most serverless technologies are not HIPAA compliant) o We have to build a secure network from scratch and maintain it (for example, using Terraform, CloudFormation, etc.) o End-to-end encryption: Data-in-flight and Data-at-rest encryption

19.HIPAA Challenges o HIPAA requires Healthcare Data to be protected. o Ensure the confidentiality, integrity, and availability of Protected Health Information (PHI) created, received, maintained, or transmitted. o Protect against any reasonably anticipated threats and hazards to the security or integrity of PHI. o Protect against reasonably anticipated uses or disclosures of PHI not permitted by the Privacy Rule.

20.DATABRICKS PIPELINE

21.Databricks Kinesis Connector (diagram): Kinesis -> Structured Streaming -> Spark ML -> AWS Lambda -> API Gateway

22.Databricks Kafka Connector (diagram): Kafka -> Connector -> Raw Data -> Spark to clean data -> Cleansed Data -> Spark ML: Train -> Model -> Prediction

23.Deployment o Hybrid only or single tenant o Selected AWS BAA HIPAA services o Databricks auxiliary services (web app and cluster management software) would be in a Databricks-owned AWS account and run on a dedicated VPC instance o Spark clusters would continue to be deployed to the customer's AWS account and on dedicated instances o End-to-end encryption: Data-in-flight and Data-at-rest encryption o Logging and Monitoring o Audit https://docs.databricks.com/user-guide/advanced/hipaa-compliant-deployment.html

24.DEMO

25.Mobile App

26.Visualizations

29.Future directions o Health: Extend the approach to any medication-management solution, including emergency medication management o Wellness: Predict calorie intake o Fitness: Predict the workouts needed