Building Real-Time Data Pipeline
1 .Building Real-Time Data Pipeline for Diabetes Medication Recommender System Using Databricks. Arivoli Tirouvingadame, Data Platform Engineer, Qventus; Jayaradha Natarajan, Sr. Data Engineer, Change Healthcare. #DevSAIS17
2 .$whoami • Jayaradha Natarajan: Sr. Data Engineer, Change Healthcare; www.github.com/jayaradha; Organizer, Data Riders meetup group (www.meetup.com/datariders) • Arivoli Tirouvingadame: Data Platform Engineer, Qventus; Open Source Committer (http://www.github.com/olisource, https://l10n.gnome.org/teams/ta/); Organizer, Data Riders meetup group (www.meetup.com/datariders)
3 .AI/ML in Healthcare “AI will be ubiquitous in healthcare by 2025” https://www.techemergence.com/machine-learning-in-healthcare-executive-consensus/
4 . Patient Visit Prescription Lab Healthcare Data IoT Sensors R&D
5 .“We are in the early days of AI assisting Physicians better prescribe medication” https://www.truveris.com/resources/ai-in-healthcare-helping-physicians-better-prescribe-treatments
6 .Current: 1 in 11 adults is diabetic. By 2040: the diabetic population is expected to be twice the population of the USA. https://www.alchemyfoodtech.com/copy-of-diabetes-epidemic
7 .A day in the life of a diabetes patient: Problem, Challenge, Symptoms
8 .How can we prescribe Diabetes medication better in near real-time?
9 .Solution: Use a Big Data pipeline to collect the patient's blood glucose level and medication before/after food, and predict better medication in near real-time. Stages: Data Collection (collect sensor data from wearable devices) → Model (model data using ML algorithms) → Predict (predict medication and alert the patient's mobile device)
10 .Glucose Monitors: non-meter test strips; hospital glucose meters; blood testing with meters using test strips; noninvasive meters; continuous glucose monitors
11 .Data ingestion o Typically, raw data can be structured, semi-structured, or unstructured, and may or may not contain errors o IoT devices (such as Continuous Glucose Monitors) produce structured data, which may still contain errors
12 .Data Storage and Cleansing: Sensor Data (blood glucose level, calorie intake, age) → Raw Data Storage → Data Cleansing → Cleansed Data Storage → Model Storage → Recommendation/Score storage
13 .Data Cleansing and modeling o Data cleansing uses statistical analysis tools to read and audit data based on a list of pre-defined constraints. Flow: Streaming Data → Range check → Validate Data → Split → Training data / Test data
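The range check, validate, and split steps on this slide can be sketched in plain Python. This is a minimal illustration, not the presenters' implementation: the `Reading` record, its field names, and the 20 to 600 mg/dL plausibility bounds are all assumptions chosen for the example, and the bounds are not clinical guidance.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumed plausibility bounds in mg/dL (illustrative only, not clinical guidance).
GLUCOSE_MIN_MG_DL = 20.0
GLUCOSE_MAX_MG_DL = 600.0


@dataclass(frozen=True)
class Reading:
    """Hypothetical shape of one sensor reading."""
    patient_id: str
    timestamp: int                  # epoch seconds
    glucose_mg_dl: Optional[float]  # None when the device sent no value


def is_valid(reading: Reading) -> bool:
    """Validate required fields, then range-check the glucose value."""
    if not reading.patient_id or reading.timestamp <= 0:
        return False
    if reading.glucose_mg_dl is None:
        return False
    # NaN fails both comparisons, so it is rejected here as well.
    return GLUCOSE_MIN_MG_DL <= reading.glucose_mg_dl <= GLUCOSE_MAX_MG_DL


def cleanse(readings: Iterable[Reading]) -> List[Reading]:
    """Keep only the readings that satisfy every pre-defined constraint."""
    return [r for r in readings if is_valid(r)]


def split(readings: Iterable[Reading], test_fraction: float = 0.2,
          seed: int = 42) -> Tuple[List[Reading], List[Reading]]:
    """Shuffle deterministically and split into (training, test) lists."""
    shuffled = list(readings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Keeping the constraints in one `is_valid` function makes the audit rules easy to review and extend, which matters when the list of constraints is defined up front.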
14 .ARCHITECTURE
15 .Reference Architecture: Raw Data → Transformation/Cleansing (EMR) → Clean Data → Train → Model
16 .Reference Architecture: Raw Data → Transformation/Cleansing (EMR) → Clean Data → Train → Model → Prediction
17 .Architecture components o Kafka: Receive sensor data in real-time from wearable devices o Apache Spark: Ingest raw data from Kafka, process it with Structured Streaming (data verification, validation, cleansing, enrichment, etc.), and store it in S3 buckets o MLlib: Train machine learning models on the data stored in S3 buckets, so that insulin intake can be recommended o AWS: Deploy the model and other related services on EC2, EMR, etc. o Mobile or Web App: Notify patients with medication recommendations o D3/Tableau: Visualize via charts/dashboards
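The Kafka to Spark to S3 leg of these components might look roughly like the PySpark sketch below. It is an illustration under stated assumptions, not the presenters' code: the message schema, the function names, and the 20 to 600 mg/dL range filter are hypothetical, and running it requires a Spark session with the Kafka source package (`spark-sql-kafka`) available.

```python
from typing import Dict

# Hypothetical JSON layout of one sensor message, as a Spark DDL schema string.
READING_SCHEMA = "patient_id STRING, event_time TIMESTAMP, glucose_mg_dl DOUBLE"


def kafka_source_options(brokers: str, topic: str) -> Dict[str, str]:
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": brokers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_cleansing_stream(spark, brokers: str, topic: str,
                           output_path: str, checkpoint_path: str):
    """Read raw readings from Kafka, cleanse them, and append Parquet files.

    `output_path` and `checkpoint_path` would be S3 locations in the
    architecture on the slide; any paths passed here are the caller's choice.
    """
    # Imported lazily so the helpers above stay usable without PySpark.
    from pyspark.sql.functions import col, from_json

    raw = (spark.readStream.format("kafka")
           .options(**kafka_source_options(brokers, topic))
           .load())

    # Kafka delivers the payload as bytes; decode it and parse the JSON body.
    readings = (raw
                .select(from_json(col("value").cast("string"),
                                  READING_SCHEMA).alias("r"))
                .select("r.*"))

    # Assumed validation rule: patient id present, glucose within 20-600 mg/dL.
    cleansed = readings.where(
        col("patient_id").isNotNull()
        & col("glucose_mg_dl").between(20.0, 600.0))

    return (cleansed.writeStream.format("parquet")
            .option("path", output_path)
            .option("checkpointLocation", checkpoint_path)
            .outputMode("append")
            .start())
```

The checkpoint location lets Structured Streaming resume from the last committed Kafka offsets after a restart, so the cleansed output is not duplicated.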
18 .Pain points o Maintaining multiple root accounts for Dev, Pre-Prod and Prod environments is expensive o Choosing HIPAA-compliant services is difficult (most serverless technologies are not HIPAA compliant) o A secure network has to be built from scratch and maintained (for example, using Terraform, CloudFormation, etc.) o End-to-end encryption is required: data-in-flight and data-at-rest encryption
19 .HIPAA Challenges o HIPAA requires Healthcare Data to be protected. o Ensure the confidentiality, integrity, and availability of Protected Health Information (PHI) created, received, maintained, or transmitted. o Protect against any reasonably anticipated threats and hazards to the security or integrity of PHI. o Protect against reasonably anticipated uses or disclosures of PHI not permitted by the Privacy Rule.
20 .DATABRICKS PIPELINE
21 .Databricks – Kinesis Connector: Kinesis → Structured Streaming → Spark ML → AWS Lambda → API Gateway
22 .Databricks – Kafka Connector: Kafka Connector → Raw Data → Spark to clean data → Cleansed Data → Spark ML → Train → Model → Prediction
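The train and prediction stages of this pipeline could be expressed with Spark ML along the following lines. The slides do not name the algorithm, so linear regression is only a stand-in here, and the column names (`glucose_mg_dl`, `calorie_intake`, `age`, `insulin_units`) are assumptions loosely based on the inputs shown on the storage slide.

```python
# Assumed column names for the cleansed data; the slides do not specify them.
FEATURE_COLUMNS = ["glucose_mg_dl", "calorie_intake", "age"]
LABEL_COLUMN = "insulin_units"


def build_pipeline():
    """Assemble the feature vector and attach a regressor (stand-in model)."""
    # Imported lazily so this module loads even where PySpark is absent.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    assembler = VectorAssembler(inputCols=FEATURE_COLUMNS,
                                outputCol="features")
    regressor = LinearRegression(featuresCol="features",
                                 labelCol=LABEL_COLUMN)
    return Pipeline(stages=[assembler, regressor])


def train_and_score(cleansed_df, test_fraction: float = 0.2, seed: int = 42):
    """Split the cleansed data, fit on the training part, score the test part.

    Returns the fitted PipelineModel and a DataFrame that carries a
    `prediction` column for the held-out rows.
    """
    train_df, test_df = cleansed_df.randomSplit(
        [1.0 - test_fraction, test_fraction], seed=seed)
    model = build_pipeline().fit(train_df)
    return model, model.transform(test_df)
```

Wrapping the assembler and the estimator in one `Pipeline` means the same feature preparation is applied at training and at prediction time.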
23 .Deployment o Hybrid only or single tenant o Selected AWS BAA HIPAA services o Databricks auxiliary services (web app and cluster management software) run in a Databricks-owned AWS account on a dedicated VPC instance o Spark clusters continue to be deployed to the customer's AWS account on dedicated instances o End-to-end encryption: data-in-flight and data-at-rest encryption o Logging and Monitoring o Audit https://docs.databricks.com/user-guide/advanced/hipaa-compliant-deployment.html
24 .DEMO
25 .Mobile App
26 .Visualizations
29 .Future directions o Health: Extend the approach to any medication management solution, including emergency medication management o Wellness: Predict calorie intake o Fitness: Predict the workouts a patient needs to do