Architecting Production IoT Analytics - Paige Roberts


1. Architecting Production IoT Analytics. Paige Roberts, Vertica Open Source Relations Manager

2. Paige Roberts, Vertica Open Source Relations Manager
§ 23 years in data management
§ 6 years as a teacher
§ 1 year at Vertica
§ Past: Syncsort, Hortonworks, Bloor Group, Actian, Epicor, Pervasive, CSC, Data Junction
§ Can’t seem to decide what I want to be when I grow up:
  § Tech Support
  § Tech Writer
  § Software Trainer
  § Software Engineer
  § Consultant
  § Industry Analyst
  § Product and Technical Marketer
  § Product Manager

3. Agenda
§ Introduction
§ Some Successful IoT Architectures
§ Architecture Evolution
§ Takeaways
§ Q&A

4. Introduction

5. A Day in the IoT Analytics Life
Successful implementations right now across dozens of industries and use cases:
§ Smart Buildings
§ Health / EMR Analytics
§ Ride Share
§ Customer Analytics
§ Network Optimization
§ Predictive Maintenance
§ Route Optimization
§ Wearable Analytics
§ Smart Agriculture
§ Software Optimization
§ Clickstream Analytics
§ Security Analysis


7. IoT Architectures

8. [Architecture diagram: Philips] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Philips Remote Service Network (low latency); Teradata, Salesforce, SAP data, SQL Server (batch); Distributed Analytics Database; R&D Access; Remote Monitoring; Remote Service.

9. [Architecture diagram: Anritsu] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Call Detail Records, Machine, Network, and Customer data sources; low-latency and batch paths; HDFS; ETL / Data Enrichment; Distributed Analytics Database (hot data); Hive (cold data); Auto-Sync; ML Application; Anritsu / Internal Pre-Packaged Dashboards.

10. [Architecture diagram] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Real-Time Bidding Data (extremely low latency); Third-Party Data (low latency); Contextual Data (PostgreSQL) (batch); HDFS; Distributed Analytics Database (shown three times).

11. [Architecture diagram: Uber] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Application data, Clickstreams, Location Data (low latency); Key/Value DB (Cassandra, …) and RDBMS (MySQL, PostgreSQL …) (batch); Ingestion (Extract, Load); HDFS; ETL (Flattened, Modeled Tables); Schema Enforced Recent Data; Distributed Analytics Database; Data Manager / Manifest; Database Proxy; Hive, Spark, Presto, Notebooks.
Ad Hoc Analytics:
§ CityOps
§ Data Scientists
§ QueryBuilder (Uber created)
§ DashBuilder (Uber created)
Applications:
§ ETL/Modeling
§ CityOps
§ Machine Learning
§ Experiments

12. [Architecture diagram] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Planting and harvest equipment; Weather stations, probes, satellite imagery; Application data - clickstreams (low latency); Sales Data, Marketing Campaigns, Amazon RDS (batch); Separate Ingest/ETL Cluster; Distributed Analytics Database.

13. [Architecture diagram] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: batch sources including CRM, ERP, Billing, Financial, Contact Center, Geo/Mapping, Customer, CX, Operational, Service Quality, TV Schedule; Data Ingest / ELT; Ingestion ELT with transformation pushed down; Distributed Analytics Database; Reporting; Business Intelligence.

14. [Architecture diagram: same layout as slide 13, with a low-latency path added] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: EON TV Service On/Off and Channel Change Data (low latency); batch sources including CRM, ERP, Billing, Financial, Contact Center, Geo/Mapping, Customer, CX, Operational, Service Quality, TV Schedule; Data Ingest / ELT; Ingestion ELT with transformation pushed down; Distributed Analytics Database; Content Analytics; Machine Learning for real-time ad targeting; Customer Profile Analyses for Data-Driven Apps; Data-Driven Apps; Reporting; Business Intelligence.

15. [Architecture diagram] Column headings: Data Sources; Pub/Sub; Distributed Mass Storage System; ETL (Extract, Transform, Load); Analytics and Machine Learning / AI; Visualization / Application. Labels: Real-Time Bidding Data (extremely low latency); Third-Party Data (low latency); Contextual Data (batch); Distributed Analytics Database that uses shared storage; Data Ingest / ELT / BI / Machine Learning; BI Reporting.

16. Key Aspects
§ All Highly Successful Production Architectures
§ Simplicity of design
§ Both low-latency streaming and historical batch data processing
§ Bring analytics to data, not data to analytics
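
The last two key aspects can be illustrated with a minimal sketch. It is not taken from any of the architectures above: it uses Python's built-in sqlite3 as a stand-in for a distributed analytics database, and the table name `readings`, its columns, and all function names are hypothetical. A batch load and a single streamed event land in the same table, and the aggregation runs inside the database rather than in application code.

```python
import sqlite3


def make_store():
    """Create an in-memory table that stands in for the analytics database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        "device_id TEXT, ts INTEGER, temp_c REAL, source TEXT)"
    )
    return conn


def load_batch(conn, rows):
    """Historical path: bulk-load (device_id, ts, temp_c) tuples."""
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, 'batch')", rows)
    conn.commit()


def ingest_event(conn, event):
    """Low-latency path: insert one event as it arrives."""
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, 'stream')",
        (event["device_id"], event["ts"], event["temp_c"]),
    )
    conn.commit()


def avg_temp_by_device(conn):
    """Analytics brought to the data: the database computes the aggregate."""
    cur = conn.execute(
        "SELECT device_id, AVG(temp_c) FROM readings "
        "GROUP BY device_id ORDER BY device_id"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = make_store()
    load_batch(conn, [("a", 1, 20.0), ("b", 1, 30.0)])
    ingest_event(conn, {"device_id": "a", "ts": 2, "temp_c": 22.0})
    print(avg_temp_by_device(conn))
```

Only the small aggregated result leaves the database; the raw rows from both paths stay where they were stored.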

17. Data Architecture Evolution

18. Data Warehouse [diagram] Labels: Message Queues; Files; Transactional Data; Application Data; sources CRM, ERP, Billing, Customer, Operational, Financial; Batch ETL; Analytics Database; Visualization; Business Intelligence.

19. Data Lake [diagram] Labels: Streaming Data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via low-latency Distributed Pub/Sub and Stream Processing; Contextual Data (files, weather, geo) and Transactional Data (application data, OLTP/ODS) via Batch ETL, or EL with T done on mass storage; Data Lake Mass Storage (distributed on-premises AND / OR cloud object storage); Data Prep / Enrichment; Query Engine; Prepped Data; Machine Learning; Visualization; Applications; Artificial Intelligence.

20. Cooperative Data Architecture [diagram] Labels: Streaming Data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via low-latency Distributed Pub/Sub and Stream Processing; Contextual Data (files, weather, geo) and Transactional Data (application data, OLTP/ODS) via Batch ETL, or EL with T done on mass storage; Data Lake Mass Storage (distributed on-premises AND / OR cloud object storage); Data Prep / Enrichment; Distributed Analytics Warehouse with Distributed Columnar Data; Import, Export, and Query between lake and warehouse; Visualization; Applications; Artificial Intelligence.

21. Unified Analytics Warehouse [diagram] Labels: Streaming Data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via low-latency Distributed Pub/Sub and Stream Processing; Contextual Data (files, weather, geo) and Transactional Data (application data, OLTP/ODS) via Batch ETL, or EL with T done in warehouse; Unified Analytics Warehouse over Shared Storage (HDFS on-premises AND / OR cloud object storage); Ingestion / ELT; Data Prep / Enrichment; Reporting / BI; Data Science / ML; Departmental Use; Managed ML Models; Data Science Tools; Visualization; Applications; Artificial Intelligence.

22. Takeaways

23. The Only Constant is Change

24. The Only Constant is Change: DOFOFU!
§ Don’t commit to only open source, only proprietary, or only one brand. (Yes, people HAVE been fired for choosing only IBM.)
§ Don’t lock yourself into only one deployment option: a solution that only works on-prem, only works in the cloud, or only works on THIS cloud.
§ Don’t tightly couple components. Everything should be interchangeable; switching out one component shouldn’t break everything.
§ Plan for the future. Don’t get locked in. DOFOFU. (Acronym creation credit to @_ColinFay)
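
As a rough sketch of the loose-coupling advice, the pipeline below depends only on two small interfaces, so either end can be swapped without touching the rest. Everything here is hypothetical and for illustration only (the `Source` and `Sink` protocols, the in-memory classes, and `run_pipeline`); real components would wrap an actual message bus, file store, or database behind the same interfaces.

```python
from typing import Iterable, Protocol


class Source(Protocol):
    """Anything that can produce records."""
    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    """Anything that can accept records and report how many it took."""
    def write(self, records: Iterable[dict]) -> int: ...


class ListSource:
    """Hypothetical source backed by an in-memory list."""
    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class MemorySink:
    """Hypothetical sink that keeps every record."""
    def __init__(self):
        self.rows = []

    def write(self, records):
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


class CountingSink:
    """Hypothetical replacement sink that only counts records."""
    def __init__(self):
        self.count = 0

    def write(self, records):
        n = sum(1 for _ in records)
        self.count += n
        return n


def run_pipeline(source: Source, sink: Sink) -> int:
    """Depends only on the two interfaces, never on a concrete component."""
    return sink.write(source.read())


if __name__ == "__main__":
    src = ListSource([{"id": 1}, {"id": 2}])
    print(run_pipeline(src, MemorySink()))
    print(run_pipeline(src, CountingSink()))
```

Replacing `MemorySink` with `CountingSink` requires no change to `run_pipeline` or to the source, which is the property the slide asks for.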

25. Q&A
Learn More: academy.vertica.com
Try it Free: vertica.com/try
Paige Roberts, Open Source Relations Manager
E: Paige.Roberts@microfocus.com

26. Thank you!

27. Vertica Data Disruptors Webcast Series https://www.vertica.com/data-disruptors-vertica-webcast-series/ http://academy.vertica.com

28. Cooperative Data Architecture [diagram, variant of slide 20] Labels: Streaming Data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via low-latency Stream Processing; Contextual Data (files, weather, geo) and Transactional Data (application data, OLTP/ODS) via Batch ETL, or EL with T done on mass storage; Data Lake Mass Storage (on-premises AND / OR cloud); Data Prep / Data Enrichment; Ingestion / ELT; Reporting / BI; Data Science / ML; Departmental Use; Import, Export, Query, and Manage between components; Visualization; Applications; Artificial Intelligence.

29. Unified Analytics Warehouse [diagram, variant of slide 21] Labels: Streaming Data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via low-latency Stream Processing; Contextual Data (files, weather, geo) and Transactional Data (application data, OLTP/ODS) via Batch ETL, or EL with T done in warehouse; Shared Storage (HDFS on-premises AND / OR cloud); Ingestion / ELT; Data Prep / Enrichment; Reporting / BI; Data Science / ML; Departmental Use; Managed ML Models; Visualization; Applications; Artificial Intelligence.
