Cloud Experience: Data-driven Applications Made Simple and Fast

Implementing a complex real-time data workflow is very challenging. This session will describe the architecture of a data platform that provides a single, secure, high-performance system deployable in hybrid cloud architectures. We will present how to support simultaneous, consistent, high-performance access through multiple industry open-source and cloud-compatible standards: streaming, table, TSDB, object, and file APIs. The architecture also uses a new serverless technology to support dynamic and flexible implementations. The presenter will also outline how the platform was integrated with the Spark ecosystem, including AI and ML tools, to simplify the development process.

1. Delivering Real-Time Data-Driven Applications Made Simple and Fast (Yaron Ekshtein, Iguazio)

2. The Data-Driven Business Challenge: From Reactive to Proactive and Intelligent
[Chart: Value of Data vs. Time to Action; batch (days), interactive (minutes), event-driven (real-time)]

3. Cloud Must Expand to the Edge!
§ Automotive: one connected car generates 25 GB of data every hour; cloud-assisted cars require 10 ms latency. (Source: Hitachi and Bell Labs)
§ Manufacturing: in 2018, 40% of IoT data will be processed, stored, analyzed, and acted upon at the edge. (Source: IDC)
§ Gaming: AR games require 10 ms latency, and volumetric virtual reality requires 586 Mbps. (Source: Deutsche Bank and Bell Labs)
5G only eliminates the bottleneck at the metro, forcing cloud expansion to the edge.
[Map: US Internet connectivity and inter-metro bottlenecks]

4. The Challenge: Combine Real-Time With AI
AI & Machine Learning today:
§ Large-scale, multi-variant data
§ Predictive, intelligent
§ Slow and not actionable
How do you operationalize it?
§ How to form feature vectors from operational and real-time data
§ Exceptions, security, logging, ..
§ Mass and distributed deployment

5. The Traditional Approach
Batch layer: too slow
§ Big data, but slow
§ Not up to date
§ Complex
Real-time layer: limited context
§ Small amounts of data
§ Expensive
§ Lacks context
[Diagram: data sources feed either a batch layer (ETL tools and change logs into a data lake, batch processing, views, and reports) or a real-time layer (stream processing into an in-memory/NoSQL store serving real-time dashboards)]

6. The Real-Time AI Pipeline
Stages: Decode & Index → Enrich (contextualize) → Infer (predict) → Post-Process
Additional context: historical, operational, and environmental data
Built from ML model micro-services and serverless functions over real-time sources, a real-time DB and model/file store, and real-time triggers and actions.
Lower time to action, higher throughput.
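The four pipeline stages can be sketched as plain functions composed over an event stream. This is an illustrative toy, not the platform's API; the event shape, the context store, and the anomaly threshold are all invented.

```python
# Toy decode -> enrich -> infer -> post-process pipeline; all names,
# event shapes, and thresholds here are illustrative assumptions.

def decode(raw):
    """Decode & index: parse a raw event into fields."""
    device_id, value = raw.split(",")
    return {"device": device_id, "value": float(value)}

def enrich(event, context_db):
    """Enrich (contextualize): join historical/operational context."""
    event["baseline"] = context_db.get(event["device"], 0.0)
    return event

def infer(event):
    """Infer (predict): flag readings far from the device baseline."""
    event["anomaly"] = abs(event["value"] - event["baseline"]) > 10
    return event

def post_process(event, actions):
    """Post-process: trigger an action when an event is flagged."""
    if event["anomaly"]:
        actions.append(("alert", event["device"]))
    return event

context_db = {"car-1": 50.0, "car-2": 20.0}   # stands in for the real-time DB
actions = []
events = [post_process(infer(enrich(decode(r), context_db)), actions)
          for r in ["car-1,75.5", "car-2,21.0"]]
```

In the slide's architecture each stage would run as its own micro-service or serverless function; composing them in-process here just makes the data flow explicit.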

7. Build and Operationalize Proactive Systems Faster
Traditional: each data source (e.g., Spectrum, Netcool, SMOD) runs its own streaming or batch ETL pipeline into separate data stores and visualization.
§ Complex, skill gaps, slow to productize
§ No single view of ops, real-time, and history
§ Reactive (no actions)
Continuous analytics: real-time and batch sources feed one streaming pipeline into unified data stores with a REST API and visualization.
§ Simple, just a few weeks to a working app
§ Unified view across ALL data
§ AI-driven, proactive

8. Example: Event-Driven Analytics for Connected Cars
Event processing: car state and events flow through parallel stream processing to identify state changes, enrich them (drivers, geo data, stats), and detect violations.
State and enrichment: an import service pulls external sources (geo data, weather/road info); state changes and violations update the map and vehicle data.
ML processing: predictions on the enriched events update the vehicle data.
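A minimal sketch of the state-change and violation path above; the zones, speed limits, and event fields are invented for illustration.

```python
# Per-car state store plus geo enrichment and a violation check;
# the zones, limits, and event shape are hypothetical.
car_state = {}                                   # last known state per car
speed_limit_by_zone = {"urban": 50, "highway": 120}

def process(event, alerts):
    """Detect a state change, enrich with zone data, flag violations."""
    prev = car_state.get(event["car"])
    changed = prev is None or prev["zone"] != event["zone"]
    car_state[event["car"]] = event              # update the state store
    limit = speed_limit_by_zone[event["zone"]]   # geo enrichment
    if event["speed"] > limit:                   # violation?
        alerts.append((event["car"], event["speed"], limit))
    return changed

alerts = []
first = process({"car": "A", "zone": "urban", "speed": 45}, alerts)
second = process({"car": "A", "zone": "highway", "speed": 130}, alerts)
```

In the real pipeline this logic would run in parallel across stream shards, and alerts would drive map and vehicle-data updates rather than append to a list.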

9. Evolve Into an Agile Cloud-Native Architecture
§ Innovate: your business logic, as any containerized microservice
§ Consume: cloud storage and databases

10. Serverless, Eliminating 80% of the Work
Traditional dev and ops model:
§ Write code + local testing
§ Build code and Docker image
§ CI/CD pipeline
§ Add logging and monitoring
§ Harden security
§ Provision servers + OS
§ Handle data/event feed
§ Handle failures/auto-scaling
§ Handle rolling upgrades
§ Configuration management
"Serverless" development model:
§ Write code + local testing
§ Provide spec, push deploy
1. The remaining 80% is automated by the serverless platform
2. Pay for what you use

11. The Surprising Truth About What It Takes to Build a Machine Learning Product (Source: Josh Cogan, Google)

12. Nuclio: Taking Serverless to the Next Level
Extreme performance:
§ Zero copy, buffer reuse
§ Up to 400K events/sec/proc
§ GPU support
Statefulness:
§ Auto-rebalance, checkpoints
§ Any trigger source
§ Simple integration
Advanced data & AI features:
§ Data bindings
§ Shared volumes
§ Context cache
[Diagram: the nuclio processor runs event listeners (DB, MQ, file) that feed stream shards to pools of workers executing functions]
Open-source serverless for compute- and data-intensive tasks

13. Open-Source Serverless: A Simpler, Lock-In-Free Alternative
Runs on the cloud, a laptop, on-prem, or the edge: same APIs, same user experience, anywhere, with native integration into each cloud platform.

14. Example: Predictive Maintenance Based on Real-Time + Historical Data
Intelligent edge: devices and machines trigger stream processing of sensor data, which aggregates time-series vectors (avg, min/max, stdev per sensor) through the NoSQL and time-series APIs and feeds a real-time dashboard.
Cloud: data is uploaded every 15 minutes; the ML model is updated every 6 hours via a web hook; predictions produce real-time and predicted alerts.
Query APIs serve the ML models, machine metadata, and environmental data.
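The per-sensor vectors named on the slide (avg, min/max, stdev) can be sketched with the standard library; the sensor names and readings are made up, and the real platform would compute these through its time-series API.

```python
import statistics

def aggregate(readings):
    """Compute one aggregation window per sensor."""
    out = {}
    for sensor, values in readings.items():
        out[sensor] = {
            "avg": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            # sample stdev needs at least two readings
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return out

# One 15-minute window of raw readings, keyed by sensor.
window = {"temp": [20.0, 22.0, 21.0], "vibration": [0.1, 0.3]}
vectors = aggregate(window)   # vectors["temp"]["avg"] == 21.0
```

These compact vectors, rather than the raw readings, are what gets uploaded to the cloud every 15 minutes, which is what keeps the edge footprint and uplink bandwidth small.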

15. Cyber and Network Ops
A leading telco use case: predict network behavior in real time.
§ Processing high message throughput from multiple streams
§ Cross-correlating with historical and external data in real time
§ AI predictions conducted on live data
§ Small footprint to fit network locations

16. Demo Video

17. Demo: Voice-Driven Real-Time Analytics
[Diagram: a smart home device takes a voice query, an AI service translates it to SQL, the SQL API queries the platform and updates locations on the Google Maps service, and results render in a React web UI]

18. Summary: Build Continuous, Data-Driven, and Proactive Apps
§ Deliver real-time analytics on fresh, historical, and operational data
§ Create a unified data layer for stream processing, AI, and serving
§ Adopt cloud-native and serverless approaches to gain agility
§ The edge is the new cloud

19. Thank You