The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services

We present the Azure Cognitive Services on Spark, a simple, easy-to-use extension of the SparkML library that covers all Azure Cognitive Services. This integration lets Spark users embed cloud intelligence directly into their Spark computations, enabling a new generation of intelligent applications on Spark. Furthermore, we show that with our new containerized Cognitive Services, one can embed cloud intelligence directly into the Spark cluster for ultra-low-latency, on-prem, and offline applications. We show how, using our integration, one can compose these cognitive services with other services, SQL computations, and deep networks to create sophisticated, intelligent, heterogeneous applications. Moreover, we show how to redeploy these compositions as RESTful services with Spark Serving. We also explore the architecture of these contributions, which build on HTTP on Spark, a novel integration between Spark and the widely used Hypertext Transfer Protocol (HTTP). This library can integrate any framework that is capable of communicating through HTTP into the Spark ecosystem. Finally, we demonstrate how to use these services to create a large class of intelligent applications such as custom search engines, real-time facial recognition systems, and unsupervised object detectors.

1. The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services
Mark Hamilton, Microsoft; Anand Raman, Microsoft
#UnifiedAnalytics #SparkAISummit

2. Overview
• The Cognitive Services on Spark
  – Basic Usage
  – Fluent Design
• HTTP on Spark
  – Architecture and Principles
• Clusters with Embedded Services
  – Kubernetes, Databricks
• Examples
  – GANs + the Metropolitan Museum of Art

3. Motivation
• Azure Cognitive Services provide high quality pre-built intelligent services
• No need for time intensive model training or deployment
• Can quickly create intelligent applications
• Leverage Microsoft Research and Azure ML

4. Vision, Speech, Language, Knowledge, Search

Vision
• Object, scene, and activity detection
• Face recognition and identification
• Celebrity and landmark recognition
• Emotion recognition
• Text and handwriting recognition (OCR)
• Customizable image recognition
• Video metadata, audio, and keyframe extraction and analysis
• Explicit or offensive content moderation

Speech
• Speech transcription (speech-to-text)
• Custom speech models for unique vocabularies or complex environment
• Text-to-speech
• Custom Voice
• Real-time speech translation
• Customizable speech transcription and translation
• Speaker identification and verification

Language
• Language detection
• Named entity recognition
• Key phrase extraction
• Text sentiment analysis
• Multilingual and contextual spell checking
• Explicit or offensive text content moderation
• PII detection for text moderation
• Text translation
• Customizable text translation
• Contextual language understanding

Knowledge
• Q&A extraction from unstructured text
• Knowledge base creation from collections of Q&As
• Semantic matching for knowledge bases
• Customizable content personalization learning

Search
• Ad-free web, news, image, and video search results
• Trends for video, news
• Image identification, classification and knowledge extraction
• Identification of similar images and products
• Named entity recognition and classification
• Knowledge acquisition for named entities
• Search query autosuggest
• Ad-free custom search engine creation

5. Azure Cognitive Services on Spark
• Easy to use integration between Spark and the Azure Cognitive Services
• Composable and pipelinable with all other SparkML models!
• Python, Scala, R (Beta)

    val df = new TextSentiment()
      .setTextCol("text")
      .setOutputCol("sentiment")
      .transform(inputs)


7. Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get results for multiple search terms:

    new BingImageSearch()
      .setQueryCol("queries")

queries
Cat
Dog
Antelope
Car
Bob Ross

8. Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get the first N pages of Bing for a specific term:

    new BingImageSearch()
      .setQuery("cats")
      .setOffsetCol("offsets")

offsets
0
100
200
300
400

9. Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get the first 200 results for many terms using several different accounts:

    new BingImageSearch()
      .setQueryCol("queries")
      .setOffsetCol("offsets")
      .setKeyCol("keys")

offsets   queries   keys
0         Cat       17…
100       Cat       17…
0         Tree      3e…
100       Tree      4q…
0         Car       G1…
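
The per-row input table on the slides above can be generated rather than typed by hand. The following is a minimal sketch in plain Python; `build_search_rows` is a hypothetical helper, not part of the library, and the page size of 100 is an assumption taken from the offsets shown on the slides.

```python
from itertools import product


def build_search_rows(queries, total_results, page_size=100):
    """Return one row per (query, offset) pair covering the first `total_results` hits.

    Hypothetical helper for illustration; the column names mirror the slide.
    """
    offsets = range(0, total_results, page_size)
    return [{"queries": q, "offsets": o} for q, o in product(queries, offsets)]


# First 200 results for two terms: two pages of 100 per term.
rows = build_search_rows(["Cat", "Tree"], total_results=200)
for row in rows:
    print(row)
```

Rows shaped like this could then be loaded into a Spark DataFrame so that the query and offset columns drive one request per row.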

10. High Performance Capabilities OOTB
• Asynchronous Parallelism (P)
• Automatic Batching (B)
• Automatic Retries
  – Exponential Back-offs (EBO)
  – Backpressure (BP)

Features      Time (s)   Errors #
None          30.8       18993
EBO+BP        1163.0     0
EBO+BP+B      57.1       0
EBO+BP+B+P    49.7       0

10 nodes, 20k requests, 1k req/min limited service
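
To make the exponential back-off idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `call_with_backoff`, its parameters, and the simulated `flaky` service are all assumptions made for illustration. The delay doubles after every failed attempt, which gives a rate-limited service time to recover.

```python
import time


def call_with_backoff(request_fn, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call `request_fn`, retrying failures with exponentially growing delays.

    Hypothetical helper: waits base_delay * 2**attempt seconds after each
    failed attempt and re-raises once `max_retries` retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))


# Simulated rate-limited service: fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production retry loop would usually also add jitter and retry only on specific status codes.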

11. HTTP on Spark
• Full Integration between HTTP Protocol and Spark SQL
• Spark as a Microservice Orchestrator
• Spark + X

    df = (SimpleHTTPTransformer()
        .setInputParser(JSONInputParser())
        .setOutputParser(JSONOutputParser().setDataType(schema))
        .setOutputCol("results")
        .setUrl(…))

12. HTTP on Spark
[Architecture diagram: each Spark Worker holds several partitions, and each partition has its own client. The clients exchange HTTP requests and responses with a web service or with local services.]

13. Cognitive Service Containers Now In Public Preview
• No app changes & compatible with full Cognitive Services feature-set
• Support for 6 key AI capabilities:
  – Key Phrase Extraction
  – Language Detection
  – Sentiment Analysis
  – Face & Emotion Detection
  – OCR / Text Recognition
  – Language Understanding
• Run & manage locally, try for free
• Connect to Billing service for report back, unified billing with on-cloud and off-cloud transactions
• Additional capabilities coming soon (e.g. Speech)

14. Clusters with Embedded Services
• Deploy cognitive services directly onto cluster worker nodes
• Bring the compute to the data
• Use low latency in-machine networking
[Diagram: inside a Spark Worker, PySpark talks to the Spark Scala process over the PySpark protocol, and the Spark Scala process talks to a local Cognitive Service over HTTP.]

15. Azure Kubernetes Service + Helm
• Works on any k8s cluster
• Helm: Package Manager for Kubernetes

    helm repo add mmlspark \

    helm install mmlspark/spark \
      --set localTextApi=true

[Architecture diagram: a Kubernetes cluster (AKS, ACS, GKE, On-Prem, etc.) of K8s workers, each running a Spark Worker with HTTP on Spark next to a Cognitive Service Container. HTTP on Spark also connects to the cloud Cognitive Services, and Spark Readers connect to storage or other databases. Users and apps send REST requests to deployed models through a load balancer on the Spark Serving hotpath. They submit jobs, run notebooks, and manage the cluster through a load balancer in front of Jupyter, Zeppelin, Livy, or Spark Submit.]
Dalitso Banda, Microsoft AI Development Acceleration Program

16. Creating a Visual Search Engine for the Metropolitan Museum of Art

17. Intelligent Image Annotation
• The MET released 400k images under Open Access
• Pipe images through Computer Vision API to annotate images for searching
[Figure: query images with their Describe Image output ("A picture containing a person", "A picture containing a glass, cup", "A fish swimming underwater") and their deep feature nearest neighbors.]

18. Reverse Image Search Architecture
Query Image → ResNet Featurizer (MMLSpark) → Deep Features → Fast Nearest Neighbor Lookup (SparkML LSH or Annoy) → Closest Match
Filters from Zeiler + Fergus 2013
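
The lookup stage of this pipeline can be sketched in a few lines of plain Python. The slide names LSH or Annoy for fast approximate lookup; the sketch below swaps in a brute-force cosine-similarity scan, which gives exact results but does not scale the same way. The function names and the three-dimensional toy vectors are assumptions for illustration; real deep features would come from the ResNet featurizer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=1):
    """Return the names of the k indexed vectors most similar to `query`.

    Brute-force stand-in for an approximate index such as LSH or Annoy.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy "deep features" for three artworks (made-up values).
index = {
    "vase": [0.9, 0.1, 0.0],
    "portrait": [0.1, 0.9, 0.2],
    "fish": [0.0, 0.2, 0.9],
}
closest = nearest_neighbors([0.8, 0.2, 0.1], index, k=2)
print(closest)
```

A brute-force scan costs one similarity computation per indexed image per query, which is why an approximate index becomes attractive at the scale of hundreds of thousands of images.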

19. Example Nearest Neighbors
[Figure: query images shown alongside their nearest neighbors.]

20. Spark x Azure Search
• Azure Search Sink for Spark
• Allows for pushing thousands of documents per second into Azure Search instances
• Built on HTTP on Spark
• Use to create search APIs on top of Spark DataFrames

21. Microsoft Machine Learning for Apache Spark v0.16
Microsoft's Open Source Contributions to Apache Spark
• Cognitive Services
• Spark Serving
• Model Interpretability
• LightGBM Gradient Boosting
• Deep Networks with CNTK
• HTTP on Spark
Azure/mmlspark

22. Conclusions
• Can now embed Cognitive Services into Spark workflows
• Can harness Spark cluster for microservices
• Get started now with interactive examples!
Help us advance Spark: Azure/mmlspark
Contact:

23. Thanks To
• Sudarshan Raghunathan
• Ilya Matiach
• Microsoft NERD Garage Team + MIT Externship Program
• Microsoft Development Acceleration Team:
  – Dalitso Banda, Casey Hong, Karthik Rajendran, Manon Knoertzer, Tayo Amuneke, Alejandro Buendia
• Pablo Castro, Chris Hoder, Ryan Gaspar, Henrik Neilsen, Joseph Sirosh, Andrew Schonhoffer, Daniel Ciborowski, Markus Cosowicz
• Azure CAT, AzureML, and Azure Search Teams

24. Backup Slides

25. [GAN diagram: a Generator maps a Noise Vector to a Generated Image. A Discriminator receives Training Data and the Generated Image and decides "Real or Generated?" for each.]

26. [Diagram labels: Target Image, Learned Noise Vector, Generator, Generated Image, Pretrained ResNet 50.]
$\mathrm{Loss}_{\mathrm{pixel}} + \mathrm{Loss}_{\mathrm{semantic}} \times \lambda$

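27. Code Space Interpolation
[Diagram: two images are inverted through $G^{-1}$ into Inverted Noise Vector 1 and Inverted Noise Vector 2. Points between the two vectors are passed through the generator $G$ to produce a sequence of images.]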
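
One way to picture code space interpolation is as a walk along the straight line between the two inverted noise vectors, feeding each intermediate point to the generator. The slide does not say which interpolation scheme was used, so the linear scheme below is an assumption, as are the function name and the two-dimensional toy vectors. The generator itself is left out; only the path through noise space is computed.

```python
def interpolate(z1, z2, steps):
    """Return `steps` vectors evenly spaced on the line from z1 to z2, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (at z1) to 1.0 (at z2)
        path.append([(1 - t) * a + t * b for a, b in zip(z1, z2)])
    return path


# Toy 2-D noise vectors; real ones would have the generator's input dimension.
path = interpolate([0.0, 2.0], [4.0, 0.0], steps=5)
for z in path:
    print(z)
```

Each vector in `path` would be passed through $G$ to render one frame of the transition between the two source images.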