This talk introduces the basic architecture of the intelligent customer-support system on the Microsoft Azure platform and walks through the complete end-to-end process of building its NLP modules with Analytics Zoo and BigDL, showing the advantages of Analytics Zoo and BigDL across the full workflow of data processing, training, and model deployment. Microsoft engineers demonstrated the intelligent dialogue live and introduced the Spark products and services available on the Azure platform.

Transcript

1.Use Intel Analytics Zoo to build an intelligent QA Bot for Microsoft Azure 3/11/2018

2.About Us: Kai Huang, Software Engineer, Intel Data Analytics Technology Team; Yuqing Wei, Software Engineer, Microsoft C+AI Team.

3.Why do customer support platforms need AI? Traditional vs. recent intelligent platforms. A chat bot is often one of the core intelligent components: it enhances user experience, relieves human workload, and provides technical support for Azure users effectively and efficiently.

4.Overall architecture: overview of the customer service platform (basic modules in blue, intelligent modules in green).

5.Why neural networks? Neural networks make feature extraction easier. The TextClassifier module can be modified for sentiment analysis. Neural networks generally perform better, especially on QA tasks and when data is limited. Common parts can be shared across different AI modules.

6.Why Analytics Zoo & BigDL? A unified distributed analytics + AI platform on Apache Spark. Provides pipeline APIs, prebuilt models, and use cases for NLP tasks. Provides practical experience for Azure big data users to build AI applications. A preinstalled image on Azure Marketplace allows easy deployment. https://github.com/intel-analytics/analytics-zoo https://analytics-zoo.github.io/

7.General steps for NLP tasks: Data Collection -> Data Cleaning -> Preprocessing -> Model Training -> Evaluation -> Tuning; New data -> Inference.

8.Data Preprocessing Read cleaned text data as an RDD where each record contains two columns (text, label). Common steps: Tokenization (https://github.com/fxsjy/jieba), Stopwords removal, Sequence aligning, Word2Vec (https://github.com/facebookresearch/fastText), Conversion to BigDL Sample -> RDD[Sample].
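
Below is a minimal sketch of how these preprocessing steps might be chained with PySpark, jieba and BigDL. It assumes raw_rdd is an existing RDD of (text, label) pairs, stopwords is a Python set, and word2vec is a dict mapping a word to its pretrained vector; these names, the token_length value, and the zero-vector fallback for unknown words are illustrative assumptions, not the original pipeline.

import numpy as np
import jieba
from bigdl.util.common import Sample

sequence_length = 500   # pad/truncate each text to this many tokens (matches the slide)
token_length = 200      # dimension of each word vector (illustrative)

def to_sample(text, label):
    # Tokenization with jieba, followed by stopword removal
    tokens = [t for t in jieba.cut(text) if t not in stopwords]
    # Sequence aligning: truncate or pad to a fixed length
    tokens = tokens[:sequence_length]
    # Word2Vec lookup; unknown words fall back to zero vectors in this sketch
    vectors = [word2vec.get(t, np.zeros(token_length)) for t in tokens]
    vectors += [np.zeros(token_length)] * (sequence_length - len(vectors))
    features = np.stack(vectors).astype("float32")
    # Conversion to a BigDL Sample; the label convention depends on the loss used downstream
    return Sample.from_ndarray(features, np.array([label]))

sample_rdd = raw_rdd.map(lambda record: to_sample(record[0], record[1]))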

9.Define a TextClassifier model
class_num: the number of text categories to be classified.
token_length: the size of each word vector.
sequence_length: the length of a sequence.
encoder: the encoder for input sequences ("cnn", "lstm" or "gru").
encoder_output_dim: the output dimension of the encoder.

from zoo.models.textclassification import TextClassifier

text_classifier = TextClassifier(class_num, token_length, sequence_length=500, encoder="cnn", encoder_output_dim=256)

*Photo from: https://blog.csdn.net/littlely_ll/article/details/79151403

10.Train and evaluate model
Analytics Zoo provides a Keras-style API for distributed training:

text_classifier.compile(optimizer=Adagrad(learning_rate, decay),
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
text_classifier.set_checkpoint(path)
text_classifier.set_tensorboard(log_dir, app_name)
text_classifier.fit(train_rdd, batch_size=…, nb_epoch=…, validation_data=val_rdd)
text_classifier.save_model(model_path)
text_classifier.predict(test_rdd)
text_classifier.predict_classes(test_rdd)
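
After training, the saved model can be loaded back for inference. The short sketch below reuses model_path and test_rdd from the snippet above and assumes TextClassifier.load_model follows the usual Analytics Zoo pattern for loading prebuilt models.

from zoo.models.textclassification import TextClassifier

# Load the model saved with save_model above and run distributed prediction
loaded_classifier = TextClassifier.load_model(model_path)
predictions = loaded_classifier.predict(test_rdd)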

11.Ways for improvement Check your data first (quality, quantity, etc.). Use a custom dictionary for tokenization if necessary. Train word2vec for unknown words if necessary. Hyperparameter tuning (learning rate, etc.). Add character embedding, etc.
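
As an illustration of the custom-dictionary point above, jieba lets you register domain terms so that product names are not split apart during tokenization; the file name custom_dict.txt and the sample sentence below are hypothetical.

import jieba

# custom_dict.txt (hypothetical): one domain term per line, e.g. Azure product names,
# so the tokenizer keeps them as single tokens instead of splitting them
jieba.load_userdict("custom_dict.txt")
tokens = list(jieba.cut("如何创建HDInsight Spark集群"))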

12.Service Integration
Prediction service implemented in Java. POJO-like API for low-latency local inference.

public class TextClassificationModel extends AbstractInferenceModel {
    public JTensor preProcess(String text) {
        // Re-implement the preprocessing using Java API
    }
}

TextClassificationModel model = new TextClassificationModel();
model.load(path);

String sampleText = "text content";
JTensor input = model.preProcess(sampleText);
List<JTensor> inputList = new ArrayList<>();
inputList.add(input);
List<List<JTensor>> result = model.predict(inputList);

WebService example: https://github.com/intel-analytics/analytics-zoo/tree/master/apps/web-service-sample

13.A glimpse of the QA Ranker module Input: a (query, document) pair. Similar preprocessing steps. Output: relevance score or probability.
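
To make "similar preprocessing steps" concrete, the sketch below reuses the hypothetical stopwords, word2vec, sequence_length and token_length names from the earlier preprocessing sketch and turns a (query, document) pair into a single fixed-shape feature array; how the two parts are combined, and which ranking model consumes them, is an illustrative assumption rather than the actual module.

import numpy as np
import jieba

def encode(text, length):
    # Tokenize, drop stopwords, look up word vectors, and pad/truncate to `length`
    tokens = [t for t in jieba.cut(text) if t not in stopwords][:length]
    vectors = [word2vec.get(t, np.zeros(token_length)) for t in tokens]
    vectors += [np.zeros(token_length)] * (length - len(vectors))
    return np.stack(vectors).astype("float32")

def pair_features(query, document):
    # Stack the encoded query and document into one array; a ranker model then
    # maps this pair representation to a relevance score or probability
    return np.concatenate([encode(query, sequence_length),
                           encode(document, sequence_length)], axis=0)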

14.Positioning the different big data solutions (ordered from more control to more ease of use and reduced administration):
Big data storage: Azure Data Lake Store, Azure Storage.
Big data analytics:
IaaS clusters (Azure Marketplace: HDP | CDH | MapR) - any Hadoop technology, any distribution.
Managed clusters (Azure HDInsight) - workload-optimized, managed clusters.
Big Data as-a-service (Azure Data Lake Analytics) - data engineering in a Job-as-a-service model.
Azure Databricks - frictionless and optimized Spark clusters.

15.Spark Offerings in Azure
Spark IaaS (Azure Marketplace): https://market.azure.cn/zh-cn
Spark on Azure Batch using Docker: https://azure.microsoft.com/en-us/blog/on-demand-spark-clusters-on-docker/
HDInsight Spark: https://docs.azure.cn/zh-cn/hdinsight/spark/apache-spark-overview
Azure Databricks: https://azure.microsoft.com/zh-cn/services/databricks/

17.Spark on HDInsight Provision a cluster with a click of a mouse. Fully supported by Microsoft and Hortonworks. Supports Batch, ML, Streaming and SQL workloads. Read data from Azure Blob Storage. The Spark connector enables real-time analytics over globally distributed data in Azure Cosmos DB. Powerful visualization of data in Spark with Power BI. VS Code integration.

18.Bot Demo WeChat: Microsoft 云科技; Web chat: https://support.azure.cn/zh-cn/support/support-azure/

19.Blogs
https://www.azure.cn/zh-cn/blog/2018/09/12/Using-Intel-Analytics-Zoo-to-inject-AI-into-customer-service-platform_PartI
https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1

Related Slides