Use Intel Analytics Zoo to Build an intelligent QA Bot for Azure
1 .Use Intel Analytics Zoo to build an intelligent QA Bot for Microsoft Azure 3/11/2018
2 .About Us
- Kai Huang, Software Engineer, Intel Data Analytics Technology Team
- Yuqing Wei, Software Engineer, Microsoft C+AI Team
3 .Why does a customer support platform need AI?
- Traditional vs. recent intelligent platforms.
- A chat bot is often one of the core intelligent components.
- To enhance user experience and relieve human workload.
- To provide technical support for Azure users effectively and efficiently.
4 .Overall architecture
Overview of the customer service platform (basic modules in blue, intelligent modules in green).
5 .Why neural networks?
- Neural networks make feature extraction easier.
- The TextClassifier module can be adapted for sentiment analysis.
- Neural networks generally perform better, especially on QA tasks and when we lack data.
- Common components can be shared across different AI modules.
6 .Why Analytics Zoo & BigDL?
- A unified distributed analytics + AI platform on Apache Spark.
- Provides pipeline APIs, prebuilt models and use cases for NLP tasks.
- Provides practical experience for Azure big data users building AI applications.
- Preinstalled image on Azure Marketplace for easy deployment.
- https://github.com/intel-analytics/analytics-zoo
- https://analytics-zoo.github.io/
7 .General steps for NLP tasks
- Data Collection
- Data Cleaning
- Preprocessing
- Model Training
- Evaluation
- Tuning
- New data -> Inference
8 .Data Preprocessing
Read the cleaned text data as an RDD where each record contains two columns (text, label).
Common steps:
- Tokenization: https://github.com/fxsjy/jieba
- Stopwords removal
- Sequence aligning
- Word2Vec: https://github.com/facebookresearch/fastText
- Conversion to BigDL Sample -> RDD[Sample]
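The common steps above can be sketched in plain Python for a single record. This is only an illustration under stated assumptions: the whitespace tokenizer stands in for a real segmenter such as jieba, the stopword list and the two-dimensional embedding table are made up, and the function names are hypothetical. The final conversion to a BigDL Sample and the RDD plumbing are not shown.

```python
from typing import Dict, List

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "to", "is", "my"}


def tokenize(text: str) -> List[str]:
    # Stand-in tokenizer: lowercase and split on whitespace.
    # For Chinese text a segmenter such as jieba would take this role.
    return text.lower().split()


def remove_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]


def align(tokens: List[str], sequence_length: int, pad: str = "<pad>") -> List[str]:
    # Truncate long sequences and right-pad short ones to a fixed length.
    clipped = tokens[:sequence_length]
    return clipped + [pad] * (sequence_length - len(clipped))


def vectorize(tokens: List[str], embeddings: Dict[str, List[float]],
              token_length: int) -> List[List[float]]:
    # Unknown words and padding map to a zero vector of size token_length.
    zero = [0.0] * token_length
    return [list(embeddings.get(t, zero)) for t in tokens]


def preprocess(text: str, embeddings: Dict[str, List[float]],
               sequence_length: int, token_length: int) -> List[List[float]]:
    tokens = align(remove_stopwords(tokenize(text)), sequence_length)
    return vectorize(tokens, embeddings, token_length)


# Made-up embedding table; real vectors would come from a trained model.
embeddings = {"reset": [0.1, 0.2], "password": [0.3, 0.4]}
features = preprocess("How to reset my password", embeddings,
                      sequence_length=4, token_length=2)
```

Each record ends up as a sequence_length x token_length matrix, which is the shape the classifier on the next slide expects per sample.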
9 .Define a TextClassifier model
- class_num: The number of text categories to be classified.
- token_length: The size of each word vector.
- sequence_length: The length of a sequence.
- encoder: The encoder for input sequences: cnn, lstm or gru.
- encoder_output_dim: The output dimension for the encoder.

    from zoo.models.textclassification import TextClassifier
    text_classifier = TextClassifier(class_num, token_length, sequence_length=500, encoder="cnn", encoder_output_dim=256)

*Photo from: https://blog.csdn.net/littlely_ll/article/details/79151403
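To make the roles of the five parameters concrete, here is a small pure-Python sketch of one common way a CNN text encoder is built: a 1D convolution over the sequence, max pooling over time, then a softmax layer. It is not a description of the TextClassifier internals. The kernel width, the random weights and all function names are assumptions chosen only to show how the shapes fit together.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv1d_relu(x: Matrix, kernels: List[Matrix]) -> Matrix:
    # x: (sequence_length, token_length); each kernel: (width, token_length).
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        row = []
        for kernel in kernels:
            s = sum(w * v
                    for k_row, x_row in zip(kernel, window)
                    for w, v in zip(k_row, x_row))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out  # (sequence_length - width + 1, encoder_output_dim)


def global_max_pool(h: Matrix) -> List[float]:
    # Max over time: one value per filter, so the result has encoder_output_dim entries.
    return [max(col) for col in zip(*h)]


def dense_softmax(v: List[float], weights: Matrix) -> List[float]:
    # weights: (class_num, encoder_output_dim); returns class probabilities.
    logits = [sum(w * x for w, x in zip(row, v)) for row in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
# Small illustrative sizes; width is an assumed kernel size.
class_num, token_length, sequence_length, encoder_output_dim, width = 3, 4, 10, 6, 5


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


x = rand_matrix(sequence_length, token_length)            # one preprocessed text
kernels = [rand_matrix(width, token_length) for _ in range(encoder_output_dim)]
weights = rand_matrix(class_num, encoder_output_dim)

encoded = global_max_pool(conv1d_relu(x, kernels))        # encoder_output_dim values
probs = dense_softmax(encoded, weights)                   # class_num probabilities
```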
10 .Train and evaluate model
Analytics Zoo provides a Keras-style API for distributed training:

    text_classifier.compile(optimizer=Adagrad(learning_rate, decay), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    text_classifier.set_checkpoint(path)
    text_classifier.set_tensorboard(log_dir, app_name)
    text_classifier.fit(train_rdd, batch_size=…, nb_epoch=…, validation_data=val_rdd)
    text_classifier.save_model(model_path)
    text_classifier.predict(test_rdd)
    text_classifier.predict_classes(test_rdd)
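As a reading aid for the loss and metric named above, the following plain-Python sketch computes sparse categorical cross-entropy and accuracy on a few hand-written probability vectors. The sample values and helper names are made up, and labels are assumed to be zero-based integer class indices. It illustrates the formulas only, not the distributed implementation.

```python
import math
from typing import List, Sequence


def sparse_categorical_crossentropy(probs: Sequence[Sequence[float]],
                                    labels: Sequence[int]) -> float:
    # Labels are integer class indices (zero-based here), not one-hot vectors.
    # The loss is the mean negative log-probability assigned to the true class.
    eps = 1e-12  # guards against log(0)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def argmax_classes(probs: Sequence[Sequence[float]]) -> List[int]:
    # Pick the index of the highest probability for each sample.
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(predicted, labels)) / len(labels)


# Made-up model outputs for three samples and three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
labels = [0, 1, 0]

loss = sparse_categorical_crossentropy(probs, labels)
classes = argmax_classes(probs)
acc = accuracy(classes, labels)
```

The third sample is misclassified, so it contributes the largest term to the loss and brings accuracy down to two out of three.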
11 .Ways for improvement
- Check your data first (quality, quantity, etc.).
- Use a custom dictionary for tokenization if necessary.
- Train word2vec for unknown words if necessary.
- Tune hyperparameters (learning rate, etc.).
- Add character embedding, etc.
12 .Service Integration
- Prediction service implemented in Java
- POJO-like API for low-latency local inference

    public class TextClassificationModel extends AbstractInferenceModel {
        public JTensor preProcess(String text) {
            // Re-implement the preprocessing using Java API
        }
    }

    TextClassificationModel model = new TextClassificationModel();
    model.load(path);
    String sampleText = "text content";
    JTensor input = model.preProcess(sampleText);
    List<JTensor> inputList = new ArrayList<>();
    inputList.add(input);
    List<List<JTensor>> result = model.predict(inputList);

WebService example: https://github.com/intel-analytics/analytics-zoo/tree/master/apps/web-service-sample
13 .A glimpse of QA Ranker module
- Input: a query and a document pair.
- Similar preprocessing steps.
- Output: relevance score or probability.
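The input/output contract of a ranker can be sketched as follows: score each (query, document) pair, then sort candidates by score. The scorer here is a deliberately simple stand-in (token overlap), not the neural ranker from the talk, and the function names and candidate texts are made up for illustration.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    # Stand-in scorer: Jaccard overlap of lowercase tokens, in [0, 1].
    # A trained model would replace this function.
    q, d = set(query.lower().split()), set(document.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank(query: str, candidates: List[str],
         score: Callable[[str, str], float] = overlap_score) -> List[Tuple[str, float]]:
    # Score every (query, document) pair and sort by descending relevance.
    scored = [(doc, score(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    "how to resize a virtual machine",
    "how to reset a virtual machine password",
    "billing and invoices",
]
ranked = rank("reset virtual machine password", candidates)
best_document, best_score = ranked[0]
```

Because the scorer is passed in as a parameter, swapping the overlap heuristic for a model's predicted probability leaves the ranking code unchanged.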
14 .Positioning the different big data solutions
[Diagram: solutions placed along an axis from CONTROL to EASE OF USE, with reduced administration toward ease of use]
Big data storage:
- Azure Data Lake Store
- Azure Storage
Big data analytics:
- IaaS Clusters: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution.
- Managed Clusters: Azure HDInsight. Workload optimized, managed clusters.
- Azure Databricks: frictionless & optimized Spark clusters.
- Big Data as-a-service: Azure Data Lake Analytics. Data engineering in a job-as-a-service model.
15 .Spark Offerings in Azure
- Spark IaaS (Azure Marketplace): https://market.azure.cn/zh-cn
- Spark on Azure Batch using Docker: https://azure.microsoft.com/en-us/blog/on-demand-spark-clusters-on-docker/
- HDInsight Spark: https://docs.azure.cn/zh-cn/hdinsight/spark/apache-spark-overview
- Azure Databricks: https://azure.microsoft.com/zh-cn/services/databricks/
17 .Spark on HDInsight
- Provision a cluster with a click of a mouse
- Fully supported by Microsoft and Hortonworks
- Supports Batch, ML, Streaming and SQL workloads
- Read data from Azure Blob Storage
- The Spark connector enables real-time analytics over globally distributed data in Azure Cosmos DB
- Powerful visualization of data in Spark with Power BI
- VS Code integration
18 .Bot Demo
- WeChat: Microsoft 云科技
- Web chat: https://support.azure.cn/zh-cn/support/support-azure/
19 .Blogs
- https://www.azure.cn/zh-cn/blog/2018/09/12/Using-Intel-Analytics-Zoo-to-inject-AI-into-customer-service-platform_PartI
- https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1