Easily Build a Text Retrieval System Based on Milvus

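In this livestream we demonstrate how to use a BERT model to turn text into vectors, then use Milvus to run similarity search over those feature vectors, building a text search engine.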


1. Milvus Building a Text Search Engine With BERT and Milvus jingjing

2. Unlock the treasure of unstructured data. AI algorithms transform images, video, voice, and natural language into vectors, enabling the understanding and utilization of unstructured data at scale. Unstructured data → deep learning models → vectors → knowledge, insight, $. © 2020 Zilliz. All rights reserved.

3. Philosophy of vector search engine: ballast of an unstructured database. Diagram labels: Unstructured Data (image, video, voice, natural language), Information Extraction, AI Models, Feature Vectors, Object Storage, Milvus, Search Index, Knowledge Base; arrows: store, input, output, insert, query, Result.
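
The insert and query arrows on this slide can be illustrated with a minimal in-memory sketch. `TinyVectorIndex` is a hypothetical stand-in written for this example, not the Milvus API; it assumes brute-force search over L2-normalized vectors scored by inner product, whereas a real engine would use an approximate index.

```python
import math


class TinyVectorIndex:
    """In-memory stand-in for a vector search engine (hypothetical, not the Milvus API)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}  # id -> L2-normalized vector

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vec]

    def insert(self, vec_id, vec):
        # Store the normalized vector so inner product equals cosine similarity.
        if len(vec) != self.dim:
            raise ValueError("expected dimension %d" % self.dim)
        self.vectors[vec_id] = self._normalize(vec)

    def search(self, query, top_k=3):
        # Brute force: score every stored vector, then keep the best top_k.
        q = self._normalize(query)
        scored = [
            (vec_id, sum(a * b for a, b in zip(q, vec)))
            for vec_id, vec in self.vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex(dim=3)
index.insert(0, [1.0, 0.0, 0.0])
index.insert(1, [0.0, 1.0, 0.0])
index.insert(2, [0.9, 0.1, 0.0])
hits = index.search([1.0, 0.05, 0.0], top_k=2)
print(hits)  # list of (id, similarity) pairs, best match first
```

The search returns only IDs and scores; mapping an ID back to the original text or image is left to a separate store.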

4. Milvus: The journey. Timeline dates: 2018.10, 2019.04, 2019.06, 2019.10, 2020.03. Milestones shown: seed idea, 0.1, 1st user, open source, joined LF AI. Caption: the most active AI projects in the Linux Foundation.

5. Milvus community conf 2020: https://www.slidestalk.com/m/286 https://www.slidestalk.com/m/298

6. Progress: unstoppable momentum since its debut. 5.9K commits, 4.1K GitHub stars, 120 contributors, 152K DockerHub downloads, 300+ users, 19 patents filed.

7. Users: 300+ community users in the first 6 months, and growing rapidly.

8. Useful Links
- Live demo: https://milvus.io/scenarios (content-based image retrieval system, Q&A chatbot powered by NLP, molecular analysis)
- Website: https://milvus.io
- GitHub: https://github.com/milvus-io/milvus
- Slack: https://milvusio.slack.com
- Twitter: https://twitter.com/milvusio
- Facebook: https://www.facebook.com/io.milvus.5
- Zhihu: https://zhuanlan.zhihu.com/ai-search
- Medium: https://medium.com/@milvusio
- Follow our WeChat account

9. We are hiring! Join us: https://zilliz.gllue.com/portal/social positions?page=1&gql= Email to: HR@zilliz.com

10. Speaker Intro: Data Engineer at Zilliz - Data preprocessing - AI model application - Docker deployment

11. Logistics - Project introduction and comparison with different NLP models - Live coding: Building a text search engine demo - Q&A session

12. Project introduction

13. Project introduction: comparison of NLP models
- word2vec. Core method: CBOW (predict the central word from the nearby words) and skip-gram (predict the nearby words from the central word). Defects: cannot solve the problem of polysemy; no contextual information.
- ELMo. Core method: dynamically adjusts the word embedding based on the current context. Defects: ELMo uses LSTM as the feature extractor, and LSTM is much weaker than Transformer at feature extraction.
- OpenAI GPT. Core method: pre-training uses a one-way language model, using only the preceding words to predict the next word. Defects: with a one-way language model (no bidirectional [forward, reverse] extraction), it cannot understand the text well.
- BERT. Core method: adopts the exact same two-stage model as GPT: first, language model pre-training; second, fine-tuning to solve downstream tasks. Defects: because the network is more complex, the computation is larger and training convergence is slower.
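
The two word2vec objectives in the comparison differ only in which side of the (context, center) relationship is predicted. The sketch below generates the training pairs each objective would see; the function names and the window size are assumptions made for illustration, and no model is trained.

```python
def context_window(tokens, i, window):
    """Words within `window` positions of tokens[i], excluding tokens[i] itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_pairs(tokens, window=2):
    """(context words, center word): the context predicts the center."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_pairs(tokens, window=2):
    """(center word, one context word): the center predicts each neighbor."""
    return [
        (tokens[i], neighbor)
        for i in range(len(tokens))
        for neighbor in context_window(tokens, i, window)
    ]


sentence = "milvus searches similar vectors".split()
print(cbow_pairs(sentence, window=1))
print(skipgram_pairs(sentence, window=1))
```

Either way, training ends with a single fixed vector per word, which is why a word with several meanings gets the same vector in every sentence.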

14. Project introduction: The BERT model uses Transformer as the main framework of the algorithm, which captures the bidirectional relationships in sentences more thoroughly. Google provides a number of pre-trained models, the two most basic of which are BERT-Base and BERT-Large.
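
BERT produces one vector per token, so a search engine needs a step that reduces them to one fixed-length vector per sentence. The slides do not say which reduction the demo uses; the sketch below assumes mean pooling followed by L2 normalization, a common choice, and uses made-up 4-dimensional token vectors in place of real model output (BERT-Base emits 768 dimensions per token).

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-length sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[d] for vec in token_vectors) / count for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


# Hypothetical token vectors standing in for BERT output.
token_vectors = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 2.0, 0.0],
]
sentence_vector = l2_normalize(mean_pool(token_vectors))
print(sentence_vector)
```

Normalizing before insertion lets an inner-product search rank results by cosine similarity.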

15. Demo. Build preparation: 1. Milvus 2. BERT 3. PostgreSQL. See https://github.com/milvus-io/bootcamp/blob/0.10.0/solutions/Textsys/README.md
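
A vector search returns IDs and scores, so a relational store is needed to turn hits back into text. The sketch below assumes that is the role of the database in the demo, and it uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL so the example runs without a server. The `docs` table, its columns, and the sample hits are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (0, "How to install Milvus"),
        (1, "BERT for sentence encoding"),
        (2, "Docker deployment notes"),
    ],
)


def fetch_texts(conn, hits):
    """Resolve (id, score) hits to text, keeping the ranking from the vector search."""
    if not hits:
        return []
    ids = [vec_id for vec_id, _ in hits]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT id, body FROM docs WHERE id IN (%s)" % placeholders, ids
    ).fetchall()
    by_id = dict(rows)
    # SQL does not guarantee row order, so re-apply the order of the hits.
    return [(by_id[vec_id], score) for vec_id, score in hits if vec_id in by_id]


hits = [(1, 0.93), (0, 0.41)]  # hypothetical output of a vector search
results = fetch_texts(conn, hits)
print(results)
```

Using the same integer ID in both stores is what ties a vector to its source text.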

16. Other Applications: Milvus intelligent question answering system; graph-based recommendation system with Milvus; building a video search system based on Milvus.

17. Welcome to join the Milvus technical discussion group