AI on Spark for Malware Analysis and Anomalous Threat Detection
Shared by the Spark open-source community (Spark开源社区)

Demonstrate how Avast leverages AI and big data to burn malware.

  1. Identify - threat researcher
  2. Block - operator
  3. Analyze and automate - data / AI researcher + engineers

2 .Jakub Sanojca & João Da Silva, Avast (Researcher, Data Engineer)

3 .AI on Spark for Malware Analysis and Anomalous Threat Detection. Jakub Sanojca & João Da Silva, Avast (Researcher, Data Engineer)

4 .Goal: Demonstrate how Avast leverages AI and big data to burn malware.

6 .Agenda • What Avast does • Malware research • Structured Streaming • AI anomaly detection • Demo

8 .Thank you • Big Data Systems • AI team - especially Yura, Olga and Dmitry • Threat researchers and analysts

9 .Avast is dedicated to creating a world that provides safety and privacy for all, no matter who you are, where you are, or how you connect.

10 .Global reach: Portfolio of security, privacy and utility applications #UnifiedDataAnalytics #SparkAISummit

11 .World’s Largest Detection Network • 200B+ URLs • 300M+ new files monthly • 10,000+ globally distributed servers

12 .Training the Avast Machine Learning Engine: Purpose-built approach that takes < 12 hours to add new features, train, and deploy into production

13 .Malware classification
Data
● >500 handcrafted features extracted from binary files by our experts
Task
● Classification of files as clean / malware / PUP
Two-step ML pipeline:
● Cluster data with custom k-means
● Classification inside each cluster is done by Random Forest
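
The two-step pipeline on this slide can be sketched roughly as follows. This is a minimal illustration only, not Avast's patented implementation: it uses plain Lloyd's k-means instead of the custom variant, and a 1-nearest-neighbour lookup stands in for the per-cluster Random Forest to keep the sketch dependency-free. The toy two-dimensional features, the labels, and all function names are assumptions made for the example.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k points so runs are deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster becomes empty
                centroids[i] = tuple(sum(col) / len(group) for col in zip(*group))
    return centroids

def fit(points, labels, k):
    """Step 1: cluster the data. Step 2: keep one labelled training set per cluster."""
    centroids = kmeans(points, k)
    per_cluster = [[] for _ in range(k)]
    for p, y in zip(points, labels):
        per_cluster[nearest(p, centroids)].append((p, y))
    return centroids, per_cluster

def predict(model, point):
    """Route the sample to its cluster, then classify inside that cluster only."""
    centroids, per_cluster = model
    members = per_cluster[nearest(point, centroids)]
    if not members:  # fall back to all samples if the cluster has no training data
        members = [m for cluster in per_cluster for m in cluster]
    # 1-nearest-neighbour: a stand-in for the per-cluster Random Forest
    return min(members, key=lambda m: math.dist(point, m[0]))[1]

# Toy data: two features per file (the real pipeline uses >500 handcrafted features).
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels = ["clean", "malware", "pup", "clean", "malware", "clean"]
model = fit(points, labels, k=2)
```

The point of the structure is that each classifier only has to separate files that already look alike, so the per-cluster models can stay small and be retrained independently.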

14 .Infrastructure: Underlying data lake - Burger

15 .Architecture: Malware classification
[Diagram: Features → Clustering → Training → Validation → Production; durations as extracted from the slide: Data 3 h, 4.5 h, 24 h; Clustering, Training, Validation 24 h, 6 h, 24 h]
● ~700 TB of binary files
● patented tailor-made solution

16 .Custom application vs. Spark
• Custom application: optimised & performant; takes months to develop; not that easy to change
• Spark: slower; easy to experiment with; very fast development

17 .Threat Detections Streaming

18 .3-step threat approach: 1. Identify - threat researcher 2. Block - operator 3. Analyze and automate - data / AI researcher + engineers

21 .Time series of detections • Thousands of detection time series • Where should the operator focus?
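
One simple way to answer "where should the operator focus?" is to score every series by how unusual its latest value is and show only the top few. The slides do not state which anomaly detection method Avast uses, so the z-score below is purely an illustrative assumption, as are the detection names, the counts, and the `history` window length.

```python
from statistics import mean, stdev

def anomaly_score(series, history=24):
    """Absolute z-score of the latest point against the preceding `history` points."""
    baseline = series[-history - 1:-1]
    if len(baseline) < 2:
        return 0.0  # not enough history to judge
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return 0.0 if series[-1] == mu else float("inf")
    return abs(series[-1] - mu) / sigma

def rank_series(all_series, top=3):
    """Names of the `top` series whose latest value deviates most from its baseline."""
    scores = {name: anomaly_score(values) for name, values in all_series.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical hourly detection counts for three detections.
detections = {
    "detection-a": [100, 102, 98, 101, 99, 100],
    "detection-b": [10, 12, 9, 11, 10, 55],
    "detection-c": [500, 480, 510, 495, 505, 470],
}
focus = rank_series(detections, top=2)
```

Here `detection-b` ranks first because its jump from about 10 to 55 is far outside its usual spread, even though `detection-c` has larger absolute counts.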

23 .Short response time is necessary

28 .First idea - custom streaming app • Python because of ML models • A big part of the code deals with already solved problems • POC written by researchers • Gets the job done, but is not easy to maintain or experiment with

29 .Adopted solution: Spark Structured Streaming
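
To give a feel for what Structured Streaming takes over from a custom app, here is a minimal PySpark sketch of a windowed count of detections per name. It is an assumption-laden illustration, not the job from the talk: the column names `event_time` and `detection_name`, the window and watermark durations, and the use of Spark's built-in `rate` test source are all made up for the example. PySpark is imported inside the functions so the module loads even where Spark is not installed.

```python
WINDOW = "5 minutes"      # assumed tumbling-window length
WATERMARK = "10 minutes"  # assumed tolerance for late events

def detection_counts(events):
    """Count events per detection name in event-time tumbling windows.

    `events` is a streaming DataFrame with (hypothetical) columns
    `event_time` (timestamp) and `detection_name` (string).
    """
    from pyspark.sql import functions as F
    return (
        events
        .withWatermark("event_time", WATERMARK)  # bounds state kept for late data
        .groupBy(F.window("event_time", WINDOW), "detection_name")
        .count()
    )

def run_demo(seconds=30):
    """Run the aggregation on the built-in `rate` source and print to the console."""
    from pyspark.sql import SparkSession, functions as F
    spark = SparkSession.builder.appName("detection-counts-sketch").getOrCreate()
    events = (
        spark.readStream.format("rate").option("rowsPerSecond", 20).load()
        .withColumnRenamed("timestamp", "event_time")
        # Fake three detection names from the rate source's counter column.
        .withColumn(
            "detection_name",
            F.concat(F.lit("detection-"), (F.col("value") % 3).cast("string")),
        )
    )
    query = (
        detection_counts(events)
        .writeStream.outputMode("update")
        .format("console")
        .option("truncate", "false")
        .start()
    )
    query.awaitTermination(seconds)  # let the demo run for a bounded time
    query.stop()
    spark.stop()
```

Calling `run_demo()` on a machine with PySpark installed prints updated per-window counts for each micro-batch. Windowing, late-data handling, and state management come from the engine, which is the kind of "already solved problem" a custom streaming app would otherwise have to re-implement.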
