Scalable Privacy-Preserving Machine Learning with Analytics Zoo and Intel SGX, Part I

This talk introduces how to build a scalable privacy-preserving machine learning system on Intel SGX using Analytics Zoo and Graphene-SGX, covering trusted Cluster Serving, privacy-preserving shared learning, and federated learning.

Key points:

  1. Analytics-Zoo
  2. Intel SGX & Graphene-SGX
  3. Secured Cluster Serving
  4. Shared Machine Learning & Federated Learning

Part I PPML & Federated Learning

Background & Status Quo
• Data Privacy & GDPR
• PPML (Privacy Preserving Machine Learning)

Technical Deep Dive
• Intel SGX & Graphene-SGX
• Analytics-Zoo
• PPML with Analytics-Zoo & Intel SGX


1. August 2020, Analytics-Zoo Team, Shi Dongjie & Gong Qiyuan

2. Outlines
Part 1 (Background & Status Quo)
• Data Privacy & GDPR
• PPML (Privacy Preserving Machine Learning)
Part 2 (Technical Deep Dive)
• Intel SGX & Graphene-SGX
• Analytics-Zoo
• PPML with Analytics-Zoo & Intel SGX

3. What is Data Privacy?
• Data privacy (information privacy) is the relationship between the collection and dissemination of data, technology, the public expectation of privacy, and the legal and political issues surrounding them.
• Privacy & Security
  • No security, then there is no privacy
  • Secured doesn't always mean private
  • Win-win: secured & private
https://en.wikipedia.org/wiki/Information_privacy

4. Data Privacy & Data Utility
It's a tradeoff:
• Max data utility (no privacy): all data is accessible; better analytics & machine learning with these data
• Max privacy (no utility): no data sharing any more; analytics & machine learning may be in trouble
• A balanced status (win-win): share some insensitive data; analytics & machine learning is good enough
Data privacy is challenging since it attempts to use data while protecting an individual's privacy preferences and personally identifiable information. The fields of computer security, data security, and information security all design and use software, hardware, and human resources to address this issue.
https://www.clipartmax.com/middle/m2H7G6Z5N4G6H7H7_plourde-jean-b-silhouette-libra-drawing-measuring-scales-balance-clip-art/
https://en.wikipedia.org/wiki/Information_privacy

5. Data Privacy in the Big Data Era
• Lots of personal data are collected
  • Personal data: ID, phone number, etc.
  • Photos & videos
  • Health data: movement, heart rate
• Indirect personal information is everywhere
  • Input patterns, search logs, click streams, etc.
  • Music/movies you liked/rated
https://en.wikipedia.org/wiki/Information_privacy
https://myphonefactor.in/2012/04/sensors-used-in-a-smartphone/
https://getsafeandsound.com/2018/09/cctv/

6. What is GDPR?
Increased penalties
• Up to €20 million or 2%-4% of turnover
Extended coverage
• Directly personal information, e.g., location
• Indirectly personal information, e.g., IP address
Gives users more control/rights
• Be informed
• Erasure
• Access
• Rectification
• Automated decision making & profiling
• …
https://www.sentinelone.com/blog/gdpr-coming-sentinelone-can-help/
https://blogs.gartner.com/richard-watson/stop-agonising-gdrp-opt-emails-start-thinking-cloud-providers/
https://www.wired.co.uk/article/what-is-gdpr-uk-eu-legislation-compliance-summary-fines-2018

7. Why Does GDPR Matter?
Privacy laws & regulations are a trend: GDPR, CCPA, and other privacy laws shift the privacy/data-utility balance toward privacy.
https://www.clipartmax.com/middle/m2H7G6Z5N4G6H7H7_plourde-jean-b-silhouette-libra-drawing-measuring-scales-balance-clip-art/
https://en.wikipedia.org/wiki/Information_privacy

8. What Happened After GDPR Took Effect?
• Adoption cost a lot of money
• Huge penalties: GDPR top 3/381 cases
https://www.forbes.com/sites/oliversmith/2018/05/02/the-gdpr-racket-whos-making-money-from-this-9bn-business-shakedown/#696bf80a34a2
https://www.enforcementtracker.com/

9. PPML (Privacy Preserving Machine Learning)
Using data to XXX without compromising privacy!
A brief history (from 1998 onward):
• PPDS (Privacy Preserving Data Sharing)
• PPDP (Privacy Preserving Data Publishing)
• PPDM (Privacy Preserving Data Mining)
• PPML (Privacy Preserving Machine Learning)
• PPDL (Privacy Preserving Deep Learning)
PPML sits at the intersection of data privacy and machine learning (privacy + AI).

10. Machine Learning
Machine Learning Yearning, Andrew Ng, 2016
https://intellipaat.com/blog/tutorial/data-science-tutorial/modeling-the-data/

11. PPML Attack Surface
Training data & input data, e.g., photos & faces
https://pythonawesome.com/vggface2-dataset-for-face-recognition/
https://www.ahrq.gov/ncepcr/tools/pf-handbook/mod8-app-b-monica-latte.html

12. PPML Attack Surface
Attacks on models: Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures, CCS 2015

13. PPML Attack Surface
Attacks on gradients: Deep Leakage from Gradients, NeurIPS 2019

14. PPML Related Techniques & Hot Topics
• PPML related techniques
  • TEE (Trusted Execution Environment)
  • HE (Homomorphic Encryption)
  • DP (Differential Privacy)
  • SMPC/MPC (Secure Multi-Party Computation)
• PPML hot topics
  • Secured model inference
  • Deep learning with DP
  • FL (Federated Learning)
  • TEE & HE for ML

15. TEE (Trusted Execution Environment)
A hardware security implementation. Main solutions:
• ARM TrustZone
• Intel SGX
Using the secured APIs directly means redesigning your app; we present a solution to this in Part 2.
https://source.android.com/security/authentication/fingerprint-hal
https://en.wikipedia.org/wiki/Touch_ID

16. HE (Homomorphic Encryption)
Compute on encrypted data: the client encrypts a and b and sends Enc(a), Enc(b) to the cloud; the cloud computes Enc(a*b) without ever decrypting; the client decrypts the result to obtain a*b.
First proposed in 1978; the first FHE scheme appeared in 2009.

17. HE (Homomorphic Encryption)
• Full Homomorphic Encryption (arbitrary computation): performance is not good enough
• Partial Homomorphic Encryption (limited computation): some operations are not supported
https://www.leiphone.com/news/202006/SbATMUxnVFkGtcSj.html
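Paillier encryption is a classic example of partial (additive) HE: multiplying two ciphertexts yields a ciphertext of the *sum* of the plaintexts, but arbitrary computation is not supported. A toy sketch in Python, illustrative only (tiny primes, no hardening; real deployments use libraries such as Microsoft SEAL or python-paillier):

```python
# Toy Paillier additively homomorphic encryption (NOT secure: 16-bit primes).
import random
from math import gcd

p, q = 293, 433                 # toy primes; real keys use 1024+ bit primes
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)
g = n + 1                       # standard simple choice of generator

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: Enc(a) * Enc(b) mod n^2 decrypts to a + b.
a, b = 17, 25
print(decrypt((encrypt(a) * encrypt(b)) % n2))  # 42
```

Note the limitation this illustrates: the scheme supports addition of plaintexts only, which is exactly the "some operations are not supported" caveat of PHE above.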

18. HE (Homomorphic Encryption)
A bright future for HE; the state of the art includes Microsoft SEAL.
Kristin Lauter's TED Talk on Private AI at Congreso Futuro during Panel 11 / SOLVE
https://www.microsoft.com/en-us/research/project/microsoft-seal/

19. Differential Privacy (DP)
Noise-based: Data + Noise = Secured Data
• The privacy/noise budget is hard to define
• The noise impacts accuracy
Already used in production:
https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf
https://github.com/google/differential-privacy
Proposed in 2006-2008 by Dwork et al. from MSR

20. Differential Privacy (DP)
With DP you can make your model learn common patterns in a dataset without memorizing individual examples:
• Add noise to the training data
• Add noise in SGD
http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html
Learning Differentially Private Recurrent Language Models, ICLR 2018
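The "add noise" idea can be sketched with the Laplace mechanism on a simple count query (assumed sensitivity 1; function names are illustrative, not from any particular DP library):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via inverse-CDF of a uniform draw."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release an epsilon-DP count using the Laplace mechanism."""
    return true_count + laplace_noise(sensitivity / epsilon)

# A smaller epsilon (tighter privacy budget) means more noise, less accuracy:
true_count = 100
for eps in (10.0, 1.0, 0.1):
    print(eps, round(dp_count(true_count, eps), 1))
```

This makes the slide's two caveats concrete: choosing epsilon (the budget) is the hard part, and the released value is only approximately the true count.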

21. Comparison of PPML Technologies
• Security (high to low): HE > MPC > TEE > DP > clear text
• Performance (high to low): clear text > DP > TEE > MPC > HE
• Data utility (high to low): clear text > TEE > HE > MPC > DP
Note that DP is a little special because of its privacy budget.

22. A Simple Example with TEE, HE & DP
• Colleagues in a department order takeout together, but vote to decide who goes to pick it up
• The person with the most votes picks up the food
• Nobody wants that person to know who voted for him/her (he might be your boss)
• How can this be done?

23. A Simple Example with TEE, HE & DP
• TEE (send encrypted data into the TEE, decrypt and compute inside)
  • Everyone folds up (encrypts) their ballot and drops it into a black box, and a trusted colleague, e.g., HR or a product manager, counts the votes
• HE (compute directly on encrypted data)
  • Everyone encrypts their vote with HE; any colleague (even the boss) can tally the encrypted votes, and only the final tally is decrypted
• DP (compute on noisy data)
  • Add noise to every ballot; no individual ballot can be read, but the aggregate result is roughly correct
With any of these methods, nobody can learn the actual ballots, so each voter's privacy is protected.
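The DP variant of this voting story can be sketched with randomized response, a classic noise mechanism: each reported ballot is true only with some probability, so no single report reveals a vote, yet the debiased tally is approximately correct. Candidate names and the probability p below are illustrative:

```python
import random

CANDIDATES = ["alice", "bob", "carol"]   # hypothetical colleagues

def noisy_ballot(true_vote, p=0.5):
    """Report the true vote with probability p, else a random candidate."""
    return true_vote if random.random() < p else random.choice(CANDIDATES)

def debiased_tally(reports, p=0.5):
    """Invert the noise: E[reported count] = p * true + (1 - p) * n / k."""
    n, k = len(reports), len(CANDIDATES)
    return {c: (reports.count(c) - (1 - p) * n / k) / p for c in CANDIDATES}

true_votes = ["alice"] * 6000 + ["bob"] * 3000 + ["carol"] * 1000
reports = [noisy_ballot(v) for v in true_votes]
estimate = debiased_tally(reports)
print({c: round(v) for c, v in estimate.items()})  # close to 6000/3000/1000
```

With only a handful of voters the noise can swamp the signal, which is the accuracy/budget tradeoff from the DP slides in miniature.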

24. Federated (Machine) Learning
• Addresses information silos
https://www.enterpriseirregulars.com/10802/information-silos-and-it-governance-failure/

25. Distributed Training in Deep Learning
Two common approaches to the "parameter synchronization" job are a parameter server and (as in BigDL) allreduce: each task computes a local gradient on its data partition, the gradients are summed/averaged across tasks, and every task applies the same weight update.
Accelerate training with more resources/nodes.
https://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf
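The synchronization step above can be sketched in a few lines of plain Python: each worker produces a local gradient, the allreduce averages them, and every worker applies the identical update (names are illustrative, not the BigDL API):

```python
def allreduce_mean(local_grads):
    """Average per-parameter gradients across workers (the allreduce result)."""
    n_workers = len(local_grads)
    return [sum(g[i] for g in local_grads) / n_workers
            for i in range(len(local_grads[0]))]

def sgd_step(weights, grad, lr=0.1):
    """The same update every worker applies after synchronization."""
    return [w - lr * g for w, g in zip(weights, grad)]

# Three workers, each holding its own mini-batch gradient for two parameters.
grads = [[0.2, -0.4], [0.4, 0.0], [0.0, -0.2]]
avg = allreduce_mean(grads)            # ~[0.2, -0.2]
weights = sgd_step([1.0, 1.0], avg)    # all workers end with identical weights
```

Because every worker sees the same averaged gradient, the replicas stay in sync without any worker seeing another worker's raw data, which is the property the federated setups below build on.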

26. Federated (Machine) Learning
Google, 2016-2018, on mobile devices (in production)
Motivation:
• More/better data on user devices
• Better models based on these data
TensorFlow Federated, TensorFlow Privacy

27. Federated (Machine) Learning
WeBank (Yang Qiang et al.) extends the scope of Google's federated learning (https://www.fedai.org/ & FATE), with a Federated Learning white paper and RFC.
Motivation:
• More/better data across different corporations
• Better models based on these data
• A federated data union (long term)

28. Federated (Machine) Learning
https://www.infoq.cn/article/gtvvYvcWecNKURxeYapD

29. Federated Learning with TEE
The parameter server runs in the cloud inside an SGX enclave. Local Analytics-Zoo instances train on their local datasets (and a cloud instance on the cloud dataset), send gradients to the parameter server over TLS, and receive the averaged gradient back.
https://github.com/intel-analytics/analytics-zoo
All Analytics-Zoo examples & models are supported.
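One round of this setup can be sketched as federated averaging (FedAvg): each party trains on its own data and ships only a model update, and the (enclave-hosted) parameter server averages the updates weighted by local dataset size. Everything below is an illustrative single-parameter sketch, not the Analytics-Zoo API:

```python
def local_update(w, local_data, lr=0.1):
    """One local epoch of SGD for a 1-parameter linear model y ~ w * x."""
    for x, y in local_data:
        grad = 2 * (w * x - y) * x     # gradient of the squared error
        w -= lr * grad
    return w

def fedavg_round(global_w, parties):
    """Server step: average local updates, weighted by local dataset size."""
    updates = [(local_update(global_w, data), len(data)) for data in parties]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Two parties whose private datasets both follow y = 2x.
parties = [[(1.0, 2.0), (2.0, 4.0)],   # party A's local dataset
           [(3.0, 6.0)]]               # party B's local dataset
w = 0.0
for _ in range(50):                    # 50 federated rounds
    w = fedavg_round(w, parties)
print(round(w, 2))                     # converges toward 2.0
```

Only `w` and the per-party updates cross the network; the raw (x, y) pairs never leave each party, and running the averaging step inside SGX additionally hides individual updates from the cloud operator.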