Pronto - Elasticsearch as Service in eBay
1. Pronto - Elasticsearch as Service in eBay
Ruan Yiming, 2019/11/23
2. Agenda
Overview of Elasticsearch
Elasticsearch in eBay
• Use Cases & Challenges
• Our Solution and Tools & How to Extend Elasticsearch
© 2018 eBay. All rights reserved.
3. Overview of Elasticsearch
History / Elastic Stack / Business Model
4. History of Elasticsearch
Elasticsearch
• An open source, distributed, near-real-time, RESTful search and analytics engine
• Ranked the No. 1 search engine by DB-Engines
History
• 2004: Shay Banon developed Compass, based on Lucene
• 2010: Shay Banon released the first version of Elasticsearch
• Half a year later: Series A venture funding (USD 10 million)
• 2012: Elastic Inc. founded
• Elasticsearch versions
  ⁃ 2015: 2.0 / 2016: 5.0 / 2017: 6.0 / 2019: 7.0
• More than 4 billion downloads
5. Elastic Stack
ELKB
• Elasticsearch - Search & Aggregation
• Logstash - ETL
• Kibana - Visualization
• Beats - Data Shipper
Use Cases & OOTB Solutions
• Logs / Metrics
• APM / Uptime
• SIEM / Endpoint Security
• Site Search / App Search / Enterprise Search
• Maps
6. Business Model
Public company (listed on the NYSE as ESTC in 2018)
• More than 4 billion downloads
• 7,200+ subscriptions in over 100 countries
Business Model
• Elastic vs. Splunk
• Elastic has opened the source of all its code, including the commercial parts
• Open Source / Basic / Gold / Platinum / Enterprise
• SaaS vs. on-premises
  ⁃ Elastic Cloud
  ⁃ Amazon Analytics Service
  ⁃ Alibaba & Tencent
7. Elasticsearch in eBay
Use Cases / Challenges
8. Use Cases in eBay
All clusters are managed by Pronto
• 100+ clusters (different SLAs)
• 6,000+ nodes (clusters of different sizes)
• From VMs (OpenStack) to containers (Kubernetes)
Use cases:
• Near-real-time search / aggregation
  ⁃ OMS / SEO / Terapeak
• Metrics & logs
  ⁃ UFES / Ceilometer / SRE / UMP
  ⁃ More than 20 TB/day for a single cluster
9. Vertical Shop & Tire Installation
10. Terapeak - eCommerce Data Insights
Terapeak
• SaaS-based tool that provides e-commerce data insights to online sellers
• Acquired by eBay
Tech Stack
• From RDBMS + Solr to ELK
• S3 and Hadoop for data staging
• Spark for data ETL
• Kafka for data queuing
• Postgres for the data warehouse
• Elasticsearch for indexing and search
• ReactJS for the front-end application
11. UFES - Anomaly Detection for SLB
Goal
• Unified Front End Services (UFES): move eBay closer to users so that the world shops first on eBay. The UFES team built out 8 new Internet Points of Presence (PoPs) across the globe
• Traffic needs to be routed via the UFES PoPs, replacing the NetScaler hardware SEO load balancers with Envoy Proxy-based software load balancers (SLB)
Elastic Stack
• Filebeat + Kafka + Elasticsearch clusters
• Dashboards for monitoring and comparison
• Anomaly detection for SLB
12. Ceilometer - IT Operation Analytics
13. Challenges of Managing Cluster Fleets at Scale
Integrate with eBay’s platform & follow its standards
• Configuration management & change management
• Full lifecycle management
Performance & High Availability
• Search: response time for site-facing applications should be less than 100 ms
• Ingestion: 20 TB per day for a single cluster
• Different deployments, such as cross-region deployment
Cost Control
• Hardware cost
• License fee (covers features such as security, alerting, and ML)
• Human resources: support (24x7 on-call support, rolling upgrades, etc.)
14. Cluster Provision & Management
From VM to Container
• VM (OpenStack)
  ⁃ Puppet Foreman infrastructure
  ⁃ Puppet module for Elasticsearch
• Container (K8s)
  ⁃ Operator pattern
  ⁃ Deployment + StatefulSet + Service
Best Practices & Different Deployments
• Important system configuration & best practices
• Anti-affinity (high availability)
• Cross-region deployment (high availability)
• Hot-warm architecture (cost saving)
• LB for write / read
15. Tools for Cluster Management
Sizing Tool / Diagnostic Tool / Performance Testing Tool / Index Lifecycle Management Tool
16. Use Case Onboarding
Capacity planning
• What are the use case and usage scenarios?
  ⁃ Data retention / active period
• Performance
  ⁃ Index rate / search rate
  ⁃ Document & bulk size
• Deployment & cost
  ⁃ How many nodes?
  ⁃ What is the hardware configuration?
  ⁃ What kind of deployment should be used?
• Best practices
  ⁃ Software configuration
  ⁃ Deployment in different regions
  ⁃ Keep enough margin that traffic can grow without causing performance issues
Resource needs by node type
Node              Storage   Memory    CPU       Network
Master            Low       Low       Low       Low
Data              Extreme   High      High      Medium
Ingest            Low       Medium    High      Medium
Coordinator       Low       Medium    Medium    Medium
Machine Learning  Low       Extreme   Extreme   Medium
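The capacity-planning questions above (retention, index rate, how many nodes) largely reduce to arithmetic. The following is a minimal sketch of such a calculation, not the logic of Pronto's sizing tool, which the slides do not describe. The names `SizingInput` and `estimate` are hypothetical, and the indexing overhead, disk size per node, and disk headroom are illustrative assumptions; the 30 GB target shard size is simply the middle of the 20 GB to 40 GB range used by the diagnostic rules later in the deck.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInput:
    daily_ingest_gb: float            # raw data indexed per day
    retention_days: int               # how long data stays in the cluster
    replicas: int = 1                 # replica copies per primary shard
    index_overhead: float = 1.1       # assumed on-disk expansion factor
    disk_per_node_gb: float = 4000.0  # assumed disk size of one data node
    max_disk_usage: float = 0.75      # assumed headroom so disks never fill up
    target_shard_gb: float = 30.0     # middle of the 20-40 GB shard-size range


def estimate(inp: SizingInput) -> dict:
    """Estimate total storage, data-node count, and shards per daily index."""
    daily_primary_gb = inp.daily_ingest_gb * inp.index_overhead
    # Every replica is a full extra copy of the primary data.
    total_gb = daily_primary_gb * inp.retention_days * (1 + inp.replicas)
    usable_per_node_gb = inp.disk_per_node_gb * inp.max_disk_usage
    return {
        "total_storage_gb": total_gb,
        "data_nodes": math.ceil(total_gb / usable_per_node_gb),
        "primary_shards_per_daily_index": max(
            1, math.ceil(daily_primary_gb / inp.target_shard_gb)
        ),
    }


if __name__ == "__main__":
    # Example: 1 TB/day kept for 7 days with one replica.
    print(estimate(SizingInput(daily_ingest_gb=1000, retention_days=7)))
```

A real sizing exercise would also account for memory, CPU, and network per node type, as in the table above; this sketch covers storage only.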
17. Sizing Tool
18. Customer Support
Support model
• Different SLAs for different use cases
  ⁃ Search response time should be less than 100 ms
  ⁃ Cluster should NOT be in RED
• 24x7 support for tier 1 and tier 2
  ⁃ SEC call / PagerDuty
Support cases
• Cluster in RED
  ⁃ Node missing and replica is 0
  ⁃ Dangling index
• Response time
  ⁃ Full GC because of machine check error (MCE)
  ⁃ Too many shards and fields
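The two SLA conditions above can be checked mechanically. The sketch below assumes a dictionary shaped like the response of the Elasticsearch cluster health API (only its `status` field is used) and a search latency measurement supplied by the caller. The function name `sla_violations` and the choice of a 99th-percentile latency are assumptions for illustration; the slides state only that response time should be below 100 ms.

```python
from typing import List


def sla_violations(health: dict, p99_search_ms: float,
                   sla_ms: float = 100.0) -> List[str]:
    """Return human-readable SLA violations; an empty list means healthy."""
    violations = []
    # "status" is green, yellow, or red in a cluster health response.
    if health.get("status") == "red":
        violations.append("cluster is RED")
    # The SLA asks for strictly less than the limit.
    if p99_search_ms >= sla_ms:
        violations.append(
            f"search p99 of {p99_search_ms:.0f} ms breaks the {sla_ms:.0f} ms SLA"
        )
    return violations


if __name__ == "__main__":
    for problem in sla_violations({"status": "red"}, p99_search_ms=250.0):
        print(problem)
```

A non-empty result is what would trigger the paging path for tier 1 and tier 2 clusters.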
19. Diagnostic Tool
Features
• Find improper settings or usage
• Job scheduler & diagnostic report for potential issues
Rules
• Too many indices / too many shards / index has too many fields
• Shard size check (20 GB to 40 GB)
• Imbalanced shards
• Replica count should be greater than 0
• Node missing / rack ID attribute missing / minimum master nodes
• Machine check error / server disk full
• Alias & index template checking
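Several of the rules above are simple threshold checks. The sketch below shows three of them (replica count, field count, shard size) applied to a hypothetical per-index summary; the input layout and the function name `diagnose` are assumptions, not the diagnostic tool's actual data model. The field limit of 1000 is an assumed threshold that matches Elasticsearch's default `index.mapping.total_fields.limit`.

```python
from typing import List

GIB = 1024 ** 3


def diagnose(indices: List[dict], min_shard_gb: float = 20.0,
             max_shard_gb: float = 40.0, max_fields: int = 1000) -> List[str]:
    """Apply replica, field-count, and shard-size rules to index summaries."""
    findings = []
    for idx in indices:
        name = idx["name"]
        if idx["replicas"] < 1:
            findings.append(f"{name}: replica count is 0")
        if idx["field_count"] > max_fields:
            findings.append(
                f"{name}: {idx['field_count']} fields exceeds {max_fields}"
            )
        for shard_no, size_bytes in enumerate(idx["primary_shard_bytes"]):
            size_gb = size_bytes / GIB
            if not min_shard_gb <= size_gb <= max_shard_gb:
                findings.append(
                    f"{name}[{shard_no}]: shard size {size_gb:.1f} GB is "
                    f"outside {min_shard_gb:.0f}-{max_shard_gb:.0f} GB"
                )
    return findings


if __name__ == "__main__":
    report = diagnose([{
        "name": "logs-2019.11.23",
        "replicas": 0,
        "field_count": 1500,
        "primary_shard_bytes": [50 * GIB, 30 * GIB],
    }])
    print("\n".join(report))
```

Running such checks on a schedule and collecting the findings per cluster is one way to produce the diagnostic report mentioned under Features.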
20. Performance & User Scenarios
Behavior      Use Cases
Index heavy   Logging / Metrics / Security / APM
Search heavy  App Search / Site Search / Analytics
Update heavy  Caching / Systems of Record
Hybrid        Transaction Search
Many factors:
• Index / Shard
• Query / Scripting
• Mapping / Setting
21. Performance Issues & Optimization
Wildcard search
• Customers use patterns beginning with * or ?
• Avoid leading * or ?
Stopwords & Shard Size
• Reindex with stop words
• Use more shards to improve throughput
• Watch your shard size / shrink the index
Too many indices / shards / fields
• Close or delete unused indices
• Improve the document modeling
• Disable dynamic mapping
Performance Optimization
• Disable swapping & give memory to the file system cache
• Unset or increase the refresh interval
• Disable refresh and replicas for initial loads
• Use auto-generated IDs
• Disable the features you do not need
• Don’t use the default dynamic string mapping
• Force merge
• Pre-index data
• Avoid scripts
• Force-merge read-only indices
• Warm up global ordinals
• Replicas might help with throughput, but not always
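Two of the tips above translate directly into code: disabling refresh and replicas for an initial load, and rejecting leading wildcards. The sketch below only builds request bodies for the index update-settings API (`PUT /<index>/_settings`) and performs no network call. The setting names `index.refresh_interval` and `index.number_of_replicas` are real Elasticsearch settings; the function names and the restored values (`30s`, one replica) are illustrative assumptions.

```python
def bulk_load_settings() -> dict:
    """Settings to apply before an initial bulk load."""
    return {
        "index": {
            "refresh_interval": "-1",   # -1 disables periodic refresh
            "number_of_replicas": 0,    # replicate once, after the load
        }
    }


def restore_settings(replicas: int = 1, refresh_interval: str = "30s") -> dict:
    """Settings to apply after the load finishes (values are assumptions)."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


def has_leading_wildcard(pattern: str) -> bool:
    """True if a wildcard pattern starts with * or ?, which is costly to run."""
    return pattern[:1] in ("*", "?")


if __name__ == "__main__":
    print(bulk_load_settings())
    print(restore_settings())
    print(has_leading_wildcard("*timeout"))
```

A gateway or client library could call `has_leading_wildcard` before forwarding a wildcard query and ask the user to rewrite it.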
22. Performance Testing Tool
Performance testing
• Testing data
• Testing scripts
• Test report for analysis
Web-based tool
• Built on Gatling
• Web UI to select the testing scripts and testing data
• Test report for analysis
23. Data Management & Optimization
Backup & Restore
• Snapshot lifecycle management (Swift as the repository)
Time series data
• Benefits of using time-based indices
  ⁃ Deleting an index is faster than delete-by-query
  ⁃ Use hot-warm architecture
  ⁃ Close indices or force-merge read-only indices
• Time series data
  ⁃ Terapeak vs. UFES (different needs)
Lifecycle management
• Central policy management / Web UI / OOTB policies
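The benefits of time-based indices listed above follow from one idea: an index's age decides what happens to it. The sketch below classifies daily indices into hot, warm (candidates for force-merge or closing), and delete. The `prefix-YYYY.MM.DD` naming scheme, the function names, and the example thresholds are assumptions for illustration, not Pronto's lifecycle policies.

```python
from datetime import date, timedelta
from typing import Dict, List


def index_name(prefix: str, day: date) -> str:
    """Build a daily index name such as logs-2019.11.23."""
    return f"{prefix}-{day:%Y.%m.%d}"


def plan(prefix: str, today: date, days: List[date],
         hot_days: int, retention_days: int) -> Dict[str, List[str]]:
    """Classify daily indices by age into hot, warm, and delete."""
    result: Dict[str, List[str]] = {"hot": [], "warm": [], "delete": []}
    for day in days:
        age = (today - day).days
        if age >= retention_days:
            phase = "delete"   # drop the whole index, no delete-by-query
        elif age >= hot_days:
            phase = "warm"     # read-only: force-merge or close
        else:
            phase = "hot"      # still being written to
        result[phase].append(index_name(prefix, day))
    return result


if __name__ == "__main__":
    today = date(2019, 11, 23)
    last_ten_days = [today - timedelta(days=n) for n in range(10)]
    print(plan("logs", today, last_ten_days, hot_days=2, retention_days=7))
```

Different use cases would plug in different thresholds, which is one way to read the "Terapeak vs. UFES (different needs)" bullet.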
24. Index Management Tool vs. Curator
Function               Curator  Index Management Tool    Elastic ILM
High Availability      NA       YES                      YES
Web UI                 NA       YES                      YES
Version Compatibility  NA       2.x / 5.x / 6.x / 7.x    6.8+
Multi-Cluster          NA       YES                      NA
25. Extend Elasticsearch Capability
Security / Alerting / Machine Learning for Anomaly Detection
26. Data Security
Security is free starting in versions 6.8 and 7.1
• TLS for encrypted communications
• File and native realms for creating and managing users
• Role-based access control for controlling user access to cluster APIs and indices
27. Solution and Security Plugin for Elasticsearch
Pronto Security Plugin
• TLS for encrypted communications
• Cluster / index level RBAC
• Follows eBay’s standards
  ⁃ API key for applications
  ⁃ 2FA for user login
  ⁃ Audit logs
Security Considerations
• Authentication / RBAC
• Certificate retention
• Firewall / IP whitelist
• Vulnerability management
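Index-level RBAC, as listed above, comes down to matching a requested index and privilege against the roles a caller holds. The sketch below is a toy model of that check and not the Pronto security plugin's implementation, which the slides do not show. The role names, the `ROLES` table, and the `is_allowed` function are hypothetical; index patterns are matched with shell-style wildcards from the standard library.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable

# Hypothetical role definitions: index patterns plus granted privileges.
ROLES: Dict[str, dict] = {
    "logs_reader": {"indices": ["logs-*"], "privileges": {"read"}},
    "logs_writer": {"indices": ["logs-*"], "privileges": {"read", "write"}},
}


def is_allowed(user_roles: Iterable[str], index: str, privilege: str,
               role_defs: Dict[str, dict] = ROLES) -> bool:
    """True if any of the user's roles grants the privilege on the index."""
    for role in user_roles:
        definition = role_defs.get(role)
        if definition is None:
            continue  # unknown roles grant nothing
        if privilege in definition["privileges"] and any(
            fnmatchcase(index, pattern) for pattern in definition["indices"]
        ):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed(["logs_reader"], "logs-2019.11.23", "read"))
    print(is_allowed(["logs_reader"], "logs-2019.11.23", "write"))
```

The deny-by-default shape (return `False` unless a role explicitly matches) is the design choice worth keeping in any real implementation.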
28. X-Pack Subscription
License cost
• License fee is based on the node count
How to Extend
• Develop a Kibana application
• Integrate with the alerting and anomaly detection service
29. Alerting Service
Schedule
• A schedule for running a query and checking the condition.
Query
• The query to run as input to the condition. Watches support the full Elasticsearch query and aggregation.
Condition
• A condition that determines whether or not to execute the actions. You can use simple conditions (always true), or use scripting for more sophisticated scenarios.
Action
• One or more actions, such as sending email or pushing data to third-party systems through a webhook.
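The four parts described above (schedule, query, condition, action) fit a small data model. The sketch below is a toy in-memory version meant only to show how the parts interact; it is not the alerting service's API. The `Watch` class, its field names, and the fixed-interval schedule are assumptions, and the query is a plain callable standing in for an Elasticsearch search.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Watch:
    interval_seconds: int                # schedule: how often to run
    query: Callable[[], dict]            # returns a search-response-like payload
    condition: Callable[[dict], bool]    # decides whether actions fire
    actions: List[Callable[[dict], None]] = field(default_factory=list)

    def is_due(self, last_run: float, now: float) -> bool:
        """True once the schedule interval has elapsed since the last run."""
        return now - last_run >= self.interval_seconds

    def run(self) -> bool:
        """Run the query, test the condition, and fire actions if it holds."""
        payload = self.query()
        if not self.condition(payload):
            return False
        for action in self.actions:
            action(payload)
        return True


if __name__ == "__main__":
    notifications = []
    watch = Watch(
        interval_seconds=60,
        query=lambda: {"hits": {"total": 5}},          # stand-in for a search
        condition=lambda p: p["hits"]["total"] > 3,    # more than 3 error hits
        actions=[notifications.append],                # stand-in for a webhook
    )
    if watch.is_due(last_run=0.0, now=60.0):
        watch.run()
    print(notifications)
```

An "always true" condition is simply `lambda payload: True`, and an email or webhook sender would replace the list-append action.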