- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Scala集合API性能
展开查看详情
1 .Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin
2 . Who am I? • Engineer at Goldman Sachs • Technical lead for GS Collec9ons – A replacement for the Java Collec9ons Framework
3 . Goals • Scala programs ought to perform as well as Java programs, but Scala is a liDle slower
4 . Goals From RedBlackTree.scala /* * Forcing direct fields access using the @inline annotation * helps speed up various operations (especially * smallest/greatest and update/delete). * * Unfortunately the direct field access is not guaranteed to * work (but works on the current implementation of the Scala * compiler). * * An alternative is to implement the these classes using plain * old Java code... */
5 . Goals • Scala programs ought to perform as well as Java programs, but Scala is a liDle slower • Highlight a few performance problems that maDer to me • Suggest fixes, borrowing ideas and technology from GS Collec9ons
6 . Goals GS Collec9ons and Scala’s Collec9ons are similar • Mutable and immutable interfaces with common parent • Similar itera9on paDerns at hierarchy root Traversable and RichIterable • Lazy evalua9on (view, asLazy) • Parallel-‐lazy evalua9on (par, asParallel)
7 . Agenda GS Collec9ons and Scala’s Collec9ons are different • Persistent data structures • Hash tables • Primi9ve specializa9on • Fork-‐join
8 .Persistent Data Structures
9 . Persistent Data Structures • Mutable collec9ons are similar in GS Collec9ons and Scala – though we’ll look at some differences • Immutable collec9ons are quite different • Scala’s immutable collec9ons are persistent • GS Collec9ons’ immutable collec9ons are not
10 . Persistent Data Structures Wikipedia: In compu9ng, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
11 . Persistent Data Structures Opera9ons yield new Wikipedia: updated structures when they are “modified” In compu9ng, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
12 . Persistent Data Structures • Typical examples: List and Sorted Set
13 . Persistent Data Structures • Typical examples: List and Sorted Set • “Add” by construc9ng O(log n) new nodes from the leaf to a new root. • Scala sorted set à binary tree • GSC immutable sorted set à covered later
14 . Persistent Data Structures Wikipedia:
15 . Persistent Data Structures • Typical examples: List and Sorted Set • “Prepend” by construc9ng a new head in O(1) 9me. LIFO structure. • Scala immutable list à linked list or stack • GSC immutable list à array backed
16 . Persistent Data Structures • Extremely important in purely func9onal languages • All collec9ons are immutable • Must have good run9me complexity • No one seems to miss them in GS Collec9ons
17 . Persistent Data Structures • Proposal: Mutable, Persistent Immutable, and Plain Immutable • Mutable: Same as always • Persistent: Use when you want structural sharing • (Plain) Immutable: Use when you’re done muta9ng or when the data never mutates
18 . Persistent Data Structures Plain old immutable data structures • Not persistent (no structural sharing) • “Adding” is much slower: O(n) • Speed benefits for everything else • Huge memory benefits
19 . Persistent Data Structures • Immutable array-‐backed list • Immutable à trimmed array • No nodes à 1/6 memory • Array backed à cache locality, should parallelize well
20 . Persistent Data Structures Lists: Size in MB by number of elements 30 25 20 Size (MB) 15 10 5 0 Size (num elements) com.gs.collec9ons.api.list.ImmutableList scala.collec9on.immutable.List scala.collec9on.mutable.ArrayBuffer Measured with Java 8 64-‐bit and compressed oops
21 . Persistent Data Structures Performance assump9ons: • Itera9ng through an array should be faster than itera9ng through a linked list • Linked lists won’t parallelize well with .par • No surprises in the results – so we’ll skip github.com/goldmansachs/gs-‐collec9ons/tree/master/jmh-‐tests github.com/goldmansachs/gs-‐collec9ons/tree/master/memory-‐tests
22 . Persistent Data Structures • Immutable array-‐backed sorted set • Immutable à trimmed, sorted array • No nodes à ~1/8 memory • Array backed à cache locality
23 . Persistent Data Structures Sorted Sets: Size in MB by number of elements 45 40 35 30 Size (MB) 25 20 15 10 5 0 Size (num elements) com.gs.collec9ons.api.set.sorted.ImmutableSortedSet com.gs.collec9ons.impl.set.sorted.mutable.TreeSortedSet scala.collec9on.immutable.TreeSet scala.collec9on.mutable.TreeSet
24 . Persistent Data Structures • Assump9ons about contains: • May be faster when it’s a binary search in an array (good cache locality) • Will be about the same in mutable / immutable tree (all nodes have same structure)
25 . Persistent Data Structures Sorted Set contains() throughput (higher is beAer) 6 5 4 Million ops/s 3 2 1 0 Immutable Mutable GSC Scala
26 .Itera9on
27 . Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) def serial_immutable_scala(): Unit = { val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } }
28 . Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) Lazy evalua9on so we don’t create def serial_immutable_scala(): Unit = intermediate { collec9ons val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } }
29 . Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) Lazy evalua9on so we don’t create def serial_immutable_scala(): Unit = intermediate { collec9ons val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) Termina9ng in .toList .map(String.valueOf) .map(Integer.valueOf) or .toBuffer would be .count(each => (each + 1) % 10000 != 0) dominated by if (count != 999800) collec9on crea9on { throw new AssertionError } }