Scala集合API性能
展开查看详情
1.Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin
2. Who am I? • Engineer at Goldman Sachs • Technical lead for GS Collec9ons – A replacement for the Java Collec9ons Framework
3. Goals • Scala programs ought to perform as well as Java programs, but Scala is a liDle slower
4. Goals From RedBlackTree.scala /* * Forcing direct fields access using the @inline annotation * helps speed up various operations (especially * smallest/greatest and update/delete). * * Unfortunately the direct field access is not guaranteed to * work (but works on the current implementation of the Scala * compiler). * * An alternative is to implement the these classes using plain * old Java code... */
5. Goals • Scala programs ought to perform as well as Java programs, but Scala is a liDle slower • Highlight a few performance problems that maDer to me • Suggest fixes, borrowing ideas and technology from GS Collec9ons
6. Goals GS Collec9ons and Scala’s Collec9ons are similar • Mutable and immutable interfaces with common parent • Similar itera9on paDerns at hierarchy root Traversable and RichIterable • Lazy evalua9on (view, asLazy) • Parallel-‐lazy evalua9on (par, asParallel)
7. Agenda GS Collec9ons and Scala’s Collec9ons are different • Persistent data structures • Hash tables • Primi9ve specializa9on • Fork-‐join
8.Persistent Data Structures
9. Persistent Data Structures • Mutable collec9ons are similar in GS Collec9ons and Scala – though we’ll look at some differences • Immutable collec9ons are quite different • Scala’s immutable collec9ons are persistent • GS Collec9ons’ immutable collec9ons are not
10. Persistent Data Structures Wikipedia: In compu9ng, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
11. Persistent Data Structures Opera9ons yield new Wikipedia: updated structures when they are “modified” In compu9ng, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
12. Persistent Data Structures • Typical examples: List and Sorted Set
13. Persistent Data Structures • Typical examples: List and Sorted Set • “Add” by construc9ng O(log n) new nodes from the leaf to a new root. • Scala sorted set à binary tree • GSC immutable sorted set à covered later
14. Persistent Data Structures Wikipedia:
15. Persistent Data Structures • Typical examples: List and Sorted Set • “Prepend” by construc9ng a new head in O(1) 9me. LIFO structure. • Scala immutable list à linked list or stack • GSC immutable list à array backed
16. Persistent Data Structures • Extremely important in purely func9onal languages • All collec9ons are immutable • Must have good run9me complexity • No one seems to miss them in GS Collec9ons
17. Persistent Data Structures • Proposal: Mutable, Persistent Immutable, and Plain Immutable • Mutable: Same as always • Persistent: Use when you want structural sharing • (Plain) Immutable: Use when you’re done muta9ng or when the data never mutates
18. Persistent Data Structures Plain old immutable data structures • Not persistent (no structural sharing) • “Adding” is much slower: O(n) • Speed benefits for everything else • Huge memory benefits
19. Persistent Data Structures • Immutable array-‐backed list • Immutable à trimmed array • No nodes à 1/6 memory • Array backed à cache locality, should parallelize well
20. Persistent Data Structures Lists: Size in MB by number of elements 30 25 20 Size (MB) 15 10 5 0 Size (num elements) com.gs.collec9ons.api.list.ImmutableList scala.collec9on.immutable.List scala.collec9on.mutable.ArrayBuffer Measured with Java 8 64-‐bit and compressed oops
21. Persistent Data Structures Performance assump9ons: • Itera9ng through an array should be faster than itera9ng through a linked list • Linked lists won’t parallelize well with .par • No surprises in the results – so we’ll skip github.com/goldmansachs/gs-‐collec9ons/tree/master/jmh-‐tests github.com/goldmansachs/gs-‐collec9ons/tree/master/memory-‐tests
22. Persistent Data Structures • Immutable array-‐backed sorted set • Immutable à trimmed, sorted array • No nodes à ~1/8 memory • Array backed à cache locality
23. Persistent Data Structures Sorted Sets: Size in MB by number of elements 45 40 35 30 Size (MB) 25 20 15 10 5 0 Size (num elements) com.gs.collec9ons.api.set.sorted.ImmutableSortedSet com.gs.collec9ons.impl.set.sorted.mutable.TreeSortedSet scala.collec9on.immutable.TreeSet scala.collec9on.mutable.TreeSet
24. Persistent Data Structures • Assump9ons about contains: • May be faster when it’s a binary search in an array (good cache locality) • Will be about the same in mutable / immutable tree (all nodes have same structure)
25. Persistent Data Structures Sorted Set contains() throughput (higher is beAer) 6 5 4 Million ops/s 3 2 1 0 Immutable Mutable GSC Scala
26.Itera9on
27. Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) def serial_immutable_scala(): Unit = { val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } }
28. Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) Lazy evalua9on so we don’t create def serial_immutable_scala(): Unit = intermediate { collec9ons val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } }
29. Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) Lazy evalua9on so we don’t create def serial_immutable_scala(): Unit = intermediate { collec9ons val count: Int = this.scalaImmutable .view .filter(each => each % 10000 != 0) Termina9ng in .toList .map(String.valueOf) .map(Integer.valueOf) or .toBuffer would be .count(each => (each + 1) % 10000 != 0) dominated by if (count != 999800) collec9on crea9on { throw new AssertionError } }