This talk outlines the work that we did at Twitter to add semantic functionality to code browsing, code review and code evolution tools. We use SemanticDB, an open-source data model for semantic information developed as part of Scalameta. We have implemented experimental improvements to the Twitter development workflow, integrating open-source and closed-source solutions.

1. How We Built Tools That Scale to Millions of Lines of Code
Eugene Burmako
Twitter, Inc.
6/20/2018

2. About me
● Founder of Scala macros, Scalameta and Rsc
● Member of the Scala Improvement Process committee
● PhD from Martin Odersky’s lab at EPFL (2011-2016)
● Tech lead of the Advanced Scala Tools team at Twitter (2017-present)

3. Credits

4. Core contributors
Advanced Scala Tools team at Twitter:
● Eugene Burmako
● Shane Delmore
● Uma Srinivasan

5. Early adopters
● Build team
● Continuous Integration team
● Code Review team
● Core Data Libraries team
● Core Systems Libraries team
● Other folks at Twitter


7. Problem statement

8. Huge codebase (ca. 2017)
● ~2^25 (about 33 million) lines of human-written code
● ~2^16 (about 65,000) build targets

9. Need for semantic tooling (ca. 2017)
● Not enough to treat programs like text
● Need to understand semantics:
○ What does this identifier resolve to?
○ What are all the usages of this definition?
○ What is the type of this expression?
○ Etc.

10. Prioritized user asks (ca. 2017)
● Code browsing
● Code review
● Code evolution

11. State of semantic tooling (ca. 2017)
● Code browsing = IDEs, but IDEs couldn't load the entire Twitter codebase
● Code review = Phabricator, which didn’t have Scala integration
● Code evolution = scala-refactoring, which didn’t have a maintainer
● Also, several proprietary solutions with varying Scala support

12. Advanced Scala Tools team
● Founded in June 2017
● Mission: “Raise the bar on what is possible for an effective Scala development environment both at Twitter and in the Scala community”
● Roadmap: improve code browsing, code review and code evolution in the Twitter development workflow

13. Existing semantic APIs

14. Existing semantic APIs (ca. 2017)
● Scala compiler internals
● scala.reflect (thin wrapper over compiler internals)
● ScalaSignatures (serialization format for compiler internals)

15. Blocker #1: Learning curve
● Compiler internals span dozens of modules and thousands of methods
● Complicated data model and arcane preconditions for the APIs
● I did a PhD on scalac internals, and I still can’t make sense of all of it

16. Blocker #2: Scarce documentation
● Scala requires an extensive semantic API
● This requires lots and lots of documentation
● Even for scala.reflect, the documentation lags significantly behind

17. Blocker #3: Compiler instance
● Compiler internals require a compiler instance
● This means poor performance even for simple operations like “Go to definition” or “Find all usages”
● Tools that use Scala compiler internals either roll their own indexer or accept the limitations
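Blocker #3 turns on the difference between asking a live compiler and asking precomputed data. The sketch below illustrates the second approach under stated assumptions: symbol occurrences are assumed to have already been extracted during the build (how that happens is out of scope here), and the names `Position`, `Occurrence`, `SymbolIndex` and the symbol strings are hypothetical. With such an index, “Go to definition” and “Find all usages” reduce to map lookups that need no compiler instance.

```scala
// Hypothetical indexer sketch: all names and symbol strings are illustrative.
object SymbolIndexSketch {
  // A zero-based position in a source file.
  final case class Position(uri: String, line: Int, character: Int)

  // One occurrence of a symbol, as a build-time extractor might record it
  // (the extractor itself is assumed, not shown).
  final case class Occurrence(symbol: String, position: Position, isDefinition: Boolean)

  // Built once from extracted occurrences, then queried without a compiler.
  final class SymbolIndex(occurrences: Seq[Occurrence]) {
    private val bySymbol: Map[String, Seq[Occurrence]] = occurrences.groupBy(_.symbol)

    // "Go to definition": the recorded definition site of a symbol, if any.
    def definition(symbol: String): Option[Position] =
      bySymbol.getOrElse(symbol, Seq.empty).find(_.isDefinition).map(_.position)

    // "Find all usages": every non-definition occurrence of a symbol.
    def usages(symbol: String): Seq[Position] =
      bySymbol.getOrElse(symbol, Seq.empty).filterNot(_.isDefinition).map(_.position)
  }

  // Hypothetical data for one method defined in one file and used in two others.
  val example: SymbolIndex = new SymbolIndex(Seq(
    Occurrence("example/Greeter.greet().", Position("Greeter.scala", 1, 6), isDefinition = true),
    Occurrence("example/Greeter.greet().", Position("Main.scala", 3, 12), isDefinition = false),
    Occurrence("example/Greeter.greet().", Position("Admin.scala", 7, 4), isDefinition = false)
  ))
}
```

The trade-off is freshness: the index answers instantly but only reflects the code as of the last extraction, whereas a compiler instance always sees the current sources.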

18. Future semantic APIs

19. Future semantic APIs (ca. 2020)
● scala.reflect is based on Scala compiler internals, so it was discarded in Dotty
● Meet Tasty: a serialization format for Dotty compiler internals
● Used in the Dotty IDE and the upcoming Dotty macro system

20. library/src/scala/tasty/Tasty.scala

    abstract class Tasty {
      ...
      // DefDef
      type DefDef <: Definition
      implicit def defDefClassTag: ClassTag[DefDef]
      val DefDef: DefDefExtractor
      ...
    }

21. library/src/scala/tasty/Universe.scala

    trait Universe {
      val tasty: Tasty
      implicit val context: tasty.Context
    }

    object Universe {
      implicit def compilationUniverse: Universe =
        throw new Exception("Not in inline macro.")
    }

22. compiler/.../CompilationUniverse.scala

    import ...

    class CompilationUniverse(val context: Context) extends scala.tasty.Universe {
      val tasty: TastyImpl.type = TastyImpl
    }

23. Summary
● In its current form, Tasty looks very similar to scala.reflect, but reimplemented for Dotty
● Still based on compiler internals
● Still underdocumented
● Still requires a compiler instance

24. Rolling our own semantic APIs

25. Scalameta (ca. 2013)
● Open-source metaprogramming library
● Created almost 5 years ago during my time at EPFL
● Focused on tool writers

26. Scalameta (ca. 2018)
● More than 10 projects
● More than 10,000 commits
● More than 200 contributors
● Funded by Twitter and Scala Center

27. SemanticDB
● Data model for semantic information about programs
● Focused on what tool writers need from the compiler...
● ...not on what is convenient to expose in the compiler
● Collaboration between Eugene Burmako (a compiler writer) and Ólafur Páll Geirsson (a tool writer)

28. Interchange format

    message TextDocument {
      Schema schema = 1;
      string uri = 2;
      string text = 3;
      Language language = 10;
      repeated SymbolInformation symbols = 5;
      repeated SymbolOccurrence occurrences = 6;
      repeated Diagnostic diagnostics = 7;
      repeated Synthetic synthetics = 8;
    }
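To make the schema concrete, here is a minimal consumer-side sketch. It hand-writes a simplified mirror of a few SemanticDB messages instead of using the classes Scalameta generates from the schema; the object name `InterchangeSketch`, the helper methods `textAt`, `definedSymbols` and `referencedSymbols`, and the one-line example document are assumptions for illustration. The point it shows is that once a `TextDocument` is available, questions like “which symbols does this file define?” become ordinary collection operations.

```scala
// Hand-written, simplified mirror of a few SemanticDB messages. NOT the
// classes Scalameta generates; only a subset of fields is modeled.
object InterchangeSketch {
  // Zero-based, end-exclusive source range (as assumed in this sketch).
  final case class Range(startLine: Int, startCharacter: Int, endLine: Int, endCharacter: Int)

  sealed trait Role
  object Role {
    case object Reference extends Role
    case object Definition extends Role
  }

  final case class SymbolOccurrence(range: Range, symbol: String, role: Role)

  // Subset of TextDocument: uri, text and occurrences only.
  final case class TextDocument(uri: String, text: String, occurrences: Seq[SymbolOccurrence]) {
    // Source text covered by a single-line range.
    def textAt(range: Range): String = {
      require(range.startLine == range.endLine, "this sketch handles single-line ranges only")
      val line = text.split("\n", -1)(range.startLine)
      line.substring(range.startCharacter, range.endCharacter)
    }

    // Symbols this file defines, in source order.
    def definedSymbols: Seq[String] =
      occurrences.collect { case SymbolOccurrence(_, sym, Role.Definition) => sym }

    // Symbols this file refers to, without duplicates.
    def referencedSymbols: Seq[String] =
      occurrences.collect { case SymbolOccurrence(_, sym, Role.Reference) => sym }.distinct
  }

  // Hypothetical, hand-written document for a one-line file.
  val doc: TextDocument = TextDocument(
    uri = "A.scala",
    text = "object A { val b: Int = 1 }",
    occurrences = Seq(
      SymbolOccurrence(Range(0, 7, 0, 8), "_empty_/A.", Role.Definition),
      SymbolOccurrence(Range(0, 15, 0, 16), "_empty_/A.b.", Role.Definition),
      SymbolOccurrence(Range(0, 18, 0, 21), "scala/Int#", Role.Reference)
    )
  )
}
```

Because each document carries its own `text`, a tool can map every occurrence back to source without reopening files or running the compiler.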

29. Example

    object Test {
      def main(args: Array[String]): Unit = {
        println("hello world")
      }
    }
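For this example, the occurrences a SemanticDB producer would record can be written out by hand to show how “What does this identifier resolve to?” is answered without a compiler. This is illustrative, not captured compiler output: it assumes `Test.scala` is laid out over five lines with two-space indentation, and the symbol strings follow the SemanticDB symbol format as we understand it (`_empty_/` for the empty package, `#` for types, `.` for terms, `(+1)` for the second `println` overload in `Predef`). `ExampleLookup` and `resolve` are hypothetical names.

```scala
// Illustrative only: occurrences written by hand, not produced by a
// SemanticDB compiler plugin. All names here are hypothetical.
object ExampleLookup {
  // Assumed layout of Test.scala (zero-based lines).
  val text: String =
    """object Test {
      |  def main(args: Array[String]): Unit = {
      |    println("hello world")
      |  }
      |}
      |""".stripMargin

  final case class Range(startLine: Int, startCharacter: Int, endLine: Int, endCharacter: Int) {
    // True if the position lies inside this single-line, end-exclusive range.
    def contains(line: Int, character: Int): Boolean =
      line == startLine && line == endLine &&
        character >= startCharacter && character < endCharacter
  }

  final case class Occurrence(range: Range, symbol: String, isDefinition: Boolean)

  val occurrences: Seq[Occurrence] = Seq(
    Occurrence(Range(0, 7, 0, 11), "_empty_/Test.", isDefinition = true),
    Occurrence(Range(1, 6, 1, 10), "_empty_/Test.main().", isDefinition = true),
    Occurrence(Range(1, 11, 1, 15), "_empty_/Test.main().(args)", isDefinition = true),
    Occurrence(Range(1, 17, 1, 22), "scala/Array#", isDefinition = false),
    Occurrence(Range(1, 23, 1, 29), "scala/Predef.String#", isDefinition = false),
    Occurrence(Range(1, 33, 1, 37), "scala/Unit#", isDefinition = false),
    Occurrence(Range(2, 4, 2, 11), "scala/Predef.println(+1).", isDefinition = false)
  )

  // "What does the identifier under the cursor resolve to?"
  def resolve(line: Int, character: Int): Option[String] =
    occurrences.find(_.range.contains(line, character)).map(_.symbol)
}
```

A cursor on `println` (line 2, character 6) resolves to the `Predef` overload, while a cursor on the keyword `def` resolves to nothing, since keywords are not symbol occurrences.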