RapidMiner Named a Leader in Gartner’s 2019 Magic Quadrant for Data Science and Machine Learning Platforms for the Sixth Consecutive Year.
According to Gartner, Leaders should drive market transformation. They have the highest combined scores for Ability to Execute and Completeness of Vision. They are doing well and are prepared for the future with a clear vision and a thorough appreciation of the broader context of digital business. They have strong channel partners, a presence in multiple regions, consistent financial performance, broad platform support and good customer support.
Here is the complimentary Gartner report.
Licensed for Distribution
Magic Quadrant for Data Science and Machine Learning Platforms
Published 28 January 2019 - ID G00354456 - 79 min read
By Analysts Carlie Idoine, Peter Krensky, Erick Brethenoux, Alexander Linden
Expert data scientists, citizen data scientists and application developers require professional capabilities for building, deploying and managing analytical models. New vendors to this Magic Quadrant, along with changes in the positions of others, reflect a dynamic market that is evolving rapidly.
Market Definition/Description
This Magic Quadrant evaluates vendors of data science and machine learning (ML) platforms. These are software products that enable expert data scientists, citizen data scientists and application developers to create, deploy and manage their own advanced analytic models (see “Maximize the Value of Your Data Science Efforts by Empowering Citizen Data Scientists”).
We define a data science platform as: A cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solution, and for incorporating those solutions into business processes, surrounding infrastructure and products.
“Cohesive” means that the application’s basic building blocks are well-integrated into a single platform, that they provide a consistent “look and feel,” and that the modules are reasonably interoperable in support of an analytics pipeline. An application that is not cohesive (one that mostly uses or bundles various packages and libraries) is not considered a data science and ML platform, according to our definition.
A data science and ML platform supports various skilled data scientists in multiple tasks across the data and analytics pipeline. These range from data ingestion, data preparation, interactive exploration and visualization and feature engineering to advanced modeling, testing and deployment. Within the field of data science, ML is the most popular area. It is one that warrants specific attention by those evaluating these platforms.
Not all organizations build all of their data science and ML models from scratch. Some may need assistance with getting started with or extending their data science and ML initiatives. Although this Magic Quadrant does assess the availability of prepackaged content, such as templates and samples, it does not assess service providers who can help jump-start or extend data science and ML applications throughout an organization (as outlined in “Market Guide for Data Science and Machine Learning Service Providers”). Nor does this Magic Quadrant assess specialized vendors of industry-, domain- or function-specific solutions.
Readers of this Magic Quadrant should understand that:
■ This market features a diverse range of vendors: Gartner invited a wide range of data science and ML platform vendors to participate in the evaluation process for potential inclusion in this Magic Quadrant. Users of these platforms, who include data scientists, citizen data scientists and application developers, have different requirements and preferences for user interfaces (UIs) and tools. Expert data scientists prefer to code data science models in Python or R, or to build and run data models in notebooks. Other users are most comfortable building models by using a point-and-click UI to create visual pipelines. Many members of emerging citizen data science communities favor a much more augmented approach that uses ML techniques “behind the scenes” to guide these less expert data scientists through the model building and operationalization process (see “Build a Comprehensive Ecosystem for Citizen Data Science to Drive Impactful Analytics”). Over time, expert data scientists may also come to prefer an augmented approach, which would enable them to navigate the model-building and operationalization process more efficiently. Tool and use case diversity is more important than ever.
■ A Leader may not be the best choice: The wide range of products available offers a breadth and depth of capability, and varied approaches to developing, operationalizing and managing models. It is therefore important to evaluate your specific needs when assessing vendors. A vendor in the Leaders quadrant, for example, might not be the best choice for you. Equally, a Niche Player might be the perfect choice. For an extensive review of the functional capabilities of each platform, see “Critical Capabilities for Data Science and Machine Learning Platforms.” Bear in mind that this Magic Quadrant includes only a small selection of the hundreds of vendors in this market.
■ Only vendors with commercially licensable products are included: Pure open-source platforms are excluded from this Magic Quadrant. Only commercially licensed open-source platforms are included. We do, however, recognize the growing trend for commercial platforms to use open-source libraries and content. Vendors take different approaches to including and supporting open source. Open-source solutions represent an opportunity for both users and vendors to get started with data science and ML with little upfront investment (see Note 1). In addition, many users of data science and ML platforms are either already proficient in or can easily learn and apply open-source technologies. Leveraging open source through collaborative or orchestrated integration with commercial offerings also eliminates the need for a vendor to re-create specific capabilities within its own platform, as innovation is fast-paced within the open-source community. This approach enables a vendor to keep up with fast-changing algorithms and approaches, while focusing on capabilities that differentiate it from its competitors. However, a platform’s ease of use may suffer if its vendor does not account for the needs of all types of user.
■ Platforms must support not only model building but also model operationalization: The full benefit, including business value, of data science and ML will not be achieved unless models are both:
1. Embedded in business processes
2. Maintained, monitored and managed over time
The Gartner Data Science Team Survey of January 2018 found that over 60% of models developed with the intention of operationalizing them were never actually operationalized (see “How to Operationalize Machine Learning and Data Science Projects”). There are many reasons for this, but a crucial one is a lack of tools to enable and facilitate operationalization, which is not just about deployment. Operationalization extends to ongoing review and adjustment of models to ensure their relevancy over time as the business and its objectives change. It also requires ongoing management of models across the organization.
■ Artificial intelligence (AI) is hyped: Hype about AI is at its peak, but AI must be distinguished from data science and ML. Of course, data science is a core discipline for the development of AI, and ML is a core enabler of AI, but this is not the whole story. ML is about creating and training models; AI is about using those models to infer conclusions under certain conditions. AI is on a different level of aggregation to data science and ML. AI is at the application level. Data science and ML models must be combined to work together with other capabilities, such as a UI and workflow management, to constitute an AI application. A self-driving car, for example, has ML capability, but its AI requires much more than that.
The diversity of data science and ML platforms largely reflects the wide range of people that use them. This Magic Quadrant is therefore aimed at a variety of audiences:
■ Citizen data scientists: Increasingly, these are accessing data and building data science and ML models. They are people who need access to data science and ML capabilities, but who do not have the advanced skills of traditional expert data scientists. Citizen data scientists can come from roles such as business analyst, line of business (LOB) analyst, data engineer and application developer. They need to understand the nature of the data science and ML market, and how it differs from, but complements, the analytics and business intelligence (BI) market (see “Magic Quadrant for Analytics and Business Intelligence Platforms”). Citizen data scientists do not replace expert data scientists but, instead, work in collaboration with them.
■ Line of business (LOB) data science teams: Typically, these are sponsored by their LOB’s executive and charged with addressing LOB-led initiatives in areas such as marketing, risk management and CRM. They focus on their own and their department’s priorities. Levels of collaboration with other LOB data science teams vary. LOB data science teams can include both expert and citizen data scientists.
■ Corporate data science teams: These have strong and broad executive sponsorship, and can take a cross-functional perspective from a position of enterprisewide visibility. In addition to supporting model building, they are often charged with defining and supporting an end-to-end process for building and deploying data science and ML models. They often work in partnership with LOB data science teams in multitier organizations. In addition, they might provide LOB assistance for LOB teams that do not have their own data scientists. Corporate data science teams typically include expert data scientists.
■ “Maverick” data scientists: These are typically one-off scientists in various LOBs. They tend to work independently on “point” solutions and usually strongly favor open-source tools, such as Python, R and Apache Spark. They rarely collaborate much with other data scientists or departments within their organization.
Magic Quadrant
Figure 1. Magic Quadrant for Data Science and Machine Learning Platforms
Source: Gartner (January 2019)
Vendor Strengths and Cautions
Alteryx
Alteryx (https://www.alteryx.com/) is based in Irvine, California, U.S. It provides four software products, which comprise its data science platform. The Alteryx Analytics platform includes Alteryx Connect, Alteryx Designer, Alteryx Server and Alteryx Promote.
Alteryx has turned from a Leader into a Challenger by maintaining its position for Ability to Execute but demonstrating less vision relative to many other vendors in this Magic Quadrant. Nevertheless, Alteryx’s emphasis on making data science accessible to citizen data scientists and others across the end-to-end analytic pipeline is resonating in the market. Its approach provides a natural extension for a client base focused on data preparation but ready to take the next step into data science. A lack of innovation, relative to others, also contributes to Alteryx’s new position as a Challenger.
Strengths
■ Collaborative enablement of broad user base: Alteryx’s no-code approach is attractive to a broad spectrum of users, from business and data analysts to citizen data scientists. A focus on the ease of use and cohesiveness of its platform enables collaboration between users.
■ End-to-end pipeline: Alteryx has focused on offering a complete, end-to-end data science platform. It has added two new products to its platform. Alteryx Connect focuses on data connections, data discovery and social connections. Alteryx Promote incorporates Alteryx’s Yhat acquisition and focuses on operationalizing analytic content.
■ Marketing execution: Alteryx’s focus on addressing the end-to-end analytic process easily and clearly positions it as a vendor of a comprehensive platform. Alteryx’s value proposition is clear and resonating.
■ Strong customer experience: Alteryx scored in the top quartile for customer experience in our survey of reference customers. Scores were consistently high for overall customer experience, plans to make additional investments, inclusion of product enhancements and requested features into subsequent releases, and overall product capabilities.
Cautions
■ Data preparation legacy and market perception: Although Alteryx has built a strong brand by providing self-service data preparation, many potential customers are unfamiliar with the model-building and operationalization capabilities provided. Despite progress, the company’s “legacy” reputation continues to obscure its full value proposition.
■ Innovation: Alteryx’s innovation scores were low, relative to other vendors in this Magic Quadrant. Alteryx is not a standout vendor in terms of automation and augmentation, deep learning or the Internet of Things (IoT).
■ Market understanding: Alteryx scored below the average for market understanding. It appeals primarily to citizen data scientists and entry-level data scientists, and does not cater significantly for cutting-edge or code-focused expert data scientists.
Anaconda
Anaconda (https://www.anaconda.com/) is based in Austin, Texas, U.S. It offers Anaconda Enterprise 5.2, a data science development environment based on the interactive notebook concept (this analysis excludes the Conda Distribution Packages) that sees users exploiting open-source Python and R-based packages. Anaconda continues to provide a loosely coupled distribution environment, which offers access to a wide range of open-source development environments and open-source libraries, primarily Python-based. Anaconda benefits from the growing popularity of Python, the newly preeminent language for data scientists.
Anaconda remains a Niche Player. It still suffers from a disparity between its power to federate a very large number of Python developers, who are continuously building additional capabilities, and its lack of control over these developers’ efforts in terms of quality, dependability and predictability. Anaconda is well-suited to seasoned data scientists who are fluent in Python or R and eager to explore a continuous stream of capabilities in Anaconda Cloud, while still benefiting from an environment more structured than a pure notebook environment.
Strengths
■ Python and open-source support: The dominance of Python among data scientists gives Anaconda great visibility to developers. Anaconda is the only data science vendor not just supporting but also indemnifying and securing the Python open-source community. In the past year, the company has revamped its user interface by providing enhanced collaboration and model reproducibility features, giving data scientists better productivity and model management capabilities.
■ Active ecosystem: Reference customers praised Anaconda’s extensive and active community engagement. The community fosters cutting-edge Python code libraries and integration with other open-source data science projects. Anaconda Cloud also provides wide means of collaboration and code library exchanges, for data scientists and developers to explore and accelerate model development production, whether in the cloud or on-premises.
■ Scalable development for open-source libraries: Anaconda’s scalability takes two main forms: capabilities relating to automatic GPU code production and the ability to embed its platform seamlessly within any of the large cloud providers.
Cautions
■ Designed for experts: Anaconda targets experienced data scientists familiar with Python and notebooks. Many data scientists’ favorites, including the widely popular Jupyter notebooks, are readily available for use through Anaconda’s environment. But, however flexible they are, those environments are not conducive to fruitful discussions with business users, and a capability to support such exchanges is increasingly valued by large organizations lacking data science talent.
■ Open-source shortcomings: Like many open-source promoters, Anaconda suffers from the usual drawbacks associated with large and flexible developer communities: backward compatibility issues between versions; lack of visibility into important upcoming capabilities (model operationalization, for example); lack of code optimization for models’ integration with existing applications; and, despite marked progress in terms of workbench homogeneity, a lack of overall coherence.
■ Automation and augmentation: Novice Anaconda users will have difficulty finding their way through the Python “jungle.” Citizen data scientists will find themselves in uncharted territory within Anaconda’s environment. Also, the do-it-yourself skills and attitude exhibited by typical Anaconda users are not suited to ML automation practices (such as AutoML’s automation of only part of the model development process), which are increasingly popular with data scientists.
Databricks
Databricks (https://databricks.com/) is based in San Francisco, U.S. Its Apache Spark-based Unified Analytics Platform combines data engineering and data science capabilities that use a variety of open-source languages. In addition to Spark, the platform provides proprietary features for security, reliability, operationalization, performance and real-time enablement on Amazon Web Services (AWS). Azure Databricks, which became generally available in March 2018, is an integrated service within Microsoft Azure that provides a high-performance Apache Spark-based platform optimized for Azure.
Databricks remains a Visionary by providing support for the end-to-end analytic life cycle, hybrid cloud environments and accessibility for a wide variety of users. A focus on innovation and a consistently strong and comprehensive product offering have enabled Databricks to improve its position for both Ability to Execute and Completeness of Vision.
Strengths
■ Innovation: Breadth and ease of open-source integration, streaming IoT capabilities and operationalization capabilities are key differentiators for Databricks. Its platform extends on open-source capabilities by providing the framework needed for end-to-end enterprise scalability, performance and operationalization. Databricks Delta, launched in October 2017, provides a managed cloud service for unified data management with support for streaming analytics and ML. MLflow, launched in June 2018, includes support for experimentation, reproducibility and deployment. The Databricks Runtime for Machine Learning provides preconfigured clusters for deep learning.
■ Partnership with Microsoft: Azure Databricks has quickly gained traction within the Azure community. Azure Databricks adds global scale to Databricks’ effective marketing and sales strategy. It includes an interactive workspace for collaboration between data scientists, data engineers and business analysts, single click-to-launch Spark environment capability, and integration with Azure components.
■ Customer appreciation: Surveyed reference customers scored Databricks in the top quartile for both customer experience and operations. Databricks received the highest overall scores for both overall vendor experience and quality of documentation.
Cautions
■ Pricing and contract negotiation: Although Databricks reduces the Spark total cost of ownership (TCO) for comparable loads, Databricks’ reference customers scored it in the bottom quartile for evaluation and contract negotiation experience and for predictable, controllable pricing. Customers raised concerns about pricing and value received, relative to investment.
■ Troubleshooting and debugging: Databricks’ reference customers indicated difficulty with troubleshooting errors and debugging applications. They would like improved debugging aids, as well as assistance with navigating Spark errors.
■ Cost monitoring and management: Reference customers indicated a need for improved capabilities for monitoring and managing costs between different user groups and managing accounts. Databricks received the second-lowest overall score for predictable and controllable ongoing software cost.
Dataiku
Dataiku (https://www.dataiku.com/) is headquartered in New York City, U.S., and has a main office in Paris, France. It offers Data Science Studio (DSS) with a focus on cross-discipline collaboration and ease of use.
Dataiku’s appearance in the Challengers quadrant is principally due to its strong execution and strengthening capabilities in relation to scalability.
A focus on real-time analytics capabilities and expansion of the breadth of its use cases could move Dataiku into the Leaders quadrant. Ease of use and collaboration across data science roles and between data science teams remain two of its platform’s major assets. In 2018, Dataiku executed very well against its vision and delivered capabilities that make DSS a more mature platform.
Strengths
■ Collaboration across data science roles: From its inception, teamwork has been at the core of Dataiku’s DSS platform. From data engineers to data scientists, subject matter experts and citizen data scientists, all of the principal roles involved in developing ML models have their place on the platform. It offers various user interface endpoints, all pointing to the same execution core, which makes it one of the most coherent offerings in this Magic Quadrant.
■ Ease of use: The qualities of Dataiku most often highlighted by clients are that its platform is relatively easy to learn and that it provides a rapid path to productivity. Automated ML capabilities and the explicit possibility of quickly delivering interpretable models also contribute to the platform’s intuitive feel and short learning curve. This approachable quality does not prevent experts and experienced data scientists from using notebook-style development capabilities that can complement overall team productivity.
■ Operationalization capabilities: Operationalization is the area in which DSS has made the most progress in the past year. Dataiku now includes model management and monitoring as flexible deployment options. Its logical data science process pipeline view, and coherence around the various roles throughout that process, form a strong foundation for more comprehensive operationalization capabilities.
Cautions
■ Scalability of models in production: Despite Dataiku’s newly improved model operationalization capabilities, some clients are still looking for a more seamless, less manual experience with better visualization functions for getting models into production. Reference customers reported that even the new model deployment capabilities do not always deliver as advertised.
■ Price and licensing: Dataiku’s customers often report a suboptimum sales experience, with some considering the company pricey, some regretting a lack of pricing visibility, and some frustrated by cumbersome pricing structures. Confusion about Dataiku’s pricing and licensing has been known to delay the sales cycle.
■ Streaming and the IoT: Being focused on ease of use and collaboration across roles, Dataiku’s DSS seems favored by service-centric organizations. This could also be explained by the fact that Dataiku does not offer strong IoT capabilities and functions, even if it has shown that its platform can operate in demanding and computationally intensive environments.
DataRobot
DataRobot (https://www.datarobot.com/) is based in Boston, Massachusetts, U.S. It provides an augmented data science and ML platform. The platform automates key tasks, enabling data scientists to work efficiently and citizen data scientists to build models easily.
DataRobot is a new entrant to this Magic Quadrant. It debuts as a Visionary. Its Completeness of Vision is supported by strong marketing and sales strategies and innovation. Its Ability to Execute is limited by being one of the newer entrants to this market (it launched in 2013), but supported by positive customer feedback.
Strengths
■ Thought leader in augmented data science: DataRobot sets the standard for augmented data science and ML. Significant funding has enabled expansion via acquisitions to address time series modeling (Nutonian in May 2017) and an augmented approach for developers to incorporate models into applications (Nexosis in July 2018). These acquisitions give DataRobot the opportunity to extend its capabilities to new types of user, while focusing on its core competency of augmentation.
■ Strong customer experience: Reference customers scored DataRobot in the top quartile for overall experience with a vendor and in the top half for both overall product capabilities and inclusion of product enhancements and requests into subsequent releases. DataRobot’s customer-facing data scientists, assigned to each client to jump-start initiatives, provide a unique approach to supporting and onboarding clients.
■ Market responsiveness: DataRobot’s market responsiveness is strong. Despite being a relative newcomer to the data science and ML market, DataRobot has a solid installed base. In addition, the company is quickly gaining market traction.
Cautions
■ Sales execution and pricing: Feedback from reference customers about DataRobot’s pricing indicates that it is high, which makes it difficult to scale the use of its software. They scored DataRobot in the lowest quartile for pricing and sales execution. Only 35% of surveyed respondents (the lowest percentage among participating vendors) strongly believe they will make additional investments in DataRobot’s software.
■ Commoditization of augmented analytics: Augmented analytics is becoming increasingly available in both data science and ML platforms and analytics and BI platforms. As these capabilities become commoditized, DataRobot will need to continue to differentiate itself. Open-source automated ML, though nascent, is also a long-term threat.
■ Growth and scalability challenges: DataRobot’s augmented capabilities are strong with regard to model creation, but do not extend across the analytic process to data preparation and operationalization. (Additional capabilities for model management were added in August 2018, after the cutoff date for consideration in this Magic Quadrant and, as such, were not evaluated.) Although customers like the promise of an augmented approach, the need for additional tools and skills for complete analyses is a concern. The market will soon demand end-to-end augmented capabilities. DataRobot’s concierge service featuring customer-facing data scientists is also difficult to scale.
Datawatch (Angoss)
Datawatch (https://www.datawatch.com/) is based in Bedford, Massachusetts, U.S. In January 2018, it acquired Angoss and its main data science product components. These include KnowledgeSEEKER, the most basic offering, aimed at citizen data scientists in a desktop context; KnowledgeSTUDIO, which includes many more models and capabilities than KnowledgeSEEKER; and KnowledgeENTERPRISE, a flagship product that includes the full range of capabilities.
Angoss has over two decades’ experience, and has loyal customers in services-centric and financial services organizations in particular. Often praised for its ease of use and intuitive interface, Angoss should benefit from Datawatch’s extensive experience in data management and preparation. However, the inherent risk and integration uncertainties associated with every acquisition have impacted the company’s scores for Completeness of Vision and Ability to Execute. In theory, Angoss and Datawatch could emerge as a strong combination in this market, but the integration seems to still be a work in progress, which contributes to Datawatch’s position as a Niche Player.
Strengths
■ Ease of use and product coherence: Customers continue to commend Angoss’ intuitive interface and well-rounded product functionality. The platform is well-suited to citizen data scientists looking for technological depth, reliability and a quick path to productivity. But the corollary of that strength is an above-average perceived TCO.
■ Customer support: Faithful to its long history, Angoss still strives to build strong customer relationships, as does its new parent. The company’s responsiveness and genuine interest in its clients’ business outcomes is the basis for its customers’ loyalty, something that “acquisition jitters” could compromise, if not handled properly.
■ Additional analytical functionality: Angoss has one of the few platforms that offer strong additional and well-integrated capabilities, such as an optimization engine (with KnowledgeOPTIMIZER) and text analytics capabilities (through KnowledgeREADER).
Cautions
■ Acquisition uncertainties: Every acquisition, even one that is thoughtful and logical from a technology perspective, entails significant risks. One of the most important, in this case, could be the risk of losing critical data science talent that is difficult to replace. Another is that the acquisition might delay important roadmap elements that are anxiously expected by clients, such as model operationalization capabilities. Potential customers should seek reassurance about Angoss’ roadmap execution before investing in its platform.
■ Vision and innovation: Despite progress from a development perspective, Angoss is losing ground in relation to advances such as deep learning and augmented ML functionality. The company also needs to improve its support for, and integration of, open-source capabilities.
■ Data preparation: Another area where clients have been asking for better support and functionality is data preparation and management. Given Datawatch’s strong expertise, the Angoss data science platform should see significant improvement in 2019.
Domino
Domino (Domino Data Lab) (https://www.dominodatalab.com/) is headquartered in San Francisco, California, U.S. The Domino Data Science Platform represents a comprehensive end-to-end solution designed for expert data scientists. The platform incorporates both open-source and proprietary tool ecosystems, while providing capabilities for collaboration, reproducibility, and centralization of model development and deployment.
Domino, which was founded in 2013, has moved from the Visionaries quadrant to the Niche Players quadrant. Its market presence continues to grow, but with a focus on the expert data science community. Low customer feedback scores and reliance on many components for comprehensive capabilities contribute to its new position in the Magic Quadrant.
Strengths
■ Good product strategy: Reference customers scored Domino in the top quartile for product strategy. Product bundling and configuration is straightforward. The roadmap focuses on collaboration and accessibility, building a tool- and platform-agnostic ecosystem, and driving the end-to-end analytic process through to operationalization, with the goal of making data science a scalable enterprise capability.
■ Open-source and proprietary tool integration: Domino emphasizes the ability to use open-source and proprietary software within a consistent container. Breadth and ease of open-source integration are strengths. Strong open-source support enables data scientists to use their tools of choice within the platform. Again this year, Domino received the highest overall score for flexibility, extensibility and openness.
■ Collaboration and scalability: Domino’s reference customers especially praised the ease of collaboration in a shared environment for project teams using various tools, in conjunction with the ability to scale up computing resources as needed.
Cautions
■ Narrow focus on expert data scientists: In a market that is changing quickly and increasingly focused on multiple types of user with varying skill levels, Domino’s approach of targeting expert data scientists by bringing multiple tools together in one environment reduces its platform’s reach. Additionally, its platform is less differentiated than in previous years. It caters to sophisticated audiences, but fails to connect with growing citizen user groups.
■ Poor operations support: Domino has faced challenges scaling its operation support. Relative to other vendors in this Magic Quadrant, reference customers put Domino in the lowest quartile for analytic support, including training and guidance with technique selection. They also placed Domino in the lower half for service and support and overall integration and deployment.
■ Lack of some capabilities: Domino lacks several key capabilities in terms of data access and data preparation, automation and augmentation, user interface for nonexperts and “precanned” solutions. These shortcomings make Domino’s tool hard to use for nonexpert data scientists who need a user-friendly, guided approach to data preparation and exploration, and to creating and operationalizing models.
Google
Google (https://www.google.com/), a subsidiary of Alphabet, is based in Mountain View, California, U.S. Its core ML platform offerings include Cloud ML Engine, Cloud AutoML, the open-source TensorFlow, and the recently announced BigQuery ML. Its ML components require other Google components for end-to-end capabilities, such as Google Cloud Dataprep, Google Datalab, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Data Studio, Kubeflow and Google Kubernetes Engine. Most of these components require the presence of the Google Cloud Platform (GCP).
The following ML tools were not in general availability by the cutoff point for full inclusion in this evaluation: Cloud AutoML, Google Data Studio, Kubeflow Pipelines, AI Hub and BigQuery ML. As such, these tools could not be included in the Ability to Execute scoring, but did contribute to Google’s position for Completeness of Vision. Strengths ■ Breadth of offerings: Google offers a rich ecosystem of AI products and solutions, ranging from hardware (Tensor Processing Unit [TPU]) and crowdsourcing (Kaggle) to world-class ML components for processing unstructured data like images, video and text. Google is also one of the pioneers of automated ML (with Cloud AutoML). It excels even more with its industry-leading open-source TensorFlow offering for deep neural nets. ■ Scalability and speed: Most of Google’s ML components are meant to run at scale on its high-performing public cloud environment, GCP. It offers a fully managed environment in
which ML can be implemented at scale, which is rivaled by few other companies at this point. Google’s TensorFlow and Kubernetes are the most popular choices for many AI startups and for cutting-edge university AI and data science teams. ■ Geared toward developers: With Cloud AutoML and most of its other tools, Google offers a high-quality software solution for non-data scientists, especially software developers. GCP users familiar with GCP tools will have increasingly less reason to look elsewhere for ML capabilities. Matters will improve even further with the addition of upcoming AutoML capabilities and on-premises functionality (such as those in the Kubernetes-based Kubeflow). Cautions ■ Lack of an end-to-end, coherent and easy-to-use ML offering: The third strength noted above also represents the biggest drawback for two large groups of data science constituents: core data scientists who lack familiarity with GCP and (even more so) business-oriented citizen data scientists. For those users, the learning curve for some Google tools can be steep. The offering is very fragmented and easily becomes overwhelming, especially for the many casual data scientists who spend only 15% to 20% of their time on core ML projects. ■ Currently limited on-premises capabilities: Users who require full on-premises capabilities can only use the open-sourced TensorFlow and Google’s Kubernetes-based Kubeflow. Yet most other development tools and prebuilt ML components reside fully in the cloud and are therefore less suitable for the many organizations that prefer to conduct ML capabilities on-premises. ■ Low-level instrumentalization, reuse and project transparency: Even GCP developers will find little support for the creation and reuse of long ML pipelines. The tool chain and concepts simply do not yet allow for the notion of end-to-end ML projects. Google also makes heavy use of the open-source arsenal of ML components, yet offers little or no coherent project management support.
Google’s new Kubeflow Pipelines may address some of these deficiencies, but it could not be evaluated as its announcement came after the cutoff date for this Magic Quadrant. H2O.ai H2O.ai (http://h2o.ai/) is based in Mountain View, California, U.S. and offers the free open-source H2O Open-Source Machine Learning (H2O, Sparkling Water and H2O4GPU) and a commercial product called H2O Driverless AI. H2O’s core strength is its high-performing ML components, which are tightly integrated within several competing platforms evaluated in this Magic Quadrant.
H2O.ai has lost some ground in terms of Ability to Execute relative to other vendors in this Magic Quadrant, largely due to comparatively low scores from reference customers for several critical capabilities. Although H2O.ai’s Completeness of Vision is still strong, competitors are catching up in several key innovation areas. This has resulted in its new status as a Visionary. Strengths ■ High-performance ML components: H2O.ai’s open-source ML components are effectively an industry standard, with many other platforms integrating them (for example, those of Alteryx, Dataiku, Domino, IBM, KNIME, RapidMiner and TIBCO Software). H2O.ai’s components are highly optimized and parallelized for CPU multicore and multinode configurations. H2O4GPU offers a software layer for significant GPU acceleration. ■ Innovation: With its deep learning layer (Deep Water), its GPU layer and automation capabilities (H2O Driverless AI), H2O.ai outpaces most of its competitors in the employment of cutting-edge technology capabilities. No other vendor in this Magic Quadrant got higher marks from reference customers in the area of product roadmap and future vision. ■ Automation: With its commercial product, H2O Driverless AI, H2O.ai established a premier position in the automated ML domain, being rivaled by only a select few. The product provides automated feature engineering, model selection and hyperparameter tuning. Driverless AI is highly scalable, but also very compute-intensive, and exports the complete pipeline as either MOJO/POJO objects or a Python-scoring pipeline. H2O.ai also offers an open-source version of augmented data science and ML, called AutoML, which seems far less powerful, but can be used from within its open-source software platform. Cautions ■ Steep learning curve for nondevelopers to use open-source offering: The open-source version of H2O is still highly notebook-centric and therefore caters more to data-science-savvy developers and coding-heavy data scientists.
Usability is improved when there is a point-and-click interface, as with offerings from KNIME and RapidMiner. H2O.ai’s H2O Driverless AI product is much simpler to use, but is a distinct product from its open-source offering. ■ Little native interoperability between H2O Driverless AI and open-source platform: H2O.ai’s open-source product line is not fully interoperable with H2O Driverless AI, which is an impediment to potential collaboration by differently skilled enterprise users. ■ Lack of some product capabilities: H2O.ai’s surveyed reference customers identiﬁed deﬁciencies in critical capabilities like platform, project and model management, and a signiﬁcant lack of features for data access and preparation, compared with other platforms in this Magic Quadrant. H2O.ai retained excellent marks only in the critical capability
categories of ML, performance and delivery. IBM IBM (https://www.ibm.com/us-en/) is based in Armonk, New York, U.S. For this Magic Quadrant we evaluated two platforms: SPSS (including SPSS Modeler and SPSS Statistics) and Watson Studio, an offering that incorporates and builds on IBM’s previous Data Science Experience (DSX) product. IBM remains a Visionary, but has lost ground in terms of both Completeness of Vision and Ability to Execute, relative to other vendors. IBM has defined a clear product strategy and roadmap for the two platforms evaluated in this Magic Quadrant, but needs to prove that its new approach can deliver consistent customer success over time. Strengths ■ Strong visibility and mind share: IBM remains a frontrunner in terms of market share, with 9.5% of the data science platform software market. It is a very visible vendor in the data science and ML market. Its strategy, focused on the complete analytic pipeline, enables both expert and citizen data scientists to be productive. ■ Comprehensive roadmap and product integration: Watson Studio and its roadmap promise to deliver extensive openness, hybrid cloud support and strong analytic capabilities for both expert and citizen data scientists across the full analytic pipeline. Watson Studio provides a new, more modern approach, while continuing not only to support, but also to extend, the capabilities of SPSS. IBM has also delivered a new interface for its SPSS products that is cleaner, more appealing and, most importantly, integrates SPSS Modeler into Watson Studio. ■ Watson Studio customer experience and operations: Reference customers for Watson Studio gave excellent scores for their overall experience. Scores were also strong for IBM’s plans to make further investments and its inclusion of requested product enhancements. Scores for Watson Studio’s service and support and integration and deployment were excellent as well, but those from SPSS customers were in the bottom quartile.
Cautions ■ Further revamp of multipronged approach and evolving strategy: Watson Studio shows promise and is indicative of the direction of modern data science platforms. But in light of recent strategy shifts and rebranding, IBM needs to show consistency and long-term commitment to its strategy. ■ Multiple products required for complete capabilities: Multiple IBM products are required to obtain complete end-to-end capabilities. Multiple components potentially increase
complexity and cause confusion. They could also increase licensing costs. ■ Capability shortcomings across both platforms: SPSS received low scores for flexibility, extensibility and openness, automation and augmentation, and collaboration. Although, overall, Watson Studio provides stronger capabilities than SPSS, capabilities for data preparation, data exploration and visualization, delivery and precanned solutions are lacking. KNIME KNIME (https://www.knime.com/) is based in Zurich, Switzerland. It provides the KNIME Analytics Platform on a fully open-source basis for free, while a commercial extension, KNIME Server, offers more advanced functions, such as team, automation and deployment capabilities. KNIME remains a Leader in this Magic Quadrant. This is largely due to strong assessments by its customers, its competitive product offerings and its vision, which is one of the most balanced in this market. Strengths ■ Well-balanced execution and vision: With a wealth of well-rounded functionality, KNIME maintains its reputation for being the market’s “Swiss Army knife.” Its for-free and open-source KNIME Analytics Platform covers 85% of critical capabilities, and KNIME’s vision and roadmap are as good as, or better than, those of most of its competitors. ■ Sophistication with clear product bundling and low TCO: Even with advanced features like ML automation, hybrid cloud, new model management (KNIME Model Process Factory) and deployment, KNIME’s product segmentation has been clear and simple for many years. This simplicity does not prevent data scientists from exploiting advanced analytics features (such as deep learning); it simply makes that power accessible to citizen data scientists, too. ■ Ease of use by those with intermediate skills: KNIME’s platform addresses the intermediate user skills spectrum very well, and KNIME recently began to place increased emphasis on two of the other skills segments.
Less skilled users will appreciate the automated ML offerings included in KNIME Server. Developers will appreciate KNIME’s forthcoming Python integration, which will offer them two-way integration via the ability to call the KNIME API from within Python/Jupyter in order to use many of the thousands of KNIME modules and solutions. ■ Low barrier to entry and TCO: As in prior years, reference customers identiﬁed their main reasons for selecting the KNIME platform as its low TCO, predictable costs and value for money.
Cautions ■ Performance and scalability: Although KNIME has taken significant steps to improve its performance and scalability, these capabilities remain by far the chief customer concern in terms of overall product capabilities. KNIME received one of the lowest overall scores from customer references in this category. ■ Limited visibility as modern data science platform: KNIME does not have a reputation for offering a cutting-edge data science platform. This is due much more to its conservative marketing strategy and passive go-to-market approach than its real capabilities. Performance constraints and a reputation as a desktop tool also contribute to the platform sometimes being overlooked for large-scale data science deployments. ■ Limited traction in IoT domain: Real-time analytics and capabilities linked to IoT data often require a significant amount of development investment. Despite having a decent breadth of use cases, KNIME’s focus appears not to have been on asset-centric industries (which account for most IoT work). As the market evolves to embrace multiple data sources, this lack of focus could impair the company’s vision. ■ Relatively small team: Gartner has reservations about KNIME’s strategy of having a relatively small team of fewer than 60 full-time equivalents. Given the growing complexity of the ML stack, a small team is unlikely to be able to cope with the massive innovation that is transforming this market. MathWorks MathWorks (http://www.mathworks.com/) is headquartered in Natick, Massachusetts, U.S. Its two major products are MATLAB and Simulink, but only MATLAB met the inclusion criteria for this Magic Quadrant. MathWorks’ move from the Challengers quadrant to the Visionaries quadrant is essentially due to the company’s remarkable strength in relation to the increasingly demanding needs of asset-centric industries.
To serve this growing market, MathWorks has strengthened the coherence of the MATLAB platform for its engineering-focused audience by seamlessly integrating advanced functionality for the treatment of unconventional data sources (images, video and IoT data). Although MathWorks focuses on asset-centric industries, it also has customers in the ﬁnancial services sector. Strengths ■ Platform coherence and ease of use: Built from an engineering perspective, MATLAB offers a seamless experience, with operationalization as a fully integrated step. Mainly focused on industrial applications, MathWorks takes account of ﬁeld personnel and subject matter
experts’ experiences through a “citizen engineer” approach aimed at democratizing deployment of its platform. Support for this community is also secured through a rich community ecosystem. ■ Data preparation: In its analytics workflow, MATLAB does not explicitly separate ML and deep learning techniques, as algorithms are considered fit for solving specific problems. MathWorks’ platform therefore offers sophisticated data preparation and labeling capabilities for data that will be used to create deep learning models. ■ Advanced functionality: MathWorks has integrated into MATLAB a range of advanced techniques that can be used in complex use cases. Examples are interpreted notebook experiences (through MATLAB Live Editor), pretrained deep-learning models, embedded real-time analytics, streaming capabilities and simulation techniques (through digital-twin models). Cautions ■ Lack of focus on service-centric organizations: Although nothing prevents nonengineers from using MATLAB, MathWorks’ vision for data science does not focus on marketing, sales or customer service. Data science teams whose primary focus is marketing, sales or customer service should seek alternative platforms. ■ Full cloud platform support: MATLAB offers strong support for AWS and Microsoft Azure, but still lacks comprehensive support for GCP. Although environments such as the pervasive TensorFlow are supported within MATLAB, data science teams that rely on the tight integration between TensorFlow and GCP might not benefit from the performance usually attested in Google-only frameworks. ■ AutoML capabilities: MathWorks has considerably improved its access to open-source libraries and environments in the latest version of MATLAB. However, compared with some of its more visionary competitors, MathWorks still lacks a clear integration vision in relation to AutoML capabilities. Microsoft Microsoft (http://www.microsoft.com/) is based in Redmond, Washington, U.S.
It provides a number of software products for data science and ML. In the cloud, it offers Azure Machine Learning (including Azure Machine Learning Studio), Azure Data Factory, Azure HDInsight, Azure Databricks and Power BI. For on-premises workloads, Microsoft offers Machine Learning Server. Only Azure Machine Learning met the inclusion criteria for this Magic Quadrant, although Microsoft’s broader offerings did inﬂuence our assessments of Azure Machine Learning’s extended capabilities and Microsoft’s Completeness of Vision.
Microsoft remains a Visionary, having maintained a strong commitment to breadth and ease of open-source technology integration and excellence in relation to deep learning. Azure Machine Learning is not an option for the many data science teams and use cases that require a strictly on-premises product. Strengths ■ Cloud infrastructure approach: Although a significant number of on-premises devotees remain, more organizations are migrating to cloud and hybrid approaches to data science. Microsoft’s first-class cloud approach with Azure Machine Learning provides a fully managed, high-performing environment. The cloud platform also offers advantages in terms of performance tuning, scalability and agile support for open-source technology. ■ Extensive component and partner offerings: The Azure ecosystem offers a wide range of components for data science use cases such as streaming analytics (Azure Stream Analytics), the IoT (Azure IoT Hub) and deep learning (Microsoft Cognitive Toolkit and various open-source frameworks). Azure Databricks offers Microsoft customers first-class support for Apache Spark and integration with Azure tools, and several reference customers praised the platform’s automatic scaling and performance optimization. ■ Support for diverse data science personas: The Azure Machine Learning service offers a code-focused approach for expert data scientists, and Azure Machine Learning Studio offers a highly rated GUI for citizen data scientists. Azure Cognitive Services offers strong functionality and pretrained models for developers. Expert data scientists and data engineers will be drawn to the Azure Databricks offering. Further integration with Power BI brings entry-level ML to masses of business analysts. Cautions ■ Cloud-only applicability: Microsoft’s cloud-only approach with Azure Machine Learning continues to limit certain capabilities and reduces the product’s appeal to some data science teams.
The Azure Machine Learning service, a hybrid product for creating and deploying models on-premises, in the cloud and on the edge, was released in December 2018 — after the cutoff date for inclusion in this evaluation. ■ Automation and augmentation: Microsoft needs to keep pace with innovators in automating and assisting with data science tasks. New and improved features will be needed to maintain a competitive position in the fast-moving citizen data science market and to appeal to expert data scientists who use automated ML to accelerate their work. New automated ML capabilities in the Azure Machine Learning service were released in December 2018 — after the evaluation period for this Magic Quadrant. ■ Coherence: Although the Azure ecosystem offers diverse tools and approaches for data
science, many users find the number of components overwhelming and are frustrated by the overall user experience. RapidMiner RapidMiner (https://rapidminer.com/) is based in Boston, Massachusetts, U.S. Its platform includes RapidMiner Studio, RapidMiner Server, RapidMiner Cloud, RapidMiner Real-Time Scoring and RapidMiner Radoop. RapidMiner remains a Leader by striking a good balance between ease of use and data science sophistication. Its platform’s approachability is praised by citizen data scientists, while the richness of its core data science functionality, including its openness to open-source code and functionality, makes it appealing to experienced data scientists, too. Strengths ■ Sophisticated simplicity: Features such as Auto Model, augmented analytics capabilities such as Turbo Prep, and an above-average UI make RapidMiner Studio a favorite of citizen data scientists. More advanced users appreciate the richness of RapidMiner’s functionality, including the ability to access and reuse open-source capabilities, which increases their productivity and enables them to build and manage large numbers of models. ■ Advanced features: Ease of use does not preclude the presence of power. Beyond deep learning and GPU support, RapidMiner’s platform now includes data augmentation functionality and enhanced time series features. The company has also been focusing on explainability, from both a model and an analytics process perspective. In addition to helping explain models’ behaviors, providing more transparency at the process level from development to deployment (by clearly setting out the steps of the analytical pipeline and providing the analytical logic linking those steps) enables greater cross-role collaboration. ■ Coherent end-to-end platform: Reference customers made many complimentary comments about the coherence of RapidMiner’s user experience, from its scalable repository management to its real-time scoring.
Elements contributing to the continuum include RapidMiner Studio (for model development); RapidMiner Server (for sharing, collaborating on, deploying and maintaining models); RapidMiner Cloud (including repository and execution services destined to host automodeling capabilities); and RapidMiner Real-Time Scoring (introduced in 2018 to provide a low-latency model execution engine). Cautions ■ Data preparation and visualization: Turbo Prep’s introduction mostly facilitates the data preparation process for citizen data scientists, so advanced users still ﬁnd that RapidMiner’s
data preparation capabilities do not match the sophistication of other analytics components of the platform. Also, despite decent progress by RapidMiner with regard to data visualization offerings, users still find they need to rely on complementary capabilities for the more powerful visualization options. ■ License and pricing models: The simplification of RapidMiner’s pricing process over the past year has still not answered some concerns of RapidMiner’s customers who have been facing complicated pricing schemes and difficult-to-navigate pricing conditions. This has slowed the rapid growth in some organizations’ adoption of the platform. ■ Model operationalization: RapidMiner’s enhanced model management and repository features have made its lack of full model operationalization capabilities even more prominent. Given the fluidity of its environment, we expect RapidMiner to devote more resources to the operationalization cycle of the data science process by including functionalities such as production ensemble model monitoring and business key performance indicator (KPI) validation and monitoring. SAP SAP (http://www.sap.com/) is based in Walldorf, Germany. It offers SAP Predictive Analytics (PA). This platform has a number of components, including Data Manager for dataset preparation and feature engineering, Automated Modeler for citizen data scientists, Expert Analytics for more advanced ML, and Predictive Factory for operationalization. SAP PA is tightly integrated with SAP HANA. SAP’s data science offering is closely tied to the company’s expanding Intelligent Enterprise vision and SAP Leonardo. We considered these when assessing SAP’s Completeness of Vision, but they did not contribute to SAP’s Ability to Execute position in this Magic Quadrant. SAP remains a Niche Player due to low customer satisfaction scores, a lack of thought leadership in key innovation areas, and declining mind share in a highly competitive market.
Strengths ■ Suitability for SAP-centric organizations and data science operations: Many SAP customers identify alignment with existing data and analytics investments as a key reason for choosing its platform. SAP PA is well-suited to handling very large datasets via SAP HANA and deploying models to SAP’s wide range of applications. SAP PA received excellent scores from reference customers for delivery and platform/project management and a strong score for model management. The SAP client base and ecosystem form SAP’s sizable niche. ■ Intelligent Enterprise vision: SAP’s vision of a uniﬁed ML fabric across all its applications
accords with the impending reality that the vast majority of business users will consume ML via applications with embedded intelligence. The SAP PA roadmap is closely tied to the company’s overall strategy for SAP Leonardo Machine Learning Foundation and establishing a new end-to-end life cycle for data science. This strategy will resonate with customers already heavily invested in SAP systems and applications, and with prospective customers seeking a way to bring AI to the masses. ■ Support for diverse data science professionals: SAP PA offers environments tailored for expert data scientists (Expert Analytics) and citizen data scientists (Automated Modeler). Business analysts among SAP’s reference customers identified automation for quick and easy prototyping as a major strength of the platform. SAP PA’s interfaces for C++ and Java will complement developers who might leverage pretrained models and AI services in the Leonardo ecosystem. Cautions ■ Product suite transition: In recent years, SAP has received consistently low scores from reference customers for most critical capabilities, and its data science offering will undergo significant changes in the near future as SAP executes its broader AI strategy. SAP needs to catch up to its competitors in key innovation areas, such as agile support for increasingly demanded open-source technologies, deep learning, streaming and the IoT. ■ Customer experience and mind share: SAP still needs to improve aspects of its customer experience. Reference customers gave low scores for SAP’s inclusion of requested product enhancements, documentation, contract negotiation and account management. SAP continues to struggle to gain mind share for PA as dedicated data science vendors continue to disrupt this market and other large vendors develop new products. SAP appears on a low percentage of the shortlists, seen by Gartner, of those choosing a data science and ML platform, relative to other vendors in this Magic Quadrant.
■ Coherence: Although SAP PA beneﬁts from HANA’s functionality and other components, its reference customers gave it a low overall score for coherence. A fragmented toolchain is frustrating for users and results in convoluted workﬂows. Improving the coherence of the data science platform and creating a uniﬁed user experience will be crucial as SAP PA evolves within the larger SAP AI ecosystem. SAS SAS (http://www.sas.com/) is based in Cary, North Carolina, U.S. It provides many software products for analytics and data science. For this Magic Quadrant, we evaluated SAS Enterprise Miner (EM) and SAS Visual Data Mining and Machine Learning (VDMML).
SAS retains its long-held status as a Leader. Although the company faces threats on multiple fronts from other large vendors, maturing disruptors and open-source solutions, it retains a strong presence in the market. SAS’s Completeness of Vision is in the same class as many highly innovative competitors, but the company is falling behind in key areas such as deep learning and contributions to the open-source community. Its Ability to Execute is hampered by high and sometimes unpredictable costs, which cause existing and prospective customers to explore other options. Like other veterans of the data science market, in addition to focusing on new clients, SAS is embracing the challenge of supporting legacy customers and users while adapting to a rapidly changing landscape. Strengths ■ Incumbent market presence and trusted brand: SAS’s long market presence and considerable staying power have earned it much respect from customers. Many reference customers praised its products’ quality, stability and reliability. That solidity might have come at the expense of a few advances (such as quick adoption of open-source capabilities), but it has not prevented SAS from innovating and staying on a par with many of its newer competitors. ■ Robustness of SAS EM: SAS EM’s reliability throughout the analytics and data science life cycle is recognized throughout the market. From data ingestion and preparation to model production and deployment, the platform continues to deliver dependable results. SAS is well-placed to replicate that remarkable on-premises strength in a multicloud environment. ■ Interface and data engagement capabilities of SAS VDMML: SAS VDMML received excellent scores for user interface and data exploration and visualization. It also received strong scores for data preparation and automation and augmentation. SAS VDMML appeals to citizen data scientists as well as code-focused data scientists and developers.
■ Operational excellence: SAS’s comprehensive worldwide support infrastructure is unmatched. Customers choose SAS for its robust, enterprise-grade platform capabilities, which range from exploration to modeling and deployment. SAS also offers signiﬁcant analytic and industry expertise, which customers rely on. Reference customers gave high scores to SAS’s documentation, customer and analytic support, and overall service and support. Cautions ■ Pricing and sales execution: SAS’s reference customers gave scores for product evaluation and contract negotiation experience that were in the bottom quartile. In addition, SAS’s pricing remains a concern for existing and prospective customers — Gartner clients frequently investigate less costly alternatives. Free open-source data science platforms are
increasingly used, along with SAS products, as a way of controlling costs, especially for new projects. ■ Coherence: SAS’s full complement of products is complex and often confusing. Offering two platforms that are not fully interoperable and that have numerous additional components available increases confusion and complexity in terms of managing, deploying and using SAS’s products. The coexistence of SAS Viya and SAS 9 perpetuates the perception of a lack of cohesion. Although SAS has made progress in this regard, migration is still perceived as an issue for those that want to exploit Viya’s capabilities but are not currently on that architecture. ■ Unfashionable interface of SAS EM: SAS EM has not kept up with the times and, though still effective, has a dated UI. Its UI contrasts with those of modern competitors, which are more intuitive and cleaner. SAS VDMML offers a much more modern UI, representing SAS’s future direction. ■ Flexibility and additional support for open source: In its latest release, SAS has added more tools and support for open source, but customers would like to see both SAS EM and VDMML continue to extend first-class support for open-source tools, libraries and frameworks. SAS also needs to continue to improve support for Docker and containerization. TIBCO Software TIBCO Software (https://www.tibco.com/) is based in Palo Alto, California, U.S. Through the acquisition of enterprise reporting and modern BI platform vendors (Jaspersoft and Spotfire), descriptive and predictive analytics platform vendors (Statistica and Alpine Data), and a streaming analytics vendor (StreamBase Systems), TIBCO has built a well-rounded and powerful analytics platform. TIBCO has moved from the Challengers quadrant to the Leaders quadrant, thanks to a well-orchestrated integration strategy that contributes to its Ability to Execute, and its efforts to keep pace with the rate of innovation in this rapidly changing market.
TIBCO has distinctive skill at serving asset-centric industries. In addition to having end-to-end development and deployment capabilities, TIBCO successfully addresses the underserved data science IoT analytics domain, partly as a result of its process-centric roots. Strengths ■ Successful consolidation: On a single platform, TIBCO brings together powerful visualization capabilities, strong descriptive analytics and visionary predictive analytics features (from Statistica and Alpine Data, now rebranded as Spotﬁre Data Science). At the same time, TIBCO has maintained its platform’s necessary extensibility to open-source environments.
Open-source code, for example, can be developed within the platform or in an outside environment and then seamlessly integrated into the data science pipeline’s workflow.
■ “Connected Intelligence” and IoT: In addition to a strong set of connectors and APIs for machine data, real-time data capture and model scoring, TIBCO has invested in IoT edge analytics to give developers tools for distributing and monitoring models on edge devices and gateways. In addition, the combination of TIBCO Streaming and Statistica is a robust differentiator for TIBCO’s Connected Intelligence strategy.
■ End-to-end data science process: The overall ease of use of TIBCO’s platform (often praised by reference customers) should not overshadow the sophistication and completeness of its functionality. Visual workflows spanning the full data science process, from data ingestion to model management, provide a strong basis for effective collaboration by all roles (data scientists, business analysts, citizen data scientists, process engineers and so on).
Cautions
■ Performance and stability: Some reference customers identified cases where the performance of TIBCO’s platform was suboptimal, and remarked that this temporarily slowed their development process. Upcoming improved integration with external cloud services, along with development of hybrid analytical workflows, could alleviate this problem.
■ Data management: TIBCO offers strong data access and visualization capabilities, but automated and more integrated data preparation and management capabilities should be an integral part of the platform, given its wide reach. We expect TIBCO to invest in this important functionality for its upcoming release.
■ Incomplete operationalization focus: TIBCO’s model management and deployment capabilities have improved greatly over the past year, but many gaps remain.
Given TIBCO’s strength across a wide range of industries, model operationalization capabilities beyond deployment (that is, for the full governance and data science process for models in production) will be crucial.
Vendors Added and Dropped
We review and adjust our inclusion and exclusion criteria for Magic Quadrants as markets change. As a result of these adjustments, the mix of vendors in any Magic Quadrant may change over time. A vendor’s appearance in a Magic Quadrant one year and not the next does not necessarily indicate that we have changed our opinion of that vendor. It may be a reflection of a change in the market and, therefore, changed evaluation criteria, or of a change of focus by that vendor.
Added
■ Google
■ DataRobot
■ Datawatch (Angoss)
Dropped
■ Teradata, which is revamping its data science and ML offering
■ Angoss, which was acquired by Datawatch in January 2018
Inclusion and Exclusion Criteria
We made some changes to the inclusion criteria for this edition of the Magic Quadrant. The inclusion process included requirements for vendors to meet a revenue threshold and identify reference customers. A stack ranking process assessed how well products support the most typical use case scenarios for data science and ML, namely:
■ Business exploration: This is the classic scenario of “exploring the unknown” that requires extensive data preparation, exploration and visualization capabilities combining new and existing data sources and types. This scenario could also include the incorporation of “smart” capabilities to guide the data preparation, use of visualization and analysis that incorporate ML techniques “under the covers.”
■ Advanced prototyping: This scenario describes projects where data science and, especially, novel ML solutions are used to significantly improve traditional analytic approaches. Traditional approaches can be the use of human judgment, exact solutions, decade-old heuristic approaches or traditional data mining approaches. All scenarios are considered that utilize some or all of the following:
  ■ Many more data sources
  ■ Novel analytic approaches (such as deep neural nets, ensembles and natural language processing)
  ■ Large-scale computing infrastructure
  ■ Specialized computer science and ML skills
■ Production refinement: This is the scenario on which many data science teams spend the
majority of their time. In this scenario, the organization, having implemented several data science solutions and delivered them to the business, has shifted its focus to improving and updating the existing models.
■ Nontraditional data science: This new use case represents a movement toward incorporating capabilities into the data science and ML platform that specifically support citizen data scientists and/or developers. Gartner defines a citizen data scientist as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.
We used the following 15 critical capabilities to score the vendors’ capabilities across the four use-case scenarios:
■ Data access: How well does the product support data access across many types of data (such as tables, images, graphs, logs, time series, audio and texts)?
■ Data preparation: Does the product have a significant array of noncoding or coding data preparation features?
■ Data exploration and visualization: Does the product allow for a range of exploratory steps, including interactive visualization?
■ Automation and augmentation: Does the product facilitate the automation of feature generation and hyperparameter tuning?
■ User interface (UI): Does the product have a coherent “look and feel” and have an intuitive interface, ideally one supporting a visual pipelining component or visual composition framework (VCF)?
■ Machine learning (ML): How broad are the ML approaches that are easily accessible and shipped (prepackaged) with the product, along with support for modern ML approaches like ensemble techniques (boosting, bagging and random forests) and modern dimension reduction schemes?
■ Other advanced analytics: How are other methods from the fields of statistics, optimization, simulation, and text and image analytics, integrated into the development environment?
■ Flexibility, extensibility and openness: How can various open-source libraries be integrated into the platform? How can users create their own functions? How does the platform work with notebooks?
■ Performance and scalability: How can desktop, server and cloud deployments be controlled?
How are multicore and multinode configurations utilized?
■ Delivery: How well does the platform support the ability to create APIs or containers (code, Predictive Model Markup Language [PMML], packaged apps and so on) that can be utilized for faster deployment in various business scenarios?
■ Platform/project management: What management capabilities does the platform provide (such as for security, compute resource management, governance, project or experiment organization, auditing lineage and reproducibility)?
■ Model management: What capabilities does the platform provide to monitor and recalibrate hundreds or thousands of models? This includes model-testing capabilities, such as K-fold cross-validation, train, validation and test splits, area under the curve (AUC), receiver operating characteristic (ROC), loss matrices, and testing models side-by-side (for example, champion/challenger [A/B] testing).
■ Precanned solutions: Does the platform offer “precanned” solutions (for example, cross-selling, social network analysis, fraud detection, recommender systems, propensity to buy, failure prediction and anomaly detection) that can be integrated and imported via libraries, marketplaces or galleries?
■ Collaboration: How can users with different skills work together on the same workflows and projects? How can projects be archived, commented on and reused?
■ Coherence: How intuitive, consistent and integrated is the platform to support an entire data analytics pipeline? The platform itself must provide metadata and integration capabilities to take the preceding 14 capabilities and provide a seamless end-to-end experience, in order to make data scientists more productive across the whole data analytics pipeline. This metacapability includes ensuring data input/output formats are standardized, wherever possible, so that components have a consistent “look and feel,” and ensuring unified terminology across the platform.
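For readers unfamiliar with the model-testing terms named under the model management capability, the following is a minimal Python sketch of two of them: K-fold splitting and AUC. It is an illustration only, not Gartner's methodology or any vendor's implementation; the function names `k_fold_indices` and `roc_auc` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold. Each sample
    appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, for illustration only.
    print(k_fold_indices(10, 3)[0])
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A champion/challenger comparison, in this simplified view, would compute `roc_auc` for two models on the same held-out folds and keep the one with the higher average.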
The subcriteria aligned with each critical capability have been carefully reviewed and modiﬁed to realign subcapabilities with the appropriate overall capabilities and reﬂect new developments and key subcapabilities that differentiate solutions. In addition, we have adjusted weighting criteria within the deﬁned bands. We continue to place strong emphasis on platform performance, and less on market presence, responsiveness and viability. The weighting for marketing execution has been lowered to give it less emphasis, relative to market responsiveness and track record. This is important in this market as innovation is paramount, and newer, nimbler players are making their mark by transforming how people engage with data science and ML tools. The reduced weighting also reﬂects the emphasis of
buyers focused on securing strong product capabilities, as opposed to a vendor that performs strongly in the market. We have added a new use case for nontraditional data science to recognize the trend for offering data science and ML capabilities to nonexpert data scientists, who have traditionally not had easy access to, or interactions with, these tools. Finally, we have adjusted the weightings for each use case to reflect more accurately the amount of time and effort the survey respondents devote to each case.
To qualify for inclusion in the Magic Quadrant, each vendor had to pass the following assessment “gates.”
Gate 1: Revenue and Number of Paying Customers
Three common license models were assessed, and revenues (and/or customer adoption) from each were combined (if applicable) and evaluated against the criteria below for each platform under consideration:
■ Perpetual license model: Software license, maintenance and upgrade revenue (excluding revenue from hardware and professional services) for the calendar year 2017.
■ SaaS subscription model: Annual contract value (ACV) at year-end 2017, excluding any professional services included in annual contracts. For multiyear contracts, only the contract value for the first 12 months was used for this calculation.
■ Customer adoption: The number of active paying client organizations using the vendor’s data science and ML platform (excluding trials).
To progress to the next assessment gate, vendors had to have generated revenue from data science and ML platform software licenses and technical support, for each platform under consideration, of:
■ At least $5 million in 2017 (or the closest reporting year) in combined-revenue ACV, or
■ At least $1 million in 2017 (or the closest reporting year) in combined-revenue ACV, and either
  ■ At least 150% year-over-year (YoY) revenue growth for 2016 to 2017, or
  ■ A minimum of 200 paying end-user organizations
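The way the Gate 1 thresholds combine can be sketched as a single boolean check. This is a hedged illustration in Python: the function name and parameters are hypothetical, revenue is expressed in millions of U.S. dollars, and the sketch assumes that "at least 150% YoY revenue growth" means revenue grew by 150% or more over 2016, which the report does not spell out.

```python
def passes_gate_1(
    revenue_usd_m: float,
    yoy_growth_pct: float = 0.0,
    paying_orgs: int = 0,
) -> bool:
    """Return True if a platform meets the Gate 1 revenue criteria as described.

    revenue_usd_m  -- combined-revenue ACV for 2017, in millions of USD
    yoy_growth_pct -- 2016-to-2017 revenue growth in percent (assumed meaning)
    paying_orgs    -- number of paying end-user organizations
    """
    # Route 1: at least $5 million on its own is sufficient.
    if revenue_usd_m >= 5.0:
        return True
    # Route 2: at least $1 million, plus either fast growth or a broad customer base.
    if revenue_usd_m >= 1.0:
        return yoy_growth_pct >= 150.0 or paying_orgs >= 200
    return False


if __name__ == "__main__":
    # Hypothetical vendors, for illustration only.
    print(passes_gate_1(6.0))
    print(passes_gate_1(1.2, yoy_growth_pct=160.0))
    print(passes_gate_1(1.2, yoy_growth_pct=100.0, paying_orgs=150))
```

Under these assumptions, a small vendor can qualify through either growth or customer count, but never on those alone if revenue is below $1 million.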
Only vendors with individual platforms that passed this initial revenue requirement were considered for the second inclusion gate.
Gate 2: Reference Customer Counts
Vendors that satisfied the requirements of Gate 1 were next evaluated on the basis of the reference customers they identified. Vendors had to show significant cross-industry and cross-geographic traction for each platform under consideration. In addition, the reference customers had to be using the latest versions of the software packages being evaluated.
Cross-Industry Reference Customers
Each vendor had to identify reference customers that use each platform in production environments. For a platform to be considered, 24 unique organizations were required. They had to have predictive analytics solutions in production environments and come from at least four of the following major industry segments:
■ Banking, insurance and other financial services
■ Education and government
■ Healthcare
■ Logistics and transportation
■ Manufacturing and life sciences
■ Mining, oil and gas, and agriculture
■ Retail
■ Telecommunications, communications and media
■ Utilities
■ Other industries
Cross-Region Reference Customers
Among the reference customers for each vendor, there had to be at least two active customer organizations in each of the following major areas:
■ North America
■ European Union, including the U.K. (we also included Switzerland)
■ Rest of the world
Only vendors that passed Gate 2 progressed to Gate 3.
Gate 3: Product Capability Scoring
Vendors were next assessed by Gartner analysts using a scoring system that measured how well their products meet the 15 critical capabilities. Product capabilities were scored as follows:
1 = rudimentary capability or capability not supported by the data science and ML platform
2 = capability partially supported by the data science and ML platform
3 = capability fully supported by the data science and ML platform
A product could achieve a maximum score of 45 points, given there are 15 critical capabilities. Only products that achieved at least 30 points were considered for inclusion in this Magic Quadrant. Also, because the number of vendors that can be included in a Magic Quadrant is limited, only the top 16 products continued to the detailed evaluation phase. If two or three vendors’ products tied, we included each of them, bringing the maximum number of vendors to 18. If more than three had platforms that tied, we would have used a metric incorporating internet search, Gartner search and Gartner client inquiry data to determine which vendors’ products have greater market traction and break the tie on that basis. In no case would more than 18 vendors appear in the Magic Quadrant.
Approximately 70 vendors were considered for inclusion. Seventeen vendors (collectively offering 19 platforms) were selected for inclusion. Both IBM and SAS had more than one qualifying platform.
Honorable Mentions
The following list includes notable vendors that either did not meet the inclusion criteria or whose eligibility for inclusion we were unable to verify due to a lack of information:
■ Amazon Web Services (AWS), which offers the Amazon SageMaker platform, aimed
primarily at developers and data scientists.
■ Big Squid, which uses an augmented analytics approach to offering data science and ML capabilities.
■ FICO, which is a strong choice for organizations in the financial services and other sectors, including those who rely on scorecard modeling for decision management.
■ MicroStrategy, which provides a robust analytics and BI offering that extends to data science and ML capabilities.
■ Teradata, which is revamping its data science and ML offering.
■ World Programming, which provides a software platform for data science and data engineering with drag-and-drop workflows and coding support for Python, R, SAS and SQL.
Evaluation Criteria
Ability to Execute
Product/service: Core goods and services that compete in and/or serve the defined market. This criterion assesses current product and service capabilities, quality, feature sets, skills and so on. These may be offered natively or through OEM agreements and partnerships, as defined in the market definition and detailed in the subcriteria.
Overall viability (business unit, financial, strategy, and organization): This criterion includes an assessment of the organization’s overall financial health, as well as the financial and practical success of the business unit. The criterion also assesses the likelihood of the organization continuing to offer and invest in the product, as well as the product’s position in the current portfolio.
Sales execution/pricing: This criterion assesses the organization’s capabilities in all presales activities and the structure that supports them. Included are deal management, pricing and negotiation, presale support and the overall effectiveness of the sales channel.
Market responsiveness and track record: This criterion assesses a vendor’s ability to respond, change direction, be flexible and achieve competitive success as opportunities develop, competitors act, customers’ needs evolve and market dynamics change.
It also considers a vendor’s history of responsiveness to changing market demands. Marketing execution: This criterion assesses the clarity, quality, creativity and efﬁcacy of programs designed to deliver the organization’s message in order to inﬂuence the market,
promote a brand, increase awareness of products and establish a positive identification in the minds of customers. This “mind share” can be driven by a combination of publicity, promotional, thought leadership, social media, referrals and sales activities.
Customer experience: This criterion assesses products, services and/or programs that enable customers to achieve anticipated results with the products evaluated. Specifically, it considers the quality of supplier-buyer interactions, technical support and account support. Ancillary tools, customer support programs, availability of user groups and SLAs, among other things, may also be evaluated.
Operations: This criterion assesses the organization’s ability to achieve its goals and fulfill its commitments. Factors considered include the quality of the organizational structure, skills, experiences, programs, systems and other vehicles that enable the organization to operate effectively and efficiently.
Table 1: Ability to Execute Evaluation Criteria
Evaluation Criteria              Weighting
Product or Service               High
Overall Viability                Medium
Sales Execution/Pricing          Low
Market Responsiveness/Record     Medium
Marketing Execution              Low
Customer Experience              High
Operations                       Medium
Source: Gartner (January 2019)
Completeness of Vision
Market understanding: This criterion assesses a vendor’s ability to understand customers’ needs and to use that understanding to create products and services. Vendors that have a clear vision of their market, and that listen to and understand customers’ demands, can shape or enhance market changes.
Marketing strategy: This criterion looks for clear, differentiated messaging that is consistently communicated internally, and externalized through social media, advertising, customer programs and positioning statements.
Sales strategy: This criterion looks for a sound strategy for selling that uses appropriate networks, including direct and indirect sales, marketing, service and communication networks. It also considers partners that extend the scope and depth of a vendor’s market reach, expertise, technologies, services and customer base.
Offering (product) strategy: This criterion looks for an approach to product development and delivery that emphasizes market differentiation, functionality, methodology, and features as they map to current and future requirements.
Innovation: This criterion looks for direct, related, complementary and synergistic layouts of resources, expertise or capital for investment, consolidation, defensive or pre-emptive purposes.
Table 2: Completeness of Vision Evaluation Criteria
Evaluation Criteria              Weighting
Market Understanding             Medium
Marketing Strategy               Low
Sales Strategy                   Low
Offering (Product) Strategy      High
Business Model                   Not Rated
Vertical/Industry Strategy       Not Rated
Innovation                       High
Geographic Strategy              Not Rated
Source: Gartner (January 2019)
Quadrant Descriptions
Leaders
Leaders have a strong presence and significant mind share in the data science and ML market. They demonstrate strength in depth and breadth across the full data exploration, model development and operationalization process. While providing outstanding service and support, Leaders are also nimble in responding to rapidly changing market conditions. The number of expert and citizen data scientists using Leaders’ platforms is significant and growing. Leaders are in the strongest position to influence the market’s growth and direction. They address the majority of industries, geographies, data domains and use cases, and therefore have a solid understanding of, and strategy for, this market. Not only can they focus on executing effectively, based on current market conditions, but they also have solid roadmaps to take advantage of new developments and advancing technologies in this rapidly transforming sector. They provide thought leadership and innovative differentiation, often disrupting the market in the process. Leaders are suitable vendors for most organizations to evaluate. They should not be the only vendors evaluated, however, as other vendors might address an organization’s unique needs more precisely. Leaders provide a benchmark of high standards to which others should be compared.
Challengers
Challengers have an established presence, credibility, viability and robust product capabilities. They may not, however, demonstrate thought leadership and innovation to the same degree as Leaders. There are two main types of Challenger:
■ Long-established data science and ML vendors that succeed because of their stability, predictability and long-term customer relationships. They need to revitalize their vision to stay abreast of market developments and become more broadly influential and innovative. If they simply continue doing what they have been doing, their growth and market presence may be impaired.
■ Vendors established in adjacent markets, such as the analytics and BI, data and analytics service provider, and developer tool markets, which are entering the data science and ML market with solutions that extend their current platforms. These vendors provide a reasonable option not only for existing customers but also for new customers. As these vendors prove they can inﬂuence this market and provide clear direction and vision, they may develop into Leaders. But they must avoid the temptation to introduce new capabilities quickly but superﬁcially.
Challengers are well-placed to succeed in this market as it is currently defined and are operating effectively within current market conditions. Their vision and roadmap, however, may be impaired by a lack of market understanding, excessive focus on short-term gains, strategy- and product-related inertia, and a lack of innovation. Equally, their marketing efforts, geographic presence and visibility may not be on a par with the Leaders’.
Visionaries
Visionaries are typically relatively small vendors or newer entrants representative of trends that are shaping, or have the potential to shape, the market. There may, however, be concerns about these vendors’ ability to keep executing effectively and to scale as they grow. They are typically not well-known in this market, and therefore often have lower momentum, relative to Challengers and Leaders. Visionaries not only have a strong vision, but also a solid supporting roadmap. They are innovative in their approach to addressing the market’s needs. Although their offerings are typically innovative and solid in terms of the capabilities they do provide, there are often gaps in these offerings’ completeness and breadth. Visionaries are worth considering because they may:
■ Represent an opportunity to jump-start an innovative initiative
■ Provide some compelling, differentiating capability that offers a competitive advantage as either a complement to, or a substitute for, existing solutions
■ Be more easily influenced with regard to their product roadmap and approach
Visionaries, however, also pose a potentially riskier choice for buyers. In today’s highly competitive data science and ML market, they may also struggle to gain momentum, develop a presence, increase their market share, fulfill their vision and execute on their roadmap. They may also be targets for acquisition. As Visionaries mature and prove their Ability to Execute, they may eventually become Leaders.
Niche Players
Niche Players demonstrate strength in a particular industry or approach, or pair well with a specific technology stack. They should be considered by buyers in their particular niche. Some Niche Players demonstrate a degree of vision, which suggests they could become Visionaries. Often, however, they are struggling to make their vision compelling, relative to
others in the market. They are considered more followers than leaders in terms of driving and defining the market. They may also be struggling to develop a track record of innovation and thought leadership that could give them the momentum to become Visionaries. Other Niche Players could become Challengers if they continue to execute in a way that increases their momentum and traction in the market.
Context
The data science and ML market continues to change and evolve. Market transformation is driven by three key factors, namely new commercial, cloud and open-source market entrants, evolving offerings from established players, and the expansion of enterprise use cases and user communities. Movement in the market is happening on many fronts. Once-sleeping giants are waking up. Each of the major cloud infrastructure providers (AWS, Google and Microsoft) now offers products for the data science and ML market. At the same time, established vendors continue to evolve. Teradata, for example, is revamping its data science and ML platform, and IBM has evolved its DSX platform into Watson Studio and incorporated access to SPSS. Few vendors are standing still. Instead there is an overall shift from balanced progression against both Ability to Execute and Completeness of Vision axes to a focus on innovation. This is because innovation, in products and services, is now key to survival. New, nimble competitors are also staking their claims. Vendors such as DataRobot and Big Squid, for example, have platforms geared toward citizen data scientists. Vendors are introducing augmented data science and ML platforms and capabilities to help make expert data scientists more efficient and enable both groups to work more efficiently together across the analytic pipeline. Acquisitions also contribute to this market’s development. Oracle has acquired Datascience.com.
DataRobot has acquired Nexosis, enabling application developers to take an automated approach to incorporating data science and ML into their applications. End-user organizations need to increase their engagement with this fast-paced market. They should focus on developing new use cases and applications for data science and ML applications — ones that deliver real business value and ﬁll gaps in their analytic portfolios. In addition, they should look to extend access to the market’s technologies to nontraditional roles. They must also proactively address the fresh challenges posed by the need to incorporate new technologies into business processes and to maintain and manage them over time.
Whether end-user organizations are just getting started with predictive and prescriptive analytics or have mature capabilities in advanced analytics, they must monitor changes in the market. This includes how vendors are developing their offerings and providing new capabilities to different kinds of user and extended use cases. Organizations should start by identifying gaps in their own portfolios and monitor vendors’ offerings, in light of their business needs. They should first assess whether their existing analytics vendors are stepping up to the new challenges. They should consider not only data science and ML vendors, but also analytics and BI vendors, which are increasingly extending their capabilities to perform more advanced analytics.
As data science and ML capabilities are increasingly adopted across enterprises, cross-departmental work is important to avoid excessive fragmentation and a lack of common standards. Otherwise, individual departments may adopt different platforms and processes, a situation that leads to operational and maintenance-related problems. To achieve fully mature advanced analytic capabilities, organizations should provide capabilities across the end-to-end analytic pipeline. The pipeline includes processes for accessing and transforming data, conducting analysis and building analytic models, operationalizing and embedding models, managing and monitoring models over time to reassess their relevancy, and adjusting models to reflect changes in the business environment.
Organizations must assess developments in technology, processes and approaches as technologies such as deep learning, augmented analytics, continuous intelligence (for real-time analytics) and decision intelligence (for modeling and understanding all kinds of decision) become more mature, pervasive and accessible. Innovativeness and the ability to demonstrate and measure business value are key in today’s environment.
Whether beginning or extending their journey in the field of data science and ML, organizations need not travel alone. Data and analytics service providers offer guidance and a structured approach. And increasingly they provide not only services, but also software. Gartner refers to the associated convergence of service and software as “servware” (see “Take Advantage of the Disruptive Convergence of Analytic Services and Software”).
Market Overview
The data science and ML market is healthy and vibrant, with a broad mix of vendors offering a range of capabilities. The market is experiencing a “big bang” that is redefining not only who does data science and ML, but how it is done. The market is changing in a number of ways. Several giants are waking up to this market’s potential. Other vendors are stirring things up through acquisitions and product consolidations. Still others are incorporating open-source software, partnering with other vendors in the
analytics pipeline, and providing capabilities that are accessible to citizen data scientists, not just experts.
Much of the functionality currently delivered is not differentiated between vendors, but there is considerable scope for differentiated execution in terms of marketing, sales and operations. This situation results in many vendors being positioned around the midrange of the Magic Quadrant’s Ability to Execute axis. Vendors of freemium platforms are often failing to match the revenue growth of some aggressive startups. There is strong emphasis on innovation and visionary roadmaps, as indicated by the positioning of many vendors to the right of the Completeness of Vision axis. Though many vendors are focusing on innovation, none is clearly innovating in a way extremely different from the others. In such a dynamic market, there is still room for vendors to differentiate themselves in ways that surpass their competitors in terms of both execution and vision.
Many organizations are starting data science and ML initiatives using free or low-cost open-source and public cloud service provider offerings to build up their knowledge and explore possibilities. They are then likely to adopt commercial software to tackle broader use cases and requirements for team collaboration, and to operationalize their deployment and management of models using enterprise-grade capabilities.
Overall revenue from data science and ML platform software grew by 12.2% in 2017 (up from 8.3% in 2016), to represent the second-fastest-growing segment of the analytics and BI software market. The segment’s revenue for 2017 was $2.6 billion (up from $2.3 billion in 2016). Its share of the overall analytics and BI market remained the same as in 2016, at 14%.
Those interested in this market should monitor and regularly assess the following developments:
■ Increasingly, different types of user have a stake in the use of data science and ML platforms.
Expert data scientists remain the primary users, but the citizen data science community increasingly wants to use these tools. The variety of citizen data scientists continues to increase. They now include not only business and BI analysts, but also people from the traditional data space, such as data analysts and data engineers, as well as application developers and application engineers. The ability to collaborate and share is becoming crucial as more users — in different roles — adopt data science and ML platforms. ■ The traditional distinction between analytics and BI platforms on the one hand and data science and ML platforms on the other is blurring. More vendors in the analytics and BI sector, such as MicroStrategy, Qlik and Tableau, are offering predictive and prescriptive analytical capabilities. For their part, data science vendors are adding more robust data
transformation and data visualization capabilities to their platforms.
■ Although new vendors are entering the market, “legacy” vendors should not be dismissed as irrelevant; many traditional vendors in the data science and ML space are revamping and modernizing their approach. They often offer new capabilities and approaches, while enabling existing customers to continue benefiting from investments they have already made in a tool and the work they have done using it.
■ The open-source ecosystem and community is vibrant and growing. Open-source software enables organizations to jump-start or extend data science and ML initiatives with little upfront or additional investment. This ecosystem is accessible not only to end-user organizations. It is also open to, and supported by, vendors that additionally provide commercial platforms in the data science and ML market.
■ Algorithm building blocks are increasingly used to create models. This trend will continue as models continue to be abstracted and packaged for specific domain and industry problems.
■ Packaged models are increasingly available through APIs that can easily be integrated with, and consumed in, applications. Some cloud service APIs are being narrowly focused on specific domain and industry problems. This approach eliminates the need for organizations to build models themselves.
■ Whereas many models are developed, few are operationalized in a way that leads not only to deployment but to ongoing management and maintenance. As a result, business value is often not measured or realized. In addition, models that are not properly managed and monitored are at risk of becoming irrelevant or inaccurate as business conditions change.
■ Advanced technologies such as deep learning and decision management are increasingly available and accessible via cloud services, APIs, new tools, and integrations with existing platforms and services.
The data science and ML market continues to change in unprecedented ways at unprecedented speed. Its transformation is far from over — we expect it to be a long-term process. We are witnessing many changes in data science and ML platform offerings. Modern platforms incorporate or accommodate: ■ Componentization: Many components go into creating one platform. Increasingly, componentized platforms are the norm as vendors develop their own components, use open-source software or partner with other vendors to expand their offerings. Vendors increasingly provide a heterogeneous collection of tools, as opposed to native integrations
within a single product.
■ Open-source acceptance: All data science and ML platforms use and incorporate open-source software, although to varying degrees. Some provide APIs to access common open-source libraries. Some build open-source capability into capabilities accessible within their own platforms. Others include the ability to use analytic artifacts created within the platform within the open-source ecosystem. Still others provide more of a wrapper for working natively with open-source tools in a consistent environment that also enables operationalization. Vendors are increasingly supporting open-source platforms and frameworks through various collaborative and orchestrated approaches. These adaptive platform approaches increase support for new capabilities and increased workloads, and reduce the need for users to switch platforms for different contexts. Using open-source software enables vendors to keep pace with new developments and tap the expertise of contributors to the open-source community. In addition, it enables more extensibility across the data science and ML ecosystem, while providing key capabilities that ensure enterprise-grade usability and management.
■ Multiple user types: Increasingly, tools address the needs of users with different skills and different levels of data science and ML knowledge. Components or capabilities to enable a broad range of users, from citizen data scientists to expert data scientists to application developers, are increasingly the norm.
■ Cohesiveness: Increased componentization and open-source incorporation create more potential for fragmented, awkward solutions. The need to access multiple components and platforms for full, robust capabilities must be balanced against the desirability of accessing all functionality in a seamless and cohesive manner. As offerings embrace a heterogeneous environment, cohesion becomes increasingly important.
The ability not only to manage multiple components but also to access them easily and seamlessly from within the platform is crucial as offerings expand to provide more — and more complex — capabilities. ■ Operationalization: Operationalization capabilities not only deploy, but also manage and maintain models over time. Operationalization is key to measuring and understanding business impact and value. It is also crucial for encouraging ongoing re-evaluation of the relevance and validity of analysis over time as business needs, priorities and conditions change. As data science and ML moves out of the lab and into the mainstream, it must be operationalized with seamless integration. Operationalization capabilities must also include explainability and versioning of models. ■ Both model and data repositories: There is a trend for providing a means of tracking and sharing both the data and the analytic artifacts generated as part of the model development and deployment process.
■ Collaboration: As access to data science and ML platforms becomes democratized and more types of user work together across the analytic pipeline, the need to be able to collaborate easily and seamlessly increases significantly. As platforms become more accessible to new types of user, these products must enable people to work together and share in real time throughout the analytic development life cycle.
■ Extension into decision management: Increasingly, data science and ML platforms are extending beyond operationalization to support collaboration, which, in turn, fuels interest in decision management capabilities as analytics tools move beyond prediction to explicitly drive business decisions.
■ AI: AI frameworks enable further extension of analytic capabilities beyond traditional data science and ML capabilities. These frameworks include capabilities such as deep learning.
Evidence
Gartner’s assessments and commentary in this Magic Quadrant draw on the following sources:
■ Instruction manuals and documentation of selected vendors. We used these to verify platform functionality.
■ An online survey of vendors’ reference customers, conducted from July through August 2018. This survey elicited 545 responses about the reference customers’ experience with vendors’ platforms. The list of survey participants derived from information supplied by the vendors.
■ A questionnaire completed by the vendors.
■ Vendor briefings, including product demonstrations, about individual vendors’ strategy and operations.
■ An extensive RFP inquiring how each vendor delivers specific features that correspond to our 15 critical capabilities (see “Toolkit: RFP for Data Science and Machine Learning Platforms”).
■ A prepared video demonstration of how well vendors’ data science and ML platforms address specific functionality requirements across the 15 critical capabilities.
■ Interactions between Gartner analysts and Gartner clients deciding their evaluation criteria, and Gartner clients’ opinions about how successfully vendors meet these criteria.
Note 1
Definition of an Open-Source Platform
The open-source approach is becoming more common throughout the data science and ML platform market. It enables people to innovate collaboratively, each contributing their own perspective in a way that shortens time to market. The open-source approach is quickly becoming a mainstream way to introduce new capabilities. Many such capabilities are evaluated in this Magic Quadrant.
The most common examples of open source in the data science and ML platform market are components. Open-source components include:
■ Open-source data, introduced by vendors such as Databricks and Microsoft
■ Open-source programming languages, such as Python and R
■ Open-source algorithm libraries, such as those found in DL4J and H2O
■ Open-source visualizations, such as D3 and Plotly
■ Open-source notebooks, such as Jupyter and Zeppelin
■ Open-source data management platforms, such as Apache Spark and Hadoop
■ Open-source frameworks, such as SparkML and TensorFlow.
A platform is considered open (but not open-source) if it offers flexibility and extensibility for accessing open-source components. In addition, a platform can itself be open-source, which means that its source code is made available for use or modification. Open-source software is usually developed as a public collaboration and made freely available. However, only open-source platforms that also have commercially licensable products were eligible for inclusion in this Magic Quadrant.
Evaluation Criteria Definitions
Ability to Execute
Product/Service: Core goods and services offered by the vendor for the defined market. This includes current product/service capabilities, quality, feature sets, skills and so on, whether
offered natively or through OEM agreements/partnerships as defined in the market definition and detailed in the subcriteria.
Overall Viability: Viability includes an assessment of the overall organization's financial health, the financial and practical success of the business unit, and the likelihood that the individual business unit will continue investing in the product, will continue offering the product and will advance the state of the art within the organization's portfolio of products.
Sales Execution/Pricing: The vendor's capabilities in all presales activities and the structure that supports them. This includes deal management, pricing and negotiation, presales support, and the overall effectiveness of the sales channel.
Market Responsiveness/Record: Ability to respond, change direction, be flexible and achieve competitive success as opportunities develop, competitors act, customer needs evolve and market dynamics change. This criterion also considers the vendor's history of responsiveness.
Marketing Execution: The clarity, quality, creativity and efficacy of programs designed to deliver the organization's message to influence the market, promote the brand and business, increase awareness of the products, and establish a positive identification with the product/brand and organization in the minds of buyers. This "mind share" can be driven by a combination of publicity, promotional initiatives, thought leadership, word of mouth and sales activities.
Customer Experience: Relationships, products and services/programs that enable clients to be successful with the products evaluated. Specifically, this includes the ways customers receive technical support or account support. This can also include ancillary tools, customer support programs (and the quality thereof), availability of user groups, service-level agreements and so on.
Operations: The ability of the organization to meet its goals and commitments.
Factors include the quality of the organizational structure, including skills, experiences, programs, systems and other vehicles that enable the organization to operate effectively and efficiently on an ongoing basis.
Completeness of Vision
Market Understanding: Ability of the vendor to understand buyers' wants and needs and to translate those into products and services. Vendors that show the highest degree of vision listen to and understand buyers' wants and needs, and can shape or enhance those with their added vision.
Marketing Strategy: A clear, differentiated set of messages consistently communicated throughout the organization and externalized through the website, advertising, customer programs and positioning statements.
Sales Strategy: The strategy for selling products that uses the appropriate network of direct and indirect sales, marketing, service, and communication affiliates that extend the scope and depth of market reach, skills, expertise, technologies, services and the customer base.
Offering (Product) Strategy: The vendor's approach to product development and delivery that emphasizes differentiation, functionality, methodology and feature sets as they map to current and future requirements.
Business Model: The soundness and logic of the vendor's underlying business proposition.
Vertical/Industry Strategy: The vendor's strategy to direct resources, skills and offerings to meet the specific needs of individual market segments, including vertical markets.
Innovation: Direct, related, complementary and synergistic layouts of resources, expertise or capital for investment, consolidation, defensive or pre-emptive purposes.
Geographic Strategy: The vendor's strategy to direct resources, skills and offerings to meet the specific needs of geographies outside the "home" or native geography, either directly or through partners, channels and subsidiaries as appropriate for that geography and market.
© 2019 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. It consists of the opinions of Gartner's research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see "Guiding Principles on Independence and Objectivity."