SPEED. SCALE. COLLABORATION.

COLLABORATIVE WORKBENCH.
SERVERLESS EXECUTION.
ENTIRE DATA LIFECYCLE.

Ingest · Prepare · ETL · Feature Generation · Model Training · Model Deployment

REDEFINING ANALYTICS IN THE CLOUD

On Premise

Data Lakes

Cloud Serverless Lake

UNIFIED ANALYTICS DESIGN LAYER

ETL · BI · ML · Notebook

Cloud Ephemeral Compute

Cloud Data Lake Storage

Complete Analytics Lifecycle

DATA INGESTION

From File or Databases

Data can be ingested from files in cloud storage such as S3 – various file formats are supported.

You can also get the data from a database using the JDBC connector.
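
As a rough sketch of what these two paths look like in plain PySpark (the bucket, path, JDBC URL, table, and credentials below are placeholders, not real endpoints):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-example").getOrCreate()

# Ingest a CSV file from cloud storage; other formats (JSON, Parquet, ...) read the same way.
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3a://example-bucket/raw/orders.csv"))   # placeholder path

# Ingest a table from a database over JDBC; URL and credentials are placeholders.
customers = (spark.read
             .format("jdbc")
             .option("url", "jdbc:postgresql://db-host:5432/sales")
             .option("dbtable", "public.customers")
             .option("user", "analyst")
             .option("password", "********")
             .load())
```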

EXPLORE & PREPARE

Use the spreadsheet view to understand your data, with statistics on its quality and distribution. Use prepare functions to modify data: over 200 built-in functions are available, and any SQL expression can be written.
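
A minimal PySpark sketch of the same idea, mixing built-in functions with a free-form SQL expression; the column names (order_ts, amount, fx_rate) are hypothetical:

```python
from pyspark.sql import functions as F

prepared = (orders
            # built-in functions: type conversion and rounding
            .withColumn("order_date", F.to_date("order_ts"))
            .withColumn("amount_usd", F.round(F.col("amount") * F.col("fx_rate"), 2))
            # any SQL expression can be written via expr()
            .withColumn("tier", F.expr(
                "CASE WHEN amount_usd >= 1000 THEN 'gold' ELSE 'standard' END")))

# quick look at the quality and distribution of a derived column
prepared.describe("amount_usd").show()
```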

ETL & TRANSFORM

Use SQL Queries and SQL operators to combine and transform data into the shape you want.

We support complete Spark SQL and SQL:2003.
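
For instance, the two ingested datasets can be combined and reshaped with an ordinary Spark SQL query (table and column names are illustrative):

```python
prepared.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Join, aggregate, and order the combined data.
monthly_revenue = spark.sql("""
    SELECT c.region,
           date_trunc('month', o.order_date) AS month,
           SUM(o.amount_usd)                 AS revenue
    FROM   orders o
    JOIN   customers c ON o.customer_id = c.customer_id
    GROUP BY c.region, date_trunc('month', o.order_date)
    ORDER BY month, revenue DESC
""")
monthly_revenue.show()
```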

TRAIN ML MODELS

ML Pipeline builder allows you to build simple pipelines or pipelines with hyper-parameter tuning.

See the features you are generating immediately, including statistics on the created columns.
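
A hedged sketch of such a pipeline in Spark ML, with cross-validated hyper-parameter tuning; the feature columns and the binary label column are assumptions carried over from the earlier examples:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

indexer   = StringIndexer(inputCol="tier", outputCol="tier_idx")
assembler = VectorAssembler(inputCols=["tier_idx", "amount_usd"], outputCol="features")
lr        = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[indexer, assembler, lr])

# Hyper-parameter grid and cross-validation over the whole pipeline.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)

model = cv.fit(prepared)   # assumes 'prepared' carries a binary 'label' column
```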

CUSTOM SCRIPTS &
NOTEBOOK PROGRAMMING

The Script Node comes with a built-in Jupyter notebook connected to the same Spark session as the workflow.

Use the notebook to experiment and generate a function.

Use the final function as a script in your workflow. Scripts can be saved and reused.
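
For example, a function worked out interactively in the notebook can be kept as a small script and applied to the workflow's incoming DataFrame (the column names are illustrative):

```python
from pyspark.sql import DataFrame, functions as F

def add_order_features(df: DataFrame) -> DataFrame:
    """Developed in the notebook, then saved as a reusable script."""
    return (df
            .withColumn("order_month", F.month("order_date"))
            .withColumn("is_large", (F.col("amount_usd") >= 1000).cast("int")))

# Inside the workflow, the script applies the same function to the node's input.
featured = add_order_features(prepared)
```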

DEPLOY ML MODELS

Powerful and flexible deployment as a model, web service, or pod. Versioning and monitoring make it reliable. High performance with low latency and high concurrency.
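
Purely as an illustration of the underlying idea (not the product's own deployment mechanism), a saved Spark PipelineModel can be loaded and put behind a small HTTP endpoint; the model path, port, and payload shape are assumptions:

```python
from flask import Flask, jsonify, request
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("model-serving").getOrCreate()
model = PipelineModel.load("s3a://example-bucket/models/churn_v3")   # placeholder path

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    rows = request.get_json()                 # expects a JSON list of feature records
    df = spark.createDataFrame(rows)
    preds = model.transform(df).select("prediction").collect()
    return jsonify([r.prediction for r in preds])

if __name__ == "__main__":
    app.run(port=8080)
```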

CATALOG - COLLABORATE WITH VERSIONING

The Catalog allows teams to collaborate on shared projects, sharing datasets, workflows, and even scripts.

Versioning ensures that no information is lost across multiple updates when collaborating.

Complete Development & Deployment Lifecycle

Build Workflows

Interactive Development with incremental execution to build workflows on Spark

Execute and Schedule

Execute and schedule workflows to run on Spark with requested resources - your workflow will run whether the data is 100 GB or 10 TB.
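
In plain Spark terms, requesting resources amounts to a session configuration like the one below; the executor counts and sizes are arbitrary examples, and the same workflow simply gets more executors as the data grows:

```python
from pyspark.sql import SparkSession

# Example resource request; scale executors up for larger runs.
spark = (SparkSession.builder
         .appName("nightly-etl")
         .config("spark.executor.instances", "20")
         .config("spark.executor.memory", "16g")
         .config("spark.executor.cores", "4")
         .getOrCreate())
```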

Get Production Support

With us, you get a good night's sleep. We'll take care of any execution issues and help support the SLAs you want to deliver.

DATA PREPARATION, SQL & MACHINE LEARNING

SQL

  • Window Function
  • Cube
  • Aggregate
  • Select
  • Filter
  • Join
  • Union All
  • Intersect
  • Except
  • Distinct
  • Limit
  • Order by

Recommendation

  • ALS

Classification

  • Logistic Regression
  • Decision Trees
  • Gradient Boost Trees
  • Naive Bayes
  • Multilayer Perceptron
  • Probabilistic
  • Random Forest

Clustering

  • LDA
  • K Means

Regression

  • AFT Survival
  • Decision Tree
  • GBT
  • Isotonic Regression
  • Linear Regression
  • Random Forest

Feature Generation

  • Binarizer
  • Bucketizer
  • ChiSquareSelector
  • CountVectorizer
  • DCT
  • ElementWiseProduct
  • HashingTF
  • IDF
  • IndexToString
  • Interaction
  • MinMaxScaler
  • NGram
  • Normalizer
  • OneHotEncoder
  • PCA
  • PolynomialExpansion
  • QuantileDiscretizer
  • RegexTokenizer
  • RFormula
  • SQLTransformer
  • StandardScaler
  • StopWordsRemover
  • StringIndexer
  • Tokenizer
  • VectorAssembler
  • VectorIndexer
  • VectorSlicer
  • Word2Vec
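
To make the feature-generation operators concrete, here is a small text-feature chain built from several of the transformers listed above; the input dataset and column name are hypothetical:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, IDF, StopWordsRemover, Tokenizer

tokenizer = Tokenizer(inputCol="review_text", outputCol="words")
remover   = StopWordsRemover(inputCol="words", outputCol="filtered")
tf        = HashingTF(inputCol="filtered", outputCol="raw_features", numFeatures=4096)
idf       = IDF(inputCol="raw_features", outputCol="features")

# 'reviews' is a hypothetical DataFrame with a 'review_text' column.
text_features = Pipeline(stages=[tokenizer, remover, tf, idf]).fit(reviews).transform(reviews)
```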

+ Your Custom Operators

Write your own custom operators, unique to your business, in 100% Apache Spark code. Avoid lock-in to a custom framework.
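
A minimal sketch of what such an operator could look like, written as an ordinary PySpark Transformer so it composes with the built-in ones; the class, its name, and its threshold parameter are hypothetical:

```python
from pyspark.ml import Transformer
from pyspark.sql import DataFrame, functions as F

class NullRatioFilter(Transformer):
    """Hypothetical custom operator: drop columns whose null ratio exceeds a threshold.
    Plain Apache Spark code, so it runs anywhere Spark runs."""

    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold

    def _transform(self, df: DataFrame) -> DataFrame:
        total = df.count() or 1
        keep = [c for c in df.columns
                if df.filter(F.col(c).isNull()).count() / total <= self.threshold]
        return df.select(*keep)

# Apply the custom operator like any other transform step.
cleaned = NullRatioFilter(threshold=0.3).transform(prepared)
```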