SPEED. SCALE. COLLABORATION.

COLLABORATIVE WORKBENCH.
SERVERLESS EXECUTION.
ENTIRE DATA LIFECYCLE.

Ingest · Prepare · ETL · Feature Generation · Model Training · Model Deployment

REDEFINING ANALYTICS IN THE CLOUD

On Premise

Data Lakes

Cloud Serverless Lake

UNIFIED ANALYTICS DESIGN LAYER

ETL · BI · ML · Notebook

Cloud Ephemeral Compute

Cloud Data Lake Storage

Complete Analytics Lifecycle

DATA INGESTION

From File or Databases

Data can be ingested from files in cloud storage such as S3 – various file formats are supported.

You can also get the data from a database using the JDBC connector.
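
As a rough sketch of what these two paths look like in plain PySpark (the bucket, path, JDBC URL, table, and credentials below are placeholders, not real endpoints):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-example").getOrCreate()

# Ingest a CSV file from cloud storage; other formats (JSON, Parquet, ...) read the same way.
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3a://example-bucket/raw/orders.csv"))   # placeholder path

# Ingest a table from a database over JDBC; URL and credentials are placeholders.
customers = (spark.read
             .format("jdbc")
             .option("url", "jdbc:postgresql://db-host:5432/sales")
             .option("dbtable", "public.customers")
             .option("user", "analyst")
             .option("password", "********")
             .load())
```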

EXPLORE & PREPARE

Use the spreadsheet view to understand your data, with statistics on its quality and distribution. Use prepare functions to modify data: over 200 built-in functions are available, and any SQL expression can be written.
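
A minimal PySpark sketch of the same idea, mixing built-in functions with a free-form SQL expression; the column names (order_ts, amount, fx_rate) are hypothetical:

```python
from pyspark.sql import functions as F

prepared = (orders
            # built-in functions: type conversion and rounding
            .withColumn("order_date", F.to_date("order_ts"))
            .withColumn("amount_usd", F.round(F.col("amount") * F.col("fx_rate"), 2))
            # any SQL expression can be written via expr()
            .withColumn("tier", F.expr(
                "CASE WHEN amount_usd >= 1000 THEN 'gold' ELSE 'standard' END")))

# quick look at the quality and distribution of a derived column
prepared.describe("amount_usd").show()
```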

ETL & TRANSFORM

Use SQL Queries and SQL operators to combine and transform data into the shape you want.

We support complete Spark SQL and SQL:2003.
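
For instance, the two ingested datasets can be combined and reshaped with an ordinary Spark SQL query (table and column names are illustrative):

```python
prepared.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Join, aggregate, and order the combined data.
monthly_revenue = spark.sql("""
    SELECT c.region,
           date_trunc('month', o.order_date) AS month,
           SUM(o.amount_usd)                 AS revenue
    FROM   orders o
    JOIN   customers c ON o.customer_id = c.customer_id
    GROUP BY c.region, date_trunc('month', o.order_date)
    ORDER BY month, revenue DESC
""")
monthly_revenue.show()
```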

TRAIN ML MODELS

ML Pipeline builder allows you to build simple pipelines or pipelines with hyper-parameter tuning.

See the features you are generating immediately, including statistics on the created columns.
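
A hedged sketch of such a pipeline in Spark ML, with cross-validated hyper-parameter tuning; the feature columns and the binary label column are assumptions carried over from the earlier examples:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

indexer   = StringIndexer(inputCol="tier", outputCol="tier_idx")
assembler = VectorAssembler(inputCols=["tier_idx", "amount_usd"], outputCol="features")
lr        = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[indexer, assembler, lr])

# Hyper-parameter grid and cross-validation over the whole pipeline.
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)

model = cv.fit(prepared)   # assumes 'prepared' carries a binary 'label' column
```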

CUSTOM SCRIPTS &
NOTEBOOK PROGRAMMING

The Script Node comes with a built-in Jupyter notebook connected to the same Spark session as the workflow.

Use the notebook to experiment and generate a function.

Use the final function as a script in your workflow. Scripts can be saved and reused.
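
For example, a function worked out interactively in the notebook can be kept as a small script and applied to the workflow's incoming DataFrame (the column names are illustrative):

```python
from pyspark.sql import DataFrame, functions as F

def add_order_features(df: DataFrame) -> DataFrame:
    """Developed in the notebook, then saved as a reusable script."""
    return (df
            .withColumn("order_month", F.month("order_date"))
            .withColumn("is_large", (F.col("amount_usd") >= 1000).cast("int")))

# Inside the workflow, the script applies the same function to the node's input.
featured = add_order_features(prepared)
```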

DEPLOY ML MODELS

Powerful and flexible deployment as a model, web service, or pod. Versioning and monitoring make it reliable. High performance with low latency and high concurrency.
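
Purely as an illustration of the underlying idea (not the product's own deployment mechanism), a saved Spark PipelineModel can be loaded and put behind a small HTTP endpoint; the model path, port, and payload shape are assumptions:

```python
from flask import Flask, jsonify, request
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("model-serving").getOrCreate()
model = PipelineModel.load("s3a://example-bucket/models/churn_v3")   # placeholder path

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    rows = request.get_json()                 # expects a JSON list of feature records
    df = spark.createDataFrame(rows)
    preds = model.transform(df).select("prediction").collect()
    return jsonify([r.prediction for r in preds])

if __name__ == "__main__":
    app.run(port=8080)
```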

CATALOG - COLLABORATE WITH VERSIONING

The Catalog allows teams to collaborate on shared projects, sharing datasets, workflows, and even scripts.

Versioning ensures that no information is lost across multiple updates when collaborating.

Complete Development & Deployment Lifecycle

Build Workflows

Interactive Development with incremental execution to build workflows on Spark

Execute and Schedule

Execute and schedule workflows to run on Spark with requested resources - your workflow will run whether the data is 100 GB or 10 TB.
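
In plain Spark terms, requesting resources amounts to a session configuration like the one below; the executor counts and sizes are arbitrary examples, and the same workflow simply gets more executors as the data grows:

```python
from pyspark.sql import SparkSession

# Example resource request; scale executors up for larger runs.
spark = (SparkSession.builder
         .appName("nightly-etl")
         .config("spark.executor.instances", "20")
         .config("spark.executor.memory", "16g")
         .config("spark.executor.cores", "4")
         .getOrCreate())
```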

Get Production Support

With us, you get a good night's sleep. We'll take care of any execution issues and help support the SLAs you want to deliver.

DATA PREPARATION, SQL & MACHINE LEARNING

SQL

  • Window Function
  • Cube
  • Aggregate
  • Select
  • Filter
  • Join
  • Union All
  • Intersect
  • Except
  • Distinct
  • Limit
  • Order by

Recommendation

  • ALS

Classification

  • Logistic Regression
  • Decision Trees
  • Gradient Boost Trees
  • Naive Bayes
  • Multilayer Perceptron
  • Probabilistic
  • Random Forest

Clustering

  • LDA
  • K Means

Regression

  • AFT Survival
  • Decision Tree
  • GBT
  • Isotonic Regression
  • Linear Regression
  • Random Forest

Feature Generation

  • Binarizer
  • Bucketizer
  • ChiSquareSelector
  • CountVectorizer
  • DCT
  • ElementWiseProduct
  • HashingTF
  • IDF
  • IndexToString
  • Interaction
  • MinMaxScaler
  • NGram
  • Normalizer
  • OneHotEncoder
  • PCA
  • PolynomialExpansion
  • QuantileDiscretizer
  • RegexTokenizer
  • RFormula
  • SQLTransformer
  • StandardScaler
  • StopWordsRemover
  • StringIndexer
  • Tokenizer
  • VectorAssembler
  • VectorIndexer
  • VectorSlicer
  • Word2Vec
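
To make the feature-generation operators concrete, here is a small text-feature chain built from several of the transformers listed above; the input dataset and column name are hypothetical:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, IDF, StopWordsRemover, Tokenizer

tokenizer = Tokenizer(inputCol="review_text", outputCol="words")
remover   = StopWordsRemover(inputCol="words", outputCol="filtered")
tf        = HashingTF(inputCol="filtered", outputCol="raw_features", numFeatures=4096)
idf       = IDF(inputCol="raw_features", outputCol="features")

# 'reviews' is a hypothetical DataFrame with a 'review_text' column.
text_features = Pipeline(stages=[tokenizer, remover, tf, idf]).fit(reviews).transform(reviews)
```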

+ Your Custom Operators

Write your own custom operators, unique to your business, in 100% Apache Spark code. Avoid lock-in to a custom framework.
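
A minimal sketch of what such an operator could look like, written as an ordinary PySpark Transformer so it composes with the built-in ones; the class, its name, and its threshold parameter are hypothetical:

```python
from pyspark.ml import Transformer
from pyspark.sql import DataFrame, functions as F

class NullRatioFilter(Transformer):
    """Hypothetical custom operator: drop columns whose null ratio exceeds a threshold.
    Plain Apache Spark code, so it runs anywhere Spark runs."""

    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold

    def _transform(self, df: DataFrame) -> DataFrame:
        total = df.count() or 1
        keep = [c for c in df.columns
                if df.filter(F.col(c).isNull()).count() / total <= self.threshold]
        return df.select(*keep)

# Apply the custom operator like any other transform step.
cleaned = NullRatioFilter(threshold=0.3).transform(prepared)
```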