
Data Sheet

2021
Everyday AI, Extraordinary People

Dataiku is the platform for Everyday AI, systemizing the use of data for
exceptional business results. Organizations that use Dataiku elevate their
people (whether technical and working in code, or on the business side
using low- or no-code tools) to extraordinary, arming them with the ability
to make better day-to-day decisions with data.

More than 450 companies worldwide use Dataiku to systemize their
use of data and AI, driving diverse use cases from fraud detection to
customer churn prevention, predictive maintenance to supply chain
optimization, and everything in between.
Connectivity
Dataiku allows you to seamlessly connect to your data no matter
where it’s stored or in what format. That means easy access for
everyone — whether technical or not — to the data they need.

SQL Databases
☑ MySQL
☑ PostgreSQL
☑ Vertica
☑ Amazon Redshift
☑ Pivotal Greenplum
☑ Teradata
☑ IBM Netezza
☑ SAP HANA
☑ Oracle
☑ Azure Synapse
☑ Google BigQuery
☑ Google Cloud SQL
☑ IBM DB2
☑ Exasol
☑ MemSQL
☑ Snowflake
☑ Custom connectivity through JDBC

NoSQL Databases
☑ MongoDB
☑ Cassandra
☑ ElasticSearch

Streaming Data Sources
☑ Kafka
☑ AWS SQS
☑ Spark

Remote Data Sources
☑ FTP
☑ SCP
☑ SFTP
☑ HTTP

Cloud Object Storage
☑ Amazon S3
☑ Google Cloud Storage
☑ Azure Blob Storage
☑ Azure Data Lake Store Gen1 & Gen2

Custom Data Sources - Extended Connectivity Through Dataiku Plugins
☑ Connect to REST APIs
☑ Create custom file formats
☑ Connect to databases

Hadoop & Spark Supported Distributions
☑ Cloudera
☑ Hortonworks
☑ Google DataProc
☑ MapR
☑ Amazon EMR
☑ DataBricks

Optimized Sync Between:
☑ Snowflake and WASB
☑ S3 and Amazon Redshift
☑ Snowflake and S3

Hadoop File Formats
☑ CSV
☑ Parquet
☑ ORC
☑ SequenceFile
☑ RCFile

Native Support for Snowflake in Spark Driver
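Whatever the underlying store, access from code goes through the same dataset abstraction. A minimal sketch using Dataiku's Python API inside a code recipe or notebook (the dataset names are hypothetical; connection details live in Dataiku, not in the code):

```python
import dataiku

# Read a dataset into a pandas DataFrame. Whether "transactions" is backed
# by Snowflake, S3, or PostgreSQL is defined by its connection, not the code.
df = dataiku.Dataset("transactions").get_dataframe()

# Write results back through another managed connection.
out = dataiku.Dataset("transactions_prepared")
out.write_with_schema(df)
```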

Exploratory Analytics
Sometimes you need to do a deep dive on your data, but other times,
it’s important to understand it at a glance. From exploring available
datasets to dashboarding, Dataiku makes this type of analysis easy.

Data Analysis

• Automatically detect dataset schema and data types
• Assign semantic meanings to your datasets' columns
• Build univariate statistics automatically & derive data quality checks
• Dataset audit
☑ Automatically produce data quality and statistical analysis of entire Dataiku datasets
☑ Support of several backends for audit (in-memory, Spark, SQL)

Data Cataloging

• Search for data, comments, features, or models in a centralized catalog
• Explore data from all your existing connections

Data Visualization

• Create standard charts (histogram, bar charts, etc.) and scale charts' computation by leveraging underlying systems (in-database aggregations)
• Create custom charts using
☑ Custom Python-based or R-based Charts
☑ Custom Web Applications (HTML/JS/CSS/Flask)
☑ Shiny Web Applications (R)
☑ Bokeh and Dash Web Applications (Python)

Dashboarding

• User-managed reports and dashboards
☑ RMarkdown reports
☑ Jupyter Notebooks reports
☑ Custom Insights (GGplot, Plotly, Matplotlib)
☑ Custom interactive, web-based visualizations

Advanced Analysis

• Interactive visual statistics
☑ Univariate analysis and statistical tests on single or multiple populations
☑ Statistics and tests on multiple populations
☑ Correlations analysis
☑ Principal Components Analysis
• Leverage predefined Python-based Jupyter Notebooks
☑ All analysis supported in Visual Statistics
☑ High dimensional data visualization with t-SNE (see the sketch below)
☑ Topic modeling
• Time series
☑ Time series data prep with visual recipes for resampling, windowing, extrema extraction, interval extraction
☑ Time series visualization
☑ Time series forecasting
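To make the t-SNE item concrete, a minimal sketch of the kind of analysis the pre-templated notebooks cover, in plain scikit-learn (the input file is a hypothetical numeric feature table):

```python
import pandas as pd
from sklearn.manifold import TSNE

# "features.csv" is a hypothetical table of numeric features.
X = pd.read_csv("features.csv")

# Project the high-dimensional features onto 2D for a scatter chart.
coords = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(coords[:5])  # 2D coordinates, ready for plotting
```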

Data Preparation
Traditionally, data preparation takes up to 80% of the time of a data
project. But Dataiku’s data prep features make that process 10x faster
and easier, which means more time for more impactful (and creative)
work.

Visual Data Transformation

• Design your data transformation jobs using a point-and-click interface
☑ Group
☑ Filter
☑ Sort
☑ Stack
☑ Join
☑ Fuzzy Join
☑ Window
☑ Sync
☑ Distinct
☑ Top-N
☑ Pivot
☑ Split
• Scale your transformations by running them directly in distributed computation systems (SQL, Hive, Spark, Impala)
• See and tune the underlying code generated for the task

Dataset Sampling

• First records, random selection, stratified sampling, etc.

Interactive Data Preparation

• Processors (90 built-in, from simple text processing to custom Python- or formula-based transformations; see the sketch below)
• Scale data preparation scripts using in-database (SQL) or in-cluster (Spark) processing
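A minimal sketch of what a custom Python processor can look like in a Prepare recipe (row mode); the `amount` and `amount_band` columns and the threshold are hypothetical:

```python
# Custom Python processor, row mode: receives each record as a dict of
# column name -> value and returns the (possibly modified) row.
def process(row):
    amount = float(row.get("amount") or 0)
    # Derive a new column; the cutoff is purely illustrative.
    row["amount_band"] = "high" if amount > 1000 else "low"
    return row
```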

Machine Learning
Dataiku offers the latest machine learning technologies all in one place
so that data scientists can focus on what they do best: building and
optimizing the right model for the use case at hand.

Automated Machine Learning (AutoML)

• Automated ML strategies
☑ Quick prototypes
☑ Interpretable models
☑ High performance

• Features handling for machine learning
☑ Support for numerical, categorical, text and vector features
☑ Automatic preprocessing of categorical features (dummy encoding, impact coding, hashing, custom preprocessing, etc.)
☑ Automatic preprocessing of numerical features (standard scaling, quantile-based binning, custom preprocessing, etc.)
☑ Automatic preprocessing of text features (TF/IDF, hashing trick, truncated SVD, custom preprocessing)
☑ Various missing values imputation strategies
+ Features generation
◊ Feature-per-feature derived variables (square, square root…)
◊ Linear and polynomial combinations
+ Features selection
◊ Filter and embedded methods

• Choose between several ML backends to train your models
☑ TensorFlow
☑ Keras
☑ Scikit-learn
☑ XGBoost
☑ MLLib
☑ H2O

• Algorithms
☑ Python-based
+ Ordinary Least Squares
+ Ridge Regression
+ Lasso Regression
+ Logistic Regression
+ Random Forests
+ Gradient Boosted Trees
+ XGBoost
+ Decision Tree
+ Support Vector Machine
+ Stochastic Gradient Descent
+ K Nearest Neighbors
+ Extra Random Trees
+ Artificial Neural Network
+ Lasso Path
+ Custom models offering a scikit-learn compatible API, e.g. LightGBM (see the sketch below)
☑ Spark MLLib-based
+ Logistic Regression
+ Linear Regression
+ Decision Trees
+ Random Forest
+ Gradient Boosted Trees
+ Naive Bayes
+ Custom models
☑ H2O-based
+ Deep Learning
+ GBM
+ GLM
+ Random Forest
+ Naive Bayes
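As a concrete example of the custom-model hook, a minimal sketch assuming LightGBM is installed in the code environment; the hyperparameters are illustrative, not recommendations:

```python
# A custom model only needs to expose the scikit-learn estimator API:
# it is trained and scored via fit(X, y), predict(X), and predict_proba(X).
from lightgbm import LGBMClassifier

clf = LGBMClassifier(
    n_estimators=200,    # illustrative values
    learning_rate=0.05,
)
```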

6
Data Sheet - Dataiku

• Hyperparameters optimization
☑ Freely set and search hyperparameters
☑ Support for grid, random, and Bayesian hyperparameter optimization and search
☑ Cross validation strategies
+ Support for several train/test splitting policies (incl. custom)
+ K-fold cross testing
+ Optimize model tuning on several metrics (Explained Variance Score, MAPE, MAE, MSE, Accuracy, F1 Score, Cost matrix, AUC, etc.)
☑ Interrupt and resume grid search
☑ Visualize grid search results
☑ Auto-recalibration on the predicted probabilities
☑ Distributed hyperparameter search on Kubernetes

• Analyzing model training results
☑ Get insights from your model
+ Scored data
+ Features importance
+ Model parameters
+ Partial dependence plots
+ Regression coefficients
+ Bias and performance analysis on subpopulations
+ Individual prediction explanations
+ Model fairness report
+ Interactive scoring (what-if analysis)
+ ML diagnostics
+ Model assertions
☑ Publish training results to Dataiku Dashboards
☑ Audit model performances
+ Confusion matrix
+ Decision chart
+ Lift chart
+ ROC curve
+ Probabilities distribution chart
+ Detailed metrics (Accuracy, F1 Score, ROC-AUC Score, MAE, RMSE, etc.)

• Automatically create ensembles from several models
☑ Linear stacking (for regression models) or logistic stacking (for classification problems)
☑ Prediction averaging or median (for regression problems)
☑ Majority voting (for classification problems)

• Model export
☑ Export trained models as a set of Java classes for extremely efficient scoring in any JVM application
☑ Export a trained model as a PMML file for scoring with any PMML-compatible scorer

• Automated model documentation
☑ Leverage pre-built templates or create your own for standardized model documentation without the manual work

• Scoring capabilities (see the sketch below)
☑ Real-time serverless scoring API
☑ Distributed batch with Spark
☑ SQL (in-database scoring)
☑ Dataiku built-in engine
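To show how the real-time scoring API is typically queried, a minimal sketch using the dataikuapi client; the host, service id, endpoint id, and feature names are all hypothetical:

```python
from dataikuapi import APINodeClient

# Point the client at a deployed API service on an API node.
client = APINodeClient("https://apinode.example.com:12000", "churn_service")

# Send one record and read back the prediction
# (the exact response shape can vary by version).
result = client.predict_record(
    "churn_model",                           # endpoint id
    {"tenure": 12, "monthly_charges": 70.5}  # hypothetical feature schema
)
print(result["result"]["prediction"])
```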


Model Deployment

• Model versioning
• Batch scoring
• Real-time scoring
☑ Expose your models through REST APIs for real-time scoring by other applications
• Expose arbitrary functions and models through REST APIs
☑ Write custom R, Python or SQL based functions or models
☑ Automatically turn them into API endpoints for operationalization
• Easily manage all your model deployments
☑ One-click deployment of models
• Docker & Kubernetes
☑ Deploy models into Docker containers for operationalization
☑ Automatically push images to Kubernetes clusters for high scalability
☑ Works “out of the box” with Spark on Kubernetes
• Model monitoring mechanism
☑ Control model performances over time
☑ Data drift detection
☑ Automatically retrain models in case of performance drift
☑ Customize your retraining strategies
• Logging
☑ Log and audit all queries sent to your models

Unsupervised Learning

• Automated features engineering (similar to supervised learning)
• Optional dimensionality reduction
• Outliers detection
• Algorithms
☑ K-means
☑ Gaussian Mixture
☑ Agglomerative clustering
☑ Spectral clustering
☑ DBSCAN
☑ Interactive clustering (two-step clustering)
☑ Isolation forest (anomaly detection)
☑ Custom models

Model Training

• Train models over Kubernetes

Deep Learning

• Support for Keras with TensorFlow backend
• User-defined model architecture (see the sketch below)
• Personalize training settings
• Support for multiple inputs for your models
• Support for CPU and GPU
• Support for pre-trained models
• Extract features from images
• TensorBoard integration
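A minimal sketch of the kind of user-defined architecture code this supports, in plain Keras with the TensorFlow backend; the function signature and layer sizes are illustrative, not Dataiku's exact contract:

```python
from tensorflow import keras

def build_model(n_features, n_classes):
    # A small fully-connected classifier; widths are illustrative.
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```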

Output Features
After all the work of finding insights, it’s important to effectively
communicate them to stakeholders around the organization to inspire
action. Dataiku puts the power of AI in the hands of everyone to make
intelligence-driven decisions.

Charts
☑ Bar, line, curve, pie, donut, scatter, boxplot, 2D distribution, lift
☑ Maps: scatter, binned, administrative
☑ Tables

Dashboards
☑ Create interactive insights with charts, tables, notebook exports, webapps and more

Dataiku Applications
☑ Create user-friendly interfaces on top of projects for users to customize and parametrize in a few clicks and without code
☑ Share applications on Dataiku as a recipe or as an API

Dataiku WebApps
☑ Use code to build highly customized applications that can be leveraged as an API for users (see the sketch below)
☑ Supports R-Shiny, Dash (Plotly), Bokeh, HTML, CSS, JS, and Flask
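For a flavor of the code-based webapps, a minimal sketch in plain Dash (Plotly); the layout, titles, and data are hypothetical:

```python
from dash import Dash, dcc, html
import plotly.express as px

app = Dash(__name__)

# One illustrative chart; a real webapp would read Dataiku datasets.
fig = px.bar(x=["A", "B", "C"], y=[3, 1, 2], title="Churn by segment")
app.layout = html.Div([html.H3("Demo webapp"), dcc.Graph(figure=fig)])

if __name__ == "__main__":
    app.run(debug=True)
```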

Automation Features
When it comes to streamlining and automating workflows, Dataiku
allows data teams to put the right processes in place to ensure models
are properly monitored and easily managed in production.

Data Flow
☑ Keep track of the dependencies between your datasets
☑ Manage the complete data lineage
☑ Check consistency of data, schemas or data types
☑ Organize flows into zones

Partitioning
☑ Leverage HDFS or SQL partitioning mechanisms to optimize computation time

Metrics & Checks
☑ Create Metrics assessing data consistency and quality
☑ Adapt the behavior of your data pipelines and jobs based on Checks against these Metrics
☑ Leverage Metrics and Checks to measure potential ML model drift over time

Monitoring
☑ Track the status of your production scenarios
☑ Visualize the success and errors of your Dataiku jobs

Automation Environments
☑ Use dedicated Dataiku Automation nodes for production pipelines
☑ Connect and deploy on production systems (data lakes, databases)
☑ Activate, use or revert multiple Dataiku project bundles

Scenarios
☑ Trigger the execution of your data flows and applications on a scheduled or event-driven basis
☑ Create complete custom execution scenarios by assembling a set of actions (steps)
☑ Leverage built-in steps or define your own steps through a Python API (see the sketch below)
☑ Publish the results of scenarios to various channels through Reporters (send emails with custom templates; attach datasets, logs, files, or reports; send notifications to Slack or Hipchat)
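A minimal sketch of a custom scenario step written against the Python scenario API; the dataset name and saved-model id are hypothetical:

```python
from dataiku.scenario import Scenario

scenario = Scenario()

# Each call runs as one step; a failing step aborts the scenario by default.
scenario.build_dataset("transactions_prepared")
scenario.train_model("churn_model_id")  # retrain a saved model by id
```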

Code
Work with the tools and languages you already know, even inside
Dataiku: everything can be done with code and fully customized.
And for tasks where it’s easier to use a visual interface, Dataiku
provides the freedom to switch seamlessly between the two.

Support of Multiple Languages for Coding “Recipes”
☑ Python
☑ R
☑ SQL
☑ Shell
☑ Hive
☑ Impala
☑ Spark Scala
☑ Spark SQL
☑ PySpark
☑ SparkR
☑ Sparklyr

Create and Use Custom Code Environments
☑ Support for multiple versions of Python (2.7, 3.4, 3.5, 3.6)
☑ Support for Conda
☑ Install R and Python libraries directly from Dataiku’s interface
☑ Open environment to install any R or Python libraries
☑ Manage package dependencies and create reproducible environments

Scale Code Execution
☑ Scale your code by submitting Python or R jobs to a Kubernetes cluster, either on-premises or through cloud services (EKS, AKS, GKE)

• Create reusable custom components
☑ Dataiku Plugins to package and ship complex code-based functions in a visual interface to less-technical users
☑ Extend native Dataiku capabilities through code-based Plugins (custom connectors, custom data preparation processors, custom web applications for interactive analysis and visualization, etc.)
☑ Create Python-based custom steps for your Dataiku recipes and scenarios

• Leverage your favorite IDE to develop and test code
☑ RStudio for R code
☑ Sublime Text
☑ VS Code
☑ PyCharm

• Interactive Notebooks for data scientists
☑ Full integration of Jupyter notebooks with Python, R or PySpark kernels
☑ Use pre-templated notebooks to speed up your work
☑ Interactively query databases or data lakes through SQL notebooks (support for Hive)
☑ Run Jupyter notebooks over Kubernetes

• Python & R Libraries
☑ Create your own R or Python libraries or helpers
☑ Share them across the whole Dataiku instance
☑ Easily use your pre-existing code assets
☑ Benefit from Git integration to streamline development workflows

• APIs (see the sketch below)
☑ Manage the Dataiku platform through the CLI or Python SDK
☑ Train and deploy ML models programmatically
☑ Expose custom Python & R functions through REST APIs
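As an illustration of the Python SDK, a minimal sketch; the host, API key, and project key are hypothetical:

```python
import dataikuapi

# Connect to a Dataiku instance with an API key.
client = dataikuapi.DSSClient("https://dss.example.com:11200", "MY_API_KEY")

# Enumerate projects, then grab one to work with programmatically.
print(client.list_project_keys())
project = client.get_project("CHURN")
```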

Collaboration
Dataiku was designed from the ground up with collaboration in mind.
From knowledge sharing to change management to monitoring, data
teams — including scientists, engineers, analysts, and more — can
work faster and smarter together.

Shared Platform (for Data Scientists, Data Engineers, Analysts, etc.)

Version Control
☑ Git-based version control recording all changes made in Dataiku

Knowledge Management and Sharing
☑ Create and export Wikis to document projects
☑ Engage with other users of the platform through Discussions
☑ Tag, comment and favorite any Dataiku objects

Team Activity Monitoring

• Global search to quickly find all project assets, plugins, wiki, reference docs, etc.

• Share custom, code-based capabilities with less-technical users in a visual interface

• Shared code-based components
☑ Distribute reusable code snippets to all users
☑ Package arbitrarily complex functions, operations or business logic to be used by less-technical users
☑ Integrate with remote Git repositories such as GitHub

Governance & Security
Dataiku makes data governance easy, bringing enterprise-level security
with fine-grained access rights and advanced monitoring for admins or
project managers.

User Profiles

• Role-based access (fine-grained or custom)

• Authentication management
☑ Use SSO systems
☑ Connect to your corporate directory (LDAP, Active Directory…) to manage users and groups

• Enterprise-grade security
☑ Track and monitor all actions in Dataiku using an audit trail
☑ Authenticate against Hadoop clusters and databases through Kerberos
☑ Support for user impersonation for full traceability and compliance

• Resources management
☑ Dynamically start and stop Hadoop clusters from Dataiku
☑ Control server resource allocation directly from the user interface

• Platform management
☑ Integrate with your corporate workload management tools using the Dataiku CLI and APIs

Custom Policy Framework for Data Protection and External Regulations Compliance
☑ Implement GDPR rules and processes directly
☑ Framework capabilities
◊ Document data sources with sensitive information, and enforce good practices
◊ Restrict access to projects and data sources with sensitive information
◊ Audit the sensitive information in a Dataiku instance

Architecture
Dataiku was built for the modern enterprise, and its architecture
ensures that businesses can stay open (i.e., not tied down to a certain
technology) and that they can scale their data efforts.

• No client installation for Dataiku users

• Dataiku nodes (use dedicated Dataiku environments or nodes to design, run, and deploy your ML applications)

• Integrations
☑ Leverage distributed systems to scale computations through Dataiku
☑ Automatically turn Dataiku jobs into SQL, Spark, MapReduce, Hive, or Impala jobs for in-cluster or in-database processing to avoid unnecessary data movement or copies

• Modern architecture (Docker, Kubernetes, GPU for deep learning)

• Traceability and debugging through full system logs

• Open platform
☑ Native support of Jupyter notebooks
☑ Install and manage any of your favorite Python or R packages and libraries, or integrate with external Git repositories
☑ Freely reuse your existing corporate codebase
☑ Extend the Dataiku platform with custom components

©2021 DATAIKU | DATAIKU.COM
