Dataiku Datasheet
2021
Everyday AI,
Extraordinary People
Dataiku is the platform for Everyday AI, systemizing the use of data for
exceptional business results. Organizations that use Dataiku elevate their
people to extraordinary, whether they are technical and work in code or are
on the business side and work with low- or no-code tools, arming them with
the ability to make better day-to-day decisions with data.
SQL Databases
☑ MySQL
☑ PostgreSQL
☑ Vertica
☑ Amazon Redshift
☑ Pivotal Greenplum
☑ Teradata
☑ IBM Netezza
☑ SAP HANA
☑ Oracle
☑ Azure Synapse
☑ Google BigQuery
☑ Google Cloud SQL
☑ IBM DB2
☑ Exasol
☑ MemSQL
☑ Snowflake
☑ Custom connectivity through JDBC
NoSQL Databases
☑ MongoDB
☑ Cassandra
☑ ElasticSearch
Hadoop & Spark Supported Distributions
☑ Cloudera
☑ Hortonworks
☑ Google DataProc
☑ MapR
☑ Amazon EMR
☑ DataBricks
Streaming Data Sources
☑ Kafka
☑ AWS SQS
☑ Spark
Remote Data Sources
☑ FTP
☑ SCP
☑ SFTP
☑ HTTP
Cloud Object Storage
☑ Amazon S3
☑ Google Cloud Storage
☑ Azure Blob Storage
☑ Azure Data Lake Store Gen1 & Gen2
Custom Data Sources - Extended Connectivity Through Dataiku Plugins
☑ Connect to REST APIs
☑ Create custom file formats
☑ Connect to databases
Optimized Sync Between:
☑ Snowflake and WASB
☑ S3 and Amazon Redshift
☑ Snowflake and S3
Exploratory Analytics
Sometimes you need to do a deep dive on your data, but other times,
it’s important to understand it at a glance. From exploring available
datasets to dashboarding, Dataiku makes this type of analysis easy.
• Assign semantic meanings to your datasets’ columns
• Explore data from all your existing connections
• Time series
☑ Time series data prep with visual recipes for
resampling, windowing, extrema extraction, interval
extraction
☑ Time series visualization
☑ Time series forecasting
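To make the time series prep terms above concrete, here is a minimal sketch in plain Python of what resampling, windowing, and extrema extraction compute. It is an illustration only, not Dataiku's implementation: the function names (`resample_mean`, `rolling_mean`, `global_extrema`) and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def resample_mean(points, step):
    """Group (timestamp, value) pairs into fixed-width intervals and average each one."""
    buckets = {}
    for ts, value in points:
        # Floor the timestamp to the start of the interval it falls in.
        start = EPOCH + ((ts - EPOCH) // step) * step
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

def rolling_mean(values, window):
    """Trailing moving average; early positions average the values seen so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def global_extrema(points):
    """Return the (timestamp, value) pairs holding the minimum and maximum value."""
    return min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])

# Hypothetical sensor readings at irregular intervals.
readings = [
    (datetime(2021, 1, 1, 0, 0), 10.0),
    (datetime(2021, 1, 1, 0, 20), 12.0),
    (datetime(2021, 1, 1, 0, 40), 14.0),
    (datetime(2021, 1, 1, 1, 10), 20.0),
    (datetime(2021, 1, 1, 1, 50), 22.0),
]

hourly = resample_mean(readings, timedelta(hours=1))
smoothed = rolling_mean([value for _, value in readings], window=2)
lowest, highest = global_extrema(readings)
```

Here `hourly` holds one averaged point per hour, `smoothed` is a two-point trailing average of the raw values, and `lowest` and `highest` are the readings with the smallest and largest values.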
Data Preparation
Traditionally, data preparation takes up to 80% of the time of a data
project. But Dataiku’s data prep features make that process 10x faster
and easier, which means more time for more impactful (and creative)
work.
Machine Learning
Dataiku offers the latest machine learning technologies all in one place
so that data scientists can focus on what they do best: building and
optimizing the right model for the use case at hand.
☑ TensorFlow
☑ Keras
☑ Scikit-learn
☑ XGBoost
☑ MLLib
☑ H2O
☑ H2O-based
+ Deep Learning
+ GBM
+ GLM
+ Random Forest
+ Naive Bayes
Output Features
After all the work of finding insights, it’s important to communicate
them effectively to stakeholders across the organization to inspire
action. Dataiku puts the power of AI in the hands of everyone to make
intelligence-driven decisions.
Charts
☑ Bar, line, curve, pie, donut, scatter, boxplot, 2D distribution, lift
☑ Maps: Scatter, binned, administrative
☑ Tables
Dashboards
☑ Create interactive insights with charts, tables, notebook exports, webapps and more
Dataiku Applications
☑ Create user-friendly interfaces on top of projects for users to customize and parametrize in a few clicks and without code
☑ Share applications on Dataiku as a recipe or as an API
Dataiku WebApps
☑ Use code to build highly customized applications that can be leveraged as an API for users
☑ Supports R-Shiny, Dash Plotly, Bokeh, HTML, CSS, JS, and Flask
Automation Features
When it comes to streamlining and automating workflows, Dataiku
allows data teams to put the right processes in place to ensure models
are properly monitored and easily managed in production.
Monitoring
Code
Work in the tools and with the languages you already know (even in
Dataiku) — everything can be done with code and fully customized.
And for tasks where it’s easier to use a visual interface, Dataiku
provides the freedom to switch seamlessly between the two.
Collaboration
Dataiku was designed from the ground up with collaboration in mind.
From knowledge sharing to change management to monitoring, data
teams — including scientists, engineers, analysts, and more — can
work faster and smarter together.
Version Control
Governance & Security
Dataiku makes data governance easy, bringing enterprise-level security
with fine-grained access rights and advanced monitoring for admins or
project managers.
User Profiles
• Role-based access (fine-grained or custom)
• Authentication management
☑ Use SSO systems
☑ Connect to your corporate database (LDAP, Active Directory…) to manage users and groups
• Enterprise-grade security
☑ Track and monitor all actions in Dataiku using an audit trail
☑ Authenticate against Hadoop clusters and databases through Kerberos
☑ Supports user impersonation for full traceability and compliance
• Resources management
☑ Dynamically start and stop Hadoop clusters from Dataiku
☑ Control server resources allocation directly from the user interface
• Platform management
☑ Integrate with your corporate workload management tools using Dataiku CLI and APIs
• Custom policy framework for data protection and external regulations compliance
☑ Implement GDPR rules and processes directly
☑ Framework capabilities
◊ Document data sources with sensitive information, and enforce good practices
◊ Restrict access to projects and data sources with sensitive information
◊ Audit the sensitive information in a Dataiku instance
Architecture
Dataiku was built for the modern enterprise, and its architecture
ensures that businesses can stay open (i.e., not tied down to a certain
technology) and that they can scale their data efforts.
• No client installation for Dataiku users
• Dataiku nodes (use dedicated Dataiku environments or nodes to design, run, and deploy your ML applications)
• Integrations
☑ Leverage distributed systems to scale computations through Dataiku
☑ Automatically turn Dataiku jobs into SQL, Spark, MapReduce, Hive, or Impala jobs for in-cluster or in-database processing to avoid unnecessary data movements or copies
• Traceability and debugging through full system logs
• Open platform
☑ Native support of Jupyter notebooks
☑ Install and manage any of your favorite Python or R packages and libraries, or integrate with external Git repositories
☑ Freely reuse your existing corporate codebase
☑ Extend the Dataiku platform with custom components
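The idea of in-database processing mentioned above can be sketched as follows: a declarative aggregation step is translated into a single SQL statement, so only the aggregated rows leave the database. This is an assumption-laden illustration, not how Dataiku generates SQL. The helper `build_group_by_sql`, the `orders` table, and the use of SQLite as a stand-in database are all hypothetical.

```python
import sqlite3

# Aggregations this sketch is willing to translate (hypothetical whitelist).
ALLOWED_FUNCS = {"sum", "avg", "min", "max", "count"}

def build_group_by_sql(table, key, aggregates):
    """Translate a group-by step, given as (column, function) pairs, into one SQL query.

    Identifiers are assumed to come from trusted metadata, not from user input.
    """
    parts = []
    for column, func in aggregates:
        if func not in ALLOWED_FUNCS:
            raise ValueError(f"unsupported aggregate: {func}")
        parts.append(f'{func.upper()}("{column}") AS "{column}_{func}"')
    return (
        f'SELECT "{key}", {", ".join(parts)} '
        f'FROM "{table}" GROUP BY "{key}" ORDER BY "{key}"'
    )

# SQLite in-memory database standing in for a warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("EU", 30.0), ("US", 5.0)],
)

sql = build_group_by_sql("orders", "region", [("amount", "sum"), ("amount", "avg")])
rows = conn.execute(sql).fetchall()  # the aggregation runs inside the database
conn.close()
```

The raw `orders` rows never reach the Python process. Only the per-region totals and averages in `rows` are transferred, which is the data movement saving the bullet above describes.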
Data Sheet - Dataiku
©2021 DATAIKU | DATAIKU.COM