
FACT SHEET

SAS® Data Preparation


Empower analysts to quickly prepare data for analytics in a self-service,
point-and-click environment

What does SAS® Data Preparation do?


SAS Data Preparation provides an interactive, self-service environment for users who
need to access, blend, shape and cleanse data to prepare it for reporting or analytics.

Why is SAS® Data Preparation important?


SAS Data Preparation saves time on the preliminary tasks done to prepare data for
reporting and analytics. Its intuitive interface provides point-and-click actions for critical
functions – no coding or SQL skills required. With simplified data preparation tasks
seamlessly defined as part of the activities involved in analytics processing, users can spend
more time analyzing data and less time preparing it.

For whom is SAS® Data Preparation designed?


It’s designed for business analysts, citizen data scientists and other nontechnical users.
Data scientists and IT can use the same interface to prepare reusable plans for business
analysts.

To answer time-sensitive business questions, organizations need fast access to consistent,
trusted data that they can use for analytics. Without it, they may not be able to respond
quickly enough to market and customer requirements. But most organizations have massive
volumes of data spread across silos. This raw data often contains errors or is duplicated,
outdated or lacks identifiers needed to merge sources. Preparing it for analytics can
consume up to 80 percent of an analyst’s time.

It’s a frustrating issue for business and IT. Nontechnical users lack the skills to move
and transform data to make it ready for analytics. Alternatives require extensive coding,
SQL or scripting knowledge, and training in data engineering for extract, transform, load
(ETL) tools. In most cases, IT has to provision data for business users when they could
have focused on more strategic activities. And business users have to wait in line for IT
to create their data sources before they can get data in the right form for analytics.

Many organizations want to give business users direct access to data to free IT from
never-ending custom data requests and improve everyone’s productivity. Through its
self-service tools, SAS Data Preparation empowers business users to take vetted data from
IT and customize it for any report or analysis they need.

Built on SAS® Viya®, the intuitive, visual interface of SAS Data Preparation¹ makes it easy
for business users to quickly prepare data without coding or help from IT. The software
runs in a fast, in-memory distributed environment. This frees IT from the mundane task of
provisioning data, and business analysts and data scientists get relevant results that
drive faster business insights. The interface automatically generates code that can be
scheduled to ensure currency with source system refreshes. Templates can be defined and
reused, promoting sharing and collaboration.

Key Benefits
• Boost productivity through self-service data preparation. No specialized skills or coding
are required to access, merge and shape data, and data preparation tasks are defined within
the same visual experience – automatically integrated with downstream analytics and
reporting tasks.
• Gain efficiency through reusability, collaboration. Automatically generated code and
defined transformations can be shared with IT and scheduled to run with each source code
update. Data preparation tasks can be saved in projects, then shared and reused by others.
• Empower analytics users with fast results. Prebuilt transformations and data cleansing
functions assist users as they explore data, refine it and explore some more. And with
in-memory distributed processing and parallel I/O, responses can be delivered in near-real
time.
• Reduce total cost of ownership. Make the most of your existing resources by giving them a
visual, interactive interface that guides them through routine reporting and analytics data
preparation tasks, with software that requires very little training.

¹ SAS Visual Analytics (sold separately) is a required product for SAS Data Preparation.

Product Overview
With the volume and variety of data available today, business analysts need to curate data
to answer specific questions. This requires different views of the data, which often needs
to be examined in different ways, multiple times a day. Even when IT has prepared and
cleansed the data for them, analysts still need to iteratively examine and prepare it
further for their particular needs.

SAS Data Preparation provides the type of ad hoc environment today’s analytics
professionals crave. With its simple, interactive user interface designed for self-service
data preparation, nontechnical users have flexibility to integrate data from virtually any
source they need, cleanse it and prepare it for analysis quickly and easily. Data can be
loaded in memory so multiple users will share the same view simultaneously. Users’ data
preparation tasks are fully integrated with downstream reporting and analytics processing –
all from the same intuitive interface. Market-validated data integration and data quality
capabilities are prebuilt for quick data vetting and correction. And the seamless,
consistent user experience extends across the entire analytics life cycle.

Easy-to-use capabilities
With SAS Data Preparation, it’s easy to access, integrate, browse and cleanse data.
Visually explore external data sources, and
big data stores like Hadoop and data in
SAS Viya. Create connections to external
data sources on the fly – curate what you
need, when you need it. And get fast insight
into the data by profiling physical metadata
information – column names, data types,
encoding, column and row counts.
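
As an illustration only, the kind of physical metadata described above (column names, data types, column and row counts) can be sketched in plain Python. This is not SAS code and does not use any SAS interface. The function names `infer_type` and `profile_metadata` and the sample data are hypothetical, and the type inference is deliberately simplistic.

```python
import csv
import io


def infer_type(values):
    """Guess 'int', 'float' or 'str' for a list of raw string values.

    Empty strings are treated as missing and ignored (an assumption of this sketch).
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def profile_metadata(text):
    """Return column names, inferred types, column count and row count for CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Transpose the data rows into columns; an empty table yields empty columns.
    columns = list(zip(*data)) if data else [() for _ in header]
    return {
        "column_names": header,
        "data_types": {name: infer_type(list(col)) for name, col in zip(header, columns)},
        "column_count": len(header),
        "row_count": len(data),
    }


# Hypothetical sample data, invented for this example.
SAMPLE = "id,name,amount\n1,Ann,10.5\n2,Bob,7\n3,,3.25\n"
profile = profile_metadata(SAMPLE)
print(profile)
```

A quick look at a profile like this often shows whether a source is worth loading before any transformation work begins.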

You can access data from flat files, relational
data sources, social media sources, SAS
data sets, Apache Hadoop, Teradata, CSV
files, text files and other sources. Technical
users who prefer to code can access the
SAS Data Quality routines from SAS code
or from third-party coding languages, like
Python.
Figure 1. Explore data accessed from multiple sources.
Speed and scalability
High-performance, high-quality data fuels
high-performing results. With SAS Data
Preparation, users can interactively blend
and shape data in near-real time, without
having to wait on batch processes. Data
preparation functions can be loaded in
parallel and processed in memory. For
some sources, processing can be pushed
to run where the data resides – speeding
execution of SAS code, minimizing data
movement and delivering rapid responses.
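
The idea of pushing processing to where the data resides can be sketched generically. The example below is not SAS code. It uses Python's built-in `sqlite3` module as a stand-in for a data source, and the `sales` table and both function names are hypothetical. It compares pulling every row and filtering locally with letting the source filter and aggregate, so only the result moves.

```python
import sqlite3

# An in-memory database stands in for a remote data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0), ("north", 75.0)],
)


def total_without_pushdown(conn, region):
    """Fetch every row, then filter and sum on the client side."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    total = sum(amount for r, amount in rows if r == region)
    return len(rows), total


def total_with_pushdown(conn, region):
    """Let the source filter and aggregate; only one result row is returned."""
    rows = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return len(rows), rows[0][0]


moved_all, total_a = total_without_pushdown(conn, "east")
moved_one, total_b = total_with_pushdown(conn, "east")
print(moved_all, total_a)
print(moved_one, total_b)
```

Both approaches return the same total, but the second moves a single row instead of the whole table, which is the data-movement saving the paragraph above refers to.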

Visual interface for self-service data preparation
Business analysts and data scientists can use the wizard-based interface to access,
integrate, view, filter, join, transform, cleanse and query data. Each transformation is
designed to guide users through the data orchestration process so they can easily
understand how any single data preparation task affected results.

Figure 2. Object lineage shows the relationships between different objects.
Variety of prebuilt transformations
Several types of prebuilt transformations are included in SAS Data Preparation –
column-based, row-based, code-based and multiple-input-based transformations. These
prebuilt transformations assist with filtering, blending, shaping, remediating and
standardizing data.

Built-in data quality
Out of the box, SAS Data Preparation includes SAS Data Quality functions to help create
analytics-ready data. Functions include profiling, casing, standardizing, parsing,
identification analysis and more. Users can generate column-based and table-based basic
and advanced profile metrics to uncover data quality issues and get insights into the data
itself. Data quality and other data preparation tasks are accessible from coding
interfaces other than SAS, including Python.²

² Such third-party interfaces to SAS are available for download from GitHub.

Data governance and lineage
SAS Data Preparation lets users explore the relationships between data sources, data
objects and actions taken on the data – so it’s easy to trace pipeline activity.

Collaboration, reuse and automation
With SAS Data Preparation, users can prepare data for their specific analysis, then save
and share transformations so they can be reused later. Templates can be defined from a
point-and-click interface – or from a coding environment – defining best practices for
others to use. Template code can also be scheduled as part of IT processing to keep
prepared data current with refreshes.

Key Features

Data and metadata access
• Use any authorized internal source, accessible external data sources and data held in memory in SAS Viya.
• View a sample of a table or file loaded in the in-memory engine of SAS Viya, or from data sources registered with SAS/ACCESS®, to see data you want to work with.
• Quickly create connections to and between external data sources.
• Access physical metadata information like column names, data types, encoding, column count and row count to gain further insight into the data.
• Data sources and types include:
  • Access to more than 20 data sources and types, including relational databases, social sources, etc.

Data provisioning
• Parallel load data from supported data sources into memory simply by selecting them – no need to write code or have experience with an ETL tool.³
• Reduce the amount of data being copied by performing row filtering or column filtering before the data is provisioned.

³ Data cannot be sent back to the following data sources: Twitter, YouTube, Facebook, Google Analytics, Esri; it can only be sourced from these sites.

Guided, interactive data preparation
• Transform, blend, shape, cleanse and standardize data in an interactive, visual environment.
• Easily understand how a transformation affected results, getting visual feedback in near-real time through the distributed, in-memory processing of SAS Viya.
• Quickly extract document content and perform text identification and extraction using batch text analysis.

Column-based transformations
• Save data plans for quick data preparation jobs (through support for wide tables).
• Use column-based transformations to standardize, remediate and shape data without configuring:
  • Change case, convert column, rename, remove, split, trim whitespace, custom calculations.

Row-based transformations
• Use row-based transformations to filter and shape data.
• Create analytical-based tables using the transpose transformation to prepare the data for analytics and reporting tasks.
• Create simple or complex filters to remove unnecessary data.

Code-based transformations
• Write custom code to transform, shape, blend, remediate and standardize data.
• Write simple expressions to create calculated columns, write advanced code or reuse code snippets for greater transformational flexibility.
• Import custom code defined by others, sharing best practices and collaborative productivity.

Multiple-input-based transformations
• Use multiple-input-based transformations to blend and shape data.
• Blend or shape one or more sets of data together using the guided interface – there’s no requirement to know SQL or SAS.

Data profiling
• Profile data to generate column-based and table-based basic and advanced profile metrics.
• Use the table-level profile metrics to uncover data quality issues and get further insight into the data itself.
• Drill into column-level profile metrics and see visual graphs of pattern distribution and frequency distribution results.
• Use a variety of data types/sources (listed previously) to profile data from Twitter, Facebook, Google Analytics or YouTube.
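
To make the column-level profile metrics mentioned in the data profiling features more concrete, here is a minimal sketch of a frequency distribution and a pattern distribution in plain Python. It is not SAS code and makes no claim about how SAS Data Quality computes these metrics. The pattern alphabet (`9` for digits, `A`/`a` for letters), the function names and the sample values are assumptions made for illustration.

```python
from collections import Counter


def value_pattern(value):
    """Map each character to a class symbol: 9 = digit, A/a = upper/lower letter.

    Any other character (punctuation, spaces) is kept as-is.
    """
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)
    return "".join(symbols)


def profile_column(values):
    """Return frequency and pattern distributions for one column of strings."""
    return {
        "frequency": Counter(values),
        "patterns": Counter(value_pattern(v) for v in values),
    }


# Hypothetical phone-number column with one inconsistently formatted value.
phones = ["919-555-0100", "919-555-0101", "9195550102", "919-555-0100"]
result = profile_column(phones)
print(result["frequency"].most_common(1))
print(result["patterns"])
```

In a profile like this, a rare pattern usually points at values that need standardizing, while the frequency distribution exposes duplicates.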
Key Features (continued)

Data quality processing⁴
Data cleansing
• Find like records and group together logically.
• Use locale- and context-specific parsing and field extraction definitions to reshape data and uncover additional insights.
• Use the extraction transformation to identify and extract information (e.g., name, gender, field, pattern, identify, email and phone number) in a specified column.
• Generate match codes on data that can be used to perform fuzzy matching.
• Standardize data with locale- and context-specific definitions to transform data into a common format, like casing.
Identity definition
• Create a unique identity for each row with the unique ID generator.
• Analyze column data using locale-specific rules to determine gender or context.
• Identify, find and sort data by tagging columns and tables.
• Use identification analysis to analyze the data and determine its context, and to identify the subject data in each column.
• Use gender analysis to determine the gender of a name using locale-specific rules.

⁴ Supported data quality transformations rely on SAS Quality Knowledge Base Locales, a locale-specific library of data quality functions available in over 30 locales, included with SAS Data Preparation.

System and job monitoring
• Use integrated monitoring capabilities for system- and job-level processes.
• Understand how many processes are running, how long they’re taking and who is running them.
• Easily filter through all system jobs based on job status (running, successful, failed, pending and cancelled).
• Access job error logs to help with root-cause analysis and troubleshooting.

Data import and data preparation job scheduling
• Create a data import job from automatically generated code to perform a data refresh using the integrated scheduler.
• Schedule data explorer imports as jobs so they will become an automatic, repeatable process.
• Specify a time, date, frequency and/or interval for the jobs.

Data lineage
• Create multiple views with different tabs, and save the organization of those views.
• Explore relationships between accessible data sources, data objects and jobs.
• Use the relationship graph to visually show the relationships that exist between objects.

Plan templates and project collaboration
• Use data preparation plans (templates), with one or more sources of data, to improve productivity.
• Reuse the templates by applying them to different sets of data to ensure that data is transformed consistently to adhere to enterprise data standards and policies.
• Rely on team-based collaboration through a project hub used with SAS Viya projects.

Cloud data exchange
• Move data from on-premises locations to SAS Viya running in a private or public cloud.
• Preprocess data locally to reduce the amount of data that needs to be moved to remote locations.
• Use a command-line interface (CLI) for administration and control.
• Use cloud data exchange to securely and responsibly negotiate your on-site firewall.

TO LEARN MORE »
To learn more about SAS Data Preparation system requirements, download white papers, view
screenshots and see other related material, please visit: sas.com/data-preparation.

To contact your local SAS office, please visit: sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their
respective companies. Copyright © 2018, SAS Institute Inc. All rights reserved. 109216_G91145.1018
