Sybase Bigdata

Analytics is about businesses making optimal decisions. With the advent of big data, analytics has become "big analytics". SAP Sybase has a history of database innovation and application. Courtney Claussen is a product manager at Sybase, focusing on its Data Management and Analytics products.

Uploaded by

yogi9009


Tuesday, May 22, 12

[email protected]

Twitter Tag: #briefr


Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!


May: Analytics June: Intelligence July: Governance August: Analytics September: Integration October: Database


Ultimately, analytics is about businesses making optimal decisions, although the range of technologies in this area is wide: statistical analysis, data mining, process mining, predictive analytics, predictive modeling, business process modeling and complex event processing. With the advent of big data, analytics has become "big analytics", with organizations diving into large heaps of data that previously were not available or usable. A major challenge of this market trend is providing adequate performance for all BI and analytics workloads on the volumes of data that are now being assembled and that keep growing.


Robin Bloor is Chief Analyst at The Bloor Group.

[email protected]


SAP Sybase has a history of database innovation and application, from the corporate RDBMS through to the mobile and embedded market. Sybase IQ has been deployed in many application areas and is used in many complex predictive analytics deployments, where speed, data capacity, and versatility are critical. Recently it has been upgraded to work symbiotically with Hadoop in order to provide a comprehensive capability as a BI and analytics engine for Big Data applications.


David Jonker works in the area of Data Management & Analytics for SAP and is Product Marketing Director for Sybase IQ. In the last 5 years David has led product marketing teams for Sybase's Data Management & Analytics product lines, including Sybase IQ, Sybase ASE, SQL Anywhere, and Advantage Database Server. His career includes over 10 years in software engineering and product management. Before joining Sybase, David had consulting, product management and software development roles.

Courtney Claussen is a product manager at Sybase, Inc., focusing on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer aided design, computer aided software engineering, database management systems, middleware, and analytics.


Sybase IQ 15.4 Overview
Big data analytics & Hadoop


Sybase IQ
Widespread success

Manage and analyze statistical measures for the entire nation of Canada

Analyze ALL Federal tax returns in the US

Analyze complex models in more than 200 financial institutions worldwide

Stands out as the leading enterprise data warehouse among the largest banks, insurance agencies, and telecom operators worldwide

Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom

2012 SAP AG. All rights reserved.


BIG DATA ANALYTICS ISSUES

Dealing with volume, variety, velocity, costs, skills

Volume: Managing and harnessing terabytes of data
Variety: Harmonizing silos of structured and unstructured data
Velocity: Keeping up with unpredictable data and query flows
Costs: Too expensive to acquire, operate, and expand
Skills: Lack of adequate skills for nonstandard platforms and APIs


Sybase IQ 15
A powerful big data analytics platform in the making

2009, v15.0: VLDB Platform Foundation (Volume)
2009, v15.1: In-Database Analytics API (Velocity)
2010, v15.2: Text Search, Web 2.0 API (Variety)
2011, v15.3: PlexQ MPP Foundation (Costs)
2011, v15.4: MapReduce API (Skills)


Sybase IQ 15.4
A comprehensive platform for big data analytics

[Architecture diagram. Labels: Eco-System (Sybase Control Center, Sybase PowerDesigner, certified ISV tools); Ingest + Persist; Federation; App Services; Web 2.0; Java; C/C++; SQL; Unstructured Data (Hadoop, Content Mgmt); Structured Data (DBMS)]


Details: In-Database Analytics & Hadoop


In-database analytics in Sybase IQ

No compromise for complex analytics

Basic to advanced analytical functions available to SQL directly from the Sybase IQ engine
Data never leaves the database until results are materialized
Analytics code / models must be shareable yet must allow ad-hoc analysis
Analytics code / models must be applicable to the latest data set
Standards-based access, concept extensibility is compulsory
Performance and scalability are a given
Average developer must be able to build in-database analytical models

[Diagram: inside the Sybase IQ process, built-in functions and external DLLs run alongside the database, so logic/filtering is applied in the database]

Analytics simplified: Logic To Data = Fast + Efficient


In-database analytics in Sybase IQ

Custom functions APIs

Several different forms of C++ and Java UDF APIs for building custom in-database analytics, each valid at different locations within queries:
1. {Scalar} to {Scalar} functions, e.g. sin, cosine
2. {Scalar set} to {Scalar} functions, e.g. max, min
3. {Scalar set} to {Scalar set}, e.g. OLAP windows
4. {Scalar set} to {Tables}, e.g. join result sets
5. {Scalar set, Tables} to {Tables}, e.g. MapReduce
All variants are parallelizable, but (5) is also distributable across the PlexQ grid
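
The five function shapes listed above can be illustrated outside the database. The Python sketch below is not the Sybase IQ UDF API (which the slide describes as C++ and Java); every function name here is hypothetical and only shows the input and output shape of each category.

```python
from collections import defaultdict
import math

# 1. Scalar -> scalar: one value in, one value out (e.g. sin).
def scalar_fn(x):
    return math.sin(x)

# 2. Scalar set -> scalar: an aggregate over a set of values (e.g. max).
def aggregate_fn(values):
    return max(values)

# 3. Scalar set -> scalar set: one output per input row, as in an OLAP window.
def running_total(values):
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

# 4. Scalar set -> table: scalar parameters in, rows out.
def table_fn(n):
    return [(i, i * i) for i in range(1, n + 1)]

# 5. Scalar set + table -> table: consume a table, produce a table
#    (a MapReduce-style word count over single-column rows).
def word_count(rows, min_count):
    counts = defaultdict(int)
    for (line,) in rows:
        for word in line.split():   # map: emit each word
            counts[word] += 1       # reduce: sum per key
    return sorted((w, c) for w, c in counts.items() if c >= min_count)
```

Only the last shape takes a whole table as input, which is why it is the natural candidate for being split across nodes: each node can process its own partition of the rows before the per-key results are merged.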


In-database analytics in Sybase IQ

Java custom functions (3)

Feature
Java User Defined Functions (UDFs) offer a new in-database analytics API

Characteristics
External algorithms written as Java functions, plugged into Sybase IQ
Java functions called via SQL: run in-database, much faster than client side
Java functions run protected/fault tolerant (in a separate process)
Supports scalar and table outputs
Supports all data types

Big Data Use Cases
Ideal for ISV or custom data mining libraries for healthcare, eCommerce, public sector
Apps include: ISV partner Zementis built a plug-in for PMML (Predictive Model Markup Language) models
  Validates PMML from SAS, R, ...
  Translates PMML to Java UDFs
  Java UDFs called from SQL

[Diagram: PMML -> Zementis plug-in -> Java UDF -> Sybase IQ]
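
The idea of translating a model description into a function that SQL can call may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions: `sqlite3` stands in for the analytics DBMS, the `MODEL` dictionary is a hypothetical, simplified stand-in for a PMML document (real PMML is XML), and `risk_score` is an invented function name. It is not the Zementis plug-in or the Sybase IQ Java UDF API.

```python
import sqlite3

# Hypothetical, simplified model description; a real PMML document is XML.
MODEL = {"intercept": 1.0, "coefficients": {"age": 0.5, "visits": 2.0}}

def compile_model(model):
    """Turn the model description into a plain scoring function."""
    names = list(model["coefficients"])
    weights = [model["coefficients"][n] for n in names]

    def score(*args):
        # Linear model: intercept + sum(weight * input)
        return model["intercept"] + sum(w * a for w, a in zip(weights, args))

    return names, score

names, score = compile_model(MODEL)

conn = sqlite3.connect(":memory:")  # sqlite3 stands in for the DBMS
conn.create_function("risk_score", len(names), score)  # register the UDF
conn.execute("CREATE TABLE patients (id INTEGER, age REAL, visits REAL)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(1, 40, 2), (2, 60, 0)])

# The model is now applied row by row inside the query.
rows = conn.execute(
    "SELECT id, risk_score(age, visits) FROM patients ORDER BY id"
).fetchall()
```

Once the function is registered, any client that can issue SQL can score rows without shipping the data out of the database first.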


SYBASE IQ 15.4 DECONSTRUCTED

App services integrating Sybase IQ + Hadoop: at client side (6a)

Feature
Client-side federation: Join data from Sybase IQ and Hadoop at the client application level

Characteristics
Client tool capable of querying Sybase IQ and Hadoop
Currently certified client tool is Quest Toad for Cloud
Better performance when results from sources are pre-computed/pre-aggregated

Big Data Use Cases
Ideal for bringing together Big Data Analytics pre-computations from different domains
Example in telecommunications: Sybase IQ with aggregated customer loyalty data and Hadoop with aggregated network utilization data; Quest Toad for Cloud can bring in data from both sources, linking customer loyalty to network utilization or network faults (e.g. dropped calls)

[Diagram: Toad for Cloud Databases connected to Sybase IQ and Hadoop Hive]
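
As a rough sketch of client-side federation, the Python below joins two pre-aggregated result sets in the client. The assumptions are loud: `sqlite3` stands in for Sybase IQ, `fetch_network_aggregates` is a hypothetical stand-in for a Hive query result, and the table, column names and figures are invented for illustration. It does not show how Toad for Cloud works.

```python
import sqlite3

# Source 1: stand-in for Sybase IQ holding aggregated customer loyalty data.
iq = sqlite3.connect(":memory:")
iq.execute("CREATE TABLE loyalty (region TEXT, churn_rate REAL)")
iq.executemany("INSERT INTO loyalty VALUES (?, ?)",
               [("north", 0.02), ("south", 0.07)])

# Source 2: hypothetical stand-in for a Hive result with aggregated
# network data, as (region, dropped_calls) rows.
def fetch_network_aggregates():
    return [("north", 12), ("south", 95), ("west", 40)]

def federate():
    """Join the two pre-aggregated result sets in the client, keyed by region."""
    dropped = dict(fetch_network_aggregates())
    loyalty = iq.execute(
        "SELECT region, churn_rate FROM loyalty ORDER BY region")
    return [(region, churn, dropped[region])
            for region, churn in loyalty if region in dropped]

result = federate()
```

Because the join happens in the client, both inputs should already be small. That matches the slide's remark that performance is better when the sources return pre-computed or pre-aggregated results.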


SYBASE IQ 15.4 DECONSTRUCTED

App services integrating Sybase IQ + Hadoop: using ETL (6b)

Feature
Load Hadoop data into Sybase IQ column store: Extract, transform, load data from HDFS (Hadoop Distributed File System) into Sybase IQ schemas

Characteristics
Extract & load subsets of HDFS data into Sybase IQ column store
  Raw data from HDFS
  Results of Hadoop MR jobs
HDFS data stored in Sybase IQ is treated like other Sybase IQ data
  Gets ACID properties of a DBMS
  Can be indexed, joined, parallelized
  Can be queried in an ad-hoc way
  Visible to BI and other client tools via Sybase IQ ANSI SQL API only
Currently, the Apache bulk data transfer utility SQOOP (built by Cloudera) is certified to provide this ETL capability

Big Data Use Cases
Ideal for combining subsets of HDFS unstructured data or summaries of HDFS data into Sybase IQ for mid- to long-term usage in business reports
Example in eCommerce: clickstream data from weblogs stored in HDFS, and outputs of MR jobs on that data (to study browsing behavior), are ETL'd into Sybase IQ. The transactional sales data in Sybase IQ is joined with the clickstream data to understand and predict customer browsing-to-buying behavior

[Diagram: clickstream data in HDFS is loaded via ETL (SQOOP) into Sybase IQ, which holds the sales data]
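
The ETL pattern described above can be sketched in a few lines. This is only an illustration: `sqlite3` stands in for the Sybase IQ column store, the `CLICKSTREAM` string stands in for a file in HDFS, and the `etl` helper and all table and column names are hypothetical. It does not use or imitate SQOOP.

```python
import csv
import io
import sqlite3

# Stand-in for a file in HDFS: raw clickstream records as CSV text.
CLICKSTREAM = """customer_id,page,seconds
1,home,5
1,checkout,40
2,home,3
3,checkout,25
"""

db = sqlite3.connect(":memory:")  # stand-in for the column store
db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 99.0), (3, 15.5)])
db.execute(
    "CREATE TABLE clicks (customer_id INTEGER, page TEXT, seconds INTEGER)")

def etl(source_text, page):
    """Extract rows, keep one page (the subset), load them into the table."""
    reader = csv.DictReader(io.StringIO(source_text))
    subset = [(int(r["customer_id"]), r["page"], int(r["seconds"]))
              for r in reader if r["page"] == page]
    with db:  # commit the load as one transaction
        db.executemany("INSERT INTO clicks VALUES (?, ?, ?)", subset)
    return len(subset)

loaded = etl(CLICKSTREAM, "checkout")

# Once loaded, the data is queried and joined like any other table.
joined = db.execute(
    "SELECT c.customer_id, c.seconds, s.amount "
    "FROM clicks c JOIN sales s ON s.customer_id = c.customer_id "
    "ORDER BY c.customer_id"
).fetchall()
```

The trade-off is the usual one for ETL: the loaded copy is transactional and indexable, but it is only as fresh as the last load.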


SYBASE IQ 15.4 DECONSTRUCTED

App services integrating Sybase IQ + Hadoop: using Data Federation (6c)

Feature
Join HDFS data with Sybase IQ data on the fly: Fetch and join subsets of HDFS data on-demand using SQL queries from Sybase IQ (Data Federation technique)

Characteristics
Scan and fetch specified data subsets from HDFS via table UDF
  Can read and fetch HDFS data subsets
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
Visible to BI/other client tools via Sybase IQ ANSI SQL API

Big Data Use Cases
Ideal for combining subsets of HDFS data with Sybase IQ data for operational (transient) business reports
Example in retail: detailed Point Of Sale (POS) data is stored in HDFS. The Sybase IQ EDW fetches POS data for specific hot-selling SKUs from HDFS at fixed intervals and combines it with inventory data in Sybase IQ to predict and prevent inventory stockouts

[Diagram: POS data in HDFS reaches the inventory data in Sybase IQ through a UDF bridge]
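
A table function that fetches only the requested subset at query time can be sketched as a Python generator. Everything here is assumed for illustration: `POS_RECORDS` stands in for detailed POS data in HDFS, `INVENTORY` for inventory data in the database, and `pos_table_udf` and `stockout_risk` are hypothetical names, not Sybase IQ functions.

```python
# Stand-in for detailed POS records kept in HDFS: (sku, units_sold).
POS_RECORDS = [("A1", 30), ("B2", 5), ("A1", 25), ("C3", 60), ("B2", 1)]

# Stand-in for inventory data held in the database: sku -> units in stock.
INVENTORY = {"A1": 50, "B2": 100, "C3": 200}

def pos_table_udf(skus):
    """Table function: scan the external source, yield only requested SKUs."""
    wanted = set(skus)
    for sku, units in POS_RECORDS:
        if sku in wanted:
            yield sku, units

def stockout_risk(skus):
    """Join fetched rows with inventory in memory; nothing is persisted."""
    sold = {}
    for sku, units in pos_table_udf(skus):
        sold[sku] = sold.get(sku, 0) + units
    return sorted(sku for sku, total in sold.items()
                  if total > INVENTORY[sku])

at_risk = stockout_risk(["A1", "C3"])
```

Unlike the ETL approach, the fetched rows exist only for the duration of the call, which is consistent with the slide's note that ACID properties do not apply to them.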


SYBASE IQ 15.4 DECONSTRUCTED

App services integrating Sybase IQ + Hadoop: using Query Federation (6d)

Feature
Combine results of Hadoop MR jobs with Sybase IQ data on the fly: Initiate and join results of Hadoop MR jobs on-demand using SQL queries from Sybase IQ (Query Federation technique)

Characteristics
Trigger and fetch Hadoop MR job results via table UDF
  Can trigger Hadoop MR jobs
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
  Repeated use: put fetched data in tables
Visible to BI and other client tools via Sybase IQ ANSI SQL API

Big Data Use Cases
Ideal for combining results of Hadoop MR jobs with Sybase IQ data for operational (transient) business reports
Example in utilities: smart meter and smart grid data can be combined for load monitoring and demand forecasting. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be processed via Hadoop MR jobs triggered from Sybase IQ and combined with smart meter data stored in Sybase IQ to analyze demand and workload.

[Diagram: smart grid transmission data in HDFS reaches the smart meter consumption data in Sybase IQ through a UDF bridge]
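
Query federation differs from data federation in that the table function triggers a computation rather than a plain fetch. The Python sketch below imitates this with a tiny in-process map/reduce. It is an illustration only: `GRID_READINGS` and `METER_KWH` are invented stand-ins for the HDFS and Sybase IQ data, and `run_mr_job`, `mr_table_udf` and `load_report` are hypothetical names. No Hadoop API is used.

```python
from collections import defaultdict

# Stand-in for smart grid transmission readings in HDFS: (substation, voltage).
GRID_READINGS = [("s1", 229.0), ("s1", 231.0), ("s2", 210.0), ("s2", 214.0)]

# Stand-in for smart meter consumption data in the database: substation -> kWh.
METER_KWH = {"s1": 1200.0, "s2": 800.0}

def run_mr_job(records):
    """Tiny in-process map/reduce: average voltage per substation."""
    groups = defaultdict(list)
    for key, value in records:      # map + shuffle: group values by key
        groups[key].append(value)
    return {k: sum(v) / len(v) for k, v in groups.items()}  # reduce

def mr_table_udf():
    """Table function: trigger the job on demand and return its result rows."""
    return sorted(run_mr_job(GRID_READINGS).items())

def load_report():
    """Join the job output with meter data at query time."""
    return [(station, avg_v, METER_KWH[station])
            for station, avg_v in mr_table_udf()]

report = load_report()
```

Every call to `load_report` reruns the job, so for repeated use it would make sense to store the fetched rows in a table, as the slide suggests.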


SYBASE IQ 15.4
Unique, user community focused platform for big data analytics

Data Discovery (Data Scientists)
Application Modeling (Business Analysts)
Reports/Dashboards (BI Programmers)
Business Decisions (Business End Users)
Infrastructure Management (DBAs)

[Diagram labels: Full Mesh High Speed Interconnect; SAN Fabric]

Dynamic, elastic PlexQ MPP grid
Grow, shrink, provision on-demand
Heavy parallelization
Load, prepare, mine, report in a workflow
Privacy through isolation of resources
Collaboration through sharing of results/data via sharing of resources

Thank you
Courtney Claussen, Product Manager, Sybase IQ, [email protected]
David Jonker, Product Marketing Director, Sybase IQ, [email protected]


Most of the Big Data opportunity is, in the end, a Big Analytics opportunity.
There are two challenges in this:
  Managing the data and the data flow
  Providing acceptable performance for analytics applications
Hadoop and its associated technologies can be both a blessing and a curse.


Hadoop = Key-value store & Parallel processing framework
Some NoSQL databases are DHT-based, some are specialized DBMS
Column-store DBMS vary, but in general they are MPP RDBMS and NewSQL DBMS


Data volumes (includes complexity of data structure)
Concurrency (includes also workload variability)
Computation (is application dependent)
Data flow architecture is a factor

In many ways this is similar to the Data Warehouse data flow challenge, writ larger
Latency is about application service levels
This is probably still a three-stage process
This is, by the way, a simplification


Big Analytics is here to stay
In some analytical application areas speed is desirable, in others speed is critical. Warning: Workloads can be mixed
Analytic speed depends upon the database engine, but also data flow architecture
Business effectiveness depends upon integration with the business process

The prebuilt functions clearly make sense (for speed of processing). Are they intended to make some analytic tools unnecessary or simply to be called directly by such tools?
What does SAP see as the appropriate role(s) for Hadoop in most businesses?
As I understand it, Sybase IQ can fully replace Hadoop in some contexts. What are the situations where you think Hadoop AND Sybase IQ is appropriate?
I'm intrigued by the idea of JOINing data between Hadoop results and Sybase IQ, but I'm not sure of the role of such a capability. How is this different from using MR for data ingest?
As you can link up to Hadoop/Sybase IQ at the front or at the back-end, which would you tend to use when?


You speak of broad and comprehensive capability, in combination with Hadoop. So which areas do you think are sweet spots? And which kinds of application and/or data collections do you think require different approaches?
Who have been the early adopters of this Hadoop/Sybase IQ capability and what kind of business problems are they trying to solve?
What do you see as SAP HANA's role in this? Are the same analytical capabilities being added to SAP HANA?

