Sybase Bigdata
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
May: Analytics
June: Intelligence
July: Governance
August: Analytics
September: Integration
October: Database
Ultimately, analytics is about businesses making optimal decisions, although the range of technologies in this area is wide: statistical analysis, data mining, process mining, predictive analytics, predictive modeling, business process modeling, and complex event processing. With the advent of big data, analytics has become big analytics, with organizations diving into large heaps of data that previously were not available or usable. A major challenge with this market trend is providing adequate performance for all BI and analytics workloads on the volumes of data that are now being assembled and that are continuously growing.
SAP Sybase has a history of database innovation and application, from the corporate RDBMS through to the mobile and embedded market. Sybase IQ has been deployed in many application areas and is used in many complex predictive analytics deployments where speed, data capacity, and versatility are critical. Recently it has been upgraded to work symbiotically with Hadoop in order to provide a comprehensive capability as a BI and analytics engine for Big Data applications.
David Jonker works in the area of Data Management & Analytics for SAP and is Product Marketing Director for Sybase IQ. In the last 5 years David has led product marketing teams for Sybase's Data Management & Analytics product lines, including Sybase IQ, Sybase ASE, SQL Anywhere, and Advantage Database Server. His career includes over 10 years in software engineering and product management. Before joining Sybase, David had consulting, product management and software development roles.
Courtney Claussen is a product manager at Sybase, Inc., focusing on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer aided design, computer aided software engineering, database management systems, middleware, and analytics.
Sybase IQ
Widespread success
Manage and analyze statistical measures for the entire nation of Canada
Stands out as the leading enterprise data warehouse among the largest banks, insurance agencies, and telecom operators worldwide
Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom
Volume
Managing and harnessing terabytes of data
Skills
Lack of adequate skills for nonstandard platforms and APIs
Variety
Harmonizing silos of structured and unstructured data
Costs
Too expensive to acquire, operate, and expand
Velocity
Keeping up with unpredictable data and query flows
Sybase IQ 15
A powerful big data analytics platform in the making
2009: v15.0
2009: v15.1
2010: v15.2
2011: v15.3
2011: v15.4
MapReduce API
Sybase IQ 15.4
A comprehensive platform for big data analytics
Eco-System
Sybase Control Center
Sybase PowerDesigner
App Services
Web 2.0
Java
C/C++
SQL
DBMS
External DLL A
Several different forms of C++ and Java UDF APIs for building custom in-database analytics, each valid at different locations within queries:
1. {Scalar} to {Scalar functions}, e.g. sin, cosine
2. {Scalar set} to {Scalar functions}, e.g. max, min
3. {Scalar set} to {Scalar set}, e.g. OLAP windows
4. {Scalar set} to {Tables}, e.g. join result sets
5. {Scalar set, Tables} to {Tables}, e.g. MapReduce
All variants are parallelizable, but (5) is also distributable across the PlexQ grid
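The five input/output shapes listed on this slide can be pictured with ordinary Java methods. The sketch below is an illustration only: it is plain Java, not the Sybase IQ UDF API, and the class and method names (UdfShapes, runningSum, withSquares, countByKey) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the five UDF shapes; NOT the Sybase IQ UDF API.
final class UdfShapes {
    // (1) scalar -> scalar: one value in, one value out (like sin).
    static double sine(double x) {
        return Math.sin(x);
    }

    // (2) scalar set -> scalar: an aggregate over many values (like max).
    static double max(List<Double> values) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : values) best = Math.max(best, v);
        return best;
    }

    // (3) scalar set -> scalar set: one output per input row (a running-sum window).
    static List<Double> runningSum(List<Double> values) {
        List<Double> out = new ArrayList<>();
        double total = 0.0;
        for (double v : values) {
            total += v;
            out.add(total);
        }
        return out;
    }

    // (4) scalar set -> table: scalars in, multi-column rows out (value, value squared).
    static List<double[]> withSquares(List<Double> values) {
        List<double[]> rows = new ArrayList<>();
        for (double v : values) rows.add(new double[] {v, v * v});
        return rows;
    }

    // (5) scalar + table -> table: a MapReduce-style count of rows per key.
    // "Map" emits (row[keyColumn], 1); "reduce" sums the ones for each key.
    static Map<String, Integer> countByKey(List<String[]> rows, int keyColumn) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) counts.merge(row[keyColumn], 1, Integer::sum);
        return counts;
    }
}
```

Shape (5) is the only one whose work splits naturally by key, which is consistent with the slide's remark that it can also be distributed across the grid.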
Feature
Java user-defined functions (UDFs) offer a new in-database analytics API
Characteristics
External algorithms written as Java functions, plugged into Sybase IQ
Java functions called via SQL: run in-database, much faster than client side
Java functions run protected/fault tolerant (in a separate process)
Supports scalar and table outputs
Supports all data types
Plug-In
PMML
Zementis
JAVA UDF
Sybase IQ
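As a rough picture of an "external algorithm written as a Java function", the sketch below scores one row with a logistic-regression formula. Everything in it is an assumption for illustration: the class name ChurnScore, the input columns, and the coefficients are made up, and the product-specific step that registers the method so SQL can call it is not shown.

```java
// Hypothetical scoring function of the kind a Java UDF might wrap.
// Plain Java only; registration with the database is product-specific and omitted.
final class ChurnScore {
    // Placeholder coefficients, not taken from any real model.
    private static final double INTERCEPT = -1.0;
    private static final double W_CALLS = 0.5;
    private static final double W_TENURE = -0.25;

    // Scalar inputs, scalar output: a probability in (0, 1) for a single row.
    static double score(double supportCalls, double tenureYears) {
        double z = INTERCEPT + W_CALLS * supportCalls + W_TENURE * tenureYears;
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

Because the method is static and side-effect free, the same logic can be evaluated row by row in parallel, which is what makes this shape a natural fit for running inside the database rather than in the client.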
Characteristics
Client tool capable of querying Sybase IQ and Hadoop
Currently certified client tool is Quest Toad for Cloud
Better performance when results from sources are pre-computed/pre-aggregated
Sybase IQ
Hadoop Hive
Characteristics
Extract & load subsets of HDFS data into Sybase IQ column store
  Raw data from HDFS
  Results of Hadoop MR jobs
HDFS data stored in Sybase IQ is treated like other Sybase IQ data
  Gets ACID properties of a DBMS
  Can be indexed, joined, parallelized
  Can be queried in an ad-hoc way
  Visible to BI and other client tools via Sybase IQ ANSI SQL API only
Currently, the Apache bulk data transfer utility Sqoop (built by Cloudera) is certified to provide this ETL capability
ETL
SQOOP
Characteristics
Scan and fetch specified data subsets from HDFS via table UDF
  Can read and fetch HDFS data subsets
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
Characteristics
Trigger and fetch Hadoop MR job results via table UDF
  Can trigger Hadoop MR jobs
  Called as part of Sybase IQ SQL query
  Output joinable with Sybase IQ data
HDFS data not stored in Sybase IQ
  Fetched into Sybase IQ in-memory tables
  ACID properties not applicable
  Repeated use: put fetched data in tables
Visible to BI and other client tools via Sybase IQ ANSI SQL API
Combine results of Hadoop MR jobs with Sybase IQ data on the fly: initiate Hadoop MR jobs on demand from Sybase IQ SQL queries and join their results with Sybase IQ data (query federation technique)
UDF Bridge
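The federation idea on this slide (trigger a job on demand, hold its results in memory, join them with local rows) can be sketched in plain Java. This is a conceptual illustration under stated assumptions, not Sybase IQ or Hadoop code: FederatedJoin, its parameters, and the row format are hypothetical, and the Supplier stands in for whatever actually launches the remote job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of query federation; not Sybase IQ or Hadoop code.
final class FederatedJoin {
    // localRows: key -> name, standing in for data already held in the database.
    // remoteJob: runs only when called, standing in for a job triggered by the query.
    static List<String> join(Map<Integer, String> localRows,
                             Supplier<Map<Integer, Long>> remoteJob) {
        // Triggered on demand; results live in memory for this query only.
        Map<Integer, Long> fetched = remoteJob.get();

        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> row : localRows.entrySet()) {
            Long count = fetched.get(row.getKey());
            // Inner join: keep only keys present on both sides.
            if (count != null) out.add(row.getValue() + "=" + count);
        }
        return out;
    }
}
```

Nothing is persisted here, which mirrors the slide's point that fetched data is not stored and ACID properties do not apply; for repeated use the slide suggests putting the fetched data in tables instead.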
SYBASE IQ 15.4
Unique, user community focused platform for big data analytics
SAN Fabric
Dynamic, elastic PlexQ MPP grid
Grow, shrink, provision on-demand
Heavy parallelization
Load, prepare, mine, report in a workflow
Privacy through isolation of resources
Collaboration through sharing of results/data via sharing of resources
2012 SAP AG. All rights reserved.
Thank you
Courtney Claussen, Product Manager, Sybase IQ, [email protected]
David Jonker, Product Marketing Director, Sybase IQ, [email protected]
Most of the Big Data opportunity is, in the end, a Big Analytics opportunity.
There are two challenges in this:
  Managing the data and the data flow
  Providing acceptable performance for analytics applications
Hadoop and its associated technologies can be both a blessing and a curse.
Hadoop = Key-value store & Parallel processing framework
Some NoSQL databases are DHT-based, some are specialized DBMS
Column-store DBMS vary, but in general they are MPP RDBMS and NewSQL DBMS
Data volumes (includes complexity of data structure)
Concurrency (includes also workload variability)
Computation (is application dependent)
Data flow architecture is a factor
Twitter Tag: #briefr
Tuesday, May 22, 12
In many ways this is similar to the Data Warehouse data flow challenge, writ larger
Latency is about application service levels
This is probably still a three-stage process
This is, by the way, a simplification
Big Analytics is here to stay
In some analytical application areas speed is desirable, in others speed is critical. Warning: Workloads can be mixed
Analytic speed depends upon the database engine, but also data flow architecture
Business effectiveness depends upon integration with the business process
The prebuilt functions clearly make sense (for speed of processing). Are they intended to make some analytic tools unnecessary or simply to be called directly by such tools?
What does SAP see as the appropriate role(s) for Hadoop in most businesses?
As I understand it, Sybase IQ can fully replace Hadoop in some contexts. What are the situations where you think Hadoop AND Sybase IQ is appropriate?
I'm intrigued by the idea of JOINing data between Hadoop results and Sybase IQ, but I'm not sure of the role of such a capability. How is this different from using MR for data ingest?
As you can link up to Hadoop/Sybase IQ at the front or at the back-end, which would you tend to use when?
You speak of broad and comprehensive capability, in combination with Hadoop. So which areas do you think are sweet spots? And which kinds of application and/or data collections do you think require different approaches?
Who have been the early adopters of this Hadoop/Sybase IQ capability and what kind of business problems are they trying to solve?
What do you see as SAP HANA's role in this? Are the same analytical capabilities being added to SAP HANA?