DataScience Slides

The document outlines a comprehensive course on Data Science and Big Data Analytics led by instructor Mr. Sai Kumar, who has extensive experience in IT and big data. The course aims to provide practical knowledge on data science applications, analytical models, and technologies to derive insights from big data, covering various topics over five days. Key objectives include understanding the role of big data, selecting appropriate models and tools, and applying best practices in data analytics through case studies and real-world scenarios.

Uploaded by

suresh

Data Science and Big Data Analytics
Hello! Instructor introduction
Instructor: Mr. SAI KUMAR

Big data consultant, Data Scientist

• 25 years of IT experience in training, consultancy, and team leadership.
• 13+ years of experience with databases and programming languages.
• 10+ years of experience with big data, data science, and functional programming (Python, Scala, R). Cloudera certified.
• Big data consultant with a UAE company; project manager for a Sri Lanka Government project. Vast experience in delivering training for several corporate clients.
• Recent training deliveries: Oman Petroleum, Oman Hamas water services company, Walmart, Amdocs, L&T, Aramco, ITC Infotech.
About the course
The Complete Course on Data Science & Big Data Analytics provides a practical approach to applying
data science techniques. It covers how to determine the requirements for data science applications,
the technologies available, and the analytical models suited to derive valuable insights from Big Data.
This training will guide you on making sense of complex data and utilizing analytics effectively.

This training course will feature:

• An Overview of Big Data Analytics
• Adaptation and Approach of Lifecycles and Models
• Methods and Models for Statistical Evaluation
• Advanced Methods and Models for Big Data Analytics
• How to Select Appropriate Tools to Achieve the Best from Data Analytics
Course Objective
The objectives of this course are to:

• Understand the role of Big Data in your organization
• Appreciate when to apply Data Analytics and the best methods of approach
• Consider how to choose appropriate models and technology for Big Data
• Learn from case study examples and use case scenarios
• Successfully achieve results by applying best practice in Data Analytics
Course Schedule

Day 1
• Current Practices and Trends in Big Data Analytics
• Business Intelligence v Data Science
• Analytical Architecture for Big Data
• Roles for Big Data within the Technology and Commercial Enterprise
• Key Drivers for Big Data Analytics
• Case Study and Summary

Day 2
• Data Analytics Lifecycle
• Stage 1 - Discovery
• Stage 2 - Preparation of Data
• Stage 3 - Model Planning and Review
• Stage 4 - Model Creation
• Stage 5 - Communication Plan
• Stage 6 - From Planning to Operation
• Case Study and Summary

Day 3
• Overview of R Framework
• Overview of Big Data Analytics
• Exploratory Data Analysis
• Statistical Methods of Evaluation
• Advanced Methods of Clustering
• Advanced Theory and Methods of Association Rules
• Advanced Theory and Methods of Regression
• Case Study and Summary

Day 4
• Advanced Analytical Theory of Classification
• Advanced Analytical Theory of Time Series Analysis
• Advanced Analytical Theory of Textual Analysis
• Technology and Tools for Advanced Data Analytics
• Use Case and Assessment
• Case Study and Summary

Day 5
• Unstructured Data Analytics
• Advanced Analytical Tools in Database Analytics
• How to Integrate Data Analytics
• Current Best Practice Management and Approach for Project Delivery
• Data Visualization Overview
• Summary and Case Study
Day 1 – Agenda
• Current Practices and trends in Big Data Analytics
• Business Intelligence v Data Science
• Analytical Architecture for Big Data
• Roles for Big Data within the Technology and Commercial Enterprise
• Key Drivers for Big Data Analytics
• Case Study and Summary
Day 2 – Agenda
• Data Analytics Lifecycle
• Stage 1 - Discovery
• Stage 2 - Preparation of Data
• Stage 3 - Model Planning and Review
• Stage 4 - Model Creation
• Stage 5 - Communication Plan
• Stage 6 - From Planning to Operation
• Case Study and Summary
Day 3 – Agenda
• Overview of R Framework
• Overview of Big Data Analytics
• Exploratory Data Analysis
• Statistical methods of Evaluation
• Advanced Methods of Clustering
• Advanced Theory and Methods of Association Rules
• Advanced Theory and Methods of Regression
• Case Study and Summary
Day 4 – Agenda
• Advanced Analytical Theory of Classification
• Advanced Analytical Theory of Time Series Analysis
• Advanced Analytical Theory of Textual Analysis
• Technology and Tools for Advanced Data Analytics
• Use Case and Assessment
• Case Study and Summary
Day 5 – Agenda
• Unstructured Data Analytics
• Advanced Analytical Tools in Database Analytics
• How to Integrate Data Analytics
• Current Best Practice Management and Approach for Project Delivery
• Data Visualization Overview
• Summary and Case Study
Current Practices and Trends in Big Data Analytics
“Data is everywhere and in every form”

The data journey is a long and continuing one, from simple data formats to complex data formats.

Current data practices focus more on complex, real-time data, and all sorts of organizations depend on data analytics for better insights and recommendations.
Evolution of Big Data and Technologies Ecosystem
ACQUIRE: NiFi, FLUME, KAFKA, SPARK STREAMING
ARRANGE: HDFS / NoSQL (HBase, MongoDB, Cassandra)
PROCESS: SPARK, SOLR, FLINK
PIPELINES: DATABRICKS, AIRFLOW
ANALYZE: HIVE, IMPALA, SPARK SQL
VISUALIZE: TABLEAU, QLIKVIEW, EXCEL, HUE
DECIDE: R, PYTHON, ML, DATA SCIENCE
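To make the flow between these stages concrete, here is a minimal sketch in plain Python. It does not use any of the tools named above: each function only stands in for one stage, and the store names, amounts, and function names are hypothetical.

```python
from collections import defaultdict

def acquire():
    """ACQUIRE: yield raw events (stands in for a streaming feed)."""
    yield from ["store_a,12.0", "store_b,7.5", "store_a,3.0", "bad-record"]

def arrange(raw_events):
    """ARRANGE: parse and keep only well-formed records (stands in for storage)."""
    stored = []
    for line in raw_events:
        parts = line.split(",")
        if len(parts) == 2:
            stored.append({"store": parts[0], "amount": float(parts[1])})
    return stored

def process(records):
    """PROCESS: aggregate sales per store (stands in for a batch job)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["store"]] += rec["amount"]
    return dict(totals)

def analyze(totals):
    """ANALYZE: rank stores by total sales, highest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def decide(ranking):
    """DECIDE: pick the top-ranked store."""
    return ranking[0][0]

ranking = analyze(process(arrange(acquire())))
best_store = decide(ranking)
print(ranking, best_store)
```

In a real deployment each function would be replaced by the corresponding platform; the point of the sketch is only the order in which data moves through the stages.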


DATA TYPES
• STRUCTURED
• SEMI STRUCTURED
• UNSTRUCTURED
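The three data types can be illustrated with a short Python sketch using only the standard library. The sample values are made up for illustration: CSV stands in for structured data, JSON for semi-structured data, and free text for unstructured data.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = "customer_id,amount\n1,120.50\n2,80.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "a@example.com"}]'
records = json.loads(semi_structured)
all_keys = sorted({key for rec in records for key in rec})

# Unstructured: free text with no schema; structure has to be extracted.
unstructured = "Order 17 arrived late. Order 42 arrived on time."
order_ids = [int(n) for n in re.findall(r"Order (\d+)", unstructured)]

print(total_amount, all_keys, order_ids)
```

Note how the effort shifts: structured data can be summed directly, semi-structured data needs its keys discovered first, and unstructured data needs a pattern (here a regular expression) before anything can be computed.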
ROLE OF CLOUD COMPUTING IN BIG DATA
• Companies are adopting cloud computing because of the sharp rise in data volumes and the burden of maintaining data and infrastructure.
BIG DATA IMPLEMENTATIONS
• Telecom
• E-Commerce
• Social media
• Supermarkets / retail stores such as Walmart
• Stock markets
• Financial organizations
• Government organizations
• Airlines
Gen AI
ANALYTICAL ARCHITECTURE FOR BIG DATA
BUSINESS INTELLIGENCE VS. DATA SCIENCE
KEY DRIVERS FOR BIG DATA ANALYTICS
DATA ANALYTICS MODELS AND LIFECYCLE
The data analytics lifecycle is designed to address big data and data science projects. Data goes through several life stages, including creation, testing, processing, consumption, and reuse.
STAGE 1: DISCOVERY
Data discovery helps analysts gather the data, understand it, and gain better insight into it, which in turn supports better decisions and process improvements. It is a continuous activity in data science.
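One common first step in discovery is profiling a field to see its range and how much of it is missing. The following is a minimal sketch, assuming a small hypothetical list of customer records; the field names and the `profile` helper are illustrative only.

```python
import statistics

# Hypothetical sample: monthly spend per customer; None marks a missing value.
records = [
    {"customer": "c1", "spend": 120.0},
    {"customer": "c2", "spend": None},
    {"customer": "c3", "spend": 80.0},
    {"customer": "c4", "spend": 100.0},
]

def profile(rows, field):
    """Summarize one numeric field: size, missing count, and basic statistics."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "rows": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = profile(records, "spend")
print(summary)
```

A profile like this is usually repeated as new data arrives, which is one reason discovery is a continuous activity rather than a one-time step.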
STAGE 2: DATA PREPARATION
WHY DATA PREPARATION
STAGE 3: MODEL PLANNING AND REVIEW
Model planning is about understanding the problem and defining the approach for building data analytics based on the requirements.
It is a crucial step in data analytics that helps in getting the right results.

Example problems: predicting sales, customer churn, risk assessment.

Determine whether it is a classification, regression, or clustering task, among others.
MODEL PLANNING AND REVIEW
Key components of model planning
• Defining the problem
• Data understanding
• Feature engineering
• Choosing the right model
• Model training and evaluation
• Deployment and Monitoring
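The components above can be walked through end to end on a toy problem. This is a sketch under stated assumptions: the (ad spend, sales) pairs are invented, the model is a plain least-squares line written by hand, and the retraining threshold of 1.0 is an arbitrary illustrative choice.

```python
# 1. Defining the problem: predict sales from ad spend (a regression task).
# 2. Data understanding: hypothetical (ad_spend, sales) pairs.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 9.0), (5.0, 10.8), (6.0, 13.1)]

# 3. Feature engineering: the single raw feature is used as-is here.
#    Hold out the last two points for evaluation.
train, test = data[:4], data[4:]

# 4. Choosing the model: a straight line y = slope * x + intercept.
def fit_line(points):
    """Fit a least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# 5. Model training and evaluation: fit on train, score on held-out test data.
slope, intercept = fit_line(train)

def mean_abs_error(points, slope, intercept):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

mae = mean_abs_error(test, slope, intercept)

# 6. Deployment and monitoring: expose a predict function and flag drift
#    when the error on fresh data exceeds an (assumed) threshold.
def predict(x):
    return slope * x + intercept

needs_retraining = mae > 1.0
print(slope, intercept, mae, needs_retraining)
```

Planning matters because each numbered choice constrains the next: picking regression in step 1 determines both the model family in step 4 and the error metric in step 5.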
STAGE 4: MODEL CREATION
STAGE 5: COMMUNICATION PLAN
This stage covers how the data insights, models, and analysis are conveyed to stakeholders and decision makers.
At this stage the work is converted into understandable and
meaningful presentations.
STAGE 6: PLANNING TO OPERATION
MACHINE LEARNING
Machine Learning is a subfield of Data Science that enables
machines to improve at a given task with experience.

It is important to note that all machine learning techniques are classified as Artificial Intelligence. However, not all Artificial Intelligence counts as Machine Learning: some basic rule-based engines can be classified as AI, but they do not learn from experience, so they do not belong to the machine learning category.
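The difference can be sketched in a few lines of Python. The spam example, the link counts, and the midpoint rule used for learning are all assumptions made for illustration; real learners are more sophisticated, but the contrast is the same.

```python
# Rule-based engine: the threshold is hand-written and never changes.
def rule_based_is_spam(num_links):
    return num_links > 5

# Learning model: the threshold is estimated from labeled examples,
# so it can change as more experience (data) arrives.
def learn_threshold(examples):
    """Pick the midpoint between the mean link count of each class."""
    spam = [n for n, is_spam in examples if is_spam]
    ham = [n for n, is_spam in examples if not is_spam]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

def accuracy(threshold, examples):
    """Share of examples where 'links > threshold' matches the label."""
    hits = sum((n > threshold) == is_spam for n, is_spam in examples)
    return hits / len(examples)

# Hypothetical labeled data: (number of links, is spam).
experience = [(0, False), (1, False), (2, False), (3, True), (4, True), (6, True)]

rule_accuracy = accuracy(5, experience)          # fixed rule: links > 5
learned_threshold = learn_threshold(experience)  # derived from the data
learned_accuracy = accuracy(learned_threshold, experience)
print(rule_accuracy, learned_threshold, learned_accuracy)
```

Both functions make decisions, so both could be called AI in the broad sense, but only the second adjusts its behaviour when the examples change.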
MACHINE LEARNING - AI
MACHINE LEARNING: BIG PICTURE
ARTIFICIAL INTELLIGENCE
Science that enables computers to mimic human intelligence. Subfields: machine learning, robotics, and computer vision.

MACHINE LEARNING
Subset of AI that enables machines to improve at tasks with experience.

SUPERVISED LEARNING
Training algorithms using labeled input/output data.
• CLASSIFICATION
• REGRESSION

UNSUPERVISED LEARNING
Training algorithms with no labeled data. It attempts to discover hidden patterns on its own.
• CLUSTERING

REINFORCEMENT LEARNING
Algorithms take actions to maximize cumulative reward.
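Each of the three learning styles can be shown with a toy example in plain Python. These are minimal sketches, not production algorithms: the data points, the choice of nearest-neighbor, two-cluster k-means, and an epsilon-greedy bandit, and all parameter values are assumptions chosen for illustration.

```python
import random

# Supervised: labeled (feature, label) pairs; 1-nearest-neighbor classification.
def nearest_neighbor(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
predicted_label = nearest_neighbor(labeled, 7.2)

# Unsupervised: no labels; one-dimensional k-means with two clusters.
def two_means(values, iterations=10):
    """Alternate between assigning points to centers and re-averaging."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

centers = two_means([1.0, 1.2, 0.8, 9.0, 9.4, 8.6])

# Reinforcement: epsilon-greedy agent on a bandit with hidden reward odds.
def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Mostly pull the best-looking arm, sometimes explore a random one."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))
        else:
            arm = max(range(len(estimates)), key=estimates.__getitem__)
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        # Incremental average of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total_reward = run_bandit([0.2, 0.8])
print(predicted_label, centers, estimates, total_reward)
```

The supervised example needs labels up front, the clustering example finds the two groups without any, and the bandit agent learns only from the rewards its own actions produce.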
