Data Strategy Guide

1) The document introduces a course on data analytics and discusses its high-level goals, motivation, and format. 2) It aims to teach foundational concepts, statistical methods, machine learning algorithms, and tools for analyzing structured and unstructured data to solve business problems. 3) The course will use demonstrations and hands-on exercises to illustrate applying analytics to real data and visualizing results.

Uploaded by

Amit Jain

Introduction to Data Analytics

B.RAMAMURTHY

Rich's Data Analytics Training 10/29/2022


High Level Goals for the course

Understand the foundations of data analytics so that you can interpret and communicate results and make informed decisions
Study and learn to apply common statistical methods and machine learning algorithms to solve business problems
Learn to work with popular tools to analyze and visualize data; more importantly, encourage consistency across departments in the analytics and tools used
Work with the cloud for data storage and for deployment of applications
Learn methods for mastering and applying emerging concepts and technologies for continuous data-driven improvement of your business processes
Transform complex analytics into routine processes


Motivation

Tremendous advances have taken place in statistical methods and tools, machine learning and data mining approaches, and internet-based dissemination tools for analysis and visualization.
Many tools are open source and freely available for anybody to use.
Is there an easy entry point into learning these technologies?
Can we make these tools as easily accessible to decision makers as “office” productivity software is?


Newer kinds of Data

New kinds of data from different sources (see p.23 of Data Science book): tweets, geolocation, emails, blogs
Two major types: structured and unstructured data
Structured data: data collected and stored according to a well-defined schema, e.g., real-time stock quotes
Unstructured data: messages from social media, news, talks, books, letters, manuscripts, court documents, and so on
“Regardless of their differences, they work in tandem in any effective big data operation. Companies wishing to make the most of their data should use tools that utilize the benefits of both.” [5]
We will discuss methods for analyzing both structured and unstructured data
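
To make the distinction concrete, here is a minimal R sketch. The ticker symbols, prices, and messages are invented for illustration and are not from the course material. It shows that a question about structured data maps directly onto columns, while free text usually needs some structure imposed first (here, a simple word count).

```r
# Structured data: rows and columns that follow a fixed schema.
# Symbols and prices are made up for illustration.
quotes <- data.frame(
  symbol = c("AAA", "BBB", "AAA"),
  price  = c(10.5, 20.0, 11.0),
  stringsAsFactors = FALSE
)

# With a schema, "average price per symbol" is a one-line query on columns
mean_price <- tapply(quotes$price, quotes$symbol, mean)

# Unstructured data: free text with no schema (invented messages)
messages <- c("Great service, great price", "Price too high")

# One simple first step: split into lowercase words and count them
words <- unlist(strsplit(tolower(messages), "[^a-z]+"))
words <- words[words != ""]
word_counts <- table(words)

print(mean_price)
print(word_counts)
```

Word counting is only one of many ways to impose structure on text; it is used here because it needs nothing beyond base R.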


Top Ten Largest Databases

[Figure: bar chart, "Top ten largest databases (2007)". Vertical axis: terabytes, 0 to 7000. Bars: LOC, CIA, Amazon, YouTube, ChoicePt, Sprint, Google, AT&T, NERSC, Climate.]

Ref: http://www.comparebusinessproducts.com/fyi/10-largest-databases-in-the-world/


Top Ten Largest Databases in 2007 vs Facebook's cluster in 2010

[Figure: bar chart, "Top ten largest databases (2007)", with a bar added for Facebook annotated "21 petabytes in 2010". Vertical axis: terabytes, 0 to 7000. Bars: LOC, CIA, Amazon, YouTube, ChoicePt, Sprint, Google, AT&T, NERSC, Climate, Facebook.]

Ref: http://www.comparebusinessproducts.com/fyi/10-largest-databases-in-the-world


Data Strategy

In this era of big data, what is your data strategy?
Strategy here simply means “planning for the data challenge”
It is not only about big data: it covers all sizes and forms of data
Collecting data from customers used to be an elaborate task involving surveys and other such instruments
Nowadays data is available in abundance, thanks to technological advances as well as social networks
Data is also generated by many of your own business processes and applications
Data strategy means many different things: we will discuss this next


Components of a Data Strategy [1]

Data integration
Metadata
Data modeling
Organizational roles and responsibilities
Performance and metrics
Security and privacy
Structured data management
Unstructured data management
Business intelligence
Data analysis and visualization
Tapping into social data
This course will provide training in emerging technologies, tools, environments and APIs available for developing and implementing one or more of these components.


Data Strategy for newer kinds of data

How will you collect data? Aggregate data? What are your sources (e.g., social media)?
How will you store the data? And where?
How will you use the data? Analyze it? Analytics? Data mining? Pattern recognition?
How will you present or report the data to the stakeholders and decision makers? Visualization?
Archive the data for provenance and accountability.
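
The questions above map onto a small pipeline: collect, aggregate, store, analyze, report, archive. The following R sketch walks through those stages under loud assumptions: the two sources, the customer IDs, and the ratings are invented, and a CSV file in a temporary directory stands in for whatever storage an organization actually uses.

```r
# Collect: two hypothetical sources (invented records)
survey <- data.frame(source = "survey", customer = c("c1", "c2"), rating = c(4, 5))
social <- data.frame(source = "social", customer = c("c2", "c3"), rating = c(2, 3))

# Aggregate: combine the sources into one data set
feedback <- rbind(survey, social)

# Store: a CSV in a temporary directory stands in for real storage
store_path <- file.path(tempdir(), "feedback.csv")
write.csv(feedback, store_path, row.names = FALSE)

# Analyze: mean rating per source, computed from the stored copy
stored <- read.csv(store_path)
by_source <- aggregate(rating ~ source, data = stored, FUN = mean)

# Report: a plain-text summary for decision makers
report <- sprintf("%s: mean rating %.1f", by_source$source, by_source$rating)
print(report)

# Archive: keep a dated copy for provenance and accountability
archive_path <- file.path(tempdir(), paste0("feedback_", Sys.Date(), ".csv"))
archived <- file.copy(store_path, archive_path, overwrite = TRUE)
```

Each stage is a single line here only because the data is tiny; the point is the order of the stages, not the specific functions.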


Tools for Analytics

Elaborate tools with nifty visualizations but expensive licensing fees, e.g., Tableau, Tom Sawyer
Software that you can buy for data analytics, e.g., Brilig: small, affordable but short-lived
Open source tools with sporadic support, e.g., Gephi
Open source freeware with excellent community involvement: the R system
Some desirable characteristics of the tools: simple, quick to apply, intuitive, useful, flat learning curve
A demo to prove this point: data → actions/decisions


Demo: Exam1 Grade: Traditional reporting 1

All students
                     Q1      Q2      Q3      Q4      Q5      Total
Mean                 16.7    13.9    9.6     18.5    13.7    72.4
Median               20.0    16.0    9.0     19.0    17.0    76.0
Mode                 20.0    20.0    15.0    25.0    20.0    90.0

Version 1
                     Q1      Q2      Q3      Q4      Q5      Total
Mean                 16.0    14.2    9.6     19.4    14.0    73.2
Percent of maximum   80.1%   71.1%   64.0%   77.4%   70.2%   73.2%

Version 2
                     Q1      Q2      Q3      Q4      Q5      Total
Mean                 17.3    13.6    9.7     17.6    13.3    71.5
Percent of maximum   86.7%   67.8%   64.6%   70.3%   66.7%   71.5%

Question 1..5, total, mean, median, mode; mean ver1, mean ver2
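
A report of this kind can be reproduced in a few lines of R. The sketch below is illustrative only: the four students' scores are invented, and the maximum points per question (20, 20, 15, 25, 20) are an assumption chosen to be consistent with the percentages reported above. Base R has no built-in statistical mode, so `stat_mode` is a hypothetical helper defined here.

```r
# Hypothetical scores for four students (not the course data)
scores <- data.frame(
  Q1 = c(20, 16, 20, 12),
  Q2 = c(20, 14, 10, 12),
  Q3 = c(15,  9,  6,  9),
  Q4 = c(25, 20, 15, 20),
  Q5 = c(20, 17, 10, 13)
)
scores$Total <- rowSums(scores)

# Assumed maximum points per question; Total is their sum
max_points <- c(Q1 = 20, Q2 = 20, Q3 = 15, Q4 = 25, Q5 = 20)
max_points["Total"] <- sum(max_points)

# Hypothetical helper: most frequent value (ties go to the smallest value,
# because table() sorts its names and which.max() returns the first maximum)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# One row per statistic, one column per question
summary_table <- rbind(
  Mean   = colMeans(scores),
  Median = sapply(scores, median),
  Mode   = sapply(scores, stat_mode)
)

# Mean score as a percentage of the maximum
pct_of_max <- round(100 * colMeans(scores) / max_points[names(scores)], 1)

print(summary_table)
print(pct_of_max)
```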
Traditional approach 2: points vs #students

[Figure: Distribution of exam1 points (points vs. number of students)]


Individual questions analyzed..


Interpretation and action/decisions


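R-code

# Read the grade data; file.choose() opens an interactive file picker
data2 <- read.csv(file.choose())

# Histogram of the midterm scores
exam1 <- data2$midterm
hist(exam1, col = rainbow(8))

# Box plot of every column, first with a rainbow palette, then with named colors
boxplot(data2, col = rainbow(6))
boxplot(data2, col = c("orange", "green", "blue", "grey", "yellow", "sienna"))

# Redraw and keep the summary statistics. $stats has one column per box;
# rows are lower whisker, lower hinge, median, upper hinge, upper whisker.
# The whisker ends equal the minimum and maximum only when there are no outliers.
fn <- boxplot(data2, col = c("orange", "green", "blue", "grey", "yellow", "pink"))$stats

# Label the sixth box with its summary values
text(5.55, fn[1, 6], paste("Minimum =", fn[1, 6]), adj = 0, cex = 0.7)
text(5.55, fn[2, 6], paste("LQuartile =", fn[2, 6]), adj = 0, cex = 0.7)
text(5.0, fn[3, 6], paste("Median =", fn[3, 6]), adj = 0, cex = 0.7)
text(5.55, fn[4, 6], paste("UQuartile =", fn[4, 6]), adj = 0, cex = 0.7)
text(5.55, fn[5, 6], paste("Maximum =", fn[5, 6]), adj = 0, cex = 0.7)

# Horizontal grid lines only
grid(nx = NA, ny = NULL)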


Demo Details

Grade data is stored in an Excel file, a common input format
Converted this file to CSV
Start an RStudio project
Read the CSV data into data2 (using a file chooser option)
boxplot(data2)
That is it.
You can now add legends, colors, and labels to make it presentable.
Export the plot as an image or PDF to report the results
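
The last two steps (adding labels and a legend, then exporting) can be sketched non-interactively as follows. This is an illustration under stated assumptions, not the course's own script: the grade data is replaced by randomly generated scores, the column names Q1..Q5 and Total are hypothetical, and the output goes to a temporary directory.

```r
# Synthetic stand-in for the grade file (assumption: six numeric columns)
set.seed(42)
n <- 30
data2 <- data.frame(
  Q1 = sample(0:20, n, replace = TRUE),
  Q2 = sample(0:20, n, replace = TRUE),
  Q3 = sample(0:15, n, replace = TRUE),
  Q4 = sample(0:25, n, replace = TRUE),
  Q5 = sample(0:20, n, replace = TRUE)
)
data2$Total <- rowSums(data2)

box_colors <- rainbow(ncol(data2))
out_file <- file.path(tempdir(), "exam1_boxplot.pdf")

# Open a PDF graphics device, draw the labeled plot, then close the device
pdf(out_file, width = 7, height = 5)
bp <- boxplot(
  data2,
  col  = box_colors,
  main = "Exam 1 scores by question",
  xlab = "Question",
  ylab = "Points"
)
grid(nx = NA, ny = NULL)  # horizontal grid lines only
legend("topleft", legend = names(data2), fill = box_colors, cex = 0.7)
invisible(dev.off())

print(out_file)
```

Swapping `pdf()` for `png()` would produce an image instead, provided the R installation supports that device.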


Format of the course

Focus on a single topic per session
Begin with a general introduction to the topic
Explain related concepts
Sample problems and solutions, algorithms, methods and hands-on exercises
Implement solutions using tools
Don’t hesitate to provide feedback or ask questions
What this course is NOT: we will NOT teach the internals of statistics or machine learning, but we will learn how to apply and use them for data analytics

Session Format

[Diagram: a session (lecture, demos, hands-on exercises) shown with its materials: slide presentation, visualization, portfolio, lab handout, projects (R-Project), code/program, data]
