
Lecture

Data analytics is the process of analyzing raw data to extract meaningful insights that inform business decisions. It differs from data science, which focuses on building systems for automation and optimization. The document also discusses various types of data, the importance of metadata, and the characteristics of big data.

Uploaded by

hamzasaif4791
Copyright
© All Rights Reserved

DATA ANALYTICS
Prepared By:
Lecturer: Muhammad Bilal
DATA ANALYTICS
1. What is data analytics?
Most companies are collecting loads of data all the time—but, in
its raw form, this data doesn’t really mean anything. This is where
data analytics comes in. Data analytics is the process of analyzing
raw data in order to draw out meaningful, actionable insights,
which are then used to inform and drive smart business decisions.
HOW DATA ANALYTICS HELPS US
• You can think of data analytics as a form of business intelligence,
used to solve specific problems and challenges within an organization.
• It’s all about finding patterns in a dataset that can tell you
something useful and relevant about a particular area of the business:
how certain customer groups behave, for example, or how employees
engage with a particular tool.
• Data analytics helps you to make sense of the past and to predict
future trends and behaviors.
• Rather than basing your decisions and strategies on guesswork, you’re
making informed choices based on what the data is telling you.
• Armed with the insights drawn from the data, businesses and
organizations are able to develop a much deeper understanding of their
audience, their industry, and their company as a whole. As a result,
they are much better equipped to make decisions and plan ahead.
HISTORY OF DATA ANALYTICS
The use of data analytics by businesses can be traced as far back as the 19th century,
when Frederick Winslow Taylor initiated time management exercises. Another example is
Henry Ford measuring the speed of assembly lines. In the late 1960s, analytics
began receiving more attention as computers became decision-making support systems.

•Predictive Analytics
•Big Data Analytics
•Cognitive Analytics
•Prescriptive Analytics
•Descriptive Analytics
•Enterprise Decision Management
•Retail Analytics
•Augmented Analytics
•Web Analytics
•Call Analytics
2. What’s the difference between data analytics and data science?
You’ll find that the terms “data science” and “data analytics” tend
to be used interchangeably. However, they are two different fields
and denote two distinct career paths. What’s more, they each have
a very different impact on the business or organization.
Data analysts tackle and solve discrete questions about data,
often on request, revealing insights that can be acted upon by
other stakeholders, while data scientists build systems to
automate and optimize the overall functioning of the business.
DATA ANALYTICS VS. DATA ANALYSIS
The terms Data Analysis and Data Analytics are often used
interchangeably, including in this course.
However, it is important to note that there is a subtle difference
between the meanings of the words Analysis and Analytics. In fact,
some people go as far as saying that these terms mean different
things and should not be used interchangeably.
Yes, there is a technical difference...
The dictionary meanings are:
Analysis - detailed examination of the elements or structure of
something
Analytics - the systematic computational analysis of data or
statistics
DIFFERENCE BETWEEN DATA ANALYTICS AND DATA ANALYSIS
1. Data Analytics:
• Analytics is the technique of converting raw facts and figures into specific
actions by analyzing and evaluating that raw data in the context of
organizational problem-solving and decision-making.
• Analytics is the discovery and communication of significant patterns in data.
Especially valuable in areas rich with recorded information, analytics
relies on the simultaneous application of statistics, computer programming, and
operations research to quantify performance. Analytics frequently favors data
visualization to communicate insight.
• The aim of Data Analytics is to get actionable insights that result in smarter
decisions and better business outcomes.
2. Data Analysis:
• It is the technique of inspecting, transforming, cleaning, and modeling
raw facts and figures with the purpose of discovering useful
information and reaching sound conclusions.
WHAT ARE PROGRAMMING LANGUAGES?
Programming languages give instructions to computers. A high-level
programming language is typically more user-friendly and easier to read
and write than a low-level programming language, because its source
code uses a syntax that is easy for people to read.
This source code is then translated, by a compiler or an interpreter,
into a low-level language that the central processing unit can
recognize. Popular high-level languages include C, C++, Java, and
JavaScript. Processors run machine language, the lowest-level
language, directly, without the need for an interpreter.
Best Data Analysis Languages
•Python
•SQL
•R
•Java
•Scala
What is Data?
• The term data is derived from the Latin word ‘Datum’, which means
‘something given’.
• Data is raw, unorganized facts that are of little use until they are
processed and organized to yield information for future use.
• Data is a set of facts and statistics that can be operated on,
referred to, or analyzed.
• It can simply be a piece of information, a list of grocery items,
observations, a story, or a description of a certain scenario.
In general, data is any set of characters that is gathered and
translated for some purpose, usually analysis. If data is not put
into context, it means nothing to a human or a computer.
There are multiple types of data. Some of the more common types
include the following:

•Single character
•Boolean (true or false)
•Text (string)
•Number (integer or floating-point)
•Picture
•Sound
•Video
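To make the list above concrete, here is a minimal Python sketch showing how each kind of data might be represented in a program. The sample values are hypothetical, and treating pictures, sound and video as raw bytes is a simplifying assumption; real media files use dedicated formats on top of those bytes.

```python
# Hypothetical sample values, one per kind of data listed above.
samples = {
    "single character": "A",
    "boolean": True,
    "text (string)": "data analytics",
    "integer": 42,
    "floating-point": 3.14,
    # Assumption: pictures, sound and video are shown here as raw bytes.
    "picture/sound/video": bytes([0x89, 0x50, 0x4E, 0x47]),
}

# Map each kind of data to the name of the Python type that holds it.
type_names = {kind: type(value).__name__ for kind, value in samples.items()}

for kind, name in type_names.items():
    print(f"{kind:22} -> {name}")
```

Note that Python has no separate character type: a single character is simply a string of length one.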
WHAT IS METADATA?
• Data that provides information about other data.
• Metadata summarizes basic information about data,
making finding and working with particular instances of
data easier.
• Metadata can be created manually to be more
accurate, or automatically and contain more basic
information.
• Metadata is data about data. Metadata shows basic
information about data, which can make finding and
working with specific instances of data easier.
• Metadata increases the accuracy of searching for and
operating on data within a large amount of data.
• It helps in fetching a required piece of data from a
vast collection of data. Metadata provides information
about how the raw data is organized.
• It may be created manually or by automatic information
processing.
• Manually created metadata is more accurate than
automatically generated metadata, because automatically
generated metadata only contains the file name, size,
extension, time of creation and information about
who created the file.
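As an illustration of automatically generated metadata, the sketch below reads the basic fields the operating system keeps for a file. The helper name `basic_metadata` and the file `report.txt` are hypothetical, and the last-modified time is used as a stand-in because a true creation time is not available on every platform.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def basic_metadata(path):
    """Collect the metadata the file system records automatically."""
    p = Path(path)
    info = p.stat()
    return {
        "file_name": p.name,
        "extension": p.suffix,
        "size_bytes": info.st_size,
        # Last-modified time; creation time is platform-dependent.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Hypothetical example file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "report.txt"
    example.write_text("hello metadata", encoding="utf-8")
    meta = basic_metadata(example)

print(meta)
```

Nothing here describes what the file is about; that kind of descriptive metadata (subject, keywords, summary) is what a person would typically add manually.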
META DATA EXAMPLE
REAL WORLD DATA
Real-world data (RWD) is defined as "the data relating to
patient health status and/or the delivery of health care
routinely collected from a variety of sources."
Sources of RWD include, but are not limited to:
•Electronic health records (EHRs);
•Claims and billing activity;
•Product and disease registries;
•Data gathered from other sources such as mobile devices,
wearables such as pedometers and smart watches, etc.
DATA VS INFORMATION
WHAT ARE THE 5 V'S OF BIG DATA?
•Volume: the size and amounts of big data that companies manage and
analyze
•Variety: the diversity and range of different data types, including
unstructured data, semi-structured data and raw data
•Velocity: the speed at which companies receive, store and manage data –
e.g., the specific number of social media posts or search queries received
within a day, hour or other unit of time
•Value: the most important “V” from the perspective of the business, the
value of big data usually comes from insight discovery and pattern
recognition that lead to more effective operations, stronger customer
relationships and other clear and quantifiable business benefits
•Veracity: the “truth” or accuracy of data and information assets, which often
determines executive-level confidence
The additional characteristic of variability can also be considered:
•Variability: the changing nature of the data companies seek to capture,
manage and analyze – e.g., in sentiment or text analytics, changes in the
Volume:
The name ‘Big Data’ itself refers to a size that is enormous.
Volume is the sheer amount of data.
Size plays a crucial role in determining the value of data. Whether a
particular dataset can be considered Big Data depends on its volume:
only if the volume is very large is it considered ‘Big Data’.
Hence, ‘Volume’ is a characteristic that must be considered when
dealing with Big Data.
Example: In 2016, the estimated global mobile traffic was 6.2
exabytes (6.2 billion GB) per month. By 2020, the total was projected
to reach almost 40,000 exabytes of data.
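The unit conversion in the example can be checked with a few lines of Python. This is a small sketch that assumes decimal (SI) units, where 1 exabyte is 10^18 bytes and 1 gigabyte is 10^9 bytes; the function name is hypothetical.

```python
# Assumption: decimal (SI) units, so 1 EB = 10**18 bytes and 1 GB = 10**9 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18
GB_PER_EB = BYTES_PER_EB // BYTES_PER_GB  # one billion gigabytes per exabyte


def exabytes_to_gigabytes(exabytes):
    """Convert a size in exabytes to gigabytes."""
    return exabytes * GB_PER_EB


monthly_gb = exabytes_to_gigabytes(6.2)
print(f"6.2 EB per month is about {monthly_gb:,.0f} GB per month")
```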
Variety:
• It refers to the nature of the data: structured, semi-structured or unstructured.
• It also refers to heterogeneous sources.
• Variety is basically the arrival of data from new sources, both inside and outside of an
enterprise. That data can be structured, semi-structured or unstructured.

• Structured data: This is organized data. It generally refers to data that has a
defined length and format.
• Semi-structured data: This is semi-organized data. It is generally a form of
data that does not conform to a formal data structure. Log files are an example of this
type of data.
• Unstructured data: This is unorganized data. It generally refers to data
that doesn’t fit neatly into the traditional row and column structure of a relational
database. Texts, pictures, videos etc. are examples of unstructured data, which can’t be
stored in the form of rows and columns.
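The three categories above can be sketched in Python using only the standard library. All sample records are hypothetical, and JSON-formatted log lines are just one assumed form that semi-structured data can take.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same fields.
structured_csv = "id,name,age\n1,Ayesha,29\n2,Bilal,34\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: each record describes its own fields,
# but records do not have to share the same fields.
semi_structured = [
    '{"level": "INFO", "msg": "login", "user": "ayesha"}',
    '{"level": "ERROR", "msg": "timeout", "retry": 3}',
]
records = [json.loads(line) for line in semi_structured]

# Unstructured: free text with no predefined fields at all.
unstructured = "Loved the new app update, but checkout still feels slow."
word_count = len(unstructured.split())

print(rows[0])            # a row addressed by column name
print(sorted(records[1])) # the field names present in this record
print(word_count)         # only generic measures apply without further analysis
```

The structured rows can be queried by column name straight away, the log records have to be inspected to learn which fields they contain, and the free text needs techniques such as text mining before it yields anything tabular.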
Velocity:
• Velocity refers to the high speed at which data accumulates.
• In Big Data, data flows in at high velocity from sources like machines, networks, social media,
mobile phones etc.
• There is a massive and continuous flow of data. The potential of the data depends on how
fast it is generated and processed to meet demand.
• Sampling the data can help in dealing with issues like ‘velocity’.
• Example: More than 3.5 billion searches per day are made on Google. Also, Facebook
users are increasing by 22% (approx.) year by
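The slides mention sampling as a way to cope with velocity but do not name a method. One common choice for data that arrives too fast to store in full is reservoir sampling, sketched below as an assumption rather than as the lecturer's technique; the function name and the simulated event stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Hypothetical high-velocity source, simulated with a generator.
events = (f"event-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5, seed=42)
print(sample)
```

The stream is read once and only k items are held in memory at any time, which is what makes the approach suitable for continuous, fast-moving data.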

Value:
•After taking the other 4 V’s into account, there comes one more V, which stands
for Value. A bulk of data with no value is of no good to the company
unless you turn it into something useful.
•Data in itself is of no use or importance; it needs to be converted into
something valuable in order to extract information. Hence, you can state that Value
is the most important of all the 5 V’s.
Variability:
•How fast, and to what extent, is the structure of your data changing?
•How often does the meaning or shape of your data change?
•Example: it is as if you were eating the same ice cream every day and the
taste just kept changing.
STRUCTURED AND UNSTRUCTURED
[Figure: Structured Data vs. Unstructured Data. Labels: Facebook Friends, Undergraduate Degree, Home Town, Housemates, Archery Clubs, Work Placement, Graduate Mixer, Technical Communication]
• 80% of all data is unstructured data
• Unstructured data estimated at 3,000,000 petabytes
• Relative distance from the Earth to Jupiter
[Figure labels: Dublin, Cork]
TEXT
•Forms the majority of unstructured data
•Nearly one million bits of content shared on
Facebook every minute
•Over 100,000 tweets per minute
TEXT MINING EXAMPLE
• People’s mood on coffee, wine, beer and soda from Twitter
• Compare tweets to database of positive and negative words
• Calculate a sentiment score:
Score = # of Positive Words - # of Negative Words
• If Score > 0 - 'positive opinion'
• If Score < 0 - 'negative opinion'
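The scoring rule above can be sketched in a few lines of Python. The word lists and tweets are tiny hypothetical stand-ins for a real sentiment lexicon and real Twitter data, and the "neutral" label for a score of exactly 0 is an assumption, since the slide only defines the positive and negative cases.

```python
import re

# Tiny hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"love", "great", "good", "delicious", "best"}
NEGATIVE = {"hate", "bad", "awful", "bitter", "worst"}


def sentiment_score(text):
    """Score = number of positive words - number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def label(score):
    if score > 0:
        return "positive opinion"
    if score < 0:
        return "negative opinion"
    return "neutral"  # assumption: the slide does not define Score == 0


# Hypothetical tweets about drinks.
tweets = [
    "I love this coffee, best espresso in town",
    "This soda is awful and the aftertaste is bad",
    "Trying a new wine tonight",
]
scores = [sentiment_score(t) for t in tweets]
labels = [label(s) for s in scores]

for tweet, score, opinion in zip(tweets, scores, labels):
    print(f"{score:+d}  {opinion:17}  {tweet}")
```

A word-count approach like this ignores negation ("not good") and sarcasm, so it is best read as a first approximation of mood rather than a reliable classifier.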
TYPES OF DATA
QUALITATIVE VS QUANTITATIVE DATA
1. Quantitative data
Quantitative data is the easiest to explain. It answers key
questions such as “how many”, “how much” and “how often”.
Quantitative data can be expressed as a number or can be quantified.
Simply put, it can be measured by numerical variables.
Quantitative data is easily amenable to statistical manipulation and
can be represented by a wide variety of statistical graphs and
charts, such as line graphs, bar graphs, and scatter plots.
Examples of quantitative data:
•Scores on tests and exams, e.g. 85, 67, 90, etc.
•The weight of a person or a subject.
•Your shoe size.
•The temperature in a room.
2. Qualitative data
Qualitative data can’t be expressed as a number and can’t be measured.
Qualitative data consists of words, pictures, and symbols, not numbers.
Qualitative data is also called categorical data because the information can be
sorted by category, not by number.
Qualitative data can answer questions such as “how this has happened” or
“why this has happened”.
Examples of qualitative data:
•Colors, e.g. the color of the sea.
•Your favorite holiday destination, such as Hawaii, New Zealand, etc.
•Names such as John, Patricia, etc.
•Ethnicity such as American, Pakistani, Asian, etc.
NOMINAL VS ORDINAL DATA

3. Nominal data
Nominal data is used just for labeling variables, without any type of quantitative
value. The name ‘nominal’ comes from the Latin word “nomen”, which means
‘name’.
Nominal data simply names a thing without implying any order. In fact,
nominal data could just be called “labels.”
Examples of Nominal Data:
•Gender (Women, Men)
•Hair color (Blonde, Brown, Brunette, Red, etc.)
•Marital status (Married, Single, Widowed)
•Ethnicity (Hispanic, Asian)
4. Ordinal data
Ordinal data shows where a value sits in an order. This is the crucial difference from nominal data.
Ordinal data is placed into some kind of order by its position on a scale, and may indicate superiority.
However, you cannot do arithmetic with ordinal numbers because they only show sequence.
Ordinal variables are considered to be “in between” qualitative and quantitative variables.
In other words, ordinal data is qualitative data for which the values are ordered.
Nominal data, by comparison, is qualitative data for which the values cannot be placed in an order.
We can also assign numbers to ordinal data to show their relative position, but we cannot do math with
those numbers. For example: “first, second, third, etc.”
Examples of Ordinal Data:
•The first, second and third person in a competition.
•Letter grades: A, B, C, etc.
•When a company asks a customer to rate the sales experience on a scale of 1-10.
•Economic status: low, medium and high.
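The practical difference between nominal and ordinal data shows up in which operations make sense. The following sketch uses hypothetical survey answers: nominal values can only be counted, while ordinal values can also be sorted and ranked once their order is stated explicitly.

```python
from collections import Counter

# Nominal: labels with no order, so counting is the only meaningful summary.
hair_colors = ["Brown", "Blonde", "Brown", "Red"]
most_common_hair = Counter(hair_colors).most_common(1)[0][0]

# Ordinal: the order of the categories must be stated explicitly.
STATUS_ORDER = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(STATUS_ORDER)}

statuses = ["high", "low", "medium", "low", "high"]
sorted_statuses = sorted(statuses, key=lambda s: rank[s])

# The middle value is meaningful for ordinal data because it relies only on order.
median_status = sorted_statuses[len(sorted_statuses) // 2]

print(most_common_hair)
print(sorted_statuses)
print(median_status)
```

The rank numbers exist only to sort the categories. Averaging them would assume that the gap between "low" and "medium" equals the gap between "medium" and "high", which ordinal data does not guarantee.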
DISCRETE VS CONTINUOUS DATA

Discrete and continuous data are the two key types of
quantitative data.
In statistics, marketing research, and data science, many decisions depend on
whether the underlying data is discrete or continuous.
5. Discrete data
Discrete data is a count that involves only integers. Discrete values cannot be
subdivided into parts.
For example, the number of children in a class is discrete data. You can count whole
individuals. You can’t count 1.5 kids.
To put it another way, discrete data can take only certain values. The data variables
cannot be divided into smaller parts.
It has a limited number of possible values, e.g. days of the month.
Examples of discrete data:
•The number of students in a class.
•The number of workers in a company.
•The number of home runs in a baseball game.
•The number of test questions you answered correctly.
6. Continuous data
Continuous data is information that can be meaningfully divided into finer levels. It
can be measured on a scale or continuum and can have almost any numeric value.
For example, you can measure your height at very precise scales: meters,
centimeters, millimeters, etc.
You can record continuous data for many different measurements, such as width,
temperature and time. This is where the key difference from discrete
data lies.
Continuous variables can take any value between two numbers. For example,
between 50 and 72 inches, there are literally millions of possible heights: 52.04762
inches, 69.948376 inches, etc.
A good rule of thumb for deciding whether data is continuous or discrete is that if the unit of
measurement can be cut in half and still make sense, the data is continuous.
Examples of continuous data:
•The amount of time required to complete a project.
•The height of children.
•The square footage of a two-bedroom house.
•The speed of cars.
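The distinction affects how the data is summarized. In the sketch below, discrete counts are tallied value by value, while continuous measurements are grouped into intervals first. The class sizes and the 10-inch bin width are hypothetical; two of the heights are the ones quoted above.

```python
from collections import Counter

# Discrete: whole-number counts, so each distinct value can be tallied directly.
children_per_class = [28, 30, 28, 31, 30, 28]
class_size_counts = Counter(children_per_class)

# Continuous: almost every measurement is unique, so values are binned first.
heights_in = [52.04762, 69.948376, 61.5, 58.25, 66.0]
BIN_WIDTH = 10  # hypothetical bin width in inches


def bin_label(value, width=BIN_WIDTH):
    """Return the interval, such as '50-60', that contains the value."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width}"


height_bins = Counter(bin_label(h) for h in heights_in)

print(dict(class_size_counts))
print(dict(height_bins))
```

Tallying the raw heights directly would produce one entry per measurement, which is why histograms with intervals are the usual way to display continuous data.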
