What Is Data Analytics?
For businesses, the data they use may include historical data or new
information they collect for a particular initiative. They may also
collect it first-hand from their customers and site visitors or purchase it
from other organizations. Data a company collects about its own
customers is called first-party data, data a company obtains from a
known organization that collected it is called second-party data, and
aggregated data a company buys from a marketplace is called third-
party data. The data a company uses may include information about
an audience’s demographics, their interests, behaviors and more.
Using this data effectively means less money wasted as well as improved results from your
campaigns and content strategies. In addition to reducing your costs, analytics can also
boost your revenue through increased conversions, ad revenue, or subscriptions.
What Insights Can You Gain From Data Analytics?
This and other types of data can reveal information about customer
affinities — expressed or suggested interest in activities, products,
brands and topics. A customer may express interest in your brand by
signing up for your email list. They may also indirectly express interest
in a topic by reading about it on your website. They may express
interest in a product by clicking on one of your ads for it. Some other
potential sources of customer affinity data include survey responses,
social media likes and video views.
You can then use this information to predict the behaviors of various
types of users and target your ads and content more effectively.
Data Analytics Technology
Data analytics is nothing new. Today, though, the growing volume of
data and the advanced analytics technologies available mean you can
get much deeper data insights more quickly. The insights that big data
and modern technologies make possible are more accurate and more
detailed. In addition to using data to inform future decisions, you can
also use current data to make immediate decisions.
For example, let’s say you publish a site that features videos
about sports. As people visit your site, you could collect data about
which videos different visitors watch as well as how highly they rate
the videos, which ones they comment on and more. You could also
gather information about the demographics of each user. You can use
data analytics tools to determine which audience segments are most
likely to watch certain videos. You can then suggest videos to people
based on the segments they fit into best. For example, you might find
that older men are most likely to be interested in golf, while younger
men are most likely to be interested in basketball.
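As a rough illustration of that kind of segment analysis, the hypothetical sketch below tallies views by video category and age group with pandas; the column names and data are invented for demonstration only.

```python
# Hypothetical sketch: tally views by video category and audience age group.
import pandas as pd

# Assumed viewing log: one row per video view, with simple viewer demographics.
views = pd.DataFrame({
    "age_group": ["18-34", "18-34", "35-54", "55+", "55+", "35-54"],
    "category":  ["basketball", "basketball", "golf", "golf", "golf", "tennis"],
})

# View counts per category and segment; the largest cell in each row suggests
# which segment to target when recommending that category of video.
counts = views.groupby(["category", "age_group"]).size().unstack(fill_value=0)
print(counts)
```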
For some real-life examples of how Lotame’s data analytics tools have
helped businesses drive improved results, check out our case studies.
You also need to ensure data quality so your results are accurate. In
addition, your data needs to be accessible and not siloed, so everyone
throughout your organization works from the same repository.
4 Types of Data in Statistics
Introduction
Data types are an important concept in statistics: they enable us to apply
statistical measurements correctly to data and help us draw valid conclusions
about it.
A solid understanding of the various data types is essential for doing
Exploratory Data Analysis (EDA), since you can use certain statistical
measurements only for particular data types.
Similarly, you need to know which type of data you are working with to select
the correct visualization technique. You can think of data types as a way to
categorize different kinds of variables.
Broadly speaking, there are only two classes of data in statistics: Qualitative
and Quantitative data. After that, there is a subdivision that breaks them into
4 types of data. Data types are like a guide for doing the whole study of
statistics correctly!
This blog gives you an overview of the different types of data you need to know for
performing proper exploratory data analysis.
Qualitative and Quantitative Data
Qualitative data is information that cannot be measured in the form of numbers.
It is also known as categorical data. It normally consists of words and
narratives, which we label with names.
It conveys information about the qualities of things in the data. The outcome of
qualitative data analysis can take the form of highlighted key words,
extracted information, and elaborated ideas.
For example:
Hair colour- black, brown, red
Opinion- agree, disagree, neutral
On the other side, quantitative data is information gathered from a
group of individuals that lends itself to statistical data analysis. Numerical data is
another name for quantitative data. Simply put, it gives information about the
quantities of items in the data, and those items can be measured and
expressed in terms of numbers.
For example:
We can measure the height (1.70 meters), distance (1.35 miles) with
the help of a ruler or tape.
We can measure water (1.5 litres) with a jug.
Under this subdivision, nominal data and ordinal data fall under qualitative
data, while interval data and ratio data fall under quantitative data. Below, we
look at each of these data types in detail.
Different Types of Data
1. Nominal Data
Nominal data are used to label variables that have no quantitative
value and no order. So, if you change the order of the values, the
meaning remains the same.
Thus, nominal data are observed but not measured, are unordered and
non-equidistant, and have no meaningful zero.
The only comparison you can make with nominal data is to state
that one observation is (or is not) equal to another (equality or inequality), and
you can use this to group them.
You cannot order nominal data, so you cannot sort them.
Nor can you perform any arithmetic on them, as that is reserved
for numerical data. With nominal data, you can calculate frequencies,
proportions, percentages, and a central point (the mode).
Examples of Nominal data:
What languages do you speak?
English
German
French
Punjabi
What’s your nationality?
American
Indian
Japanese
German
You can clearly see that in these examples of nominal data the
categories have no order.
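As a minimal sketch (with made-up responses), the only summaries that make sense for nominal data like these are counts, proportions, and the mode:

```python
# Hypothetical sketch: valid summaries for nominal data are counts,
# proportions, and the mode -- order and arithmetic are undefined.
import pandas as pd

languages = pd.Series(["English", "German", "French", "English", "Punjabi", "English"])

print(languages.value_counts())                 # frequencies
print(languages.value_counts(normalize=True))   # proportions
print(languages.mode()[0])                      # central point (the mode)
```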
2. Ordinal Data
Ordinal data are almost the same as nominal data, except that their categories
can be ordered, like 1st, 2nd, etc. However, the relative distances between
adjacent categories are not uniform.
Ordinal data are observed but not measured, are ordered but non-equidistant,
and have no meaningful zero. Ordinal scales are commonly used
for measuring happiness, satisfaction, and the like.
With ordinal data, as with nominal data, you can group the
information by evaluating whether observations are equal or different.
Because ordinal data are ordered, they can also be arranged by making basic
comparisons between the categories, for example, greater or less than,
higher or lower, and so on.
You still cannot do arithmetic with ordinal data, however, because the
categories are not truly numerical.
With ordinal data, you can calculate the same things as with nominal data, like
frequencies, proportions, percentages, and a central point, and in addition
some summary statistics and, similarly, Bayesian statistics apply.
Examples of Ordinal data:
Opinion
o Agree
o Disagree
o Mostly agree
o Neutral
o Mostly disagree
Time of day
o Morning
o Noon
o Night
In these examples, there is an obvious order to the categories.
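A hedged sketch of working with ordinal data in pandas follows; the opinion responses are invented, and encoding them as an ordered categorical allows comparison and sorting but still no arithmetic on the gaps between categories:

```python
# Hypothetical sketch: an ordered categorical supports comparison and sorting,
# but distances between categories remain undefined.
import pandas as pd

opinions = pd.Categorical(
    ["Agree", "Neutral", "Mostly agree", "Disagree", "Agree"],
    categories=["Disagree", "Mostly disagree", "Neutral", "Mostly agree", "Agree"],
    ordered=True,
)
s = pd.Series(opinions)

print(s.value_counts(sort=False))   # frequencies, listed in category order
print(s.min(), s.max())             # ordering comparisons are allowed
```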
3. Interval Data
Interval data are measured and ordered with equidistant items but have
no meaningful zero.
The key point about an interval scale is that the word 'interval' means
'space in between': interval scales tell us not only about the order of
items but also about the values between them.
Interval data can be negative, though ratio data can't.
Even though interval data can appear very similar to ratio
data, the difference lies in their defined zero points. If the
zero point of the scale has been chosen arbitrarily, then the
data cannot be ratio data and must be interval data.
Hence, with interval data you can readily compare degrees of the data,
and you can also add or subtract values.
The descriptive statistics you can calculate for interval data are the
central point (mean, median, mode), the range (minimum, maximum), and the
spread (percentiles, interquartile range, and standard deviation).
In addition, other similar statistical data analysis techniques can
be used for further analysis.
Examples of Interval data:
Temperature (°C or F, but not Kelvin)
Dates (1066, 1492, 1776, etc.)
Time interval on a 12-hour clock (6 am, 6 pm)
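For instance, a small sketch with assumed Celsius readings shows the statistics that are valid for interval data, and why ratios are not:

```python
# Hypothetical sketch: interval data (Celsius temperatures) support means,
# medians, and standard deviations, but ratios are meaningless because 0 C
# is an arbitrary zero point, not "no temperature".
import statistics

temps_c = [-5.0, 0.0, 12.5, 21.0, 30.5]

print(statistics.mean(temps_c))      # central point
print(statistics.median(temps_c))
print(statistics.stdev(temps_c))     # spread
print(max(temps_c) - min(temps_c))   # range; differences are meaningful
# Note: claiming 30.5 C is "twice as hot" as 15.25 C would be invalid.
```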
4. Ratio Data
Ratio data are measured and ordered with equidistant items and a
meaningful zero, and, unlike interval data, they can never be negative.
An outstanding example of ratio data is the measurement of height. It
could be measured in centimetres, inches, meters, or feet, and it is not
possible to have a negative height.
Ratio data tell us about the order of variables and the differences
between them, and they have an absolute zero. This permits a wide range of
calculations and inferences to be performed and drawn.
Ratio data are fundamentally the same as interval data, except that zero
means none.
The descriptive statistics you can calculate for ratio data are the
same as for interval data: the central point (mean, median, mode),
range (minimum, maximum), and spread (percentiles, interquartile range,
and standard deviation).
Examples of Ratio data:
Age (from 0 years to 100+)
Temperature (in Kelvin, but not °C or F)
Distance (measured with a ruler or any other assessing device)
Time interval (measured with a stop-watch or similar)
Therefore, for these examples of ratio data, there is an actual, meaningful
zero point: a person's age, absolute temperature, distance calculated
from a specified point, and elapsed time all have real zeros.
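A parallel sketch with assumed heights shows that ratio data support everything interval data do, plus meaningful ratios:

```python
# Hypothetical sketch: ratio data (heights in metres) have a true zero, so in
# addition to the interval-data statistics, ratios between values are meaningful.
import statistics

heights_m = [1.52, 1.60, 1.70, 1.75, 1.88]

print(statistics.mean(heights_m), statistics.median(heights_m))
print(statistics.stdev(heights_m))
print(heights_m[-1] / heights_m[0])  # valid: the tallest is about 1.24x the shortest
```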
Key Takeaways
We hope you now understand the 4 types of data in statistics and their
importance. With this, you can work out how to handle data correctly, which statistical
hypothesis tests you can use, and what you can calculate with them.
Moreover,
Nominal data and ordinal data are the types of qualitative data or
categorical data.
Interval data and ratio data are the types of quantitative data which are
also known as numerical data.
Nominal Data are not measured but observed and they are unordered,
non-equidistant, and also have no meaningful zero.
Ordinal Data is also not measured but observed and they are ordered
however non-equidistant and have no meaningful zero.
Interval Data are measured and ordered with equidistant items yet have
no meaningful zero.
Ratio Data are also measured and ordered with equidistant items and a
meaningful zero.
What is data visualization? Presenting data for decision-making
Data visualization definition
Data visualization is the presentation of data in a graphical format. It reduces the “noise”
of data by presenting it visually, making it easier for decision makers to see and
understand trends, outliers, and patterns in data.
Maps and charts were among the earliest forms of data visualization. One of the most
well-known early examples of data visualization was a flow map created by French civil
engineer Charles Joseph Minard in 1869 to help understand what Napoleon’s troops
suffered in the disastrous Russian campaign of 1812. The map used two dimensions to
depict the number of troops, distance, temperature, latitude and longitude, direction of
travel, and location relative to specific dates.
The business value of data visualization
Data visualization helps people analyze data quickly and efficiently. By providing easy-
to-understand visual representations of data, it helps employees make more informed
decisions based on that data. Presenting data in visual form can make it easier to
comprehend, enabling people to obtain insights more quickly. Visualizations can also
make it easier to communicate those insights. Visual representations of data can also
make it easier to see how independent variables relate to one another. This can help
you see trends, understand the frequency of events, and track connections between
operations and performance, for example.
Exploration: Exploration visualizations help you understand what the data is telling you.
Explanation: Explanation visualizations tell a story to an audience using data.
2D area
These are typically geospatial visualizations. For example, cartograms use distortions of
maps to convey information such as population or travel time. Choropleths use shades
or patterns on a map to represent a statistical variable, such as population density by
state.
Temporal
These are one-dimensional linear visualizations that have a start and finish time.
Examples include a time series, which presents data like website visits by day or month,
and Gantt charts, which illustrate project schedules.
Multidimensional
These common visualizations present data with two or more dimensions. Examples
include pie charts, histograms, and scatter plots.
Hierarchical
These visualizations show how groups relate to each other. Tree diagrams are an
example of a hierarchical visualization that shows how larger groups encompass sets of
smaller groups.
Network
Network visualizations show how data sets are related to each other in a network. An
example is a node-link diagram, also known as a network graph, which uses nodes and
link lines to show how things are interconnected.
A dot map created by English physician John Snow in 1854 to understand the cholera
outbreak in London that year. The map used bar graphs on city blocks to indicate cholera
deaths at each household in a London neighborhood. The map showed that the worst-affected
households were all drawing water from the same well, which eventually led to the insight that
wells contaminated by sewage had caused the outbreak.
An animated age and gender demographic breakdown pyramid created by Pew
Research Center as part of its The Next America project, published in 2014. The project is filled
with innovative data visualizations. This one shows how population demographics have shifted
since the 1950s, with a pyramid of many young people at the bottom and very few older people
at the top in the 1950s to a rectangular shape in 2060.
A collection of four visualizations by Hanah Anderson and Matt Daniels of The Pudding
that illustrate gender disparity in pop culture by breaking down the scripts of 2,000 movies and
tallying spoken lines of dialogue for male and female characters. The visualizations include a
breakdown of Disney movies, the overview of 2,000 scripts, a gradient bar with which users can
search for specific movies, and a representation of age biases shown toward male and female
roles.
Domo
Domo is a cloud software company that specializes in business intelligence tools and
data visualization. It focuses on business-user deployed dashboards and ease of use.
Dundas BI
Dundas BI is a BI platform for visualizing data, building and sharing dashboards and
reports, and embedding analytics.
Infogram
Infogram is a drag-and-drop visualization tool for creating visualizations for marketing
reports, infographics, social media posts, dashboards, and more.
Microsoft Power BI
Microsoft Power BI is a business intelligence platform integrated with Microsoft Office. It
has an easy-to-use interface for making dashboards and reports.
Qlik
Qlik’s Qlik Sense features an “associative” data engine for investigating data and AI-
powered recommendations for visualizations. It is continuing to build out its open
architecture and multicloud capabilities.
Sisense
Sisense is an end-to-end analytics platform best known for embedded analytics. Many
customers use it in an OEM form.
Tableau
One of the most popular data visualization platforms on the market, Tableau is a
platform that supports accessing, preparing, analyzing, and presenting data.
Data visualization can help you draw actionable insights from massive
amounts of data in a short amount of time. Even a simple visualization, like
a bar graph, can present valuable insights in seconds. Take a look at the
example below:
Data collected from a corporate technology assessment is organized into this
colorful bar graph. By glancing at the chart, an IT manager could
immediately recognize which skills need improvement. From there, the
manager could decide to allocate resources towards training and recruiting in
these skill areas. All within a few minutes.
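The chart itself is not reproduced here, but a hypothetical sketch of that kind of bar graph, using invented skill scores and an assumed target proficiency, could look like this:

```python
# Hypothetical sketch of a skills-assessment bar graph; the skill names,
# scores, and target level are made up for illustration.
import matplotlib.pyplot as plt

skills = ["Python", "SQL", "Cloud", "Security", "DevOps"]
avg_score = [78, 85, 52, 61, 70]          # assumed average proficiency (%)
target = 75                               # assumed ideal proficiency level

plt.bar(skills, avg_score, color="steelblue")
plt.axhline(target, color="red", linestyle="--", label="Ideal proficiency")
plt.ylabel("Average proficiency (%)")
plt.title("Corporate technology skills assessment")
plt.legend()
plt.show()
```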
2 | Improve Accuracy
While big data provides decision-makers with all the information, it's not
always presented in a consumable form. Imagine for every high-impact
decision you must scroll through rows of data compiled in a spreadsheet, just
to digest all the facts. It's unreasonable, time-consuming, and confusing.
As a manager, you need to spend your time driving action, not analyzing
numbers. When it's hard to consume data, it's easy to ignore the facts
and lean on our biases. Instead of wasting valuable time analyzing rows of
data or falling back on your assumptions, use visualizations to identify
relevant information quickly.
Data visualization simplifies the information, reducing the need to fill the
gaps with your personal biases. In the bar chart above, you can easily see
a comparison of all the skills across the workforce. When you need to decide
where to allocate resources, your decision is based on factual data, not
assumptions.
3 | Simplify Communication
A decision is just words until it is carried out through people's actions. After
you make a decision, you must effectively communicate your thoughts
with the people who will carry out the subsequent steps. In the same way
that data visualization simplifies data analysis, it can also streamline and
objectify communication.
Alternatively, the manager could use the graph below to clearly communicate
why he is making this decision to the developer:
The chart clearly shows which skills do not meet the ideal proficiency levels
and by how much those skills need to improve. By presenting his message in
visual form, the manager can ensure the engineer understands why he needs
training and how he can gauge his progress. The visualization shifted the
manager's message from unclear and subjective to concise and
objective.
4 | Empower Collaboration
Great decision-making has always been a crucial skill for business leaders.
Big data can put you ahead of the competition if you can use it to produce
timely, informed decisions that deliver successful outcomes for your company.
Incorporating data visualization in the decision-making process can improve
speed, reduce inaccuracies, and enhance communication and collaboration.
How will you leverage data visualization to start making better data-driven
decisions today?
What is Data Science?
Data science defined
Data science combines multiple fields, including statistics, scientific methods, artificial
intelligence (AI), and data analysis, to extract value from data. Those who practice data science
are called data scientists, and they combine a range of skills to analyze data collected from the
web, smartphones, customers, sensors, and other sources to derive actionable insights.
Data science encompasses preparing data for analysis, including cleansing, aggregating, and
manipulating the data to perform advanced data analysis. Analytic applications and data
scientists can then review the results to uncover patterns and enable business leaders to draw
informed insights.
Data Science
Data science combines the scientific method, math and statistics, specialized programming,
advanced analytics, AI, and even storytelling to uncover and explain the business insights
buried in data.
Data preparation can involve cleansing, aggregating, and manipulating the data to be ready for
specific types of processing. Analysis requires the development and use of algorithms,
analytics, and AI models. It’s driven by software that combs through data to find patterns
within it and transform those patterns into predictions that support business decision-
making. The accuracy of these predictions must be validated through scientifically
designed tests and experiments. And the results should be shared through the skillful
use of data visualization tools that make it possible for anyone to see the patterns and
understand trends.
As a result, data scientists (as data science practitioners are called) require computer
science and pure science skills beyond those of a typical data analyst. A data scientist
must be able to do the following:
This combination of skills is rare, and it’s no surprise that data scientists are currently in
high demand. According to an IBM survey (PDF, 3.9 MB), the number of job openings in
the field continues to grow at over 5% per year, with over 60,000 forecast for 2020.
Why is data science important?
Because companies are sitting on a treasure trove of data. As modern technology has enabled the
creation and storage of increasing amounts of information, data volumes have exploded. It’s
estimated that 90 percent of the data in the world was created in the last two years. For example,
Facebook users upload 10 million photos every hour.
But this data is often just sitting in databases and data lakes, mostly untouched.
The wealth of data being collected and stored by these technologies can bring transformative
benefits to organizations and societies around the world—but only if we can interpret it. That’s
where data science comes in.
Data science reveals trends and produces insights that businesses can use to make better
decisions and create more innovative products and services. Perhaps most importantly, it enables
machine learning (ML) models to learn from the vast amounts of data being fed to them, rather
than mainly relying upon business analysts to see what they can discover from the data.
Data is the bedrock of innovation, but its value comes from the information data scientists can
glean from it, and then act upon.
What’s the difference between data science, artificial intelligence, and machine learning?
To better understand data science—and how you can harness it—it’s equally important to know
other terms related to the field, such as artificial intelligence (AI) and machine learning. Often,
you’ll find that these terms are used interchangeably, but there are nuances.
Determine customer churn by analyzing data collected from call centers, so marketing can
take action to retain them (a minimal sketch of this use case appears after this list)
Improve efficiency by analyzing traffic patterns, weather conditions, and other factors so
logistics companies can improve delivery speeds and reduce costs
Improve patient diagnoses by analyzing medical test data and reported symptoms so doctors
can diagnose diseases earlier and treat them more effectively
Optimize the supply chain by predicting when equipment will break down
Detect fraud in financial services by recognizing suspicious behaviors and anomalous actions
Improve sales by creating recommendations for customers based upon previous purchases
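To make the first of these use cases concrete, here is a minimal, hypothetical sketch of a churn model; the features, data, and model choice are invented for illustration, not a prescribed approach:

```python
# Hypothetical sketch: predicting churn from a few assumed call-center features
# with a simple logistic regression; the data below are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "calls_last_month": [1, 6, 0, 9, 2, 7, 3, 8],
    "avg_wait_minutes": [2, 15, 1, 22, 4, 18, 3, 25],
    "churned":          [0, 1, 0, 1, 0, 1, 0, 1],   # 1 = customer left
})

X, y = df[["calls_last_month", "avg_wait_minutes"]], df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_test, y_test))
```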
Many companies have made data science a priority and are investing in it heavily. In Gartner’s
recent survey of more than 3,000 CIOs, respondents ranked analytics and business intelligence as
the top differentiating technology for their organizations. The CIOs surveyed see these
technologies as the most strategic for their companies, and are investing accordingly.
How data science is conducted
The process of analyzing and acting upon data is iterative rather than linear, but this is how the
data science lifecycle typically flows for a data modeling project:
Some of the most popular notebooks are Jupyter, RStudio, and Zeppelin. Notebooks are very
useful for conducting analysis, but have their limitations when data scientists need to work as a
team. Data science platforms were built to solve this problem.
To determine which data science tool is right for you, it’s important to ask the following
questions: What kind of languages do your data scientists use? What kind of working methods
do they prefer? What kind of data sources are they using?
For example, some users prefer to have a datasource-agnostic service that uses open source
libraries. Others prefer the speed of in-database, machine learning algorithms.
The data science lifecycle—also called the data science pipeline—includes anywhere
from five to sixteen (depending on whom you ask) overlapping, continuing processes.
The processes common to just about everyone’s definition of the lifecycle include the
following:
Capture: This is the gathering of raw structured and unstructured data from all
relevant sources via just about any method—from manual entry and web scraping to
capturing data from systems and devices in real time.
Prepare and maintain: This involves putting the raw data into a consistent format for
analytics or machine learning or deep learning models. This can include everything
from cleansing, deduplicating, and reformatting the data, to using ETL (extract,
transform, load) or other data integration technologies to combine the data into a data
warehouse, data lake, or other unified store for analysis.
Preprocess or process: Here, data scientists examine biases, patterns, ranges, and
distributions of values within the data to determine the data’s suitability for use with
predictive analytics, machine learning, and/or deep learning algorithms (or other
analytical methods).
Analyze: This is where the discovery happens—where data scientists perform
statistical analysis, predictive analytics, regression, machine learning and deep
learning algorithms, and more to extract insights from the prepared data.
Communicate: Finally, the insights are presented as reports, charts, and other data
visualizations that make the insights—and their impact on the business—easier for
decision-makers to understand. A data science programming language such as R or
Python (see below) includes components for generating visualizations; alternatively,
data scientists can use dedicated visualization tools.
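As a toy illustration of those lifecycle stages end to end, the sketch below captures invented records, prepares them, analyzes them, and communicates the result with a simple chart; the data and column names are hypothetical:

```python
# Toy illustration of the lifecycle stages above on invented data:
# capture -> prepare -> analyze -> communicate.
import pandas as pd
import matplotlib.pyplot as plt

# Capture: raw records (normally gathered from systems, files, or APIs).
raw = pd.DataFrame({
    "day":    ["Mon", "Mon", "Tue", "Wed", "Wed", None],
    "visits": [120, 120, 95, None, 140, 80],
})

# Prepare and maintain: deduplicate, drop incomplete rows, fix types.
clean = raw.drop_duplicates().dropna()
clean["visits"] = clean["visits"].astype(int)

# Analyze: summarize visits per day.
by_day = clean.groupby("day")["visits"].sum()
print(by_day)

# Communicate: present the result as a simple chart.
by_day.plot(kind="bar", title="Daily website visits")
plt.show()
```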
Data science tools
Data scientists must be able to build and run code in order to create models. The most
popular programming languages among data scientists are open source tools that
include or support pre-built statistical, machine learning and graphics capabilities. These
languages include:
R: An open source programming language and environment for developing statistical
computing and graphics, R is the most popular programming language among data
scientists. R provides a broad variety of libraries and tools for cleansing and prepping
data, creating visualizations, and training and evaluating machine learning and deep
learning algorithms. It’s also widely used among data science scholars and
researchers.
Python: Python is a general-purpose, object-oriented, high-level programming
language that emphasizes code readability through its distinctive generous use of
white space. Several Python libraries support data science tasks, including NumPy for
handling large, multi-dimensional arrays, Pandas for data manipulation and analysis, and
Matplotlib for building data visualizations.
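A brief sketch of those three libraries working together on synthetic data might look like this:

```python
# Minimal sketch of NumPy, Pandas, and Matplotlib used together on synthetic data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = np.random.default_rng(0).normal(loc=50, scale=10, size=500)  # NumPy array
df = pd.DataFrame({"score": values})                                  # Pandas frame

print(df["score"].describe())        # quick summary statistics
df["score"].hist(bins=20)            # Matplotlib histogram via pandas
plt.title("Distribution of scores")
plt.show()
```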
Machine Learning
https://www.youtube.com/watch?v=ukzFI9rgwfU
Machine Learning
https://www.youtube.com/watch?v=6M5VXKLf4D4
Deep learning
https://www.youtube.com/watch?v=bfmFfD2RIcg
Artificial Neural Network
This introduction to machine learning provides an overview of its history, important definitions,
applications and concerns within businesses today.
Since deep learning and machine learning tend to be used interchangeably, it’s worth
noting the nuances between the two. Machine learning, deep learning, and neural
networks are all sub-fields of artificial intelligence. However, deep learning is actually a
sub-field of machine learning, and neural networks is a sub-field of deep learning.
The way in which deep learning and machine learning differ is in how each algorithm
learns. Deep learning automates much of the feature extraction piece of the process,
eliminating some of the manual human intervention required and enabling the use of
larger data sets. You can think of deep learning as "scalable machine learning" as Lex
Fridman notes in this MIT lecture (00:30) (link resides outside IBM). Classical, or "non-
deep", machine learning is more dependent on human intervention to learn. Human
experts determine the set of features to understand the differences between data
inputs, usually requiring more structured data to learn.
"Deep" machine learning can leverage labeled datasets, also known as supervised
learning, to inform its algorithm, but it doesn’t necessarily require a labeled dataset. It
can ingest unstructured data in its raw form (e.g. text, images), and it can automatically
determine the set of features which distinguish different categories of data from one
another. Unlike machine learning, it doesn't require human intervention to process data,
allowing us to scale machine learning in more interesting ways. Deep learning and
neural networks are primarily credited with accelerating progress in areas such as
computer vision, natural language processing, and speech recognition.
Neural networks, or artificial neural networks (ANNs), are made up of node layers,
containing an input layer, one or more hidden layers, and an output layer. Each node, or
artificial neuron, connects to another and has an associated weight and threshold. If the
output of any individual node is above the specified threshold value, that node is
activated, sending data to the next layer of the network. Otherwise, no data is passed
along to the next layer of the network. The “deep” in deep learning is just referring to the
depth of layers in a neural network. A neural network that consists of more than three
layers—which would be inclusive of the inputs and the output—can be considered a
deep learning algorithm or a deep neural network. A neural network that only has two or
three layers is just a basic neural network.
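To ground the description above, here is a toy forward pass in NumPy using the threshold-style activation just described; the weights and thresholds are arbitrary illustrative values, not a production design:

```python
# Toy sketch of the forward pass described above: weighted inputs are summed at
# each node and compared with a threshold to decide whether the node "fires".
# Weights and thresholds here are arbitrary illustrative numbers.
import numpy as np

def layer(inputs, weights, thresholds):
    # Each column of `weights` feeds one node in the next layer.
    activations = inputs @ weights
    return (activations > thresholds).astype(float)   # 1.0 if the node activates

x = np.array([0.8, 0.2, 0.5])                                        # input layer
hidden = layer(x, np.random.default_rng(1).normal(size=(3, 4)), thresholds=0.0)
output = layer(hidden, np.random.default_rng(2).normal(size=(4, 1)), thresholds=0.5)
print(hidden, output)
```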
10 INDUSTRIES REDEFINED BY BIG DATA ANALYTICS
It is widely acknowledged that big data has become a big game changer in most
modern industries over the last few years. As big data continues to permeate
our day-to-day lives, the number of industries adopting it continues to
increase. It is well said that when new technologies become cheaper and easier
to use, they have the potential to transform industries. That is exactly what
is happening with big data right now. Here are 10 industries redefined the most
by big data analytics:
Sports
Most elite sports have now embraced data analytics. In Premier League
football games, cameras installed around the stadiums track the movement
of every player with the help of pattern recognition software generating over
25 data points per player every second. What’s more, NFL players have
installed sensors on their shoulder pads to gather intelligent insights on
their performance using data mining. It was analytics that helped British
rowers row their way to the Olympic gold.
Hospitality
Hotels and the luxury industry have turned to advanced analytics solutions
to understand what drives customer satisfaction. Yield management is one
common use of analytics in the hotel industry: it is an important means of
handling the recurring peaks in demand throughout the year, taking into
account other factors, such as weather and local events, that can influence
the number and nationalities of guests checking in.
Government and Public Sector Services
Analytics, data science, and big data have helped a number of cities to pilot
the smart cities initiative where data collection, analytics and the IoT
combine to create joined-up public services and utilities spanning the entire
city. For example, one council has rolled out a sensor network across all 80 of
its neighborhood recycling centres to help streamline collection
services, so wagons can prioritize the fullest recycling centres and skip
those with almost nothing in them.
Energy
The costs of extracting oil and gas are rising, and the turbulent state of
international politics adds to the difficulties of exploration and drilling for
new reserves. Energy giant Royal Dutch Shell, for example, has
been developing the “data-driven oilfield” in an attempt to bring down the
cost of drilling for oil.
And on a smaller but no less important scale, data and the Internet of
Things (IoT) are disrupting the way we use energy in our homes. The rise of
“smart homes” includes technology like Google’s Nest thermostat, which
helps make homes more comfortable and cut down on energy wastage.
Education
The education sector generates massive amounts of data through courseware and
learning methodologies. Important insights can identify better teaching
strategies, highlight areas where students may not be learning efficiently, and
transform how education is delivered. Increasingly, educational
establishments have been putting data to use for everything from
planning school bus routes to improving classroom cleanliness.
Banking and Financial Services
This industry also heavily relies on big data for risk analytics, including anti-
money laundering, demand enterprise risk management, “Know Your
Customer”, and fraud mitigation.
Transportation
Big data analytics finds huge application in the transportation industry.
Governments of different countries use big data to control traffic,
optimize route planning, build intelligent transport systems, and manage
congestion.
Big data is improving user experiences, and the massive adoption change
has just begun.
https://www.youtube.com/watch?v=_XfWkCsvbEU
Data Analysis is a process of inspecting, cleaning, transforming and modeling data with
the goal of discovering useful information, suggesting conclusions and supporting
decision-making. The major types of data analysis techniques include:
Data Mining
Business Intelligence
Statistical Analysis
Predictive Analytics
Text Analytics
Data Mining
Data Mining is the analysis of large quantities of data to extract previously unknown,
interesting patterns, unusual records, and dependencies in the data. Note that the goal is
the extraction of patterns and knowledge from large amounts of data and not the
extraction of the data itself.
Data mining analysis involves computer science methods at the intersection of
artificial intelligence, machine learning, statistics, and database systems.
The patterns obtained from data mining can be considered as a summary of the input
data that can be used in further analysis or to obtain more accurate prediction results
by a decision support system.
Business Intelligence
Business Intelligence techniques and tools are for acquisition and transformation of
large amounts of unstructured business data to help identify, develop and create new
strategic business opportunities.
The goal of business intelligence is to allow easy interpretation of large volumes of
data to identify new opportunities. It helps in implementing an effective strategy based
on insights that can provide businesses with a competitive market-advantage and long-
term stability.
Statistical Analysis
Statistics is the study of collection, analysis, interpretation, presentation, and
organization of data.
In data analysis, two main statistical methodologies are used −
Descriptive statistics − In descriptive statistics, data from the entire population
or a sample is summarized with numerical descriptors such as −
o Mean, Standard Deviation for Continuous Data
o Frequency, Percentage for Categorical Data
Inferential statistics − It uses patterns in the sample data to draw inferences
about the population represented, accounting for randomness. These
inferences can be −
o answering yes/no questions about the data (hypothesis testing)
o estimating numerical characteristics of the data (estimation)
o describing associations within the data (correlation)
o modeling relationships within the data (E.g. regression analysis)
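The short sketch below applies both methodologies to made-up measurements: descriptive summaries for each group, then a two-sample t-test as an example of answering a yes/no question about the data:

```python
# Sketch of both methodologies on invented data: descriptive summaries first,
# then an inferential test (a two-sample t-test for a yes/no question).
import numpy as np
from scipy import stats

group_a = np.array([52.1, 48.3, 50.7, 53.9, 49.5, 51.2])
group_b = np.array([55.4, 57.1, 54.8, 56.0, 58.2, 55.9])

print("Mean A:", group_a.mean(), "SD A:", group_a.std(ddof=1))   # descriptive
print("Mean B:", group_b.mean(), "SD B:", group_b.std(ddof=1))

t_stat, p_value = stats.ttest_ind(group_a, group_b)              # inferential
print("t =", t_stat, "p =", p_value)   # a small p suggests the means differ
```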
Predictive Analytics
Predictive Analytics use statistical models to analyze current and historical data for
forecasting (predictions) about future or otherwise unknown events. In business,
predictive analytics is used to identify risks and opportunities that aid in decision-
making.
Text Analytics
Text Analytics, also referred to as Text Mining or as Text Data Mining is the process of
deriving high-quality information from text. Text mining usually involves the process of
structuring the input text, deriving patterns within the structured data using means such
as statistical pattern learning, and finally evaluation and interpretation of the output.
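As a minimal, hypothetical example of structuring text and deriving a simple pattern from it, the sketch below counts word frequencies across a few invented documents:

```python
# Minimal sketch of structuring raw text and deriving a simple pattern
# (word frequencies) from it; the sample documents are made up.
import re
from collections import Counter

docs = [
    "Shipping was fast and the product quality is great",
    "Great quality but shipping was slow",
    "Poor quality, will not buy again",
]

tokens = []
for doc in docs:
    tokens.extend(re.findall(r"[a-z]+", doc.lower()))   # structure the input text

print(Counter(tokens).most_common(5))   # most frequent terms across documents
```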
Data Analysis Process
Data Analysis was defined by the statistician John Tukey in 1961 as "Procedures for
analyzing data, techniques for interpreting the results of such procedures, ways of
planning the gathering of data to make its analysis easier, more precise or more
accurate, and all the machinery and results of (mathematical) statistics which apply to
analyzing data.”
Thus, data analysis is a process for obtaining large, unstructured data from various
sources and converting it into information that is useful for −
Answering questions
Testing hypotheses
Making decisions
Disproving theories
Data Collection
Data Collection is the process of gathering information on targeted variables identified
as data requirements. The emphasis is on ensuring accurate and honest collection of
data. Data Collection ensures that data gathered is accurate such that the related
decisions are valid. Data Collection provides both a baseline to measure and a target
to improve.
Data is collected from various sources ranging from organizational databases to the
information in web pages. The data thus obtained, may not be structured and may
contain irrelevant information. Hence, the collected data is required to be subjected to
Data Processing and Data Cleaning.
Data Processing
The data that is collected must be processed or organized for analysis. This includes
structuring the data as required for the relevant Analysis Tools. For example, the data
might have to be placed into rows and columns in a table within a Spreadsheet or
Statistical Application. A Data Model might have to be created.
Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain
errors. Data Cleaning is the process of preventing and correcting these errors. There
are several types of Data Cleaning that depend on the type of data. For example, while
cleaning the financial data, certain totals might be compared against reliable published
numbers or defined thresholds. Likewise, quantitative data methods can be used for
outlier detection that would be subsequently excluded in analysis.
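A small, hypothetical sketch of these cleaning steps in pandas, removing duplicates and flagging outliers with a simple interquartile-range rule (the amounts are invented):

```python
# Hypothetical sketch of the cleaning steps described above: removing
# duplicates and flagging outliers with an interquartile-range rule.
import pandas as pd

df = pd.DataFrame({"amount": [120.0, 125.0, 125.0, 118.0, 5000.0, 130.0]})
df = df.drop_duplicates()                       # remove duplicate rows

q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df[~mask])      # rows flagged as outliers (here, the 5000.0 entry)
df = df[mask]         # keep only the in-range rows for analysis
```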
Data Analysis
Data that is processed, organized and cleaned would be ready for the analysis.
Various data analysis techniques are available to understand, interpret, and derive
conclusions based on the requirements. Data Visualization may also be used to
examine the data in graphical format, to obtain additional insight regarding the
messages within the data.
Statistical data models such as correlation and regression analysis can be used to
identify the relations among the data variables. These descriptive models of
the data help simplify the analysis and communicate the results.
The process might require additional Data Cleaning or additional Data Collection, and
hence these activities are iterative in nature.
Communication
The results of the data analysis are to be reported in a format as required by the users
to support their decisions and further action. The feedback from the users might result
in additional analysis.
The data analysts can choose data visualization techniques, such as tables and charts,
which help in communicating the message clearly and efficiently to the users. The
analysis tools provide facility to highlight the required information with color codes and
formatting in tables and charts.
Conditional Formatting
Excel provides conditional formatting commands that allow you to color cells or
fonts and display symbols next to values in cells, based on predefined criteria. This helps
in visualizing the prominent values. You will understand the various commands for
conditionally formatting the cells.
Quick Analysis
With Quick Analysis tool in Excel, you can quickly perform various data analysis tasks
and make quick visualizations of the results.
PivotTables
With PivotTables, you can summarize data and prepare reports dynamically by
changing the contents of the PivotTable.
Data Visualization
You will learn several Data Visualization techniques using Excel Charts. You will also
learn how to create Band Chart, Thermometer Chart, Gantt chart, Waterfall Chart,
Sparklines and PivotCharts.
Data Validation
It might be required that only valid values be entered into certain cells. Otherwise, they
may lead to incorrect calculations. With data validation commands, you can easily set
up data validation values for a cell, an input message prompting the user on what is
expected to be entered in the cell, validate the values entered with the defined criteria
and display an error message in case of incorrect entries.
Financial Analysis
Excel provides several financial functions. However, for commonly occurring
problems that require financial analysis, you can learn how to use a combination of
these functions.
Formula Auditing
When you use formulas, you might want to check whether the formulas are working as
expected. In Excel, Formula Auditing commands help you in tracing the precedent and
dependent values and error checking.
Inquire
Excel also provides an Inquire add-in that enables you to compare two workbooks to identify
changes, create interactive reports, and view the relationships among workbooks,
worksheets, and cells. You can also clean the excessive formatting in a worksheet that
makes Excel slow or makes the file size huge.
Gartner Top 10 Trends in Data and Analytics for 2020
Data and analytics leaders need to regularly evaluate their existing analytics
and business intelligence (BI) tools and innovative startups offering new
augmented and NLP-driven user experiences beyond the predefined
dashboard.
Data and analytics leaders should look for augmented data management
enabling active metadata to simplify and consolidate their architectures, and
also increase automation in their redundant data management tasks.
As data and analytics moves to the cloud, data and analytics leaders still
struggle to align the right services to the right use cases, which leads to
unnecessary increased governance and integration overhead.
The question for data and analytics is moving from how much a given service
costs to how it can meet the workload’s performance requirements beyond the
list price.
Data and analytics leaders need to prioritize workloads that can exploit cloud
capabilities and focus on cost optimization and other benefits such as change
and innovation acceleration when moving to cloud.
The collision of data and analytics will increase interaction and collaboration
between historically separate data and analytics roles. This impacts not only
the technologies and capabilities provided, but also the people and processes
that support and use them. The spectrum of roles will extend from traditional
data and analytics roles in IT to information explorer, consumer and citizen
developer as an example.
Outside of limited bitcoin and smart contract use cases, ledger database
management systems (DBMSs) will provide a more attractive option for
single-enterprise auditing of data sources. By 2021, Gartner estimates that
most permissioned blockchain uses will be replaced by ledger DBMS
products.
Data and analytics leaders should position blockchain technologies as supplementary
to their existing data management infrastructure by highlighting the
capabilities mismatch between data management infrastructure and
blockchain technologies.
It helps data and analytics leaders find unknown relationships in data and
review data not easily analyzed with traditional analytics.
https://www.gartner.com/smarterwithgartner/gartner-top-10-trends-in-data-and-analytics-for-2020