Unit 1ppt
• Data Collection: The first step in data analytics is to gather data from
various sources such as databases, sensors, or customer interactions.
• Data Cleaning: The next step is to clean the data to remove any errors or
inconsistencies that may compromise the analysis.
• Data Exploration and Visualization: In this step, data analysts will explore
the data using descriptive statistics, graphs, and other visualization tools to
understand the underlying patterns and relationships in the data.
• Data Modelling: In this step, data analysts will use statistical models and
algorithms to extract insights from the data. Some common models used in
data analytics include linear regression, decision trees, and neural
networks.
• Data Interpretation: Finally, the results of the analysis are interpreted to
provide valuable insights and knowledge that can inform decision making.
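The five steps above can be sketched end to end on a toy data set. This is
only a minimal illustration: the records (advertising spend, sales) are
made up, and the function names clean, explore, and fit_line are
hypothetical helpers, not part of any library.

```python
from statistics import mean

# Data collection: hypothetical (advertising spend, sales) records.
# None marks a missing value; the last row is an exact duplicate.
raw_records = [(1.0, 2.1), (2.0, 3.9), (3.0, None),
               (4.0, 8.2), (5.0, 9.8), (2.0, 3.9)]

def clean(records):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for row in records:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def explore(records):
    """Data exploration: simple descriptive statistics per column."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    return {"rows": len(records), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit_line(records):
    """Data modelling: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in records)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

cleaned = clean(raw_records)
summary = explore(cleaned)
slope, intercept = fit_line(cleaned)

# Data interpretation: turn the model into a statement a decision maker can use.
print(f"Each extra unit of spend is associated with about {slope:.2f} more units of sales.")
```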
Types of Data
Classification of Data
• In terms of nature, data can be classified into structured, semi-
structured, and unstructured data.
• Structured Data: Structured data refers to data that is organized into a
fixed format, such as databases or spreadsheets. It is the easiest to
analyze and has well-defined fields, columns, and rows. Structured data
is often numerical in nature and can be easily processed by computers.
Examples of structured data include customer transactions, employee
records, and product inventory data.
• Semi-Structured Data: Semi-structured data does not follow a fixed
tabular format but still carries some organizing elements, such as tags
or key-value fields, alongside unstructured content such as text or
images. It is not as easy to analyze as structured data, but its partial
structure can be leveraged. Examples of semi-structured data include
email (structured headers with a free-text body), JSON or XML documents,
and customer reviews or product descriptions stored with metadata fields.
• Unstructured Data: Unstructured data refers to data that does not have
a pre-defined structure and is often more difficult to analyze. This type
of data is usually in the form of text, audio, video, or images and does
not easily fit into a database or spreadsheet. Examples of unstructured
data include audio recordings, video files, and social media posts.
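As a rough illustration of the three classes, the sketch below handles one
small sample of each using only the Python standard library. The sample
values (customer IDs, the email record, the social media post) are invented
for the example.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
table = io.StringIO("customer_id,amount\n101,25.50\n102,13.00\n")
rows = list(csv.DictReader(table))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing keys, but fields can vary between records
# and the body is free text.
email = json.loads(
    '{"from": "a@example.com", "subject": "Order", "body": "Where is my parcel?"}'
)
sender = email["from"]
priority = email.get("priority", "normal")  # optional field, absent here

# Unstructured: free text with no predefined fields; any structure has to be
# imposed by the analysis itself (here, a naive word split).
post = "Loving the new phone, battery life is great!"
words = post.lower().replace(",", "").replace("!", "").split()
mentions_battery = "battery" in words

print(total, sender, priority, mentions_battery)
```

The structured sample can be summed directly, the semi-structured one needs
defaults for missing keys, and the unstructured one needs preprocessing
before any question can be asked of it.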
Sources of Data
Most collected data falls into two types. “Qualitative data” is
non-numerical data, such as words and sentences, and mostly
describes the behavior and actions of a group. “Quantitative
data” is numerical and can be measured and analyzed using
statistical tools and sampling.
By source, data is further divided into two main types, plus a group
of other sources:
•Primary data
•Secondary data
•Other Sources
Primary Data
Primary data is raw, original data extracted
directly from official sources. It is collected
first-hand through techniques such as
questionnaires, interviews, and surveys. The data
collected must match the demands and
requirements of the target audience on which the
analysis is performed; otherwise it becomes a
burden during data processing. A few methods of
collecting primary data:
• Interview method
• Survey method
• Observation method
• Experimental method
Secondary data
Secondary data is data that has already been
collected and is reused for another valid
purpose. It is derived from previously recorded
primary data and comes from two kinds of sources:
internal and external.
Internal Sources: This data is readily available
within the organization, such as market records,
sales records, transactions, customer data, and
accounting resources.
External Sources: This is data that is not available
within the organization and must be obtained from
external third-party resources.
Other sources
• Sensor data: With the advancement of IoT devices, the
sensors in these devices collect data that can be used
for sensor data analytics to track the performance and
usage of products.
• Satellite data: Satellites capture terabytes of images and
other data daily through surveillance cameras, from
which useful information can be extracted.
• Web traffic: With fast and cheap internet access, data in
many formats that users upload to different platforms
can be collected, with their permission, for data
analysis. Search engines also provide data on the
keywords and queries that are searched most often.
Characteristics of data
• Accuracy
• Accessibility
• Completeness
• Consistency
• Validity (Integrity)
• Uniqueness
• Currency
• Reliability
• Relevancy
• Timeliness
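Several of these characteristics can be measured directly. The sketch
below scores completeness, uniqueness, validity, and currency as simple
ratios over a made-up record set. The function names, the age rule, and
the one-year freshness window are assumptions chosen for illustration;
accuracy, reliability, and relevancy usually need an external reference
or domain judgment and are not covered.

```python
from datetime import date

# Hypothetical customer records with deliberate quality problems.
records = [
    {"id": 1, "email": "ana@example.com", "age": 34, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "age": 29, "updated": date(2024, 4, 20)},
    {"id": 2, "email": "bo@example.com", "age": -5, "updated": date(2021, 1, 15)},
]

def completeness(rows, field):
    """Share of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def validity(rows, field, rule):
    """Share of rows whose field value satisfies the given rule."""
    return sum(bool(rule(r[field])) for r in rows) / len(rows)

def currency(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    return sum((as_of - r[field]).days <= max_age_days for r in rows) / len(rows)

report = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(age)": validity(records, "age", lambda a: 0 <= a <= 120),
    "currency(updated)": currency(records, "updated", date(2024, 6, 1), 365),
}
for name, score in report.items():
    print(f"{name}: {score:.2f}")
```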
Introduction to Big Data
• Big Data is a collection of data that is huge in volume and
keeps growing exponentially with time. Its size and
complexity are so large that traditional data management
tools cannot store or process it efficiently. In short, big
data is still data, only at a far larger scale.
• Examples of Big Data
• Stock Exchange: The New York Stock Exchange generates
about one terabyte of new trade data per day.
• Social Media: Reported statistics indicate that 500+ terabytes
of new data are ingested into the databases of the social media
site Facebook every day. This data is mainly generated by
photo and video uploads, message exchanges, and
comments.
• Jet Engine: A single jet engine can generate 10+ terabytes of
data in 30 minutes of flight time. With many thousands of flights
per day, data generation reaches many petabytes.
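The jet engine figure can be checked with simple arithmetic. In the sketch
below, only the 10 terabytes per 30 minutes comes from the example above.
The number of flights per day, the flight duration, and counting a single
engine per flight are assumptions picked to make the calculation concrete.

```python
TB_PER_HALF_HOUR = 10      # from the example: 10+ TB per 30 minutes, per engine
FLIGHTS_PER_DAY = 5_000    # assumption: stands in for "many thousands of flights"
HOURS_PER_FLIGHT = 2       # assumption: average flight duration
TB_PER_PB = 1_000          # decimal units: 1 petabyte = 1,000 terabytes

half_hours_per_flight = HOURS_PER_FLIGHT * 60 // 30
tb_per_flight = TB_PER_HALF_HOUR * half_hours_per_flight
tb_per_day = tb_per_flight * FLIGHTS_PER_DAY
pb_per_day = tb_per_day / TB_PER_PB

print(f"{tb_per_flight} TB per flight, {pb_per_day:.0f} PB per day")
```

Under these assumptions one engine per flight already yields hundreds of
petabytes per day, which is consistent with the "many petabytes" claim.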
Need of Data Analytics
Data analytics is used to identify patterns, trends, and insights in
large and complex data sets, and then to use this information
to make informed decisions. The need for data analytics arises
from the following factors:
• Large and complex data sets: With the growth of data-driven
businesses and the increasing amount of data generated by
organizations, the need for data analytics has grown.
• Business insights: By analyzing large and complex data sets,
organizations can gain valuable insights into customer
behavior, market trends, and other important aspects of their
business.
• Competitive advantage: Organizations that are able to
effectively analyze data are better able to compete in today's
fast-paced business environment, as they can quickly identify
new opportunities and respond to changing market
conditions.
Evolution of Analytic Scalability
The evolution of analytic scalability refers to the development of
technologies and methods that allow organizations to analyze
increasing amounts of data more efficiently and effectively. This
evolution has been driven by a number of factors, including:
• Big Data: The growth of data-driven businesses has led to the
development of new technologies and methods for managing, storing,
and analyzing big data.
• Cloud computing: Cloud computing has made it possible for
organizations to store and analyze large amounts of data at a lower
cost and with greater flexibility.
• Artificial intelligence: Artificial intelligence (AI) and machine learning
algorithms have made it possible to analyze data at scale and automate
many of the tasks involved in data analysis.
• Parallel processing: Parallel processing technology allows organizations
to distribute data processing across multiple computers, making it
possible to analyze larger data sets in a shorter amount of time.
• Real-time analytics: Real-time analytics technologies have made it
possible to process and analyze data in near real-time, which is
essential for organizations that need to make data-driven decisions
quickly.
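The parallel processing idea can be sketched as split, process, and
combine: the data is partitioned into chunks, each chunk is summarized
independently, and the partial results are merged. In this sketch the
workers are threads from Python's standard concurrent.futures module,
standing in for the separate computers a real cluster would use, so it
shows the pattern rather than an actual speedup. The function names are
hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Partition data into at most n_chunks contiguous chunks."""
    if not data:
        raise ValueError("data must not be empty")
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_stats(chunk):
    """Work done independently on each chunk: a partial sum and count."""
    return sum(chunk), len(chunk)

def parallel_mean(data, n_workers=4):
    """Compute the mean by combining per-chunk partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

readings = list(range(1, 1001))  # hypothetical sensor readings
print(parallel_mean(readings))
```

The mean works here because it can be rebuilt from partial sums and counts.
Statistics such as the median cannot be combined this simply, which is one
reason distributed frameworks provide their own aggregation operators.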
Analytic Process and Tools
• The analytics process typically consists of the following steps:
• Data Collection: Gathering relevant data from various sources,
such as databases, spreadsheets, and APIs.
• Data Preparation: Cleaning, transforming and organizing the
data to make it usable for analysis.
• Data Exploration: Examining the data to understand its
structure, patterns, and relationships.
• Modeling: Building mathematical models to identify
relationships and make predictions.
• Evaluation: Testing and validating the models to determine
their accuracy and effectiveness.
• Deployment: Implementing the models and integrating them
into business processes and decision-making.
• Monitoring: Continuously monitoring the models to ensure
their continued relevance and accuracy.
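The modeling, evaluation, and monitoring steps can be illustrated with a
deliberately tiny model. The sketch below fits a proportional model
y = k * x on training data, evaluates it on held-out data with mean
absolute error (MAE), and then flags the model for retraining when the
error on newer data exceeds a threshold. All data values, the 0.5
threshold, and the function names are assumptions for the example.

```python
def fit_proportional(pairs):
    """Modeling: least-squares estimate of k in y = k * x."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

def mean_absolute_error(k, pairs):
    """Evaluation: average absolute gap between prediction and actual value."""
    return sum(abs(k * x - y) for x, y in pairs) / len(pairs)

def needs_retraining(k, recent_pairs, threshold):
    """Monitoring: flag the model when error on recent data exceeds the threshold."""
    return mean_absolute_error(k, recent_pairs) > threshold

# Hypothetical data: fit on the training set, validate on the held-out set.
train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)]
holdout = [(5, 10.2), (6, 11.8)]

k = fit_proportional(train)
holdout_mae = mean_absolute_error(k, holdout)

# Later observations in which the relationship has shifted.
recent = [(7, 17.0), (8, 19.5)]
retrain = needs_retraining(k, recent, threshold=0.5)

print(f"k={k:.3f}, holdout MAE={holdout_mae:.3f}, retrain={retrain}")
```

Evaluating on data the model has not seen is what makes the accuracy
estimate meaningful, and repeating the same check on fresh data after
deployment is the simplest form of monitoring.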
Data Analysis
• Goal: The goal of data analysis is to gain a deeper
understanding of the data and make informed
decisions based on the insights generated. It involves
evaluating data to identify patterns, relationships, and
trends, as well as predicting future trends and
outcomes.
• Techniques: Data analysis techniques include statistical
analysis, machine learning, data visualization, and
predictive modeling. These techniques help to uncover
meaningful insights and patterns in the data that can
be used to inform decision-making.
• Output: The output of data analysis is often a set of
findings, insights, and predictions, which are then used
to make decisions or inform further analysis.
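Two of the simplest techniques for finding relationships and trends are
the Pearson correlation coefficient and a moving average. The sketch
below implements both in plain Python. The advertising spend and sales
figures are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship, from -1 to 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def moving_average(values, window):
    """Smooth a series by averaging each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

ad_spend = [10, 20, 30, 40, 50]   # hypothetical monthly figures
sales = [12, 24, 33, 46, 55]

r = pearson(ad_spend, sales)
trend = moving_average(sales, window=3)

print(f"correlation: {r:.3f}")
print("smoothed sales:", [round(v, 1) for v in trend])
```

A correlation near 1 suggests that spend and sales move together, and the
rising smoothed series indicates an upward trend. Neither result shows
that one variable causes the other.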
Data Reporting
• Goal: The goal of data reporting is to communicate the
insights and results of data analysis to stakeholders in a
clear and concise format. Data reporting is used to provide
a summary of the findings and insights generated through
data analysis, and to present the information in a way that
is easy to understand and use.
• Techniques: Data reporting techniques include creating
charts, tables, dashboards, and other visual representations
of the data. These techniques are used to present the data
in a clear and concise format, making it easier for
stakeholders to understand and use the information to
inform decision-making.
• Output: The output of data reporting is a visual
representation of the data and insights generated through
data analysis. This information is used to communicate the
results of the analysis to stakeholders and support data-
driven decision-making.
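Where analysis produces findings, reporting formats them for readers. As
a minimal sketch, the function below renders a list of findings as an
aligned plain-text table. The regional sales figures and the function
name render_report are hypothetical; a dashboard tool would replace this
in practice.

```python
def render_report(title, rows):
    """Format (label, value) pairs as an aligned plain-text table."""
    width = max(len(label) for label, _ in rows)
    lines = [title, "-" * len(title)]
    for label, value in rows:
        # Left-align labels, right-align values with thousands separators.
        lines.append(f"{label.ljust(width)}  {value:>10,.2f}")
    return "\n".join(lines)

findings = [("North", 125000.5), ("South", 98000.0), ("West", 143250.75)]
report = render_report("Quarterly sales by region", findings)
print(report)
```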
Modern Data Analytic Tools
• Hadoop and Apache Spark: Distributed systems for big data processing
and storage.
• Tableau: A data visualization tool that allows users to create interactive
dashboards and reports.
• Power BI: A business intelligence tool that allows users to create and
share data visualizations and reports.
• Google Analytics: A web analytics service that provides insights into
website traffic and user behavior.
• Alteryx: A data analysis and visualization tool that helps users prepare,
blend, and analyze data from various sources.
• SAP Lumira: A data visualization tool for exploring and visualizing large
data sets.
• QlikView: A data discovery and analytics platform that allows users to
create and explore data visualizations.
• IBM Cognos Analytics: A business intelligence and analytics platform
that provides a full range of data analytics capabilities.
• TIBCO Spotfire: A data visualization and discovery tool that helps users
explore and analyze large data sets.
Applications of data analytics
• Marketing: Analyzing customer data to understand buying patterns and
preferences, target advertising and promotions, and measure campaign
effectiveness.
• Finance: Analyzing financial data to detect fraud, manage risk, and make
informed investment decisions.
• Healthcare: Analyzing medical data to improve patient outcomes, reduce
healthcare costs, and support clinical decision-making.
• Retail: Analyzing sales data to optimize inventory management, improve supply
chain efficiency, and personalize customer experiences.
• Manufacturing: Analyzing production data to improve quality control, increase
efficiency, and reduce waste.
• Telecommunications: Analyzing network data to optimize network
performance, detect network issues, and improve customer satisfaction.
• Energy: Analyzing energy usage data to optimize consumption, reduce costs,
and reduce environmental impact.
• Sports: Analyzing player performance data to optimize training, support tactical
decision-making, and improve player performance.
• Transportation: Analyzing transportation data to optimize routes, reduce fuel
consumption, and improve safety.
Need of Data Analytics
• In addition to the factors already covered (large and complex data sets, business
insights, and competitive advantage), the need for data analytics arises from:
• Improved decision making: Data analytics helps organizations make better decisions by
providing them with a clearer picture of what is happening in their business. This
includes identifying risks, detecting anomalies, and tracking performance metrics.
• Customer engagement: Data analytics can be used to gain a deeper understanding of
customer behavior, preferences, and opinions. This information can be used to improve
customer engagement and customer satisfaction.
• Cost savings: Data analytics can help organizations identify areas where they can
improve efficiency, reduce waste, and lower costs.
• Compliance: Data analytics can help organizations comply with regulations by tracking
and analyzing data related to key performance metrics, such as safety and security.
Key Roles for Successful Analytic Projects