Unit 1ppt

The document outlines a comprehensive syllabus for a course on Data Analytics, covering topics such as data classification, the data analytics lifecycle, and modern analytic tools. It emphasizes the importance of data analytics in making informed decisions, understanding data types, and the evolution of analytic scalability. Additionally, it details the roles necessary for successful analytic projects and the applications of data analytics across various industries.

Uploaded by

Abhishek Kumar
Copyright
© © All Rights Reserved
UNIT-I- Syllabus

Introduction to Data Analytics: Sources and nature of data,
classification of data (structured, semi-structured, unstructured),
characteristics of data, introduction to Big Data platform, need of
data analytics, evolution of analytic scalability, analytic process
and tools, analysis vs reporting, modern data analytic tools,
applications of data analytics.
Data Analytics Lifecycle: Need, key roles for successful analytic
projects, various phases of data analytics lifecycle – discovery,
data preparation, model planning, model building, communicating
results, operationalization.
Data & Information
• Raw Facts: The word raw means that the facts have not yet been
processed to get their exact meaning. Raw facts can be sensed by any
part of our body.
• Data: Data refers to recorded raw facts, including numbers, letters,
symbols, images, and sounds, entered into a computer for processing.
Data is the raw, unprocessed material (facts, concepts, or
instructions) that processing transforms into useful and meaningful
results. Such facts have little meaning or significance until they are
sorted or put into a more useful form. When stored electronically in
files, data can be used directly as input for an information system.
• Information: Information is processed data: data that has been
organised so that it is meaningful and useful to the user. Information
is the product of data processing.
Knowledge & Science
Data Analytics
Data Analytics is a field that encompasses the collection, processing, analysis,
and interpretation of data to extract valuable insights and knowledge. The
goal of data analytics is to provide decision makers with the information
they need to make informed decisions. The following are some of the key
components of data analytics:

• Data Collection: The first step in data analytics is to gather data from
various sources such as databases, sensors, or customer interactions.
• Data Cleaning: The next step is to clean the data to remove any errors or
inconsistencies that may compromise the analysis.
• Data Exploration and Visualization: In this step, data analysts will explore
the data using descriptive statistics, graphs, and other visualization tools to
understand the underlying patterns and relationships in the data.
• Data Modelling: In this step, data analysts will use statistical models and
algorithms to extract insights from the data. Some common models used in
data analytics include linear regression, decision trees, and neural
networks.
• Data Interpretation: Finally, the results of the analysis are interpreted to
provide valuable insights and knowledge that can inform decision making.
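The cleaning and exploration steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `raw_orders` records, the `clean` helper, and the rule of dropping (rather than repairing) bad rows are all assumptions made for the example.

```python
from statistics import mean, median

# Hypothetical raw records: (customer_id, order_amount as text).
raw_orders = [
    ("c1", "120.50"),
    ("c2", "80"),
    ("c2", "80"),      # exact duplicate
    ("c3", ""),        # missing amount
    ("c4", "n/a"),     # invalid amount
    ("c5", "45.25"),
]

def clean(records):
    """Drop exact duplicates and rows whose amount is missing or non-numeric."""
    seen = set()
    cleaned = []
    for customer_id, amount_text in records:
        if (customer_id, amount_text) in seen:
            continue
        seen.add((customer_id, amount_text))
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # missing or invalid amount
        cleaned.append((customer_id, amount))
    return cleaned

# Cleaning, then a first look at the data with descriptive statistics.
orders = clean(raw_orders)
amounts = [amount for _, amount in orders]
summary = {"count": len(amounts), "mean": mean(amounts), "median": median(amounts)}
print(summary)
```

Whether bad rows should be dropped, corrected, or imputed depends on the analysis; dropping is simply the shortest option to show here.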
Types of Data
Classification of Data
• In terms of nature, data can be classified into structured, semi-
structured, and unstructured data.
• Structured Data: Structured data refers to data that is organized into a
fixed format, such as databases or spreadsheets. It is the easiest to
analyze and has well-defined fields, columns, and rows. Structured data
is often numerical in nature and can be easily processed by computers.
Examples of structured data include customer transactions, employee
records, and product inventory data.
• Semi-Structured Data: Semi-structured data contains elements of
structure, such as tags, keys, or headers, but also has unstructured
elements, such as free text or images. This type of data is not as easy
to analyze as structured data, but it still has some structure that can
be leveraged. Examples of semi-structured data include email (structured
headers with a free-text body), JSON and XML documents, and customer
reviews or product descriptions that combine fixed fields with free text.
• Unstructured Data: Unstructured data refers to data that does not have
a pre-defined structure and is often more difficult to analyze. This type
of data is usually in the form of text, audio, video, or images and does
not easily fit into a database or spreadsheet. Examples of unstructured
data include audio recordings, video files, and social media posts.
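The practical difference between the three classes shows up in how much work is needed before the data can be queried. The sketch below is illustrative only: the sample strings are invented, and CSV, JSON, and plain text are assumed stand-ins for structured, semi-structured, and unstructured data respectively.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured_text = "order_id,amount\n1,120.50\n2,80.00\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, but the fields vary per record.
semi_structured_text = '[{"id": 1, "tags": ["new"]}, {"id": 2, "note": "gift"}]'
records = json.loads(semi_structured_text)
field_names = sorted({key for record in records for key in record})

# Unstructured: free text with no fields; even counting words is ad hoc parsing.
unstructured_text = "Great product, fast delivery. Would buy again!"
word_count = len(unstructured_text.split())

print(total, field_names, word_count)
```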
Sources of Data
Most collected data falls into two types. "Qualitative data" is
non-numerical data, such as words and sentences, that mostly
describes the behavior and actions of a group. "Quantitative
data" is numerical data that can be measured and computed on
using scientific tools and sampling.
By source, data is further divided into:
• Primary data
• Secondary data
• Other sources
Primary Data
Data that is raw, original, and extracted directly
from the official sources is known as primary
data. This type of data is collected directly
through techniques such as questionnaires,
interviews, and surveys. The data collected must
match the demands and requirements of the target
audience on which the analysis is performed;
otherwise it becomes a burden during data
processing.
A few methods of collecting primary data:
• Interview method
• Survey method
• Observation method
• Experimental method
Secondary data
Secondary data is data that has already been
collected and is reused for some other valid
purpose. This type of data is derived from
previously recorded primary data and has two
types of sources: internal and external.
Internal sources: This data can easily be found
within the organization, such as market records,
sales records, transactions, customer data,
accounting resources, etc.
External sources: Data that cannot be found
inside the organization and is obtained through
external third-party resources is external source
data.
Other sources
• Sensor data: With the advancement of IoT devices, the
sensors of these devices collect data that can be used
for sensor data analytics to track the performance and
usage of products.
• Satellite data: Satellites collect terabytes of images
and data daily through surveillance cameras, which can
be used to extract useful information.
• Web traffic: With fast and cheap internet access, data
in many formats is uploaded by users on different
platforms and can be collected, with their permission,
for data analysis. Search engines also provide data on
the keywords and queries that are searched most.
Characteristics of data
• Accuracy
• Accessibility
• Completeness
• Consistency
• Validity (Integrity)
• Uniqueness
• Currency
• Reliability
• Relevancy
• Timeliness
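Several of these characteristics can be measured directly. The sketch below scores completeness, uniqueness, validity, and currency as simple ratios. The `records` list, the function names, the deliberately simplistic email rule, and the 90-day freshness window are all assumptions for illustration, not standard definitions.

```python
from datetime import date

# Hypothetical customer records.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"id": 2, "email": "b@example", "updated": date(2023, 6, 1)},
]

def completeness(rows, field):
    """Share of rows in which the field is present."""
    return sum(row[field] is not None for row in rows) / len(rows)

def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [row[field] for row in rows]
    return len(set(values)) / len(values)

def validity(rows, field, is_valid):
    """Share of present values that satisfy a validation rule."""
    present = [row[field] for row in rows if row[field] is not None]
    return sum(is_valid(value) for value in present) / len(present)

def currency(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the reference date."""
    return sum((as_of - row[field]).days <= max_age_days for row in rows) / len(rows)

def looks_like_email(value):
    # Simplistic rule for the example: non-empty local part, dot in the domain.
    local, _, domain = value.partition("@")
    return bool(local) and "." in domain

print(completeness(records, "email"))                    # 2 of 3 rows have an email
print(uniqueness(records, "id"))                         # id 2 appears twice
print(validity(records, "email", looks_like_email))      # 1 of 2 emails passes
print(currency(records, "updated", date(2024, 2, 1), 90))
```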
Introduction to Big Data
• Big Data is a collection of data that is huge in volume and
still growing exponentially with time. Its size and complexity
are so large that traditional data management tools cannot
store or process it efficiently. Big data is still data, but at
a very large scale.
• Examples of Big Data
• Stock Exchange: The New York Stock Exchange generates
about one terabyte of new trade data per day.
• Social Media: Statistics show that 500+ terabytes of new
data are ingested into the databases of the social media site
Facebook every day. This data is mainly generated by photo
and video uploads, message exchanges, comments, etc.
• Jet Engine: A single jet engine can generate 10+ terabytes of
data in 30 minutes of flight time. With many thousands of flights
per day, the data generated reaches many petabytes.
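The jump from terabytes to petabytes in the jet engine example is simple arithmetic. In the sketch below only the 10 TB per 30 minutes figure comes from the slide; the flight length, engines per aircraft, and number of flights are assumed values chosen purely to show the calculation.

```python
TB_PER_ENGINE_PER_30_MIN = 10   # figure quoted above
FLIGHT_HOURS = 2                # assumption for illustration
ENGINES_PER_AIRCRAFT = 2        # assumption for illustration
FLIGHTS_PER_DAY = 5_000         # assumption for illustration
TB_PER_PB = 1_000               # decimal units: 1 PB = 1,000 TB

half_hours_per_flight = FLIGHT_HOURS * 60 // 30
tb_per_flight = TB_PER_ENGINE_PER_30_MIN * half_hours_per_flight * ENGINES_PER_AIRCRAFT
pb_per_day = tb_per_flight * FLIGHTS_PER_DAY / TB_PER_PB

print(f"{tb_per_flight} TB per flight, {pb_per_day:.0f} PB per day")
```

Under these assumed values a single flight produces 80 TB and the fleet produces 400 PB per day; different assumptions change the total but not the order-of-magnitude point.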
Need of Data Analytics
Data analytics is used to identify patterns, trends, and insights in
large and complex data sets, and then to use this information
to make informed decisions. The need for data analytics arises
from the following factors:
• Large and complex data sets: With the growth of data-driven
businesses and the increasing amount of data generated by
organizations, the need for data analytics has grown.
• Business insights: By analyzing large and complex data sets,
organizations can gain valuable insights into customer
behavior, market trends, and other important aspects of their
business.
• Competitive advantage: Organizations that are able to
effectively analyze data are better able to compete in today's
fast-paced business environment, as they can quickly identify
new opportunities and respond to changing market
conditions.
Evolution of Analytic Scalability
The evolution of analytic scalability refers to the development of
technologies and methods that allow organizations to analyze
increasing amounts of data more efficiently and effectively. This
evolution has been driven by a number of factors, including:
• Big Data: The growth of data-driven businesses has led to the
development of new technologies and methods for managing, storing,
and analyzing big data.
• Cloud computing: Cloud computing has made it possible for
organizations to store and analyze large amounts of data at a lower
cost and with greater flexibility.
• Artificial intelligence: Artificial intelligence (AI) and machine learning
algorithms have made it possible to analyze data at scale and automate
many of the tasks involved in data analysis.
• Parallel processing: Parallel processing technology allows organizations
to distribute data processing across multiple computers, making it
possible to analyze larger data sets in a shorter amount of time.
• Real-time analytics: Real-time analytics technologies have made it
possible to process and analyze data in near real-time, which is
essential for organizations that need to make data-driven decisions
quickly.
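The parallel processing idea above follows a split, process, combine pattern: partition the data, compute a partial result per partition, then merge the partial results. The sketch below shows that pattern on one machine. The `readings` data and helper names are hypothetical, and a thread pool is used only to keep the example portable; real workloads would typically use separate processes or a cluster framework such as Hadoop or Spark.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def partial_stats(chunk):
    """Work done independently on each chunk: a partial sum and a count."""
    return sum(chunk), len(chunk)

readings = list(range(1, 1001))  # hypothetical data set

# Process the four chunks concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_stats, chunked(readings, 250)))

# Combine the partial results into the final answer.
total = sum(part_sum for part_sum, _ in partials)
count = sum(part_count for _, part_count in partials)
parallel_mean = total / count
print(parallel_mean)
```

The mean is computed from partial sums and counts rather than by averaging per-chunk means, which would give a wrong answer when chunks differ in size.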
Analytic Process and Tools
• The analytics process typically consists of the following steps:

• Data Collection: Gathering relevant data from various sources,
such as databases, spreadsheets, and APIs.
• Data Preparation: Cleaning, transforming and organizing the
data to make it usable for analysis.
• Data Exploration: Examining the data to understand its
structure, patterns, and relationships.
• Modeling: Building mathematical models to identify
relationships and make predictions.
• Evaluation: Testing and validating the models to determine
their accuracy and effectiveness.
• Deployment: Implementing the models and integrating them
into business processes and decision-making.
• Monitoring: Continuously monitoring the models to ensure
their continued relevance and accuracy.
Data Analysis
• Goal: The goal of data analysis is to gain a deeper
understanding of the data and make informed
decisions based on the insights generated. It involves
evaluating data to identify patterns, relationships, and
trends, as well as predicting future trends and
outcomes.
• Techniques: Data analysis techniques include statistical
analysis, machine learning, data visualization, and
predictive modeling. These techniques help to uncover
meaningful insights and patterns in the data that can
be used to inform decision-making.
• Output: The output of data analysis is often a set of
findings, insights, and predictions, which are then used
to make decisions or inform further analysis.
Data Reporting
• Goal: The goal of data reporting is to communicate the
insights and results of data analysis to stakeholders in a
clear and concise format. Data reporting is used to provide
a summary of the findings and insights generated through
data analysis, and to present the information in a way that
is easy to understand and use.
• Techniques: Data reporting techniques include creating
charts, tables, dashboards, and other visual representations
of the data. These techniques are used to present the data
in a clear and concise format, making it easier for
stakeholders to understand and use the information to
inform decision-making.
• Output: The output of data reporting is a visual
representation of the data and insights generated through
data analysis. This information is used to communicate the
results of the analysis to stakeholders and support data-
driven decision-making.
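The distinction between analysis and reporting can be made concrete by keeping them in separate functions: one derives new facts from the data, the other only presents them. The sketch below assumes a hypothetical `monthly_sales` dictionary and invented function names.

```python
# Hypothetical monthly sales figures.
monthly_sales = {"Jan": 100.0, "Feb": 110.0, "Mar": 99.0}

def analyze(sales):
    """Analysis: derive month-over-month growth (in percent) from the raw figures."""
    months = list(sales)
    growth = {}
    for previous, current in zip(months, months[1:]):
        growth[current] = (sales[current] - sales[previous]) / sales[previous] * 100
    return growth

def report(sales, growth):
    """Reporting: present the figures and findings as a fixed-width text table."""
    lines = [f"{'Month':<6}{'Sales':>8}{'Growth':>9}"]
    for month, amount in sales.items():
        change = f"{growth[month]:+.1f}%" if month in growth else "n/a"
        lines.append(f"{month:<6}{amount:>8.1f}{change:>9}")
    return "\n".join(lines)

growth = analyze(monthly_sales)
print(report(monthly_sales, growth))
```

The report adds no information of its own; replacing the text table with a chart or dashboard would change the presentation without touching `analyze`.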
Modern Data Analytic Tools
• Hadoop and Apache Spark: Distributed systems for big data processing
and storage.
• Tableau: A data visualization tool that allows users to create interactive
dashboards and reports.
• Power BI: A business intelligence tool that allows users to create and
share data visualizations and reports.
• Google Analytics: A web analytics service that provides insights into
website traffic and user behavior.
• Alteryx: A data analysis and visualization tool that helps users prepare,
blend, and analyze data from various sources.
• SAP Lumira: A data visualization tool for exploring and visualizing large
data sets.
• QlikView: A data discovery and analytics platform that allows users to
create and explore data visualizations.
• IBM Cognos Analytics: A business intelligence and analytics platform
that provides a full range of data analytics capabilities.
• TIBCO Spotfire: A data visualization and discovery tool that helps users
explore and analyze large data sets.
Applications of data analytics
• Marketing: Analyzing customer data to understand buying patterns and
preferences, target advertising and promotions, and measure campaign
effectiveness.
• Finance: Analyzing financial data to detect fraud, manage risk, and make
informed investment decisions.
• Healthcare: Analyzing medical data to improve patient outcomes, reduce
healthcare costs, and support clinical decision-making.
• Retail: Analyzing sales data to optimize inventory management, improve supply
chain efficiency, and personalize customer experiences.
• Manufacturing: Analyzing production data to improve quality control, increase
efficiency, and reduce waste.
• Telecommunications: Analyzing network data to optimize network
performance, detect network issues, and improve customer satisfaction.
• Energy: Analyzing energy usage data to optimize energy usage, reduce costs,
and reduce environmental impact.
• Sports: Analyzing player performance data to optimize training, support tactical
decision-making, and improve player performance.
• Transportation: Analyzing transportation data to optimize routes, reduce fuel
consumption, and improve safety.
Need of Data Analytics
• Improved decision making: Data analytics helps organizations make better decisions by
providing them with a clearer picture of what is happening in their business. This
includes identifying risks, detecting anomalies, and tracking performance metrics.
• Customer engagement: Data analytics can be used to gain a deeper understanding of
customer behavior, preferences, and opinions. This information can be used to improve
customer engagement and customer satisfaction.
• Cost savings: Data analytics can help organizations identify areas where they can
improve efficiency, reduce waste, and lower costs.
• Compliance: Data analytics can help organizations comply with regulations by tracking
and analyzing data related to key performance metrics, such as safety and security.
Key Roles for Successful Analytic Projects

• Project Manager: Responsible for managing the project
timeline, resources, and budget.
• Data Engineer: Responsible for data preparation, storage, and
management.
• Data Analyst: Conducts data exploration and analysis, creates
visualizations, and generates insights.
• Business SME (Subject Matter Expert): Provides domain
expertise and helps to ensure that the insights generated are
relevant to the business.
• Data Scientist: Develops complex algorithms and models to
analyze data and extract insights.
• Decision Maker: Uses the insights generated from the data
analytics project to make informed decisions.
• IT/Technical Support: Ensures that the technology
infrastructure and systems are in place to support the data
analytics project.
Data Analytics Lifecycle
• Phase 1: Discovery –
• The data science team learns about and investigates the problem.
• The team develops context and understanding.
• The team identifies the data sources needed and available for
the project.
• The team formulates initial hypotheses that can later be
tested with data.
• Phase 2: Data Preparation –
• Steps to explore, preprocess, and condition data prior to
modeling and analysis.
• It requires an analytic sandbox; the team performs extract,
load, and transform (ELT) to get data into the sandbox.
• Data preparation tasks are likely to be performed multiple
times and not in a predefined order.
• Tools commonly used in this phase include Hadoop,
Alpine Miner, and OpenRefine.
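Getting data into an analytic sandbox can be illustrated with an in-memory SQLite database standing in for the sandbox. This is only a sketch under stated assumptions: the `source_csv` export, the table names `raw_sales` and `sales`, and the specific conditioning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: hypothetical source export; in practice a file or a database dump.
source_csv = "id,amount,region\n1, 120.50 ,north\n2,,south\n3,80,NORTH\n"
rows = list(csv.DictReader(io.StringIO(source_csv)))

# Load: copy the raw rows unchanged into the sandbox.
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE raw_sales (id TEXT, amount TEXT, region TEXT)")
sandbox.executemany("INSERT INTO raw_sales VALUES (:id, :amount, :region)", rows)

# Transform: condition the data inside the sandbox, leaving the raw table intact.
sandbox.execute(
    """
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER) AS id,
           CAST(TRIM(amount) AS REAL) AS amount,
           LOWER(region) AS region
    FROM raw_sales
    WHERE TRIM(amount) <> ''
    """
)

totals = sandbox.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
raw_count = sandbox.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(totals, raw_count)
```

Because the raw table is kept, the transform step can be rerun with different rules, which matches the point that preparation tasks are repeated and not performed in a fixed order.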
• Phase 3: Model Planning –
• The team explores the data to learn about relationships between
variables and subsequently selects key variables and the most
suitable models.
• Tools commonly used in this phase include MATLAB and STATISTICA.
• Phase 4: Model Building –
• The team develops data sets for training, testing, and production
purposes.
• The team builds and executes models based on the work done in the
model planning phase.
• The team also considers whether its existing tools will suffice for
running the models or whether it needs a more robust environment
for executing them.
• Free or open-source tools – R and PL/R, Octave, WEKA.
• Commercial tools – MATLAB, STATISTICA.
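Separating a data set into training and testing portions is usually done with a reproducible random split, so the model is evaluated on rows it was not fitted to. The sketch below is one simple way to do this; the function name `split_dataset`, the 25% test fraction, and the fixed seed are assumptions for illustration.

```python
import random

def split_dataset(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

rows = list(range(20))  # hypothetical row identifiers
train_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(test_rows))
```

Production data is not part of this split: it is the new data the model scores after deployment.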
• Phase 5: Communicate Results –
• After executing the model, the team compares the outcomes of modeling
to the criteria established for success and failure.
• The team considers how best to articulate findings and outcomes to the
various team members and stakeholders, taking into account caveats and
assumptions.
• The team should identify key findings, quantify the business value, and
develop a narrative to summarize and convey the findings to stakeholders.
• Phase 6: Operationalize –
• The team communicates the benefits of the project more broadly and sets
up a pilot project to deploy the work in a controlled way before
broadening it to the full enterprise of users.
• This approach enables the team to learn about the performance and
related constraints of the model in a production environment on a small
scale and make adjustments before full deployment.
• The team delivers final reports, briefings, and code.
• Free or open-source tools – Octave, WEKA, SQL, MADlib.