
ITECH WORLD AKTU


Subject Name: Data Analytics (DA)
Subject Code: BCS052

Unit 1:

Syllabus:

1. Introduction to Data Analytics:

• Sources and nature of data


• Classification of data (structured, semi-structured, unstructured)
• Characteristics of data
• Introduction to Big Data platform
• Need of data analytics
• Evolution of analytic scalability
• Analytic process and tools
• Analysis vs reporting
• Modern data analytic tools
• Applications of data analytics

2. Data Analytics Lifecycle:

• Need for data analytics lifecycle


• Key roles for successful analytic projects
• Various phases of data analytics lifecycle:
(a) Discovery
(b) Data preparation
(c) Model planning
(d) Model building
(e) Communicating results
(f) Operationalization

0.1 Introduction to Data Analytics


0.1.1 Definition of Data Analytics
Data Analytics is the process of examining raw data to uncover trends, patterns, and
insights that can assist in informed decision-making. It involves the use of statistical
and computational techniques to process and interpret data.


Key Points:

• Objective: Transform data into actionable insights.

• Methods: Involves data cleaning, processing, and analysis.

• Outcome: Generates insights for strategic decisions in various domains like busi-
ness, healthcare, and technology.

• Tools: Includes Python, R, Excel, and specialized tools like Tableau, Power BI.

Example: A retail store uses data analytics to identify customer buying patterns
and optimize inventory management, ensuring popular products are always in stock.
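
As a minimal illustration, the sketch below uses Python with pandas (the product
names and quantities are hypothetical) to surface buying patterns from raw sales records:

    import pandas as pd

    # Hypothetical sales records for a small retail store
    sales = pd.DataFrame({
        "product": ["milk", "bread", "milk", "eggs", "bread", "milk"],
        "quantity": [2, 1, 3, 12, 2, 1],
    })

    # Aggregate quantities per product to reveal buying patterns
    demand = sales.groupby("product")["quantity"].sum().sort_values(ascending=False)
    print(demand)  # high-demand products appear first, guiding restocking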

0.1.2 Sources and Nature of Data


Data originates from various sources, primarily categorized as social, machine-generated,
and transactional data. Below is a detailed explanation of these sources:

1. Social Data:

• User-Generated Content: Posts, likes, and comments on platforms like Facebook, Twitter, and Instagram.
• Reviews and Ratings: Feedback on platforms such as Amazon and Yelp
that reflect customer opinions.
• Social Network Analysis: Connections and interactions between users that
reveal behavioral patterns.
• Trending Topics: Real-time topics gaining popularity, aiding in sentiment
and trend analysis.

2. Machine-Generated Data:

• Sensors and IoT Devices: Data from devices like thermostats, smart-
watches, and industrial sensors.
• Log Data: Records of system activities, such as server logs and application
usage.
• GPS Data: Location information generated by devices like smartphones and
vehicles.
• Telemetry Data: Remote data transmitted from devices, such as satellites
and drones.

3. Transactional Data:

• Sales Data: Information about products sold, quantities, and revenues.


• Banking Transactions: Records of deposits, withdrawals, and payments.
• E-Commerce Transactions: Online purchases, customer behavior, and cart
abandonment rates.
• Invoices and Receipts: Structured records of financial exchanges between
businesses or customers.


Example:
• A social media platform like Twitter generates vast amounts of social data from
tweets, hashtags, and mentions.

• Machine-generated data from GPS in delivery trucks helps optimize routes and
reduce costs.

• A retail store’s transactional data tracks customer purchases and identifies high-
demand products.

0.1.3 Classification of Data


Data can be classified into three main categories: structured, semi-structured, and un-
structured. Below is a detailed explanation of each type:

• Structured Data: Data that is organized in a tabular format with rows and
columns. It follows a fixed schema, making it easy to query and analyze.

– Examples: Excel sheets, relational databases (e.g., SQL).


– Common Tools: SQL, Microsoft Excel.

• Semi-Structured Data: Data that does not have a rigid structure but contains
tags or markers to separate elements. It lies between structured and unstructured
data.

– Examples: JSON files, XML files.


– Common Tools: NoSQL databases, tools like MongoDB.

• Unstructured Data: Data without a predefined format or organization. It requires advanced tools and techniques for analysis.

– Examples: Images, videos, audio files, and text documents.


– Common Tools: Machine Learning models, Hadoop, Spark.

Example: Email metadata (e.g., sender, recipient, timestamp) is semi-structured, while the email body is unstructured.
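
A short Python sketch makes the distinction concrete (the record below is a made-up
example, not a real email format): the tagged metadata fields can be queried directly,
while the free-text body needs further processing.

    import json

    # A semi-structured record: tagged fields, but no rigid schema
    raw = '{"sender": "alice@example.com", "timestamp": "2024-01-15T10:30:00", "body": "Meeting at noon?"}'
    email = json.loads(raw)

    # Metadata fields behave like structured data...
    print(email["sender"], email["timestamp"])
    # ...while the body is unstructured free text
    print(len(email["body"].split()), "words in body")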

Comparison Table:

Aspect            | Structured Data                                    | Semi-Structured Data                                               | Unstructured Data
------------------|----------------------------------------------------|--------------------------------------------------------------------|-----------------------------------------------------
Definition        | Organized in rows and columns with a fixed schema. | Contains elements with tags or markers but lacks strict structure. | Lacks any predefined format or schema.
Examples          | SQL databases, Excel sheets.                       | JSON, XML, NoSQL databases.                                        | Images, videos, audio files, text documents.
Storage           | Stored in relational databases.                    | Stored in NoSQL databases or files.                                | Stored in data lakes or object storage.
Ease of Analysis  | Easy to query and analyze using traditional tools. | Moderate difficulty due to partial structure.                      | Requires advanced techniques and tools for analysis.
Schema Dependency | Follows a predefined and fixed schema.             | Partially structured with flexible schema.                         | Does not follow any schema.
Data Size         | Typically smaller in size compared to others.      | Moderate size, often larger than structured data.                  | Usually the largest in size due to diverse formats.
Processing Tools  | SQL, Excel, and BI tools.                          | MongoDB, NoSQL, and custom parsers.                                | Hadoop, Spark, and AI/ML tools.

Table 1: Comparison of Structured, Semi-Structured, and Unstructured Data

0.1.4 Characteristics of Data


The key characteristics of data, often referred to as the 4Vs, include:

• Volume: Refers to the sheer amount of data generated. Modern data systems
must handle terabytes or even petabytes of data.

– Example: A social media platform like Facebook generates billions of user interactions daily.

• Velocity: Refers to the speed at which data is generated and processed. Real-time
data processing is crucial for timely insights.

– Example: Stock market systems process millions of trades per second to pro-
vide real-time updates.

• Variety: Refers to the different types and formats of data, including structured,
semi-structured, and unstructured data.

– Example: A company might analyze customer reviews (text), social media posts (images/videos), and sales transactions (structured data).

• Veracity: Refers to the quality and reliability of the data. High veracity ensures
data accuracy, consistency, and trustworthiness.

– Example: Data from unreliable sources or with missing values can lead to
incorrect insights.

Real-Life Scenario: Social media platforms like Twitter deal with high Volume
(millions of tweets daily), high Velocity (real-time updates), high Variety (text, images,
videos), and mixed Veracity (authentic and fake information).

0.1.5 Introduction to Big Data Platform


Big Data platforms are specialized frameworks and technologies designed to handle the
processing, storage, and analysis of massive datasets that traditional systems cannot
efficiently manage. These platforms enable businesses and organizations to derive meaningful
insights from large-scale and diverse data.
Key Features of Big Data Platforms:

• Scalability: Ability to handle growing volumes of data efficiently.

• Distributed Computing: Processing data across multiple machines to improve performance.

• Fault Tolerance: Ensuring reliability even in the event of hardware failures.

• High Performance: Providing fast data access and processing speeds.

Common Tools in Big Data Platforms:

• Hadoop:

– A distributed computing framework that processes and stores large datasets using the MapReduce programming model.
– Components include:
∗ HDFS (Hadoop Distributed File System): For distributed storage.
∗ YARN: For resource management and job scheduling.
– Example: A telecom company uses Hadoop to analyze call records for iden-
tifying network issues.

• Spark:

– A fast and flexible in-memory processing framework for Big Data.


– Offers support for a wide range of workloads such as batch processing, real-
time streaming, machine learning, and graph computation.
– Compatible with Hadoop for storage and cluster management.
– Example: A financial institution uses Spark for fraud detection by analyzing
transaction data in real time.

• NoSQL Databases:

– Designed to handle unstructured and semi-structured data at scale.


– Types of NoSQL databases:
∗ Document-based (e.g., MongoDB).
∗ Key-Value stores (e.g., Redis).
∗ Columnar databases (e.g., Cassandra).
∗ Graph databases (e.g., Neo4j).
– Example: An e-commerce platform uses MongoDB to store customer profiles,
product details, and purchase history.

Applications of Big Data Platforms:

• Personalized marketing by analyzing customer preferences.


• Real-time analytics for monitoring industrial equipment using IoT sensors.

• Enhancing healthcare diagnostics by analyzing patient records and medical images.

• Predictive maintenance in manufacturing by identifying patterns in machine performance data.

Example in Action: Hadoop processes petabytes of clickstream data from a large online retailer to optimize website navigation and improve the user experience.
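
To make the Spark example above concrete, here is a minimal PySpark sketch (assuming
a local Spark installation; the amount threshold is a placeholder rule, not a real
fraud model):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("fraud-check").getOrCreate()

    # Hypothetical transaction records: (transaction id, amount)
    tx = spark.createDataFrame(
        [(1, 120.0), (2, 9800.0), (3, 45.5)],
        ["tx_id", "amount"],
    )

    # Flag unusually large transactions as suspicious
    flagged = tx.withColumn("suspicious", F.col("amount") > 5000)
    flagged.show()
    spark.stop()

In production, the same logic would typically run on streaming transaction data rather
than a static DataFrame.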

1 Need of Data Analytics


Data analytics has become essential in modern organizations for the following reasons:

• Data-Driven Decision Making: Organizations increasingly rely on data-driven insights to make informed decisions, improve performance, and predict future trends.

• Optimization of Operations: Analytics helps organizations identify inefficiencies, optimize processes, and improve resource allocation.

• Competitive Advantage: By leveraging data analytics, companies can better understand customer preferences, market trends, and competitor behavior, giving them a competitive edge.

• Personalization and Customer Insights: Data analytics enables organizations to personalize products and services according to customer needs by analyzing data such as preferences and buying behavior.

• Risk Management: By analyzing historical data, companies can predict potential risks and take proactive measures to mitigate them.

Example: A retail company uses data analytics to predict customer demand for
products, enabling them to stock inventory more efficiently.

2 Evolution of Analytic Scalability


The scalability of analytics has evolved over time, allowing organizations to handle larger
and more complex datasets efficiently. The key stages in this evolution include:

• Early Stages (Manual and Small Data): In the past, analytics was performed
manually with small datasets, often using spreadsheets or simple statistical tools.

• Relational Databases and SQL: With the rise of structured data, relational
databases and SQL-based querying became more prevalent, offering better scala-
bility for handling larger datasets.

• Big Data and Distributed Computing: The advent of big data technolo-
gies such as Hadoop and Spark allowed for the processing and analysis of massive
datasets across distributed systems.


• Cloud Computing: Cloud-based platforms like AWS, Google Cloud, and Azure
have made scaling analytics infrastructure easier by providing on-demand resources,
reducing the need for physical hardware.
• Real-Time Data Analytics: Technologies such as Apache Kafka and stream
processing frameworks have enabled the processing of data in real-time, further
enhancing scalability.

3 Analytic Process and Tools


The analytic process involves several stages, each requiring different tools and techniques
to effectively analyze and extract valuable insights from data. The process can typically
be broken down into the following steps:

• Data Collection: Gathering raw data from various sources such as databases,
APIs, or sensors.
• Data Cleaning: Identifying and correcting errors or inconsistencies in the dataset
to improve the quality of the data.
• Data Exploration: Visualizing and summarizing data to understand patterns and
distributions.
• Model Building: Selecting and applying statistical or machine learning models
to predict or classify data.
• Evaluation and Interpretation: Evaluating the accuracy and effectiveness of
models, and interpreting the results for actionable insights.

Tools:
• Statistical Tools: R, Python (with libraries like Pandas, NumPy), SAS
• Machine Learning Frameworks: TensorFlow, Scikit-learn, Keras
• Big Data Tools: Hadoop, Apache Spark
• Data Visualization: Tableau, Power BI, Matplotlib (Python)
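
The following sketch walks through the five steps end to end in Python with pandas and
Scikit-learn (the dataset is tiny and hypothetical, for illustration only):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 1. Data collection (here: an inline, made-up dataset)
    df = pd.DataFrame({
        "age": [22, 35, 58, 41, 29, 63, 47, 33],
        "income": [28, 60, 95, 72, 40, 88, 77, 52],
        "bought": [0, 1, 1, 1, 0, 1, 1, 0],
    })

    # 2. Data cleaning: drop rows with missing values, if any
    df = df.dropna()

    # 3. Data exploration: quick summary of distributions
    print(df.describe())

    # 4. Model building: fit a simple classifier
    X_train, X_test, y_train, y_test = train_test_split(
        df[["age", "income"]], df["bought"], test_size=0.25, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # 5. Evaluation and interpretation
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))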

4 Analysis vs Reporting
The difference between analysis and reporting lies in their purpose and approach to data:

• Analysis: Involves deeper insights into data, such as identifying trends, patterns,
and correlations. It often requires complex statistical or machine learning methods.
• Reporting: Focuses on summarizing data into a readable format, such as charts,
tables, or dashboards, to provide stakeholders with easy-to-understand summaries.

Example: A report might display sales numbers for the last quarter, while analysis
might uncover reasons behind those numbers, such as customer buying behavior or market
conditions.
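
The contrast can be shown in a few lines of Python (the figures are hypothetical): the
report summarizes what happened, while the analysis probes why.

    import pandas as pd

    sales = pd.DataFrame({
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "revenue": [100, 120, 90, 85],
        "ad_spend": [20, 25, 10, 8],
    })

    # Reporting: summarize revenue per quarter
    print(sales.groupby("quarter")["revenue"].sum())

    # Analysis: does revenue track advertising spend?
    print(sales["revenue"].corr(sales["ad_spend"]))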


5 Modern Data Analytic Tools


Modern tools have revolutionized data analytics, making it easier to handle vast amounts
of data and perform sophisticated analyses. Some of the most popular modern tools
include:

• Apache Hadoop: A framework for processing large datasets in a distributed computing environment.

• Apache Spark: A fast, in-memory data processing engine for big data analytics.

• Power BI: A powerful business analytics tool that allows users to visualize data
and share insights.

• Tableau: A data visualization tool that enables users to create interactive dash-
boards and visual reports.

• Python with Libraries: Libraries like Pandas, Matplotlib, and Scikit-learn enable
efficient data analysis and visualization.
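
As a quick illustration of the Python route, the sketch below plots hypothetical
monthly sales with pandas and Matplotlib:

    import matplotlib.pyplot as plt
    import pandas as pd

    # Hypothetical monthly sales figures
    sales = pd.Series([120, 135, 150, 160, 155, 170],
                      index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"])

    sales.plot(kind="bar", title="Monthly Sales (hypothetical)")
    plt.ylabel("Units sold")
    plt.tight_layout()
    plt.show()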

6 Applications of Data Analytics


Data analytics is used in various industries and domains to solve complex problems and
enhance decision-making. Some common applications include:

• Healthcare: Analyzing patient data for better diagnosis, treatment plans, and
management of healthcare resources.

• Finance: Fraud detection, risk assessment, and portfolio optimization through the
analysis of financial data.

• Retail: Predicting customer behavior, optimizing inventory, and personalizing marketing campaigns.

• Manufacturing: Predictive maintenance, quality control, and process optimization to improve production efficiency.

• Telecommunications: Network optimization, customer churn prediction, and fraud detection.

6.0.1 Need for Data Analytics Lifecycle


What is Data Analytics Lifecycle?
The Data Analytics Lifecycle refers to a series of stages or steps that guide the process
of analyzing data from initial collection to final insights and decision-making. It is a
structured framework designed to ensure systematic execution of analytics projects, which
helps in producing accurate and actionable results. The lifecycle consists of multiple
phases, each with specific tasks, and is essential for managing complex data projects.
The key stages of the Data Analytics Lifecycle typically include:

• Discovery: Understanding the project objectives and data requirements.


• Data Preparation: Collecting, cleaning, and transforming data into usable for-
mats.

• Model Planning: Identifying suitable analytical techniques and models.

• Model Building: Developing models to extract insights.

• Communicating Results: Presenting insights and findings to stakeholders.

• Operationalization: Implementing the model or results into a business process.

Need for Data Analytics Lifecycle


A structured approach to managing data analytics projects is crucial for several rea-
sons. The following points highlight the importance of adopting the Data Analytics
Lifecycle:

• Ensures Systematic Approach: The lifecycle provides a systematic framework for managing projects. It ensures that every step is accounted for, avoiding randomness in execution and ensuring that tasks are completed in the correct order.

• Minimizes Errors: By following a predefined process, the risk of errors is reduced. Each stage builds upon the previous one, ensuring accuracy and reliability in data processing and analysis.


• Optimizes Resource Usage: The lifecycle ensures efficient use of resources, such
as time, tools, and personnel. By organizing tasks in a structured way, projects are
completed more efficiently, avoiding wasted effort and resources.

• Increases Efficiency: With a clear workflow in place, tasks are completed in a more streamlined manner, making the entire process more efficient. The structured approach ensures that insights can be derived quickly and accurately.

• Improves Communication: Clear milestones and stages help teams stay aligned
and facilitate communication about the progress of the project. This clarity is
especially useful when different teams or departments are involved.

• Better Decision-Making: The lifecycle ensures that all steps are thoroughly exe-
cuted, leading to high-quality insights. This improves decision-making by providing
businesses with reliable and actionable data.

• Scalable: The lifecycle framework is adaptable to projects of different sizes. Whether it’s a small-scale analysis or a large, complex dataset, the process can scale according to the project requirements.

6.0.2 Key Roles in Analytics Projects


In data analytics projects, various roles contribute to the successful execution and delivery
of insights. Each role plays a vital part in the project lifecycle, ensuring that the right
data is collected, processed, analyzed, and interpreted for decision-making. The key roles
typically include:

• Data Scientist:

– A data scientist is responsible for analyzing and interpreting complex data to extract meaningful insights.
– They design and build models to forecast trends, make predictions, and identify
patterns within data.
– Data scientists use machine learning algorithms, statistical models, and ad-
vanced analytics techniques to solve business problems.
– Example: A data scientist develops a predictive model to forecast customer
churn based on historical data and trends.

• Data Engineer:

– A data engineer is responsible for designing, constructing, and maintaining the systems and infrastructure that collect, store, and process data.
– They ensure that data pipelines are efficient, scalable, and capable of handling
large volumes of data.
– Data engineers work closely with data scientists to ensure the availability of
clean and well-structured data for analysis.
– Example: A data engineer designs and implements a data pipeline that ex-
tracts real-time transactional data from an e-commerce platform and stores it
in a data warehouse.


• Business Analyst:

– A business analyst bridges the gap between the technical team (data scientists
and engineers) and business stakeholders.
– They are responsible for understanding the business problem and translating
it into actionable data-driven solutions.
– Business analysts also interpret the results of data analysis and communicate
them in a way that is understandable for non-technical stakeholders.
– Example: A business analyst analyzes customer feedback data and interprets
the results to help the marketing team refine their targeting strategy.

• Project Manager:

– A project manager oversees the overall execution of an analytics project, ensuring that it stays on track and is completed within scope, time, and budget.
– They coordinate between teams, manage resources, and resolve any issues that
may arise during the project.
– Project managers also ensure that the project delivers business value and meets
stakeholder expectations.
– Example: A project manager ensures that the data engineering team delivers
clean data on time, while also coordinating with the data scientists to make
sure the model development phase proceeds smoothly.

6.0.3 Phases of Data Analytics Lifecycle


The phases of the Data Analytics Lifecycle are critical to successfully executing an ana-
lytics project. Each phase ensures the project follows a systematic approach from start
to finish:

1. Discovery:

• Identify the business problem or goal.


• Understand the data requirements and sources.
• Define the scope and objectives of the project.

2. Data Preparation:

• Collect and consolidate relevant data.


• Clean the data by handling missing values, duplicates, and errors.
• Transform the data into a suitable format for analysis (e.g., normalization,
encoding).

3. Model Planning:

• Choose the appropriate analytical methods (e.g., regression, clustering).


• Select suitable algorithms based on the business needs.
• Define evaluation metrics (e.g., accuracy, precision, recall).


4. Model Building:

• Implement the selected models using tools like Python, R, or machine learning
libraries (e.g., Scikit-learn, TensorFlow).
• Train the model on the prepared dataset.
• Tune hyperparameters to improve model performance.

5. Communicating Results:

• Visualize findings using tools like Tableau, Power BI, or matplotlib.


• Present insights to stakeholders in a clear, understandable format.
• Provide actionable recommendations based on the results.

6. Operationalization:

• Deploy the model into a production environment for real-time analysis or batch
processing.
• Integrate the model with existing business systems (e.g., CRM, ERP).
• Monitor and maintain the model’s performance over time.

Example: A retail company builds a model to predict customer churn and integrates
it into their CRM system.
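
A compressed sketch of the lifecycle for this churn example, in Python with pandas and
Scikit-learn (all data, features, and column names here are hypothetical; a real project
would use far more data and careful feature engineering):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Data preparation: a made-up customer table, with basic cleaning
    customers = pd.DataFrame({
        "tenure_months": [3, 40, 12, 60, 7, 24, 2, 48, 18, 36],
        "monthly_spend": [20, 80, 35, 90, 25, 55, 15, 70, 45, 65],
        "churned":       [1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    }).dropna()

    # Model planning and building: a random forest, with
    # precision and recall as the agreed evaluation metrics
    X = customers[["tenure_months", "monthly_spend"]]
    y = customers["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    # Communicating results: report the chosen metrics
    pred = model.predict(X_test)
    print("precision:", precision_score(y_test, pred, zero_division=0))
    print("recall:", recall_score(y_test, pred, zero_division=0))

    # Operationalization (sketch): score a new customer record
    new_customer = pd.DataFrame({"tenure_months": [5], "monthly_spend": [22]})
    print("churn risk:", model.predict(new_customer)[0])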
