Abhijitya Midsem

The document provides an overview of data and analytics, including sources, classifications, characteristics, and the need for data analytics. It outlines the data analytics lifecycle, key roles, and various analytical techniques such as regression modeling and neural networks. Additionally, it highlights modern data analytic tools and the evolution of analytics from manual calculations to AI-driven methods.

Uploaded by

atulrai840
Copyright
© All Rights Reserved

UNIT 1: Data and Analytics Basics

1. Sources and Nature of Data

Layman Explanation:
Data is all around us. It comes from sources such as social media, smart devices, business
transactions, and sensors that measure the natural world. It can be as simple as a list of numbers
or as complex as videos and images.

Examples:

• A smartwatch collects heart rate, step count, and sleep patterns.

• Google Maps gathers live traffic data from millions of users to provide accurate route
suggestions.

2. Classification of Data

Data is classified based on its structure:

Structured Data → Well-organized, easy to search in databases.


Example 1: A bank’s database storing account numbers, names, and balances.
Example 2: An online store’s inventory showing product name, price, and stock.

Semi-Structured Data → Partially organized, often in formats like JSON, XML.


Example 1: Emails—each has a subject and sender, but different message styles.
Example 2: A JSON file storing customer preferences on an e-commerce website.

Unstructured Data → No fixed format; hard to analyze with traditional database tools.


Example 1: A collection of customer feedback in different languages and styles.
Example 2: YouTube videos with free-form descriptions and thumbnails.
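
A small Python sketch may make the three categories easier to see. The product list, the customer preferences, and the feedback sentence are all made up for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns, so it fits a table.
inventory_csv = "product,price,stock\nPen,10,200\nNotebook,45,80\n"
rows = list(csv.DictReader(io.StringIO(inventory_csv)))

# Semi-structured: JSON has keys, but records need not share the same fields.
preferences_json = (
    '[{"user": "asha", "likes": ["books"]}, {"user": "ravi", "language": "hi"}]'
)
prefs = json.loads(preferences_json)
fields_per_record = [sorted(p.keys()) for p in prefs]

# Unstructured: free text has no fields at all; it can only be processed loosely.
feedback = "Delivery was late but the product quality is great!"
word_count = len(feedback.split())

print(rows[0]["price"])     # a column lookup works directly
print(fields_per_record)    # the two records expose different keys
print(word_count)
```

Notice that the structured data can be queried by column name, the JSON records have to be inspected one by one, and the free text offers nothing to look up until it is processed further.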

3. Characteristics of Data (The 4Vs of Big Data)

• Volume: Large amounts of data (e.g., Facebook stores billions of posts).

• Variety: Different types of data (e.g., texts, images, videos).

• Velocity: The speed at which data is generated (e.g., live stock market prices).

• Veracity: Accuracy and trustworthiness of the data (e.g., news feeds may contain fake stories).

4. Introduction to Big Data Platforms

Big Data platforms such as Hadoop and Spark store and process massive data across many machines, which a single conventional system cannot handle.

Example 1: Amazon analyzes millions of purchases daily to improve recommendations.


Example 2: NASA processes satellite images to track climate changes.
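
One idea behind many Big Data platforms is to split the data into chunks, process each chunk separately, and then merge the partial results. The sketch below imitates that pattern on one machine. It is only an illustration, not how any particular platform is implemented, and the purchase log is made up.

```python
from collections import Counter
from functools import reduce

# Hypothetical purchase log, already split into chunks as a cluster might do.
chunks = [
    ["milk", "eggs", "milk"],
    ["bread", "milk"],
    ["eggs", "bread", "bread"],
]

def map_chunk(chunk):
    """Map step: count items within one chunk independently of the others."""
    return Counter(chunk)

def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

# Each map_chunk call could run on a separate machine.
partial_counts = [map_chunk(chunk) for chunk in chunks]
total_counts = reduce(merge_counts, partial_counts, Counter())

print(total_counts.most_common())
```

Because each chunk is counted on its own, adding more machines lets the same logic handle more data.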

5. Need for Data Analytics


Data analytics helps in making better decisions.

Example 1: Netflix suggests shows based on your watch history.


Example 2: An online store tracks top-selling products and offers discounts.

6. Evolution of Analytic Scalability

We moved from manual calculations and spreadsheets to AI-driven analytics running on platforms that scale to large data.

Example 1: In the 1990s, businesses used Excel for sales tracking.

Example 2: Today, AI models predict customer buying trends in real time.

7. Analytic Process and Tools

Steps: Collect → Clean → Analyze → Interpret → Report


Tools: Python, R, SQL, Tableau, Power BI

Example 1: A company uses Python to analyze customer shopping behavior.


Example 2: A hospital uses Tableau to visualize patient health records.
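
The steps above can be sketched as a tiny Python pipeline. The order records, field names, and function names are hypothetical, chosen only to show one function per step.

```python
from statistics import mean

# Collect: hypothetical raw order records (one duplicate, one missing amount).
raw_orders = [
    {"order_id": 1, "customer": "A", "amount": 250.0},
    {"order_id": 2, "customer": "B", "amount": None},
    {"order_id": 1, "customer": "A", "amount": 250.0},
    {"order_id": 3, "customer": "A", "amount": 150.0},
]

def clean(orders):
    """Clean: drop rows with missing amounts and repeated order IDs."""
    seen, result = set(), []
    for order in orders:
        if order["amount"] is None or order["order_id"] in seen:
            continue
        seen.add(order["order_id"])
        result.append(order)
    return result

def analyze(orders):
    """Analyze: average spend per customer."""
    spend = {}
    for order in orders:
        spend.setdefault(order["customer"], []).append(order["amount"])
    return {customer: mean(amounts) for customer, amounts in spend.items()}

def report(summary):
    """Report: turn the numbers into readable lines."""
    return [f"Customer {c}: average order {avg:.2f}" for c, avg in sorted(summary.items())]

cleaned = clean(raw_orders)
summary = analyze(cleaned)
for line in report(summary):
    print(line)
```

The Interpret step is the one part that stays with a person: deciding what an average order of 200 means for the business.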

8. Analysis vs Reporting

• Analysis → Finding patterns and making predictions.

• Reporting → Presenting data with summaries and charts.

Example 1: A weather forecast predicting rainfall next week (Analysis).


Example 2: A store generating a sales report for the last 6 months (Reporting).

9. Modern Data Analytic Tools

• Python & R → Data processing and machine learning.

• SQL → Querying and managing relational databases.

• Tableau & Power BI → Data visualization.

• Hadoop & Spark → Handling large-scale data.

Example 1: A sports analyst uses Python to predict football match outcomes.


Example 2: A bank uses SQL to check customer transactions for fraud.

UNIT 2: Data Analytics Lifecycle

1. Need for Data Analytics Lifecycle

A structured approach ensures successful data-driven projects.


Example 1: A company follows a step-by-step method to analyze customer behavior.
Example 2: Scientists analyze climate data methodically to predict global warming trends.

2. Key Roles in Data Analytics

• Data Scientist → Finds insights from data.

• Data Engineer → Manages and processes raw data.

• Business Analyst → Uses insights to make business decisions.

Example 1: A bank hires a data scientist to detect fraud in transactions.


Example 2: An airline company uses business analysts to optimize ticket pricing.

3. Phases of Data Analytics Lifecycle

1️⃣ Discovery – Understand the problem.


Example: A company studies why customers are leaving its platform.

2️⃣ Data Preparation – Collect and clean data.


Example: A business removes duplicate customer records before analysis.

3️⃣ Model Planning – Choose the best analytics method.


Example: A store decides that a forecasting model is the right method to predict future sales trends.

4️⃣ Model Building – Apply data science techniques.


Example: A healthcare startup builds a model to detect diseases from patient records.

5️⃣ Communicating Results – Present findings.


Example: A marketing team shares reports on successful ad campaigns.

6️⃣ Operationalization – Implement the solution.


Example: A stock market app deploys an AI model to suggest profitable stocks.

1. Regression Modeling

Regression estimates the relationship between an outcome (the dependent variable) and one or more
input variables. The fitted relationship is then used to make predictions based on historical data.

Real-world Example:
Imagine an e-commerce company wants to predict the sales of a product based on its price and
advertising budget. By analyzing past data, regression modeling can determine whether increasing
the ad budget leads to higher sales and by how much.

Another Example:
A bank uses logistic regression to predict a customer’s loan repayment probability based on their
income, credit score, and previous repayment history.
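
A minimal sketch of the idea, assuming only one input (advertising budget) instead of two, and using made-up numbers. It fits a straight line by ordinary least squares and uses it to predict sales for a new budget.

```python
# Hypothetical history: advertising budget (in thousands) and units sold.
ad_budget = [10, 20, 30, 40, 50]
units_sold = [25, 45, 65, 85, 105]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ad_budget, units_sold)
predicted = intercept + slope * 60   # forecast for a budget not seen before

print(slope, intercept, predicted)
```

The slope answers the "by how much" question: here each extra thousand spent on ads goes with about 2 more units sold in this toy data.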
2. Multivariate Analysis

Multivariate analysis examines several variables at once to see how they jointly affect an outcome.
It helps in understanding the combined effect of different variables, not just each one in isolation.

Real-world Example:
A restaurant chain wants to find out why some of its outlets perform better than others. They
analyze data on location, menu pricing, customer reviews, and weather conditions to understand
what factors contribute to higher sales.

Another Example:
A hospital studies how age, diet, physical activity, and genetics together influence a patient’s risk of
heart disease.
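
A common first step in multivariate analysis is to check how strongly each factor moves together with the outcome. The sketch below computes a Pearson correlation for each factor. The outlet data and factor names are hypothetical, and a full analysis would also look at how the factors interact with each other.

```python
from math import sqrt

# Hypothetical outlet data: each list has one value per outlet.
outlets = {
    "avg_price": [12, 15, 11, 18, 14, 16],
    "review_score": [4.5, 3.8, 4.7, 3.2, 4.1, 3.6],
    "foot_traffic": [900, 650, 980, 400, 720, 560],
}
monthly_sales = [52, 41, 55, 30, 44, 37]

def pearson(xs, ys):
    """Pearson correlation: +1 means move together, -1 means move oppositely."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

correlations = {name: pearson(values, monthly_sales) for name, values in outlets.items()}

for name, r in correlations.items():
    print(f"{name}: {r:+.2f}")
```

In this toy data, higher prices go with lower sales, while better reviews and more foot traffic go with higher sales.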

3. Bayesian Modeling & Inference

This method starts with an initial probability (the prior) and uses Bayes' theorem to update it
(giving the posterior) whenever new data is available.

Real-world Example:
A weather forecasting system predicts rain probability based on temperature, humidity, and wind
speed. If a sudden drop in pressure occurs, the model updates its probability and predicts rain with
higher certainty.

Another Example:
A medical AI system predicts the likelihood of a patient having cancer based on test results and
symptoms. If new symptoms appear, the model adjusts its probability estimate accordingly.
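
The update step can be written in a few lines using Bayes' theorem. The numbers below (1% of patients have the disease, the test catches 90% of true cases and gives 5% false positives) are assumptions for illustration, and the two test results are treated as independent given the patient's true condition.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

belief = 0.01                                   # prior: 1% of patients are ill
belief = bayes_update(belief, 0.90, 0.05)       # after one positive test
after_first = belief
belief = bayes_update(belief, 0.90, 0.05)       # after a second positive test
after_second = belief

print(round(after_first, 3), round(after_second, 3))   # prints 0.154 0.766
```

Each new piece of evidence uses the previous answer as its starting point, which is why the probability climbs from 1% to about 15% and then to about 77%.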

4. Support Vector & Kernel Methods

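Support vector machines (SVMs) classify data by finding the boundary that separates groups with the
widest possible margin. Kernel methods let that boundary be curved instead of a straight line.

Real-world Example: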
A banking system detects fraudulent transactions by classifying them as "genuine" or "fraud" based
on transaction amount, location, and past patterns.

Another Example:
An email system automatically filters spam emails based on past behavior. If an email contains too
many promotional words, it's classified as spam.
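
To show what a kernel adds, here is a small sketch of a kernel perceptron, which is a simpler relative of the SVM and not a full SVM. It uses an RBF kernel on four toy points that no straight line can separate. The data and the `gamma` value are assumptions for illustration.

```python
from math import exp

def rbf_kernel(a, b, gamma=1.0):
    """Similarity that is 1 for identical points and decays with squared distance."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return exp(-gamma * squared_distance)

# XOR-style toy data: no single straight line separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

def decision(x, alphas):
    """Weighted vote of the training points, measured through the kernel."""
    return sum(a * y * rbf_kernel(p, x) for a, y, p in zip(alphas, labels, points))

# Kernel perceptron training: raise a point's weight each time it is misclassified.
alphas = [0] * len(points)
for _ in range(10):
    for i, (x, y) in enumerate(zip(points, labels)):
        if y * decision(x, alphas) <= 0:
            alphas[i] += 1

predictions = [1 if decision(x, alphas) > 0 else -1 for x in points]
print(predictions)
```

An SVM uses the same kernel idea but additionally chooses the weights so that the margin between the classes is as wide as possible.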

5. Time Series Analysis

This technique analyzes data collected over time to identify patterns and make predictions.

Real-world Example:
A stock market analyst uses time series analysis to predict stock prices based on historical trends and
market conditions.

Another Example:
An electricity company forecasts power demand by analyzing past consumption trends. If the data
shows increased power usage during summer, the company prepares for higher production.
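
One of the simplest time series techniques is a moving average, which smooths out short-term jumps so the trend is easier to see. The demand figures below are made up, and carrying the last average forward is only a naive forecast.

```python
# Hypothetical monthly electricity demand (in MWh), oldest first.
demand = [120, 125, 130, 150, 170, 165]

def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

smoothed = moving_average(demand, window=3)
next_month_forecast = smoothed[-1]   # naive forecast: carry the latest average forward

print(smoothed)
print(next_month_forecast)
```

A real forecast would also account for seasonality, such as the summer peak mentioned above.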

6. Rule Induction

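This technique discovers hidden if-then rules in large datasets. Rules are usually judged by how
often they apply (support) and how often they hold true (confidence).

Real-world Example: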
An online grocery store finds that customers who buy milk also tend to buy eggs. The system
automatically suggests eggs to customers purchasing milk, increasing sales.

Another Example:
A university analyzes student study habits and finds that students who revise for more than 6 hours
a week tend to score above 80% in exams.
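
The milk-and-eggs rule can be checked with two simple measures. The baskets below are hypothetical. Support is the share of baskets that contain both items, and confidence is the share of milk baskets that also contain eggs.

```python
# Hypothetical shopping baskets.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "eggs", "butter"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(if_items, then_items):
    """Of the baskets containing `if_items`, the fraction that also contain `then_items`."""
    return support(if_items | then_items) / support(if_items)

rule_support = support({"milk", "eggs"})
rule_confidence = confidence({"milk"}, {"eggs"})

print(f"milk -> eggs: support {rule_support:.2f}, confidence {rule_confidence:.2f}")
```

A rule induction system would run this kind of check over many item combinations and keep only the rules that pass chosen thresholds.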

7. Neural Networks & Learning

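Neural networks are loosely inspired by the human brain. They are built from layers of simple
connected units, and the strengths of the connections are learned from examples so the network can
recognize patterns and make decisions.

Real-world Example:
A self-driving car uses neural networks to recognize traffic signs, pedestrians, and road lanes,
helping it drive with little or no human intervention.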

Another Example:
A smartphone’s face recognition system scans the user's face and identifies unique features to
unlock the device.
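
Real networks have many layers and millions of weights, but the learning idea can be seen with a single artificial neuron. The sketch below is a classic perceptron that learns the logical AND of two inputs. It is a teaching toy, far simpler than the networks in the examples above.

```python
# Training data for logical AND: inputs and the desired output.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0, 0]
bias = 0

def predict(inputs):
    """Fire (output 1) only if the weighted sum of inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Perceptron learning rule: nudge the weights toward each misclassified example.
for _ in range(20):
    for inputs, target in samples:
        error = target - predict(inputs)
        weights = [w + error * x for w, x in zip(weights, inputs)]
        bias += error

outputs = [predict(inputs) for inputs, _ in samples]
print(weights, bias, outputs)
```

Nobody writes the AND rule into the code. The neuron finds suitable weights on its own by correcting its mistakes, which is the core of how neural networks learn.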

8. Principal Component Analysis (PCA)

PCA simplifies complex datasets by combining many correlated variables into a few new variables
(principal components) that keep most of the variation in the data.

Real-world Example:
A clothing brand wants to understand customer preferences. Instead of analyzing hundreds of
customer behavior factors separately, PCA summarizes them into about 5 combined components that
capture most of the differences between customers.

Another Example:
A medical research team studies genetic data. PCA compresses measurements of 100,000 genes into
about 10 components, each a weighted mix of many genes, making disease-related patterns easier to spot.
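
A minimal sketch of PCA on just two measurements, so the math fits in a few lines. The customer data is made up, and the closed-form eigenvalue formula used here only works for the two-variable case. Real projects would normally rely on a numerical library.

```python
from math import sqrt

# Hypothetical customer data: two strongly related measurements per customer.
visits = [1.0, 2.0, 3.0, 4.0, 5.0]
spend = [2.1, 3.9, 6.2, 7.8, 10.0]

def centered(values):
    """Subtract the mean so the data is centered on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def covariance(xs, ys):
    """Sample covariance of two already-centered lists."""
    return sum(x * y for x, y in zip(xs, ys)) / (len(xs) - 1)

x, y = centered(visits), centered(spend)
a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)

# Eigenvalues of the 2x2 covariance matrix [[a, b], [b, c]] in closed form.
half_trace = (a + c) / 2
spread = sqrt(((a - c) / 2) ** 2 + b ** 2)
largest, smallest = half_trace + spread, half_trace - spread

# Direction of the first principal component, scaled to unit length.
length = sqrt(b ** 2 + (largest - a) ** 2)
component = (b / length, (largest - a) / length)

# Each customer is now described by one score instead of two numbers.
scores = [xi * component[0] + yi * component[1] for xi, yi in zip(x, y)]
explained = largest / (largest + smallest)

print(component, explained)
```

Here one combined component keeps more than 99% of the variation, so the second dimension can be dropped with very little loss. The component is a weighted mix of both measurements, not one of the original columns.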

9. Fuzzy Logic & Decision Trees

Fuzzy logic handles uncertainty by allowing degrees of truth between 0 and 1 instead of a strict yes
or no, while decision trees reach a decision through a sequence of step-by-step questions.

Real-world Example:
A washing machine decides the best wash cycle based on how dirty clothes are. Instead of just
"clean" or "dirty," it uses fuzzy logic to determine levels like "slightly dirty," "moderately dirty," and
"heavily dirty" and adjusts the wash cycle accordingly.

Another Example:
A customer service chatbot uses decision trees to answer questions. If a customer asks about
refund policies, the bot follows a predefined tree structure to guide them to the correct solution.
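
Both ideas can be sketched briefly. The dirt scale (0 to 100), the wash times, and the refund questions, including the 30-day window, are assumptions made up for this illustration.

```python
def triangle(value, peak, width):
    """Membership that is 1 at `peak` and falls linearly to 0 at a distance of `width`."""
    return max(0.0, 1.0 - abs(value - peak) / width)

def wash_minutes(dirt_level):
    """Blend three wash times by how strongly each label applies (dirt 0 to 100)."""
    memberships = {
        "slightly dirty": triangle(dirt_level, 0, 50),
        "moderately dirty": triangle(dirt_level, 50, 50),
        "heavily dirty": triangle(dirt_level, 100, 50),
    }
    minutes = {"slightly dirty": 20, "moderately dirty": 40, "heavily dirty": 60}
    total = sum(memberships.values())
    return sum(memberships[k] * minutes[k] for k in memberships) / total

# A decision tree, by contrast, asks crisp yes/no questions in a fixed order.
refund_tree = {
    "question": "Was the item delivered?",
    "no": "Offer a full refund.",
    "yes": {
        "question": "Is it within 30 days of delivery?",
        "yes": "Start a return and refund.",
        "no": "Refund window has closed.",
    },
}

def walk(tree, answers):
    """Follow the answers down the tree until a leaf (a string) is reached."""
    node = tree
    for answer in answers:
        node = node[answer]
        if isinstance(node, str):
            break
    return node

print(wash_minutes(30))
print(walk(refund_tree, ["yes", "no"]))
```

A dirt level of 30 counts as partly "slightly dirty" and partly "moderately dirty", so the wash time lands between the two settings. The decision tree never blends: each answer sends it down exactly one branch.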

10. Stochastic Search Methods

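These methods use randomness to explore many candidate solutions and keep the best one found.
Simulated annealing and genetic algorithms are well-known examples.

Real-world Example:
A chess AI based on Monte Carlo tree search plays out thousands of randomly sampled move sequences
before choosing its move against an opponent.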

Another Example:
A delivery company uses stochastic search to find the fastest delivery route based on live traffic
data. Instead of following a fixed route, it constantly searches for better options in real time.
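
A minimal sketch of one stochastic search method, random-swap hill climbing, applied to a small delivery route. The stop coordinates are made up, straight-line distance stands in for live traffic data, and the random seed is fixed only so the run is repeatable.

```python
import random
from math import dist

# Hypothetical delivery stops as (x, y) coordinates; stop 0 is the depot.
stops = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 1), (4, 4)]

def route_length(order):
    """Total distance of visiting stops in `order` and returning to the start."""
    return sum(
        dist(stops[order[i]], stops[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def stochastic_search(iterations=2000, seed=0):
    """Try random swaps of two stops and keep a swap only if the route gets shorter."""
    rng = random.Random(seed)
    best = list(range(len(stops)))
    best_length = route_length(best)
    for _ in range(iterations):
        candidate = best[:]
        i, j = rng.sample(range(1, len(stops)), 2)   # keep the depot first
        candidate[i], candidate[j] = candidate[j], candidate[i]
        length = route_length(candidate)
        if length < best_length:
            best, best_length = candidate, length
    return best, best_length

initial_length = route_length(list(range(len(stops))))
best_route, best_length = stochastic_search()

print(best_route, round(best_length, 2), "vs initial", round(initial_length, 2))
```

This simple version can get stuck on a route that no single swap improves. Methods such as simulated annealing sometimes accept a worse route on purpose to escape such dead ends.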
