
BIDA Theory

The document outlines various data analysis techniques using tools like Microsoft Excel, R, and Python, covering topics such as pivot tables, what-if analysis, decision tree classification, clustering, regression, and data visualization. It highlights applications in financial reporting, sales analysis, customer segmentation, and more, emphasizing the importance of data staging and OLAP models for efficient data management. Each section provides a brief overview of methods, key concepts, and practical applications in business intelligence and analytics.

Prac_1: Data Analysis Using Microsoft Excel

● Pivot Tables: A tool to summarize large datasets by grouping, filtering, and
organizing data into meaningful categories like totals, averages, and counts.
●​ Pivot Charts: Graphical representations of Pivot Table data, enabling visual analysis
of trends and patterns.
●​ Data Cubes: Multidimensional data models that store data across multiple
dimensions (e.g., time, product, region) for faster analysis.
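
The same grouping idea can be sketched outside Excel. The example below is a minimal illustration using pandas rather than Excel itself; the `sales` table, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sales records (illustrative values only).
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "amount":  [100, 150, 200, 50, 120, 80],
})

# Rows = region, columns = product, cells = total amount.
# This mirrors placing fields in the Rows, Columns, and Values areas of a Pivot Table.
pivot = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

Changing `aggfunc` to "mean" or "count" gives the averages and counts mentioned above.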

Applications:

● Financial reporting: Summarizing company expenses and revenues.
● Sales analysis: Tracking performance by region or product.
●​ Inventory management: Analyzing stock levels and trends over time.

Prac_2: What-If Analysis Using Excel

●​ What-if Analysis: A feature in Excel that allows users to explore different scenarios
by changing input variables and seeing how they affect the outcome.
●​ Scenario Manager: Enables creating and comparing multiple possible outcomes
(e.g., best-case, worst-case scenarios).
● Goal Seek: Finds the input value required to achieve a desired result by adjusting
a single input cell while the formula and all other inputs stay fixed.
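
A rough Python sketch of both ideas, using loan repayment as the scenario. The loan figures and rates are hypothetical, and the bisection search in `goal_seek` is only a stand-in for Excel's own iterative solver, not a description of how Excel implements it.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed-rate loan payment (standard annuity formula)."""
    r = annual_rate / 12
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)

# What-if scenarios: change one input (the rate) and compare outcomes.
scenarios = {"best": 0.04, "expected": 0.06, "worst": 0.09}
payments = {
    name: monthly_payment(100_000, rate, 120)
    for name, rate in scenarios.items()
}

def goal_seek(func, target, low, high, tol=1e-6):
    """Find x in [low, high] with func(x) close to target.

    Uses bisection and assumes func is increasing on the interval.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if func(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# Goal seek: which principal gives a payment of 1,000 per month
# at 6% over 120 months?
principal = goal_seek(
    lambda p: monthly_payment(p, 0.06, 120), 1000, 0, 1_000_000
)
```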

Applications:

● Budget forecasting: Evaluating the impact of different cost assumptions.
● Profit maximization: Analyzing how price changes affect profits.
● Loan repayment planning: Calculating how varying interest rates influence loan
payments.

Prac_3: Data Classification Using a Classification Algorithm in R

Decision Tree Classification:

1.​ A decision tree is a supervised learning model used for classification or regression.
2.​ It works by recursively splitting the data based on feature values, creating a tree-like
structure.
3.​ Each internal node represents a decision on a feature, each branch represents an
outcome, and leaf nodes represent the predicted class.
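
To make the splitting step concrete, here is a minimal Python sketch of choosing one split (a single internal node with two leaves). It assumes Gini impurity as the split criterion, which these notes do not specify, and the income data is hypothetical. A full tree would apply `best_split` recursively to each side.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical credit data: income (in thousands) and whether the loan was repaid.
income = [20, 25, 30, 45, 50, 60]
repaid = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(income, repaid)

# Each leaf predicts the majority class of the training rows that reach it.
left_class = Counter(
    l for v, l in zip(income, repaid) if v <= threshold
).most_common(1)[0][0]
right_class = Counter(
    l for v, l in zip(income, repaid) if v > threshold
).most_common(1)[0][0]

def classify(value):
    """Follow the single decision node to a leaf."""
    return left_class if value <= threshold else right_class
```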

Applications:

1. Customer segmentation.
2. Medical diagnosis.
3. Credit risk assessment.

Prac_4: Data Clustering Using a Clustering Algorithm in R

Clustering Overview:

1.​ Clustering is an unsupervised learning technique used to group similar data points
into clusters based on their features.
2.​ Unlike supervised learning, clustering does not require labeled data or predefined
classes. Instead, the algorithm identifies inherent patterns or groupings in the dataset.
3.​ Common clustering algorithms include K-Means, Hierarchical Clustering, and
DBSCAN.

Applications of Clustering:

1.​ Customer segmentation in marketing.

2.​ Image and pattern recognition.

3. Document clustering (grouping similar documents by topic).

4. Social network analysis.

Steps for K-Means:

1.​ Dataset: The table contains Height and Weight values of individuals, serving as input
features.

2.​ Initialization:

o​ Number of clusters: K = 2.

o​ Initial centroids are randomly chosen:

▪​ K1: (185, 72).

▪​ K2: (170, 56).

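3. Step 1: Calculate Euclidean Distance: For each data point, calculate its distance
from K1 and K2 using the Euclidean distance formula:

d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

Example: For data point (162, 60),

d(K1) = \sqrt{(162 - 185)^2 + (60 - 72)^2} = \sqrt{673} \approx 25.94
d(K2) = \sqrt{(162 - 170)^2 + (60 - 56)^2} = \sqrt{80} \approx 8.94

Assign the point to the nearest cluster, which here is K2.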

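4. Step 2: Update Centroids: Compute the new centroid for each cluster by averaging
the Height and Weight of all points in the cluster.
Example for K2: if K2 contains the points (170, 56) and (162, 60), the new centroid is
((170 + 162)/2, (56 + 60)/2) = (166, 58).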

5.​ Step 3: Repeat: Recalculate distances using the updated centroids and reassign points
to clusters. Update centroids iteratively until cluster assignments stabilize.

6.​ Final Clusters:

o​ K1: Data points {1, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

o​ K2: Data points {2, 3}.
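
The steps above can be sketched in plain Python. The six (Height, Weight) points below are hypothetical and are not the table these notes refer to; only K = 2 and the two initial centroids are taken from the notes.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-Means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    assignments = None
    for _ in range(max_iter):
        # Step 1: nearest centroid by Euclidean distance.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Step 3: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Hypothetical (Height, Weight) data.
points = [(185, 72), (170, 56), (162, 60), (179, 68), (182, 72), (188, 77)]
initial = [(185, 72), (170, 56)]  # K1 and K2 from the notes

assignments, centroids = kmeans(points, initial)
print(assignments)  # cluster index per point: 0 = K1, 1 = K2
print(centroids)
```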


Prac_5: Linear Regression on Data Warehouse Data Using R

Regression Theory

1.​ Regression is a supervised learning technique used to model the relationship between
a dependent variable (target) and one or more independent variables (predictors).
2.​ It helps predict outcomes and understand how variables are related.

Key Concepts:

●​ Dependent Variable: The variable we aim to predict (e.g., annual profit).

●​ Independent Variables: The variables used to make predictions (e.g., profit per
month, cost per product).

● Applications: Forecasting (e.g., stock prices, sales), understanding trends, and
decision-making in business or science.

Types of Regression

1.​ Linear Regression:

o​ Used for predicting numerical/continuous values.

o​ Fits the best line through the data points, minimizing the error (distance
between predicted and actual values).

Example: Predicting a company’s annual profit based on monthly profit.

2.​ Logistic Regression:

o​ Used for predicting categorical outcomes (e.g., yes/no).

o​ Outputs probabilities (0 to 1) that are converted into classes.

Example: Will a customer buy a product (yes or no)?
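
For the linear regression case, "minimizing the error" is usually taken to mean ordinary least squares, which is what this sketch assumes. It is written in Python rather than R, and the profit figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: average monthly profit vs. annual profit.
monthly_profit = [10, 12, 15, 18, 20]
annual_profit = [118, 146, 181, 214, 242]

slope, intercept = fit_line(monthly_profit, annual_profit)

# Predict annual profit for a monthly profit of 16.
predicted = slope * 16 + intercept

# Residuals: actual minus predicted; for a least-squares line they sum to ~0.
residuals = [
    y - (slope * x + intercept)
    for x, y in zip(monthly_profit, annual_profit)
]
```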


Prac_6: Logistic Regression on Data Warehouse Data Using R/Python

● Logistic regression models the relationship between a binary dependent variable and
one or more independent variables using a sigmoid function:

P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}

It is commonly used for classification tasks.
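
A small sketch of how the sigmoid turns a linear score into a probability and then a class. The coefficients are hypothetical, and the 0.5 cut-off is a common default rather than something these notes prescribe.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, intercept, weights):
    """Probability of the positive class for one observation x."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical model with one feature: z = -4 + 2 * 3 = 2.
p = predict_proba([3.0], intercept=-4.0, weights=[2.0])

# Convert the probability to a class with a 0.5 threshold.
label = int(p >= 0.5)
```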

Steps:

1. Import libraries (pandas, sklearn).
2. Load and preprocess the dataset (handle missing values, encode categorical variables).
3.​ Split data into training and testing sets.
4.​ Train a logistic regression model using LogisticRegression from sklearn.
5.​ Predict and evaluate the model using metrics like accuracy and confusion matrix.
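
One possible way to follow these five steps is sketched below. The small customer table is hypothetical and built in memory; real data would be loaded with pd.read_csv. Filling missing ages with the median and one-hot encoding are just one reasonable preprocessing choice.

```python
# Step 1: import libraries.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Step 2: load and preprocess (hypothetical data with one missing age).
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 23, 30, None, 41, 28, 60, 35, 50, 27, 44],
    "channel": ["web", "store", "store", "web", "store", "store", "web", "web",
                "store", "store", "web", "store", "web", "store", "web", "store"],
    "bought": [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})
df["age"] = df["age"].fillna(df["age"].median())       # handle missing values
X = pd.get_dummies(df[["age", "channel"]], columns=["channel"],
                   drop_first=True, dtype=int)          # encode categorical
y = df["bought"]

# Step 3: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 4: train the model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 5: predict and evaluate.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(accuracy)
print(cm)
```

With so few rows the accuracy figure is not meaningful; the point is the order of the steps.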
Prac_7: Python Program for Data Analysis Using Pandas

●​ CSV Handling: Reading and analyzing data from CSV files using the Pandas library
in Python, which provides efficient data manipulation tools.
●​ Dataframes: Pandas dataframes are two-dimensional structures that simplify data
filtering, sorting, and grouping for analysis.
●​ Basic Insights: Pandas offers methods for generating insights, like statistical
summaries, correlation analysis, and data visualization.
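
A minimal pandas sketch of these three points. The CSV content is hypothetical and is read from a string so the example is self-contained; a file path passed to pd.read_csv works the same way.

```python
import io
import pandas as pd

csv_text = """customer,region,purchases,spend
A,North,3,120.0
B,South,1,40.0
C,North,5,210.0
D,South,2,75.0
"""

# CSV handling: load the data into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# Dataframes: filtering, sorting, and grouping.
high_spenders = df[df["spend"] > 100].sort_values("spend", ascending=False)
by_region = df.groupby("region")["spend"].sum()

# Basic insights: statistical summary and correlation.
summary = df.describe()
corr = df["purchases"].corr(df["spend"])

print(summary)
print(by_region)
```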

Applications:

● Data cleaning and preprocessing: Preparing datasets for machine learning or
reporting.
●​ Financial analysis: Analyzing stock price trends or revenue growth.
●​ Customer behavior analysis: Segmenting customers based on purchase history.

Prac_8: Data Visualization

a. Python Visualization

●​ Matplotlib/Seaborn: Libraries in Python used for creating plots like bar charts, line
graphs, and heatmaps to visualize patterns in data.
●​ Customization: Offers flexibility in styling, labeling, and formatting graphs for better
clarity.
●​ Interactive Visualizations: Tools like Plotly enable interactive graphs for better
insights.
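
A short Matplotlib sketch showing a bar chart and a line graph with basic styling and labeling. The monthly sales numbers are hypothetical, and the "Agg" backend line is only there so the example renders without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove for interactive use
import matplotlib.pyplot as plt

# Hypothetical monthly sales.
months = ["Jan", "Feb", "Mar"]
sales = [120, 150, 90]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart with custom color, title, and axis label.
ax_bar.bar(months, sales, color="steelblue")
ax_bar.set_title("Monthly sales")
ax_bar.set_ylabel("Units sold")

# Line graph of the same data to show the trend.
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Sales trend")

fig.tight_layout()

# Write the figure as PNG into memory; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```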

Applications:

● Sales performance visualization: Tracking sales growth or dips over time.
● Marketing campaigns: Analyzing engagement rates across different regions or
demographics.
● Operational efficiency: Monitoring performance metrics in real-time dashboards.

b. Power BI Visualization

● Data Dashboards: Power BI is a business intelligence tool that creates interactive
dashboards to provide real-time insights.
●​ Data Connection: Integrates with multiple data sources like SQL databases, Excel,
and cloud services.
●​ Interactive Reports: Allows users to drill down into the data and create customized
reports tailored to their needs.

Applications:

● Executive dashboards: Real-time monitoring of business KPIs.
● Financial analysis: Visualizing monthly or quarterly performance for stakeholders.
●​ Customer analytics: Creating dashboards for customer satisfaction and retention
analysis.

Prac_9: Data Staging Using SQL

● Data Staging: The process of copying data from various sources into an intermediate
staging area, where it is cleaned and transformed before being loaded into a data warehouse.
●​ ETL (Extract, Transform, Load): Involves extracting raw data, transforming it
(cleaning, filtering, merging), and loading it into the destination database.
●​ Optimization: Data is optimized for faster querying and analysis by organizing it in
staging tables, removing duplicates, and ensuring consistency.
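
A compact sketch of staging with SQL, run through Python's built-in sqlite3 module so it is self-contained. The table names (`stg_sales`, `fact_sales`), columns, and rows are hypothetical, and a production warehouse would use its own database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (order_id TEXT, region TEXT, amount TEXT);
    CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
""")

# Extract: raw rows as they might arrive from a source system.
raw_rows = [
    ("1", " north ", "100.50"),
    ("2", "South", "80"),
    ("2", "South", "80"),   # duplicate row
    ("3", "NORTH", None),   # missing amount
]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_rows)

# Transform and load: trim and standardize text, cast types,
# drop duplicates and incomplete rows, then insert into the target table.
conn.execute("""
    INSERT INTO fact_sales (order_id, region, amount)
    SELECT DISTINCT CAST(order_id AS INTEGER),
           UPPER(TRIM(region)),
           CAST(amount AS REAL)
    FROM stg_sales
    WHERE amount IS NOT NULL
""")

rows = conn.execute("SELECT * FROM fact_sales ORDER BY order_id").fetchall()
print(rows)
```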

Applications:

● Data warehousing: Prepares data for storage in large-scale data warehouses.
● Reporting systems: Ensures accurate and timely data is available for generating
reports.
● Business intelligence: Supports decision-making by staging clean and structured data
for analysis.

Prac_10: Cube Creation and OLAP Models

●​ Data Cubes: Multidimensional data storage models that allow for fast querying and
analysis by aggregating data across different dimensions (e.g., time, location).
●​ ROLAP, MOLAP, HOLAP:
○​ ROLAP (Relational OLAP) uses relational databases for data storage.
○​ MOLAP (Multidimensional OLAP) uses pre-calculated cubes for fast
querying.
○​ HOLAP (Hybrid OLAP) combines both approaches to balance performance
and storage.
●​ Cube Dimensions and Facts: Dimensions represent perspectives like time or
product, while fact tables store quantitative data like sales or revenue.
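
A toy sketch of the pre-calculated (MOLAP-style) idea: the measure is aggregated once for every combination of dimensions, so later queries become lookups. The fact rows, dimension names, and `build_cube` helper are hypothetical; real OLAP servers store and index cubes far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "product")

# Fact rows: dimension values plus one measure (sales).
facts = [
    {"year": 2023, "region": "North", "product": "A", "sales": 100},
    {"year": 2023, "region": "South", "product": "A", "sales": 150},
    {"year": 2024, "region": "North", "product": "B", "sales": 200},
    {"year": 2024, "region": "North", "product": "A", "sales": 50},
]

def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return cube

cube = build_cube(facts, DIMENSIONS, "sales")

# Queries are now simple lookups at different levels of detail.
total = cube[((), ())]                                   # grand total
north_2024 = cube[(("year", "region"), (2024, "North"))] # roll-up over product
product_a = cube[(("product",), ("A",))]                 # roll-up over year, region
```

A ROLAP system would instead compute each of these on demand with GROUP BY queries over the relational fact table.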

Applications:

●​ Retail analysis: Analyzing sales across different stores and time periods.
●​ Financial planning: Summarizing revenue, expenses, and profit margins across
regions.
●​ Inventory management: Tracking product demand and stock levels over multiple
dimensions.
