Business Analytics
NO TOPIC
1 Google Spreadsheet
2 Power BI
3 SPSS
Google Sheets offers typical spreadsheet features, such as the ability to add,
delete and sort rows and columns. But unlike other spreadsheet programs,
Google Sheets also enables multiple geographically dispersed users to
collaborate on a spreadsheet at the same time and chat through a built-in
instant messaging program. Users can upload spreadsheets directly from their
computers or mobile devices. The application saves every change
automatically, and users can see other users' changes as they are being made.
Google Sheets is included as part of the Google Docs Editors suite of free web
applications. This suite also includes Google Docs, Google Slides, Google
Drawings, Google Forms, Google Sites and Google Keep.
CONDITIONAL FORMATTING
Highlight the cells or range of cells to which you want to apply conditional formatting.
Under Format cells if, choose a predefined condition or select Custom formula is for more
advanced rules.
You can add more rules for the same range by clicking Add another rule.
Check the spreadsheet to ensure the rules work as expected. Adjust if necessary.
PIVOT TABLE
Open the Google Sheets file containing the data you want to summarize.
Highlight the range of cells that contains the data (including headers).
Choose whether to place the pivot table in a New Sheet (recommended) or in the Existing Sheet.
Customize the pivot table by dragging fields into the following sections:
Rows: Add fields that will appear as rows (e.g., categories like names, products, dates).
Values: Add fields for numerical data or calculations (e.g., sum, average, count).
Filters: Add fields to filter the data dynamically.
For Values, click the dropdown arrow to choose the aggregation type (e.g., sum, average, count,
max, min).
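The Rows, Values, and Filters roles described above can be sketched in Python. This is only an illustration of the grouping logic, not how Google Sheets works internally; the sample records, the `pivot_sum` name, and the `region` filter are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (product, region, amount).
rows = [
    ("Pen", "North", 120),
    ("Pen", "South", 80),
    ("Book", "North", 200),
    ("Book", "South", 150),
    ("Pen", "North", 40),
]

def pivot_sum(records, region=None):
    """Group by product (Rows) and sum the amounts (Values).

    If `region` is given, only matching records are kept (Filters).
    """
    totals = defaultdict(int)
    for product, rec_region, amount in records:
        if region is not None and rec_region != region:
            continue
        totals[product] += amount
    return dict(totals)

all_regions = pivot_sum(rows)
north_only = pivot_sum(rows, region="North")
print(all_regions)  # {'Pen': 240, 'Book': 350}
print(north_only)   # {'Pen': 160, 'Book': 200}
```

Swapping the sum for a count, average, maximum, or minimum corresponds to choosing a different aggregation type for Values.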
"If a number in column A is greater than 50, display 'Pass'; otherwise, display 'Fail'."
In a cell, type the formula based on your condition. For the example above:
=IF(A1 > 50, "Pass", "Fail")
To test several thresholds in one formula, nest IF functions:
=IF(A1 > 90, "A", IF(A1 > 75, "B", IF(A1 > 50, "C", "F")))
AND Example:
OR Example:
Check that the formula provides the expected results for different inputs.
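To check the expected results for different inputs, the same logic can be mirrored in Python. This is a sketch: `pass_fail` follows the Pass/Fail rule quoted above, `grade` follows the nested IF formula, and the two-score AND/OR conditions are hypothetical examples that do not come from the original sheet.

```python
def pass_fail(score):
    # Pass/Fail rule: greater than 50 passes, anything else fails.
    return "Pass" if score > 50 else "Fail"

def grade(score):
    # Mirrors =IF(A1 > 90, "A", IF(A1 > 75, "B", IF(A1 > 50, "C", "F")))
    if score > 90:
        return "A"
    if score > 75:
        return "B"
    if score > 50:
        return "C"
    return "F"

def both_pass(score_a, score_b):
    # Hypothetical AND condition: both scores must exceed 50.
    return "Pass" if score_a > 50 and score_b > 50 else "Fail"

def either_pass(score_a, score_b):
    # Hypothetical OR condition: at least one score must exceed 50.
    return "Pass" if score_a > 50 or score_b > 50 else "Fail"

for value in (95, 90, 75, 50):
    print(value, pass_fail(value), grade(value))
```

Boundary values such as 50, 75, and 90 are worth testing because each comparison is strict: a score of exactly 90 returns "B", not "A".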
1.4 On your computer, open a spreadsheet in Google Sheets and click Edit > Find and
replace. Next to "Find," type the word you want to find. If you want to replace the
word, enter the new word next to "Replace with".
Find: Type the word, number, or value you want to search for.
Match entire cell contents: Check this to find exact matches only.
Replace: Replaces the current instance of the search term with your new value.
Replace all: Replaces all instances of the term across the selected range or sheet.
5. Review Changes
The column you are searching in (first column of the range) should contain the values you want
to search for.
Example:
Suppose your data is in range A2:C10, and you want to search for the value "Product A" in
column A and return its price from column B. Use this formula:
=VLOOKUP("Product A", A2:C10, 2, FALSE)
3. Adjust parameters:
Replace "Product A" with a cell reference (e.g., D2) if the search key is in another cell.
Adjust the index number to match the column you want to retrieve data from.
4. Press Enter:
Google Sheets will display the result if the search key is found. If not, you'll see #N/A.
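The search key, range, and index parameters behave like an exact-match lookup, which can be sketched in Python as follows. The table contents and the `lookup` function are hypothetical. Returning `None` stands in for the #N/A result, and reading the formula as an exact-match lookup is an assumption.

```python
# Hypothetical rows standing in for the range A2:C10: (product, price, stock).
table = [
    ("Product A", 25.0, 10),
    ("Product B", 40.0, 4),
    ("Product C", 12.5, 30),
]

def lookup(search_key, rows, index):
    """Return the value in the 1-based column `index` of the first row
    whose first column equals `search_key`, or None if no row matches."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None  # the sheet would show #N/A here

price = lookup("Product A", table, 2)
missing = lookup("Product Z", table, 2)
print(price)    # 25.0
print(missing)  # None
```

Changing `index` from 2 to 3 returns the third column instead, just as adjusting the index number in the sheet formula does.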
COUNTIF
1.6 You can use the COUNTIF function to track sales figures, monitor project
completions, and analyse customer data.
You can also use the COUNTIFS function to count the number of times all criteria are
met across multiple ranges.
Step-by-Step Guide:
Go to your Google Sheets document or create a new one by visiting Google Sheets.
Click on the cell where you want the result of the COUNTIF formula to appear.
Type the formula =COUNTIF(range, criterion). Replace range with the range of cells you want
to evaluate, and criterion with the condition for counting.
For example:
To count how many times the word "Apple" appears in cells A1 to A10, you would type:
=COUNTIF(A1:A10, "Apple")
4. Press Enter:
After typing the formula, press Enter. Google Sheets will evaluate the condition and display the
result in the selected cell.
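The counting logic of COUNTIF and COUNTIFS can be sketched in Python. The cell values and both function names are hypothetical stand-ins. Matching here ignores letter case, which is assumed to mirror how COUNTIF treats text, and only simple equality criteria are handled.

```python
def countif(values, criterion):
    """Count cells equal to `criterion`, ignoring letter case."""
    target = str(criterion).lower()
    return sum(1 for v in values if str(v).lower() == target)

def countifs(*range_criterion_pairs):
    """Count row positions where every (values, criterion) pair matches."""
    length = len(range_criterion_pairs[0][0])
    return sum(
        1
        for i in range(length)
        if all(
            str(values[i]).lower() == str(criterion).lower()
            for values, criterion in range_criterion_pairs
        )
    )

# Hypothetical contents of A1:A10 and B1:B10.
fruits = ["Apple", "Banana", "apple", "Cherry", "Apple",
          "", "Banana", "Apple", "Grape", "Kiwi"]
regions = ["North", "North", "South", "South", "North",
           "North", "South", "South", "North", "North"]

apple_count = countif(fruits, "Apple")
apple_north = countifs((fruits, "Apple"), (regions, "North"))
print(apple_count)  # 4
print(apple_north)  # 2
```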
Key features:
Usage Metrics: Insights on how reports and dashboards are being used.
Power BI Desktop is a desktop application used for creating reports, queries, and data
models.
Key features:
Query Editor: Allows users to shape and transform data before loading it into Power
BI.
Data Modelling: Users can create relationships between tables, define calculated
columns, and build measures using DAX (Data Analysis Expressions).
Visualizations: Drag-and-drop interface for creating visual reports, charts, and tables.
Publish to Service: Once reports are built, users can publish to the Power BI Service.
The Power BI Admin Console is designed for administrators to manage the Power BI
tenant. It's available to Power BI Service admins.
Key features:
Audit Logs: View and export activity logs to track usage and access events.
Capacity Metrics: Monitor the performance and resource usage of Power BI Premium
capacities.
Tenant Settings: Configure service settings and governance policies like dataset size
limits, sharing settings, and data retention.
Power BI also provides PowerShell commands for automating administrative tasks. They
are useful for bulk management or automation.
Key features:
Power BI Cmdlets: Automate tasks like dataset refreshes, managing user permissions,
workspace creation, etc.
The Power BI REST API allows programmatic access to manage content and services
within Power BI.
Key features:
Dataset and Report Management: Automate processes like dataset refresh, report
publishing, or data integration.
Each of these consoles plays a critical role in managing and troubleshooting Power BI
environments, either at the individual report level or the organizational/tenant level.
2.2 Data Uploading
Data uploading in Power BI is the process of adding files to your workspace so you can
analyze and visualize data.
2. Get Data:
Choose your data source type (Excel, SQL Server, Web, Text/CSV, etc.).
Select the appropriate data source (for example, choose Excel for an Excel file).
Browse to the file location or provide connection details (for databases or online
sources).
4. Load Data:
After selecting the data source, click Connect (for databases or online sources).
You can preview the data and apply transformations if needed (click Transform Data to
open Power Query for more options).
5. Load into Power BI:
After selecting the desired tables, click Load to import the data into Power BI.
Once the data is loaded, you can start building your report by dragging fields from the
Fields pane to the report canvas.
Save the Power BI report on your local machine or publish it to the Power BI service for
sharing and further collaboration.
2.3 DATA MODELING
Data modeling in Power BI is the process of connecting multiple data sources and establishing
relationships between them to create a foundation for reports and analysis.
Connect to Data Sources: Power BI allows you to connect to various data sources like databases
(SQL, Oracle), cloud services (Azure, Google Analytics), Excel files, and web data.
Load Data: After connecting, import the data into Power BI using the Power Query Editor.
Clean Data: Remove errors, duplicate records, and handle missing values.
Shape Data: Transform the data by filtering, splitting columns, creating new calculated columns,
changing data types, etc.
3. Define Relationships
Identify Tables and Keys: Determine which tables are related to each other, and create
relationships based on primary and foreign keys.
Create Relationships: Use the "Model" view to define the relationships between tables (e.g.,
one-to-many, many-to-many).
Set Relationship Properties: Define cardinality, cross-filter direction, and other properties of
relationships.
Calculated Columns: These are columns created using DAX (Data Analysis Expressions)
formulas. They are computed row by row when the data is loaded or refreshed.
Measures: Measures are dynamic calculations used in reports that aggregate or analyze data
(e.g., sum, average, count).
KPIs: Key Performance Indicators can be defined using DAX for performance tracking.
5. Hierarchies and Aggregations
Create Hierarchies: Create logical groupings like Date hierarchy (Year > Quarter > Month > Day)
to drill down in reports.
Set Aggregation: Specify how Power BI should aggregate data for measures, such as sum,
average, or distinct counts.
Star Schema vs Snowflake Schema: Organize data into a star schema (fact tables and dimension
tables) for better performance and simplicity, or snowflake schema if normalization is necessary.
Optimize Data Types: Reduce storage size by choosing appropriate data types for columns (e.g.,
Integer vs Decimal).
Handle Large Data Models: Use techniques like data reduction, aggregation tables, and
incremental data loads to improve performance.
Define Roles: Create roles in Power BI to control access to data based on user identity.
Set Row-Level Security (RLS): Define filters on tables so that users see only the data that they are
permitted to see.
Check for Errors: Ensure that all relationships, calculations, and filters are functioning as
expected.
Test the Model: Use the "Data" and "Model" views to review the tables, relationships, and
calculations for consistency and accuracy.
Publish to Power BI Service: Once the model is ready, publish it to the Power BI Service for
sharing with users and collaborating.
Schedule Refresh: Set up scheduled refresh for the dataset so the model gets updated with new
data automatically.
10. Create Reports and Dashboards
Build Visualizations: Use the data model to create interactive charts, tables, and graphs in Power
BI reports.
Create Dashboards: Aggregate the reports into dashboards for sharing insights across teams.
2.4 PIE CHART
A pie chart in Power BI displays each category's value as a proportion of the whole. Pie charts
are a type of shape chart, that is, a chart without axes. When a numeric field is dropped onto a
shape chart, Power BI calculates each value's percentage of the total.
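The percentage-of-total calculation behind each slice can be sketched in Python. The regional sales figures and the `slice_percentages` name are hypothetical.

```python
def slice_percentages(values_by_category):
    """Return each category's share of the total, as a percentage."""
    total = sum(values_by_category.values())
    if total == 0:
        raise ValueError("total must be non-zero to compute percentages")
    return {name: 100 * value / total
            for name, value in values_by_category.items()}

# Hypothetical Values field (sales) split by a Legend field (region).
sales = {"North": 300, "South": 150, "East": 50}
shares = slice_percentages(sales)
print(shares)  # {'North': 60.0, 'South': 30.0, 'East': 10.0}
```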
Import your dataset by clicking on the "Home" tab and selecting "Get Data". Choose your data
source and load the data into Power BI.
Once your data is loaded, click on the "Report" view (the canvas area).
In the Visualizations pane (on the right), click on the Pie Chart icon (it looks like a circle divided
into sections).
This will add a blank pie chart visual to the report canvas.
Drag and drop the data fields into the Values and Legend sections:
Values: This is typically the numerical data you want to visualize (e.g., sales, revenue).
Legend: This field represents categories (e.g., product categories, regions) that will split the pie
chart.
To customize the chart, use the Format pane (paint roller icon). Here you can adjust:
Details: Control the slice labels, percentages, and other visual details.
Once satisfied with your pie chart, save the report by clicking "File" > "Save As".
2.5 HISTOGRAM CHART
A histogram in Power BI is a visualization chart that shows how data is distributed in a dataset.
It's a bar chart that groups data points into bins and shows the number of data points in each
bin.
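Grouping values into bins and counting them can be sketched in Python. The ages, the bin size of 10, and the `histogram` function are hypothetical. Each bin is labelled by its lower edge, so the bin 20 covers values from 20 up to, but not including, 30.

```python
from collections import Counter

def histogram(values, bin_size):
    """Count how many values fall into each fixed-width bin.

    Bins are keyed by their lower edge.
    """
    counts = Counter((v // bin_size) * bin_size for v in values)
    return dict(sorted(counts.items()))

# Hypothetical continuous field, e.g. Age.
ages = [21, 25, 29, 33, 34, 38, 41, 45, 47, 52]
bins = histogram(ages, 10)
print(bins)  # {20: 3, 30: 3, 40: 3, 50: 1}
```

A smaller bin size gives more, narrower bars, while a larger one smooths the distribution into fewer bars.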
Ensure you have the data you want to visualize in Power BI. A histogram typically requires
continuous numerical data to create bins.
Load your dataset by clicking on Home > Get Data and selecting the appropriate source (Excel,
SQL Server, etc.).
Once the data is loaded, you will see it in the Fields pane.
In the Visualizations pane, there is no direct "Histogram" chart type, but you can use the Column
chart or Bar chart and adjust it to resemble a histogram.
To create bins (ranges for the histogram), you need to define the bin size. Apply a bin to the
field itself (e.g., Age, Income, etc.) using the grouping feature:
In the Fields pane, right-click on the numeric field (e.g., Age) and select New Group. Then, select
the bin size or define custom bin ranges.
Drag the numerical field (e.g., Age) to the Axis well of the chart.
Drag the same field or a corresponding count to the Values well (this counts the number of
entries in each bin).
You can further adjust the bin size by modifying the grouping in the Fields pane. Right-click the
group and choose Edit Group to adjust bin size or range.
For more flexibility, consider using DAX functions to create custom bins or calculate bin ranges.
After you’ve set up the histogram, you can format it as needed. Click on the Format pane (paint
roller icon) and customize the chart:
Once the data is visualized, ensure that the bins and frequencies are displayed correctly.
Modify the chart further as needed (e.g., by adjusting the axis, labels, or appearance).
These steps will help you create a basic histogram in Power BI.
SPSS
SPSS (Statistical Package for the Social Sciences), now developed by IBM as IBM SPSS Statistics
following IBM's 2009 acquisition of SPSS Inc., is a user-friendly software package used to
analyze statistical data and make data-driven decisions.
You can enter data directly into the SPSS Data View, similar to entering data in a spreadsheet.
Each row represents an observation or case, and each column represents a variable.
Excel Files: You can import data from Excel (.xlsx, .xls) files by going to File > Open > Data and
selecting the Excel file.
CSV Files: Comma-separated value files (.csv) can also be loaded into SPSS by selecting File >
Open > Data and choosing the .csv file.
Text Files: Text files (.txt) can be imported using the Text Wizard to specify delimiters and other
file parameters.
Databases: SPSS can also connect to SQL databases, allowing you to import data directly.
3. Using Syntax:
SPSS also supports syntax commands to load data programmatically, which can be useful for
automation or reproducibility. For example:
After loading data, SPSS can perform a variety of statistical operations, such as descriptive
statistics, hypothesis testing, regression analysis, etc., based on the imported data.
3.2 One Sample T test
A one-sample t-test is a statistical test used to determine whether the mean of a single sample
is significantly different from a known or hypothesized population mean. It compares the
sample mean to the population mean while accounting for the sample's size and variability.
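The test statistic is t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom, where s is the sample standard deviation. A minimal Python sketch of that calculation follows. The scores are hypothetical and `one_sample_t` is an illustrative name. SPSS additionally reports the p-value and confidence interval, which this sketch omits.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample SD, divides by n - 1
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1

# Hypothetical test scores compared with a population mean of 75.
scores = [72, 78, 80, 75, 85, 79, 74, 81]
t_stat, df = one_sample_t(scores, 75)
print(round(t_stat, 3), df)
```

A larger absolute t value means the sample mean lies further from the test value relative to the variability in the data.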
Open SPSS and load your dataset. If you’re entering data manually, go to the Data View and
enter your variable in one column (for example, "Scores" or "Weights").
From the dropdown menu, select Compare Means, then choose One-Sample T Test.
In the "One-Sample T Test" dialog box, move your test variable (e.g., the column with the data
you want to test) into the Test Variable(s) box. This is the variable you want to compare to the
population mean.
In the Test Value box, enter the population mean (the value you want to compare your sample
mean to). For example, if you are comparing the sample mean of a group of students' test scores
to a known population mean of 75, enter 75.
Click on the Options button if you want to change the confidence level or other settings. By
default, SPSS uses a 95% confidence level.
Open your dataset in SPSS where you have the two related variables (e.g., pre-test and post-test
scores for the same subjects).
From the dropdown, choose Compare Means and then select Paired-Samples T Test.
Select the two variables you want to compare (e.g., pre-test and post-test scores) and move
them to the Paired Variables box. SPSS will ask you to specify pairs, so ensure the correct pairing.
5. Check Options:
You can click Options to set the confidence interval and other statistics if needed, but the default
settings are typically fine.
If the p-value is less than your alpha level (usually 0.05), you can reject the null hypothesis,
indicating a significant difference between the paired groups.
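A paired-samples t-test is a one-sample t-test on the per-subject differences, tested against zero. The sketch below assumes hypothetical pre-test and post-test scores, and `paired_t` is an illustrative name. It returns only the t statistic and degrees of freedom, not the p-value that SPSS prints.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical scores for the same five subjects.
pre_test = [60, 65, 70, 58, 75]
post_test = [66, 70, 72, 63, 79]
t_stat, df = paired_t(pre_test, post_test)
print(round(t_stat, 3), df)
```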
3.4 Two Sample T Test
2. Check Your Data: Ensure that your data is in the correct format:
One column for the dependent variable (the measurement you want to compare).
One column for the independent variable (the grouping variable, which indicates the two groups
being compared).
In the dialog box, move the dependent variable (the one you are comparing) to the Test
Variable(s) box.
Move the grouping variable (the one that defines the two groups) to the Grouping Variable box.
5. Define Groups:
In the pop-up window, specify the values for the two groups you want to compare (e.g., Group
1: 1 and Group 2: 2, depending on your data).
Click Continue.
6. Select Additional Options (optional):
Click Continue.
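The data layout described above, one grouping column and one dependent-variable column, can be used to compute the equal-variances (pooled) t statistic, as in this Python sketch. The rows and the `two_sample_t` name are hypothetical. The version of the test that does not assume equal variances is not covered here.

```python
import math
import statistics

def two_sample_t(rows, group_1, group_2):
    """Return (t statistic, degrees of freedom) using the pooled variance.

    `rows` holds (group code, value) pairs.
    """
    a = [value for group, value in rows if group == group_1]
    b = [value for group, value in rows if group == group_2]
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical data: the group codes 1 and 2 match the Define Groups step.
data = [(1, 10), (1, 12), (1, 14), (1, 16), (1, 18),
        (2, 8), (2, 9), (2, 11), (2, 12), (2, 15)]
t_stat, df = two_sample_t(data, 1, 2)
print(round(t_stat, 4), df)
```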
Ensure your data is in the correct format for the Chi-square test. Each variable should be
categorical (nominal or ordinal), and each row should represent an individual observation.
2. Open SPSS
3. Go to Crosstabs
4. Select Variables
Move one categorical variable (e.g., Gender) into the Row(s) box.
Move the other categorical variable (e.g., Voting Preference) into the Column(s) box.
Click Continue.
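The chi-square statistic for a crosstab compares each observed cell count with the count expected if the two variables were independent: expected = row total × column total / N. A Python sketch follows. The observations are hypothetical, with one (Gender, Voting Preference) pair per individual, and `chi_square` is an illustrative name.

```python
from collections import Counter

def chi_square(pairs):
    """Return (chi-square statistic, degrees of freedom) for a list of
    (row category, column category) observations."""
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            expected = row_totals[row] * col_totals[col] / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical survey: one entry per respondent.
observations = (
    [("Male", "Candidate A")] * 30 + [("Male", "Candidate B")] * 20
    + [("Female", "Candidate A")] * 20 + [("Female", "Candidate B")] * 30
)
chi2, df = chi_square(observations)
print(chi2, df)  # 4.0 1
```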
Ensure that your data is well-organized, with each variable as a column and each observation as
a row.
Check for missing values, as they can interfere with the analysis. You might want to either
remove or impute missing values before proceeding.
4. You can choose between K-Means Cluster (for partitioning cases into a specified number of
clusters) or Hierarchical Cluster (for building a tree of nested clusters when the number of
clusters is not known in advance).
1. Select Variables: Choose the variables you want to use for clustering (e.g., those that describe
your subjects or cases).
2. Choose the Number of Clusters: Under "Number of Clusters," specify the number of clusters
you want to form.
3. Options: You can choose to standardize the data (important if the variables are on different
scales) and set criteria for convergence.
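The K-Means idea is to assign each case to its nearest cluster center, recompute the centers, and repeat until nothing changes. It can be sketched in plain Python as below. This is not the SPSS implementation: the data points are hypothetical, the initial centers are simply the first k cases (an assumption made to keep the example deterministic), and standardization is omitted.

```python
import math

def kmeans(points, k, max_iter=100):
    """Return (cluster centers, cluster label for each point)."""
    centers = [points[i] for i in range(k)]   # assumed init: first k cases
    labels = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            # Assign the case to the nearest center (Euclidean distance).
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
            labels.append(nearest)
        # Recompute each center as the mean of its members.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:            # converged: centers unchanged
            break
        centers = new_centers
    return centers, labels

# Hypothetical cases measured on two variables.
cases = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
         (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centers, labels = kmeans(cases, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The `labels` list plays the role of the cluster membership variable, and `centers` corresponds to the final cluster centers table.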
2. Choose Clustering Method: In the "Method" section, select a method (e.g., Ward's method,
Average Linkage).
3. Choose a Distance Measure: Typically, Squared Euclidean distance is used for continuous data.
K-Means: SPSS will give you the cluster centers, the number of cases in each cluster, and the
ANOVA table to check the between-group differences.
Hierarchical: The output includes a dendrogram, which helps visualize the hierarchical structure
of your data. You can also get a cluster membership table to see which case belongs to which
cluster.
Review the cluster centers (for K-Means) or the dendrogram (for hierarchical) to understand the
characteristics of each cluster.
You can visualize clusters using scatter plots or other graphical methods, especially if you
reduced the number of variables through factor analysis.
You can save each case's cluster membership as a new variable (e.g., "Cluster") by selecting
the Save option in the Cluster Analysis dialog box.