Data Analytics Practical PDF
Course: B.Tech
Session: 2023-24
LAKSHMI NARAIN COLLEGE OF TECHNOLOGY EXCELLENCE, BHOPAL
Data Analysis
Data analysis is the process of cleansing, transforming, and analyzing raw data to obtain usable, relevant information that can assist businesses in making educated decisions. By giving relevant insights and data, which are commonly presented in charts, images, tables, and graphs, the technique helps to lessen the risks associated with decision-making. When it comes to implementing effective data analysis in Excel, the robust capabilities of the software enhance the entire process. Excel's features, including pivot tables, data tables, and various statistical functions, play a vital role in streamlining and optimizing data analysis. This synergy between data analysis and Excel empowers users to navigate and derive meaningful insights from complex datasets. Data analytics encompasses not just data analysis, but also data collection, organization, storage, and the tools and techniques used to delve deeper into data, as well as those used to present the findings, such as data visualization tools. Data analysis, on the other hand, is concerned with the process of transforming raw data into meaningful statistics, information, and explanations. Data visualization is an interdisciplinary field concerned with depicting data graphically. When the data is large, such as in a time series, it is a very effective manner of conveying information. The mapping establishes how the components' characteristics change in response to the data. A bar chart, in this sense, maps a variable's magnitude to the length of a bar. Mapping is a basic component of data visualization, since the graphic design of the mapping can negatively affect the reading of a chart.
The iterative data analysis process comprises the following phases:
Experiment 1
Microsoft Excel allows you to examine and interpret data in a variety of ways. The information could come from several different places, and a variety of formats and conversions are available for the data. Conditional Formatting, Ranges, Tables, Text functions, Date functions, Time functions, Financial functions, Subtotals, Quick Analysis, Formula Auditing, the Inquire Tool, What-If Analysis, Solver, the Data Model, Power Pivot, Power View, Power Map, and other Excel commands, functions, and tools can all be used to analyse it.
Essential Excel Data Analysis Functions
Excel has hundreds of functions, and trying to match the proper formula with the right kind of analysis can be overwhelming. The most valuable functions don't have to be difficult. You'll wonder how you ever lived without these fifteen easy functions that will increase your ability to interpret data.
1. Concatenate
When conducting data analysis, the formula =CONCATENATE is one of the simplest to understand but most powerful. Text, numbers, dates, and other data from numerous cells can be combined into a single cell.
SYNTAX = CONCATENATE (text1, text2, [text3], …)
2. Len()
In data analysis, LEN is used to show the number of characters in each cell. It's frequently utilised when working with text that has a character limit or when attempting to distinguish between product numbers.
SYNTAX = LEN (text)
3. Days()
The =DAYS function calculates the number of calendar days between two dates.
SYNTAX = DAYS (end_date, start_date)
4. Networkdays
This function automatically excludes weekends. It's classified as a Date/Time function in Excel. The NETWORKDAYS function is used in finance and accounting for determining employee benefits based on days worked, the number of working days available throughout a project, or the number of business days required to resolve a customer problem, among other things.
SYNTAX = NETWORKDAYS (start_date, end_date, [holidays])
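To make the weekday-counting logic concrete, the same calculation can be sketched in a few lines of Python. This is an illustrative stand-in, not part of the Excel workflow; the function name networkdays is our own:

```python
from datetime import date, timedelta

def networkdays(start, end, holidays=()):
    # Count weekdays from start to end inclusive, skipping any dates
    # listed in holidays - mirroring Excel's NETWORKDAYS semantics.
    days = 0
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days += 1
        d += timedelta(days=1)
    return days

# Mon 1 Jan 2024 through Mon 8 Jan 2024: six weekdays,
# or five if 1 Jan is declared a holiday
print(networkdays(date(2024, 1, 1), date(2024, 1, 8)))
print(networkdays(date(2024, 1, 1), date(2024, 1, 8), holidays={date(2024, 1, 1)}))
```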
5. Sumifs()
One of the “must-know” formulas for a data analyst is =SUMIFS. =SUM is a familiar formula,
but
what if you need to sum data based on numerous criteria? It’s SUMIFS.
SYNTAX = SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)
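To make the multi-criteria idea concrete, here is a small Python sketch of what =SUMIFS computes; the worksheet data is invented for illustration:

```python
# Parallel "columns", as they might appear on a worksheet
sales   = [100, 200, 150, 300]
region  = ["East", "West", "East", "East"]
product = ["A", "A", "B", "A"]

# Equivalent of =SUMIFS(sales, region, "East", product, "A"):
# sum only the rows where every criterion matches
total = sum(s for s, r, p in zip(sales, region, product)
            if r == "East" and p == "A")
print(total)  # 100 + 300 = 400
```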
6. Averageifs()
AVERAGEIFS, like SUMIFS, lets you take an average based on one or more criteria.

Rank
RANK is utilised in this case to determine which clients order the most.
SYNTAX = RANK (number, ref, [order])

Some of the Methods for Data Analysis in Excel
1. Ranges and Tables
The information you have can be in the form of a table or a range, and certain actions can be performed on it either way. Certain procedures, however, are more effective when data is stored in tables rather than ranges, and some operations are only applicable to tables. You will also gain an understanding of how to analyze data in ranges and tables: how to name ranges, how to utilise them, and how to manage them. The same may be said for table names.
2. Data Cleaning – Text Functions, Dates and Times
Before moving on to data analysis, you must clean and organize the data you've gathered from multiple sources. The following approaches can be used to clean data in Excel.
9. Data Visualization in Excel
Charts are simple to make and display data in a variety of ways, making them more helpful than a plain sheet. You can make a chart, modify its type, and adjust the rows or columns, the legend location, and the data labels. Column Chart, Line Chart, Pie Chart, Bar Chart, Area Chart, and Scatter Plot are some of the different types of charts provided in Microsoft Excel.
10. Data Validation
Only valid values may need to be entered into cells; otherwise, they risk producing erroneous results. Using data validation commands, you can rapidly set up validation values for a cell, an input message prompting the user on what should be typed in the cell, validate the values entered against the supplied criteria, and display an error message in the case of incorrect entries.
11. Financial Analysis
Excel has several financial functions, and you can learn to employ a combination of them to solve common situations that need financial analysis.
12. Working with Multiple Worksheets
It's possible that you'll need to run multiple identical calculations in different worksheets. Instead of duplicating these calculations in each worksheet, you can complete them in one and have them display in all of the others. You may also use a report worksheet to compile the data from the multiple worksheets.
13. Formula Auditing
When you utilise formulas, you should double-check that they are working correctly. Formula Auditing commands in Excel help you trace precedents and dependents and perform error checking.
14. Pivot Tables
You can extract critical data from a large dataset using pivot tables. This form of data analysis is among the most practical. You can drag fields, sort, filter, and adjust the summary calculation after a Pivot Table has been inserted. Pivot Tables can also be made in two dimensions. The functions of Group Pivot Table Items, Multi-level Pivot Table, Frequency Distribution, Pivot Chart, Slicers, Update Pivot Table, Calculated Field/Item, and GetPivotData are all essential.
Experiment 2
Correlation:
1. Data Preparation:
Organize your data into two columns (let's say Column A and Column B).
2. Correlation Calculation:
In a new cell, use the following formula:
=CORREL(A1:A10, B1:B10)
Replace A1:A10 and B1:B10 with your actual data ranges.
Regression:
1. Data Preparation:
Similar to correlation, organize your data into two columns (let's say Column A for the independent
variable and Column B for the dependent variable).
2. Regression Calculation:
In a new cell, use the following formula:
=LINEST(B1:B10, A1:A10, TRUE, TRUE)
Replace B1:B10 and A1:A10 with your actual data ranges. The TRUE parameters include the intercept and
statistics.
The LINEST function returns an array with slope, y-intercept, standard errors, and other regression
statistics.
Covariance:
1. Data Preparation:
Again, organize your data into two columns (let's say Column A and Column B).
2. Covariance Calculation:
In a new cell, use the following formula:
=COVARIANCE.P(A1:A10, B1:B10)
Replace A1:A10 and B1:B10 with your actual data ranges.
Note: COVARIANCE.P calculates population covariance, while COVARIANCE.S calculates
sample covariance. Choose the appropriate one based on your data.
Experiment 3
Basic syntax of the R programming language for data analytics. R is a powerful statistical programming language commonly used for data analysis, statistical modeling, and visualization. Here are some fundamental aspects of R syntax:
1. Assigning Values:
Use the assignment operator <- or = to assign values to variables.
x <- 10
y = 5
2. Data Types:
R supports various data types, including numeric, character, logical, and more.
numeric_var <- 3.14
character_var <- "Hello, World!"
logical_var <- TRUE
3. Vectors:
Vectors are fundamental data structures in R.
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "orange")
4. Indexing and Slicing:
Access elements in a vector using square brackets.
numeric_vector[2]    # Access the second element
character_vector[1:2] # Access the first two elements
5. Matrices:
Create matrices using the matrix() function.
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
6. Data Frames:
Data frames are used for tabular data.
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Grade = c("A", "B", "C")
)
7. Functions:
Define and use functions.
square <- function(x) {
  return(x^2)
}
result <- square(4)
8. Conditional Statements:
Use if, else if, and else for conditional logic.
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
9. Loops:
Use for and while loops for iteration.
for (i in 1:5) {
  print(i)
}
j <- 1
while (j <= 5) {
  print(j)
  j <- j + 1
}
10. Packages:
Install and load packages for additional functionality.
install.packages("tidyverse") # Install the tidyverse package
library(tidyverse)            # Load the tidyverse package
Experiment 4
We implement matrices, arrays, and factors in R, and then perform some basic operations, including calculating the variance.
Matrices:
# Creating a matrix
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Print the matrix
print(mat)
Arrays:
# Creating an array
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3, 1))
# Print the array
print(arr)
Factors:
# Creating a factor
gender <- c("Male", "Female", "Male", "Female", "Male")
factor_gender <- factor(gender)
# Print the factor
print(factor_gender)
Variance Calculation:
Now, let's perform variance calculations on a numeric vector, a matrix, and an array.
Variance of Numeric Vector:
# Numeric vector
numeric_vector <- c(2, 4, 6, 8, 10)
# Calculate variance
variance_numeric <- var(numeric_vector)
# Print the result
print(variance_numeric)
Variance of Matrix:
# Matrix
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Calculate variance for each column
variance_matrix <- apply(matrix_data, 2, var)
# Print the result
print(variance_matrix)
Variance of Array:
# Array
array_data <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3, 1))
# Calculate the variance of each 2 x 3 slice along the third dimension.
# (Applying var over margins c(1, 2) would compute the variance of a single
# value per cell and return NA, because the third dimension has length 1.)
variance_array <- apply(array_data, 3, var)
# Print the result
print(variance_array)
Experiment 5
Data frames are a fundamental data structure in R, widely used for storing tabular data. They allow you to organize data in rows and columns, similar to a spreadsheet. Here's a basic implementation and use of data frames in R:

Creating a Data Frame:
# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Grade = c("A", "B", "C"),
  stringsAsFactors = FALSE # Optional: prevent character vectors from becoming factors
)
# Print the data frame
print(df)

Accessing and Modifying a Data Frame:
# Accessing a specific column
ages <- df$Age
# Accessing a specific row
row_bob <- df[2, ]
# Modifying a value in the data frame
df[3, "Grade"] <- "B"
# Print the modified data frame
print(df)

Adding and Removing Columns:
# Adding a new column
df$City <- c("New York", "San Francisco", "Los Angeles")
# Removing a column (the fourth column, City, added above)
df <- df[, -4]
# Print the updated data frame
print(df)

Filtering and Subsetting:
# Filtering rows based on a condition
young_people <- df[df$Age < 30, ]
# Subsetting columns
subset_df <- df[, c("Name", "Age")]
# Print the filtered and subset data frames
print(young_people)
print(subset_df)

Summary Statistics:
# Summary statistics for a numeric column
summary_stats <- summary(df$Age)
# Print the summary statistics
print(summary_stats)

Merging Data Frames:
# Creating a second data frame
df2 <- data.frame(
  Name = c("David", "Eva"),
  Age = c(28, 35),
  Grade = c("B", "A"),
  stringsAsFactors = FALSE
)
# Merging data frames (rbind requires matching column names)
merged_df <- rbind(df, df2)
# Print the merged data frame
print(merged_df)
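For comparison, these data-frame operations have direct pandas equivalents in Python. This mirror is illustrative and not part of the R experiment itself:

```python
import pandas as pd

# Same table as the R example above
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22],
                   "Grade": ["A", "B", "C"]})

young_people = df[df["Age"] < 30]      # like df[df$Age < 30, ]
subset_df = df[["Name", "Age"]]        # like df[, c("Name", "Age")]

df2 = pd.DataFrame({"Name": ["David", "Eva"],
                    "Age": [28, 35],
                    "Grade": ["B", "A"]})
merged_df = pd.concat([df, df2], ignore_index=True)  # like rbind(df, df2)
print(merged_df)
```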
Experiment 6
The t() function is used to transpose the data frame. It swaps rows and columns, effectively turning columns into rows and vice versa.
Keep in mind that transposing a data frame may not always be suitable, especially if the data has mixed types or if you are working with a large dataset. However, in some scenarios, transposing can be useful for reshaping data.
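The same reshaping idea carries over to pandas in Python, where the .T attribute plays the role of R's t(); the small table here is invented for illustration:

```python
import pandas as pd

# Students as rows, subjects as columns
df = pd.DataFrame({"Math": [90, 80], "Science": [85, 95]},
                  index=["Alice", "Bob"])

# .T swaps rows and columns, like t() in R
transposed = df.T  # subjects become rows, students become columns
print(transposed)
```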
Experiment 7
A brief discussion of various control structures in R, followed by data manipulation using the dplyr package.

Control Structures in R:

1. if-else Statements:
# Example of an if-else statement
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}

2. for Loops:
# Example of a for loop
for (i in 1:5) {
  print(i)
}

3. while Loops:
# Example of a while loop
j <- 1
while (j <= 5) {
  print(j)
  j <- j + 1
}

4. Switch Case:
# Example of switch
day <- "Monday"
switch(
  day,
  "Monday" = print("It's the start of the week."),
  "Friday" = print("It's almost the weekend."),
  print("It's a regular day.")
)

Data Manipulation with the dplyr Package:
The dplyr package is widely used for data manipulation in R. It provides a set of functions that make it easy to manipulate and analyze data frames.

Installation:
install.packages("dplyr")
library(dplyr)

Examples of dplyr Functions:

1. Filtering Rows:
# Filter rows based on a condition
filtered_data <- filter(data, Age < 30)

2. Selecting Columns:
# Select specific columns
selected_data <- select(data, Name, Age, Grade)

3. Mutating (Adding/Modifying) Columns:
# Add a new column for the total score
mutated_data <- mutate(data, Total_Score = Score1 + Score2)

4. Arranging Rows:
# Arrange rows based on a column
arranged_data <- arrange(data, desc(Age))

5. Summarizing Data:
# Summarize data (Total_Score was added by mutate() above)
summarized_data <- summarize(mutated_data, Avg_Score = mean(Total_Score))

These are just a few examples of the powerful data manipulation functions provided by the dplyr package. The syntax is designed to be expressive and readable, making it easier to write and understand complex data manipulations in R.
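The dplyr verbs map almost one-to-one onto pandas operations in Python. A sketch of the same pipeline, with the data frame and Score columns invented for illustration:

```python
import pandas as pd

data = pd.DataFrame({"Name": ["A", "B", "C"],
                     "Age": [25, 35, 28],
                     "Score1": [10, 20, 30],
                     "Score2": [5, 15, 25]})

filtered = data[data["Age"] < 30]                  # filter(data, Age < 30)
selected = data[["Name", "Age"]]                   # select(data, Name, Age)
mutated = data.assign(Total_Score=data["Score1"] + data["Score2"])  # mutate()
arranged = data.sort_values("Age", ascending=False)  # arrange(data, desc(Age))
avg_score = mutated["Total_Score"].mean()          # summarize(mean(Total_Score))
print(avg_score)
```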
Experiment 8
Matlab is a powerful tool for data visualization and analysis. Below is a simple example of data visualization using Matlab. In this example, we'll generate some sample data and create a scatter plot.

% Generate sample data
x = randn(100, 1);             % 100 random values from a normal distribution
y = 2*x + 0.5*randn(100, 1);   % Linear relationship with some noise
% Scatter plot
scatter(x, y, 'filled');
title('Scatter Plot');
xlabel('X-axis');
ylabel('Y-axis');
grid on;

Explanation of the code: randn(100, 1) draws 100 samples from a standard normal distribution, and y is generated as 2*x plus a smaller noise term, so the points follow a linear trend. scatter(x, y, 'filled') plots the point cloud with filled markers; title, xlabel, and ylabel label the figure; and grid on overlays grid lines.
Experiment 9
Using Numpy and Pandas in a Jupyter Notebook:
1. Install Numpy and Pandas:
!pip install numpy pandas
2. Import Libraries:
import numpy as np
import pandas as pd
3. Create Numpy Array:
# Creating a Numpy array
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
print("Numpy Array:")
print(numpy_array)
4. Create Pandas DataFrame:
# Creating a Pandas DataFrame from a Numpy array
pandas_df = pd.DataFrame(numpy_array, columns=['A', 'B', 'C'])
print("\nDataFrame:")
print(pandas_df)
5. Data Manipulation with Pandas:
# Adding a new column
pandas_df['D'] = [7, 8]
# Filtering rows
filtered_df = pandas_df[pandas_df['B'] > 2]
# Display the manipulated DataFrame
print("\nManipulated DataFrame:")
print(filtered_df)
6. Displaying DataFrame in a Table:
# Using the tabulate package to display the DataFrame in a table format
from tabulate import tabulate
print(tabulate(filtered_df, headers='keys', tablefmt='grid'))
Note: Make sure to install the tabulate package first using !pip install tabulate. (In a Jupyter Notebook, simply evaluating filtered_df in a cell also renders it as a table.)
Experiment 10
Data visualization is a crucial aspect of data analysis, and Python offers several powerful libraries for
creating visualizations. Two popular libraries for this purpose are Matplotlib and Seaborn. Let's go
through a basic study and implementation using these libraries.
Installation:
!pip install matplotlib seaborn
Importing Libraries:
import matplotlib.pyplot as plt
import seaborn as sns
Matplotlib:
Line Plot:
# Create data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Line plot
plt.plot(x, y, label='Line Plot')
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
Scatter Plot:
# Scatter plot
plt.scatter(x, y, label='Scatter Plot', color='red', marker='o')
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
Seaborn:
Pair Plot:
# Load a sample dataset
iris = sns.load_dataset('iris')
# Pair plot (a figure-level plot, so set the title on the underlying figure)
g = sns.pairplot(iris, hue='species')
g.fig.suptitle('Pair Plot Example')
plt.show()
Heatmap:
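The heatmap code itself is cut off in the source. As a stand-in, here is a minimal sketch of a seaborn correlation heatmap; the small DataFrame is invented for illustration (the original may well have used the iris dataset loaded above):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small numeric table, invented as a stand-in for the original data
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 6.2, 5.9],
                   "sepal_width":  [3.5, 3.0, 2.9, 3.0],
                   "petal_length": [1.4, 1.4, 4.3, 5.1]})

corr = df.corr()                                # pairwise correlation matrix
sns.heatmap(corr, annot=True, cmap='coolwarm')  # annotated heatmap of the matrix
plt.title('Heatmap Example')
plt.show()
```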