DSA Lab Manual Programs - Final
5. Regression
6. Z-test
7. T-test
8. ANOVA
AIM:
To write a Python program to work with pandas data frames.
ALGORITHM:
1. Import the pandas library: To use pandas, you need to import the library first.
2. Read data into a pandas data frame: Use the read_csv(), read_excel(), or read_sql() functions
to read data from a file or database into a pandas data frame.
3. Explore the data: Use functions like head(), tail(), info(), and describe(), and the shape attribute, to get a sense of the data you're working with.
4. Select and filter data: Use indexing and slicing to select subsets of the data. You can use the
loc[] and iloc[] methods to select rows and columns based on labels or indices. You can also
filter rows based on conditions using the query() or loc[] method.
5. Manipulate the data: Use pandas functions to manipulate the data, such as adding or dropping
columns, renaming columns, and aggregating data.
6. Handle missing data: Use functions like isnull(), dropna(), fillna(), and interpolate() to handle
missing data in the data frame.
7. Save the data: Use the to_csv(), to_excel(), or to_sql() methods to save the data frame to a file
or database.
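The steps above can be sketched end-to-end as follows; the file names employees.csv and cleaned.csv and the column names Name and Age are placeholders for whatever data you use.

import pandas as pd

# Step 2: read data from a CSV file (employees.csv is a placeholder name)
df = pd.read_csv('employees.csv')

# Step 3: explore the data
print(df.head())
print(df.info())

# Step 4: select and filter (assumes Name and Age columns exist)
subset = df.loc[df['Age'] > 30, ['Name', 'Age']]

# Step 5: manipulate the data - add a derived column
df['AgeNextYear'] = df['Age'] + 1

# Step 6: handle missing data by replacing NaN values with 0
df = df.fillna(0)

# Step 7: save the cleaned data frame
df.to_csv('cleaned.csv', index=False)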
PROGRAM:
import pandas as pd

# Create a data frame from a dictionary of lists
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'Country': ['USA', 'Canada', 'USA', 'Canada', 'USA']}
df = pd.DataFrame(data)

print("Original data frame:")
print(df)

# Select rows where Age is greater than 30
print("Select rows where Age > 30:")
print(df[df['Age'] > 30])
OUTPUT:
Original data frame:
Name Age Country
0 Alice 25 USA
1 Bob 30 Canada
2 Charlie 35 USA
3 Dave 40 Canada
4 Emily 45 USA
Select rows where Age > 30:
Name Age Country
2 Charlie 35 USA
3 Dave 40 Canada
4 Emily 45 USA
RESULT:
Thus the Python program to work with pandas data frames is executed successfully.
AIM:
To write a Python program to draw basic plots using Matplotlib.
ALGORITHM:
1. Import the necessary libraries: You'll need to import both NumPy and Matplotlib in order to
create plots.
2. Create data: You need data to plot! Create your data as NumPy arrays.
3. Create a figure and axes: Before you can plot anything, you need to create a figure and axes. The figure is the canvas that holds your plot, while the axes are the actual plot area.
4. Plot the data: Now it's time to plot the data. You can do this using the plot() function.
5. Customize the plot: You can customize your plot in many ways, including adding a title,
changing the axis labels, and adding a legend.
6. Save the plot: Finally, you can save your plot to a file using the savefig() function.
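As a rough sketch, the steps above map onto Matplotlib calls as follows (the file name basic_plot.png is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

# Step 2: create data as NumPy arrays
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Step 3: create a figure and axes
fig, ax = plt.subplots()

# Step 4: plot the data
ax.plot(x, y, label='sin(x)')

# Step 5: customize the plot
ax.set_title('Basic plot')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()

# Step 6: save the plot to a file
fig.savefig('basic_plot.png')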
PROGRAM:
import matplotlib.pyplot as plt

# Data to plot
a = [1, 2, 3, 4, 5]

# Plot the values of a against their indices
plt.plot(a)
plt.show()
OUTPUT:
RESULT:
Thus the Python program to draw basic plots using Matplotlib is executed successfully.
AIM:
To write a Python program for finding frequency distributions, averages, and variability.
ALGORITHM:
1. Initialize an empty dictionary called frequency_distribution.
2. Calculate the sum of the data and store it in a variable called total_sum
3. Calculate the length of the data and store it in a variable called n
4. Calculate the mean by dividing total_sum by n and store it in a variable called mean
5. Calculate the sum of the squared differences between each value in the data and the mean and
store it in a variable called squared_diff_sum
6. Calculate the variance by dividing squared_diff_sum by n-1 and store it in a variable called
variance
7. Loop over each value in the data: a. If the value is not in the frequency_distribution
dictionary, add it with a value of 1 b. If the value is already in the frequency_distribution
dictionary, increment its value by 1
8. Return the frequency_distribution, mean, and variance
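The program below uses NumPy's built-in functions; for comparison, here is a direct sketch of the algorithm above in plain Python, using a small made-up list (note that it computes the sample variance, dividing by n - 1, whereas NumPy's var() divides by n by default):

def describe(data):
    # Step 1: empty frequency distribution
    frequency_distribution = {}
    # Steps 2-4: mean
    total_sum = sum(data)
    n = len(data)
    mean = total_sum / n
    # Steps 5-6: sample variance (divide by n - 1)
    squared_diff_sum = sum((x - mean) ** 2 for x in data)
    variance = squared_diff_sum / (n - 1)
    # Step 7: count occurrences of each value
    for value in data:
        frequency_distribution[value] = frequency_distribution.get(value, 0) + 1
    # Step 8: return all three results
    return frequency_distribution, mean, variance

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))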
PROGRAM:
# Importing the NumPy module
import numpy as np

# Calculating the average of a list using average()
values = [2, 40, 2, 502, 177, 7, 9]
print(np.average(values))

# Calculating the variance of a list using var()
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.var(values))

# Calculating the standard deviation of a list using std()
values = [290, 124, 127, 899]
print(np.std(values))
OUTPUT:
105.57142857142857
4.0
318.35750344541907
RESULT:
Thus the Python program for finding frequency distributions, averages, and variability is executed successfully.
AIM:
The aim of finding normal curves, correlation and scatter plots, and correlation coefficients is to better
understand the relationship between variables in a dataset.
ALGORITHM:
Normal Curves:
To find a normal curve, we need to first calculate the mean and standard deviation of the dataset.
Then, we can use a statistical software or calculator to generate a graph of the normal distribution.
The algorithm for finding a normal curve is as follows:
a. Calculate the mean (µ) and standard deviation (σ) of the dataset.
b. Use a statistical software or calculator to generate a graph of the normal distribution.
c. Plot the normal curve on a graph with the x-axis representing the variable of interest and the
y-axis representing the frequency or probability.
Scatter Plots:
To create a scatter plot, we need to first collect data for two variables that we want to investigate.
Then, we can plot the data points on a graph and look for patterns or trends. The algorithm for
creating a scatter plot is as follows:
a. Collect data for two variables.
b. Plot the data points on a graph with one variable on the x-axis and the other variable on the y-
axis.
c. Look for patterns or trends in the data points.
Correlation Coefficient:
The correlation coefficient is a measure of the strength and direction of the relationship between two
variables. To calculate the correlation coefficient, we need to use a statistical formula or software. The
algorithm for calculating the correlation coefficient is as follows:
a. Collect data for two variables.
b. Calculate the mean (µ) and standard deviation (σ) of each variable.
c. Calculate the covariance (cov) between the two variables.
d. Calculate the correlation coefficient (r) using the formula:
r = cov / (σ1 * σ2)
where cov is the covariance between the two variables, σ1 is the standard deviation of variable 1, and
σ2 is the standard deviation of variable 2.
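As a quick check of this formula, the coefficient can be computed directly from the covariance and standard deviations and compared against NumPy's built-in corrcoef(); the data here is made up:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# r = cov / (σ1 * σ2), using sample statistics (ddof=1)
cov = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
r = cov / (x.std(ddof=1) * y.std(ddof=1))
print(r)

# Cross-check against NumPy's correlation matrix
print(np.corrcoef(x, y)[0, 1])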
PROGRAM 1:
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (assumed; the original values were not preserved):
# two linearly related variables
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)

# Scatter plot to look for patterns or trends
plt.scatter(x, y)
plt.show()
OUTPUT:
PROGRAM 2:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# The data (assumed placeholder sample; the original data set was not shown)
data = np.random.normal(loc=50, scale=5, size=1000)

# Fit a normal distribution: mean and standard deviation
mu, std = norm.fit(data)

# Plot the histogram of the data with the fitted normal curve on top
plt.hist(data, bins=25, density=True, alpha=0.6)
x = np.linspace(data.min(), data.max(), 100)
plt.plot(x, norm.pdf(x, mu, std))
plt.title("Fit results: mu = %.2f, std = %.2f" % (mu, std))
plt.show()
OUTPUT:
RESULT:
Thus the Python program for finding normal curves, correlation and scatter plots, and the correlation coefficient is executed successfully.
AIM:
The aim of regression analysis is to examine the relationship between a dependent variable and one or
more independent variables.
ALGORITHM:
1. Collect data: Collect data on the dependent variable and one or more independent variables.
2. Check for linearity: Check whether there is a linear relationship between the dependent
variable and the independent variables. This can be done by plotting the data on a scatter plot
and looking for a linear pattern.
3. Determine the regression equation: Determine the regression equation that best fits the data.
This can be done using the least squares method.
4. Test the regression equation: Test the regression equation to see if it accurately predicts the
value of the dependent variable. This can be done by comparing the predicted values with the
actual values.
5. Interpret the results: Interpret the results to draw conclusions about the relationship between
the dependent variable and the independent variables.
6. The formula for the regression equation is: y = b0 + b1x1 + b2x2 + ... + bnxn
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

# Sample driver data (assumed; the original driver code was not preserved)
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b_0, b_1 = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b_0, b_1))

# Plot the observations and the fitted regression line
plt.scatter(x, y)
plt.plot(x, b_0 + b_1*x)
plt.show()
OUTPUT:
RESULT:
Thus the Python program for regression is executed successfully.
AIM:
The aim of a z-test is to determine whether the mean of a sample is statistically significantly different
from the known or hypothesized population mean.
ALGORITHM:
1. State the null and alternative hypotheses: The null hypothesis (H0) is that there is no
significant difference between the sample mean and the population mean, while the
alternative hypothesis (Ha) is that there is a significant difference.
2. Determine the level of significance: Choose the level of significance, α, that will be used to
test the hypothesis. Typically, α is set at 0.05 or 0.01.
3. Collect data: Collect a random sample from the population of interest, and calculate the
sample mean and sample standard deviation.
4. Calculate the test statistic: Calculate the z-test statistic using the formula: z = (x̄ - μ) / (σ / √n),
where x̄ is the sample mean, μ is the population mean, σ is the population standard
deviation, and n is the sample size.
5. Determine the critical value: Determine the critical value of z at the chosen level of
significance (a short sketch of this comparison follows the algorithm).
6. Compare the test statistic to the critical value: If the test statistic is greater than the critical
value, reject the null hypothesis. If the test statistic is less than the critical value, fail to reject
the null hypothesis.
7. Interpret the results: If the null hypothesis is rejected, it can be concluded that the sample
mean is significantly different from the population mean at the chosen level of significance. If
the null hypothesis is not rejected, it can be concluded that there is not enough evidence to
support the alternative hypothesis.
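The program below takes the equivalent p-value route; the explicit critical-value comparison of steps 5 and 6 can be sketched like this, with a made-up test statistic:

from scipy.stats import norm

alpha = 0.05
z_score = 2.1  # example test statistic from step 4

# Critical value of z for a one-tailed test at level alpha
z_critical = norm.ppf(1 - alpha)

if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")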
PROGRAM:
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data, the mean value under the
# null hypothesis, and an alternative hypothesis that the mean is larger.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# The function returns a z-score and the corresponding p-value. We compare
# the p-value with alpha: if it is greater than alpha we do not reject the
# null hypothesis; otherwise we reject it.
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
OUTPUT:
mean=110.55 stdv=1.50
Reject Null Hypothesis
RESULT:
Thus the Python program for performing the z-test is executed successfully.
AIM:
The aim of a t-test is to determine whether the mean of a sample is statistically significantly different
from the hypothesized population mean.
ALGORITHM:
The algorithm for a one-sample t-test involves the following steps:
1. State the null and alternative hypotheses: The null hypothesis (H0) is that there is no
significant difference between the sample mean and the population mean, while the
alternative hypothesis (Ha) is that there is a significant difference.
2. Determine the level of significance: Choose the level of significance, α, that will be used to
test the hypothesis. Typically, α is set at 0.05 or 0.01.
3. Collect data: Collect a random sample from the population of interest, and calculate the
sample mean and sample standard deviation.
4. Calculate the test statistic: Calculate the t-test statistic using the formula: t = (x̄ - μ) / (s / √n),
where x̄ is the sample mean, μ is the hypothesized population mean, s is the sample
standard deviation, and n is the sample size.
5. Determine the degrees of freedom: Determine the degrees of freedom for the t-distribution
using the formula: df = n - 1.
6. Determine the critical value: Determine the critical value of t at the chosen level of
significance and degrees of freedom.
7. Compare the test statistic to the critical value: If the absolute value of the test statistic is
greater than the critical value, reject the null hypothesis. If the absolute value of the test
statistic is less than the critical value, fail to reject the null hypothesis.
8. Interpret the results: If the null hypothesis is rejected, it can be concluded that the sample
mean is significantly different from the hypothesized population mean at the chosen level of
significance. If the null hypothesis is not rejected, it can be concluded that there is not enough
evidence to support the alternative hypothesis.
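Note that the program below actually compares two independent samples; the one-sample test described by this algorithm can be run directly with scipy, as in this sketch with a made-up sample and hypothesized mean:

import numpy as np
from scipy import stats

# Hypothesized population mean and a small made-up sample
mu = 100
sample = np.array([102, 98, 105, 110, 97, 101, 99, 104])

# One-sample t-test (two-sided by default)
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu)
print("t =", t_stat, "p =", p_value)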
PROGRAM:
# Importing the required libraries and packages
import numpy as np
from scipy import stats
# Defining two random distributions
# Sample Size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)
# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)
# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
#Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical t-value
# Degrees of freedom
dof = 2 * N - 2

# p-value after comparison with the t-statistic
pval = 1 - stats.t.cdf(tval, df = dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

# Cross-check with scipy's built-in two-sample t-test, which should
# print (nearly) the same t and p values
t2, p2 = stats.ttest_ind(x, y)
print("t = " + str(t2))
print("p = " + str(p2))
OUTPUT:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205
RESULT:
Thus the Python program for performing the t-test is executed successfully.
AIM:
The aim of an ANOVA (Analysis of Variance) is to determine whether there is a significant
difference between the means of three or more groups.
ALGORITHM:
The algorithm for a one-way ANOVA involves the following steps:
1. State the null and alternative hypotheses: The null hypothesis (H0) is that there is no
significant difference between the means of the groups, while the alternative hypothesis (Ha)
is that there is a significant difference.
2. Determine the level of significance: Choose the level of significance, α, that will be used to
test the hypothesis. Typically, α is set at 0.05 or 0.01.
3. Collect data: Collect data from three or more groups, and calculate the mean and variance for
each group.
4. Calculate the sum of squares between groups using the formula: SSbetween = Σ nᵢ(x̄ᵢ - x̄)²,
where nᵢ is the sample size for group i, x̄ᵢ is the mean of group i, and x̄ is the overall mean.
5. Calculate the sum of squares within groups using the formula: SSwithin = ΣΣ(xᵢⱼ - x̄ⱼ)²,
where xᵢⱼ is the value of the ith observation in the jth group and x̄ⱼ is the mean of group j.
6. Calculate the F-statistic using the formula: F = (SSbetween / (k-1)) / (SSwithin / (N-k)),
where k is the number of groups and N is the total number of observations (these formulas
are coded directly in the sketch after this list).
7. Determine the critical value: Determine the critical value of F at the chosen level of
significance and degrees of freedom.
8. Compare the F-statistic to the critical value: If the F-statistic is greater than the critical value,
reject the null hypothesis. If the F-statistic is less than the critical value, fail to reject the null
hypothesis.
9. Interpret the results: If the null hypothesis is rejected, it can be concluded that there is a
significant difference between the means of the groups at the chosen level of significance. If
the null hypothesis is not rejected, it can be concluded that there is not enough evidence to
support the alternative hypothesis.
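The formulas in steps 4-6 can also be coded by hand; a sketch with three small made-up groups (scipy's f_oneway, used in the program below, should report the same F for these groups):

import numpy as np
from scipy.stats import f

groups = [np.array([6, 8, 4, 5, 3, 4]),
          np.array([8, 12, 9, 11, 6, 8]),
          np.array([13, 9, 11, 8, 7, 12])]

N = sum(len(g) for g in groups)  # total number of observations
k = len(groups)                  # number of groups
grand_mean = np.concatenate(groups).mean()

# SSbetween = sum of n_i * (group mean - grand mean)^2
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SSwithin = sum over groups of squared deviations from the group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

F = (ss_between / (k - 1)) / (ss_within / (N - k))
p = 1 - f.cdf(F, k - 1, N - k)
print("F =", F, "p =", p)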
PROGRAM:
# One-Way ANOVA
# Importing library
from scipy.stats import f_oneway

# Performance scores of four groups; these sample values reproduce
# the one-way output below
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# Conduct the one-way ANOVA
print(f_oneway(performance1, performance2, performance3, performance4))

# Two-Way ANOVA
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# create data
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5, 6, 6, 7, 8, 7,
                              3, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# Fit the two-way model with an interaction term and print the ANOVA table
# (the standard statsmodels approach; the original model-fitting lines were not shown)
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
OUTPUT:
One Way ANOVA:
F_onewayResult(statistic=4.625000000000002, pvalue=0.016336459839780215)
RESULT:
Thus the Python program for performing ANOVA is executed successfully.
AIM:
The aim of building and validating linear models is to create a model that accurately describes the
relationship between a dependent variable and one or more independent variables, and to determine
whether the model is a good fit for the data.
ALGORITHM:
1. Collect data: Collect data on the dependent variable and one or more independent variables.
2. Choose a linear model: Choose a linear model that describes the relationship between the
dependent variable and independent variable(s). A simple linear model has one independent
variable, while a multiple linear model has two or more independent variables.
3. Estimate model coefficients: Use a statistical software package to estimate the coefficients of
the linear model that best fit the data. The most common method for doing this is least
squares regression.
4. Evaluate model fit: Evaluate the fit of the model by examining the residual plots, which show
the difference between the predicted and actual values of the dependent variable. A good
model will have residuals that are randomly distributed around zero, with no discernible
patterns.
5. Test for significance: Test the significance of the model by calculating the p-value for the
overall F-test of the model. A low p-value indicates that the model is a good fit for the data.
6. Evaluate individual coefficients: Evaluate the significance of individual coefficients in the
model by calculating their t-values and p-values. A low p-value indicates that the coefficient
is significant and should be included in the model.
7. Validate the model: Validate the model by testing it on new data that was not used to estimate
the coefficients. This can be done by using a hold-out sample, or by using cross-validation
techniques (a short validation sketch follows this list).
8. Refine the model: Refine the model by making adjustments to the model specification, such
as adding or removing variables, transforming variables, or adding interaction terms.
9. Interpret the results: Interpret the coefficients of the model in terms of the relationship
between the dependent variable and independent variable(s).
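Step 7 (validating on held-out data) can be sketched with scikit-learn's train_test_split; the data here is made up:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: one predictor with a noisy linear response
x = np.arange(40).reshape(-1, 1)
y = 3.0 * x.ravel() + np.random.normal(scale=2.0, size=40)

# Hold out 25% of the observations for validation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(x_train, y_train)

# R^2 on unseen data estimates how well the model generalizes
print("validation R^2:", model.score(x_test, y_test))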
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Fit the model and report its goodness of fit
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")

# Fitting with a two-dimensional response gives array-valued results
new_model = LinearRegression().fit(x, y.reshape((-1, 1)))
print(f"intercept: {new_model.intercept_}")
print(f"slope: {new_model.coef_}")

# Predicted responses, via predict() and via the regression equation
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")
y_pred = model.intercept_ + model.coef_ * x
print(f"predicted response:\n{y_pred}")

# Plot the predicted responses against the inputs
plt.scatter(x, y_pred)
plt.show()
OUTPUT:
coefficient of determination: 0.7158756137479542
intercept: 5.633333333333329
slope: [0.54]
intercept: [5.63333333]
slope: [[0.54]]
predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]
predicted response:
[[ 8.33333333]
[13.73333333]
[19.13333333]
[24.53333333]
[29.93333333]
[35.33333333]]
RESULT:
Thus the Python program for building and validating linear models is executed successfully.
AIM:
The aim of building and validating logistic models is to create a model that accurately predicts the
probability of a binary outcome (e.g., success or failure) based on one or more independent variables,
and to determine whether the model is a good fit for the data.
ALGORITHM:
The algorithm for building and validating logistic models involves the following steps:
1. Collect data: Collect data on the binary outcome variable and one or more independent
variables.
2. Choose a logistic model: Choose a logistic model that describes the relationship between the
dependent variable and independent variable(s). A simple logistic model has one independent
variable, while a multiple logistic model has two or more independent variables.
3. Estimate model coefficients: Use a statistical software package to estimate the coefficients of
the logistic model that best fit the data. The most common method for doing this is maximum
likelihood estimation.
4. Evaluate model fit: Evaluate the fit of the model by examining the goodness-of-fit statistics,
such as the deviance, the Akaike Information Criterion (AIC), and the Bayesian Information
Criterion (BIC). A good model will have a low deviance and low values of AIC and BIC.
5. Test for significance: Test the significance of the model by calculating the p-value for the
overall chi-square test of the model. A low p-value indicates that the model is a good fit for
the data.
6. Evaluate individual coefficients: Evaluate the significance of individual coefficients in the
model by calculating their Wald test statistics and p-values. A low p-value indicates that the
coefficient is significant and should be included in the model.
7. Validate the model: Validate the model by testing it on new data that was not used to estimate
the coefficients. This can be done by using a hold-out sample, or by using cross-validation
techniques.
8. Refine the model: Refine the model by making adjustments to the model specification, such
as adding or removing variables, transforming variables, or adding interaction terms.
9. Interpret the results: Interpret the coefficients of the model in terms of the relationship
between the independent variable(s) and the probability of the binary outcome.
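Once the model is fitted, scikit-learn's predict_proba() returns the predicted probabilities directly; the logit2prob() helper in the program below computes the same quantity by hand. A brief sketch using the same data as the program:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = LogisticRegression().fit(X, y)

# Column 1 holds P(outcome = 1) for each observation
print(logr.predict_proba(X)[:, 1])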
PROGRAM:
import numpy
import matplotlib.pyplot as plt
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Converts the log-odds produced by the model into probabilities
# (this helper was missing from the original listing)
def logit2prob(logr, x):
    log_odds = logr.coef_ * x + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability

print(logit2prob(logr, X))

# Plot the predicted probability for each observation
plt.scatter(X, logit2prob(logr, X))
plt.show()
OUTPUT:
[[0.60749955]
[0.19268876]
[0.12775886]
[0.00955221]
[0.08038616]
[0.07345637]
[0.88362743]
[0.77901378]
[0.88924409]
[0.81293497]
[0.57719129]
[0.96664243]]
RESULT:
Thus the Python program for building and validating logistic models is executed successfully.
AIM:
The aim of performing time series analysis is to model and forecast the behavior of time series data over a period of time, using statistical methods, in order to identify patterns, trends, and seasonality in the data.
ALGORITHM:
The algorithm for performing time series analysis involves the following steps:
1. Collect data: Collect data on the time series variable over a period of time.
2. Visualize the data: Plot the time series data to identify patterns, trends, and seasonality.
3. Decompose the time series: Decompose the time series into its components, which are
trend, seasonality, and residual variation. This can be done using techniques such as moving
averages, exponential smoothing, or the Box-Jenkins method (a short decomposition sketch
follows this list).
4. Model the trend: Model the trend component of the time series using techniques such as linear
regression, exponential smoothing, or ARIMA models.
5. Model the seasonality: Model the seasonality component of the time series using techniques
such as seasonal decomposition, dummy variables, or Fourier series.
6. Model the residual variation: Model the residual variation component of the time series using
techniques such as autoregressive models, moving average models, or ARIMA models.
7. Choose the best model: Evaluate the fit of the different models using measures such as AIC,
BIC, and RMSE, and choose the model that best fits the data.
8. Forecast future values: Use the chosen model to forecast future values of the time series
variable.
9. Validate the model: Validate the model by comparing the forecasted values with actual values
from a hold-out sample, or by using cross-validation techniques.
10. Refine the model: Refine the model by making adjustments to the model specification, such
as adding or removing variables, transforming variables, or adding interaction terms.
11. Interpret the results: Interpret the results of the time series analysis in terms of the patterns,
trends, and seasonality of the data, and use the forecasted values to make predictions and
inform decision-making.
PROGRAM:
# import modules
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# create dataframe
dataframe = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i+1)
for i in range(7)]), 'classes': [5, 6, 8, 2, 3, 7, 4]})
# Plotting the time series of given dataframe
plt.plot(dataframe.date_of_week, dataframe.classes)
# Giving title to the chart using plt.title
plt.title('Classes by Date')
# rotating the x-axis tick labels at 30degree
# towards right
plt.xticks(rotation=30, ha='right')
# Providing x and y label to the chart
plt.xlabel('Date')
plt.ylabel('Classes')

# Display the plot
plt.show()
OUTPUT:
RESULT:
Thus the Python program for performing time series analysis is executed successfully.
AIM:
To make a graphical representation of data that uses colors to visualize the values of a matrix.
ALGORITHM:
1. Import the libraries: Import NumPy, seaborn, and Matplotlib.
2. Prepare the data: Create or load the matrix of values to be visualized, for example a 2-D
NumPy array.
3. Create the heat map: Pass the matrix to seaborn's heatmap() function, which maps each
value to a color.
4. Customize the plot: Optionally annotate the cells, change the color map, or add axis labels
and a title.
5. Display the plot: Call show() to render the heat map.
PROGRAM:
# importing the modules
import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt

# generating a random 10x10 matrix of integers (the values shown in the
# output below came from one such random draw)
data = np.random.randint(low=1, high=100, size=(10, 10))
print("The data to be plotted:\n", data)

# plotting the heat map
hm = sn.heatmap(data=data)

# displaying the plotted heat map
plt.show()
OUTPUT:
The data to be plotted:
[[46 30 55 86 42 94 31 56 21 7]
[68 42 95 28 93 13 90 27 14 65]
[73 84 92 66 16 15 57 36 46 84]
[ 7 11 41 37 8 41 96 53 51 72]
[52 64 1 80 33 30 91 80 28 88]
[19 93 64 23 72 15 39 35 62 3]
[51 45 51 17 83 37 81 31 62 10]
[ 9 28 30 47 73 96 10 43 30 2]
[74 28 34 26 2 70 82 53 97 96]
[86 13 60 51 95 26 22 29 14 29]]
RESULT:
Thus the Python program for plotting a heat map is executed successfully.
AIM:
To make an interactive data visualization with Bokeh.
ALGORITHM:
1. Import the Bokeh library: Import figure, output_file, and show from bokeh.plotting.
2. Prepare the data: Create the lists or arrays of values to be plotted.
3. Specify the output: Use output_file() to name the HTML file the interactive plot will be
written to.
4. Create a figure: Call figure() with a title, dimensions, and toolbar location to create the
plot area.
5. Add glyphs: Add renderers such as circles or lines to the figure to represent the data.
6. Show the plot: Call show() to render the interactive visualization in a browser.
PROGRAM:
from bokeh.plotting import figure, output_file, show

# Prepare the data
x = [1, 2, 3, 4, 5]
y = [6, 7, 6, 4, 5]

# The interactive plot will be saved to this HTML file
output_file("demo.html")

# Create a figure with a title, size, and toolbar below the plot
p = figure(title='demo', width=300, height=300, toolbar_location="below")

# Add circle glyphs for each (x, y) point
p.circle(x, y)

# Open the plot in a browser
show(p)
OUTPUT:
RESULT:
Thus the Python program for interactive data visualization with Bokeh is executed successfully.