Fods Lab
CONTENTS
S.no  Experiment name
1     Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
4     Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set
5d    Compare the results of the above analysis for the two data sets
6a    Normal curves
6d    Histograms
AIM:
To download, install numpy package and explore its features.
NUMPY STUDY:
NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the fundamental
package for scientific computing with Python. NumPy’s main object is the homogeneous
multidimensional array. NumPy’s array class is called ndarray. It is also known by the alias
array.
PROCEDURE TO INSTALL:
Step 1: Open Command Prompt.
Step 2: Check that pip is available using the command “pip --version”; pip is bundled with recent Python installers.
Step 3: Use the command “pip install numpy” to install Numpy package.
Step 4: After it is installed, import the package to check and use it.
FEATURES:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities
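The features listed above can be tried out with a short sketch such as the following. The array values and variable names are illustrative assumptions, not part of the lab record.

```python
import numpy as np

# N-dimensional array object: every element shares one dtype.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ndim, a.shape, a.dtype)

# Broadcasting: the 1-D row is stretched across both rows of `a`.
row = np.array([10, 20, 30])
b = a + row

# Linear algebra: solve the system m @ x = v.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([3.0, 5.0])
x = np.linalg.solve(m, v)

# Random numbers from a seeded generator, so runs are reproducible.
rng = np.random.default_rng(0)
samples = rng.normal(size=5)
print(b, x, samples, sep="\n")
```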
RESULT:
Thus the NumPy package was installed and imported, and its features were explored.
EX NO:1b SCIPY
Study:
SciPy stands for Scientific Python. It builds on NumPy and provides additional utility functions for
optimization, statistics and signal processing. Like NumPy, SciPy is open source, so we can use it freely.
SciPy was created by NumPy's creator, Travis Oliphant. The SciPy library supports integration, gradient
optimization, special functions, ordinary differential equation solvers, parallel
programming tools, and much more. SciPy routines underlie many complex numerical computations.
SciPy also serves as a data-processing and system-prototyping environment similar to MATLAB. It is easy
to use and gives scientists and engineers great flexibility.
Uses:
• SciPy provides optimized versions of, and additions to, functions that are frequently used with NumPy and in data science.
• SciPy contains a variety of sub-packages that help to solve the most common problems in scientific
computation.
• The SciPy package is among the most used scientific libraries, second only to the GNU
Scientific Library for C/C++ or MATLAB's libraries.
• It is easy to use and understand, and computationally fast.
Aim:
To download, install scipy package and explore its features.
Features:
• It supports integration, gradient optimization and special functions.
• It can operate on an array of NumPy library
• It is the most used Scientific library.
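As a minimal sketch of the features listed above (integration, optimization and special functions), the following example uses three SciPy sub-packages. The functions being integrated and minimised are illustrative assumptions chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize, special

# Numerical integration of sin(x) over [0, pi]; the exact value is 2.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Minimise (x - 3)^2; the minimum is at x = 3.
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

# Special function: gamma(5) equals 4! = 24.
g = special.gamma(5)

print(area, res.x, g)
```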
Procedure to Install:
Step 1: Open Command Prompt.
Step 2: Check that pip is available using the command “pip --version”; pip is bundled with recent Python installers.
Step 3: Use the command “pip install scipy” to install Scipy package.
Step 4: After it is installed, import the package to check and use it.
Output:
RESULT:
Thus the package SCIPY was installed, imported and features are explored.
EX NO:1C JUPYTER
Study:
Jupyter Notebook is a tool for developing open-source data science projects in different
languages such as Python and R. It is an interactive, open-source, browser-based application
that allows running code in the browser. A notebook does not just consist of code; it can also contain
output, mathematical equations and other narrative text. It can therefore be used for sharing data science
projects that consist of both code and reports. It can be installed using the Python pip
command. If you are using Anaconda, it is installed automatically as part of the Anaconda installation.
Features:
➔ It is used for creating and sharing computational documents.
➔ It offers a simple, streamlined, document-centric experience.
➔ It integrates with many programming languages like Python, PHP, R, C#, etc.
Procedure:
Step 1: Open the command prompt.
Step 2: Check that pip is available using the command “pip --version”.
Step 3: To install the Jupyter Notebook, type the command: “pip install jupyter”.
Step 4: Once the installation process is completed, you can run your notebook
on the server using the command “jupyter notebook”.
Step 5: To upgrade the Jupyter Notebook, type the command “pip install notebook
--upgrade”.
Output:
C:\Users\Lenovo>pip install jupyter
Collecting jupyter
Using cached jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting notebook
Downloading notebook-6.5.2-py3-none-any.whl (439 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 439.1/439.1 kB 1.0 MB/s eta 0:00:00
Collecting qtconsole
Downloading qtconsole-5.4.0-py3-none-any.whl (121 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.0/121.0 kB 1.0 MB/s eta 0:00:00
Collecting jupyter-console
Using cached jupyter_console-6.4.4-py3-none-any.whl (22 kB)
Collecting nbconvert
Downloading nbconvert-7.2.7-py3-none-any.whl (273 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 273.2/273.2 kB 1.2 MB/s eta 0:00:00
Collecting ipykernel
Downloading ipykernel-6.20.1-py3-none-any.whl (149 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.2/149.2 kB 1.1 MB/s eta 0:00:00
Collecting ipywidgets
Downloading ipywidgets-8.0.4-py3-none-any.whl (137 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 137.8/137.8 kB 910.9 kB/s eta 0:00:00
Installing collected packages
Successfully installed …
Result:
Thus the Jupyter notebook is downloaded and installed successfully.
EX NO:1d STATSMODELS
Study:
Statsmodels is a Python module that provides classes and functions for the estimation of
many different statistical models, as well as for conducting statistical tests, and statistical data
exploration. An extensive list of result statistics is available for each estimator. The results are
tested against existing statistical packages to ensure that they are correct. The package is
released under the open source Modified BSD (3-clause) license. The online documentation is
hosted at statsmodels.org.
Aim:
To download, install and explore the features of Statsmodels package.
Features:
➔ It provides classes and functions for the estimation of many different statistical models.
➔ It offers classes and functions for conducting statistical tests.
➔ It provides classes and functions for statistical data exploration.
Procedure:
Step 1 : Open the command prompt.
Step 2 : Check that pip is available using the command “pip --version” .
Step 3 : To install the statsmodels, type the command: “pip install statsmodels” .
Step 4 : Once the installation process is completed, you can import and use
statsmodels in the Python shell.
Output:
Result:
Thus the Statsmodels is downloaded and installed successfully.
EX.NO.1E PANDAS
STUDY:
Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations
for manipulating numerical tables and time series. It is free software released
under the three-clause BSD license. The name Pandas refers to both
“panel data” and “Python data analysis”; the library was created by Wes McKinney in 2008.
Pandas provides the following data structures for manipulating data:
• Series
• Index
• DataFrame
It can be used for:
• Loading data from different file objects.
• Easy handling of missing data (represented by NaN) in floating point as well as
non-floating point data.
• Size mutability: columns can be inserted into and deleted from DataFrames and higher
dimensional objects.
• Time-series functionality.
• Powerful group by functionality for performing split-apply-combine operations on data
sets.
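The uses listed above can be sketched on a small in-memory table. The column names and values below are hypothetical and serve only to illustrate missing-data handling, size mutability and group by.

```python
import numpy as np
import pandas as pd

# Series: a labelled one-dimensional array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a labelled table; NaN marks a missing value.
df = pd.DataFrame({
    "city": ["X", "X", "Y", "Y"],
    "temp": [30.0, np.nan, 25.0, 27.0],
})

# Missing data: fill the gap with the column mean.
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Size mutability: add a column, then drop it again.
df["temp_f"] = df["temp"] * 9 / 5 + 32
df = df.drop(columns="temp_f")

# Split-apply-combine: mean temperature per city.
means = df.groupby("city")["temp"].mean()
print(means)
```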
Aim:
To download, install and explore the features of Pandas Package.
Features:
• Fast and efficient DataFrame object with default and customized indexing.
• Reshaping and pivoting of datasets.
• Tools for loading data into in-memory data objects from different
file formats.
Procedure:
Step 1: Open the command prompt.
Step 2: Check that pip is available using the command “pip --version”.
Step 3: To install Pandas, type the command: “pip install
pandas”.
Step 4: Once the installation process is completed, you can import and use
pandas in the Python shell.
Step 5: To upgrade Pandas, type the command “pip install
pandas --upgrade”.
Output:
C:\Users\USER>pip install pandas
Collecting pandas
Using cached pandas-1.5.2-cp311-cp311-win_amd64.whl (10.3 MB)
Requirement already satisfied: python-dateutil>=2.8.1 in d:\pthon works\lib\site-packages
(from pandas) (2.8.2)
Data Science Manual R-2021
Result:
Thus the Pandas package is downloaded and installed successfully.
EX.NO:2 NUMPY
STUDY:
Python lists are a substitute for arrays, but they fail to deliver the performance required
while computing large sets of numerical data. To address this issue we use a Python library
called NumPy. The word NumPy stands for Numerical Python. NumPy offers an array object
called ndarray. NumPy arrays are similar to standard Python sequences but differ in certain key respects.
Unlike lists, NumPy arrays are of fixed size, and changing the size of an array will lead to the
creation of a new array while the original array will be deleted. All the elements in an array are of
the same type. NumPy arrays are faster, more efficient, and require less syntax than standard
Python sequences.
AIM:
To implement a Python program that performs NumPy operations on an array.
ALGORITHM:
STEP 1: Start
STEP 2: Read the elements in array
STEP 3: Read the choice
STEP 4: Repeat the following steps 5 to 10 while choice is not equal to 5
STEP 5: If ch=1, read value and index, insert the value using insert() function
STEP 6: If ch=2, read the element and search and return the index using where() function
STEP 7: If ch=3, sort the elements using sort() function
STEP 8: If ch=4, read index and delete element using delete() function
STEP 9: If ch=5, print exit
STEP 10: If nothing matches, print invalid choice
STEP 11: Stop
PROGRAM:
    elif ch==4:
        e=int(input("Enter position of the element to be deleted :"))
        a=np.delete(a,e)
        print("Array after deletion:",a)
    elif ch==5:
        print("EXIT")
    else:
        print("Invalid choice")
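The four array operations named in the algorithm (insert, search, sort, delete) can be sketched without the interactive menu as follows. The helper function names and the sample array are hypothetical; only np.insert, np.where, np.sort and np.delete come from NumPy.

```python
import numpy as np

def insert_value(a, index, value):
    """Return a new array with `value` inserted before position `index`."""
    return np.insert(a, index, value)

def search_value(a, value):
    """Return the indices at which `value` occurs in `a`."""
    return np.where(a == value)[0]

def sort_values(a):
    """Return a sorted copy of `a`."""
    return np.sort(a)

def delete_at(a, index):
    """Return a new array without the element at position `index`."""
    return np.delete(a, index)

a = np.array([40, 10, 30])
a = insert_value(a, 1, 20)   # [40 20 10 30]
pos = search_value(a, 30)    # [3]
a = sort_values(a)           # [10 20 30 40]
a = delete_at(a, 0)          # [20 30 40]
print(a, pos)
```

Each helper returns a new array, which matches the point made in the study that resizing a NumPy array creates a new array.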
OUTPUT:
RESULT:
Thus the Python program to perform NumPy operations on an array is written, executed and
the output is verified successfully.
STUDY:
The Pandas DataFrame is a structure that contains two-dimensional data and its
corresponding labels. DataFrames are widely used in data science, machine learning, scientific
computing, and many other data-intensive fields. DataFrames are similar to SQL tables or the
spreadsheets that you work with in Excel or Calc. In many cases, DataFrames are faster, easier
to use, and more powerful than tables or spreadsheets because they’re an integral part of the
Python and NumPy ecosystems.
AIM:
To write a python program to import pandas dataframe.
ALGORITHM:
STEP 1 : Start.
STEP 2 : Import the pandas package.
STEP 3 : Create the dataframe for the list of elements,
STEP 3.1: roll number, name, datascience, datastructure, oops, maths.
STEP 4 : Display the table using pd.DataFrame().
STEP 5 : Display the output.
STEP 6 : Stop.
PROGRAM:
import pandas as pd
n=int(input("Enter the total number of students:"))
rollnumber=[]
name=[]
datascience=[]
datastructure=[]
oops=[]
maths=[]
result=[]
for i in range(n):
    rn=int(input("Enter the roll number"))
    rollnumber.append(rn)
    # use a separate variable so the student count n is not overwritten
    nm=input("Enter the name")
    name.append(nm)
    dsc=int(input("Enter the datascience mark"))
    datascience.append(dsc)
    ds=int(input("Enter the datastructure mark"))
    datastructure.append(ds)
    o=int(input("Enter the oops mark"))
    oops.append(o)
    m=int(input("Enter the maths mark"))
    maths.append(m)
    # a student qualifies only if every mark is above 34
    if dsc>34 and ds>34 and o>34 and m>34:
        r=["Qualified"]
    else:
        r=["failed"]
    result.append(r)
# build the table from the collected lists
l=pd.DataFrame({"rollnumber":rollnumber,"name":name,
    "datascience":datascience,"datastructure":datastructure,"oops":oops,
    "maths":maths,"result":result})
print(l)
OUTPUT:
Enter the total number of students:2
Enter the roll number1
Enter the nameA
Enter the datascience mark90
Enter the datastructure mark89
Enter the oops mark90
Enter the maths mark89
Enter the roll number2
Enter the nameB
Enter the datascience mark99
Enter the datastructure mark89
Enter the oops mark99
Enter the maths mark89
   rollnumber name  datascience  datastructure  oops  maths       result
0           1    A           90             89    90     89  [Qualified]
1           2    B           99             89    99     89  [Qualified]
RESULT:
Thus the python program to import pandas dataframe is written, executed and the
output is verified successfully.
AIM:
To write a python program to read data from text files, Excel and from web and to explore
various commands for doing descriptive analytics on the Iris Dataset.
STUDY:
The Iris dataset is a collection of data that is used to demonstrate the properties of
various statistical models. It contains 150 observations (50 for each of three iris species) on four
variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. The dataset is often used in
data mining, classification and clustering examples and to test algorithms. Information about the
original paper and usages of the dataset can be found in the UCI Machine Learning
Repository. The data set consists of the petal and sepal lengths and widths of 3 different types of
irises (Setosa, Versicolour, and Virginica), stored in a 150x4 numpy.ndarray. The rows are the samples
and the columns are Sepal Length, Sepal Width, Petal Length and Petal Width.
ALGORITHM:
STEP 1: Start the program.
STEP 2: Import the required modules.
STEP 3: Read the iris dataset (text file, Excel, web).
STEP 4: Read the specific column name for calculating mean, median, mode, sum, maximum, minimum.
STEP 5: Display the final dataset after replacing the old column name with the new column name.
STEP 6: Stop.
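The steps above can be sketched on a few rows held in memory, so that the example runs without a data file. The five rows and the column names are assumptions for illustration; with a real file, pd.read_csv accepts a local path or a URL, and pd.read_excel reads an Excel sheet.

```python
import io
import pandas as pd

# A small illustrative sample in CSV form (not the full 150-row dataset).
csv_text = """sepallength,sepalwidth,petallength,petalwidth,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
d = pd.read_csv(io.StringIO(csv_text))

# Descriptive statistics for one numeric column.
col = "sepallength"
stats = {
    "mean": d[col].mean(),
    "median": d[col].median(),
    "mode": d[col].mode().tolist(),
    "sum": d[col].sum(),
    "min": d[col].min(),
    "max": d[col].max(),
}
print(stats)

# Replace an old column name with a new one and display the result.
d = d.rename(columns={"class": "species"})
print(d)
```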
PROGRAM:
import pandas as pd
d=pd.read_csv("filename")
print("The length of the dataset:", len(d))
start=int(input("Enter the start value:"))
stop=int(input("Enter the stop value:"))
# rows start .. stop-1 are shown (the stop index itself is excluded)
print("After slicing:\n", d[start:stop])
n=int(input("Enter the specific column to display:"))
col=d.columns[n]   # name of the column at position n (must be numeric)
print(d[col])
print("Mean:",d[col].mean(),"\nMedian:",d[col].median(),
      "\nSum:",d[col].sum(),"\nMinimum:",d[col].min(),
      "\nMaximum:",d[col].max())
OUTPUT:
The length of the dataset: 150
Enter the start value: 2
Enter the stop value: 5
After slicing:
 sepallength  sepalwidth  petallength  petalwidth  class
         5.1         3.5          1.4         0.2  Iris-setosa
         4.9         3.0          1.4         0.2  Iris-setosa
         4.7         3.2          1.3         0.2  Iris-setosa
Dataset:
RESULT
Thus the Python program to read the dataset from text files, Excel and the web, and to
explore various commands for doing descriptive analytics on the Iris dataset, was
executed and the output was verified successfully.
AIM:
To perform univariate analysis on the diabetes data set from UCI and the Pima Indians diabetes
data set.
STUDY:
UNIVARIATE ANALYSIS:
It is the simplest form of statistical analysis. It can be inferential or descriptive. The key
fact is that only one variable is involved. It involves finding the
frequency, mean, mode, median, standard deviation, skewness and kurtosis of the variable under
consideration. It can yield misleading results in cases where multivariate analysis is more
appropriate.
ALGORITHM:
STEP 1: Start
STEP 2: Import pandas and scipy packages.
STEP 3: Download required datasets.
STEP 4: Read dataset using pandas read_csv().
STEP 5: Display top contents of dataset using head().
STEP 6: Display bottom contents of dataset using tail().
STEP 7: Read column to calculate mean and display mean using mean().
STEP 8: Read column to calculate median and display median using median().
STEP 9: Read column to calculate mode and display mode using mode().
STEP 10: Read column name to find frequency and display frequency using value_counts().
STEP 11: Read column name to find variance and display variance using var().
STEP 12: Read column to find standard deviation and display using std().
STEP 13: Read column name to find skewness and display using skew().
STEP 14: Read column name to find kurtosis and display using kurtosis().
STEP 15: Stop.
PROGRAM:
import pandas as pd
import numpy as np
from scipy.stats import kurtosis
from scipy.stats import skew
print("press 1 for UCI diabetes dataset and 2 for Pima Indians diabetes dataset")
# convert the choice to int so that the comparison with 1 works
ch=int(input("Enter choice:"))
if ch==1:
    d=pd.read_csv("uci diabetes.csv")
else:
    d=pd.read_csv("pima diabetes.csv")
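The remaining steps of the algorithm (mean, median, mode, frequency, variance, standard deviation, skewness and kurtosis) might look like the sketch below. It uses a small hypothetical Age column instead of the downloaded dataset, so the numbers it prints are not those of the lab output.

```python
import pandas as pd
from scipy.stats import kurtosis, skew

# Hypothetical ages; in the lab these would come from the loaded dataset.
d = pd.DataFrame({"Age": [21, 25, 25, 30, 45, 60]})
col = d["Age"]

summary = {
    "mean": col.mean(),
    "median": col.median(),
    "mode": col.mode().tolist(),
    "frequency": col.value_counts().to_dict(),
    "variance": col.var(),       # sample variance (ddof=1)
    "std": col.std(),            # sample standard deviation (ddof=1)
    "skewness": skew(col),       # positive for a long right tail
    "kurtosis": kurtosis(col),   # excess kurtosis (normal curve = 0)
}
for name, value in summary.items():
    print(name, ":", value)
```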
OUTPUT:
press 1 for UCI diabetes dataset and 2 for Pima Indians diabetes dataset
Enter choice:1
Top rows:
   Pregnancies  Glucose  BloodPressure  …  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72  …                     0.627   50        1
1            1       85             66  …                     0.351   31        0
2            8      183             64  …                     0.672   32        1
3            1       89             66  …                     0.167   21        0
4            0      137             40  …                     2.288   33        1
[5 rows x 9 columns]
Bottom rows:
     Pregnancies  Glucose  BloodPressure  …  DiabetesPedigreeFunction  Age  Outcome
763           10      101             64  …                     0.672   63        0
764            2      122             72  …                     0.627   27        0
765            5      121             66  …                     0.167   30        0
766            1      126             66  …                     0.351   47        1
767            1       93             40  …                     2.288   23        0
[5 rows x 9 columns]
Enter column name to find mean : Age
Mean of Age is
33.240882416666664
Enter column name to find median : Insulin
Median of Insulin is
30.5
Enter column name to find mode : Blood Pressure
Mode of Blood Pressure is
0 70
Name : Blood Pressure , dtype : int64
Enter column name to find frequency : Glucose
99 17
100 17
111 14
129 14
125 14
.. ..
191 1
177 1
44 1
62 1
190 1
Name : Glucose , length : 136 , dtype : int64
Enter column name to find variance : Age
Variance of Age :
138.30304589037377
Enter column name to find standard deviation : Glucose
Standard deviation of glucose :
31.97261819513622
Enter column name to find skewness : Age
The skewness value :
1.27389259531697
Enter column name to find kurtosis : Age
The kurtosis value :
0.6311769413798585
RESULT:
Thus the Python program to perform univariate analysis in diabetes dataset was
written, executed and output was verified.
STUDY:
Bivariate analysis is a statistical technique applied to a pair of variables (features/attributes) of data
to determine the empirical relationship between them. In other words, it is meant to determine
any concurrent relation, usually over and above a simple correlation analysis.
Example:
If you are studying a group of college students to find out their average SAT score and their
age, you have two pieces of the puzzle to find.
AIM:
To write a python program to perform bivariate analysis in diabetes dataset from Pima
Indian data set
# fragment: plt, lin, x and y are defined in the part of the program not shown
plt.scatter(x,y)
lin.fit(x,y)
plt.xlabel("BMI")
plt.ylabel("Diabetes")
y_pred=lin.predict(x)
plt.plot(x,y_pred,color='red')
plt.show()
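The fit-and-predict step in the fragment above can be illustrated with a self-contained sketch. Here np.polyfit is swapped in for the fitted model object lin, whose definition is not shown in the record, and the x and y values are hypothetical rather than taken from the Pima dataset.

```python
import numpy as np

# Hypothetical paired observations (e.g. BMI against some response).
x = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
y = np.array([0.1, 0.2, 0.4, 0.5, 0.8])

# Least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Predicted values on the fitted line, analogous to lin.predict(x).
y_pred = slope * x + intercept

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]
print(slope, intercept, r)
```

The arrays x and y_pred can then be passed to a line plot on top of the scatter plot, as the fragment does.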
RESULT:
Thus the program to perform bivariate analysis in diabetes dataset from Pima was written,
executed and output was verified successfully.
AIM:
To write a python program to perform bivariate analysis in diabetes dataset from UCI
data set
RESULT:
Thus the program to perform bivariate analysis in diabetes dataset from UCI was written,
executed and output was verified successfully.
STUDY:
Multiple Regression Analysis works by considering the values of the available multiple
independent variables and predicting the value of one dependent variable.
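As a minimal sketch of this idea, the example below fits one dependent variable to two independent variables by ordinary least squares using np.linalg.lstsq. The data are hypothetical and generated without noise from y = 1 + 2*x1 + 3*x2, so the fitted coefficients can be checked by eye.

```python
import numpy as np

# Two independent variables per observation (hypothetical values).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Dependent variable generated from a known linear rule.
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Add a column of ones so the model has an intercept term.
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem A @ coef ~= y.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predictions from the fitted coefficients.
pred = A @ coef
print(coef)
```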
AIM:
To perform Multiple Regression Analysis for the diabetes dataset from UCI and PIMA
Indian Dataset.
ALGORITHM:
STEP 1: Start
STEP 2: Display the choices for datasets
STEP 3: Get the choice
STEP 4: Repeat the following steps while the choice is not equal to 3
4.1: If choice is 1, read the diabetes dataset
4.2: If choice is 2, read the PIMA Indians diabetes dataset
4.3: Otherwise, print Invalid choice
STEP 5: Import necessary modules
STEP 6: Reshape the category of dataset to be suitable for the analysis plotting
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import seaborn as sns
print("Choices:\n1.Diabetes\n2.Pima indians\n3.Exit\n")
ch=int(input("Enter choice:"))
while(ch!=3):
    if(ch==1):
        d=pd.read_csv("Dataset of Diabetes.csv")
    elif(ch==2):
        d=pd.read_csv("pima.csv")
    else:
        print("Invalid choice")
        ch=int(input("Enter choice"))
        continue   # no dataset was loaded, so skip the plot
    d.head()
    # columns at positions 5 and 1, reshaped to column vectors
    x=d.iloc[:,5].values.reshape(-1,1)
    y=d.iloc[:,1].values.reshape(-1,1)
    fig=plt.figure()
    ax=fig.add_subplot(projection='3d')
    ax.scatter(xs=x,ys=y,c='red',s=5)
    plt.show()
    ch=int(input("Enter choice"))
OUTPUT:
Choices:
1.Diabetes
2.Pima indians
3.Exit
Enter choice:1
Enter choice:2
Enter choice :3
>>>
RESULT:
Thus, the multiple regression analysis on two given datasets was performed successfully.
AIM:
To analyse and compare the results of the two data sets (UCI dataset and Pima Indians data
set).
STUDY:
To perform a comparison of the two data sets under the following analyses:
o Univariate Analysis
o Bivariate Analysis
o Multiple Regression Analysis
UNIVARIATE ANALYSIS:
For each data set you have downloaded, calculate the total number of columns and rows.
Take some columns from both data sets and find the mean, mode, median, frequency, skewness and kurtosis.
From these results, analyse the difference between the statistical values of the two data sets. If you
need to differentiate more accurately, you can use some plotting techniques.
BIVARIATE ANALYSIS:
In bivariate analysis we calculate the linear and logistic regression and analyse
the results.
LINEAR REGRESSION:
For the Pima Indian and UCI datasets, take two columns as the x
and y axes. Calculate the range between the two axes and
analyse the result.
LOGISTIC REGRESSION:
For the Pima Indian and UCI datasets, take two columns as the x
and y axes. Calculate the range between the two axes and
analyse the result.
RESULT:
Thus, we have performed the comparison for the given analyses using both datasets.
Study:
The UCI Machine learning Repository is a collection of databases, domain
theories, data generators that are used by the machine learning community for
the empirical analysis of machine learning algorithms. Visit the UCI dataset.
Thus the graph is plotted using various plotting functions on UCI dataset.
Study:
The normal curve represents the shape of an important class of statistical probabilities. The
normal curve is used to characterize complex constructs containing continuous random
variables. Many phenomena observed in nature have been found to follow a normal
curve. The normal distribution is a probability function used in statistics that tells how the
data values are distributed. It is the most important probability function used in statistics
because of its usefulness in real-world scenarios.
For example: -> the heights of a population
-> shoe size
-> IQ level
-> the sum of many dice rolls, and many more
Algorithm:
Step 1: Start and download the dataset.
Step 2: Import the numpy, scipy, pandas and seaborn packages, the plotting library matplotlib, and statistics.
Step 3: Read the UCI dataset.
Step 4: Calculate the mean and standard deviation.
Step 5: Use the NumPy function arange to generate the in-between values for plotting the graph.
Step 6: Display the plot.
Step 7: Stop
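Steps 4 and 5 can be sketched as follows on a small hypothetical column of values. As an assumption made for this sketch, the x range is centred on the mean (four standard deviations either side) rather than fixed, so the whole curve is covered whatever the column's scale.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical column values; in the lab these come from the dataset.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Step 4: mean and sample standard deviation.
m = values.mean()
sd = values.std(ddof=1)

# Step 5: evenly spaced x values covering mean +/- 4 standard deviations.
step = 0.01
a = np.arange(m - 4 * sd, m + 4 * sd, step)

# Height of the normal curve at each x value.
pdf = norm.pdf(a, m, sd)
print("mean", m, "standard deviation", sd)
```

Plotting a against pdf gives the bell-shaped curve, with its peak at the mean.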
Program:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
import pandas as pd
d=pd.read_csv("forestfires.csv")
c=input("Enter column name to plot normal:")
m=d[c].mean()    # mean of the chosen column
sd=d[c].std()    # standard deviation of the chosen column
print("mean",m,"standard deviation",sd)
a=np.arange(-15,15,0.01)
# normal curve with the column's mean and standard deviation
plt.plot(a,norm.pdf(a,m,sd))
plt.show()
Output:
Enter column name to plot normal: wind
Result:
Thus the python program to draw the normal curve using plotting functions for UCI dataset
was implemented and the output was verified successfully .
Aim:
To write a python program to explore the density and contour plot functions for UCI
dataset.
‘kernel smoothing’ while plotting the values. It is a continuous and smooth version of a
histogram inferred from the data. Density plots use Kernel Density Estimation (so they are also
known as Kernel Density Estimation plots or KDE plots), which estimates the probability density
function. The region of the plot with a higher peak is the region where the most data points
lie.
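The idea of kernel density estimation can be sketched with scipy.stats.gaussian_kde, used here in place of a plotting call so that the estimated curve can be inspected as numbers. The wind values are randomly generated assumptions, not the dataset column.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical "wind" readings drawn around 4.0 for illustration.
rng = np.random.default_rng(0)
wind = rng.normal(loc=4.0, scale=1.5, size=200)

# Fit a Gaussian kernel density estimate to the sample.
kde = gaussian_kde(wind)

# Evaluate the smooth density on a grid wider than the data range.
grid = np.linspace(wind.min() - 3, wind.max() + 3, 400)
density = kde(grid)

# The highest point of the curve marks where the data are most concentrated.
peak = grid[density.argmax()]
print("density peaks near", peak)
```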
Algorithm 1:
Step 1: Start the program.
Step 2: import the modules and dataset.
Step 3: Read the column name to form density plot.
Step 4: Plot the density plot for the column using density() function.
Step 5: Display the plot.
Step 6: Stop the program.
Algorithm 2:
Step 1: Start the program.
Step 2: import the modules and dataset.
Step 3: Read the column names to form contour plot.
Step 4: Plot the contour plot for the columns using contour() function.
Step 5: Display the plot.
Step 6: Stop the program.
Program 1:
import pandas as pd
Program 2:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data=pd.read_csv("forestfires.csv")
print(data.head(2))
a=input("Enter column name for x-axis:")
b=input("Enter column name for y-axis:")
x1=np.linspace(data[a].min(),data[a].max(),50)
y1=np.linspace(data[b].min(),data[b].max(),50)
x,y=np.meshgrid(x1,y1)
z=np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
plt.xlabel(a)
plt.ylabel(b)
plt.title('Contour Plot')
plt.contour(x,y,z)
plt.show()
Output 1:
   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
Enter column name to form density curve:wind
Output 2:
   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
Enter column name for x-axis:temp
Enter column name for y-axis:ISI
Result:
Thus, the python program to explore the density and contour plot functions for UCI
dataset was written , executed and output was verified.
Aim:
To write a python program for correlation and scatterplot in UCI dataset.
Study:
Correlation:
A correlational research design investigates relationships between two
(or more) variables without the researcher controlling or manipulating any of them. It's a
non-experimental type of quantitative research.
Scatter plot:
A scatter plot is a type of data visualization that shows the relationship
between different variables. The data is shown by placing data points between an x-
and y-axis. Essentially, each of these data points looks “scattered” around the graph, giving
this type of data visualization its name.
Algorithm:
Step 1: Start
Step 2: Read the dataset and compute the correlation for two
selected columns.
Step 3: Using seaborn, draw the scatter plots and draw the line to connect the plots.
Step 4: Stop.
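The correlation part of the steps above can be sketched on a few in-memory rows. The values below are illustrative assumptions laid out like the temp, RH and wind columns; DataFrame.corr computes pairwise Pearson correlations by default.

```python
import pandas as pd

# A few illustrative rows (not the full dataset).
con = pd.DataFrame({
    "temp": [8.2, 18.0, 14.6, 8.3, 11.4],
    "RH":   [51, 33, 33, 97, 99],
    "wind": [6.7, 0.9, 1.3, 4.0, 1.8],
})

# Correlation matrix of all numeric columns.
cormat = con.corr()

# Correlation between two selected columns.
r = con["temp"].corr(con["RH"])
print(cormat)
print("temp vs RH:", r)
```

The matrix cormat is what a heatmap would display, and a negative r corresponds to a downward-sloping cloud in the scatter plot.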
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
con=pd.read_csv("forestfires.csv")
# correlation matrix of the numeric columns only
cormat=con.corr(numeric_only=True)
# scatter plot with a fitted regression line
sns.lmplot(x="rain",y="temp",data=con)
sns.scatterplot(x="rain",y="temp",data=con)
plt.show()
sns.heatmap(cormat)
plt.show()
Output:
Result:
Thus the python program for correlation and scatter plot in UCI dataset was written,
executed and outputs were verified.
EX.NO:6D HISTOGRAM
Aim:
To write a python program for plotting histogram for UCI datasets.
Study:
A histogram is a graphical representation of data points organized into user-specified ranges.
Similar in appearance to a bar graph, the histogram condenses a data series into an easily
interpreted visual by taking many data points and grouping them into logical ranges or bins. A
histogram divides up the range of possible values in a data set into classes or groups. For each
group, a rectangle is constructed with a base length equal to the range of values in that specific
group and a height equal to the number of observations falling into that group. A histogram has
an appearance similar to a vertical bar chart, but there are no gaps between the bars.
Generally, a histogram will have bars of equal width.
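The binning described above can be sketched with np.histogram, which returns the bar heights and bin edges that a histogram plot would draw. The wind values and the choice of three bins over the range 0 to 9 are assumptions for illustration.

```python
import numpy as np

# Hypothetical wind readings.
wind = np.array([0.9, 1.3, 1.8, 2.2, 4.0, 4.5, 5.4, 6.7])

# Three equal-width bins over [0, 9]: [0, 3), [3, 6), [6, 9].
counts, edges = np.histogram(wind, bins=3, range=(0.0, 9.0))

# Each count is the height of one bar; each pair of edges is its base.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} observations")
```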
Algorithm:
Step 1: Start
Step 2: Import libraries
Program:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
d=pd.read_csv("Forestfires.csv")
sns.histplot(d['wind'])
plt.show()
Output:
Result:
Thus the python program to plot histogram for UCI datasets was written, executed and output
was verified successfully.
Study:
Three-dimensional axes are enabled and data can be plotted in 3 dimensions. A 3-dimensional graph gives a
dynamic approach and makes data more interactive. As with 2-D graphs, we can use different ways to represent
a 3-D graph: we can make a scatter plot, contour plot, surface plot, etc. The 3D plots are enabled by importing
the mplot3d toolkit. The most basic three-dimensional plot is a line or collection of scatter points created from
sets of (x, y, z) triples. In analogy with the more common two-dimensional plots discussed earlier, these can
be created using the ax.plot3D and ax.scatter3D functions.
Algorithm:
Step 1: Start
Step 2: Read the UCI dataset and draw three dimensional plotting using
the dataset.
Step 3: Perform various methods for the dataset
Step 4: Stop
Program:
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
d=pd.read_csv("C:/Users/admin/Documents/mad/forestfires.csv")
x=((d[(d["DMC"]==35.4)]["DC"].values.reshape(-1,1)))
y=((d[(d["DMC"]==43.7)]["DC"].values.reshape(-1,1)))
z=((d[(d["DMC"]==33.3)]["DC"].values.reshape(-1,1)))
l=[len(x),len(y),len(z)]
m=min(l)
x=x[:m]
y=y[:m]
z=z[:m]
np.random.seed(42)
xs=np.random.random(100)*10+0.2
ys=np.random.random(100)*5+4.0
zs=np.random.random(100)*15+0.1
fig=plt.figure()
ax=fig.add_subplot(111,projection='3d')
ax.set_xlabel("wind")
ax.set_ylabel("temp")
ax.set_zlabel("rain")
plt.show()
Output:
Result:
Thus the python program for three dimensional plotting using UCI dataset was written,
executed and output was verified.
Algorithm:
Step 1 : Start
Step 2 : Import the required packages
Step 3 : By using Basemap library create the required map using functions
drawcoastlines(), fillcontinents(), drawcountries()
Step 4 : Display the plot
Step 5 : Stop
Program:
Result:
Thus the python program to visualize geographic data using basemap is written,
executed and the output is verified.