Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024
NO: 1 Exploring the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
DATE: 07/08/2024
AIM:
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages, and to read data from text files, Excel files and the web.
PACKAGES:
Pandas: A powerful Python library used for data manipulation and analysis. It provides data
structures such as DataFrames and Series, which are well suited to structured data with
rows and columns.
NumPy: A foundational Python library for numerical computing that provides support for large
multidimensional arrays and matrices, along with a collection of mathematical functions that
operate on these arrays.
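As a small illustration of the multidimensional arrays and array-wide functions described above, the sketch below builds a 3 x 4 matrix and aggregates it. The values come from `np.arange` and are purely illustrative.

```python
import numpy as np

# A 1-D array of the values 0..11, reshaped into a 3 x 4 matrix
m = np.arange(12).reshape(3, 4)
print(m.ndim, m.shape, m.dtype)

# Mathematical functions apply to every element at once
roots = np.sqrt(m)

# Aggregations can run over the whole array or along one axis
total = m.sum()              # sum of all 12 values
col_sums = m.sum(axis=0)     # one sum per column
row_means = m.mean(axis=1)   # one mean per row
print(total, col_sums, row_means)
```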
SciPy: An open-source Python library built on NumPy, used for scientific and technical
computing. It includes modules for optimization, integration, interpolation, eigenvalue
problems, and more.
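A minimal sketch of two of the SciPy modules listed above, integration and optimization, on toy functions chosen only for illustration: the integral of x² over [0, 1] (exactly 1/3) and the minimum of (x - 2)² + 1 (at x = 2).

```python
from scipy import integrate, optimize

# Integration: area under f(x) = x**2 on [0, 1]; quad returns (value, error estimate)
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)
print(area, abs_err)

# Optimization: find the x that minimises (x - 2)**2 + 1
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(res.x, res.fun)
```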
Statsmodels: A Python module that provides classes and functions for statistical modeling,
including linear regression, time series analysis, and hypothesis testing, making it useful for
statistical data analysis.
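To show what a linear regression such as Statsmodels' OLS estimates, the sketch below solves the same least-squares problem with plain NumPy (`np.linalg.lstsq`), swapped in here instead of Statsmodels itself. The data are synthetic and noise-free (y = 2 + 3x), so the fitted intercept and slope should come back as 2 and 3.

```python
import numpy as np

# Synthetic, noise-free data: y = 2 + 3x (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix with a leading column of ones for the intercept,
# the role sm.add_constant() plays in Statsmodels
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ beta ≈ y
beta, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(intercept, slope)
```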
Altair: A declarative statistical visualization library for Python, built on Vega and Vega-Lite,
that allows users to create interactive, concise, and high-level charts with minimal code,
making it ideal for exploratory data analysis.
Matplotlib: A widely used Python library for creating static, interactive, and animated
visualizations in 2D and 3D. It offers extensive customization options for plots, making it
suitable for creating a wide range of charts like line plots, scatter plots, histograms, and more.
CODE:
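import pandas as pd
import numpy as np

# Heights and weights of six people as plain Python lists
height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]
# Convert the lists to NumPy arrays
np_height = np.array(height)
np_weight = np.array(weight)
print(height)           # list: printed with commas
print(np_height)        # ndarray: printed without commas
print(type(height))     # <class 'list'>
print(type(np_height))  # <class 'numpy.ndarray'>
# tolist() converts the array back into a Python list
orgarr = np_height.tolist()
print(orgarr)
print(type(orgarr))     # <class 'list'>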
OUTPUT:
INFERENCE:
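The difference is that a NumPy array supports direct element-wise arithmetic, whereas arithmetic operators on a Python list either fail or mean something else (for example, * repeats the list). tolist() converts an array back into an ordinary list.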
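The contrast between list and array arithmetic can be sketched with the same height and weight values. The BMI line assumes the heights are in metres and the weights in kilograms; the original does not state units.

```python
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

# On a Python list, * means repetition, not multiplication
doubled_list = height * 2          # 12 elements: the list repeated twice

np_height = np.array(height)
np_weight = np.array(weight)

# On NumPy arrays, operators act element-wise
doubled_arr = np_height * 2        # 6 elements, each value doubled
bmi = np_weight / np_height ** 2   # BMI per person (assumes kg and m)

print(len(doubled_list), doubled_arr)
print(bmi)
```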
CODE:
arr = np.array([1, 2, 3, 4, 5])
print(arr)
x = arr.copy()   # independent copy of arr's data
arr[0] = 42      # modify the original only
print(x)         # the copy is unchanged
print(arr)
OUTPUT:
[1 2 3 4 5]
[1 2 3 4 5]
[42 2 3 4 5]
CODE:
sample = np.array([32,54,23,78,65,12,32,39,56])
filter_arr = sample > 50
new_arr = sample[filter_arr]
print(new_arr)
OUTPUT:
[54 78 65 56]
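Boolean masks extend naturally to several conditions and to value replacement. A short sketch on the same sample array follows; note that `&` and `|` are used instead of `and`/`or`, and each comparison needs its own parentheses.

```python
import numpy as np

sample = np.array([32, 54, 23, 78, 65, 12, 32, 39, 56])

# Combine conditions with & (and) / | (or)
between = sample[(sample > 30) & (sample < 60)]

# np.where with one argument returns the indices where the condition is True
idx = np.where(sample > 50)[0]

# With three arguments it picks values instead: 50 where True, original otherwise
capped = np.where(sample > 50, 50, sample)

print(between, idx, capped)
```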
CODE:
a = np.array([[1, 2], [3, 4]])
# nditer visits every element of the 2-D array in turn
for x in np.nditer(a):
    print(x)
OUTPUT:
1
2
3
4
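To see how `np.nditer` and its indexed counterpart `np.ndenumerate` treat a 2-D array, here is a sketch on `a = [[1, 2], [3, 4]]`. `nditer` yields only the values, while `ndenumerate` also yields each element's (row, column) position.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# nditer visits elements in memory order, flattening the 2-D shape
flat_values = [int(x) for x in np.nditer(a)]

# ndenumerate pairs each element with its (row, col) index
indexed = [(idx, int(v)) for idx, v in np.ndenumerate(a)]

print(flat_values)   # [1, 2, 3, 4]
print(indexed)
```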
CODE:
a = np.array([[4,5,2],[3,8,8],[12,32,6]])
b = np.array([[3,4,5],[6,5,2],[1,4,3]])
print(a+b)
print()
print(a-b)
OUTPUT:
[[ 7 9 7]
[ 9 13 10]
[13 36 9]]
[[ 1 1 -3]
[-3 3 6]
[11 28 3]]
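Addition and subtraction above work element by element. For multiplication NumPy offers two different operations, sketched below on the same `a` and `b`: `*` is element-wise, while `@` is the matrix product, where entry (i, j) is the dot product of row i of `a` with column j of `b`.

```python
import numpy as np

a = np.array([[4, 5, 2], [3, 8, 8], [12, 32, 6]])
b = np.array([[3, 4, 5], [6, 5, 2], [1, 4, 3]])

# * multiplies matching elements, like + and -
elementwise = a * b

# @ is the matrix product (same as np.matmul for 2-D arrays)
matrix_product = a @ b

print(elementwise)
print(matrix_product)
```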
CODE:
OUTPUT:
[42 2 3 4 5]
[42 2 3 4 5]
CODE:
OUTPUT:
[ 1 40 3 4 5] [1 2 3 4 5]
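INFERENCE:
copy() creates a new array with its own data, so later changes to the original do not affect it,
whereas view() shares the original array's data, so a change made through either one is visible in the other.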
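The copy/view difference can be checked directly. In the sketch below, `.base` is `None` for an array that owns its data and refers to the original for a view, and `np.shares_memory` reports whether two arrays use the same buffer.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

c = arr.copy()   # new buffer, independent of arr
v = arr.view()   # new array object that shares arr's buffer

arr[0] = 42
print(c)         # unchanged: [1 2 3 4 5]
print(v)         # sees the change

# Writes through the view are visible in the original too
v[1] = 40
print(arr)

# .base is None for an array that owns its data, else the array it views
print(c.base is None, v.base is arr)
print(np.shares_memory(arr, v), np.shares_memory(arr, c))
```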
PANDAS
CODE:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
OUTPUT:
0 1
1 7
2 2
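The Series above uses the default integer labels 0, 1, 2. A short sketch of custom labels and of a DataFrame built from a dict follows; the "calories"/"duration" columns are hypothetical sample data.

```python
import pandas as pd

a = [1, 7, 2]

# Custom index labels replace the default 0, 1, 2
labelled = pd.Series(a, index=["x", "y", "z"])
print(labelled["y"])      # label-based lookup
print(labelled.iloc[0])   # position-based lookup

# A dict of equal-length lists becomes a DataFrame; each column is a Series
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})
print(df.shape)
print(df[df["calories"] > 385])   # boolean filtering selects rows
```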
CODE:
df=pd.read_csv("W2023.csv")
df.head()
OUTPUT:
CODE:
OUTPUT:
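The aim also covers reading plain text data. The sketch below reads tab-separated text with `pd.read_csv(..., sep="\t")`, using `io.StringIO` as a stand-in for a file on disk. It then converts columns stored as formatted text (such as "$1,200,000") into numbers. Whether W2023.csv stores GDP or Co2-Emissions this way is an assumption, and the rows are hypothetical. For the other sources named in the aim, pandas provides `pd.read_excel` (needs an engine such as openpyxl for .xlsx) and `pd.read_html` (needs lxml or BeautifulSoup).

```python
import io
import pandas as pd

# io.StringIO stands in for a tab-separated text file (hypothetical data)
text = "Country\tGDP\tCo2-Emissions\nA\t$1,200,000\t8,672\nB\t$950,500\t1,250\n"
df = pd.read_csv(io.StringIO(text), sep="\t")

# Numbers written with '$' and ',' are read as text, not numeric dtypes
print(df.dtypes)

# Strip the formatting characters and convert; unparseable cells become NaN
for col in ["GDP", "Co2-Emissions"]:
    cleaned = df[col].astype(str).str.replace(r"[$,]", "", regex=True)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df)
```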
CODE:
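import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot of GDP against CO2 emissions, one colour per country
plt.figure(figsize=(10, 6))
sns.scatterplot(x='GDP', y='Co2-Emissions', data=df, hue='Country', s=100)
plt.title('GDP vs CO2 Emissions by Country')
plt.xlabel('GDP')
plt.ylabel('CO2 Emissions')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()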
OUTPUT:
CODE:
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Birth Rate', y='Fertility Rate', data=df, hue='Country', s=100)
plt.title('Birth Rate vs Fertility Rate by Country')
plt.xlabel('Birth Rate')
plt.ylabel('Fertility Rate')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
OUTPUT:
CODE:
OUTPUT:
CODE:
OUTPUT:
CODE:
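import statsmodels.api as sm

# X (predictor columns) and df['Y'] (response) are assumed to be defined in an earlier cell
# Add an intercept column; skipped if X already contains a constant
X = sm.add_constant(X)
model = sm.OLS(df['Y'], X)
results = model.fit()
print("\nRegression Results:")
print(results.summary())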
OUTPUT:
RESULT:
The packages were explored and basic operations were performed on a sample dataset
successfully.