Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024


EX. NO: 1 Exploring the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas

DATE: 07/08/2024

AIM:

To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages, and to read data from a text file, an Excel file and the web.

PACKAGES:

Pandas: A powerful Python library for data manipulation and analysis. It provides data
structures such as DataFrame and Series, which are well suited to structured data arranged in
rows and columns.

NumPy: A foundational Python library for numerical computing that provides support for
large, multidimensional arrays and matrices, along with a collection of mathematical
functions that operate on these arrays.

SciPy: An open-source Python library built on NumPy, used for scientific and technical
computing. It includes modules for optimization, integration, interpolation, eigenvalue
problems, and more.

Statsmodels: A Python module that provides classes and functions for statistical modeling,
including linear regression, time series analysis, and hypothesis testing, making it useful for
statistical data analysis.

Altair: A declarative statistical visualization library for Python, built on Vega and Vega-Lite,
that allows users to create interactive, concise, and high-level charts with minimal code,
making it ideal for exploratory data analysis.

Matplotlib: A widely used Python library for creating static, interactive, and animated
visualizations in 2D and 3D. It offers extensive customization options, making it suitable
for a wide range of charts such as line plots, scatter plots, and histograms.

CODE:

import pandas as pd
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

np_height = np.array(height)
np_weight = np.array(weight)
print(height)
print(np_height)
print(type(height))
print(type(np_height))

orgarr=np_height.tolist()
print(orgarr)
print(type(orgarr))

OUTPUT:

[1.87, 1.87, 1.82, 1.91, 1.9, 1.85]
[1.87 1.87 1.82 1.91 1.9  1.85]
<class 'list'>
<class 'numpy.ndarray'>
[1.87, 1.87, 1.82, 1.91, 1.9, 1.85]
<class 'list'>

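INFERENCE:
A Python list and a NumPy array can hold the same values, but only the NumPy array supports
direct element-wise mathematical operations. tolist() converts the array back to a list.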
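As a minimal sketch of this difference, the height and weight values from the code above can be combined element-wise once they are arrays. The body-mass-index formula used here (weight in kg divided by height in metres squared) is an assumption about what these values were meant for. The original exercise does not state it.

```python
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

np_height = np.array(height)
np_weight = np.array(weight)

# Element-wise arithmetic works directly on arrays (assumed BMI formula)
bmi = np_weight / np_height ** 2
print(bmi.round(2))

# The same expression on plain lists raises a TypeError
list_error = None
try:
    weight / height ** 2
except TypeError as exc:
    list_error = exc
    print("Lists do not support this:", exc)
```

With plain lists, the same calculation would need an explicit loop or a list comprehension.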

CODE:

arr=np.array([1,2,3,4,5])
print(arr)
x=arr.copy()
arr[0]=42
print(x)
print(arr)

OUTPUT:

[1 2 3 4 5]
[1 2 3 4 5]
[42  2  3  4  5]

CODE:

sample = np.array([32,54,23,78,65,12,32,39,56])
filter_arr = sample > 50
new_arr = sample[filter_arr]
print(new_arr)

OUTPUT:

[54 78 65 56]

CODE:

a = np.array([[1,2],[3,4]])
# nditer visits every element of the 2-D array in turn
for x in np.nditer(a):
    print(x)

OUTPUT:

1
2
3
4

CODE:

a = np.array([[4,5,2],[3,8,8],[12,32,6]])
b = np.array([[3,4,5],[6,5,2],[1,4,3]])
print(a+b)
print()
print(a-b)

OUTPUT:

[[ 7  9  7]
 [ 9 13 10]
 [13 36  9]]

[[ 1  1 -3]
 [-3  3  6]
 [11 28  3]]

CODE:

arr = np.array([1, 2, 3, 4, 5])


x = arr.view()
arr[0] = 42
print(arr)
print(x)

OUTPUT:

[42  2  3  4  5]
[42  2  3  4  5]

CODE:

arr = np.array([1, 2, 3, 4, 5])


x = arr.copy()
arr[1] = 40
print(arr)
print(x)

OUTPUT:

[ 1 40  3  4  5]
[1 2 3 4 5]

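INFERENCE:
copy() creates a new array with its own data, so later changes to the original do not affect
it. view() creates a new array object that shares the original's data, so a change made
through either one is visible in the other.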
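The relationship can also be checked directly rather than inferred from printed values. The following is a small sketch using np.shares_memory and the base attribute. The variable names v and c are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
v = arr.view()   # new array object, same underlying data
c = arr.copy()   # new array object, separate data

# A view shares memory with the original; a copy does not
view_shares = np.shares_memory(arr, v)
copy_shares = np.shares_memory(arr, c)
print(view_shares, copy_shares)

# base points at the array that owns the data (None if the array owns it)
print(v.base is arr, c.base is None)

# Writing through the view changes the original but not the copy
v[0] = 99
print(arr[0], c[0])
```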

PANDAS

CODE:

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

OUTPUT:

0    1
1    7
2    2
dtype: int64
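The aim also covers reading data from a text file, an Excel file and the web. A minimal sketch of the corresponding pandas readers is given here. The sample values and the helper names load_excel and load_from_web are made up for illustration. An in-memory buffer stands in for a text file so that the example runs without external files.

```python
import io
import pandas as pd

# Text/CSV: read_csv accepts a path, a URL, or any file-like object.
# The buffer below holds made-up sample data in place of a file on disk.
text_data = io.StringIO("Country,GDP\nA,100\nB,250\n")
df_text = pd.read_csv(text_data)
print(df_text)

def load_excel(path):
    """Read the first sheet of an Excel workbook (hypothetical helper).

    Reading .xlsx files requires an engine such as openpyxl to be installed.
    """
    return pd.read_excel(path, sheet_name=0)

def load_from_web(url):
    """Read a CSV file served over HTTP(S) (hypothetical helper).

    read_csv accepts a URL string directly.
    """
    return pd.read_csv(url)
```

The two helpers are defined but not called, since they depend on a workbook path and a URL that are not part of this exercise.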

CODE:

df=pd.read_csv("W2023.csv")
df.head()

OUTPUT:

CODE:

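import seaborn as sns
import matplotlib.pyplot as plt

# Histogram of birth rate with a kernel density estimate overlaid
sns.histplot(df['Birth Rate'], bins=10, kde=True)
plt.title('Distribution of Birth Rate')
plt.xlabel('Birth Rate')
plt.ylabel('Frequency')
plt.show()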

OUTPUT:

CODE:

plt.figure(figsize=(10, 6))
sns.scatterplot(x='GDP', y='Co2-Emissions', data=df, hue='Country', s=100)
plt.title('GDP vs CO2 Emissions by Country')
plt.xlabel('GDP')
plt.ylabel('CO2 Emissions')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

OUTPUT:

CODE:

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Birth Rate', y='Fertility Rate', data=df, hue='Country', s=100)
plt.title('Birth Rate vs Fertility Rate by Country')
plt.xlabel('Birth Rate')
plt.ylabel('Fertility Rate')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

OUTPUT:

CODE:

# Select the five rows with the largest CO2 emissions
top5_co2 = df.nlargest(5, 'Co2-Emissions')

plt.figure(figsize=(8, 8))
plt.pie(top5_co2['Co2-Emissions'], labels=top5_co2['Country'], autopct='%1.1f%%',
        startangle=90, colors=sns.color_palette('Set2'))
plt.title('Top 5 Countries by CO2 Emissions')
plt.show()

OUTPUT:

CODE:

import scipy.stats as stats


a = np.arange(10)
stats.describe(a)

OUTPUT:

DescribeResult(nobs=10, minmax=(0, 9), mean=4.5, variance=9.166666666666668,
skewness=0.0, kurtosis=-1.2242424242424244)
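Besides descriptive statistics, the package description above mentions SciPy's integration and optimization modules. A brief sketch of each follows. The integrand and the quadratic objective are arbitrary examples chosen because their exact answers are known. They are not part of the original exercise.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: area under sin(x) from 0 to pi (the exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)
print(area)

# Optimization: minimum of (x - 2)^2 + 1, which lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(result.x, result.fun)
```

quad returns the estimated integral together with an estimate of the absolute error. minimize_scalar returns a result object whose x and fun attributes hold the minimizer and the minimum value.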

CODE:

import statsmodels.api as sm

# Create a sample dataset: Y depends linearly on X (slope 3) plus Gaussian noise
np.random.seed(0)
X = np.random.rand(100) * 10
Y = 3 * X + np.random.randn(100) * 2

df = pd.DataFrame({'X': X, 'Y': Y})

print("Sample DataFrame:")
print(df.head())

# Add an intercept column, then fit an ordinary least squares (OLS) regression
X = sm.add_constant(df['X'])

model = sm.OLS(df['Y'], X)
results = model.fit()

print("\nRegression Results:")
print(results.summary())

OUTPUT:

RESULT:

The packages were explored, and basic operations were performed successfully on a sample
dataset.
