
Ex. No: 1 Exploring The Features of Numpy, Scipy, Jupyter, Statsmodels and Pandas Date: 07/08/2024

DATA ANALYTICS 1

EX. NO: 1 Exploring the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas

DATE: 07/08/2024

AIM:

To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages, and to read data from a text file, an Excel file and the web.
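The three data sources named in the aim can be approached as in the minimal sketch below. It assumes a small, made-up comma-separated file written to a temporary folder; the file names, column names and URL are hypothetical and are not part of the exercise data. Only the text-file read is executed; the Excel and web reads are shown as comments because they need an Excel engine and network access respectively.

```python
import os
import tempfile

import pandas as pd

# Write a tiny comma-separated text file so the example is self-contained.
# The column names and values are made up for illustration.
csv_text = "Country,GDP\nA,100\nB,250\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write(csv_text)

    # Text file: read_csv reads delimited text; `sep` sets the delimiter.
    df_text = pd.read_csv(path, sep=",")

print(df_text)

# Excel and web sources follow the same pattern (not executed here):
#   df_xlsx = pd.read_excel("sample.xlsx")                # hypothetical file; needs an engine such as openpyxl
#   df_web = pd.read_csv("https://example.com/data.csv")  # hypothetical URL
```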

PACKAGES:

Pandas: A powerful Python library used for data manipulation and analysis, providing data
structures like DataFrames and Series, which are excellent for handling structured data with
rows and columns.

NumPy: A foundational Python library for numerical computing that
provides support for large, multidimensional arrays and matrices, along with a collection of
mathematical functions to perform operations on these arrays.

SciPy: An open-source Python library built on NumPy, used for scientific and technical
computing. It includes modules for optimization, integration, interpolation, eigenvalue
problems, and more.

Statsmodels: A Python module that provides classes and functions for statistical modeling,
including linear regression, time series analysis, and hypothesis testing, making it useful for
statistical data analysis.

Altair: A declarative statistical visualization library for Python, built on Vega and Vega-Lite,
that allows users to create interactive, concise, and high-level charts with minimal code,
making it ideal for exploratory data analysis.

Matplotlib: A widely used Python library for creating static, interactive, and animated
visualizations in 2D and 3D. It offers extensive customization options for plots, making it
suitable for creating a wide range of charts like line plots, scatter plots, histograms, and more.
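To make the SciPy description above more concrete, here is a minimal sketch of two of the modules it names, integration and optimization. The functions being integrated and minimized are arbitrary illustrations chosen because their exact answers are known; they are not part of the exercise.

```python
from scipy import integrate, optimize

# Integration: area under x**2 on [0, 1]; the exact value is 1/3.
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)

# Optimization: minimum of (x - 2)**2, which lies at x = 2.
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(area)   # numerical estimate of the integral
print(res.x)  # location of the minimum found by the solver
```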

CODE:

import pandas as pd
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

np_height = np.array(height)
np_weight = np.array(weight)
print(height)
print(np_height)
print(type(height))
print(type(np_height))

orgarr=np_height.tolist()
print(orgarr)
print(type(orgarr))

OUTPUT:

[1.87, 1.87, 1.82, 1.91, 1.9, 1.85]
[1.87 1.87 1.82 1.91 1.9  1.85]
<class 'list'>
<class 'numpy.ndarray'>
[1.87, 1.87, 1.82, 1.91, 1.9, 1.85]
<class 'list'>

INFERENCE:
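A Python list and a NumPy array print almost identically, but they are different types. The
key difference is that arithmetic operations can be applied directly to a NumPy array,
element by element, whereas a list does not support them. tolist() converts the array back
to a list.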
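The inference above can be illustrated with a short sketch that reuses the height and weight lists from the code. The units (metres and kilograms) and the body mass index formula are assumptions made for the example; the exercise itself does not state them.

```python
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]        # assumed to be metres
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]  # assumed to be kilograms

np_height = np.array(height)
np_weight = np.array(weight)

# Element-wise arithmetic works directly on arrays.
bmi = np_weight / np_height ** 2
print(bmi.round(2))

# The same expression on plain lists raises TypeError.
try:
    weight / height
    list_math_works = True
except TypeError:
    list_math_works = False
print(list_math_works)
```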

CODE:

arr=np.array([1,2,3,4,5])
print(arr)
x=arr.copy()
arr[0]=42
print(x)
print(arr)

OUTPUT:

[1 2 3 4 5]
[1 2 3 4 5]
[42  2  3  4  5]

CODE:

sample = np.array([32,54,23,78,65,12,32,39,56])
filter_arr = sample > 50
new_arr = sample[filter_arr]
print(new_arr)

OUTPUT:

[54 78 65 56]

CODE:

# nditer visits every element of a multidimensional array in order
a = np.array([[1,2],[3,4]])
for x in np.nditer(a):
    print(x)

OUTPUT:

1
2
3
4

CODE:

a = np.array([[4,5,2],[3,8,8],[12,32,6]])
b = np.array([[3,4,5],[6,5,2],[1,4,3]])
print(a+b)
print()
print(a-b)

OUTPUT:

[[ 7  9  7]
 [ 9 13 10]
 [13 36  9]]

[[ 1  1 -3]
 [-3  3  6]
 [11 28  3]]

CODE:

arr = np.array([1, 2, 3, 4, 5])


x = arr.view()
arr[0] = 42
print(arr)
print(x)

OUTPUT:

[42  2  3  4  5]
[42  2  3  4  5]

CODE:

arr = np.array([1, 2, 3, 4, 5])


x = arr.copy()
arr[1] = 40
print(arr)
print(x)

OUTPUT:

[ 1 40  3  4  5]
[1 2 3 4 5]

INFERENCE:
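copy() creates a new array with its own data, so later changes to the original do not affect
the copy. view() creates a new array object that shares the original's data, so a change
made through either one is visible in the other.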
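One way to check this without modifying any values is sketched below, using np.shares_memory and the base attribute. The variable names v and c are chosen for this illustration only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
v = arr.view()  # new array object over the same data buffer
c = arr.copy()  # new array object with its own data buffer

# shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(arr, v))  # True
print(np.shares_memory(arr, c))  # False

# A view records the array that owns its data in .base; a copy owns its data.
print(v.base is arr)   # True
print(c.base is None)  # True
```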

PANDAS

CODE:

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

OUTPUT:

0    1
1    7
2    2
dtype: int64

CODE:

df=pd.read_csv("W2023.csv")
df.head()

OUTPUT:

CODE:

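# seaborn and matplotlib are needed for the plots that follow
import seaborn as sns
import matplotlib.pyplot as plt

# Histogram of birth rate with a kernel density estimate overlaid
sns.histplot(df['Birth Rate'], bins=10, kde=True)
plt.title('Distribution of Birth Rate')
plt.xlabel('Birth Rate')
plt.ylabel('Frequency')
plt.show()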

OUTPUT:

CODE:

plt.figure(figsize=(10, 6))
sns.scatterplot(x='GDP', y='Co2-Emissions', data=df, hue='Country', s=100)
plt.title('GDP vs CO2 Emissions by Country')
plt.xlabel('GDP')
plt.ylabel('CO2 Emissions')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

OUTPUT:

CODE:

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Birth Rate', y='Fertility Rate', data=df, hue='Country', s=100)
plt.title('Birth Rate vs Fertility Rate by Country')
plt.xlabel('Birth Rate')
plt.ylabel('Fertility Rate')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

OUTPUT:

CODE:

# Select the five rows with the largest CO2 emissions
top5_co2 = df.nlargest(5, 'Co2-Emissions')

plt.figure(figsize=(8, 8))
plt.pie(top5_co2['Co2-Emissions'], labels=top5_co2['Country'], autopct='%1.1f%%',
        startangle=90, colors=sns.color_palette('Set2'))
plt.title('Top 5 Countries by CO2 Emissions')
plt.show()

OUTPUT:

CODE:

import scipy.stats as stats


a = np.arange(10)
stats.describe(a)

OUTPUT:

DescribeResult(nobs=10, minmax=(0, 9), mean=4.5, variance=9.166666666666668,
skewness=0.0, kurtosis=-1.2242424242424244)
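The variance and kurtosis in this output can be reproduced with NumPy alone, which shows what the numbers mean. The sketch below assumes the default settings of stats.describe: the variance is the sample variance (ddof=1), and the kurtosis is excess kurtosis (the normal distribution scores 0) computed from population moments.

```python
import numpy as np

a = np.arange(10)

# Sample variance: divide the squared deviations by n - 1.
sample_var = a.var(ddof=1)

# Excess kurtosis from population central moments: m4 / m2**2 - 3.
dev = a - a.mean()
m2 = np.mean(dev ** 2)
m4 = np.mean(dev ** 4)
excess_kurtosis = m4 / m2 ** 2 - 3

print(sample_var)
print(excess_kurtosis)
```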

CODE:

import statsmodels.api as sm

# Create a sample dataset


np.random.seed(0)
X = np.random.rand(100) * 10
Y = 3 * X + np.random.randn(100) * 2

df = pd.DataFrame({'X': X, 'Y': Y})


print("Sample DataFrame:")
print(df.head())

# Fit an ordinary least squares (OLS) regression with statsmodels.
# add_constant adds an intercept column, which OLS does not include by default.
X = sm.add_constant(df['X'])

model = sm.OLS(df['Y'], X)
results = model.fit()

print("\nRegression Results:")
print(results.summary())

OUTPUT:

RESULT:

The packages were explored and basic operations were performed successfully on a sample
dataset.
