
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
(Bhilai Institute of Technology, Durg)

CERTIFICATE OF COMPLETION

This is to certify that Mr/Ms ………………………………………………… is a bonafide
student of …………… Sem during the Academic Session ……………………. in the
Department of Computer Science & Engineering, Bhilai Institute of
Technology, Durg, Chhattisgarh, and has successfully completed all the
experiments of the laboratory ………………………………………………………. within the
specified time of the academic session.

Approved By:

Prof. In-charge                              Head of the Deptt.

(                    )                       (                    )
INDEX

1. File I/O: Write a program for opening, closing, reading, writing, seeking and exception handling of a file.
2. NumPy: Write a program demonstrating array creation and basic operations such as indexing, slicing, shape manipulation, stacking and splitting of arrays.
3. Pandas: Write a program to import data (CSV, Excel, text, etc.) using pandas data frames and perform data preparation, filtering and sorting.
4. Matplotlib: Write a program to understand the use of Matplotlib for simple interactive charts (line chart, histogram, bar chart, pie chart), subplots with the functional method, working with multiple figures and axes, adding text, adding a grid, adding a legend, and saving the charts.
5. Seaborn: Write a program to understand the use of Seaborn for visualising statistical relationships, importing and preparing data, plotting with categorical data and visualising linear relationships.
6. Perform different data pre-processing methods.
7. Perform data cleaning, handling of missing values, and imputation techniques (cleaning/filling/dropping/replacing).
8. Perform exploratory analysis for any dataset.
9. Perform basic statistical analysis: counting (mean, median, mode, SD, etc.), probability, probability distributions and sampling distributions.
10. Perform statistical analysis by estimation and hypothesis testing.
#LAB-1#
[ ]: # The name of the file we want to work with
filename = "example.txt"

try:
    # Writing to the file
    file = open(filename, 'w+')
    file.write("Adding some new text.\n")
    print("New text added to the file.")
    file.close()  # close the write handle before reopening

    # Explicitly opening the file for reading and writing
    file = open(filename, 'r+')
    print("File opened successfully.")

    # Reading from the file
    content = file.read()
    print("Current file content:", content)

    # Seeking to a specific position in the file
    file.seek(0)
    print("Moved file pointer to the beginning.")

    # Reading the updated content
    updated_content = file.read()
    print("Updated file content:", updated_content)

except FileNotFoundError:
    print(f"The file {filename} was not found.")
except IOError:
    print(f"Error occurred while accessing the file {filename}.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
finally:
    # Closing the file explicitly
    try:
        file.close()
        print("File closed successfully.")
    except NameError:
        # File was never opened, no need to close
        pass
    except Exception as e:
        print(f"An error occurred while closing the file: {e}")

New text added to the file.
File opened successfully.
Current file content: Adding some new text.

Moved file pointer to the beginning.
Updated file content: Adding some new text.

File closed successfully.
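The same open/close bookkeeping is usually handled by a context manager, which closes the file automatically even when an exception is raised. A minimal sketch of the `with` idiom, reusing the same example.txt from the cell above:

[ ]: # Sketch (not part of the original lab): 'with' closes the file
# automatically, even if an exception occurs inside the block.
try:
    with open(filename, 'r+') as f:
        content = f.read()
        f.seek(0)  # rewind to the beginning
        print("Read via context manager:", content)
except FileNotFoundError:
    print(f"The file {filename} was not found.")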


#LAB-2#
[ ]: import numpy as np

[ ]: arr = np.array([1, 2, 3, 4, 5])

[ ]: print(arr)

[1 2 3 4 5]

[ ]: arr1d = np.array([1, 2, 3, 4, 5])
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

[ ]: arr_range = np.arange(1, 11, 1)
arr_range

[ ]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

[ ]: arr_random = np.random.rand(3, 3)
arr_random

[ ]: array([[0.50535839, 0.79636261, 0.41437649],
       [0.39710658, 0.09276994, 0.83068921],
       [0.69531753, 0.76831111, 0.42289229]])

[ ]: arr1d + 10

[ ]: array([11, 12, 13, 14, 15])

[ ]: arr1d * 2

[ ]: array([ 2, 4, 6, 8, 10])

[ ]: np.sqrt(arr1d)

[ ]: array([1. , 1.41421356, 1.73205081, 2. , 2.23606798])

[ ]: np.exp(arr1d)

[ ]: array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003, 148.4131591 ])

[ ]: arr2d.ndim, arr2d.size

[ ]: (2, 6)

[ ]: a = np.random.rand(3, 3)

[ ]: a = np.array([[5, 3], [5, 2]])
b = np.array([[3, 3], [2, 7]])

[ ]: a,b

[ ]: (array([[5, 3],
[5, 2]]),
array([[3, 3],
[2, 7]]))

[ ]: np.sort(a, axis = -1)

[ ]: array([[3, 5],
[2, 5]])

[ ]: a[1][1] #Indexing

[ ]: 2

[ ]: stacked_arr = np.vstack((a, b))  # Stack vertically
print("Stacked array vertically:\n", stacked_arr)

Stacked array vertically:
 [[5 3]
 [5 2]
 [3 3]
 [2 7]]

[ ]: # Splitting
split_arr = np.split(a, [1])  # Split at index 1
print("Split array:", split_arr)

Split array: [array([[5, 3]]), array([[5, 2]])]

[ ]: x = np.stack((a, b))
x

[ ]: array([[[5, 3],
[5, 2]],

[[3, 3],
[2, 7]]])
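np.vstack and np.split have horizontal counterparts that work along columns; a small sketch, reusing the same a and b from above:

[ ]: # Sketch: horizontal stacking and splitting of the same 2x2 arrays
h = np.hstack((a, b))          # shape (2, 4): columns of a, then columns of b
left, right = np.hsplit(h, 2)  # split back into two (2, 2) arrays
print("Stacked array horizontally:\n", h)
print("Split halves:", left, right)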

[ ]: print("addition\n", a + b)
print("subbtraction\n", a + b)
print("division\n", a + b)
print("multiply 1\n", a * b)
print("multiply 2\n", a @ b)

addition
[[8 6]
[7 9]]
subbtraction
[[8 6]
[7 9]]
division
[[8 6]
[7 9]]
multiply 1
[[15 9]
[10 14]]
multiply 2
[[21 36]
[19 29]]

[ ]: c = np.random.rand(4, 4)

[ ]: c

[ ]: array([[0.38479275, 0.84841358, 0.24102487, 0.80679517],
       [0.51642219, 0.71366856, 0.26205537, 0.28291745],
       [0.84690828, 0.33984087, 0.18054944, 0.53132507],
       [0.01385921, 0.87446289, 0.11849415, 0.26600576]])

[ ]: c[0:2, 0:2] #Slicing

[ ]: array([[0.38479275, 0.84841358],
[0.51642219, 0.71366856]])

[ ]: c[0:-2, 0:-2]

[ ]: array([[0.38479275, 0.84841358],
[0.51642219, 0.71366856]])

[ ]: print("mean\n", np.mean(a))
print("sum\n", np.sum(a))
print("min\n", np.min(a))
print("max 1\n", np.max(a))
print("cumsum 2\n", np.cumsum(a))

mean
3.75
sum
15
min
2
max 1
5
cumsum 2
[ 5 8 13 15]

[ ]: a

[ ]: array([[5, 3],
[5, 2]])


[ ]: a.T

[ ]: array([[5, 5],
[3, 2]])

[ ]: a.reshape(4, 1) #Shape manipulation

[ ]: array([[5],
[3],
[5],
[2]])

[ ]: a.max(axis = 0)

[ ]: array([5, 3])

[ ]: x.ndim, x.size

[ ]: (3, 8)

#LAB-3#
[ ]: import pandas as pd
data = pd.read_csv("/content/gapminder-FiveYearData.csv")

[ ]: #Sorting
data.sort_values(by=["gdpPercap"]).head(5)

[ ]: country year pop continent lifeExp gdpPercap
334 Congo Dem. Rep. 2002 55379852.0 Africa 44.966 241.165876
335 Congo Dem. Rep. 2007 64606759.0 Africa 46.462 277.551859
876 Lesotho 1952 748747.0 Africa 42.138 298.846212
624 Guinea-Bissau 1952 580653.0 Africa 32.500 299.850319
333 Congo Dem. Rep. 1997 47798986.0 Africa 42.587 312.188423

[ ]: #filtering
data_2007 = data[data["year"] == 2007]
data_2007.head(5)

[ ]: country year pop continent lifeExp gdpPercap
11 Afghanistan 2007 31889923.0 Asia 43.828 974.580338
23 Albania 2007 3600523.0 Europe 76.423 5937.029526
35 Algeria 2007 33333216.0 Africa 72.301 6223.367465
47 Angola 2007 12420476.0 Africa 42.731 4797.231267
59 Argentina 2007 40301927.0 Americas 75.320 12779.379640

[ ]: max_gdp = max(data["gdpPercap"])
country = data[data["gdpPercap"] == max_gdp]
country # country with max gdp Per Capita

[ ]: country year pop continent lifeExp gdpPercap
853 Kuwait 1957 212846.0 Asia 58.033 113523.1329
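The same row can be found in one step with idxmax, which returns the index label of the maximum value; a short sketch against the same data frame:

[ ]: # Sketch: row with the maximum gdpPercap, without a boolean filter
data.loc[[data["gdpPercap"].idxmax()]]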

[ ]: year_wise_lifeExp_dict = {}
years = data["year"].unique()  # each distinct year once, rather than every row
for year in years:
    x = data[data["year"] == year].lifeExp.mean()
    year_wise_lifeExp_dict[year] = x

year_wise_lifeExp = pd.Series(year_wise_lifeExp_dict)

[ ]: year_wise_lifeExp

[ ]: 1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
dtype: float64

[ ]: year_wise_lifeExp_sr = data.groupby("year")["lifeExp"].mean()
year_wise_lifeExp_sr

[ ]: year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64

#LAB-4#
[ ]: import matplotlib.pyplot as plt
import numpy as np

x = [10, 20, 25, 15]
y = [5, 13, 6, 7]

data = np.random.randn(1000)

sizes = [15, 30, 45, 10]
labels = ['A', 'B', 'C', 'D']

# Working with Multiple Figures and Axes
# Subplots
fig, axs = plt.subplots(2, 2, figsize=(10, 8))

axs[0, 0].plot(x, y, 'r', label="RED Line")  # Line chart
axs[0, 0].set_title('Line Chart')
axs[0, 0].grid(True)    # Adding a grid
axs[0, 0].legend()      # Adding a legend

axs[0, 1].hist(data, bins=30, color='skyblue', edgecolor='black')  # Histogram
axs[0, 1].set_title('Histogram')

axs[1, 0].bar(x, y, color='green')  # Bar chart
axs[1, 0].set_title('Bar Chart')

# Pie chart
axs[1, 1].pie(sizes, labels=labels, autopct='%1.1f%%',
              colors=['gold', 'yellowgreen', 'lightcoral', 'lightskyblue'])
axs[1, 1].set_title('Pie Chart')

plt.tight_layout()
# Saving the chart
plt.savefig('figure chart.png')
plt.show()
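The lab outline also lists "Adding Text", which the cell above does not show; a minimal sketch using the standard ax.text and ax.annotate calls (the coordinates are chosen only for illustration):

[ ]: # Sketch: adding free text and an annotation to an axes
fig, ax = plt.subplots()
ax.plot(x, y, 'r')
ax.text(11, 6, 'text at data coordinates (11, 6)')   # Adding text
ax.annotate('peak', xy=(20, 13), xytext=(22, 10),
            arrowprops=dict(arrowstyle='->'))        # Arrowed annotation
plt.show()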

#LAB-5#
[ ]: import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load the example tips dataset
tips = sns.load_dataset("tips")
fig, axes = plt.subplots(1, 3, figsize=(17, 4))

sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[0])
axes[0].set_title('Scatterplot of Total Bill vs. Tip')

tips['tip_percentage'] = tips['tip'] / tips['total_bill'] * 100

sns.barplot(data=tips, x="day", y="tip_percentage", ax=axes[1])
axes[1].set_title('Bar Plot of Tip Percentage by Day')

sns.regplot(data=tips, x="total_bill", y="tip_percentage", ax=axes[2])
axes[2].set_title('Regression Plot of Total Bill vs. Tip Percentage')

g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.regplot, "total_bill", "tip_percentage")
plt.show()
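Seaborn's figure-level catplot covers the "plotting with categorical data" part of this experiment more directly; a small sketch on the same tips dataset (kind="box" and kind="violin" are illustrative choices):

[ ]: # Sketch: categorical plots of total_bill by day
sns.catplot(data=tips, x="day", y="total_bill", kind="box")
sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="violin")
plt.show()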

#LAB-6#
[ ]: import numpy as np
import pandas as pd

data = pd.read_excel("/content/Case study_Dataset.xlsx")

[ ]: data.head()

[ ]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT
0 TOPUP EUR 177.38
1 BANK_TRANSFER EUR 310.27
2 CARD_PAYMENT EUR 96.44
3 BANK_TRANSFER EUR 288.51
4 CARD_PAYMENT GBP 88.45

[ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CREATED_DATE 10000 non-null datetime64[ns]
1 CREATED_DATE minus Hour 10000 non-null datetime64[ns]
2 USER_ID 10000 non-null object
3 TRANSACTION_ID 10000 non-null object
4 TYPE 10000 non-null object
5 CURRENCY 10000 non-null object
6 AMOUNT 10000 non-null float64
dtypes: datetime64[ns](2), float64(1), object(4)
memory usage: 547.0+ KB

[ ]: data.describe()

[ ]: CREATED_DATE CREATED_DATE minus Hour AMOUNT
count 10000 10000 10000.000000
mean 2016-08-23 00:01:29.126000128 2016-08-22 10:24:14.400000 175.768253
min 2016-01-09 00:18:14 2016-01-09 00:00:00 0.020000
25% 2016-06-19 18:20:33 2016-06-19 00:00:00 88.675000
50% 2016-09-03 16:29:08.500000 2016-09-03 00:00:00 177.455000
75% 2016-11-09 18:34:07.500000 2016-11-09 00:00:00 263.540000

max 2017-01-08 23:50:18 2017-01-08 00:00:00 349.980000
std NaN NaN 101.406464

[ ]: data["year"] = pd.DatetimeIndex(data.CREATED_DATE).year
data["month"] = pd.DatetimeIndex(data.CREATED_DATE).month
data["weekdays"] = pd.DatetimeIndex(data.CREATED_DATE).weekday

[ ]: EUR = []

for i in range(len(data)):
    if data.iloc[i]["CURRENCY"] == "EUR":
        EUR.append(data.iloc[i]["AMOUNT"])
    else:
        EUR.append(data.iloc[i]["AMOUNT"] * 1.17)

data["AMT_EUR"] = EUR

[ ]: data.head()

[ ]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT year month weekdays AMT_EUR
0 TOPUP EUR 177.38 2016 1 5 177.3800
1 BANK_TRANSFER EUR 310.27 2016 1 5 310.2700
2 CARD_PAYMENT EUR 96.44 2016 1 5 96.4400
3 BANK_TRANSFER EUR 288.51 2016 1 5 288.5100
4 CARD_PAYMENT GBP 88.45 2016 1 5 103.4865

[ ]: data[["TYPE"]].value_counts()

[ ]: TYPE
TOPUP 2373
BANK_TRANSFER 2371
ATM 2357
CARD_PAYMENT 2325
P2P_TRANSFER 574
Name: count, dtype: int64

#LAB-7#
[ ]: import numpy as np
import pandas as pd

data = pd.read_csv("/content/Titanic.csv")

[ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 pclass 1309 non-null int64
1 survived 1309 non-null int64
2 name 1309 non-null object
3 sex 1309 non-null object
4 age 1046 non-null float64
5 sibsp 1309 non-null int64
6 parch 1309 non-null int64
7 ticket 1309 non-null object
8 fare 1308 non-null float64
9 cabin 295 non-null object
10 embarked 1307 non-null object
11 boat 486 non-null object
12 body 121 non-null float64
13 home.dest 745 non-null object
dtypes: float64(3), int64(4), object(7)
memory usage: 143.3+ KB

[ ]: data.head()

[ ]: pclass survived sex age sibsp parch fare embarked body
0 1 1 0 29.00 0 0 211.3375 2 NaN
1 1 1 1 0.92 1 2 151.5500 2 NaN
2 1 0 0 2.00 1 2 151.5500 2 NaN
3 1 0 1 30.00 1 2 151.5500 2 135.0
4 1 0 0 25.00 1 2 151.5500 2 NaN

[ ]: data.describe()

[ ]: pclass survived sex age sibsp \
count 1309.000000 1309.000000 1309.000000 1309.000000 1309.000000
mean 2.294882 0.381971 0.644003 29.881138 0.498854
std 0.837836 0.486055 0.478997 12.883193 1.041658
min 1.000000 0.000000 0.000000 0.170000 0.000000
25% 2.000000 0.000000 0.000000 22.000000 0.000000
50% 3.000000 0.000000 1.000000 29.881138 0.000000
75% 3.000000 1.000000 1.000000 35.000000 1.000000
max 3.000000 1.000000 1.000000 80.000000 8.000000

parch fare embarked body
count 1309.000000 1309.000000 1309.000000 121.000000
mean 0.385027 33.295479 1.605806 160.809917
std 0.865560 51.738879 0.653499 97.696922
min 0.000000 0.000000 0.000000 1.000000
25% 0.000000 7.895800 1.000000 72.000000
50% 0.000000 14.454200 2.000000 155.000000
75% 0.000000 31.275000 2.000000 256.000000
max 9.000000 512.329200 2.000000 328.000000

[ ]: data = data.drop(['cabin', 'name', 'ticket', 'home.dest', 'boat'], axis=1)
# Dropping these columns as they have a high number of null values

[ ]: data.head()

[ ]: pclass survived sex age sibsp parch fare embarked body
0 1 1 0 29.00 0 0 211.3375 2 NaN
1 1 1 1 0.92 1 2 151.5500 2 NaN
2 1 0 0 2.00 1 2 151.5500 2 NaN
3 1 0 1 30.00 1 2 151.5500 2 135.0
4 1 0 0 25.00 1 2 151.5500 2 NaN

[ ]: data.fillna({'age': data['age'].mean()}, inplace=True)    # Filling null values with mean of age
data.fillna({'fare': data['fare'].mean()}, inplace=True)   # Filling null values with mean of fare
data.fillna({'embarked': 'S'}, inplace=True)

[ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 pclass 1309 non-null int64
1 survived 1309 non-null int64

2 sex 1309 non-null int64
3 age 1309 non-null float64
4 sibsp 1309 non-null int64
5 parch 1309 non-null int64
6 fare 1309 non-null float64
7 embarked 1309 non-null int64
8 body 121 non-null float64
dtypes: float64(3), int64(6)
memory usage: 92.2 KB

[ ]: data.replace({'sex':{'male':1,'female':0}},inplace=True)

[ ]: data.replace({'embarked':{'S':2 ,'C':1,'Q':0}},inplace=True)
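scikit-learn wraps the same imputation logic in a reusable object; a sketch with SimpleImputer, shown here with the median strategy for contrast with the means used above:

[ ]: # Sketch: median imputation of numeric columns via scikit-learn
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='median')
data[['age', 'fare']] = imputer.fit_transform(data[['age', 'fare']])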

[ ]: data.corr()

[ ]: pclass survived sex age sibsp parch \
pclass 1.000000 -0.312469 0.124617 -0.366371 0.060832 0.018322
survived -0.312469 1.000000 -0.528693 -0.050198 -0.027825 0.082660
sex 0.124617 -0.528693 1.000000 0.057397 -0.109609 -0.213125
age -0.366371 -0.050198 0.057397 1.000000 -0.190747 -0.130872
sibsp 0.060832 -0.027825 -0.109609 -0.190747 1.000000 0.373587
parch 0.018322 0.082660 -0.213125 -0.130872 0.373587 1.000000
fare -0.558477 0.244208 -0.185484 0.171521 0.160224 0.221522
embarked -0.038875 -0.098450 0.120423 -0.035824 0.073461 0.095523
body -0.034642 NaN -0.015903 0.059059 -0.099961 0.051099

fare embarked body
pclass -0.558477 -0.038875 -0.034642
survived 0.244208 -0.098450 NaN
sex -0.185484 0.120423 -0.015903
age 0.171521 -0.035824 0.059059
sibsp 0.160224 0.073461 -0.099961
parch 0.221522 0.095523 0.051099
fare 1.000000 -0.061118 -0.042665
embarked -0.061118 1.000000 -0.033860
body -0.042665 -0.033860 1.000000


#LAB-8#

[ ]: import numpy as np
import pandas as pd

data = pd.read_excel("/content/Case study_Dataset.xlsx")

[ ]: data.head()

[ ]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT
0 TOPUP EUR 177.38
1 BANK_TRANSFER EUR 310.27
2 CARD_PAYMENT EUR 96.44
3 BANK_TRANSFER EUR 288.51
4 CARD_PAYMENT GBP 88.45

[ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CREATED_DATE 10000 non-null datetime64[ns]
1 CREATED_DATE minus Hour 10000 non-null datetime64[ns]
2 USER_ID 10000 non-null object
3 TRANSACTION_ID 10000 non-null object
4 TYPE 10000 non-null object
5 CURRENCY 10000 non-null object
6 AMOUNT 10000 non-null float64
dtypes: datetime64[ns](2), float64(1), object(4)
memory usage: 547.0+ KB

[ ]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn as sk
import seaborn as sns
gdp_missing_values_data = pd.read_csv('./Datasets/GDP_missing_data.csv')
gdp_complete_data = pd.read_csv('./Datasets/GDP_complete_data.csv')

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-a1d39de8ca53> in <cell line: 6>()
      4 import sklearn as sk
      5 import seaborn as sns
----> 6 gdp_missing_values_data = pd.read_csv('./Datasets/GDP_missing_data.csv')
      7 gdp_complete_data = pd.read_csv('./Datasets/GDP_complete_data.csv')

FileNotFoundError: [Errno 2] No such file or directory: './Datasets/GDP_missing_data.csv'

[ ]: data.describe()

[ ]: CREATED_DATE CREATED_DATE minus Hour AMOUNT
count 10000 10000 10000.000000
mean 2016-08-23 00:01:29.126000128 2016-08-22 10:24:14.400000 175.768253
min 2016-01-09 00:18:14 2016-01-09 00:00:00 0.020000
25% 2016-06-19 18:20:33 2016-06-19 00:00:00 88.675000
50% 2016-09-03 16:29:08.500000 2016-09-03 00:00:00 177.455000
75% 2016-11-09 18:34:07.500000 2016-11-09 00:00:00 263.540000
max 2017-01-08 23:50:18 2017-01-08 00:00:00 349.980000
std NaN NaN 101.406464

[ ]: data["year"] = pd.DatetimeIndex(data.CREATED_DATE).year
data["month"] = pd.DatetimeIndex(data.CREATED_DATE).month
data["weekdays"] = pd.DatetimeIndex(data.CREATED_DATE).weekday

[ ]: EUR = []

for i in range(len(data)):
    if data.iloc[i]["CURRENCY"] == "EUR":
        EUR.append(data.iloc[i]["AMOUNT"])
    else:
        EUR.append(data.iloc[i]["AMOUNT"] * 1.17)

data["AMT_EUR"] = EUR

[ ]: data.head()

[ ]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT year month weekdays AMT_EUR
0 TOPUP EUR 177.38 2016 1 5 177.3800
1 BANK_TRANSFER EUR 310.27 2016 1 5 310.2700
2 CARD_PAYMENT EUR 96.44 2016 1 5 96.4400
3 BANK_TRANSFER EUR 288.51 2016 1 5 288.5100
4 CARD_PAYMENT GBP 88.45 2016 1 5 103.4865

[ ]: data[["TYPE"]].value_counts()

[ ]: TYPE
TOPUP 2373
BANK_TRANSFER 2371
ATM 2357
CARD_PAYMENT 2325
P2P_TRANSFER 574
Name: count, dtype: int64

[ ]: data["year"].unique()

[ ]: array([2016, 2017], dtype=int32)

[ ]: data.groupby(["CURRENCY"])["AMOUNT"].sum()

[ ]: CURRENCY
EUR 852363.35
GBP 905319.18
Name: AMOUNT, dtype: float64

[ ]: data.groupby(["year", "month", "CURRENCY"])["AMOUNT"].sum()

[ ]: year month CURRENCY
2016 1 EUR 19615.42
GBP 20155.34

2 EUR 22249.70
GBP 26937.35
3 EUR 44099.57
GBP 45814.22
4 EUR 43964.14
GBP 45241.07
5 EUR 49489.32
GBP 51630.61
6 EUR 53965.12
GBP 58219.62
7 EUR 81995.70
GBP 82271.76
8 EUR 100820.63
GBP 114643.94
9 EUR 90419.37
GBP 95699.41
10 EUR 101629.15
GBP 115582.59
11 EUR 105934.72
GBP 105177.93
12 EUR 110733.82
GBP 112710.05
2017 1 EUR 27446.69
GBP 31235.29
Name: AMOUNT, dtype: float64

[ ]: data.groupby(["weekdays", "CURRENCY"])["AMOUNT"].sum()

[ ]: weekdays CURRENCY
0 EUR 107370.90
GBP 129305.04
1 EUR 125032.02
GBP 118797.33
2 EUR 121888.83
GBP 129554.67
3 EUR 119865.46
GBP 131812.35
4 EUR 138228.10
GBP 150998.18
5 EUR 132238.72
GBP 135012.44
6 EUR 107739.32
GBP 109839.17
Name: AMOUNT, dtype: float64

[ ]: data.groupby(["TYPE", "CURRENCY"])["AMOUNT"].sum()

[ ]: TYPE CURRENCY
ATM EUR 213140.45
GBP 198558.25
BANK_TRANSFER EUR 205127.11
GBP 213737.72
CARD_PAYMENT EUR 210115.77
GBP 204736.58
P2P_TRANSFER EUR 19905.82
GBP 82075.52
TOPUP EUR 204074.20
GBP 206211.11
Name: AMOUNT, dtype: float64
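The long year/month/currency breakdown above reads more easily as a two-dimensional table; a sketch with pivot_table:

[ ]: # Sketch: the monthly sums reshaped with currencies as columns
data.pivot_table(values="AMOUNT", index=["year", "month"],
                 columns="CURRENCY", aggfunc="sum")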

[ ]: data.groupby(["weekdays"])["AMT_EUR"].sum().plot()

[ ]: <Axes: xlabel='weekdays'>

[ ]: data.groupby(["USER_ID"])["TRANSACTION_ID"].count().sort_values(ascending=False)

[ ]: USER_ID
06bb2d68-bf61-4030-8447-9de64d3ce490 132
d35f19f3-d9ad-48bf-bd1e-90f3ba4f0b98 103
d1bc3cd6-154e-479f-8957-a69cdf414462 95
0fe472c9-cf3e-4e43-90f3-a0cfb6a4f1f0 85
65ac0928-e17d-4636-96f4-ebe6bdb9c98d 84

dcf8d6c6-9fb6-4b0b-a190-013d220b33d7 1
2d6259b3-5a22-4b4b-b616-c22d9d7677c2 1
2d518cf9-d853-443d-a3d8-bda56f373901 1
5a99fa7a-72e5-4dbe-ae51-f0fd3bc8a717 1
2588d6c8-1a2e-4a54-a191-3b3111f9658e 1
Name: TRANSACTION_ID, Length: 1134, dtype: int64
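When only the heaviest users matter, nlargest avoids sorting the whole series; a one-line sketch:

[ ]: # Sketch: five most active users by transaction count
data.groupby("USER_ID")["TRANSACTION_ID"].count().nlargest(5)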

#LAB-9#
[ ]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_excel("/content/Case study_Dataset.xlsx")

[ ]: data.head()

[ ]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT
0 TOPUP EUR 177.38
1 BANK_TRANSFER EUR 310.27
2 CARD_PAYMENT EUR 96.44
3 BANK_TRANSFER EUR 288.51
4 CARD_PAYMENT GBP 88.45

[ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CREATED_DATE 10000 non-null datetime64[ns]
1 CREATED_DATE minus Hour 10000 non-null datetime64[ns]
2 USER_ID 10000 non-null object
3 TRANSACTION_ID 10000 non-null object
4 TYPE 10000 non-null object
5 CURRENCY 10000 non-null object
6 AMOUNT 10000 non-null float64
dtypes: datetime64[ns](2), float64(1), object(4)
memory usage: 547.0+ KB

[ ]: data.describe()

[ ]: CREATED_DATE CREATED_DATE minus Hour AMOUNT
count 10000 10000 10000.000000
mean 2016-08-23 00:01:29.126000128 2016-08-22 10:24:14.400000 175.768253
min 2016-01-09 00:18:14 2016-01-09 00:00:00 0.020000
25% 2016-06-19 18:20:33 2016-06-19 00:00:00 88.675000
50% 2016-09-03 16:29:08.500000 2016-09-03 00:00:00 177.455000
75% 2016-11-09 18:34:07.500000 2016-11-09 00:00:00 263.540000
max 2017-01-08 23:50:18 2017-01-08 00:00:00 349.980000
std NaN NaN 101.406464

[ ]: data[["TYPE"]].value_counts()

[ ]: TYPE
TOPUP 2373
BANK_TRANSFER 2371
ATM 2357
CARD_PAYMENT 2325
P2P_TRANSFER 574
Name: count, dtype: int64

[ ]: data[["AMOUNT"]].mean()

[ ]: AMOUNT 175.768253
dtype: float64

[ ]: data[["AMOUNT"]].median()

[ ]: AMOUNT 177.455
dtype: float64

[ ]: data[["AMOUNT"]].mode()

[ ]: AMOUNT
0 124.01

[ ]: data[["AMOUNT"]].std()

[ ]: AMOUNT 101.406464
dtype: float64

[ ]: data[["AMOUNT"]].gt(200).mean()

[ ]: AMOUNT 0.4322
dtype: float64

[ ]: data["AMOUNT"].unique()

[ ]: array([177.38, 310.27, 96.44, …, 285.68, 17.32, 228.9 ])

[ ]: data['AMOUNT'].value_counts()

[ ]: AMOUNT
124.01 6
140.59 4
284.25 4
53.96 3
13.63 3
..
292.14 1
52.09 1
110.65 1
307.05 1
228.90 1
Name: count, Length: 8746, dtype: int64

[ ]: # Plot the empirical probability distribution of AMOUNT
plt.hist(data['AMOUNT'], bins=10, density=True)  # density=True normalises the counts
plt.xlabel('AMOUNT')
plt.ylabel('Probability density')
plt.title('Probability distribution of AMOUNT')
plt.show()

[ ]: # Draw one random sample of 100 transactions and plot its distribution
sample = data['AMOUNT'].sample(100, replace=True)
plt.hist(sample, bins=10, density=True)
plt.xlabel('AMOUNT')
plt.ylabel('Probability density')
plt.title('Distribution of a random sample of AMOUNT')
plt.show()
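A sampling distribution in the strict sense is the distribution of a statistic across many repeated samples; a sketch for the sample mean of AMOUNT (1000 resamples of size 100 are illustrative choices):

[ ]: # Sketch: sampling distribution of the sample mean of AMOUNT
sample_means = [data['AMOUNT'].sample(100, replace=True).mean()
                for _ in range(1000)]
plt.hist(sample_means, bins=30, density=True)
plt.xlabel('Sample mean of AMOUNT')
plt.ylabel('Probability density')
plt.title('Sampling distribution of the mean (n = 100)')
plt.show()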

#LAB-10#
[1]: import numpy as np
import pandas as pd

data = pd.read_excel("/content/Case study_Dataset.xlsx")

[2]: data.head()

[2]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e

3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT
0 TOPUP EUR 177.38
1 BANK_TRANSFER EUR 310.27
2 CARD_PAYMENT EUR 96.44
3 BANK_TRANSFER EUR 288.51
4 CARD_PAYMENT GBP 88.45

[3]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CREATED_DATE 10000 non-null datetime64[ns]
1 CREATED_DATE minus Hour 10000 non-null datetime64[ns]
2 USER_ID 10000 non-null object
3 TRANSACTION_ID 10000 non-null object
4 TYPE 10000 non-null object
5 CURRENCY 10000 non-null object
6 AMOUNT 10000 non-null float64
dtypes: datetime64[ns](2), float64(1), object(4)
memory usage: 547.0+ KB

[4]: data.describe()

[4]: CREATED_DATE CREATED_DATE minus Hour AMOUNT
count 10000 10000 10000.000000
mean 2016-08-23 00:01:29.126000128 2016-08-22 10:24:14.400000 175.768253
min 2016-01-09 00:18:14 2016-01-09 00:00:00 0.020000
25% 2016-06-19 18:20:33 2016-06-19 00:00:00 88.675000
50% 2016-09-03 16:29:08.500000 2016-09-03 00:00:00 177.455000
75% 2016-11-09 18:34:07.500000 2016-11-09 00:00:00 263.540000
max 2017-01-08 23:50:18 2017-01-08 00:00:00 349.980000
std NaN NaN 101.406464

[5]: data["year"] = pd.DatetimeIndex(data.CREATED_DATE).year


data["month"] = pd.DatetimeIndex(data.CREATED_DATE).month
data["weekdays"] = pd.DatetimeIndex(data.CREATED_DATE).weekday

[6]: EUR = []

for i in range(len(data)):
    if data.iloc[i]["CURRENCY"] == "EUR":
        EUR.append(data.iloc[i]["AMOUNT"])
    else:
        EUR.append(data.iloc[i]["AMOUNT"] * 1.17)

data["AMT_EUR"] = EUR

[7]: data.head()

[7]: CREATED_DATE CREATED_DATE minus Hour \
0 2016-01-09 00:18:14 2016-01-09
1 2016-01-09 02:28:34 2016-01-09
2 2016-01-09 04:00:34 2016-01-09
3 2016-01-09 10:26:27 2016-01-09
4 2016-01-09 11:37:59 2016-01-09

USER_ID TRANSACTION_ID \
0 45e3c222-38ac-4fdb-b092-ff1639e4438c 27d7fd11-d885-4d2c-9ed1-daa89b7bda1d
1 57c11728-b979-4856-bada-1d268726cfe9 2e1ee26c-0d24-4931-a7f9-0caa0d07eb2e
2 1319cca9-02a7-4a15-8abb-48d4e08e5aa3 bfd20e6f-ddb3-4237-bcd2-f7f8d967e36e
3 3f6bb28c-f945-4027-9178-747956c3ea58 85037186-039a-4ae5-9fea-e87f30822218
4 f54baeeb-7282-4d23-9bb7-e8396ce1b159 8e1e938a-1916-4d5e-b261-82c61a6979d6

TYPE CURRENCY AMOUNT year month weekdays AMT_EUR
0 TOPUP EUR 177.38 2016 1 5 177.3800
1 BANK_TRANSFER EUR 310.27 2016 1 5 310.2700
2 CARD_PAYMENT EUR 96.44 2016 1 5 96.4400
3 BANK_TRANSFER EUR 288.51 2016 1 5 288.5100
4 CARD_PAYMENT GBP 88.45 2016 1 5 103.4865

[8]: data[["TYPE"]].value_counts()

[8]: TYPE
TOPUP 2373
BANK_TRANSFER 2371
ATM 2357
CARD_PAYMENT 2325
P2P_TRANSFER 574
Name: count, dtype: int64

Hypothesis: the top 3% of users drive about the same total transaction value as the bottom 60% of users, for both EUR and GBP (amounts converted to EUR).
[9]: top_users = data.groupby(["USER_ID"])["AMT_EUR"].sum().sort_values(ascending=False)
bottom_users = data.groupby(["USER_ID"])["AMT_EUR"].sum().sort_values()

[10]: top_users_count = len(top_users)

[11]: top_amt = top_users[:int(top_users_count * 0.03)]
bot_amt = bottom_users[:int(top_users_count * 0.636)]

print("Top 3% amt:", top_amt.sum())
print("Bottom 63.6% amt:", bot_amt.sum())

Top 3% amt: 394176.2463
Bottom 63.6% amt: 394327.79819999996

[12]: top_amt = top_users[:int(top_users_count * 0.13)]
total_50_amt = top_users.sum() * 0.5

print("Top 13% amt:", top_amt.sum())
print("50% of total amt:", total_50_amt)

Top 13% amt: 960763.8478
50% of total amt: 955793.3953
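The experiment title also calls for formal hypothesis testing; a sketch of a two-sample Welch t-test with scipy.stats, checking whether mean transaction amounts differ between EUR and GBP (the 0.05 level is a conventional choice, not taken from the original):

[ ]: # Sketch: Welch two-sample t-test of AMOUNT between currencies
from scipy import stats

eur = data.loc[data["CURRENCY"] == "EUR", "AMOUNT"]
gbp = data.loc[data["CURRENCY"] == "GBP", "AMOUNT"]

t_stat, p_value = stats.ttest_ind(eur, gbp, equal_var=False)
print("t =", t_stat, "p =", p_value)
if p_value < 0.05:
    print("Reject H0: mean amounts differ between currencies.")
else:
    print("Fail to reject H0: no evidence of a difference in means.")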
