Dsbda 4
February 5, 2025
[6]: import pandas as pd
     df = pd.read_csv("/home/dpl11/facebook.csv")
[7]: df.head()
[7]: Unnamed: 0 Page total likes Type Category Post Month Post Weekday \
0 0 139441 Photo 2 12 4
1 1 139441 Status 2 12 3
2 2 139441 Photo 3 12 3
3 3 139441 Photo 2 12 2
4 4 139441 Photo 2 12 2
2 2812
3 61027
4 6228
Lifetime People who have liked your Page and engaged with your post \
0 119
1 1108
2 132
3 1386
4 396
[8]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 20 columns):
 #  Column                                                               Non-Null Count  Dtype
--  ------                                                               --------------  -----
 0  Unnamed: 0                                                           500 non-null    int64
 1  Page total likes                                                     500 non-null    int64
 2  Type                                                                 500 non-null    object
 3  Category                                                             500 non-null    int64
 4  Post Month                                                           500 non-null    int64
 5  Post Weekday                                                         500 non-null    int64
 6  Post Hour                                                            500 non-null    int64
 7  Paid                                                                 499 non-null    float64
 8  Lifetime Post Total Reach                                            500 non-null    int64
 9  Lifetime Post Total Impressions                                      500 non-null    int64
10  Lifetime Engaged Users                                               500 non-null    int64
11  Lifetime Post Consumers                                              500 non-null    int64
12  Lifetime Post Consumptions                                           500 non-null    int64
13  Lifetime Post Impressions by people who have liked your Page         500 non-null    int64
14  Lifetime Post reach by people who like your Page                     500 non-null    int64
15  Lifetime People who have liked your Page and engaged with your post  500 non-null    int64
16  comment                                                              500 non-null    int64
17  like                                                                 499 non-null    float64
18  share                                                                496 non-null    float64
19  Total Interactions                                                   500 non-null    int64
dtypes: float64(3), int64(16), object(1)
memory usage: 78.2+ KB
[9]: df.shape
[10]: df.dtypes
Lifetime Post Consumers int64
Lifetime Post Consumptions int64
Lifetime Post Impressions by people who have liked your Page int64
Lifetime Post reach by people who like your Page int64
Lifetime People who have liked your Page and engaged with your post int64
comment int64
like float64
share float64
Total Interactions int64
dtype: object
[11]: df.isnull().sum()
[11]: Unnamed: 0 0
Page total likes 0
Type 0
Category 0
Post Month 0
Post Weekday 0
Post Hour 0
Paid 1
Lifetime Post Total Reach 0
Lifetime Post Total Impressions 0
Lifetime Engaged Users 0
Lifetime Post Consumers 0
Lifetime Post Consumptions 0
Lifetime Post Impressions by people who have liked your Page 0
Lifetime Post reach by people who like your Page 0
Lifetime People who have liked your Page and engaged with your post 0
comment 0
like 1
share 4
Total Interactions 0
dtype: int64
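The counts above show one missing value in Paid, one in like and four in share. The following is a minimal sketch of one way to fill them, run on a small made-up frame that only borrows the column names (the values are illustrative and not taken from facebook.csv). Filling Paid with its mode and the two count columns with their medians is an assumption, not a step this notebook shows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; only the three columns with gaps are included.
demo = pd.DataFrame({
    "Paid": [0.0, 1.0, np.nan, 0.0],
    "like": [79.0, np.nan, 66.0, 1572.0],
    "share": [17.0, 29.0, np.nan, 147.0],
})

# Paid is a 0/1 flag, so the most frequent value is a reasonable fill.
demo["Paid"] = demo["Paid"].fillna(demo["Paid"].mode()[0])

# like and share are skewed counts, so the median is less sensitive
# to extreme posts than the mean.
for col in ["like", "share"]:
    demo[col] = demo[col].fillna(demo[col].median())

print(demo.isnull().sum())
```

With so few gaps, dropping the affected rows with `dropna()` would be an equally defensible choice.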
[13]: df_subset1
495 53.0 26.0
496 53.0 22.0
497 93.0 18.0
498 91.0 38.0
499 91.0 28.0
[15]: df_subset2
merged_data
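The cells that build df_subset1, df_subset2 and merged_data are not visible in this export. As a rough sketch of the general pattern (two column subsets joined back together), the example below uses a made-up frame; the `post_id` key and all values are hypothetical, and this is not necessarily how the notebook performed the merge.

```python
import pandas as pd

# Hypothetical stand-in for df with a unique key column.
demo = pd.DataFrame({
    "post_id": [0, 1, 2, 3],
    "Type": ["Photo", "Status", "Photo", "Link"],
    "comment": [4, 5, 0, 19],
    "like": [79.0, 130.0, 66.0, 325.0],
    "share": [17.0, 29.0, 14.0, 49.0],
})

# Two column subsets that share the key.
subset_a = demo[["post_id", "Type", "comment"]]
subset_b = demo[["post_id", "like", "share"]]

# Inner merge on the unique key: one output row per post.
merged = pd.merge(subset_a, subset_b, on="post_id", how="inner")
print(merged.shape)
```

When the merge key is not unique, `pd.merge` returns every pairing of matching rows, so the result can have far more rows than either input.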
[17]: # Sort by the 'like' column in descending order
sorted_df = df.sort_values(by='like', ascending=False)
print("\nSorted Data by Likes (Descending):")
print(sorted_df)
244 4010 6242
379 2254 3391
349 1637 2718
168 2420 4074
3 790 1119
.. … …
21 15 20
100 37 55
441 9 9
417 25 31
111 37 49
Lifetime People who have liked your Page and engaged with your post \
244 3316
379 1936
349 1756
168 2126
3 1386
.. …
21 15
100 32
441 9
417 15
111 33
[18]: 0 1 2 3 4 5 6 7 8 \
comment 4 4 4 4 4 4 4 4 4
Type Photo Photo Photo Photo Status Status Status Status Status
like 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
share 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0
1465 1466
comment 56 56
Type Photo Photo
like 56.0 56.0
share 9.0 25.0
[19]: merged_data.T
[19]: 0 1 2 3 4 5 6 7 8 \
comment 4 4 4 4 4 4 4 4 4
Type Photo Photo Photo Photo Status Status Status Status Status
like 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
share 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0
1465 1466
comment 56 56
Type Photo Photo
like 56.0 56.0
share 9.0 25.0
[20]: #Shape
df.shape
[21]: # Reshape
df_temp = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
[22]: df_temp
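The definition of df_temp is cut off in the export after the 'foo' column, so the rest of the reshape step cannot be recovered. As an illustration of what a reshape on such a frame could look like, here is a small sketch with `pivot` and `melt`. The 'bar' and 'baz' columns and their values are assumptions added for the example.

```python
import pandas as pd

# 'foo' matches the visible part of the cell; 'bar' and 'baz' are assumed.
demo = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# Long to wide: one row per 'foo', one column per 'bar', cells from 'baz'.
wide = demo.pivot(index="foo", columns="bar", values="baz")
print(wide)

# Wide back to long with melt.
long = wide.reset_index().melt(id_vars="foo", var_name="bar", value_name="baz")
print(long.shape)
```

`pivot` requires each (index, columns) pair to be unique; for data with repeated pairs, `pivot_table` with an aggregation function is the usual alternative.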
[25]: def remove_outliers(column):
          # Keep only values within 1.5 * IQR of the first and third quartiles
          Q1 = column.quantile(0.25)
          Q3 = column.quantile(0.75)
          IQR = Q3 - Q1
          threshold = 1.5 * IQR
          outlier_mask = (column < Q1 - threshold) | (column > Q3 + threshold)
          return column[~outlier_mask]
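The export does not show how remove_outliers is applied. A minimal usage sketch follows, with the function repeated so the block runs on its own and with made-up values. Filtering the whole frame by the surviving index is one possible approach, not necessarily the one used in the notebook.

```python
import pandas as pd

def remove_outliers(column):
    # Same IQR rule as the notebook cell, repeated so the sketch is self-contained.
    q1 = column.quantile(0.25)
    q3 = column.quantile(0.75)
    threshold = 1.5 * (q3 - q1)
    outlier_mask = (column < q1 - threshold) | (column > q3 + threshold)
    return column[~outlier_mask]

# Made-up values with one obvious extreme.
likes = pd.Series([10, 12, 11, 13, 12, 500], name="like")
kept = remove_outliers(likes)

# Use the surviving index to filter every column, not just 'like'.
demo = pd.DataFrame({"like": likes, "share": [1, 2, 1, 3, 2, 90]})
demo_clean = demo.loc[kept.index]
print(len(demo), len(demo_clean))
```

Because the function returns a filtered Series rather than modifying anything in place, assigning its result straight back to a DataFrame column would leave NaN in the removed positions; selecting rows by index avoids that.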
[27]: df = df.drop_duplicates()