Google Play Store Apps Analysis
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [2]: df = pd.read_csv("googleplaystore.csv")
In [3]: df.head()
Out[3]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1 159 19M 10,000+ Free 0 Everyone Art & Design 7-Jan-18 1.0.0 4.0.3 and up
3 Sketch - Draw & Paint ART_AND_DESIGN 4.5 215644 25M 50,000,000+ Free 0 Teen Art & Design 8-Jun-18 Varies with device 4.2 and up
4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3 967 2.8M 100,000+ Free 0 Everyone Art & Design;Creativity 20-Jun-18 1.1 4.4 and up
In [4]: df.shape
Out[4]: (10841, 13)
In [5]: df.describe()
Out[5]: Rating
count 9367.000000
mean 4.193338
std 0.537431
min 1.000000
25% 4.000000
50% 4.300000
75% 4.500000
max 19.000000
In [6]: df.boxplot()
Out[6]: <AxesSubplot:>
In [7]: df.hist()
Out[7]: array([[<AxesSubplot:title={'center':'Rating'}>]], dtype=object)
In [8]: df.plot()
Out[8]: <AxesSubplot:>
In [9]: df.info()
<class 'pandas.core.frame.DataFrame'>
Data Cleaning
In [10]: df.isnull().head(20)
Out[10]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
0 False False False False False False False False False False False False False
1 False False False False False False False False False False False False False
2 False False False False False False False False False False False False False
3 False False False False False False False False False False False False False
4 False False False False False False False False False False False False False
5 False False False False False False False False False False False False False
6 False False False False False False False False False False False False False
7 False False False False False False False False False False False False False
8 False False False False False False False False False False False False False
9 False False False False False False False False False False False False False
10 False False False False False False False False False False False False False
11 False False False False False False False False False False False False False
12 False False False False False False False False False False False False False
13 False False False False False False False False False False False False False
14 False False False False False False False False False False False False False
15 False False False False False False False False False False False True False
16 False False False False False False False False False False False False False
17 False False False False False False False False False False False False False
18 False False False False False False False False False False False False False
19 False False False False False False False False False False False False False
In [11]: df.isnull().tail(20)
Out[11]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
10821 False False True False False False False False False False False False False
10822 False False True False False False False False False False False False False
10823 False False True False False False False False False False False False False
10824 False False True False False False False False False False False False False
10825 False False True False False False False False False False False False False
10826 False False False False False False False False False False False False False
10827 False False False False False False False False False False False False False
10828 False False False False False False False False False False False False False
10829 False False False False False False False False False False False False False
10830 False False False False False False False False False False False False False
10831 False False True False False False False False False False False False False
10832 False False False False False False False False False False False False False
10833 False False False False False False False False False False False False False
10834 False False False False False False False False False False False False False
10835 False False True False False False False False False False False False False
10836 False False False False False False False False False False False False False
10837 False False False False False False False False False False False False False
10838 False False True False False False False False False False False False False
10839 False False False False False False False False False False False False False
10840 False False False False False False False False False False False False False
Out[12]:
App 0
Category 0
Rating 1474
Reviews 0
Size 0
Installs 0
Type 1
Price 0
Content Rating 1
Genres 0
Last Updated 0
Current Ver 8
Android Ver 3
dtype: int64
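The `head(20)` and `tail(20)` boolean grids only sample the data; per-column counts like those in Out[12] are what `df.isnull().sum()` returns for the whole frame. A minimal sketch that also reports percentages and pulls out the affected rows, run on a hypothetical toy frame (`null_report` is an illustrative name, not part of the notebook):

```python
import numpy as np
import pandas as pd

def null_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, largest first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, sorted by how many.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical toy frame standing in for the Play Store data.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, np.nan, 4.5],
    "Type": ["Free", "Free", None, "Paid"],
})
print(null_report(toy))

# Rows with at least one missing value, instead of scanning head()/tail() grids.
print(toy[toy.isnull().any(axis=1)])
```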
Out[13]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
10472 Life Made WI-Fi Touchscreen Photo Frame 1.9 19.0 3.0M 1,000+ Free 0 Everyone NaN 11-Feb-18 1.0.19 4.0 and up NaN
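Row 10472 is shifted one column to the left: its Category is missing, so "1.9" sits under Category and 19.0 under Rating, which is where the `max` of 19.0 in `df.describe()` comes from. The cell that removes it is not in this export, but the slice in In [15] shows the row is gone. A minimal sketch of one way to drop such rows, assuming valid ratings lie between 1 and 5 (`drop_out_of_range_ratings` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def drop_out_of_range_ratings(frame: pd.DataFrame, low: float = 1.0, high: float = 5.0) -> pd.DataFrame:
    """Drop rows whose Rating lies outside [low, high]; NaN ratings are kept for later imputation."""
    rating = frame["Rating"]
    # between() is False for NaN, so notna() keeps missing ratings out of the "bad" mask.
    bad = rating.notna() & ~rating.between(low, high)
    return frame[~bad]

# Hypothetical toy rows mimicking the shifted record.
toy = pd.DataFrame(
    {"App": ["ok", "shifted", "unrated"], "Rating": [4.1, 19.0, np.nan]},
    index=[10471, 10472, 10473],
)
cleaned = drop_out_of_range_ratings(toy)
print(cleaned.index.tolist())  # [10471, 10473]
```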
In [15]: df[10470:10475]
Out[15]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
10470 Jazz Wi-Fi COMMUNICATION 3.4 49 4.0M 10,000+ Free 0 Everyone Communication 10-Feb-17 0.1 2.3 and up
10471 Xposed Wi-Fi-Pwd PERSONALIZATION 3.5 1042 404k 100,000+ Free 0 Everyone Personalization 5-Aug-14 3.0.0 4.0.3 and up
10473 osmino Wi-Fi: free WiFi TOOLS 4.2 134203 4.1M 10,000,000+ Free 0 Everyone Tools 7-Aug-18 6.06.14 4.4 and up
10474 Sat-Fi Voice COMMUNICATION 3.4 37 14M 1,000+ Free 0 Everyone Communication 21-Nov-14 2.2.1.5 2.2 and up
10475 Wi-Fi Visualizer TOOLS 3.9 132 2.6M 50,000+ Free 0 Everyone Tools 17-May-17 0.0.9 2.3 and up
In [16]: df.tail()
10836 Sya9a Maroc - FR FAMILY 4.5 38 53M 5,000+ Free 0 Everyone Education 25-Jul-17 1.48 4.1 and up
10837 Fr. Mike Schmitz Audio Teachings FAMILY 5.0 4 3.6M 100+ Free 0 Everyone Education 6-Jul-18 1 4.1 and up
10838 Parkinson Exercices FR MEDICAL NaN 3 9.5M 1,000+ Free 0 Everyone Medical 20-Jan-17 1 2.2 and up
In [18]: df
Out[18]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
2 Coloring book moana ART_AND_DESIGN 3.9 967 14M 500,000+ Free 0 Everyone Art & Design;Pretend Play 15-Jan-18 2.0.0 4.0.3 and up
4 Sketch - Draw & Paint ART_AND_DESIGN 4.5 215644 25M 50,000,000+ Free 0 Teen Art & Design 8-Jun-18 Varies with device 4.2 and up
... ... ... ... ... ... ... ... ... ... ... ... ... ...
10836 Sya9a Maroc - FR FAMILY 4.5 38 53M 5,000+ Free 0 Everyone Education 25-Jul-17 1.48 4.1 and up
10837 Fr. Mike Schmitz Audio Teachings FAMILY 5.0 4 3.6M 100+ Free 0 Everyone Education 6-Jul-18 1 4.1 and up
10838 Parkinson Exercices FR MEDICAL NaN 3 9.5M 1,000+ Free 0 Everyone Medical 20-Jan-17 1 2.2 and up
10839 The SCP Foundation DB fr nn5n BOOKS_AND_REFERENCE 4.5 114 Varies with device 1,000+ Free 0 Mature 17+ Books & Reference 19-Jan-15 Varies with device Varies with device
Out[19]: <AxesSubplot:>
Out[20]: array([[<AxesSubplot:title={'center':'Rating'}>]], dtype=object)
Out[21]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
230 Quick PDF Scanner + OCR FREE BUSINESS 4.2 80805 Varies with device 5,000,000+ Free 0 Everyone Business 26-Feb-18 Varies with device 4.0.3 and up
240 Google My Business BUSINESS 4.4 70991 Varies with device 5,000,000+ Free 0 Everyone Business 24-Jul-18 2.19.0.204537701 4.4 and up
257 ZOOM Cloud Meetings BUSINESS 4.4 31614 37M 10,000,000+ Free 0 Everyone Business 20-Jul-18 4.1.28165.0716 4.0 and up
262 join.me - Simple Meetings BUSINESS 4.0 6989 Varies with device 1,000,000+ Free 0 Everyone Business 16-Jul-18 4.3.0.508 4.4 and up
... ... ... ... ... ... ... ... ... ... ... ... ... ...
10050 Airway Ex - Intubate. Anesthetize. Train. MEDICAL 4.3 123 86M 10,000+ Free 0 Everyone Medical 1-Jun-18 0.6.88 5.0 and up
10768 AAFP MEDICAL 3.8 63 24M 10,000+ Free 0 Everyone Medical 22-Jun-18 2.3.1 5.0 and up
Out[22]: App Category Rating Reviews Size Installs Type Price Content Rating Genres Last Updated Current Ver Android Ver
194 Google My Business BUSINESS 4.4 70991 Varies with device 5,000,000+ Free 0 Everyone Business 24-Jul-18 2.19.0.204537701 4.4 and up
240 Google My Business BUSINESS 4.4 70991 Varies with device 5,000,000+ Free 0 Everyone Business 24-Jul-18 2.19.0.204537701 4.4 and up
269 Google My Business BUSINESS 4.4 70991 Varies with device 5,000,000+ Free 0 Everyone Business 24-Jul-18 2.19.0.204537701 4.4 and up
In [23]: df.drop_duplicates(subset=['App'], keep='first', inplace=True)  # keep the first row for each app name, drop later repeats
In [24]: df.shape
Out[24]: (9659, 13)
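With `subset=['App']`, two rows count as duplicates when only the app name matches, even if other columns differ, and `keep='first'` retains the earliest occurrence. A small illustration on hypothetical toy rows (the second Google My Business row's review count is invented to show the difference):

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["Google My Business", "ZOOM Cloud Meetings", "Google My Business"],
    "Reviews": [70991, 31614, 70992],  # hypothetical: last row differs only in Reviews
})

# Full-row duplicates: none, because the Reviews values differ.
print(toy.duplicated().sum())  # 0

# Name-based duplicates: the third row repeats an App seen earlier.
print(toy.duplicated(subset=["App"]).sum())  # 1

deduped = toy.drop_duplicates(subset=["App"], keep="first")
print(deduped.index.tolist())  # [0, 1]
```

Deduplicating on `App` alone is stricter than a full-row check; it also removes re-scraped copies whose counts drifted slightly.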
In [26]: threshold
Out[26]: 965.9000000000001
# threshold means at least this many values in a column must be non-NA
# dropna with this threshold drops every column whose non-NA count is below 965.9
# in this data frame the step is not required, as every column has enough non-NA values
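The cell that defines `threshold` is not in this export; 965.9 is exactly 10% of the 9,659 rows left after deduplication, so it appears to be `len(df) * 0.1`. Assuming that, the step described in the comments corresponds to `dropna(thresh=..., axis=1)`. A sketch with a hypothetical helper `drop_sparse_columns`, rounding the threshold up because a non-NA count is a whole number:

```python
import math

import numpy as np
import pandas as pd

def drop_sparse_columns(frame: pd.DataFrame, min_fraction: float = 0.1) -> pd.DataFrame:
    """Drop columns whose non-NA count is below min_fraction * len(frame)."""
    # dropna(thresh=k, axis=1) keeps a column only if it has at least k non-NA values;
    # rounding up turns "at least 965.9 values" into the integer 966.
    threshold = math.ceil(len(frame) * min_fraction)
    return frame.dropna(thresh=threshold, axis=1)

# Hypothetical toy frame: one mostly filled column, one empty column.
toy = pd.DataFrame({
    "mostly_full": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "nearly_empty": [np.nan] * 10,
})
print(drop_sparse_columns(toy).columns.tolist())  # ['mostly_full']
```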
In [28]: df.isnull().sum()
Out[28]:
App 0
Category 0
Rating 1463
Reviews 0
Size 0
Installs 0
Type 1
Price 0
Content Rating 0
Genres 0
Last Updated 0
Current Ver 8
Android Ver 2
dtype: int64
def impute_median(series):
    # replace missing values with the median of the non-missing ones
    return series.fillna(series.median())
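The cells that apply `impute_median` are not in this export; Out[31] only shows that Rating has no nulls afterwards. A sketch of two ways it could be applied, on hypothetical toy data: a single column-wide median (an assumption about what the notebook did) and, as an alternative, a per-Category median via `groupby().transform()`:

```python
import numpy as np
import pandas as pd

def impute_median(series: pd.Series) -> pd.Series:
    # Same helper as above: fill NaN with the median of the non-missing values.
    return series.fillna(series.median())

# Hypothetical toy data standing in for the Play Store frame.
toy = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B"],
    "Rating": [4.0, np.nan, 5.0, 3.0, np.nan],
})

# Column-wide: every gap gets the overall median (4.0 here).
overall = impute_median(toy["Rating"])

# Alternative: each gap gets the median of its own Category.
per_category = toy.groupby("Category")["Rating"].transform(impute_median)

print(overall.tolist())       # [4.0, 4.0, 5.0, 3.0, 4.0]
print(per_category.tolist())  # [4.0, 4.5, 5.0, 3.0, 3.0]
```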
In [31]: df.isnull().sum()
Out[31]:
App 0
Category 0
Rating 0
Reviews 0
Size 0
Installs 0
Type 1
Price 0
Content Rating 0
Genres 0
Last Updated 0
Current Ver 8
Android Ver 2
dtype: int64
print(df['Type'].mode())
print(df['Current Ver'].mode())
print(df['Android Ver'].mode())
0 Free
0 4.1 and up
In [33]: # fill the remaining nulls with each column's mode; every column here has a single mode, so .mode()[0] is that value
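The fill code for this cell is missing from the export. One way to implement what the comment describes, assuming a per-column `fillna` with `mode()[0]` (`fill_with_mode` is a hypothetical helper, and the toy data is invented):

```python
import pandas as pd

def fill_with_mode(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Fill NaN in each listed column with that column's most frequent value."""
    out = frame.copy()
    for col in columns:
        # mode() returns a Series (ties give several values); [0] picks the first.
        out[col] = out[col].fillna(out[col].mode()[0])
    return out

toy = pd.DataFrame({
    "Type": ["Free", None, "Free", "Paid"],
    "Android Ver": ["4.1 and up", "4.1 and up", None, "4.4 and up"],
})
filled = fill_with_mode(toy, ["Type", "Android Ver"])
print(filled.isnull().sum().sum())  # 0
print(filled["Type"].tolist())      # ['Free', 'Free', 'Free', 'Paid']
```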
In [34]: df.isnull().sum()
Out[34]:
App 0
Category 0
Rating 0
Reviews 0
Size 0
Installs 0
Type 0
Price 0
Content Rating 0
Genres 0
Last Updated 0
Current Ver 0
Android Ver 0
dtype: int64
df.describe()
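The per-category sums and means below only make sense if Price and Reviews are numeric, while the outputs above show text values such as "10,000+" for Installs. The conversion cells are not in this export. A sketch assuming prices are stored as strings like "$4.99" and reviews as digit strings (`to_numeric_columns` and the sample values are hypothetical):

```python
import pandas as pd

def to_numeric_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Convert text columns such as '$4.99', '10,000+' and '159' to numbers."""
    out = frame.copy()
    # Assumed raw format: prices like "$4.99", free apps as "0".
    out["Price"] = pd.to_numeric(out["Price"].str.replace("$", "", regex=False))
    # Installs like "10,000+": strip the thousands separators and the trailing plus.
    out["Installs"] = pd.to_numeric(
        out["Installs"].str.replace(",", "", regex=False).str.rstrip("+")
    )
    # Reviews are plain digit strings; errors="coerce" turns anything else into NaN.
    out["Reviews"] = pd.to_numeric(out["Reviews"], errors="coerce")
    return out

raw = pd.DataFrame({
    "Price": ["0", "$4.99"],
    "Installs": ["10,000+", "500,000+"],
    "Reviews": ["159", "967"],
})
clean = to_numeric_columns(raw)
print(clean["Price"].tolist())     # [0.0, 4.99]
print(clean["Installs"].tolist())  # [10000, 500000]
```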
Data Visualization
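In [38]: grp = df.groupby('Category')
rating = grp['Rating'].agg('mean')    # average rating per category
price = grp['Price'].agg('sum')       # total price of all apps per category
reviews = grp['Reviews'].agg('mean')  # average review count per category
print(rating)
print("\n")
print(price)
print("\n")
print(reviews)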
Category
ART_AND_DESIGN 4.354687
AUTO_AND_VEHICLES 4.205882
BEAUTY 4.283019
BOOKS_AND_REFERENCE 4.334234
BUSINESS 4.173810
COMICS 4.185714
COMMUNICATION 4.154921
DATING 4.041520
EDUCATION 4.363866
ENTERTAINMENT 4.135294
EVENTS 4.395313
FAMILY 4.194378
FINANCE 4.138551
FOOD_AND_DRINK 4.192857
GAME 4.249948
HEALTH_AND_FITNESS 4.251736
HOUSE_AND_HOME 4.174324
LIBRARIES_AND_DEMO 4.207143
LIFESTYLE 4.131436
MAPS_AND_NAVIGATION 4.062595
MEDICAL 4.202025
NEWS_AND_MAGAZINES 4.156693
PARENTING 4.300000
PERSONALIZATION 4.325532
PHOTOGRAPHY 4.166548
PRODUCTIVITY 4.206150
SHOPPING 4.237624
SOCIAL 4.255230
SPORTS 4.232923
TOOLS 4.073881
TRAVEL_AND_LOCAL 4.103196
VIDEO_PLAYERS 4.068098
WEATHER 4.248101
Category
ART_AND_DESIGN 5.97
AUTO_AND_VEHICLES 13.47
BEAUTY 0.00
BOOKS_AND_REFERENCE 119.77
BUSINESS 175.29
COMICS 0.00
COMMUNICATION 83.14
DATING 27.44
EDUCATION 17.96
ENTERTAINMENT 7.98
EVENTS 109.99
FAMILY 2399.86
FINANCE 2900.83
FOOD_AND_DRINK 8.48
GAME 284.31
HEALTH_AND_FITNESS 64.35
HOUSE_AND_HOME 0.00
LIBRARIES_AND_DEMO 0.99
LIFESTYLE 2360.87
MAPS_AND_NAVIGATION 26.95
MEDICAL 995.70
NEWS_AND_MAGAZINES 3.98
PARENTING 9.58
PERSONALIZATION 150.48
PHOTOGRAPHY 118.28
PRODUCTIVITY 250.93
SHOPPING 5.48
SOCIAL 15.97
SPORTS 100.00
TOOLS 267.25
TRAVEL_AND_LOCAL 49.95
VIDEO_PLAYERS 10.46
WEATHER 32.42
Category
ART_AND_DESIGN 22175.046875
AUTO_AND_VEHICLES 13690.188235
BEAUTY 7476.226415
BOOKS_AND_REFERENCE 75321.234234
BUSINESS 23548.202381
COMICS 41822.696429
COMMUNICATION 907337.676190
DATING 21190.315789
EDUCATION 112303.764706
ENTERTAINMENT 340810.294118
EVENTS 2515.906250
FAMILY 78507.362445
FINANCE 36701.756522
FOOD_AND_DRINK 56473.464286
GAME 648903.763295
HEALTH_AND_FITNESS 74171.371528
HOUSE_AND_HOME 26079.013514
LIBRARIES_AND_DEMO 10795.607143
LIFESTYLE 32066.859079
MAPS_AND_NAVIGATION 135337.007634
MEDICAL 2994.863291
NEWS_AND_MAGAZINES 91063.889764
PARENTING 15972.183333
PERSONALIZATION 142401.808511
PHOTOGRAPHY 374915.551601
PRODUCTIVITY 148638.098930
SHOPPING 220553.118812
SOCIAL 953672.807531
SPORTS 108765.578462
TOOLS 277335.644498
TRAVEL_AND_LOCAL 122464.570776
VIDEO_PLAYERS 414015.754601
WEATHER 155634.987342
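The three statistics can also be computed in one `agg` call with named aggregation, which yields a single table with one column per statistic. A sketch on hypothetical toy rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["EVENTS", "EVENTS", "BEAUTY"],
    "Rating": [4.5, 4.0, 4.2],
    "Price": [0.0, 2.0, 0.0],
    "Reviews": [100, 300, 50],
})

# One pass over the groups, one named output column per statistic.
summary = toy.groupby("Category").agg(
    avg_rating=("Rating", "mean"),
    total_price=("Price", "sum"),
    avg_reviews=("Reviews", "mean"),
)
print(summary.sort_values("avg_rating", ascending=False))
```

Keeping the results in one DataFrame makes it easy to sort or compare categories across all three measures at once.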
plt.xticks(rotation = 90)
plt.show()
plt.xticks(rotation = 90)
plt.show()
plt.xticks(rotation=90)
plt.show()
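The plotting calls that preceded each `plt.xticks(rotation=90)` were lost in the export. A sketch assuming pandas bar charts of the per-category series from In [38]; `plot_by_category` is a hypothetical helper, and the toy values are taken from the rating output above:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_by_category(stat: pd.Series, title: str):
    """Bar chart of a per-category statistic with vertical tick labels."""
    fig, ax = plt.subplots(figsize=(12, 4))
    stat.sort_values(ascending=False).plot(kind="bar", ax=ax)
    ax.set_title(title)
    ax.set_xlabel("Category")
    # Same effect as plt.xticks(rotation=90): long category names stay legible.
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    return fig, ax

toy_rating = pd.Series({"EVENTS": 4.395313, "EDUCATION": 4.363866, "DATING": 4.041520})
fig, ax = plot_by_category(toy_rating, "Mean rating by category")
print(len(ax.patches))  # 3
```

In the notebook, the same helper would be called once each for `rating`, `price` and `reviews`.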