Google Play Store Data Analysis

The document discusses analyzing data from the Google Play Store to gain insights into the Android app market. It describes how the Google Play Store dataset, 10,841 rows and 13 columns in the copy used here, can be used to understand trends in categories, ratings, user behavior and more. The data can be analyzed using tools like data visualization, statistical analysis and machine learning to identify patterns and make predictions.

6/30/23, 12:02 PM Google_playstore_analysis

Google Play Store data analysis.


Google Play Store data analysis is a process of collecting, cleaning, and analyzing data from
the Google Play Store to gain insights into the Android app market. This data can be used to
identify trends, understand user behavior, and make informed decisions about app
development and marketing.

The Google Play Store dataset is publicly available. The copy loaded in this notebook has
10,841 rows and 13 columns, covering each app's category, rating, reviews, size, installs,
price, and more. It can be used to answer a variety of questions about the Android app market,
such as:

1. What are the most popular app categories?
2. Which apps have the highest ratings?
3. What are the most popular apps in different countries?
4. How much do apps cost?
5. What are the most common app features?

The Google Play Store data can be analyzed using a variety of tools and techniques, such as:

1. Data visualization: This can be used to create charts and graphs that illustrate the data
in a clear and concise way.
2. Statistical analysis: This can be used to identify patterns and trends in the data.
3. Machine learning: This can be used to build models that predict future behavior.

Importing Exploratory data analysis packages.

In [1]: # Numpy is also called as Numerical Python, used for Scientific computing and Numer
import numpy as np

# pandas is used for data manipulation and analysis.
import pandas as pd

Importing data visualization packages.

In [2]: # importing the libraries which are needed for the visualization.
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [3]: # Suppress warning messages when the code is executed.
import warnings
warnings.filterwarnings("ignore")
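
Ignoring every warning for the whole session can also hide deprecation notices that matter during cleaning. As a minimal sketch of a narrower alternative, suppression can be scoped to a single block; the `noisy` function below is a hypothetical stand-in for a library call that warns, not something from the notebook.

```python
import warnings


def noisy():
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("this call is deprecated", FutureWarning)
    return 42


# Only warnings raised inside this block are suppressed;
# the previous filter settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy()

print(result)
```

Outside the `with` block, warnings behave as they did before, so unrelated problems still surface.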

Checking the versions of the libraries.

In [4]: #Versions.
print("pandas",pd.__version__)
print("numpy",np.__version__)
print("seaborn",sns.__version__)
localhost:8888/nbconvert/html/DA_Dashboard_Analysis/Google_playstore_analysis.ipynb?download=false 1/46

pandas 1.4.4
numpy 1.21.5
seaborn 0.11.2

Descriptive analysis of the dataset

In [5]: # Loading the dataset to the Jupyter notebook.


df = pd.read_csv(r"C:\Users\91762\OneDrive\Desktop\googleplaystore.csv")

In [6]: # Checking the initial top 5 values of the dataset


df.head()

Out[6]:
   App                                                Category        Rating  Reviews  Size  Installs     Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook     ART_AND_DESIGN  4.1     159      19M   10,000+      Free  0      Everyone        ...
1  Coloring book moana                                ART_AND_DESIGN  3.9     967      14M   500,000+     Free  0      Everyone        ...
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN  4.7     87510    8.7M  5,000,000+   Free  0      Everyone        ...
3  Sketch - Draw & Paint                              ART_AND_DESIGN  4.5     215644   25M   50,000,000+  Free  0      Teen            ...
4  Pixel Draw - Number Art Coloring Book              ART_AND_DESIGN  4.3     967      2.8M  100,000+     Free  0      Everyone        ...

In [7]: # Checking the last values in the dataset


df.tail(2)

Out[7]:
       App                                            Category             Rating  Reviews  Size                Installs     Type  Price  ...
10839  The SCP Foundation DB fr nn5n                  BOOKS_AND_REFERENCE  4.5     114      Varies with device  1,000+       Free  0      ...
10840  iHoroscope - 2018 Daily Horoscope & Astrology  LIFESTYLE            4.5     398307   19M                 10,000,000+  Free  0      ...

In [8]: # Checking the information of the dataset with its datatype.


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10841 non-null object
1 Category 10841 non-null object
2 Rating 9367 non-null float64
3 Reviews 10841 non-null object
4 Size 10841 non-null object
5 Installs 10841 non-null object
6 Type 10840 non-null object
7 Price 10841 non-null object
8 Content Rating 10840 non-null object
9 Genres 10841 non-null object
10 Last Updated 10841 non-null object
11 Current Ver 10833 non-null object
12 Android Ver 10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB

In [9]: # Checking the number of rows and columns in the dataset


rows, columns = df.shape
print(rows)
print(columns)

10841
13

In [10]: # method can be used to get a quick overview of the distribution of the values in a
#This information can be helpful for understanding the data and identifying any pot
df.describe()

Out[10]:
       Rating
count  9367.000000
mean   4.193338
std    0.537431
min    1.000000
25%    4.000000
50%    4.300000
75%    4.500000
max    19.000000

In [11]: # .T transposes the output of df.describe(), giving one row per column.
df.describe().T

Out[11]: count mean std min 25% 50% 75% max

Rating 9367.0 4.193338 0.537431 1.0 4.0 4.3 4.5 19.0
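
The maximum of 19.0 stands out because Play Store ratings run from 1 to 5. A minimal sketch of a range check follows; the small frame is made-up data standing in for `df`, and the app names are placeholders.

```python
import pandas as pd

# Toy data standing in for df; not taken from the real dataset.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, 19.0, None, 3.9],
})

# between() returns False for NaN, so missing ratings are excluded explicitly
# to avoid reporting them as out of range.
out_of_range = toy[toy["Rating"].notna() & ~toy["Rating"].between(1, 5)]
print(out_of_range)
```

Run against the real frame, a check like this would point at the row behind the 19.0 value before any cleaning decision is made.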

In [12]: # Checking for the null values in the dataset


df.isnull().sum()

Out[12]:
App 0
Category 0
Rating 1474
Reviews 0
Size 0
Installs 0
Type 1
Price 0
Content Rating 1
Genres 0
Last Updated 0
Current Ver 8
Android Ver 3
dtype: int64
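
Raw null counts are easier to judge as a share of the rows: 1474 missing ratings out of 10841 rows is roughly 13.6%. A minimal sketch of that calculation, on a made-up frame rather than the real `df`:

```python
import pandas as pd

# Toy data standing in for df; values are illustrative only.
toy = pd.DataFrame({
    "Rating": [4.1, None, 3.9, None],
    "Type": ["Free", "Paid", None, "Free"],
    "Price": ["0", "$0.99", "0", "0"],
})

# isnull().mean() gives the fraction of missing values per column.
missing_pct = (toy.isnull().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```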

In [13]: # Checking for the columns in the dataset


df.columns

Out[13]:
Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
       'Android Ver'],
      dtype='object')

In [14]: # converting the columns into the list


df.columns.to_list()

Out[14]:
['App',
'Category',
'Rating',
'Reviews',
'Size',
'Installs',
'Type',
'Price',
'Content Rating',
'Genres',
'Last Updated',
'Current Ver',
'Android Ver']

In [15]: # Checking the value counts of all the columns in the dataset.
for i in df.columns.to_list():
    print("*****************",i,"*****************")
    print(df[i].value_counts())

***************** App *****************


ROBLOX 9
CBS Sports App - Scores, News, Stats & Watch Live 8
ESPN 7
Duolingo: Learn Languages Free 7
Candy Crush Saga 7
..
Meet U - Get Friends for Snapchat, Kik & Instagram 1
U-Report 1
U of I Community Credit Union 1
Waiting For U Launcher Theme 1
iHoroscope - 2018 Daily Horoscope & Astrology 1
Name: App, Length: 9660, dtype: int64
***************** Category *****************
FAMILY 1972
GAME 1144
TOOLS 843
MEDICAL 463
BUSINESS 460
PRODUCTIVITY 424
PERSONALIZATION 392
COMMUNICATION 387
SPORTS 384
LIFESTYLE 382
FINANCE 366
HEALTH_AND_FITNESS 341
PHOTOGRAPHY 335
SOCIAL 295
NEWS_AND_MAGAZINES 283
SHOPPING 260
TRAVEL_AND_LOCAL 258
DATING 234
BOOKS_AND_REFERENCE 231
VIDEO_PLAYERS 175
EDUCATION 156
ENTERTAINMENT 149
MAPS_AND_NAVIGATION 137
FOOD_AND_DRINK 127
HOUSE_AND_HOME 88
LIBRARIES_AND_DEMO 85
AUTO_AND_VEHICLES 85
WEATHER 82
ART_AND_DESIGN 65
EVENTS 64
PARENTING 60
COMICS 60
BEAUTY 53
1.9 1
Name: Category, dtype: int64
***************** Rating *****************
4.4 1109
4.3 1076
4.5 1038
4.2 952
4.6 823
4.1 708
4.0 568
4.7 499
3.9 386
3.8 303
5.0 274
3.7 239
4.8 234
3.6 174
3.5 163
3.4 128
3.3 102
4.9 87
3.0 83
3.1 69
3.2 64
2.9 45
2.8 42
2.7 25
2.6 25
2.5 21
2.3 20
2.4 19
1.0 16
2.2 14
1.9 13
2.0 12
1.7 8
1.8 8
2.1 8
1.6 4
1.4 3
1.5 3
1.2 1
19.0 1
Name: Rating, dtype: int64
***************** Reviews *****************
0 596
1 272
2 214
3 175
4 137
...
342912 1
4272 1
5517 1
4057 1
398307 1
Name: Reviews, Length: 6002, dtype: int64
***************** Size *****************
Varies with device 1695
11M 198
12M 196
14M 194
13M 191
...
429k 1
200k 1
460k 1
728k 1
619k 1
Name: Size, Length: 462, dtype: int64
***************** Installs *****************
1,000,000+ 1579
10,000,000+ 1252
100,000+ 1169
10,000+ 1054
1,000+ 907
5,000,000+ 752
100+ 719
500,000+ 539
50,000+ 479
5,000+ 477
100,000,000+ 409
10+ 386
500+ 330
50,000,000+ 289
50+ 205
5+ 82
500,000,000+ 72
1+ 67
1,000,000,000+ 58
0+ 14
0 1
Free 1
Name: Installs, dtype: int64
***************** Type *****************
Free 10039
Paid 800
0 1
Name: Type, dtype: int64
***************** Price *****************
0 10040
$0.99 148
$2.99 129
$1.99 73
$4.99 72
...
$1.75 1
$14.00 1
$4.85 1
$46.99 1
$1.04 1
Name: Price, Length: 93, dtype: int64
***************** Content Rating *****************
Everyone 8714
Teen 1208
Mature 17+ 499
Everyone 10+ 414
Adults only 18+ 3
Unrated 2
Name: Content Rating, dtype: int64
***************** Genres *****************
Tools 842
Entertainment 623
Education 549
Medical 463
Business 460
...
Arcade;Pretend Play 1
Card;Brain Games 1
Lifestyle;Pretend Play 1
Comics;Creativity 1
Strategy;Creativity 1
Name: Genres, Length: 120, dtype: int64
***************** Last Updated *****************
August 3, 2018 326
August 2, 2018 304
July 31, 2018 294
August 1, 2018 285
July 30, 2018 211
...
March 20, 2014 1
April 7, 2015 1
September 22, 2014 1
October 3, 2015 1
March 23, 2014 1
Name: Last Updated, Length: 1378, dtype: int64


***************** Current Ver *****************
Varies with device 1459
1.0 809
1.1 264
1.2 178
2.0 151
...
1.0.17.3905 1
15.1.2 1
4.94.19 1
1.1.11.11 1
2.0.148.0 1
Name: Current Ver, Length: 2832, dtype: int64
***************** Android Ver *****************
4.1 and up 2451
4.0.3 and up 1501
4.0 and up 1375
Varies with device 1362
4.4 and up 980
2.3 and up 652
5.0 and up 601
4.2 and up 394
2.3.3 and up 281
2.2 and up 244
4.3 and up 243
3.0 and up 241
2.1 and up 134
1.6 and up 116
6.0 and up 60
7.0 and up 42
3.2 and up 36
2.0 and up 32
5.1 and up 24
1.5 and up 20
4.4W and up 12
3.1 and up 10
2.0.1 and up 7
8.0 and up 6
7.1 and up 3
4.0.3 - 7.1.1 2
5.0 - 8.0 2
1.0 and up 2
7.0 - 7.1.1 1
4.1 - 7.1.1 1
5.0 - 6.0 1
2.2 - 7.1.1 1
5.0 - 7.1.1 1
Name: Android Ver, dtype: int64
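
The App counts show repeated names (ROBLOX appears 9 times, and only 9660 of the 10841 names are unique), which suggests duplicate listings. A minimal sketch of how such rows might be inspected, on made-up data; keeping the row with the most reviews is an assumption for illustration, not a choice the notebook makes.

```python
import pandas as pd

# Toy data standing in for df; review counts are invented.
toy = pd.DataFrame({
    "App": ["ROBLOX", "ESPN", "ROBLOX", "U-Report"],
    "Reviews": [100, 50, 120, 7],
})

# keep=False marks every member of a duplicated group, not just the repeats.
repeated = toy[toy.duplicated(subset="App", keep=False)]
print(repeated)

# One option: keep the row with the most reviews for each app name.
deduped = (toy.sort_values("Reviews", ascending=False)
              .drop_duplicates(subset="App", keep="first")
              .sort_index())
print(deduped)
```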

In [16]: # Checking for the datatype which are object in the dataset
df.dtypes == "object"

Out[16]:
App True
Category True
Rating False
Reviews True
Size True
Installs True
Type True
Price True
Content Rating True
Genres True
Last Updated True
Current Ver True
Android Ver True
dtype: bool

In [17]: # Printing the names of the columns with the object dtype
for i in df.columns:
    if df[i].dtypes == "object":
        print(i)

App
Category
Reviews
Size
Installs
Type
Price
Content Rating
Genres
Last Updated
Current Ver
Android Ver

In [18]: # Printing the names of the columns with a float dtype
for i in df.columns:
    if df[i].dtypes == "float":
        print(i)

Rating
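
The two loops above can also be written with `DataFrame.select_dtypes`, which filters columns by dtype in one call. A minimal sketch on a made-up frame (the columns are given explicit dtypes so the example does not depend on how a particular pandas version infers strings):

```python
import pandas as pd

# Toy data standing in for df, with dtypes set explicitly.
toy = pd.DataFrame({
    "App": pd.Series(["A", "B"], dtype="object"),
    "Rating": pd.Series([4.1, 3.9], dtype="float64"),
    "Reviews": pd.Series(["159", "967"], dtype="object"),
})

object_cols = toy.select_dtypes(include="object").columns.tolist()
float_cols = toy.select_dtypes(include="float64").columns.tolist()
print(object_cols)
print(float_cols)
```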

In [19]: df.head(2)

Out[19]:
   App                                             Category        Rating  Reviews  Size  Installs  Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN  4.1     159      19M   10,000+   Free  0      Everyone        ...
1  Coloring book moana                             ART_AND_DESIGN  3.9     967      14M   500,000+  Free  0      Everyone        ...

In [20]: df["Reviews"].dtype

Out[20]:
dtype('O')

In [21]: df.Reviews.str

Out[21]:
<pandas.core.strings.accessor.StringMethods at 0x15d5dd90130>

In [22]: df["Reviews"].str

Out[22]:
<pandas.core.strings.accessor.StringMethods at 0x15d5dd90130>

In [23]: df.Reviews.str.isnumeric().sum()

Out[23]:
10840

In [24]: df.Reviews.shape

Out[24]:
(10841,)

In [25]: df[~df.Reviews.str.isnumeric()]

Out[25]:
       App                                      Category  Rating  Reviews  Size    Installs  Type  Price     Content Rating  ...
10472  Life Made WI-Fi Touchscreen Photo Frame  1.9       19.0    3.0M     1,000+  Free      0     Everyone  NaN             ...
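
Another way to locate entries that will not convert to a number is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN instead of raising. This is a sketch on made-up data, not the notebook's method; unlike `str.isnumeric()`, it also accepts decimal strings such as "1.5".

```python
import pandas as pd

# Toy data standing in for df; "3.0M" mimics the shifted row shown above.
toy = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Reviews": ["159", "3.0M", "967"],
})

# Values that cannot be parsed become NaN, which marks the rows to inspect.
as_number = pd.to_numeric(toy["Reviews"], errors="coerce")
bad_rows = toy[as_number.isna()]
print(bad_rows)
```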

In [26]: print(df.Reviews.str)

<pandas.core.strings.accessor.StringMethods object at 0x0000015D5DD90130>

In [27]: # creating a copy of the DataFrame df


df_copy = df.copy()

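In [28]: # Dropping row 10472, whose values are shifted one column to the left
# (Category holds 1.9, Rating holds 19.0 and Reviews holds 3.0M).
df_copy = df_copy.drop(df_copy.index[10472])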

In [29]: df_copy.shape

Out[29]:
(10840, 13)

In [30]: df_copy["Reviews"].dtype

Out[30]:
dtype('O')

In [31]: df_copy["Reviews"] = df_copy["Reviews"].astype("int")

In [32]: df_copy["Reviews"].dtype

Out[32]:
dtype('int32')

In [33]: df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10840 non-null object
1 Category 10840 non-null object
2 Rating 9366 non-null float64
3 Reviews 10840 non-null int32
4 Size 10840 non-null object
5 Installs 10840 non-null object
6 Type 10839 non-null object
7 Price 10840 non-null object
8 Content Rating 10840 non-null object
9 Genres 10840 non-null object
10 Last Updated 10840 non-null object
11 Current Ver 10832 non-null object
12 Android Ver 10838 non-null object
dtypes: float64(1), int32(1), object(11)
memory usage: 1.1+ MB

In [34]: df_copy.head(2)

Out[34]:
   App                                             Category        Rating  Reviews  Size  Installs  Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN  4.1     159      19M   10,000+   Free  0      Everyone        ...
1  Coloring book moana                             ART_AND_DESIGN  3.9     967      14M   500,000+  Free  0      Everyone        ...

In [35]: # Checking for the unique values in the Size column


df_copy["Size"].unique()

Out[35]:
array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',
'28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',
'31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',
'5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',
'1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',
'3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',
'8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M',
'2.2M', '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M',
'7.1M', '3.7M', '22M', '7.4M', '6.4M', '3.2M', '8.2M', '9.9M',
'4.9M', '9.5M', '5.0M', '5.9M', '13M', '73M', '6.8M', '3.5M',
'4.0M', '2.3M', '7.2M', '2.1M', '42M', '7.3M', '9.1M', '55M',
'23k', '6.5M', '1.5M', '7.5M', '51M', '41M', '48M', '8.5M', '46M',
'8.3M', '4.3M', '4.7M', '3.3M', '40M', '7.8M', '8.8M', '6.6M',
'5.1M', '61M', '66M', '79k', '8.4M', '118k', '44M', '695k', '1.6M',
'6.2M', '18k', '53M', '1.4M', '3.0M', '5.8M', '3.8M', '9.6M',
'45M', '63M', '49M', '77M', '4.4M', '4.8M', '70M', '6.9M', '9.3M',
'10.0M', '8.1M', '36M', '84M', '97M', '2.0M', '1.9M', '1.8M',
'5.3M', '47M', '556k', '526k', '76M', '7.6M', '59M', '9.7M', '78M',
'72M', '43M', '7.7M', '6.3M', '334k', '34M', '93M', '65M', '79M',
'100M', '58M', '50M', '68M', '64M', '67M', '60M', '94M', '232k',
'99M', '624k', '95M', '8.5k', '41k', '292k', '11k', '80M', '1.7M',
'74M', '62M', '69M', '75M', '98M', '85M', '82M', '96M', '87M',
'71M', '86M', '91M', '81M', '92M', '83M', '88M', '704k', '862k',
'899k', '378k', '266k', '375k', '1.3M', '975k', '980k', '4.1M',
'89M', '696k', '544k', '525k', '920k', '779k', '853k', '720k',
'713k', '772k', '318k', '58k', '241k', '196k', '857k', '51k',
'953k', '865k', '251k', '930k', '540k', '313k', '746k', '203k',
'26k', '314k', '239k', '371k', '220k', '730k', '756k', '91k',
'293k', '17k', '74k', '14k', '317k', '78k', '924k', '902k', '818k',
'81k', '939k', '169k', '45k', '475k', '965k', '90M', '545k', '61k',
'283k', '655k', '714k', '93k', '872k', '121k', '322k', '1.0M',
'976k', '172k', '238k', '549k', '206k', '954k', '444k', '717k',
'210k', '609k', '308k', '705k', '306k', '904k', '473k', '175k',
'350k', '383k', '454k', '421k', '70k', '812k', '442k', '842k',
'417k', '412k', '459k', '478k', '335k', '782k', '721k', '430k',
'429k', '192k', '200k', '460k', '728k', '496k', '816k', '414k',
'506k', '887k', '613k', '243k', '569k', '778k', '683k', '592k',
'319k', '186k', '840k', '647k', '191k', '373k', '437k', '598k',
'716k', '585k', '982k', '222k', '219k', '55k', '948k', '323k',
'691k', '511k', '951k', '963k', '25k', '554k', '351k', '27k',
'82k', '208k', '913k', '514k', '551k', '29k', '103k', '898k',
'743k', '116k', '153k', '209k', '353k', '499k', '173k', '597k',
'809k', '122k', '411k', '400k', '801k', '787k', '237k', '50k',
'643k', '986k', '97k', '516k', '837k', '780k', '961k', '269k',
'20k', '498k', '600k', '749k', '642k', '881k', '72k', '656k',
'601k', '221k', '228k', '108k', '940k', '176k', '33k', '663k',
'34k', '942k', '259k', '164k', '458k', '245k', '629k', '28k',
'288k', '775k', '785k', '636k', '916k', '994k', '309k', '485k',
'914k', '903k', '608k', '500k', '54k', '562k', '847k', '957k',
'688k', '811k', '270k', '48k', '329k', '523k', '921k', '874k',
'981k', '784k', '280k', '24k', '518k', '754k', '892k', '154k',
'860k', '364k', '387k', '626k', '161k', '879k', '39k', '970k',
'170k', '141k', '160k', '144k', '143k', '190k', '376k', '193k',
'246k', '73k', '658k', '992k', '253k', '420k', '404k', '470k',
'226k', '240k', '89k', '234k', '257k', '861k', '467k', '157k',
'44k', '676k', '67k', '552k', '885k', '1020k', '582k', '619k'],
dtype=object)

In [36]: np.nan

Out[36]:
nan

In [37]: df_copy["Size"] = df_copy["Size"].str.replace("Varies with device", str(np.nan))

In [38]: df_copy["Size"].dtype

Out[38]:
dtype('O')

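In [39]: # Replacing the M suffix with 000 to express megabytes in kB.
# Decimal sizes such as 8.7M become 8.7000 here and are rescaled in a later cell.
df_copy["Size"] = df_copy["Size"].str.replace("M","000")

In [40]: # Removing the k suffix, leaving the value in kB.
df_copy["Size"] = df_copy["Size"].str.replace("k","")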

In [41]: # Checking for the unique values in the size


df_copy["Size"].unique()

Out[41]:
array(['19000', '14000', '8.7000', '25000', '2.8000', '5.6000', '29000',
'33000', '3.1000', '28000', '12000', '20000', '21000', '37000',
'2.7000', '5.5000', '17000', '39000', '31000', '4.2000', '7.0000',
'23000', '6.0000', '6.1000', '4.6000', '9.2000', '5.2000', '11000',
'24000', 'nan', '9.4000', '15000', '10000', '1.2000', '26000',
'8.0000', '7.9000', '56000', '57000', '35000', '54000', '201',
'3.6000', '5.7000', '8.6000', '2.4000', '27000', '2.5000', '16000',
'3.4000', '8.9000', '3.9000', '2.9000', '38000', '32000', '5.4000',
'18000', '1.1000', '2.2000', '4.5000', '9.8000', '52000', '9.0000',
'6.7000', '30000', '2.6000', '7.1000', '3.7000', '22000', '7.4000',
'6.4000', '3.2000', '8.2000', '9.9000', '4.9000', '9.5000',
'5.0000', '5.9000', '13000', '73000', '6.8000', '3.5000', '4.0000',
'2.3000', '7.2000', '2.1000', '42000', '7.3000', '9.1000', '55000',
'23', '6.5000', '1.5000', '7.5000', '51000', '41000', '48000',
'8.5000', '46000', '8.3000', '4.3000', '4.7000', '3.3000', '40000',
'7.8000', '8.8000', '6.6000', '5.1000', '61000', '66000', '79',
'8.4000', '118', '44000', '695', '1.6000', '6.2000', '18', '53000',
'1.4000', '3.0000', '5.8000', '3.8000', '9.6000', '45000', '63000',
'49000', '77000', '4.4000', '4.8000', '70000', '6.9000', '9.3000',
'10.0000', '8.1000', '36000', '84000', '97000', '2.0000', '1.9000',
'1.8000', '5.3000', '47000', '556', '526', '76000', '7.6000',
'59000', '9.7000', '78000', '72000', '43000', '7.7000', '6.3000',
'334', '34000', '93000', '65000', '79000', '100000', '58000',
'50000', '68000', '64000', '67000', '60000', '94000', '232',
'99000', '624', '95000', '8.5', '41', '292', '11', '80000',
'1.7000', '74000', '62000', '69000', '75000', '98000', '85000',
'82000', '96000', '87000', '71000', '86000', '91000', '81000',
'92000', '83000', '88000', '704', '862', '899', '378', '266',
'375', '1.3000', '975', '980', '4.1000', '89000', '696', '544',
'525', '920', '779', '853', '720', '713', '772', '318', '58',
'241', '196', '857', '51', '953', '865', '251', '930', '540',
'313', '746', '203', '26', '314', '239', '371', '220', '730',
'756', '91', '293', '17', '74', '14', '317', '78', '924', '902',
'818', '81', '939', '169', '45', '475', '965', '90000', '545',
'61', '283', '655', '714', '93', '872', '121', '322', '1.0000',
'976', '172', '238', '549', '206', '954', '444', '717', '210',
'609', '308', '705', '306', '904', '473', '175', '350', '383',
'454', '421', '70', '812', '442', '842', '417', '412', '459',
'478', '335', '782', '721', '430', '429', '192', '200', '460',
'728', '496', '816', '414', '506', '887', '613', '243', '569',
'778', '683', '592', '319', '186', '840', '647', '191', '373',
'437', '598', '716', '585', '982', '222', '219', '55', '948',
'323', '691', '511', '951', '963', '25', '554', '351', '27', '82',
'208', '913', '514', '551', '29', '103', '898', '743', '116',
'153', '209', '353', '499', '173', '597', '809', '122', '411',
'400', '801', '787', '237', '50', '643', '986', '97', '516', '837',
'780', '961', '269', '20', '498', '600', '749', '642', '881', '72',
'656', '601', '221', '228', '108', '940', '176', '33', '663', '34',
'942', '259', '164', '458', '245', '629', '28', '288', '775',
'785', '636', '916', '994', '309', '485', '914', '903', '608',
'500', '54', '562', '847', '957', '688', '811', '270', '48', '329',
'523', '921', '874', '981', '784', '280', '24', '518', '754',
'892', '154', '860', '364', '387', '626', '161', '879', '39',
'970', '170', '141', '160', '144', '143', '190', '376', '193',
'246', '73', '658', '992', '253', '420', '404', '470', '226',
'240', '89', '234', '257', '861', '467', '157', '44', '676', '67',
'552', '885', '1020', '582', '619'], dtype=object)

In [42]: df_copy["Size"] = df_copy["Size"].astype("float")

In [43]: df_copy["Size"][2]*1000

Out[43]:
8700.0

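In [44]: # Values of 10 or less are assumed to be megabyte sizes that lost their scale
# (8.7M became 8.7), so they are multiplied by 1000 to express them in kB.
# Caveat: a genuine kB value of 10 or less (8.5k in this data) is rescaled too.
for i in df_copy["Size"]:
    if i <= 10:
        df_copy["Size"] = df_copy["Size"].replace(i,i*1000)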

In [45]: df_copy["Size"].head()

Out[45]:
0 19000.0
1 14000.0
2 8700.0
3 25000.0
4 2800.0
Name: Size, dtype: float64

In [46]: # Converting the size from kB to MB.
df_copy["Size"] = df_copy["Size"]/1000

In [47]: df_copy["Size"].head()

Out[47]:
0 19.0
1 14.0
2 8.7
3 25.0
4 2.8
Name: Size, dtype: float64
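
The suffix replacement above relies on a "10 or less" threshold to tell megabyte values from kilobyte values, which would also rescale a small kB entry such as 8.5k. A minimal sketch of an alternative that reads the unit from the suffix itself; `size_to_mb` is a hypothetical helper, and it keeps the notebook's convention of 1000 kB per MB.

```python
import numpy as np
import pandas as pd


def size_to_mb(text):
    """Convert a size string such as '19M' or '201k' to megabytes (hypothetical helper)."""
    if not isinstance(text, str):
        return np.nan
    text = text.strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        # Same 1000 kB = 1 MB convention the notebook uses.
        return float(text[:-1]) / 1000
    # Anything else, e.g. 'Varies with device', is treated as missing.
    return np.nan


sizes = pd.Series(["19M", "8.7M", "201k", "8.5k", "Varies with device"])
size_mb = sizes.map(size_to_mb)
print(size_mb)
```

Because the unit is read before the number is parsed, no threshold is needed and small kB values stay small.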

In [48]: df_copy.head()

Out[48]:
   App                                                Category        Rating  Reviews  Size  Installs     Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook     ART_AND_DESIGN  4.1     159      19.0  10,000+      Free  0      Everyone        ...
1  Coloring book moana                                ART_AND_DESIGN  3.9     967      14.0  500,000+     Free  0      Everyone        ...
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN  4.7     87510    8.7   5,000,000+   Free  0      Everyone        ...
3  Sketch - Draw & Paint                              ART_AND_DESIGN  4.5     215644   25.0  50,000,000+  Free  0      Teen            ...
4  Pixel Draw - Number Art Coloring Book              ART_AND_DESIGN  4.3     967      2.8   100,000+     Free  0      Everyone        ...

In [49]: # Checking for the unique values in Installs.


df_copy["Installs"].unique()

Out[49]:
array(['10,000+', '500,000+', '5,000,000+', '50,000,000+', '100,000+',
'50,000+', '1,000,000+', '10,000,000+', '5,000+', '100,000,000+',
'1,000,000,000+', '1,000+', '500,000,000+', '50+', '100+', '500+',
'10+', '1+', '5+', '0+', '0'], dtype=object)

In [50]: # Checking for the unique values in Price.


df_copy["Price"].unique()

Out[50]:
array(['0', '$4.99', '$3.99', '$6.99', '$1.49', '$2.99', '$7.99', '$5.99',
'$3.49', '$1.99', '$9.99', '$7.49', '$0.99', '$9.00', '$5.49',
'$10.00', '$24.99', '$11.99', '$79.99', '$16.99', '$14.99',
'$1.00', '$29.99', '$12.99', '$2.49', '$10.99', '$1.50', '$19.99',
'$15.99', '$33.99', '$74.99', '$39.99', '$3.95', '$4.49', '$1.70',
'$8.99', '$2.00', '$3.88', '$25.99', '$399.99', '$17.99',
'$400.00', '$3.02', '$1.76', '$4.84', '$4.77', '$1.61', '$2.50',
'$1.59', '$6.49', '$1.29', '$5.00', '$13.99', '$299.99', '$379.99',
'$37.99', '$18.99', '$389.99', '$19.90', '$8.49', '$1.75',
'$14.00', '$4.85', '$46.99', '$109.99', '$154.99', '$3.08',
'$2.59', '$4.80', '$1.96', '$19.40', '$3.90', '$4.59', '$15.46',
'$3.04', '$4.29', '$2.60', '$3.28', '$4.60', '$28.99', '$2.95',
'$2.90', '$1.97', '$200.00', '$89.99', '$2.56', '$30.99', '$3.61',
'$394.99', '$1.26', '$1.20', '$1.04'], dtype=object)

In [51]: # In Installs, removing the + sign (regex=False treats it as a literal character).
df_copy["Installs"] = df_copy["Installs"].str.replace("+","",regex=False)

In [52]: # In Installs, removing the thousands separators.
df_copy["Installs"] = df_copy["Installs"].str.replace(",","",regex=False)

In [53]: # converting Installs to float datatype


df_copy["Installs"] = df_copy["Installs"].astype("float")

In [54]: # In Price, removing the $ sign (regex=False treats it as a literal character).
df_copy["Price"] = df_copy["Price"].str.replace("$","",regex=False)

In [55]: # converting Price to float datatype


df_copy["Price"] = df_copy["Price"].astype("float")
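
Both `+` and `$` are regular-expression metacharacters, and the default of the `regex` argument of `str.replace` has differed between pandas versions, so passing `regex=False` keeps the behaviour explicit. The sketch below applies the same cleaning to made-up values and uses `pd.to_numeric`, which picks an integer dtype for whole-number strings; it illustrates the idea and is not the notebook's code.

```python
import pandas as pd

# Toy data standing in for df_copy; values mimic the formats shown above.
toy = pd.DataFrame({
    "Installs": ["10,000+", "1,000,000,000+", "0"],
    "Price": ["0", "$4.99", "$399.99"],
})

# regex=False treats "+", "," and "$" as literal characters.
installs = (toy["Installs"]
            .str.replace("+", "", regex=False)
            .str.replace(",", "", regex=False))
toy["Installs"] = pd.to_numeric(installs)
toy["Price"] = pd.to_numeric(toy["Price"].str.replace("$", "", regex=False))
print(toy.dtypes)
```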

In [56]: df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10840 non-null object
1 Category 10840 non-null object
2 Rating 9366 non-null float64
3 Reviews 10840 non-null int32
4 Size 9145 non-null float64
5 Installs 10840 non-null float64
6 Type 10839 non-null object
7 Price 10840 non-null float64
8 Content Rating 10840 non-null object
9 Genres 10840 non-null object
10 Last Updated 10840 non-null object
11 Current Ver 10832 non-null object
12 Android Ver 10838 non-null object
dtypes: float64(4), int32(1), object(8)
memory usage: 1.4+ MB

In [57]: # Checking unique values in Type


df_copy["Type"].unique()

Out[57]:
array(['Free', 'Paid', nan], dtype=object)

In [58]: df_copy.head()

Out[58]:
   App                                                Category        Rating  Reviews  Size  Installs    Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook     ART_AND_DESIGN  4.1     159      19.0  10000.0     Free  0.0    Everyone        ...
1  Coloring book moana                                ART_AND_DESIGN  3.9     967      14.0  500000.0    Free  0.0    Everyone        ...
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN  4.7     87510    8.7   5000000.0   Free  0.0    Everyone        ...
3  Sketch - Draw & Paint                              ART_AND_DESIGN  4.5     215644   25.0  50000000.0  Free  0.0    Teen            ...
4  Pixel Draw - Number Art Coloring Book              ART_AND_DESIGN  4.3     967      2.8   100000.0    Free  0.0    Everyone        ...

In [59]: df_copy["Last Updated"]

Out[59]:
0 January 7, 2018
1 January 15, 2018
2 August 1, 2018
3 June 8, 2018
4 June 20, 2018
...
10836 July 25, 2017
10837 July 6, 2018
10838 January 20, 2017
10839 January 19, 2015
10840 July 25, 2018
Name: Last Updated, Length: 10840, dtype: object

In [60]: df_copy["Last Updated"].dtype

Out[60]:
dtype('O')

In [61]: # Converting Last Updated to the datetime dtype.
df_copy["Last Updated"] = pd.to_datetime(df_copy["Last Updated"])

In [62]: df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10840 non-null object
1 Category 10840 non-null object
2 Rating 9366 non-null float64
3 Reviews 10840 non-null int32
4 Size 9145 non-null float64
5 Installs 10840 non-null float64
6 Type 10839 non-null object
7 Price 10840 non-null float64
8 Content Rating 10840 non-null object
9 Genres 10840 non-null object
10 Last Updated 10840 non-null datetime64[ns]
11 Current Ver 10832 non-null object
12 Android Ver 10838 non-null object
dtypes: datetime64[ns](1), float64(4), int32(1), object(7)
memory usage: 1.4+ MB

In [63]: df_copy["Day"] = df_copy["Last Updated"].dt.day

In [64]: df_copy["Month"] = df_copy["Last Updated"].dt.month

In [65]: df_copy["Year"] = df_copy["Last Updated"].dt.year
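
The date conversion and the Day, Month and Year extraction above can be sketched on a few made-up strings. Passing an explicit `format` (here `"%B %d, %Y"`, assumed from values such as "January 7, 2018") avoids per-value format guessing, and `errors="coerce"` maps anything unparseable to NaT instead of raising.

```python
import pandas as pd

# Toy values in the same style as the Last Updated column.
dates = pd.Series(["January 7, 2018", "August 1, 2018", "not a date"])

parsed = pd.to_datetime(dates, format="%B %d, %Y", errors="coerce")

# The .dt accessor exposes the date components of a datetime Series.
parts = pd.DataFrame({
    "Day": parsed.dt.day,
    "Month": parsed.dt.month,
    "Year": parsed.dt.year,
})
print(parts)
```

When a value cannot be parsed, its components come back as NaN, so the new columns become floats rather than integers.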

In [66]: df_copy.head()

Out[66]:
   App                                                Category        Rating  Reviews  Size  Installs    Type  Price  Content Rating  ...
0  Photo Editor & Candy Camera & Grid & ScrapBook     ART_AND_DESIGN  4.1     159      19.0  10000.0     Free  0.0    Everyone        ...
1  Coloring book moana                                ART_AND_DESIGN  3.9     967      14.0  500000.0    Free  0.0    Everyone        ...
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN  4.7     87510    8.7   5000000.0   Free  0.0    Everyone        ...
3  Sketch - Draw & Paint                              ART_AND_DESIGN  4.5     215644   25.0  50000000.0  Free  0.0    Teen            ...
4  Pixel Draw - Number Art Coloring Book              ART_AND_DESIGN  4.3     967      2.8   100000.0    Free  0.0    Everyone        ...

In [67]: df_copy.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10840 non-null object
1 Category 10840 non-null object
2 Rating 9366 non-null float64
3 Reviews 10840 non-null int32
4 Size 9145 non-null float64
5 Installs 10840 non-null float64
6 Type 10839 non-null object
7 Price 10840 non-null float64
8 Content Rating 10840 non-null object
9 Genres 10840 non-null object
10 Last Updated 10840 non-null datetime64[ns]
11 Current Ver 10832 non-null object
12 Android Ver 10838 non-null object
13 Day 10840 non-null int64
14 Month 10840 non-null int64
15 Year 10840 non-null int64
dtypes: datetime64[ns](1), float64(4), int32(1), int64(3), object(7)
memory usage: 1.6+ MB

In [68]: # Saving the cleaned DataFrame to a CSV file.
df_copy.to_csv("CleanedGooglePlayStore.csv")

Exploratory Data Analysis


In [69]: # Reading the cleaned dataset
df_copy = pd.read_csv("CleanedGooglePlayStore.csv")

In [70]: df_copy.head(2)

Out[70]:
   Unnamed: 0  App                                             Category        Rating  Reviews  Size  Installs  Type  Price  Content Rating  ...
0  0           Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN  4.1     159      19.0  10000.0   Free  0.0    Everyone        ...
1  1           Coloring book moana                             ART_AND_DESIGN  3.9     967      14.0  500000.0  Free  0.0    Everyone        ...

In [71]: # Remove a column by specifying its name


column_to_remove = 'Unnamed: 0'
df_copy.drop(column_to_remove, axis=1, inplace=True)
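
The `Unnamed: 0` column is the row index that `to_csv` writes by default, and a plain `read_csv` also returns dates as strings. A minimal sketch of a round trip that avoids both, using an in-memory buffer and a made-up frame in place of the real file:

```python
import io

import pandas as pd

# Toy data standing in for df_copy.
toy = pd.DataFrame({
    "App": ["A", "B"],
    "Last Updated": pd.to_datetime(["2018-01-07", "2018-08-01"]),
})

# index=False skips the row index, so no 'Unnamed: 0' column appears on reload.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype for the listed columns.
reloaded = pd.read_csv(buffer, parse_dates=["Last Updated"])
print(reloaded.dtypes)
```

With a real file, the same two arguments would be passed to `to_csv` and `read_csv` with a path instead of the buffer.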

In [72]: df_copy.sample(2)


Out[72]:
      App                                        Category  Rating  Reviews  Size   Installs  Type  Price  Content Rating  Genres
9416  Code on the egg                            TOOLS     3.9     146      0.784  10000.0   Free  0.0    Everyone        Tools
9230  EC - Encumbrance Search - telangana state  BUSINESS  3.4     45       5.500  10000.0   Free  0.0    Everyone        Business

In [73]: df_copy.head(2)

Out[73]:
   App                                             Category        Rating  Reviews  Size  Installs  Type  Price  Content Rating
0  Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN  4.1     159      19.0  10000.0   Free  0.0    Everyone
1  Coloring book moana                             ART_AND_DESIGN  3.9     967      14.0  500000.0  Free  0.0    Everyone

In [74]: df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10840 entries, 0 to 10839
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 App 10840 non-null object
1 Category 10840 non-null object
2 Rating 9366 non-null float64
3 Reviews 10840 non-null int64
4 Size 9145 non-null float64
5 Installs 10840 non-null float64
6 Type 10839 non-null object
7 Price 10840 non-null float64
8 Content Rating 10840 non-null object
9 Genres 10840 non-null object
10 Last Updated 10840 non-null object
11 Current Ver 10832 non-null object
12 Android Ver 10838 non-null object
13 Day 10840 non-null int64
14 Month 10840 non-null int64
15 Year 10840 non-null int64
dtypes: float64(4), int64(4), object(8)
memory usage: 1.3+ MB
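
The round trip through CSV above wrote the index as an extra `Unnamed: 0` column and brought `Last Updated` back as `object` instead of `datetime64[ns]`. A minimal sketch of one way to avoid both, using a small made-up frame and an in-memory buffer rather than the real dataset (the toy rows and the buffer are assumptions for illustration only):

```python
import io

import pandas as pd

# Toy stand-in for the cleaned DataFrame (hypothetical rows).
toy = pd.DataFrame({
    "App": ["A", "B"],
    "Last Updated": pd.to_datetime(["2018-01-07", "2018-06-20"]),
})

# index=False keeps the row index out of the file,
# so no "Unnamed: 0" column appears on reload.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype that plain read_csv would lose.
restored = pd.read_csv(buffer, parse_dates=["Last Updated"])
print(restored.dtypes)
```

With a real file, the same two arguments would go on `to_csv("CleanedGooglePlayStore.csv", ...)` and `read_csv(...)`.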

In [75]: df_copy.shape

Out[75]: (10840, 16)

In [76]: df_copy.isnull().sum()


Out[76]:
App 0
Category 0
Rating 1474
Reviews 0
Size 1695
Installs 0
Type 1
Price 0
Content Rating 0
Genres 0
Last Updated 0
Current Ver 8
Android Ver 2
Day 0
Month 0
Year 0
dtype: int64

In [77]: df_copy.duplicated()

Out[77]:
0 False
1 False
2 False
3 False
4 False
...
10835 False
10836 False
10837 False
10838 False
10839 False
Length: 10840, dtype: bool

In [78]: # Checking for the duplicate values in df_copy.


df_copy[df_copy.duplicated()]


Out[78]:
       App                                              Category      Rating  Reviews  Size  Installs     Type  Price  Content Rating
229    Quick PDF Scanner + OCR FREE                     BUSINESS      4.2     80805    NaN   5000000.0    Free  0.0    Everyone
236    Box                                              BUSINESS      4.2     159872   NaN   10000000.0   Free  0.0    Everyone
239    Google My Business                               BUSINESS      4.4     70991    NaN   5000000.0    Free  0.0    Everyone
256    ZOOM Cloud Meetings                              BUSINESS      4.4     31614    37.0  10000000.0   Free  0.0    Everyone
261    join.me - Simple Meetings                        BUSINESS      4.0     6989     NaN   1000000.0    Free  0.0    Everyone
...    ...                                              ...           ...     ...      ...   ...          ...   ...    ...
8643   Wunderlist: To-Do List & Tasks                   PRODUCTIVITY  4.6     404610   NaN   10000000.0   Free  0.0    Everyone
8654   TickTick: To Do List with Reminder, Day Planner  PRODUCTIVITY  4.6     25370    NaN   1000000.0    Free  0.0    Everyone
8658   ColorNote Notepad Notes                          PRODUCTIVITY  4.6     2401017  NaN   100000000.0  Free  0.0    Everyone
10049  Airway Ex - Intubate. Anesthetize. Train.        MEDICAL       4.3     123      86.0  10000.0      Free  0.0    Everyone
10767  AAFP                                             MEDICAL       3.8     63       24.0  10000.0      Free  0.0    Everyone

483 rows × 16 columns

In [79]: # dropping the duplicate values in the dataset df_copy


df_copy = df_copy.drop_duplicates()

In [80]: df_copy.shape

Out[80]: (10357, 16)
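
`drop_duplicates()` with no arguments only removes rows that match in every column, so an app that was scraped several times with slightly different `Reviews` counts is still present (the `value_counts()` output further down lists ROBLOX 9 times). A hedged sketch of one possible follow-up, keeping a single row per app name; the toy rows are made up, and keeping the row with the most reviews is an assumption, not something the notebook does:

```python
import pandas as pd

# Toy rows (hypothetical): the same app appears twice with different review counts.
toy = pd.DataFrame({
    "App": ["Subway Surfers", "Subway Surfers", "Gmail"],
    "Reviews": [27722264, 27725352, 4604324],
    "Installs": [1e9, 1e9, 1e9],
})

# Sort so the most-reviewed row of each app comes last, keep that one,
# then restore the original row order.
deduped = (
    toy.sort_values("Reviews")
       .drop_duplicates(subset="App", keep="last")
       .sort_index()
)
print(deduped)
```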

Exploring Numerical and Categorical Data


In [81]: # Collecting the categorical (object-dtype) columns
cat = []
for col in df_copy.columns:
    if df_copy[col].dtypes == "O":
        cat.append(col)


In [82]: cat

Out[82]:
['App',
'Category',
'Type',
'Content Rating',
'Genres',
'Last Updated',
'Current Ver',
'Android Ver']

In [83]: # Collecting the numerical (non-object) columns
num = []
for col in df_copy.columns:
    if df_copy[col].dtypes != "O":
        num.append(col)

In [84]: num

Out[84]: ['Rating', 'Reviews', 'Size', 'Installs', 'Price', 'Day', 'Month', 'Year']
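
The two loops above can also be written with `DataFrame.select_dtypes`. A small sketch on a made-up frame (the toy columns are assumptions; on the real data the call would be made on `df_copy`):

```python
import pandas as pd

# Toy frame (hypothetical) with one text column and two numeric columns.
toy = pd.DataFrame({
    "App": ["A", "B"],
    "Rating": [4.1, 3.9],
    "Reviews": [159, 967],
})

# Numeric columns, and everything that is not numeric.
num_cols = toy.select_dtypes(include="number").columns.tolist()
cat_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(cat_cols, num_cols)
```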

Data Proportion
In [85]: df_copy["App"].value_counts()

Out[85]:
ROBLOX 9
8 Ball Pool 7
Bubble Shooter 6
Helix Jump 6
Zombie Catchers 6
..
Popsicle Launcher for Android P 9.0 launcher 1
PixelLab - Text on pictures 1
P Launcher for Android™ 9.0 1
Pacify (Android P theme) - Theme for Xperia™ 1
iHoroscope - 2018 Daily Horoscope & Astrology 1
Name: App, Length: 9659, dtype: int64

In [86]: df_copy["App"].value_counts(normalize=True)

Out[86]:
ROBLOX 0.000869
8 Ball Pool 0.000676
Bubble Shooter 0.000579
Helix Jump 0.000579
Zombie Catchers 0.000579
...
Popsicle Launcher for Android P 9.0 launcher 0.000097
PixelLab - Text on pictures 0.000097
P Launcher for Android™ 9.0 0.000097
Pacify (Android P theme) - Theme for Xperia™ 0.000097
iHoroscope - 2018 Daily Horoscope & Astrology 0.000097
Name: App, Length: 9659, dtype: float64

In [87]: # Checking for the value_counts in Category


df_copy["Category"].value_counts()


Out[87]:
FAMILY 1943
GAME 1121
TOOLS 843
BUSINESS 427
MEDICAL 408
PRODUCTIVITY 407
PERSONALIZATION 388
LIFESTYLE 373
COMMUNICATION 366
FINANCE 360
SPORTS 351
PHOTOGRAPHY 322
HEALTH_AND_FITNESS 306
SOCIAL 280
NEWS_AND_MAGAZINES 264
TRAVEL_AND_LOCAL 237
BOOKS_AND_REFERENCE 230
SHOPPING 224
DATING 196
VIDEO_PLAYERS 175
MAPS_AND_NAVIGATION 137
EDUCATION 130
FOOD_AND_DRINK 124
ENTERTAINMENT 111
AUTO_AND_VEHICLES 85
LIBRARIES_AND_DEMO 85
WEATHER 82
HOUSE_AND_HOME 80
ART_AND_DESIGN 65
EVENTS 64
PARENTING 60
COMICS 60
BEAUTY 53
Name: Category, dtype: int64

In [88]: df_copy["Category"].value_counts().sum()

Out[88]: 10357

In [89]: df_copy["Category"].value_counts(normalize=True)*100


Out[89]:
FAMILY 18.760259
GAME 10.823598
TOOLS 8.139423
BUSINESS 4.122815
MEDICAL 3.939365
PRODUCTIVITY 3.929709
PERSONALIZATION 3.746259
LIFESTYLE 3.601429
COMMUNICATION 3.533842
FINANCE 3.475910
SPORTS 3.389012
PHOTOGRAPHY 3.109008
HEALTH_AND_FITNESS 2.954524
SOCIAL 2.703486
NEWS_AND_MAGAZINES 2.549001
TRAVEL_AND_LOCAL 2.288307
BOOKS_AND_REFERENCE 2.220720
SHOPPING 2.162788
DATING 1.892440
VIDEO_PLAYERS 1.689678
MAPS_AND_NAVIGATION 1.322777
EDUCATION 1.255190
FOOD_AND_DRINK 1.197258
ENTERTAINMENT 1.071739
AUTO_AND_VEHICLES 0.820701
LIBRARIES_AND_DEMO 0.820701
WEATHER 0.791735
HOUSE_AND_HOME 0.772424
ART_AND_DESIGN 0.627595
EVENTS 0.617940
PARENTING 0.579318
COMICS 0.579318
BEAUTY 0.511731
Name: Category, dtype: float64

In [90]: num_df = df_copy[num]

Checking the distribution of Numerical Data


In [91]: # kdeplot of Rating
sns.kdeplot(num_df["Rating"])

Out[91]: <AxesSubplot:xlabel='Rating', ylabel='Density'>


In [92]: # One KDE plot per numerical column
for i in num_df.columns:
    sns.kdeplot(num_df[i])
    plt.xlabel(i)
    plt.ylabel("Density")  # a KDE shows density, not counts
    plt.title("Numerical Feature")
    plt.show()


In [93]: plt.figure(figsize=(15,15))
plt.suptitle("Univariate Analysis of Numerical Features", fontsize = 20)

# One filled KDE per numerical column, laid out on a 5x3 grid
for i in range(0, len(num)):
    plt.subplot(5,3, i+1)
    sns.kdeplot(x = df_copy[num[i]], fill = True, color = "r")
    plt.xlabel(num[i])
    plt.ylabel("Density")
    plt.title("Numerical Feature")
    plt.tight_layout()


In [94]: num_df.isnull().sum()

Out[94]:
Rating 1465
Reviews 0
Size 1526
Installs 0
Price 0
Day 0
Month 0
Year 0
dtype: int64

Checking the Categorical Data


In [95]: # Copying the categorical values
cat_df = df_copy[cat]

In [96]: cat_df["Type"].value_counts()

Out[96]:
Free 9591
Paid 765
Name: Type, dtype: int64

In [97]: # countplot of Type (data passed by keyword, as newer seaborn versions require)
sns.countplot(x=cat_df["Type"])

Out[97]: <AxesSubplot:xlabel='Type', ylabel='count'>

In [98]: cat_df.columns

Out[98]:
Index(['App', 'Category', 'Type', 'Content Rating', 'Genres', 'Last Updated',
       'Current Ver', 'Android Ver'],
      dtype='object')

In [99]: df_copy.sample(10)


Out[99]:
      App                           Category            Rating  Reviews  Size  Installs    Type  Price  Content Rating
9973  German Vocabulary Trainer     FAMILY              3.3     1218     1.0   100000.0    Free  0.00   Everyone
3988  C Programming                 FAMILY              4.3     22248    1.8   1000000.0   Free  0.00   Everyone
4316  Anna.K Tarot                  FAMILY              4.8     17       23.0  100.0       Paid  3.99   Mature 17+
3513  CM FILE MANAGER HD            PRODUCTIVITY        4.3     144879   NaN   10000000.0  Free  0.00   Everyone
7800  The ClubHouse CR              HEALTH_AND_FITNESS  NaN     5        8.7   100.0       Free  0.00   Everyone
5648  Five Nights at Freddy's 3     GAME                4.7     27856    50.0  100000.0    Paid  2.99   Teen
1193  Grubhub: Food Delivery        FOOD_AND_DRINK      4.5     155944   35.0  5000000.0   Free  0.00   Everyone
2017  Jewels Crush- Match 3 Puzzle  FAMILY              4.4     14774    19.0  1000000.0   Free  0.00   Everyone
8800  Dr. Panda Restaurant 2        FAMILY              4.3     3725     67.0  100000.0    Paid  2.99   Everyone
7694  CP Smart Check List           PERSONALIZATION     NaN     1        3.9   10.0        Free  0.00   Everyone

In [100… # Checking for the unique values in the Category


df_copy["Category"].unique()

Out[100]:
array(['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY',
'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION',
'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE',
'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME',
'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL',
'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL',
'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER',
'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION'],
dtype=object)

In [101… cat_df["Category"].value_counts()


Out[101]:
FAMILY 1943
GAME 1121
TOOLS 843
BUSINESS 427
MEDICAL 408
PRODUCTIVITY 407
PERSONALIZATION 388
LIFESTYLE 373
COMMUNICATION 366
FINANCE 360
SPORTS 351
PHOTOGRAPHY 322
HEALTH_AND_FITNESS 306
SOCIAL 280
NEWS_AND_MAGAZINES 264
TRAVEL_AND_LOCAL 237
BOOKS_AND_REFERENCE 230
SHOPPING 224
DATING 196
VIDEO_PLAYERS 175
MAPS_AND_NAVIGATION 137
EDUCATION 130
FOOD_AND_DRINK 124
ENTERTAINMENT 111
AUTO_AND_VEHICLES 85
LIBRARIES_AND_DEMO 85
WEATHER 82
HOUSE_AND_HOME 80
ART_AND_DESIGN 65
EVENTS 64
PARENTING 60
COMICS 60
BEAUTY 53
Name: Category, dtype: int64

In [102… cat_df["Category"].value_counts().plot.pie(figsize=(10,10))

Out[102]: <AxesSubplot:ylabel='Category'>


In [103… cat_df["Category"].value_counts().head()

Out[103]:
FAMILY 1943
GAME 1121
TOOLS 843
BUSINESS 427
MEDICAL 408
Name: Category, dtype: int64

In [104… category = pd.DataFrame(cat_df["Category"].value_counts().head(10))

In [105… category

Out[105]:
                 Category
FAMILY           1943
GAME             1121
TOOLS            843
BUSINESS         427
MEDICAL          408
PRODUCTIVITY     407
PERSONALIZATION  388
LIFESTYLE        373
COMMUNICATION    366
FINANCE          360


In [106… category.plot(kind = "bar")

Out[106]: <AxesSubplot:>

In [107… category.plot(kind = "hist")

Out[107]: <AxesSubplot:ylabel='Frequency'>


In [108… category.rename(columns={"Category":"Count"}, inplace = True)

In [109… category.head()

Out[109]:
          Count
FAMILY    1943
GAME      1121
TOOLS     843
BUSINESS  427
MEDICAL   408

In [110… plt.figure(figsize=(20,15))
plt.xticks(rotation=45)
plt.title("Top 10 categories")
sns.barplot(x = category.index[:10],y = "Count",data=category[:10])

Out[110]: <AxesSubplot:title={'center':'Top 10 categories'}, ylabel='Count'>


Some other Operations


In [111… df_copy["Installs"].max()

Out[111]: 1000000000.0

In [112… # Selecting the apps that reach the maximum number of installs
df_copy[df_copy["Installs"] == df_copy["Installs"].max()]


Out[112]:
      App                                       Category             Rating  Reviews   Size  Installs      Type  Price
152   Google Play Books                         BOOKS_AND_REFERENCE  3.9     1433233   NaN   1.000000e+09  Free  0.0
335   Messenger – Text and Video Chat for Free  COMMUNICATION        4.0     56642847  NaN   1.000000e+09  Free  0.0
336   WhatsApp Messenger                        COMMUNICATION        4.4     69119316  NaN   1.000000e+09  Free  0.0
338   Google Chrome: Fast & Secure              COMMUNICATION        4.3     9642995   NaN   1.000000e+09  Free  0.0
340   Gmail                                     COMMUNICATION        4.3     4604324   NaN   1.000000e+09  Free  0.0
341   Hangouts                                  COMMUNICATION        4.0     3419249   NaN   1.000000e+09  Free  0.0
382   Messenger – Text and Video Chat for Free  COMMUNICATION        4.0     56646578  NaN   1.000000e+09  Free  0.0
386   Hangouts                                  COMMUNICATION        4.0     3419433   NaN   1.000000e+09  Free  0.0
391   Skype - free IM & video calls             COMMUNICATION        4.1     10484169  NaN   1.000000e+09  Free  0.0
411   Google Chrome: Fast & Secure              COMMUNICATION        4.3     9643041   NaN   1.000000e+09  Free  0.0
451   Gmail                                     COMMUNICATION        4.3     4604483   NaN   1.000000e+09  Free  0.0
464   Hangouts                                  COMMUNICATION        4.0     3419513   NaN   1.000000e+09  Free  0.0
865   Google Play Games                         ENTERTAINMENT        4.3     7165362   NaN   1.000000e+09  Free  0.0
1654  Subway Surfers                            GAME                 4.5     27722264  76.0  1.000000e+09  Free  0.0
1700  Subway Surfers                            GAME                 4.5     27723193  76.0  1.000000e+09  Free  0.0
1750  Subway Surfers                            GAME                 4.5     27724094  76.0  1.000000e+09  Free  0.0
1872  Subway Surfers                            GAME                 4.5     27725352  76.0  1.000000e+09  Free  0.0
2544  Facebook                                  SOCIAL               4.1     78158306  NaN   1.000000e+09  Free  0.0
2545  Instagram                                 SOCIAL               4.5     66577313  NaN   1.000000e+09  Free  0.0
2554  Google+                                   SOCIAL               4.2     4831125   NaN   1.000000e+09  Free  0.0
2604  Instagram                                 SOCIAL               4.5     66577446  NaN   1.000000e+09  Free  0.0
2808  Google Photos                             PHOTOGRAPHY          4.5     10858556  NaN   1.000000e+09  Free  0.0
2853  Google Photos                             PHOTOGRAPHY          4.5     10858538  NaN   1.000000e+09  Free  0.0
2884  Google Photos                             PHOTOGRAPHY          4.5     10859051  NaN   1.000000e+09  Free  0.0
3117  Maps - Navigate & Explore                 TRAVEL_AND_LOCAL     4.3     9235155   NaN   1.000000e+09  Free  0.0
3127  Google Street View                        TRAVEL_AND_LOCAL     4.2     2129689   NaN   1.000000e+09  Free  0.0
3223  Maps - Navigate & Explore                 TRAVEL_AND_LOCAL     4.3     9235373   NaN   1.000000e+09  Free  0.0
3232  Google Street View                        TRAVEL_AND_LOCAL     4.2     2129707   NaN   1.000000e+09  Free  0.0
3234  Google                                    TOOLS                4.4     8033493   NaN   1.000000e+09  Free  0.0
3454  Google Drive                              PRODUCTIVITY         4.4     2731171   NaN   1.000000e+09  Free  0.0
3523  Google Drive                              PRODUCTIVITY         4.4     2731211   NaN   1.000000e+09  Free  0.0
3665  YouTube                                   VIDEO_PLAYERS        4.3     25655305  NaN   1.000000e+09  Free  0.0
3687  Google Play Movies & TV                   VIDEO_PLAYERS        3.7     906384    NaN   1.000000e+09  Free  0.0
3736  Google News                               NEWS_AND_MAGAZINES   3.9     877635    13.0  1.000000e+09  Free  0.0
3816  Google News                               NEWS_AND_MAGAZINES   3.9     877643    13.0  1.000000e+09  Free  0.0
3896  Subway Surfers                            GAME                 4.5     27711703  76.0  1.000000e+09  Free  0.0
3904  WhatsApp Messenger                        COMMUNICATION        4.4     69109672  NaN   1.000000e+09  Free  0.0
3909  Instagram                                 SOCIAL               4.5     66509917  NaN   1.000000e+09  Free  0.0
3928  YouTube                                   VIDEO_PLAYERS        4.3     25623548  NaN   1.000000e+09  Free  0.0
3943  Facebook                                  SOCIAL               4.1     78128208  NaN   1.000000e+09  Free  0.0
3996  Google Chrome: Fast & Secure              COMMUNICATION        4.3     9642112   NaN   1.000000e+09  Free  0.0
4098  Maps - Navigate & Explore                 TRAVEL_AND_LOCAL     4.3     9231613   NaN   1.000000e+09  Free  0.0
4144  Google+                                   SOCIAL               4.2     4828372   NaN   1.000000e+09  Free  0.0
4150  Google                                    TOOLS                4.4     8021623   NaN   1.000000e+09  Free  0.0
4153  Hangouts                                  COMMUNICATION        4.0     3419464   NaN   1.000000e+09  Free  0.0
4170  Google Drive                              PRODUCTIVITY         4.4     2728941   NaN   1.000000e+09  Free  0.0
5395  Google Photos                             PHOTOGRAPHY          4.5     10847682  NaN   1.000000e+09  Free  0.0
5856  Google Play Games                         FAMILY               4.3     7168735   NaN   1.000000e+09  Free  0.0
9844  Google News                               NEWS_AND_MAGAZINES   3.9     878065    13.0  1.000000e+09  Free  0.0


In [113… df_copy.groupby(["Category"])["Installs"]

Out[113]: <pandas.core.groupby.generic.SeriesGroupBy object at 0x0000015D5FD94520>

In [114… df_copy.groupby(["Category"])["Installs"].sum().sort_values(ascending = False)

Out[114]:
Category
GAME 3.154402e+10
COMMUNICATION 2.415228e+10
SOCIAL 1.251387e+10
PRODUCTIVITY 1.246309e+10
TOOLS 1.145277e+10
FAMILY 1.004169e+10
PHOTOGRAPHY 9.721248e+09
TRAVEL_AND_LOCAL 6.361887e+09
VIDEO_PLAYERS 6.222003e+09
NEWS_AND_MAGAZINES 5.393218e+09
SHOPPING 2.573349e+09
ENTERTAINMENT 2.455660e+09
PERSONALIZATION 2.074495e+09
BOOKS_AND_REFERENCE 1.916470e+09
SPORTS 1.528574e+09
HEALTH_AND_FITNESS 1.361023e+09
BUSINESS 8.636649e+08
FINANCE 7.703487e+08
MAPS_AND_NAVIGATION 7.242819e+08
LIFESTYLE 5.348235e+08
EDUCATION 5.339520e+08
WEATHER 4.261005e+08
FOOD_AND_DRINK 2.578988e+08
DATING 2.065361e+08
HOUSE_AND_HOME 1.252125e+08
ART_AND_DESIGN 1.243381e+08
LIBRARIES_AND_DEMO 6.299591e+07
COMICS 5.608615e+07
AUTO_AND_VEHICLES 5.313021e+07
MEDICAL 4.220418e+07
PARENTING 3.152111e+07
BEAUTY 2.719705e+07
EVENTS 1.597316e+07
Name: Installs, dtype: float64

In [115… df_copy.groupby(["App"])["Installs"].sum().sort_values(ascending = False)

Out[115]:
App
Subway Surfers 5.000000e+09
Google Photos 4.000000e+09
Hangouts 4.000000e+09
Google News 3.000000e+09
Google Chrome: Fast & Secure 3.000000e+09
...
Command & Conquer: Rivals 0.000000e+00
Test Application DT 02 0.000000e+00
AP Series Solution Pro 0.000000e+00
I'm Rich/Eu sou Rico/‫أنا غني‬/我很有錢 0.000000e+00
Ak Parti Yardım Toplama 0.000000e+00
Name: Installs, Length: 9659, dtype: float64
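
Summing `Installs` per app counts an app once for every row it occupies, which is how Subway Surfers reaches 5.0e+09 above even though each of its five rows reports 1.0e+09. A hedged sketch, on made-up rows, of taking the per-app maximum instead; whether the maximum is the right summary for this dataset is an assumption:

```python
import pandas as pd

# Toy rows (hypothetical): one app listed three times, another once.
toy = pd.DataFrame({
    "App": ["Subway Surfers", "Subway Surfers", "Subway Surfers", "Gmail"],
    "Installs": [1e9, 1e9, 1e9, 1e9],
})

# sum() multiplies the install count by the number of repeated rows ...
summed = toy.groupby("App")["Installs"].sum()

# ... while max() reports each app's install figure once.
per_app = toy.groupby("App")["Installs"].max().sort_values(ascending=False)
print(summed["Subway Surfers"], per_app["Subway Surfers"])
```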

In [116… df_copy.groupby(["Category"])["Installs"].sum().nlargest()


Out[116]:
Category
GAME 3.154402e+10
COMMUNICATION 2.415228e+10
SOCIAL 1.251387e+10
PRODUCTIVITY 1.246309e+10
TOOLS 1.145277e+10
Name: Installs, dtype: float64

In [117… df_copy.groupby(["Category"])["Installs"].sum().nlargest(5).plot.pie()

Out[117]: <AxesSubplot:ylabel='Installs'>

In [118… df_copy.groupby(["Category"])["Installs"].sum().nlargest(5).plot.bar()

Out[118]: <AxesSubplot:xlabel='Category'>


In [119… df_copy.groupby(["Category"])["Installs"].sum().nsmallest(5).plot.pie()

Out[119]: <AxesSubplot:ylabel='Installs'>


Checking how many apps are getting 5 Star ratings


In [120… df_copy.head()

Out[120]:
   App                                                Category        Rating  Reviews  Size  Installs    Type  Price  Content Rating
0  Photo Editor & Candy Camera & Grid & ScrapBook     ART_AND_DESIGN  4.1     159      19.0  10000.0     Free  0.0    Everyone
1  Coloring book moana                                ART_AND_DESIGN  3.9     967      14.0  500000.0    Free  0.0    Everyone
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN  4.7     87510    8.7   5000000.0   Free  0.0    Everyone
3  Sketch - Draw & Paint                              ART_AND_DESIGN  4.5     215644   25.0  50000000.0  Free  0.0    Teen
4  Pixel Draw - Number Art Coloring Book              ART_AND_DESIGN  4.3     967      2.8   100000.0    Free  0.0    Everyone

In [121… df_copy["Rating"].unique()

Out[121]:
array([4.1, 3.9, 4.7, 4.5, 4.3, 4.4, 3.8, 4.2, 4.6, 3.2, 4. , nan, 4.8,
4.9, 3.6, 3.7, 3.3, 3.4, 3.5, 3.1, 5. , 2.6, 3. , 1.9, 2.5, 2.8,
2.7, 1. , 2.9, 2.3, 2.2, 1.7, 2. , 1.8, 2.4, 1.6, 2.1, 1.4, 1.5,
1.2])

In [122… df_copy["Rating"] == 5

Out[122]:
0 False
1 False
2 False
3 False
4 False
...
10835 False
10836 True
10837 False
10838 False
10839 False
Name: Rating, Length: 10357, dtype: bool

In [123… df_copy[df_copy["Rating"] == 5]


Out[123]:
       App                                         Category   Rating  Reviews  Size  Installs  Type  Price  Content Rating  Genres
329    Hojiboy Tojiboyev Life Hacks                COMICS     5.0     15       37.0  1000.0    Free  0.0    Everyone        Comics
612    American Girls Mobile Numbers               DATING     5.0     5        4.4   1000.0    Free  0.0    Mature 17+      Dating
615    Awake Dating                                DATING     5.0     2        70.0  100.0     Free  0.0    Mature 17+      Dating
633    Spine- The dating app                       DATING     5.0     5        9.3   500.0     Free  0.0    Teen            Dating
636    Girls Live Talk - Free Text and Video Chat  DATING     5.0     6        5.0   100.0     Free  0.0    Mature 17+      Dating
...    ...                                         ...        ...     ...      ...   ...       ...   ...    ...             ...
10720  Mad Dash Fo' Cash                           GAME       5.0     14       16.0  100.0     Free  0.0    Everyone        Arcade
10741  GKPB FP Online Church                       LIFESTYLE  5.0     32       7.9   1000.0    Free  0.0    Everyone        Lifestyle
10775  Monster Ride Pro                            GAME       5.0     1        24.0  10.0      Free  0.0    Everyone        Racing
10819  Fr. Daoud Lamei                             FAMILY     5.0     22       8.6   1000.0    Free  0.0    Teen            Education
10836  Fr. Mike Schmitz Audio Teachings            FAMILY     5.0     4        3.6   100.0     Free  0.0    Everyone        Education

271 rows × 16 columns
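
The visible five-star rows above have between 1 and 32 reviews, so a perfect rating here may rest on very few votes. A minimal sketch, on made-up rows, of counting five-star apps and then keeping only those above a review threshold; `min_reviews` is a hypothetical cut-off chosen for illustration, not a value from the notebook:

```python
import pandas as pd

# Toy rows (hypothetical); one app has no rating at all.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [5.0, 4.3, 5.0, None],
    "Reviews": [4, 1200, 150, 0],
})

# Rows rated exactly 5, and their share among the rated apps.
five_star = toy[toy["Rating"] == 5]
share = len(five_star) / toy["Rating"].notna().sum()

# Hypothetical threshold: ignore five-star apps with too few reviews.
min_reviews = 100
well_reviewed = five_star[five_star["Reviews"] >= min_reviews]
print(len(five_star), round(share, 3), len(well_reviewed))
```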
