
Startup Case Study

This document analyzes Indian startup funding data from a CSV file. It cleans the data by handling missing values and formatting issues. It then creates visualizations showing the trend in annual funding amounts over time and the top 10 cities in India for startups based on company counts. Key findings include that 2015 and 2016 saw the most funding and that Bangalore, Mumbai, and New Delhi have the most startups.

Uploaded by

Anubhav Dutta

9/7/2021 STARTUP_CASE_STUDY(GRP - 3)

Indian Startup Case Study


Problem Statement: To perform an Indian startup case study analysis

Importing Necessary Libraries

In [1]: #importing necessary libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

Reading Data
In [2]: data_1 = pd.read_csv('./Datasets/startup_funding.csv')

data = data_1.copy()

data.head()

Out[2]:
   Sr No  Date dd/mm/yyyy  Startup Name                  Industry Vertical    SubVertical                            City Location  Investors Name
0  1      09/01/2020       BYJU’S                        E-Tech               E-learning                             Bengaluru      Tiger Global Management
1  2      13/01/2020       Shuttl                        Transportation       App based shuttle service              Gurgaon        Susquehanna Growth Equity
2  3      09/01/2020       Mamaearth                     E-commerce           Retailer of baby and toddler products  Bengaluru      Sequoia Capital India
3  4      02/01/2020       https://www.wealthbucket.in/  FinTech              Online Investment                      New Delhi      Vinod Khatumal
4  5      02/01/2020       Fashor                        Fashion and Apparel  Embroiled Clothes For Women            Mumbai         Sprout Venture Partners

In [3]: data.shape

Out[3]: (3044, 10)

Cleaning Data
In [4]: data.isnull().sum()

Out[4]: Sr No 0

Date dd/mm/yyyy 0

Startup Name 0

Industry Vertical 171

SubVertical 936

City Location 180

Investors Name 24

InvestmentnType 4

Amount in USD 960

localhost:8888/nbconvert/html/STARTUP_CASE_STUDY(GRP - 3).ipynb?download=false 1/5



Remarks 2625

dtype: int64
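
Relating these counts to the 3044 rows reported by data.shape gives a rough sense of how sparse each column is. A minimal sketch using only the numbers printed above (the names total_rows, missing and missing_pct are illustrative and do not appear in the notebook):

```python
# Row count taken from data.shape, missing-value counts from data.isnull().sum().
total_rows = 3044
missing = {
    "Industry Vertical": 171,
    "SubVertical": 936,
    "City Location": 180,
    "Investors Name": 24,
    "InvestmentnType": 4,
    "Amount in USD": 960,
    "Remarks": 2625,
}

# Share of rows with a missing value, in percent, rounded to one decimal place.
missing_pct = {col: round(100 * n / total_rows, 1) for col, n in missing.items()}

for col, pct in missing_pct.items():
    print(col, pct, "%")
```

On these figures Remarks is empty in roughly 86% of the rows, while SubVertical and Amount in USD are each missing in about 31%.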

In [5]: # changing the names of the columns inside the data
data.columns = ["SNo", "Date", "StartupName", "IndustryVertical", "SubVertical",
                "City", "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]


# need to extract year from Date column

data.Date.dtype

Out[5]: dtype('O')

In [6]: # lets clean the strings
def clean_string(x):
    # remove the literal "\xc2\xa0" sequences left in the raw text;
    # note that str(x) also turns missing values (NaN) into the string "nan"
    return str(x).replace("\\xc2\\xa0","").replace("\\\\xc2\\\\xa0", "")

# lets apply the function to clean the data
for col in ["StartupName", "IndustryVertical", "SubVertical", "City",
            "InvestorsName", "InvestmentType", "AmountInUSD", "Remarks"]:
    data[col] = data[col].apply(clean_string)
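
Because clean_string passes every value through str(), a missing value ends up as the text "nan", which is why a "nan" city appears in the counts further down. One possible variant that keeps missing values missing is sketched below. The name clean_text, the extra strip() call and the sample values are assumptions made for illustration, not part of the notebook.

```python
import math

def clean_text(value):
    """Remove the literal \\xc2\\xa0 debris; return None for missing values."""
    # Treat None and float NaN as missing instead of converting them to 'nan'.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # Same replacement as clean_string, plus trimming of surrounding whitespace.
    return str(value).replace("\\xc2\\xa0", "").strip()

# Illustrative inputs: a value with debris, a clean value, and a missing value.
samples = ["\\xc2\\xa0Noida", "Pune", float("nan")]
cleaned = [clean_text(v) for v in samples]
print(cleaned)
```

Applied with data[col].apply(clean_text), such a function would leave the missing entries detectable by isnull() afterwards.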

Checking the trend of investments by plotting number of fundings done in each year.
In [9]: # to find out issues in Date column like . and // in place of / in some dates .

unique_dates = data.Date.unique().tolist()

# unique_dates

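In [12]: # fixing separator issues in the Date column ('.' and '//' used in place of '/')
# regex=False makes sure '.' is treated as a literal dot, not as a regex wildcard
data.Date = data.Date.str.replace('.', '/', regex=False)
data.Date = data.Date.str.replace('//', '/', regex=False)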

# extracting year from date column

year = data.Date.str.split('/' , expand = True)[2]

# sorting year in chronological order

year = year.value_counts().sort_index()

x = year.index

y = year.values

# plotting line plot

plt.plot(x,y)

plt.title('Trend of investments')

plt.xlabel("Year")

plt.ylabel("Number of Fundings")

plt.show()

for i in range(3):
    print('Year : ' , x[i],', No. of fundings : ' , y[i])


Year : 015 , No. of fundings : 1


Year : 2015 , No. of fundings : 935

Year : 2016 , No. of fundings : 993
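
The 015 entry above presumably comes from a date whose year field has only three digits, since splitting on '/' passes such a value through unchanged. A possible way to flag these rows instead of counting them as a separate year is sketched here with the standard library. The function name parse_year and the sample dates are illustrative, not taken from the dataset.

```python
import re
from datetime import datetime

def parse_year(raw):
    """Return the four-digit year of a dd/mm/yyyy string, or None if malformed."""
    # Normalise the separator problems handled in the notebook: '.' and '//'.
    text = re.sub(r"/+", "/", str(raw).strip().replace(".", "/"))
    try:
        # %Y requires exactly four digits, so a year like '015' is rejected.
        return datetime.strptime(text, "%d/%m/%Y").year
    except ValueError:
        return None

# Illustrative inputs covering a clean date, both separator issues, and a bad year.
dates = ["09/01/2020", "12/05.2015", "22/01//2015", "01/07/015"]
years = [parse_year(d) for d in dates]
print(years)
```

With pandas this could be applied as data.Date.apply(parse_year) before calling value_counts(), so that malformed dates can be inspected separately.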

In [13]: # function to clean the AmountInUSD column
def clean_amount(x):
    # keep only the digit characters (this drops ',', '+', '.' and any text)
    x = ''.join([c for c in str(x) if c in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']])
    x = str(x).replace(",","").replace("+","")
    x = str(x).lower().replace("undisclosed","")
    x = str(x).lower().replace("n/a","")
    # missing or undisclosed amounts are now empty strings; mark them with -999
    if x == '':
        x = '-999'
    return x

# lets apply the function on the column
data["AmountInUSD"] = data["AmountInUSD"].apply(lambda x: float(clean_amount(x)))

# lets check the head of the column after cleaning it


plt.rcParams['figure.figsize'] = (15, 3)

data['AmountInUSD'].plot(kind = 'line', color = 'black')

plt.title('Distribution of Amount', fontsize = 15)

plt.show()
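
Two side effects of clean_amount are worth keeping in mind: the digit filter also removes decimal points, and the -999 placeholder is a real number that takes part in any later sum. A hedged alternative that returns None for unparseable amounts is sketched below. The name parse_amount and the sample strings are assumptions for illustration only.

```python
import re

def parse_amount(raw):
    """Convert strings such as '2,00,000' to float; None when no number is present."""
    # Drop thousands separators and trailing '+' signs, as the notebook does.
    text = str(raw).replace(",", "").replace("+", "").strip()
    # Accept plain integers or decimals; 'undisclosed', 'N/A', 'nan', '' fall through.
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return float(text)
    return None

# Illustrative inputs, not rows from the dataset.
raw_amounts = ["2,00,000", "1000000+", "undisclosed", "N/A", "14342000.5"]
amounts = [parse_amount(a) for a in raw_amounts]
print(amounts)
```

For comparison, a digits-only filter would turn the last sample into 143420005, because the decimal point is discarded.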

Top 10 Indian cities which have most number of startups
In [14]: # dropping rows having NaN values in the City column
# (clean_string has already turned missing cities into the string 'nan',
# so this filter does not remove them)
data_temp = data.copy()
data_temp = data_temp[data_temp['City'].notnull()]

# sorting out issues in city names
def separateCity(city):
    # keep only the first city when several are listed, separated by '/'
    return city.split('/')[0].strip()

data_temp['City'] = data_temp['City'].apply(separateCity)
data_temp['City'] = data_temp['City'].replace('Delhi', 'New Delhi')
data_temp['City'] = data_temp['City'].replace('bangalore', 'Bangalore')

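In [15]: ## Counting startups in each city
## (the counts are taken from `data`, so the city-name fixes applied to
## `data_temp` in the previous cell are not reflected here)
city_num = data.City.value_counts()[0:10]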

city = city_num.index

num_city = city_num.values

## plotting a pie chart showing percentage share of each city in no. of startups the
plt.rcParams['figure.figsize'] = (15,9)

plt.pie(num_city , labels = city , autopct='%.2f%%' , startangle = 90 , wedgeprops


plt.show()

for i in range(len(city)):
    print('City : ' , city[i] ,' , Number of Startups :' , num_city[i])


City : Bangalore , Number of Startups : 701

City : Mumbai , Number of Startups : 568

City : New Delhi , Number of Startups : 424

City : Gurgaon , Number of Startups : 291

City : nan , Number of Startups : 180

City : Bengaluru , Number of Startups : 141

City : Pune , Number of Startups : 105

City : Hyderabad , Number of Startups : 99

City : Chennai , Number of Startups : 97

City : Noida , Number of Startups : 93
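
In the list above Bangalore and Bengaluru are counted as two cities, and the text "nan" takes one of the ten slots. If the intent is to merge spellings of the same city and skip missing values, one option is a small alias table applied before counting. The sketch below is only an illustration: CITY_ALIASES, normalise_city and the sample values are hypothetical, and the alias table would need to be extended after inspecting the real column.

```python
from collections import Counter

# Hypothetical alias table mapping lower-cased spellings to one canonical name.
CITY_ALIASES = {
    "bengaluru": "Bangalore",
    "bangalore": "Bangalore",
    "delhi": "New Delhi",
    "gurugram": "Gurgaon",
}

def normalise_city(raw):
    """Keep the first city of 'A / B', map known aliases, return None if missing."""
    if raw is None or str(raw).strip().lower() in ("", "nan"):
        return None
    first = str(raw).split("/")[0].strip()
    return CITY_ALIASES.get(first.lower(), first)

# Illustrative values, not rows from the dataset.
cities = ["Bangalore", "Bengaluru", "bangalore", "Delhi", "Pune / US", "nan"]
counts = Counter(c for c in map(normalise_city, cities) if c is not None)
print(counts.most_common())
```

With pandas, the same function could be applied as data_temp['City'].apply(normalise_city) followed by value_counts(), which ignores missing values by default.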

Calculating percentage of funding each city has got!


In [16]: data_temp.City = data_temp.City.apply(separateCity)

data_temp.City.replace('Delhi','New Delhi' , inplace = True)

data_temp.City.replace('bangalore' , 'Bangalore' , inplace = True)

# Removing ',' in Amount column and converting it to float
data_temp.AmountInUSD = data_temp.AmountInUSD.apply(lambda x : float(str(x).replace(",","")))


data_temp.AmountInUSD = pd.to_numeric(data_temp.AmountInUSD)

# Calculating citywise amount of funding received.

city_amount = data_temp.groupby('City')['AmountInUSD'].sum().sort_values(ascending = False)[0:10]
city = city_amount.index

amountCity = city_amount.values

## calculating percentage of the funding each city has received .

perAmount = np.true_divide(amountCity , amountCity.sum())*100

for i in range(len(city)):
    print(city[i] , format(perAmount[i], '.2f'),'%')

plt.bar(city, perAmount, color = sns.color_palette("flare"))

Bangalore 31.10 %

Bengaluru 23.45 %

Mumbai 13.51 %


Gurgaon 9.52 %

New Delhi 9.18 %

Noida 3.50 %

nan 3.46 %

Gurugram 2.36 %

Chennai 1.96 %

Pune 1.95 %

Out[16]: <BarContainer object of 10 artists>
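
The ten shares printed above add up to roughly 100%, which suggests they are computed relative to the top ten cities rather than to all funding. The calculation can be sketched in plain Python as follows. The function funding_share and the sample rows are hypothetical, and skipping negative amounts is an assumption made here to keep the -999 placeholder out of the totals.

```python
from collections import defaultdict

def funding_share(rows, top_n=10):
    """Return [(city, percent)] for the top_n cities, relative to those cities only."""
    totals = defaultdict(float)
    for city, amount in rows:
        # Skip missing amounts and negative placeholders such as -999.
        if amount is not None and amount >= 0:
            totals[city] += amount
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    grand_total = sum(amount for _, amount in top)
    if grand_total == 0:
        return []
    return [(city, round(100 * amount / grand_total, 2)) for city, amount in top]

# Illustrative rows of (city, amount in USD), not taken from the dataset.
rows = [("Bangalore", 600.0), ("Mumbai", 300.0), ("Pune", 100.0), ("Mumbai", -999.0)]
shares = funding_share(rows)
print(shares)
```

Merging alternative spellings of a city before this step would combine entries such as Bangalore and Bengaluru, or Gurgaon and Gurugram, into a single bar.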
