
Tba Record Final

Uploaded by

shruthi
Copyright
© © All Rights Reserved

PBA2345 - Tools for Business Analytics

Submitted in partial fulfillment of the

requirement for the award of the degree of

MASTER OF BUSINESS ADMINISTRATION

Name of the Student:

Register No :

SSN School of Management


SSN College of Engineering, Kalavakkam – 603 110
An Autonomous Institution, Affiliated to Anna University, Chennai
BONAFIDE CERTIFICATE

This is to certify that this is a bonafide record of work done by __________ (Reg. No. __________), full-time student of SSN School of Management, Kalavakkam, towards submission for the End Semester Practical Examination held on __________.

Place: Kalavakkam
Date:

Examiner I Examiner II
CONTENTS
EX NO. Particulars

UNIT 1 – INTRODUCTION TO TOOLS FOR BUSINESS ANALYTICS

1.1 Introduction to types of variables, Functions

1.2 Arithmetic Operations, Types of data structures

1.3 Conditional statements, Loops

UNIT – 2 BASIC PYTHON LIBRARIES

2.1 Introduction to NumPy and Pandas, Series & DataFrame

2.3 Reindexing, Indexing, Selection, Filtering, Sorting

2.4 Unique Values, Value Counts

UNIT – 3 PYTHON FOR DATA PREPARATION

3.1 Handling Missing Data, Replacing Values

3.2 Removing Duplicates, Outlier Treatment

3.3 Scaling, Encoding

UNIT – 4 PYTHON FOR VISUALIZATION

4.1 Descriptive statistics, Introduction to Matplotlib, Plotting Functions, Plotting with seaborn, Box Plot, Histogram, Count Plot, Pie Chart, Violin Plot, Line Plot, Scatter Plot, Facet Grids, Heat Map, Pair Plot

UNIT – 5 PYTHON FOR MODEL BUILDING

5.1 Introduction to SciPy, scikit-learn, Clustering, statsmodels, Linear Regression, Logistic regression, Model Performance Measures

6 LAB EXERCISES


UNIT 1 – INTRODUCTION TO TOOLS FOR BUSINESS ANALYTICS

Aim: To understand the types of variables, functions, arithmetic operations, types of data structures, conditional statements and loops.

UNIT – 2 BASIC PYTHON LIBRARIES

Aim: To get introduced to NumPy and Pandas, Series & DataFrame, Reindexing, Indexing, Selection, Filtering, Sorting, Unique Values and Value Counts.

UNIT – 3 PYTHON FOR DATA PREPARATION

Aim: To learn about Handling Missing Data, Replacing Values, Removing Duplicates, Outlier Treatment, Scaling and Encoding of categorical data.

UNIT – 4 PYTHON FOR VISUALIZATION

Aim: To get introduced to Descriptive statistics, Matplotlib, Plotting Functions, Plotting with seaborn,
Box Plot, Histogram, Count Plot, Pie Chart, Violin Plot, Line Plot, Scatter Plot, Facet Grids, Heat Map,
Pair Plot.

UNIT – 5 PYTHON FOR MODEL BUILDING

Aim: To get introduced to SciPy, Clustering, statsmodels, Linear Regression, scikit-learn, Logistic regression and Model Performance Measures.

LINEAR REGRESSION MODEL:

MULTIPLE LINEAR REGRESSION:

Problem Statement
Airbnb Inc is an online marketplace for arranging or offering lodging, primarily homestays, or tourism experiences. Airbnb has close to 150 million customers across the world. Price is the most important factor customers consider when booking a property. Strategic pricing of the properties is important to avoid losing customers to competitors.

We have data on 74111 Airbnb properties across nations. Based on this data, build simple and multiple linear regression models to predict the strategic price of a newly listed property on Airbnb.
Conclusion

The final Linear Regression equation is

log_price = b0 + b1 * instant_bookable[T.True] + b2 * accommodates + b3 * bathrooms + b4 * review_scores_rating + b5 * bedrooms + b6 * beds + b7 * room_type_private_room + b8 * room_type_shared_room + b9 * cancellation_policy_moderate + b10 * cancellation_policy_strict + b11 * cleaning_fee_True

log_price = 3.43 + (-0.07) * instant_bookable[T.True] + (0.1) * accommodates + (0.18) * bathrooms + (0.01) * review_scores_rating + (0.16) * bedrooms + (-0.05) * beds + (-0.61) * room_type_private_room + (-1.08) * room_type_shared_room + (-0.06) * cancellation_policy_moderate + (-0.01) * cancellation_policy_strict + (-0.08) * cleaning_fee_True

When accommodates increases by 1 unit, log_price increases by 0.1 units, keeping all other predictors constant. Similarly, when the number of bathrooms increases by 1 unit, log_price increases by 0.18 units, keeping all other predictors constant, and so on.

Some coefficients are negative. For instance, room_type_shared_room has a coefficient of -1.08. This implies that when the room type is a shared room, log_price is 1.08 units lower than for the baseline room type, keeping all other predictors constant.

Insights

1) The price of the property drops by a larger factor when the room is shared (coefficient -1.08) than when it is private (coefficient -0.61).

2) Going by the coefficients above, the price of the property drops slightly more under a Moderate Cancellation Policy (-0.06) than under a Strict Cancellation Policy (-0.01).

3) The more bedrooms or bathrooms a property has, the higher its price, though only by a little.

4) As the number of beds increases, the price of the property goes down a little.
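
As a rough illustration of how the fitted equation can be used, the sketch below plugs the rounded coefficients into Python and scores one listing. The listing values are hypothetical (not taken from the data set), and converting log_price back to a price with exp() assumes the target was built with the natural logarithm, which the record does not state.

```python
import math

# Rounded coefficients copied from the fitted equation above
intercept = 3.43
coefficients = {
    "instant_bookable[T.True]": -0.07,
    "accommodates": 0.10,
    "bathrooms": 0.18,
    "review_scores_rating": 0.01,
    "bedrooms": 0.16,
    "beds": -0.05,
    "room_type_private_room": -0.61,
    "room_type_shared_room": -1.08,
    "cancellation_policy_moderate": -0.06,
    "cancellation_policy_strict": -0.01,
    "cleaning_fee_True": -0.08,
}

# Hypothetical listing: dummy variables are 1 when the condition holds, else 0
listing = {
    "instant_bookable[T.True]": 1,
    "accommodates": 2,
    "bathrooms": 1,
    "review_scores_rating": 90,
    "bedrooms": 1,
    "beds": 1,
    "room_type_private_room": 1,
    "room_type_shared_room": 0,
    "cancellation_policy_moderate": 1,
    "cancellation_policy_strict": 0,
    "cleaning_fee_True": 1,
}

# Linear predictor: intercept plus coefficient * value for every predictor
log_price = intercept + sum(coefficients[name] * listing[name] for name in coefficients)

# Assumption: log_price is a natural log, so exp() recovers the price scale
price = math.exp(log_price)
print("Predicted log_price:", round(log_price, 2))
print("Predicted price:", round(price, 2))
```

Because the coefficients are rounded to two decimals, the result is only an approximation of what the fitted model itself would predict.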

K-MEANS CLUSTERING :

HIERARCHICAL CLUSTERING:

Strategy to group Engineering Colleges

You are an independent trainer who would like to pitch your Data Science training program to a set of engineering colleges. You have data on 26 colleges from a questionnaire survey. Each college has been given a score on 5 performance criteria: Teaching, Fees, Placement, Internship and Infrastructure. Ratings are on a standardized scale from 1 to 5, where 5 carries a higher weightage than 1. Segment the colleges into groups and come up with your pitch recommendations for each segment.
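
One possible way to approach this segmentation is sketched below with SciPy's agglomerative clustering. The ratings are randomly generated stand-ins for the survey data, and both the Ward linkage and the choice of three segments are assumptions made for illustration, not choices confirmed by the record.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

criteria = ["Teaching", "Fees", "Placement", "Internship", "Infrastructure"]

# Hypothetical stand-in data: 26 colleges rated 1 to 5 on each criterion
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(26, 5)).astype(float)

# Build the merge tree with Ward linkage (an assumed choice)
merge_tree = linkage(ratings, method="ward")

# Cut the tree into at most 3 segments (an assumed number of groups)
labels = fcluster(merge_tree, t=3, criterion="maxclust")

# Average rating per criterion in each segment, to guide the pitch
for cluster_id in np.unique(labels):
    members = ratings[labels == cluster_id]
    profile = {name: round(float(mean), 2)
               for name, mean in zip(criteria, members.mean(axis=0))}
    print("Segment", cluster_id, "| colleges:", len(members), "|", profile)
```

The per-segment averages are what a pitch would be built on: for example, a segment scoring low on Placement might be offered a placement-oriented training module.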

LAB EXERCISES

LAB EXERCISE 1

Aim:

A mobile store MobiWorld sells different mobile phones to customers. For each order that is placed, the
store keeps a record of various attributes related to the mobile, like Price, Brand, RAM (GB), and Internal
Storage (GB).

Let's learn Python using the context of the store's data.

First, we have to store our data in a variable so that the stored information can be retrieved later. Let's take an example of how we can do that in Python.

Output:

Q1. Suppose the store sold an Apple iPhone (4GB, 128GB) for $900. Store this information in the
variables price, brand, ram, and storage.

Q2. Let's say the store wants to save the information on the billing status of the above phone in a boolean
variable. Write the code in Python to implement the same

Q3. Check the data type of the variables price, brand, ram, and storage

Q4. Let's say a customer buys two Apple iPhones (4GB, 128GB) at a price of $900 each. What will be the
total bill that the customer has to pay?
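
A minimal sketch of how Q1 to Q4 could be answered is given below. The variable name is_billed is a hypothetical choice for the billing status, and storing RAM and storage as plain integers (in GB) is an assumption.

```python
# Q1: store the sale details in variables
price = 900        # in dollars
brand = 'Apple'
ram = 4            # in GB
storage = 128      # in GB

# Q2: billing status as a boolean (is_billed is a hypothetical name)
is_billed = True

# Q3: check the data type of each variable
for value in (price, brand, ram, storage, is_billed):
    print(value, type(value))

# Q4: total bill for two phones at the same price
quantity = 2
total_bill = quantity * price
print('Total bill: $', total_bill)
```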
Q5. Let's say the store provides a discount of $15 on the Apple iPhone (4GB, 128GB) that costs $900. What will be the price of the iPhone after the discount?

Q6. Let's say a customer buys two Apple iPhones (4GB, 128GB) and pays a total bill of $1800. Write the
Python code to find the price of an Apple iPhone.

Q7. Let's say a customer buys x Apple iPhones (4GB, 128GB) at $900 each. How many units can be purchased for $3600? Write the Python code to find the value of x.

Q8. Suppose the store plans to provide a 4.5% discount on the Apple iPhone. What will be the discounted
price of the mobile?
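
The arithmetic in Q5 to Q8 can be sketched as follows. The variable names are illustrative, and using floor division for Q7 assumes that only whole phones can be bought.

```python
price = 900  # price of one Apple iPhone in dollars

# Q5: flat discount of $15
price_after_discount = price - 15

# Q6: unit price when two phones cost $1800 in total
unit_price = 1800 / 2

# Q7: number of phones that $3600 buys (floor division keeps whole units)
x = 3600 // price

# Q8: 4.5% discount on the price
discounted_price = price - price * 4.5 / 100

print(price_after_discount, unit_price, x, discounted_price)
```

Note that the division operator / always returns a float in Python 3, which is why Q6 yields 900.0 rather than 900.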

Q9. print('The discounted price of the iPhone is ' + (discounted_price)) # What is the error?

Q10. To add (concatenate) the variables brand, ram, and storage, which type conversion can be used? Why not the other?
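
The sketch below shows one way to reason about Q9 and Q10, assuming discounted_price is the float 859.5 and that ram and storage are stored as integers. The + operator cannot join a string with a number, so the number has to be converted with str(). The reverse conversion does not work because a brand name such as 'Apple' is not a valid integer literal.

```python
brand, ram, storage = 'Apple', 4, 128
discounted_price = 859.5  # assumed value of the variable used in Q9

# Q9: str + float raises a TypeError
try:
    print('The discounted price of the iPhone is ' + discounted_price)
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Fix: convert the number to a string before concatenating
message = 'The discounted price of the iPhone is ' + str(discounted_price)
print(message)

# Q10: converting the integers to str lets all three be concatenated
combined = brand + ' ' + str(ram) + ' ' + str(storage)
print(combined)

# The other direction fails: 'Apple' cannot be converted to an integer
try:
    int(brand)
except ValueError:
    print('int() cannot convert', repr(brand))
```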

LAB EXERCISE 2

Aim: To practice with data structures and the operations on them

Output:

Q1. Create a variable to store the brand name.

brand='Apple'

Q2. Create a list 'brand_list' and store the brand names in it.

brand_list=['Apple','Samsung','LG','Apple']

Q3. Check the type of the variable 'brand_list'.

type(brand_list)
[out] list

Q4. Create lists for other attributes 'ram_list', 'storage_list' and 'price_list'.

ram_list=[4,12,8,8]
storage_list=[128,128,64,128]
price_list=[900,899,600,1000]

Q5. Print the created attributes.

print('BRAND LIST IS',brand_list)
print('RAM LIST IS',ram_list)
print('STORAGE LIST IS',storage_list)
print('PRICE LIST IS',price_list)

[out]BRAND LIST IS ['Apple', 'Samsung', 'LG', 'Apple']
RAM LIST IS [4, 12, 8, 8]
STORAGE LIST IS [128, 128, 64, 128]
PRICE LIST IS [900, 899, 600, 1000]

Q6. Find the number of elements in ram_list.

ram_count=len(ram_list)
ram_count
[out] 4

Q7. Find the minimum and maximum price among the mobile phones sold by the store. Print the output as 'The minimum price is $' followed by the value.

print('The minimum price is $',min(price_list))


print('The maximum price is $',max(price_list))
[out]The minimum price is $ 600
The maximum price is $ 1000

Q8. Print the third item in the list ram_list.

print('The third item in the list ram_list is',ram_list[2])


[out] The third item in the list ram_list is 8

Q9. Print the first three items from the list price_list.

print('The first three items from the list price_list are',price_list[0],',',price_list[1],',',price_list[2])

[out] The first three items from the list price_list are 900 , 899 , 600
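
As an alternative to indexing each element separately, list slicing returns the first three items in one expression. This is only a sketch of the slicing idiom, and the printed format differs from the output recorded above because the slice is itself a list.

```python
price_list = [900, 899, 600, 1000]

# price_list[:3] takes the items at positions 0, 1 and 2
first_three = price_list[:3]
print('The first three items from the list price_list are', first_three)
```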

Q10. Print the last item in the list brand_list.

print('The last item in the list brand_list is', brand_list[-1])


[out] The last item in the list brand_list is Apple

Q11. Print the last but one item

print('The last but one item in the list brand_list is', brand_list[-2])
[out] The last but one item in the list brand_list is LG

Q12. Remove the last element from the list brand_list. Check whether the last element has been removed.

brand_list.pop()
[out] 'Apple'
brand_list
[out] ['Apple', 'Samsung', 'LG']

Q13. Print the elements of brand_list

print('The elements of brand_list are', brand_list)


[out] The elements of brand_list are ['Apple', 'Samsung', 'LG']

Q14. Insert 'Motorola' to the list brand_list.

Check whether that has been added to brand_list.

brand_list.append('Motorola')
brand_list
[out] ['Apple', 'Samsung', 'LG', 'Motorola']

Q15. Replace LG in brand_list with LG_1

index=brand_list.index('LG')
brand_list[index]='LG_1'
brand_list
[out] ['Apple', 'Samsung', 'LG_1', 'Motorola']

Q16. Remove the LG_1 that is after Samsung and print the brand_list.

brand_list.pop(2)
[out] 'LG_1'
print(brand_list)
[out] ['Apple', 'Samsung', 'Motorola']

Q17. Store the storage specifications 32/64/128/256 as an immutable variable 'storage'

storage=(32,64,128,256)
storage
[out] (32, 64, 128, 256)

Q18. Display the type of variable/object 'storage'

type(storage)
[out] tuple

Q19. Print the second item in the 'storage'

print('The second item in the storage tuple is',storage[1])
[out] The second item in the storage tuple is 64

Q20. Store the details of a phone Brand- Apple, RAM( in GB) - 4GB, Storage (in GB)- 128GB, Price(in
$) : 800 in a single variable. Name the variable as 'attributes' Print the variable.

attributes={'Brand':'Apple' ,
'RAM':'4 GB' ,
'Storage':'128 GB' ,
'Price':'$800'}
attributes
[out] {'Brand': 'Apple', 'RAM': '4 GB', 'Storage': '128 GB', 'Price': '$800'}

Q21. Display the type of variable 'attributes'.

type(attributes)
[out] dict

Q22. Extract the price from the dictionary attributes.

attributes['Price']
[out] '$800'

Q23. Change the price in attributes to 900

attributes['Price']='$900'
attributes
[out] {'Brand': 'Apple', 'RAM': '4 GB', 'Storage': '128 GB', 'Price': '$900'}

Q24. Create a dictionary 'products' for storing the attributes of 4 different mobile phones.

products={'brand':['Apple','Samsung','LG','Apple'] ,
'ram':[4,12,8,8] ,
'storage':[128,128,64,128] ,
'price':[900,899,600,1000]}
products
[out] {'brand': ['Apple', 'Samsung', 'LG', 'Apple'],
'ram': [4, 12, 8, 8],
'storage': [128, 128, 64, 128],
'price': [900, 899, 600, 1000]}

Q25. Extract the keys and values from the dictionary products. Print it with respective statements.

keys=products.keys()
print('The keys from the dictionary products: ',keys)
values=products.values()
print('The values from the dictionary products: ',values)
[out] The keys from the dictionary products: dict_keys(['brand', 'ram', 'storage', 'price'])
The values from the dictionary products: dict_values([['Apple', 'Samsung', 'LG', 'Apple'], [4, 12, 8, 8],
[128, 128, 64, 128], [900, 899, 600, 1000]])

Q26. Print the '2nd' value in 'RAM' key in 'products' dictionary.

print('The second value in RAM key in products dictionary is',products['ram'][1])


[out] The second value in RAM key in products dictionary is 12

LAB EXERCISE 3

Aim:
To practice conditional statements (if, elif, else) in Python using the following practice questions.

Output:

Q1. Get a number as input, identify whether it is odd or even, and present the result. Use an if-else statement.

enter your number : 5


5 is odd number
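
A minimal sketch of the odd/even check is shown below. The helper name parity is hypothetical, and the number is hard-coded so that the snippet runs without keyboard input. In an interactive session it could instead be read with int(input('enter your number : ')).

```python
def parity(number):
    """Return 'even' or 'odd' using the remainder of division by 2."""
    if number % 2 == 0:
        return 'even'
    else:
        return 'odd'

number = 5  # stand-in for a value read with input()
print(number, 'is', parity(number), 'number')
```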

Q2. Explore the 'elif' clause. Get two numbers and the operation to be performed as input. Consider only the four operations addition, subtraction, multiplication and division. Print the output. If any other operation is given, display the output as invalid opr.
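
The if/elif/else chain for Q2 might look like the sketch below. The function name calculate and the use of the symbols +, -, * and / to select the operation are assumptions. The inputs are hard-coded instead of being read with input().

```python
def calculate(a, b, operation):
    """Apply one of the four basic operations, else report 'invalid opr'."""
    if operation == '+':
        return a + b
    elif operation == '-':
        return a - b
    elif operation == '*':
        return a * b
    elif operation == '/':
        return a / b
    else:
        return 'invalid opr'

print(calculate(6, 3, '+'))
print(calculate(6, 3, '/'))
print(calculate(6, 3, '%'))
```

This sketch does not guard against division by zero. A fuller version would check b before dividing.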

Q3. Suppose a customer is planning to buy a mobile phone but has a limited budget. Thus, his decision to buy depends on whether the price is within his budget. Let's say his budget is $600.

Write a code in Python that prints whether the customer can buy the iPhone or not based on his budget.

Hint: The price is fixed. Get the budget as input and use an if-else statement.

Enter your budget(in dollars): 800
Congrats! You can buy the Iphone.
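
The budget check can be sketched as below. The record does not state the fixed price, so $800 is an assumption chosen to be consistent with the run above (a budget of 800 succeeds). The function name can_buy and the wording of the refusal message are hypothetical.

```python
PRICE = 800  # assumed fixed price in dollars (not stated in the record)

def can_buy(budget, price=PRICE):
    """Compare the budget with the fixed price and return the message."""
    if budget >= price:
        return 'Congrats! You can buy the Iphone.'
    else:
        return 'Sorry! You cannot afford the Iphone.'  # hypothetical wording

budget = 800  # stand-in for int(input('Enter your budget(in dollars): '))
print(can_buy(budget))
print(can_buy(600))
```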

LAB EXERCISE 4

Aim:
To practice with the NumPy package by working with dictionaries, arrays, matrices, etc. in Python using the given questions.

Output:

Q1. Create a dictionary of 10 random values and demonstrate its type with required function

Q2. Create an array of values 45.2,50.3,-2

Q3. Create an array by joining the strings a, n, t

Q4. Create a one-dimensional horizontal vector with 5 elements
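
One possible reading of Q1 to Q4 is sketched below. The dictionary keys, the seed, and the vector values are illustrative assumptions. Q3 is interpreted here as building an array whose elements are the three letters.

```python
import random
import numpy as np

# Q1: dictionary of 10 random values (keys 0..9 are an arbitrary choice)
random.seed(1)
random_dict = {key: random.random() for key in range(10)}
print(type(random_dict))

# Q2: array of the given values (integers are upcast to float)
values = np.array([45.2, 50.3, -2])
print(values)

# Q3: array made from the letters a, n, t
letters = np.array(list('ant'))
print(letters)

# Q4: one-dimensional (horizontal) vector with 5 elements
row_vector = np.array([1, 2, 3, 4, 5])
print(row_vector, row_vector.shape)
```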

Q5. Create a one-dimensional vertical vector with 5 elements

Q6. Create a 5X3 matrix, with each row holding multiples of 1 to 5 respectively

Q7. Print the transpose of the matrix created above
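
A sketch for Q5 to Q7 follows. Representing the vertical vector as a 5x1 array and building the matrix with np.outer are choices made for illustration. The record's own approach is not visible.

```python
import numpy as np

# Q5: vertical vector, stored as a 5x1 column
column_vector = np.arange(1, 6).reshape(5, 1)
print(column_vector)

# Q6: 5x3 matrix where row i holds the first three multiples of i
matrix = np.outer(np.arange(1, 6), np.arange(1, 4))
print(matrix)

# Q7: transpose swaps rows and columns, giving a 3x5 matrix
transposed = matrix.T
print(transposed)
```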

Q8. Find the type of the matrix created in Q6

Q9. Create an array of the first eight multiples of 5.

Q10. Create a 4X2 matrix from the array created in Q9

Q11. Create a null matrix of 2X3

Q12. Create a unit matrix of 3X2

Q13. Create an identity matrix of 4X4
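
Q8 to Q13 can be sketched with standard NumPy constructors as below. The term "unit matrix" is read here as a matrix filled with ones, which is an assumption, since a 3X2 matrix cannot be an identity matrix.

```python
import numpy as np

# Q8: type of the 5x3 matrix from Q6 (rebuilt here so the snippet is self-contained)
matrix = np.outer(np.arange(1, 6), np.arange(1, 4))
print(type(matrix))

# Q9: first eight multiples of 5 (5, 10, ..., 40)
multiples_of_5 = np.arange(5, 45, 5)

# Q10: reshape the eight values into 4 rows and 2 columns
matrix_4x2 = multiples_of_5.reshape(4, 2)

# Q11: null matrix (all zeros) of shape 2x3
null_matrix = np.zeros((2, 3))

# Q12: "unit" matrix read as all ones, shape 3x2
unit_matrix = np.ones((3, 2))

# Q13: 4x4 identity matrix
identity_matrix = np.eye(4)

print(multiples_of_5, matrix_4x2, null_matrix, unit_matrix, identity_matrix, sep='\n')
```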

Q14. Find how many unique values are there in a vector created with a,e,i,o,u,a,b,e
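
For Q14, np.unique returns the sorted distinct elements of an array, so its length gives the count. A minimal sketch:

```python
import numpy as np

letters = np.array(['a', 'e', 'i', 'o', 'u', 'a', 'b', 'e'])

# np.unique drops the repeated 'a' and 'e' and sorts the rest
unique_letters = np.unique(letters)
print(unique_letters)
print('Number of unique values:', len(unique_letters))
```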

LAB EXERCISE 5

Q1. Create a series 'sports1' using pandas with values 1, 2, 3, 4 and index Cricket, Football, Basketball and Golf.

Q2. Display the series 'sports1'

Q3. Create another series 'sports2' with Cricket, Football, Baseball and Golf as index and 1, 2, 5, 4 as values

Q4. Display the series 'sports2'

Q5. Find the value of the index Cricket in sports1

Q6. Find the value of the index Baseball in sports2
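
Q1 to Q6 could be approached as in the sketch below. It assumes the index labels are capitalized consistently in both series (Football, Basketball), since pandas treats labels that differ only in case as different.

```python
import pandas as pd

# Q1 and Q2: first series with sport names as the index
sports1 = pd.Series([1, 2, 3, 4],
                    index=['Cricket', 'Football', 'Basketball', 'Golf'])
print(sports1)

# Q3 and Q4: second series, with Baseball in place of Basketball
sports2 = pd.Series([1, 2, 5, 4],
                    index=['Cricket', 'Football', 'Baseball', 'Golf'])
print(sports2)

# Q5 and Q6: look up values by index label
cricket_value = sports1['Cricket']
baseball_value = sports2['Baseball']
print(cricket_value, baseball_value)
```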

Q7. Find the sum of sports1 and sports2 and store it as 'sports' and display it

Q8. Find whether there are any null values in sports
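
A sketch for Q7 and Q8 follows, again assuming consistently capitalized labels. Adding two series aligns them on their index labels, so a label present in only one series (Basketball, Baseball) produces NaN in the result.

```python
import pandas as pd

sports1 = pd.Series([1, 2, 3, 4],
                    index=['Cricket', 'Football', 'Basketball', 'Golf'])
sports2 = pd.Series([1, 2, 5, 4],
                    index=['Cricket', 'Football', 'Baseball', 'Golf'])

# Q7: label-aligned sum; unmatched labels become NaN
sports = sports1 + sports2
print(sports)

# Q8: True for each null entry, then check whether any entry is null
print(sports.isnull())
has_nulls = bool(sports.isnull().any())
print('Any null values:', has_nulls)
```

If missing labels should count as zero instead, sports1.add(sports2, fill_value=0) is one option.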

Q9. Create a data frame of 10 rows and 5 columns, with the letters A to J as row index and Score1 to Score5 as column index. For the values, create random numbers between 0 and 1. The output you produce must be reproducible. Hint: Explore how to fix the random seed.

import numpy as np
import pandas as pd

# Fix the seed so the same random numbers are generated on every run
np.random.seed(42)
df = pd.DataFrame(np.random.uniform(0,1,size=(10,5)),
columns=['Score1','Score2','Score3','Score4','Score5'],index=['A','B','C','D','E','F','G','H','I','J'])

df

Q10. Reset the row index from A to J to the default index that is 0,1…

Q11. Check whether the row index of 'dataframe' is 0,1,2…

Q12. If the answer to question 11 is No, make the index change permanent in the dataframe
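
Q10 to Q12 hinge on the fact that reset_index returns a new DataFrame and leaves the original untouched unless the result is assigned back. The sketch below rebuilds the Q9 frame so it is self-contained. Passing drop=True, which discards the old A to J labels instead of keeping them as a column, is an assumption about the intended result.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.uniform(0, 1, size=(10, 5)),
                  columns=['Score1', 'Score2', 'Score3', 'Score4', 'Score5'],
                  index=list('ABCDEFGHIJ'))

# Q10: returns a copy with the default 0, 1, 2, ... index
preview = df.reset_index(drop=True)
print(preview.head())

# Q11: the original frame still carries the letter index
still_lettered = list(df.index) == list('ABCDEFGHIJ')
print('Original index unchanged:', still_lettered)

# Q12: assign the result back to make the change permanent
df = df.reset_index(drop=True)
is_default = list(df.index) == list(range(10))
print('Index is now 0..9:', is_default)
```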

Q13. Create a list as 'new' with cnt1, cnt2, cnt3 to cnt10 as elements.

Q14. Add the list 'new' as last column, index the column as 'Countries' in the dataframe.
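
For Q13 and Q14, a list comprehension can generate the labels, and assigning to a new column name appends it as the last column. The sketch rebuilds a frame like the one in Q9, with the default index, so that it runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.uniform(0, 1, size=(10, 5)),
                  columns=['Score1', 'Score2', 'Score3', 'Score4', 'Score5'])

# Q13: list with the elements cnt1, cnt2, ..., cnt10
new = ['cnt' + str(i) for i in range(1, 11)]
print(new)

# Q14: a new column name in an assignment is appended at the end
df['Countries'] = new
print(df.head())
```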

LAB EXERCISE 6 & 7

Dataframe functions

Context: Every year since 2008, Forbes Magazine has published a list of America's best colleges. When it comes to the question everyone seems to be asking, “Is college worth it?”, this published list of colleges comes in handy for making a decision based on a student's requirements or desires. The mission of the college ranking by Forbes Magazine is to conduct an annual review of the undergraduate institutions that deliver the top academics, best experiences, career success and lowest debt. Whether a school is in the Top 10 or near the bottom of the list, the 650 colleges are the best in the country.

For most families, choosing a four-year college is one of the biggest and most expensive decisions they can make. For students, this time of their life may lay out their future plans. So choose carefully.

Data set: 'ForbesAmericasTopColleges2019.csv'

About Data: The data set contains the rankings of 650 United States colleges along with various other statistics pertaining to each school.

* Load required packages
* Load the data into pandas dataframe and view the details
* Dimension of the data
* Datatype of the fields in the data
* Check the missing values in the data
* Data Cleanup
* Reset the index of the dataframe
* Check the first 5 rows of the dataset
* Display the last five rows of the dataset
* Recheck the dimension and info of the data
* Data Analysis: Get Summary of Data
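
The steps listed above can be sketched with pandas as follows. The small frame and its column names are hypothetical stand-ins, because the contents of 'ForbesAmericasTopColleges2019.csv' are not shown in this record. With the real file, the frame would instead come from pd.read_csv. Treating "Data Cleanup" as dropping rows with missing values is also an assumption.

```python
import numpy as np
import pandas as pd

def explore(df):
    """Walk through the listed steps and return the cleaned frame."""
    print(df.shape)           # dimension of the data
    print(df.dtypes)          # datatype of each field
    print(df.isnull().sum())  # missing values per column

    # Assumed cleanup: drop incomplete rows, then reset the index
    cleaned = df.dropna().reset_index(drop=True)

    print(cleaned.head())     # first 5 rows
    print(cleaned.tail())     # last 5 rows
    print(cleaned.shape)      # recheck the dimension
    cleaned.info()            # recheck the info
    print(cleaned.describe()) # summary statistics of numeric columns
    return cleaned

# Hypothetical stand-in for pd.read_csv('ForbesAmericasTopColleges2019.csv')
sample = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Name': ['College A', 'College B', 'College C'],
    'Total Annual Cost': [70000.0, np.nan, 65000.0],
})
cleaned = explore(sample)
```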
