
SUMMER TRAINING REPORT

On
AI/ML using Python

Submitted in partial fulfilment of requirements for the award of the


Degree of
Bachelor of Technology
In
IT

Submitted By

RIDDHI BANSAL
20111503122

Under the guidance of

Mr. Achin Jain


Mr. A K Dubey

IT Department
Bharati Vidyapeeth’s College of Engineering, New Delhi – 110063, INDIA
TABLE OF CONTENTS

Chapter 1- Introduction to Python
    Python Basics
    Assignments and Practice Problems
Chapter 2- Machine Learning
    Machine Learning Basics
    Assignments and Practice Problems
Chapter 3- Deep Learning
    Deep Learning Basics
    Assignments and Practice Problems
Chapter 4- Project Work
Chapter 5- Conclusion
Chapter 6- References
Chapter 1- Introduction to Python

Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.
It is used for:

• web development (server-side),


• software development,
• mathematics,
• system scripting.

Some key features of Python are:

• Python was designed for readability, and has some similarities to the English language with influence from
mathematics.
• Python uses new lines to complete a command, as opposed to other programming languages which often use
semicolons or parentheses.
• Python relies on indentation (whitespace) to define scope, such as the scope of loops, functions and
classes. Other programming languages often use curly brackets for this purpose.

Variables

Variables are containers for storing data values. In Python, variables are created when you assign a value to them.
They are case sensitive.

For example: x=5

y = "Hello, World!"

You can get the data type of a variable with the type() function.

For example: print(type(x))


A variable can have a short name (like x and y) or a more descriptive name (age, carName, total_volume). Rules for
Python variables:

• A variable name must start with a letter or the underscore character
• A variable name cannot start with a number
• A variable name can only contain alpha-numeric characters and underscores (A-Z, a-z, 0-9, and _)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
• A variable name cannot be any of the Python keywords.
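
For example, a short snippet illustrating these rules:

myvar = 10        # valid name
_my_var = 10      # valid: a name may start with an underscore
myVar2 = 10       # valid: letters, digits and underscores
age = 25
Age = 30          # a different variable: names are case-sensitive
# 2myvar = 10    would be invalid: a name cannot start with a number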

Comments

Python has commenting capability for the purpose of in-code documentation.

Comments start with a #, and Python will render the rest of the line as a comment.

For example: # this is a comment!!

Built-in Data Types


In programming, data type is an important concept.

Variables can store data of different types, and different types can do different things.

Python has the following data types built-in by default, in these categories:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType
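
For example, the type() function reports the type of any value:

x = "Hello"               # str
y = 20                    # int
z = 20.5                  # float
l = ["apple", "banana"]   # list
d = {"name": "Riddhi"}    # dict
print(type(x))            # <class 'str'>
print(type(l))            # <class 'list'>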

Python Numbers

There are three numeric types in Python:

Int: a whole number, positive or negative, without decimals, of unlimited length.


Float: a number, positive or negative, containing one or more decimals.

Complex: Complex numbers are written with a "j" as the imaginary part.
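
For example:

x = 10        # int
y = 2.8       # float
z = 3 + 5j    # complex, "j" is the imaginary part
print(type(x), type(y), type(z))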

Strings

Strings in Python are surrounded by either single quotation marks or double quotation marks.

'hello' is the same as "hello".

You can display a string literal with the print() function. Strings in Python are arrays of bytes representing
Unicode characters. However, Python does not have a character data type; a single character is simply a
string with a length of 1. Square brackets can be used to access elements of the string.
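
For example, a short snippet showing string indexing and slicing:

a = "Hello, World!"
print(a[0])      # 'H' - square brackets access a single character
print(a[7:12])   # 'World' - slicing returns a substring
print(len(a))    # 13 - the length of the string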

Boolean Values

In programming you often need to know if an expression is True or False.


You can evaluate any expression in Python, and get one of two answers, True or False.

When you compare two values, the expression is evaluated and Python returns the Boolean answer.
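
For example:

print(10 > 9)        # True
print(10 == 9)       # False
print(bool("abc"))   # True - most non-empty values evaluate to True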

Python Operators

Operators are used to perform operations on variables and values.

Python divides the operators in the following groups:

• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Identity operators
• Membership operators
• Bitwise operators
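
A small snippet illustrating one operator from each group:

x = 7
y = 3
print(x + y)             # 10 (arithmetic)
x += 1                   # assignment, x is now 8
print(x > y)             # True (comparison)
print(x > 0 and y > 0)   # True (logical)
print(x is y)            # False (identity: different objects)
print(3 in [1, 2, 3])    # True (membership)
print(x & y)             # 0 (bitwise AND of 8 and 3)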

Python Collections (Arrays)

There are four collection data types in the Python programming language:

• List is a collection which is ordered and changeable. Allows duplicate members.


• Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
• Set is a collection which is unordered, unchangeable*, and unindexed. No duplicate members.
• Dictionary is a collection which is ordered** and changeable. No duplicate members.

*Set items are unchangeable, but you can remove items and add new items.
**As of Python 3.7, dictionaries are ordered; in Python 3.6 and earlier, dictionaries are unordered.
List

Lists are used to store multiple items in a single variable.

Lists are one of 4 built-in data types in Python used to store collections of data, the other 3 are Tuple, Set,
and Dictionary, all with different qualities and usage.

Lists are created using square brackets. List items can be of any data type.
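
For example:

mylist = ["apple", "banana", "cherry", 5, True]
print(mylist[0])          # items are indexed, starting at 0
mylist.append("mango")    # lists are changeable
print(mylist)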

Tuple

Tuples are used to store multiple items in a single variable.

Tuple is one of 4 built-in data types in Python used to store collections of data, the other 3 are List, Set, and
Dictionary, all with different qualities and usage.

A tuple is a collection which is ordered and unchangeable.

Tuples are written with round brackets.
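
For example:

mytuple = ("apple", "banana", "cherry")
print(mytuple[1])    # 'banana' - items are ordered and indexed
# mytuple[1] = "kiwi" would raise a TypeError, since tuples are unchangeable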

Set
Sets are used to store multiple items in a single variable.

A set is a collection which is unordered, unchangeable*, and unindexed.
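
For example:

myset = {"apple", "banana", "cherry", "apple"}
print(myset)         # duplicates are removed; order is not guaranteed
myset.add("mango")   # new items can be added, though existing items cannot be changed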

Dictionary

Dictionaries are used to store data values in key:value pairs.

A dictionary is a collection which is ordered*, changeable, and does not allow duplicate keys.

Dictionary items are presented in key:value pairs, and can be referred to by using the key name.
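
For example:

car = {"brand": "Ford", "model": "Mustang", "year": 1964}
print(car["model"])   # 'Mustang' - items are referred to by key
car["year"] = 2020    # values are changeable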

NumPy

NumPy is a Python library for scientific computing. NumPy stands for Numerical Python. It provides a
high-performance multidimensional array object, along with a suite of functions for working with arrays.
NumPy is the foundation of many other Python libraries for scientific computing, including Pandas, SciPy,
and Matplotlib.
Here are some of the key features of NumPy:

• Multidimensional arrays: NumPy arrays are multidimensional, meaning that they can have more than
one dimension. This makes them ideal for storing and manipulating large amounts of data.
• Fast numerical operations: NumPy arrays are designed to be very efficient for numerical operations.
This makes them ideal for scientific computing applications.
• Powerful functions: NumPy provides a wide range of functions for working with arrays. These
functions can be used to perform a variety of operations, such as mathematical operations, statistical
operations, and linear algebra operations.

NumPy is a powerful library that can be used for a wide variety of scientific computing tasks. It is a must-have
library for any Python developer who does any kind of numerical computing.

Here are some of the advantages of using NumPy:

• Speed: NumPy arrays are much faster than Python lists for numerical operations.
• Power: NumPy provides a wide range of functions for working with arrays.
• Flexibility: NumPy arrays can be used for a variety of tasks, not just scientific computing.
• Portability: NumPy is a well-maintained library that is compatible with many different platforms.

If you are doing any kind of numerical computing in Python, then you should definitely use NumPy. It is a
powerful library that will make your code faster, more powerful, and more flexible.
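
For example, a minimal snippet showing a NumPy array and a few of these operations:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2-dimensional array
print(a.shape)                         # (2, 3)
print(a * 2)                           # fast element-wise arithmetic
print(a.mean())                        # built-in statistical function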

Pandas

Pandas is a Python library used for data manipulation and analysis. It is one of the most popular Python
libraries for data science, and it is known for its powerful data structures and its easy-to-use data analysis
tools.
Pandas is built on top of the NumPy library, which provides a high-performance multidimensional array
object. Pandas extends NumPy by adding data structures and operations that make it more suitable for data
analysis.

The two main data structures in Pandas are the DataFrame and the Series. A DataFrame is a tabular data
structure that can store data of any type (integer, float, string, etc.). A Series is a one-dimensional data
structure that can store data of a single type.

Pandas also provides a variety of data analysis tools, such as:

• Data cleaning and wrangling


• Data aggregation and summarization
• Data visualization
• Statistical analysis
• Machine learning

Why use Pandas?

There are many reasons to use Pandas for data analysis. Here are a few of the most important reasons:
• Pandas is easy to use. The syntax is relatively simple, and there are many tutorials and documentation
available online.
• Pandas is powerful. It can handle large datasets with ease, and it provides a wide range of data
analysis tools.
• Pandas is flexible. It can be used for a variety of data analysis tasks, from simple data cleaning to
complex machine learning.
• Pandas is popular. There is a large community of Pandas users and developers, which means that
there are many resources available to help you learn and use Pandas.
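
For example, a minimal sketch of the two data structures (the sample values are illustrative):

import pandas as pd

s = pd.Series([10, 20, 30])                 # one-dimensional, single type
df = pd.DataFrame({"name": ["Asha", "Ravi"],
                   "marks": [78, 92]})      # tabular, mixed column types
print(df["marks"].mean())                   # 85.0 - simple aggregation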

Matplotlib

Matplotlib is a Python library for creating static, animated, and interactive visualizations. It is a popular
choice for data visualization in Python, and is used by a wide range of industries, including scientific
research, engineering, finance, and healthcare.

Matplotlib is a versatile library that can be used to create a variety of different plots, including line plots,
bar charts, histograms, scatter plots, and pie charts. It also supports a variety of customization options, so
you can tailor your plots to your specific needs.

Matplotlib is divided into two main parts: the matplotlib.pyplot module and the matplotlib.artist module.
The matplotlib.pyplot module is a collection of functions that make it easy to create plots. The
matplotlib.artist module provides the underlying objects that are used to create plots.

To use Matplotlib, you first need to import the matplotlib.pyplot module. You can then use the functions in
this module to create your plots.
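
For example, a minimal line plot:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)           # create a line plot
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()               # display the figure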
Day 1 :- Assignment On Python Basics, Types, Expression, Variables, String
Operations

Q1: Find average Marks Write a program to input marks of three tests of a student (all integers). Then
calculate and print the average of all test marks.
a=float(input("enter the marks of first subject:"))
b=float(input("enter the marks of second subject:"))
c=float(input("enter the marks of third subject:"))
avg=(a+b+c)/3
print("average of all test marks=",avg)

enter the marks of first subject:78


enter the marks of second subject:81
enter the marks of third subject:92
average of all test marks= 83.66666666666667

Q2: Find X raised to power N. You are given two numbers, 'x' (a float) and 'n' (an integer).
Your task is to calculate 'x' raised to the power 'n', and return it.
x = float(input("enter number:"))
n = int(input("enter power:"))
a = x ** n
print("power of number=", a)

enter number:8
enter power:2
power of number= 64.0
Q3: Check Palindrome Given a string, determine if it is a palindrome, considering only alphanumeric
characters.
def isPalindrome(s):
    return s == s[::-1]

s = input("enter a string: ")
if isPalindrome(s):
    print("true")
else:
    print("false")

enter a string: language


false

Q4: Consider the string str="Global Warming". Write statements in Python to implement the following:
(a) To display the last four characters.
(b) To display the substring starting from index 4 and ending at index 8.
(c) To check whether the string has alphanumeric characters or not.
(d) To trim the last four characters from the string.
(e) To trim the first four characters from the string.
(f) To display the starting index for the substring "Wa".
(g) To change the case of the given string.
(h) To check if the string is in title case.
(i) To replace all the occurrences of letter "a" in the string with "*".
str="Global Warming"
print(str[-1:-5:-1])
print("substring=",str[4:8])
print(str.isalnum())
print(str.rstrip('ming'))
print(str.lstrip('Glob'))
print(str.index("Wa"))
print(str.swapcase())
print(str.istitle())
print(str.replace("a","*"))

gnim
substring= al W
False
Global War
al Warming
7
gLOBAL wARMING
True
Glob*l W*rming

Day 1 Practice Questions

Write a program which prompts the user for a Celsius temperature, converts the temperature to Fahrenheit,
and prints out the converted temperature.
temp=float(input("Enter temperature in Celsius:"))
tempF=(temp*(9/5))+32
print("temperature in celsius=",temp)
print("temperature in fahrenheit=",tempF)

Enter temperature in Celsius:45


temperature in celsius= 45.0
temperature in fahrenheit= 113.0
Assume that we execute the following assignment statements: width = 17, height = 12.0. For each of the following
expressions, write the value of the expression and the type (of the value of the expression).
1. width//2   2. width/2.0   3. height/3   4. 1+2*5
Use the Python interpreter to check your answers.
width=17
height=12.0
print("width//2=",width//2)
print("width/2.0=",width/2.0)
print("height/3=",height/3)
print("1+2*5=",1+2*5)

width//2= 8
width/2.0= 8.5
height/3= 4.0
1+2*5= 11

Write a program that uses input to prompt a user for their name and then welcomes them

a=input("eneter your name:")


print("Hello ",a)

eneter your name:Riddhi


Hello Riddhi

Write a program to prompt the user for hours and rate per hour to compute gross pay

hours=int(input("Enter Hours:"))
rate=float(input("Enter Rate:"))
pay=hours*rate
print("Pay=",pay)

Enter Hours:5
Enter Rate:2.3
Pay= 11.5

Day 2 :- Assignment On Functions, Conditions, Loops

Q1: Check number Given an integer n, find if n is positive, negative or 0. If n is positive, print "Positive" If
n is negative, print "Negative" And if n is equal to 0, print "Zero".
n=int(input("Enter a number:"))
if n > 0:
    print("Positive")
elif n < 0:
    print("Negative")
else:
    print("Zero")

Enter a number:34
Positive

Q2: Sum of n numbers Given an integer n, find and print the sum of numbers from 1 to n.
n=int(input("enter a number"))
sum=0
for i in range(1,n+1):
sum=sum+i
print(sum)

enter a number5
15

Q3: Sum of Even Numbers Given a number N, print sum of all even numbers from 1 to N.
n=int(input("enter a number:"))
sum=0
for i in range(2,n+1,2):
sum=sum+i
print(sum)

enter a number:10
30

Q4: Reverse of a number Write a program to generate the reverse of a given number N. Print the
corresponding reverse number. Note : If a number has trailing zeros, then its reverse will not include them.
For e.g., reverse of 10400 will be 401 instead of 00401.
n=int(input("enter a number:"))
rev=0
while(n>0):
last=n%10
rev=last+rev*10
n=n//10
print(rev)

enter a number:1000
1

Q5: Nth Fibonacci Number Provided 'n' you have to find out the n-th Fibonacci Number. Example: Input: 6
Output: 8
n=int(input("enter a number:"))
a=0
b=1
c=0
for i in range(2,n+1):
c=a+b
a=b
b=c
print("required term is",c)

enter a number:7
required term is 13

Q6: Fibonacci Member Given a number N, figure out if it is a member of fibonacci series or not. Return true
if the number is member of fibonacci series else false.
N = int(input("Enter the number you want to check: "))
f3 = 0
f1 = 1
f2 = 1
if (N == 0 or N == 1):
print("Given number is fibonacci number")

else:
while f3 < N:
f3 = f1 + f2
f2 = f1
f1 = f3
if f3 == N:
print("Given number is fibonacci number")
else:
print("No it’s not a fibonacci number")

Enter the number you want to check: 21


Given number is fibonacci number

Q7: Write a Python function to find the maximum of three numbers.


n1=int(input("enter the first number : "))
n2=int(input("enter the second number : "))
n3=int(input("enter the third number : "))
if(n1>n2 and n1>n3):
print(n1," is the largest number")
elif(n2>n1 and n2>n3):
print(n2," is the largest number")
else:
print(n3," is the largest number")

enter the first number : 56


enter the second number : 789
enter the third number : 23
789 is the largest number

Q8: Write a Program to calculate simple interest using function interest() that received principal amount,
time and rate and returns calculated simple interest.
def simple_interest(p, t, r):
    si = (p * r * t) / 100
    print(si)
simple_interest(1000, 2, 5)

100.0

Q9: WAP to accept three integers and print the largest of the three
n1=int(input("enter the first number : "))
n2=int(input("enter the second number : "))
n3=int(input("enter the third number : "))
if(n1>n2 and n1>n3):
print(n1," is the largest number")
elif(n2>n1 and n2>n3):
print(n2," is the largest number")
else:
print(n3," is the largest number")

enter the first number : 12


enter the second number : 45
enter the third number : 32
45 is the largest number

Q10: WAP that inputs three numbers and print sum of non-duplicate numbers. Duplicate numbers are
ignored
def nonduplicate_sum(a, b, c):
    if a != b and b != c and a != c:
        return a + b + c
    elif a == b == c:
        return 0
    elif a == b:
        return c
    elif b == c:
        return a
    elif a == c:
        return b
nonduplicate_sum(1, 4, 4)

1
Q11: WAP to display a menu for calculating area of circle or perimeter of circle
radius = float(input("enter the radius of the circle:"))
print("1.Calculate area")
print("2.Calculate perimeter")
choice = int(input("enter choice:"))
if choice == 1:
    area = 3.14 * radius * radius
    print("Area of the circle is ", area)
elif choice == 2:
    perimeter = 2 * 3.14 * radius
    print("Perimeter of the circle is ", perimeter)
else:
    print("invalid choice")

enter the radius of the circle:4


1.Calculate area
2.Calculate perimeter
enter choice:1
Area of the circle is 50.24

Q12: WAP that reads three numbers and prints them in ascending order
a = float(input("Enter a: "))
b = float(input("Enter b: "))
c = float(input("Enter c: "))
if a < b:
    if b < c:
        print(a, "<", b, "<", c)
    else:
        if a < c:
            print(a, "<", c, "<", b)
        else:
            print(c, "<", a, "<", b)
else:
    if c < b:
        print(c, "<", b, "<", a)
    else:
        if c < a:
            print(b, "<", c, "<", a)
        else:
            print(b, "<", a, "<", c)

Enter a: 12
Enter b: 9
Enter c: 34
9.0 < 12.0 < 34.0

Q13: WAP that prints sum of natural numbers between two numbers taken as input
n1 = int(input("Enter Lower limit:")) n2 =
int(input("Enter Upper limit:")) sum = 0 for i in
range(n1, n2+1):
sum = sum+i
print(f"Sum of natural numbers between {n1} and {n2} is: {sum}") Enter Lower limit:3
Enter Upper limit:6
Sum of natural numbers between 3 and 6 is: 18 Q14: WAP to

calculate factorial of a number


def fact(N):
f = 1 for i in range(1,N+1):
f = f*i return f

n = int(input("Enter number to calculate factorial: ")) print(f"Factorial of {n} is: {fact(n)}")

Enter number to calculate factorial: 5


Factorial of 5 is: 120

Q15: WAP to input a number and test if it is a prime number or not


n = int(input("Enter number for prime check: "))
r = n//2 flag = 0 for i in
range(2,r):
if (n%i == 0):
flag = 1
break else: i
+= 1
if (flag==1):
print(f"{n} is not a prime number.") else:
print(f"{n} is a prime number.")

Enter number for prime check: 37
37 is a prime number.

Q16: WAP to take String line as input and display following stats: Number of uppercase letters, Number of
lowercase letters, Number of alphabets and Number of Digits
str1 = input("Enter string: ") upper = 0
lower = 0 alphabets = 0 digits = 0
l = len(str1) for i in range (l): if
(str1[i].isupper()):
upper += 1 elif
(str1[i].islower()):
lower += 1 elif (str1[i].isdigit()):
digits += 1 alphabets = lower +
upper
print(f"Number of Uppercase characters: {upper}") print(f"Number of Lowercase
characters: {lower}")
print(f"Number of characters which are alphabets: {alphabets}")

print(f"Number of characters which are digits: {digits}") print(f"\nLength of string: {l}")

Enter string: This is Earth-616


Number of Uppercase characters: 2
Number of Lowercase characters: 9
Number of characters which are alphabets: 11
Number of characters which are digits: 3

Length of string: 17

Q17: WAP to reads a line and a substring and display number of occurences of the given substring in the
line
str = input("Enter string: ")
substr = input("Enter substring for check: ")
print(f"Number of times '{substr}' occurs in string: {str.count(substr)}")
Enter string: hi hi hi hi hello hello
Enter substring for check: hi
Number of times 'hi' occurs in string: 4

Q18: WAP that takes a string with multiple words and then capitalize the first letter of each word and forms
a new string out of it
str = input("Enter string:") print(f"String in Titlecase is:
{str.title()}")

Enter string:hello world


String in Titlecase is: Hello World

Practice Problems

Program to implement shorthand if statement


a = 45
b = 123
print("A") if a > b else print("B")

B

Program to show type of variable after multiple assignments


a = 45
a = 'Hi'
print(type(a))

<class 'str'>

Program to implement if keyword


x = 'akd'
if x == 'akd':
    print("Hello")
print("from the other side.....")

Hello
from the other side.....

Program to show a simple if program


if 3:
    print("This will print.")

This will print.

Program to show that ids of 2 variables having same value are equal
a = 10
b = 10
print(f"Id of a: {id(a)}")
print(f"Id of b: {id(b)}")
if id(a) == id(b):
    print("Both variables point to the same value.")

Id of a: 133569002291728
Id of b: 133569002291728
Both variables point to the same value.

Program to compare 2 numbers
a = int(input("Enter first number: ")) b =
int(input("Enter second number: ")) if a>b:
print(f"{a} is greater than {b}.") elif a<b:
print(f"{b} is greater than {a}.") else:
print(f"Both {a} and {b} are equal.")

Enter first number: 4
Enter second number: 7
7 is greater than 4.

Program to implement add function for 3 variables having default values


def add(a=1, b=2, c=5):
    return a + b + c
print(f"Sum when 4 and 5 are passed is: {add(4,5)}")

Sum when 4 and 5 are passed is: 14

Program to print prime numbers from 1 to 20


pnum = []
for num in range(2, 21):
    for i in range(2, num):
        if num % i == 0:
            break
    else:
        pnum.append(num)
print(f"Prime numbers from 1 to 20 are: {pnum}")

Prime numbers from 1 to 20 are: [2, 3, 5, 7, 11, 13, 17, 19]

Program to implement string functions
functions
a = 'Hello world'
b = 'Hiii'
print(f"Original strings: {a} and {b}\n")
print(f"Strings in uppercase: '{a.upper()}' and '{b.upper()}'")
print(f"Strings in lowercase: '{a.lower()}' and '{b.lower()}'")
print(f"Strings in titlecase: '{a.title()}' and '{b.title()}'")
print(f"Strings after replacing 'H' with '?': '{a.replace('H','?')}' and '{b.replace('H','?')}'")

Original strings: Hello world and Hiii


Strings in uppercase: 'HELLO WORLD' and 'HIII'
Strings in lowercase: 'hello world' and 'hiii'
Strings in titlecase: 'Hello World' and 'Hiii'
Strings after replacing 'H' with '?': '?ello world' and '?iii'

Program to implement Combination in python

def fact(N):
    f = 1
    for i in range(1, N+1):
        f = f * i
    return f
n = int(input("Enter value for n: "))
r = int(input("Enter value for r: "))
x = n - r
cnr = fact(n) / (fact(r) * fact(x))
print(f"Combination: {cnr}")

Enter value for n: 6


Enter value for r: 5
Combination: 6.0

Program to implement factorial using recursion


def fact(N):
    if (N == 1) or (N == 0):
        return 1
    else:
        return N * fact(N-1)
n = int(input("Enter number: "))
print(f"Factorial of {n} is: {fact(n)}")

Enter number: 6
Factorial of 6 is: 720

Program to implement Permutation in python


def fact(N):
    f = 1
    for i in range(1, N+1):
        f = f * i
    return f
n = int(input("Enter value for n: "))
r = int(input("Enter value for r: "))
x = n - r
pnr = fact(n) / fact(x)
print(f"Permutation: {pnr}")

Enter value for n: 6
Enter value for r: 2
Permutation: 30.0
Day 3 :- Assignment On Lists, Tuples

q1: WAP to find minimum element from a list of elements along with its index in the list
l = [24, 23, 10, 2, 1, 56, 14, 34]
min_ele = l[0]
index = 0
print(f"List is: {l}")
for i in range(len(l)):
    if min_ele > l[i]:
        min_ele = l[i]
        index = i
print(f"Minimum element of list is: {min_ele} and its index is: {index}")

List is: [24, 23, 10, 2, 1, 56, 14, 34]


Minimum element of list is: 1 and its index is: 4

q2: WAP to calculate mean of a given list of numbers


l = [1, 2, 4, 6, 8, 4, 6, 9, 12, 42, 23]
sum = 0
mn = 0
n = len(l)
for i in l:
    sum += i
mn = sum / n
print(f"Mean of all elements is: {mn}")

Mean of all elements is: 10.636363636363637

q3: WAP to search for an element in a given list of numbers


l = [1, 2, 4, 6, 8, 4, 6, 9, 12, 42, 23]
elem = int(input("Enter element:"))
flag = 0
for i in l:
    if i == elem:
        flag = 1
        break
if flag == 1:
    print(f"{elem} is present in list.")
else:
    print(f"{elem} is not present in list.")

Enter element:2
2 is present in list.

q4: WAP to count frequency of given elements in a list of numbers


l = [1, 2, 4, 6, 8, 4, 6, 9, 12, 42, 23, 2, 4, 6, 4]
c = 0
n = int(input("Enter element:"))
for i in l:
    if i == n:
        c += 1
print(f"Frequency of {n} is: {c}")

Enter element:4
Frequency of 4 is: 4

q5: WAP to find frequencies of all elements of a list. Also print the list of unique elements in the list and
duplicate elements in the list
l = []   # list taken from user
l1 = []  # list of unique elements
lb = []  # list of duplicate elements
ln = []  # list of elements already counted
n = int(input("Enter length of list: "))
for i in range(n):
    a = int(input("Enter element: "))
    l.append(a)
l.sort()
for i in range(n):
    count = 0
    for j in l:
        if j == l[i]:
            count += 1
    if l[i] not in ln:
        print("Frequency of", l[i], "is", count)
        ln.append(l[i])
    else:
        continue
    if count == 1:
        l1.append(l[i])
    if count > 1:
        lb.append(l[i])

l2 = []  # list of distinct duplicate elements
[l2.append(x) for x in lb if x not in l2]

print(f"\nList of unique elements is: {l1}")
print(f"List of distinct duplicate elements is: {l2}")

Enter length of list: 4


Enter element: 1
Enter element: 2
Enter element: 1
Enter element: 3
Frequency of 1 is 2
Frequency of 2 is 1
Frequency of 3 is 1

List of unique elements is: [2, 3]


List of distinct duplicate elements is: [1]

q6: WAP to calculate and display sum of all odd numbers in the list
l = [1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3]
sum = 0
for i in l:
    if i % 2 != 0:
        sum += i
print(f"Sum of all odd elements in list is: {sum}")

Sum of all odd elements in list is: 36

q7: WAP to find second largest number of a tuples of numbers


tup1 = (12, 23, 45, 68, 234, 65)
list2 = []
for i in tup1:
    if i not in list2:
        list2.append(i)
list2.sort()
print(f"Second largest number is: {list2[-2]}")

Second largest number is: 68

q8: Given a list in Python and provided the positions of the elements, write a program to swap the two
elements in the list.
def swap(list, a, b):
    temp = list[b]
    list[b] = list[a]
    list[a] = temp

l1 = [12, 23, 45, 69, 234, 65]
print(f"List is: {l1}")
el1 = int(input("Enter position of first element:"))
el2 = int(input("Enter position of second element:"))
swap(l1, el1, el2)
print(f"List after swapping is: {l1}")

List is: [12, 23, 45, 69, 234, 65]


Enter position of first element:1
Enter position of second element:5
List after swapping is: [12, 65, 45, 69, 234, 23]

q9: WAP to reverse All Strings in String List


sl = ['Halo', 'God of War', 'The Last of Us', 'Horizon:Zero Dawn']
rev = []
for y in sl:
    r = y[::-1]
    rev.append(r)
print(f"List after reversing all strings: {rev}")

List after reversing all strings: ['olaH', 'raW fo doG', 'sU fo tsaL ehT', 'nwaD oreZ:noziroH']

q10: WAP to find the maximum frequency element in the tuple


tup1 = (12, 23, 45, 68, 234, 65, 12, 23, 45, 23)
ele = 0
max_i = 0
for i in tup1:
    count = 0
    for j in tup1:
        if i == j:
            count += 1
    if count > max_i:
        ele = i
        max_i = count
    else:
        continue

print(f"Tuple is: {tup1}")
print(f"Maximum frequency element in tuple is: {ele}")

Tuple is: (12, 23, 45, 68, 234, 65, 12, 23, 45, 23)
Maximum frequency element in tuple is: 23

q11. WAP to illustrate Stack Operations using List


stack = []
l = int(input("Enter length of stack: "))
for i in range(l):
    a = input("Enter element: ")
    stack.insert(0, a)

print(f"Stack is: {stack}")

while len(stack) != 0:
    print("\n\tMENU\n1.Add element to stack.\n2.Pop element from stack.\n3.Exit.\n")
    ch = int(input("Enter your choice: "))
    if ch == 1:
        ele = input("Enter element to push:")
        stack.insert(0, ele)
        print(f"Stack after pushing element is: {stack}")
    elif ch == 2:
        stack.pop(0)
        print("Element from stack has been popped.")
        print(f"Stack after popping element is: {stack}")
    elif ch == 3:
        print("Menu has been exited.\n")
        break
    else:
        print("Please enter a valid choice.")

Enter length of stack: 4


Enter element: 1
Enter element: 2
Enter element: 3
Enter element: 4
Stack is: ['4', '3', '2', '1']

MENU
1.Add element to stack.
2.Pop element from stack.
3.Exit.

Enter your choice: 1


Enter element to push:5
Stack after pushing element is: ['5', '4', '3', '2', '1']

MENU
1.Add element to stack.
2.Pop element from stack.
3.Exit.

Enter your choice: 2


Element from stack has been popped.
Stack after popping element is: ['4', '3', '2', '1']

MENU
1.Add element to stack.
2.Pop element from stack.
3.Exit.

Enter your choice: 4


Please enter a valid choice.

MENU
1.Add element to stack.
2.Pop element from stack.
3.Exit.

Enter your choice: 3
Menu has been exited.

q12. WAP to illustrate Queue Operations using List


queue = []
l = int(input("Enter length of queue: "))
for i in range(l):
    a = input("Enter element: ")
    queue.insert(0, a)

print(f"Queue is: {queue}")

while len(queue) != 0:
    print("\n\tMENU\n1.Add element to queue.\n2.Pop element from queue.\n3.Exit.\n")
    ch = int(input("Enter your choice: "))
    if ch == 1:
        ele = input("Enter element to push:")
        queue.insert(0, ele)
        print(f"Queue after pushing element is: {queue}")
    elif ch == 2:
        queue.pop()
        print("Element from queue has been popped.")
        print(f"Queue after popping element is: {queue}")
    elif ch == 3:
        print("Menu has been exited.\n")
        break
    else:
        print("Please enter a valid choice.")

Enter length of queue: 4


Enter element: 1
Enter element: 2
Enter element: 3
Enter element: 4
Queue is: ['4', '3', '2', '1']

MENU
1.Add element to queue.
2.Pop element from queue.
3.Exit.

Enter your choice: 1


Enter element to push:5
Queue after pushing element is: ['5', '4', '3', '2', '1']

MENU
1.Add element to queue.
2.Pop element from queue.
3.Exit.

Enter your choice: 2


Element from queue has been popped.
Queue after popping element is: ['5', '4', '3', '2']

MENU
1.Add element to queue.
2.Pop element from queue.
3.Exit.

Enter your choice: 3
Menu has been exited.

q13. WAP that scans an email address and forms a tuple of user name and domain
email = []
lud = []
def scan(x):
    a = x.split("@")
    lud.append((a[0], a[1]))

n = int(input("Enter the number of emails to store in list: "))
for i in range(n):
    e = input("Enter email in the form 'username@domain': ")
    email.append(e)
for y in email:
    scan(y)
print(f"\nList of usernames and domains stored in tuples:\n{lud}")

Enter the number of emails to store in list: 4


Enter email in the form 'username@domain': bhaskar@yahoo.in
Enter email in the form 'username@domain': khushi@gmail.com
Enter email in the form 'username@domain': vedansh@bvp.edu.in
Enter email in the form 'username@domain': yashika@gmail.com

List of usernames and domains stored in tuples:


[('bhaskar', 'yahoo.in'), ('khushi', 'gmail.com'), ('vedansh', 'bvp.edu.in'),
('yashika', 'gmail.com')]

q14. WAP that accepts different number of arguments and return sum of only the positive values passed to
it.
num = []
sum = 0
n = int(input("Enter number of elements in list:"))
for i in range(n):
    e = int(input("Enter element: "))
    num.append(e)
    if e > 0:
        sum += e
print(f"Sum of all positive elements is: {sum}")

Enter number of elements in list:6


Enter element: 2
Enter element: -5
Enter element: 5
Enter element: 3
Enter element: -8
Enter element: -4
Sum of all positive elements is: 10

q15. WAP to swap two values using tuple assignment


l = [12, 23, 12, 54, 67, 23, 89, 45]
print(f"List is: {l}")

i1 = int(input("Enter index of first element to swap: "))
i2 = int(input("Enter index of second element to swap: "))
def swap(a, ia, ib):
    print(f"Values before swapping: {a}")
    (a[ia], a[ib]) = (a[ib], a[ia])
    print(f"Values after swapping: {a}")
swap(l, i1, i2)

List is: [12, 23, 12, 54, 67, 23, 89, 45]
Enter index of first element to swap: 2
Enter index of second element to swap: 3
Values before swapping: [12, 23, 12, 54, 67, 23, 89, 45]
Values after swapping: [12, 23, 54, 12, 67, 23, 89, 45]

PRACTICE PROBLEMS

Program to print simple list.


l1 = [1, 2, 3, 4, 5, 6]
print(f"List is: {l1}")

List is: [1, 2, 3, 4, 5, 6]


Program to create a list with different data types stored in it and print them along with their data types.
l = []
l.append(43)
l.append('Hello')
l.append(6.9)
l.append('a')
for i in l:
    print(f"{i} \t{type(i)}")

43 	<class 'int'>
Hello 	<class 'str'>
6.9 	<class 'float'>
a 	<class 'str'>

Program to input list and calculate sum of the elements.


l = []
sum = 0
ll = int(input("Enter length of list: "))
for i in range(ll):
    inp = int(input("Enter element: "))
    l.append(inp)
    sum += inp
print(f"List is: {l}\nSum is: {sum}")

Enter length of list: 4


Enter element: 1
Enter element: 2
Enter element: 3
Enter element: 4
List is: [1, 2, 3, 4]
Sum is: 10

Program to implement list within a list.


l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row = 0
for i in l:
    print(f"Row {row} is :")
    for j in i:
        print(j, end=" ")
    print("\n")
    row += 1

Row 0 is :
1 2 3

Row 1 is :
4 5 6

Row 2 is :
7 8 9
Program to find smallest and largest element in list.
l = []
ll = int(input("Enter length of list: "))
for i in range(ll):
    inp = int(input("Enter element: "))
    l.append(inp)
for i in range(ll):
    for j in range(0, ll-i-1):
        if l[j] > l[j+1]:
            temp = l[j]
            l[j] = l[j+1]
            l[j+1] = temp

print(f"\nSorted list is: {l}\nSmallest element is: {l[0]}\nLargest element is: {l[ll-1]}")

Enter length of list: 5


Enter element: 2
Enter element: 7
Enter element: 4
Enter element: 9
Enter element: 6

Sorted list is: [2, 4, 6, 7, 9]


Smallest element is: 2
Largest element is: 9

Program to multiply all elements in list.


l = []
prod = 1
ll = int(input("Enter length of string: "))
for i in range(ll):
    a = int(input("Enter element: "))
    l.append(a)
    prod *= a
print(f"Product of all elements: {prod}")

Enter length of string: 4


Enter element: 2
Enter element: 3
Enter element: 4
Enter element: 5
Product of all elements: 120

Program to take input from user to create list that should have data type string, float, int, boolean.
l = []
ll = int(input("Enter length of list: "))
for i in range(ll):
    inp = input("Enter element: ")
    l.append(inp)
print(f"List is: {l}")

Enter length of list: 5


Enter element: shweta
Enter element: 12
Enter element: 7.9
Enter element: 600
Enter element: vedansh
List is: ['shweta', '12', '7.9', '600', 'vedansh']

Program to implement insert method in list.
method in list.
list1 = [2, 4, 6, 8, 10]
n = int(input("Enter element: "))
index = int(input("Enter index at which to insert: "))
list1.insert(index, n)
print(f"List after insertion is: {list1}")

Enter element: 5
Enter index at which to insert: 2
List after insertion is: [2, 4, 5, 6, 8, 10]

Program to delete from list using remove method.

list1 = ['A', 'B', 'C', 'D', 'E']
list1.remove('A')
print(f"List after removal is: {list1}")

List after removal is: ['B', 'C', 'D', 'E']

Program to delete from list using pop method.

list1 = ['A', 'B', 'C', 'D', 'E']
list1.pop()
print(f"List after removal is: {list1}")

List after removal is: ['A', 'B', 'C', 'D']

Program to delete from list using del method.

list1 = ['A', 'B', 'C', 'D', 'E']
del list1[2]
print(f"List after removal is: {list1}")

List after removal is: ['A', 'B', 'D', 'E']
Program to find min and max elements in 2D list.
l = [[1, 2, 3], [4, 5, 4], [19, 16, 12]]
min = l[0][0]
max = 0
for i in l:
    for j in i:
        if min > j:
            min = j
        if max < j:
            max = j
print(f"Minimum value element in 2D list is: {min}")
print(f"Maximum value element in 2D list is: {max}")

Minimum value element in 2D list is: 1


Maximum value element in 2D list is: 19

Program to create list of strings and count number of palindromes.


l = []
count = 0
n = int(input("Enter length of list: "))
for x in range(n):
    a = input("Enter string: ")
    l.append(a)
for y in l:
    flag = 0
    i = 0
    j = len(y) - 1
    while i <= j:
        if y[i] != y[j]:
            flag = 1
        i += 1
        j -= 1
    if flag == 0:
        count += 1
print(f"\nNumber of palindromes in list: {count}")

Enter length of list: 5


Enter string: 12321
Enter string: Athena
Enter string: Zeus
Enter string: Poseidon
Enter string: racecar
Number of palindromes in list: 2

Program to copy one list to another using assignment.


list1 = ['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']
list2 = list1
print(list1)
print(list2)

['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']


['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']

Program to copy one list to another using copy method.


list1 = ['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']
list2 = list1.copy()
print(list1)
print(list2)

['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']


['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']

Program to copy list using 2 methods and showing the differences


list1 = ['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']
list2 = list1.copy()
list3 = list1
list1[0] = 'Bonjour'
print(f"List 2 after modifying list 1: {list2}")
print(f"List 3 after modifying list 1: {list3}")

List 2 after modifying list 1: ['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']
List 3 after modifying list 1: ['Bonjour', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']

Program for list comprehension.

b = ['muda' for i in range(6)]
print(b)

['muda', 'muda', 'muda', 'muda', 'muda', 'muda']

Program to create new list using comprehension.


list1 = ['Hello', 'Yes', 'Bruh', 'Jack Hoff', 'Shweta']
list2 = ['Bruh' for i in list1]
print(f"New list is: {list2}")

New list is: ['Bruh', 'Bruh', 'Bruh', 'Bruh', 'Bruh']

Program to find kth smallest and kth largest element in the list.


list1 = [12, 23, 45, 69, 234, 65]
list2 = []
list1.sort()
for i in list1:
    if i not in list2:
        list2.append(i)

n = int(input("Enter k-th number to check for kth smallest and largest number: "))
sno = list2[n-1]
lno = list2[-n]
print(f"{n}-th smallest number is: {sno}")
print(f"{n}-th largest number is: {lno}")

Enter k-th number to check for kth smallest and largest number: 2
2-th smallest number is: 23
2-th largest number is: 69

Program to join 2 lists to create new list using append method.


greekgods = ['Kronos', 'Zeus', 'Poseidon', 'Hades', 'Athena', 'Ares']
norsegods = ['Odin', 'Thor', 'Loki', 'Freya', 'Sif']
gods = []
for x in norsegods:
    gods.append(x)
for x in greekgods:
    gods.append(x)
print(f"Gods list: {gods}")
Gods list: ['Odin', 'Thor', 'Loki', 'Freya', 'Sif', 'Kronos', 'Zeus', 'Poseidon', 'Hades', 'Athena', 'Ares']

Program to create tuple and print it.


t = ('bvcoe', 123, 'Bro', 34.69)
print(f"Tuple is: {t}")
print(t[-4:-1])

Tuple is: ('bvcoe', 123, 'Bro', 34.69)


('bvcoe', 123, 'Bro')
Program to show that tuple is immutable.
t = ('bvp', 13, 'HI', 42.690) print("Tuple
is immutable.") t[2] = 24 print(t)

Tuple is immutable.

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-48-c160cf47212b> in <cell line: 3>()
1 t = ('bvp', 13, 'HI', 42.690)
2 print("Tuple is immutable.")
----> 3 t[2] = 24
4 print(t)

TypeError: 'tuple' object does not support item assignment

Program to print type of tuple elements.
t = ('bvp', 13, 'HI', 42.690)
for i in t:
    print(f"Element {i} is of type {type(i)}")

Element bvp is of type <class 'str'>


Element 13 is of type <class 'int'>
Element HI is of type <class 'str'>
Element 42.69 is of type <class 'float'>

Program to make a list in tuple and verify it is still mutable.


t = ('bvp', [1, 2, 3], 13, 'HI', 42.690)
t[1][2] = 71
print(f"Tuple after changing value in list: {t}")

Tuple after changing value in list: ('bvp', [1, 2, 71], 13, 'HI', 42.69)

Program to count number of unique elements in tuple.
t = (1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3)
l = []
print(f"Tuple is: {t}")
print(f"Number of elements in tuple is: {len(t)}")
for i in t:
    if i not in l:
        l.append(i)
print(f"Number of unique elements in tuple is: {len(l)}")

Tuple is: (1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3)


Number of elements in tuple is: 12
Number of unique elements in tuple is: 9

Program to count and print duplicate elements in tuple.


t = (1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3)
l = []
de = []
print(f"Tuple is: {t}")
print(f"Number of elements in tuple is: {len(t)}")
for i in t:
    if i not in l:
        l.append(i)
    else:
        de.append(i)

print(f"\nNumber of duplicate elements in tuple is: {len(de)}")
print(f"Duplicate elements are: {de}")

Tuple is: (1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3)


Number of elements in tuple is: 12

Number of duplicate elements in tuple is: 3
Duplicate elements are: [3, 5, 3]

Program to count number of palindromes in tuple.


t = ('racecar', 'kanak', '12321', '521', 'Kratos')
count = 0
palindromes = []
for y in t:
    flag = 0
    i = 0
    j = len(y) - 1
    while i <= j:
        if y[i] != y[j]:
            flag = 1
        i += 1
        j -= 1
    if flag == 0:
        count += 1
        palindromes.append(y)
print(f"Number of palindromes in tuple: {count}")
print(f"Palindromes are: {palindromes}")

Number of palindromes in tuple: 3
Palindromes are: ['racecar', 'kanak', '12321']

Program to concatenate 2 tuples.

t1 = ('racecar', 'kanak', '12321', '521', 'Kratos')
t2 = ('bvp', 13, 'HI', 42.690)
t3 = t1 + t2
print(f"Concatenation of tuples is: {t3}")

Concatenation of tuples is: ('racecar', 'kanak', '12321', '521', 'Kratos', 'bvp', 13, 'HI', 42.69)

Program to delete tuple using del keyword.


t = ('bvp', 13, 'HI', 42.690)
del t
print("Tuple has been deleted.")

Tuple has been deleted.

Program to convert tuple into list and modify it.


t = ('bvp', 13, 'HI', 42.690)
l = list(t)
l.append(21)
print(f"Tuple after converting into list and modifying: {l}")

Tuple after converting into list and modifying: ['bvp', 13, 'HI', 42.69, 21]
Day 4 :- Assignment On Sets, Dictionaries

q1: Write a Python program to return a new set with unique items from both sets by removing duplicates.
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}
s = set()

s.update(s1)
s.update(s2)
print(f"New set created using 2 sets is: {s}")

New set created using 2 sets is: {1, 2, 3, 4, 5, 7, 8, 43, 23, 56}

q2: Given two Python sets, write a Python program to update the first set with items that exist only in the
first set and not in the second set.
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}

s1 = s1 - s2
print(f"Set having only elements in s1 and not in s2: {s1}")

Set having only elements in s1 and not in s2: {1, 2, 43, 5}

q3: WAP to Check if two sets have any elements in common. If yes, display the common elements
s1 = {23, 43, 1, 2, 4, 5, 8}
s2 = {23, 3, 4, 56, 7, 8, 4}
for i in s1:
    if i in s2:
        print(i, end=" ")

4 23 8

q4: WAP to Update set1 by adding items from set2, except common items
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}

s1 = (s1.union(s2)) - (s1.intersection(s2))
print(f"Set 1 updated except for common items: {s1}")

Set 1 updated except for common items: {1, 2, 3, 5, 7, 43, 56}

q5: WAP to get the maximum and minimum element in a set in Python, using the built-in functions of Python
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
maxe = max(s1)
mine = min(s1)

print(f"Maximum element in the set is: {maxe}")
print(f"Minimum element in the set is: {mine}")

Maximum element in the set is: 43


Minimum element in the set is: 1
q6: Below are the two lists. Write a Python program to convert them into a dictionary in a way that the
item from list1 is the key and the item from list2 is the value.
list1: keys = ['Ten', 'Twenty', 'Thirty']
list2: values = [10, 20, 30]

keys = ['Ten', 'Twenty', 'Thirty']
values = [10, 20, 30]
dict = {}
for i in range(len(keys)):
    dict[keys[i]] = values[i]
print(f"Dictionary is: {dict}")

Dictionary is: {'Ten': 10, 'Twenty': 20, 'Thirty': 30}

q7: Write a Python program to create a dictionary from a string. Note: Track the count of the letters from the
string.
Sample string : 'AIMLTraining'
Expected output: {'A': 1, 'I': 1, 'M': 1, 'L': 1, 'T': 1, 'r': 1, 'a': 1, 'i': 2, 'n': 2, 'g': 1}
s = "AIMLTraining"
dict = {} for i in s: if i
in dict: dict[i] += 1
else:
dict[i] = 1 print(dict)

{'A': 1, 'I': 1, 'M': 1, 'L': 1, 'T': 1, 'r': 1, 'a': 1, 'i': 2, 'n': 2, 'g': 1}

q8: Write a Python program to get the top three items in a shop.
Sample data: {'item1': 45.50, 'item2': 35, 'item3': 41.30, 'item4': 55, 'item5': 24}
Expected Output:
item4 55
item1 45.5
item3 41.3

dict1 = {'item1': 45.50, 'item2': 35, 'item3': 41.30, 'item4': 55, 'item5': 24}
dict2 = sorted(dict1.items(), key=lambda x: x[1])

print("Top three items are: ")
for i in range(3):
    print(f"{dict2[-(i+1)]}")

Top three items are:


('item4', 55)
('item1', 45.5)
('item3', 41.3)

q9:Write a Python program to filter a dictionary based on values.


Original Dictionary:
{'Cierra Vega': 175, 'Alden Cantrell': 180, 'Kierra Gentry': 165, 'Pierre Cox': 190}
Marks greater than 170:
{'Cierra Vega': 175, 'Alden Cantrell': 180, 'Pierre Cox': 190}
dict1 = {'Cierra Vega': 175, 'Alden Cantrell': 180, 'Kierra Gentry': 165, 'Pierre Cox': 190}
dict2 = sorted(dict1.items(), key=lambda x: x[1])

n = int(input("Enter the least value to search: "))
print(f"\nValues which are greater than {n} are: ")
for i in dict2:
    if i[1] > n:
        print(i)

Enter the least value to search: 170

Values which are greater than 170 are:
('Cierra Vega', 175)
('Alden Cantrell', 180)
('Pierre Cox', 190)

PRACTICE PROBLEMS

Program to create set and print it.


s = {2, 1, 4, 12, 4, 5, 4, 'Hello'}
print(f"Set is: {s}")

Set is: {1, 2, 4, 5, 'Hello', 12}

Program to add elements to set and print it.


s = {2, 1, 4, 12, 4, 5, 'Hello'}
print(f"Original set: {s}")
s.update({6, 3, 4})
print(f"Updated set is: {s}")

Original set: {1, 2, 4, 5, 'Hello', 12}
Updated set is: {1, 2, 3, 4, 5, 6, 12, 'Hello'}

Program to create a simple dictionary.


d = {"A":1000, "B":1002, "C":1004} print(d)

{'A': 1000, 'B': 1002, 'C': 1004}

Program to change value of item in dictionary using its key.


d = {"A":1000, "B":1002, "C":1004} k =
input("Enter key: ") value = input("Enter
value: ")
if k in d: d[k] =
value
print(d)
Enter key:
B
Enter value: 109
{'A': 1000, 'B': '109', 'C': 1004}

Program to check if an item is in dictionary.


d = {"A": 1000, "B": 1002, "C": 1004}
item = int(input("Enter element: "))
flag = 0
key = 0
for i in d:
    if d[i] == item:
        flag = 1
        key = i
        break
if flag == 1:
    print(f"Yes, {item} is present in dictionary having key {key}.")
else:
    print(f"{item} is not present in dictionary.")

Enter element: 1000

Yes, 1000 is present in dictionary having key A.

Program to remove items from dictionary using different methods.


d = {"A":1000, "B":1002, "C":1004, "D":1006, "E":1008, "F":1010}
print(f"Original dictionary is: {d}") del d["A"]
print(f"\nDictionary after using del method on 'A': {d}")
d.pop("B")
print(f"\nDictionary after using pop method on 'B': {d}") d.clear()
print(f"\nDictionary after using clear method: {d}")

Original dictionary is: {'A': 1000, 'B': 1002, 'C': 1004, 'D': 1006, 'E': 1008, 'F':
1010}

Dictionary after using del method on 'A': {'B': 1002, 'C': 1004, 'D': 1006, 'E': 1008,
'F': 1010}
Dictionary after using pop method on 'B': {'C': 1004, 'D': 1006, 'E': 1008, 'F': 1010}
Dictionary after using clear method: {}

Day 5 Assignment - NumPy

q1: Create a 1D array of numbers from 0 to 9


import numpy as np
arr1 = np.arange(0, 10, 1)
print(f"Array: {arr1}")

Array: [0 1 2 3 4 5 6 7 8 9]

q2: Create a 3×3 numpy array of all True’s


a = np.ones((3, 3), dtype=bool)
print(f"3x3 array of Trues:\n{a}")

3x3 array of Trues:
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

q3: Extract all odd numbers from an array of n numbers


import numpy as np
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
o = []
for i in arr1:
    if i % 2 != 0:
        o.append(i)
print(f"Odd numbers in array: {o}")

Odd numbers in array: [1, 3, 5, 7, 9, 11]

q4: Replace all odd numbers in array with -1


import numpy as np
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
for i in range(len(arr1)):
    if arr1[i] % 2 != 0:
        arr1[i] = -1
print(f"Array after replacing all odd numbers: {arr1}")

Array after replacing all odd numbers: [-1 2 -1 4 -1 6 -1 8 -1 10 -1 12]

q5: Convert a 1D array to a 2D array with 2 rows


import numpy as np
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr2 = arr1.reshape(2, 6)
print(f"Array after reshaping is:\n{arr2}")

Array after reshaping is:


[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]]

q6: Get the common items between two arrays


import numpy as np
ar1 = np.array([1, 2, 3, 4, 5, 6, 7])
ar2 = np.array([2, 4, 6])
c = []
for i in ar1:
    if i in ar2:
        c.append(i)
print(f"Common items are: {c}")

Common items are: [2, 4, 6]

q7: From array - a remove all items present in array - b


a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 4, 6])
c = np.setdiff1d(a, b)
print(f"Array 'a' after removing elements from 'b': {c}")

Array 'a' after removing elements from 'b': [1 3 5 7]

Download the Iris Data CSV File from link -
https://drive.google.com/file/d/1KzIR0hK9F17jhzOT3KH2mmWrCoDWpDpQ/view?usp=sharing
And Write Program for following Questions:

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

q8: Find the mean, median, standard deviation of iris's sepallength (1st column)
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
sl = df["SepalLengthCm"]

print(f"Mean of Sepal length: {sl.mean()}")
print(f"Median of Sepal length: {sl.median()}")
print(f"Standard deviation of Sepal length: {sl.std()}")

Mean of Sepal length: 5.843333333333334


Median of Sepal length: 5.8
Standard deviation of Sepal length: 0.828066127977863

q9: Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the
minimum has value 0 and maximum has value 1. Use following Normalization Formula -> x normalized
= (x – x minimum) / (x maximum – x minimum)
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
sl = df["SepalLengthCm"]
min = sl.min()
max = sl.max()
nsl = []
for i in sl:
    nsl.append((i - min) / (max - min))

df.assign(NormalizedSepalLength = nsl)
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species  NormalizedSepalLength
0      1            5.1           3.5            1.4           0.2     Iris-setosa               0.222222
1      2            4.9           3.0            1.4           0.2     Iris-setosa               0.166667
2      3            4.7           3.2            1.3           0.2     Iris-setosa               0.111111
3      4            4.6           3.1            1.5           0.2     Iris-setosa               0.083333
4      5            5.0           3.6            1.4           0.2     Iris-setosa               0.194444
..   ...            ...           ...            ...           ...             ...                    ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica               0.666667
146  147            6.3           2.5            5.0           1.9  Iris-virginica               0.555556
147  148            6.5           3.0            5.2           2.0  Iris-virginica               0.611111
148  149            6.2           3.4            5.4           2.3  Iris-virginica               0.527778
149  150            5.9           3.0            5.1           1.8  Iris-virginica               0.444444

[150 rows x 7 columns]

q10: Filter the rows of iris data that has petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
fi = df[(df["SepalLengthCm"] < 5) & (df["PetalLengthCm"] > 1.5)]
fi

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm          Species
11    12            4.8           3.4            1.6           0.2      Iris-setosa
24    25            4.8           3.4            1.9           0.2      Iris-setosa
29    30            4.7           3.2            1.6           0.2      Iris-setosa
30    31            4.8           3.1            1.6           0.2      Iris-setosa
57    58            4.9           2.4            3.3           1.0  Iris-versicolor
106  107            4.9           2.5            4.5           1.7   Iris-virginica

q11: Bin the petal length (3rd) column of iris data to form a text array, such that if petal length is:
Less than 3 --> 'small'
3-5 --> 'medium'
>5 --> 'large'

import numpy as np
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
pl = df["PetalLengthCm"]
l = []
for i in pl:
    if i < 3:
        l.append("small")
    elif i > 5:
        l.append("large")
    else:
        l.append("medium")
df.assign(PetalLength = l)

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species PetalLength
0      1            5.1           3.5            1.4           0.2     Iris-setosa       small
1      2            4.9           3.0            1.4           0.2     Iris-setosa       small
2      3            4.7           3.2            1.3           0.2     Iris-setosa       small
3      4            4.6           3.1            1.5           0.2     Iris-setosa       small
4      5            5.0           3.6            1.4           0.2     Iris-setosa       small
..   ...            ...           ...            ...           ...             ...         ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica       large
146  147            6.3           2.5            5.0           1.9  Iris-virginica      medium
147  148            6.5           3.0            5.2           2.0  Iris-virginica       large
148  149            6.2           3.4            5.4           2.3  Iris-virginica       large
149  150            5.9           3.0            5.1           1.8  Iris-virginica       large

[150 rows x 7 columns]

q12: Sort the iris dataset based on sepallength column.


import numpy as np
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
sl = df.sort_values("SepalLengthCm")
sl

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
13    14            4.3           3.0            1.1           0.1     Iris-setosa
42    43            4.4           3.2            1.3           0.2     Iris-setosa
38    39            4.4           3.0            1.3           0.2     Iris-setosa
8      9            4.4           2.9            1.4           0.2     Iris-setosa
41    42            4.5           2.3            1.3           0.3     Iris-setosa
..   ...            ...           ...            ...           ...             ...
122  123            7.7           2.8            6.7           2.0  Iris-virginica
118  119            7.7           2.6            6.9           2.3  Iris-virginica
117  118            7.7           3.8            6.7           2.2  Iris-virginica
135  136            7.7           3.0            6.1           2.3  Iris-virginica
131  132            7.9           3.8            6.4           2.0  Iris-virginica

[150 rows x 6 columns]

q13: Find the most frequent value of petal length (3rd column) in iris dataset.
import numpy as np
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
pl = df["PetalLengthCm"]

mfv = pl.mode()
print(f"Most frequent value of petal length is:\n{mfv}")

Most frequent value of petal length is:
0    1.5
Name: PetalLengthCm, dtype: float64

NumPy Exercise for Students 1

Import NumPy package as np

import numpy as np

01. Create an empty array of 20 0's and replace the 4th object with the number 5

arr1 = np.zeros(20, dtype=int)
arr1[3] = 5
arr1

array([0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

02. Create an array of 20 1's and store it as a variable named array_master. Copy the
same array into another variable named array_copy

array_master = np.ones(20, dtype=int)
array_copy = array_master.copy()
array_copy

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
03.Create an array containing 30 1's and broadcast all the one's to the value 100
arr2 = np.ones(30, dtype = int) arr2[:] =
100 arr2

array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100])

04. Create an array of integers starting from 21 until 31 and name it as array1

--- Create an array of integers starting from 11 until 21 and name it array2

--- Calculate the difference between array1 and array2


array1 = np.arange(21, 32, 1)
array2 = np.arange(11, 22, 1)

diff = array1 - array2
diff

array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
05. Create an array of all even integers from 2 to 10 and name it a1

-- Create an array of all even integers from 22 to 30 and name it a2

a) Use the 2 arrays as rows and create a matrix [ Hint - Use stack function from numpy ]
a1 = np.arange(2, 12, 2)
a2 = np.arange(22, 32, 2)

acr = np.stack((a1, a2))
acr

array([[ 2, 4, 6, 8, 10],
[22, 24, 26, 28, 30]])

b) Use the 2 arrays as columns and create a matrix [ Hint - Use column_stack function from numpy ]
acc = np.column_stack((a1, a2))
acc

array([[ 2, 22],
       [ 4, 24],
       [ 6, 26],
       [ 8, 28],
       [10, 30]])

06. Create a 5x6 matrix with values ranging from 0 to 29 and retrieve the value intersecting at 2nd row and
3rd column
arr = np.arange(0, 30, 1).reshape(5, 6)
print(arr[2][3])   # zero-based: row index 2, column index 3

15

07. Create an identity matrix of shape 10x10 and replace the 0's with the value 21
arri = np.eye(10, dtype=int)
arri[arri == 0] = 21   # boolean mask: replace every 0 with 21
arri

array([[ 1, 21, 21, 21, 21, 21, 21, 21, 21, 21],
       [21,  1, 21, 21, 21, 21, 21, 21, 21, 21],
[21, 21, 1, 21, 21, 21, 21, 21, 21, 21],
[21, 21, 21, 1, 21, 21, 21, 21, 21, 21],
[21, 21, 21, 21, 1, 21, 21, 21, 21, 21],
[21, 21, 21, 21, 21, 1, 21, 21, 21, 21],
[21, 21, 21, 21, 21, 21, 1, 21, 21, 21],
[21, 21, 21, 21, 21, 21, 21, 1, 21, 21],
[21, 21, 21, 21, 21, 21, 21, 21, 1, 21],
[21, 21, 21, 21, 21, 21, 21, 21, 21, 1]])

Use NumPy to generate a random set of 10 numbers between 0 and 1

08. Display a boolean array output where all values > 0.2 are True, rest are marked as False
ar = np.random.rand(10)
bar = ar > 0.2   # element-wise comparison yields a boolean array
print(bar)

[ True  True  True  True  True  True  True  True False  True]

09. Use NumPy to generate an array matrix of 5x2 random numbers sampled from a standard normal
distribution
m1 = np.random.randn(5, 2)
m1

array([[-0.24505186,  0.07510073],
       [-0.94109451,  1.29147136],
[-0.03320188, 1.18647596],
[ 0.12500751, -0.93431175],
[-1.15417168, 0.5521074 ]])

10. Create an array of 30 linearly spaced points between 0 and 100:


arrr1 = np.linspace(0, 100, 30)
arrr1

array([  0.        ,   3.44827586,   6.89655172,  10.34482759,
        13.79310345,  17.24137931,  20.68965517,  24.13793103,
        27.5862069 ,  31.03448276,  34.48275862,  37.93103448,
        41.37931034,  44.82758621,  48.27586207,  51.72413793,
        55.17241379,  58.62068966,  62.06896552,  65.51724138,
        68.96551724,  72.4137931 ,  75.86206897,  79.31034483,
        82.75862069,  86.20689655,  89.65517241,  93.10344828,
        96.55172414, 100.        ])

Numpy Indexing and Selections

11. Using the below given Matrix, generate the output for the below questions.
[[ 1  2  4 67]
 [34 55 65  7]
 [45 66 44  3]
 [33 79 23  9]]

a) Retrieve the last 2 rows and first 3 column values of the above matrix using index & selection technique
m1 = np.array([[ 1,  2,  4, 67],
               [34, 55, 65,  7],
               [45, 66, 44,  3],
               [33, 79, 23,  9]])
m1[2:, :3]

array([[45, 66, 44],
       [33, 79, 23]])

b) Retrieve the value 55 from the above matrix using index & selection technique
m1[1, 1]

55

c) Retrieve the values from the 3rd column in the above matrix
m1[:, 2]

array([ 4, 65, 44, 23])

d) Retrieve the values from the 4th row in the above matrix
m1[3, :]

array([33, 79, 23,  9])

e) Retrieve values from the 2nd & 4th rows in the above matrix

m1[1:4:2, :]

array([[34, 55, 65,  7],
       [33, 79, 23,  9]])

Calculate the following values for the given matrix

a) Calculate sum of all the values in the matrix


s = np.sum(m1)
s

537
b) Calculate standard deviation of all the values in the matrix
sd = np.std(m1)
sd

26.466886740793676

c) Calculate the variance of all values in the matrix


vr = np.var(m1)
vr

700.49609375

d) Calculate the mean of all values in the matrix


mn = np.mean(m1)
mn

33.5625

e) Retrieve the largest number from the matrix


xmax = np.max(m1)
xmax

79

f) Retrieve the smallest number from the matrix


xmin = np.min(m1)
xmin

1

Practice Problems

Program to show that a numpy array takes less space than a Python list.

import numpy as np
import sys

li_arr = [i for i in range(12)]
np_arr = np.arange(12)

print(np_arr.itemsize * np_arr.size)    # bytes used by the numpy array's data
print(sys.getsizeof(1) * len(li_arr))   # rough size of the list's integer objects

96
336

Program to create numpy array from list.


l = [1, 2, 3, 4]
b = np.array(l)
print(b)
print(type(b))

[1 2 3 4]
<class 'numpy.ndarray'>

Program to create numpy array from list with it repeating.


l = [1, 2, '3', 4, 6.15]
b = np.array(l*4)
print(b)

['1' '2' '3' '4' '6.15' '1' '2' '3' '4' '6.15' '1' '2' '3' '4' '6.15' '1' '2' '3' '4' '6.15']

Program to create numpy array using zeros.


b = np.zeros(3, dtype = float)
b

array([0., 0., 0.])
Program to create numpy array using ones.

b = np.ones(3, dtype = str)
b

array(['1', '1', '1'], dtype='<U1')

Program to create 2d numpy array using ones.

b = np.ones((2,3), dtype = int)
b

array([[1, 1, 1],
       [1, 1, 1]])

Program to create numpy array using full.


b = np.full((3,3), 5, dtype = float)
b

array([[5., 5., 5.],
       [5., 5., 5.],
       [5., 5., 5.]])

Program to print element at specified index in numpy array.


import numpy as np
l = np.array([1, 2, 3, 4, 5, 6, 7, 8])
l[3]

4

Program to print array using slicing.


import numpy as np
l = np.array([1, 2, 3, 4, 5, 6, 7, 8])
l[2:]

array([3, 4, 5, 6, 7, 8])

Program to print array between specified indexes.


import numpy as np
l = np.array([1, 2, 3, 4, 5, 6, 7, 8])
l[2:5]

array([3, 4, 5])

Program to print array having specified indexes.


import numpy as np
l = np.array([2, 3, 4, 5, 6, 7, 8, 9])
l[[1, 4, 5]]

array([3, 6, 7])

Program to show differences between np.arange() and np.linspace().


a = np.arange(3, 9, 1, dtype = int)
a

array([3, 4, 5, 6, 7, 8])

b = np.linspace(1.1, 7.1, num = 7)
b

array([1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1])

Program to replace all even numbers by 2.

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[arr % 2 == 0] = 2   # boolean mask selects the even elements
arr

array([1, 2, 3, 2, 5, 2, 7, 2, 9])
Program to find index of arrays where 2 elements match. Take 2 arrays as input and match their index values,
if same, print index.
import numpy as np

arr1 = []
n1 = int(input("Enter length of first array:"))
for i in range(n1):
    e = int(input("Enter element: "))
    arr1.append(e)

arr2 = []
n2 = int(input("Enter length of second array:"))
for i in range(n2):
    e = int(input("Enter element: "))
    arr2.append(e)

a = np.array(arr1)
b = np.array(arr2)
for i in range(len(a)):
    for j in range(len(b)):
        if a[i] == b[j]:
            print(f"Element {a[i]} is present in both arrays at indexes {i} and {j} respectively.")

Enter length of first array:3


Enter element: 1
Enter element: 2
Enter element: 3
Enter length of second array:3
Enter element: 3
Enter element: 4
Enter element: 5
Element 3 is present in both arrays at indexes 2 and 0 respectively.

Program to create a numpy array age and normalize it.


import numpy as np

age = np.array([22, 23, 24, 35, 36, 40, 39, 52, 58, 65])
n_age = []
xmax = np.max(age)
xmin = np.min(age)
for i in age:
    n_age.append((i - xmin) / (xmax - xmin))   # min-max normalization
n_age

[0.0,
 0.023255813953488372,
 0.046511627906976744,
 0.3023255813953488,
 0.32558139534883723,
 0.4186046511627907,
 0.3953488372093023,
 0.6976744186046512,
 0.8372093023255814,
 1.0]

Program to create a numpy array income and normalize it.


import numpy as np

income = np.array([2000, 1500, 3000, 1000, 500, 300, 400, 800, 900, 4000])
n_inc = []
xmax = np.max(income)
xmin = np.min(income)
for i in income:
    n_inc.append((i - xmin) / (xmax - xmin))
n_inc

[0.4594594594594595,
 0.32432432432432434,
 0.7297297297297297,
 0.1891891891891892,
 0.05405405405405406,
 0.0,
 0.02702702702702703,
 0.13513513513513514,
 0.16216216216216217,
 1.0]

Program to find max, min elements and their indexes in a numpy array.
import numpy as np

l = np.array([1, 2, 4, 3, 6, 2, 8, 34, 76, 24, 87, 35])
max_val = np.max(l)    # renamed so the built-ins max/min are not shadowed
min_val = np.min(l)
imax = np.argmax(l)
imin = np.argmin(l)

print(f"Maximum element in array is: {max_val} and its index is: {imax}")
print(f"Minimum element in array is: {min_val} and its index is: {imin}")

Maximum element in array is: 87 and its index is: 10
Minimum element in array is: 1 and its index is: 0

Program to implement Pandas Series.
import pandas as pd
import numpy as np

labels = ['w', 'x', 'y', 'z']
arr = np.array([10, 20, 30, 40])            # avoid shadowing the built-ins list/dict
d = {'w': 10, 'x': 20, 'y': 30, 'z': 40}

pd.Series(data = arr)

0    10
1    20
2    30
3    40
dtype: int64

pd.Series(data = arr, index = labels)

w    10
x    20
y    30
z    40
dtype: int64

pd.Series(arr, labels)

w    10
x    20
y    30
z    40
dtype: int64

pd.Series(d)

w    10
x    20
y    30
z    40
dtype: int64

Program to create Pandas Series using list and indexes.


sports1 = pd.Series([1, 2, 3, 4], index = ['Cricket', 'Football', 'Baseball', 'Golf'])
sports1

Cricket     1
Football    2
Baseball    3
Golf        4
dtype: int64

sports1['Baseball']

3

Program to import csv file from drive and print it.


import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/Summer Training/csv files/numbers.csv")
print(df)

NUMBERS ALPHABETS
0 1 A
1 2 B
2 3 C
3 4 D
4 5 E
5 6 F
6 7 G
7 8 H
8 9 I
9 10 J
10 11 K
11 12 L
12 13 M
13 14 N
14 15 O
15 16 P
16 17 Q
17 18 R
18 19 S
19 20 T

Day 6 Assignment - Pandas

Download CSV File from link -


https://drive.google.com/file/d/1OmUctbaMnIxNhyyLYG2iJOtkclZvs0TB/view?usp=sharing
And Answer the following questions
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

q1: From the given dataset print the first and last five rows
import pandas as pd

df2 = pd.read_csv("/content/drive/MyDrive/Summer Training/csv files/Automobile_data.csv")
print(f"First 5 rows are:\n{df2.head()}")
print(f"\nLast 5 rows are:\n{df2.tail()}")
First 5 rows are:
   index      company   body-style  wheel-base  length engine-type \
0      0  alfa-romero  convertible        88.6   168.8        dohc
1      1  alfa-romero  convertible        88.6   168.8        dohc
2      2  alfa-romero    hatchback        94.5   171.2        ohcv
3      3         audi        sedan        99.8   176.6         ohc
4      4         audi        sedan        99.4   176.6         ohc

  num-of-cylinders  horsepower  average-mileage    price
0             four         111               21  13495.0
1             four         111               21  16500.0
2              six         154               19  16500.0
3             four         102               24  13950.0
4             five         115               18  17450.0

Last 5 rows are:
    index     company body-style  wheel-base  length engine-type \
56     81  volkswagen      sedan        97.3   171.7         ohc
57     82  volkswagen      sedan        97.3   171.7         ohc
58     86  volkswagen      sedan        97.3   171.7         ohc
59     87       volvo      sedan       104.3   188.8         ohc
60     88       volvo      wagon       104.3   188.8         ohc

   num-of-cylinders  horsepower  average-mileage    price
56             four          85               27   7975.0
57             four          52               37   7995.0
58             four         100               26   9995.0
59             four         114               23  12940.0
60             four         114               23  13415.0

q2: Clean the dataset and update the CSV file. Replace all column values which contain ?, n.a, or NaN.

df3 = df2.fillna(0)
df3

    index      company   body-style  wheel-base  length engine-type \
0       0  alfa-romero  convertible        88.6   168.8        dohc
1       1  alfa-romero  convertible        88.6   168.8        dohc
2       2  alfa-romero    hatchback        94.5   171.2        ohcv
3       3         audi        sedan        99.8   176.6         ohc
4       4         audi        sedan        99.4   176.6         ohc
..    ...          ...          ...         ...     ...         ...
56     81   volkswagen        sedan        97.3   171.7         ohc
57     82   volkswagen        sedan        97.3   171.7         ohc
58     86   volkswagen        sedan        97.3   171.7         ohc
59     87        volvo        sedan       104.3   188.8         ohc
60     88        volvo        wagon       104.3   188.8         ohc

   num-of-cylinders  horsepower  average-mileage    price
0              four         111               21  13495.0
1              four         111               21  16500.0
2               six         154               19  16500.0
3              four         102               24  13950.0
4              five         115               18  17450.0
..              ...         ...              ...      ...
56             four          85               27   7975.0
57             four          52               37   7995.0
58             four         100               26   9995.0
59             four         114               23  12940.0
60             four         114               23  13415.0
[61 rows x 10 columns]

q3: Print most expensive car’s company name and price.

max_price = df2['price'].max()
df4 = df2[df2['price'] == max_price]
df4

    index        company body-style  wheel-base  length engine-type \
35     47  mercedes-benz    hardtop       112.0   199.2        ohcv

   num-of-cylinders  horsepower  average-mileage    price
35            eight         184               14  45400.0

q4: Print All Toyota Cars details

df5 = df2[df2['company'] == 'toyota']
df5

    index company body-style  wheel-base  length engine-type num-of-cylinders \
48     66  toyota  hatchback        95.7   158.7         ohc             four
49     67  toyota  hatchback        95.7   158.7         ohc             four
50     68  toyota  hatchback        95.7   158.7         ohc             four
51     69  toyota      wagon        95.7   169.7         ohc             four
52     70  toyota      wagon        95.7   169.7         ohc             four
53     71  toyota      wagon        95.7   169.7         ohc             four
54     79  toyota      wagon       104.5   187.8        dohc              six

    horsepower  average-mileage    price
48          62               35   5348.0
49          62               31   6338.0
50          62               31   6488.0
51          62               31   6918.0
52          62               27   7898.0
53          62               27   8778.0
54         156               19  15750.0

q5: Count total cars per company

df6 = df2["company"].value_counts()
df6

toyota           7
bmw              6
mazda            5
nissan           5
audi             4
mercedes-benz    4
mitsubishi       4
volkswagen       4
alfa-romero      3
chevrolet        3
honda            3
isuzu            3
jaguar           3
porsche          3
dodge            2
volvo            2
Name: company, dtype: int64

q6: Find each company’s Highest price car

dfc1 = df2.groupby('company')
df7 = dfc1["company", 'price'].max()
df7

<ipython-input-9-837f2684420b>:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  df7 = dfc1["company", 'price'].max()

                     company    price
company
alfa-romero      alfa-romero  16500.0
audi                    audi  18920.0
bmw                      bmw  41315.0
chevrolet          chevrolet   6575.0
dodge                  dodge   6377.0
honda                  honda  12945.0
isuzu                  isuzu   6785.0
jaguar                jaguar  36000.0
mazda                  mazda  18344.0
mercedes-benz  mercedes-benz  45400.0
mitsubishi        mitsubishi   8189.0
nissan                nissan  13499.0
porsche              porsche  37028.0
toyota                toyota  15750.0
volkswagen        volkswagen   9995.0
volvo                  volvo  13415.0

q7: Find the average mileage of each car making company

dfc2 = df2.groupby('company')
df8 = dfc2['company', 'average-mileage'].mean()
df8

<ipython-input-10-b999b8edd580>:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  df8 = dfc2['company', 'average-mileage'].mean()
<ipython-input-10-b999b8edd580>:2: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  df8 = dfc2['company', 'average-mileage'].mean()

               average-mileage
company
alfa-romero          20.333333
audi                 20.000000
bmw                  19.000000
chevrolet            41.000000
dodge                31.000000
honda                26.333333
isuzu                33.333333
jaguar               14.333333
mazda                28.000000
mercedes-benz        18.000000
mitsubishi           29.500000
nissan               31.400000
porsche              17.000000
toyota               28.714286
volkswagen           31.750000
volvo                23.000000

q8: Sort all cars by Price column


df9 = df2.sort_values('price')
df9

    index        company body-style  wheel-base  length engine-type \
13     16      chevrolet  hatchback        88.4   141.1           l
27     36          mazda  hatchback        93.1   159.1         ohc
48     66         toyota  hatchback        95.7   158.7         ohc
36     49     mitsubishi  hatchback        93.7   157.3         ohc
28     37          mazda  hatchback        93.1   159.1         ohc
..    ...            ...        ...         ...     ...         ...
11     14            bmw      sedan       103.5   193.8         ohc
35     47  mercedes-benz    hardtop       112.0   199.2        ohcv
22     31          isuzu      sedan        94.5   155.9         ohc
23     32          isuzu      sedan        94.5   155.9         ohc
47     63        porsche  hatchback        98.4   175.7       dohcv

   num-of-cylinders  horsepower  average-mileage    price
13            three          48               47   5151.0
27             four          68               30   5195.0
48             four          62               35   5348.0
36             four          68               37   5389.0
28             four          68               31   6095.0
..              ...         ...              ...      ...
11              six         182               16  41315.0
35            eight         184               14  45400.0
22             four          70               38      NaN
23             four          70               38      NaN
47            eight         288               17      NaN

[61 rows x 10 columns]

Practice Problems


Program to create a DataFrame using Pandas and print Series of Day2.


import numpy as np
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# load data into a DataFrame object
df = pd.DataFrame(data, index = ['Day1', 'Day2', 'Day3'])
print(df)
print("\n", df.loc['Day2'])

      calories  duration
Day1       420        50
Day2       380        40
Day3       390        45

calories    380
duration     40
Name: Day2, dtype: int64

Program to create a DataFrame and select multiple columns.


data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# load data into a DataFrame object
df = pd.DataFrame(data, index = ['Day1', 'Day2', 'Day3'])
df[['calories', 'duration']]

      calories  duration
Day1       420        50
Day2       380        40
Day3       390        45
Program to load a CSV dataset into a DataFrame object.

import numpy as np
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/student.csv')
print(df.to_string())

    id          name  class  mark  gender
0    1      John Deo   Four    75  female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
8 9 Tes Qry Six 78 male
9 10 Big John Four 55 female
10 11 Ronald Six 89 female
11 12 Recky Six 94 female
12 13 Kty Seven 88 female
13 14 Bigy Seven 88 female
14 15 Tade Row Four 88 male
15 16 Gimmy Four 88 male
16 17 Tumyu Six 54 male
17 18 Honny Five 75 male
18 19 Tinny Nine 18 male
19 20 Jackly Nine 65 female
20 21 Babby John Four 69 female
21 22 Reggid Seven 55 female
22 23 Herod Eight 79 male
23 24 Tiddy Now Seven 78 male
24 25 Giff Tow Seven 88 male
25 26 Crelea Seven 79 male
26 27 Big Nose Three 81 female
27 28 Rojj Base Seven 86 female
28 29 Tess Played Seven 55 male
29 30 Reppy Red Six 79 female
30 31 Marry Toeey Four 88 male
31 32 Binn Rott Seven 90 female
32 33 Kenn Rein Six 96 female
33 34 Gain Toe Seven 69 male
34  35   Rows Noump    Six    88  female

Program to read type of DataFrame column.

type(df['mark'])

pandas.core.series.Series

Program to display column where marks>70.


df[df['mark']>70]

    id          name  class  mark  gender
0    1      John Deo   Four    75  female
1 2 Max Ruin Three 85 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
8 9 Tes Qry Six 78 male
10 11 Ronald Six 89 female
11 12 Recky Six 94 female
12 13 Kty Seven 88 female
13 14 Bigy Seven 88 female
14 15 Tade Row Four 88 male
15 16 Gimmy Four 88 male
17 18 Honny Five 75 male
22 23 Herod Eight 79 male
23 24 Tiddy Now Seven 78 male
24 25 Giff Tow Seven 88 male
25 26 Crelea Seven 79 male
26 27 Big Nose Three 81 female
27 28 Rojj Base Seven 86 female
29 30 Reppy Red Six 79 female
30 31 Marry Toeey Four 88 male
31 32 Binn Rott Seven 90 female
32 33 Kenn Rein Six 96 female
34 35 Rows Noump Six 88 female

Program to understand data from a DataFrame using describe function.


df.describe()

              id       mark
count  35.000000  35.000000
mean   18.000000  74.657143
std    10.246951  16.401117
min     1.000000  18.000000
25%     9.500000  62.500000
50%    18.000000  79.000000
75%    26.500000  88.000000
max    35.000000  96.000000

Program to drop data from DataFrame.

df2 = df.drop(np.arange(20, 35, 1))
print(df2)

    id          name  class  mark  gender
0    1      John Deo   Four    75  female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
8 9 Tes Qry Six 78 male
9 10 Big John Four 55 female
10 11 Ronald Six 89 female
11 12 Recky Six 94 female
12 13 Kty Seven 88 female
13 14 Bigy Seven 88 female
14 15 Tade Row Four 88 male
15 16 Gimmy Four 88 male
16 17 Tumyu Six 54 male
17 18 Honny Five 75 male
18 19 Tinny Nine 18 male
19 20 Jackly Nine 65 female
Program to create a DataFrame object using dictionary.

data = {
    'id': 21,
    'name': 'Jack Hoff',
    'class': 'Eight',
    'mark': 69,
    'gender': 'male',
}
df1 = pd.DataFrame(data, index=[20])
print(df1)

    id       name  class  mark gender
20  21  Jack Hoff  Eight    69   male

Program to add row to DataFrame object.

df2 = pd.concat([df2, df1])
print(df2)

    id          name  class  mark  gender
0    1      John Deo   Four    75  female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
4 5 John Mike Four 60 female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
8 9 Tes Qry Six 78 male
9 10 Big John Four 55 female
10 11 Ronald Six 89 female
11 12 Recky Six 94 female
12 13 Kty Seven 88 female
13 14 Bigy Seven 88 female
14 15 Tade Row Four 88 male
15 16 Gimmy Four 88 male
16 17 Tumyu Six 54 male
17 18 Honny Five 75 male
18 19 Tinny Nine 18 male
19 20 Jackly Nine 65 female
20  21    Jack Hoff  Eight    69    male

Program to print last 5 rows of DataFrame.

df2.tail(5)

    id   name class  mark gender
16  17  Tumyu   Six    54   male
17 18 Honny Five 75 male
18 19 Tinny Nine 18 male
19 20 Jackly Nine 65 female
20 21 Jack Hoff Eight 69 male

Program to print first 8 rows of DataFrame.

df2.head(8)

    id          name  class  mark  gender
0    1      John Deo   Four    75  female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3    4   Krish Star   Four    60  female
4    5    John Mike   Four    60  female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
7 8 Asruid Five 85 male
Program to check if any value is null in file.
df2.isnull()

    index  company  body-style  wheel-base  length  engine-type \
0   False    False       False       False   False        False
1 False False False False False False
2 False False False False False False
3 False False False False False False
4 False False False False False False .. ... ... ... ... ... ...
56 False False False False False False
57 False False False False False False
58 False False False False False False
59 False False False False False False
60 False False False False False False

   num-of-cylinders  horsepower  average-mileage  price
0             False       False            False  False
1 False False False False
2 False False False False
3 False False False False
4 False False False False .. ... ... ... ...
56 False False False False
57 False False False False
58 False False False False
59 False False False False
60 False False False False

[61 rows x 10 columns]

Program to check how many nulls are present in a column.


df2.isnull().sum()

index                   0
company                 0
body-style              0
wheel-base              0
length                  0
engine-type             0
num-of-cylinders        0
horsepower              0
average-mileage         0
price                   3
dtype: int64

Program to drop mark column from dataframe.

df.drop('mark', axis=1)

    id          name  class  gender
0    1      John Deo   Four  female
1 2 Max Ruin Three male
2 3 Arnold Three male
3 4 Krish Star Four female
4 5 John Mike Four female
5 6 Alex John Four male
6 7 My John Rob Fifth male
7 8 Asruid Five male
8 9 Tes Qry Six male
9 10 Big John Four female
10 11 Ronald Six female
11 12 Recky Six female
12 13 Kty Seven female
13 14 Bigy Seven female
14 15 Tade Row Four male
15 16 Gimmy Four male
16 17 Tumyu Six male
17 18 Honny Five male
18 19 Tinny Nine male
19 20 Jackly Nine female
20 21 Babby John Four female
21 22 Reggid Seven female
22 23 Herod Eight male
23 24 Tiddy Now Seven male
24 25 Giff Tow Seven male
25 26 Crelea Seven male
26 27 Big Nose Three female
27 28 Rojj Base Seven female
28 29 Tess Played Seven male
29 30 Reppy Red Six female
30 31 Marry Toeey Four male
31 32 Binn Rott Seven female
32 33 Kenn Rein Six female
33 34 Gain Toe Seven male
34 35 Rows Noump Six female

Day 8 Assignment - On MatplotLib(Data Visualization)

Download CSV File from Link and answer the following questions -
https://drive.google.com/file/d/1-
PbK5h1Msmw2LRysPWNvJQkNBEZac6YK/view?usp=sharing
from google.colab import drive
drive.mount('/content/drive') Mounted at
/content/drive

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("/content/drive/MyDrive/Summer Training/csv files/company_sales_data.csv")

q1: Read Total profit of all months and show it using a line plot
X label name = Month Number
Y label name = Total profit

x = df['month_number']
y = df['total_profit']
plt.xlabel('Month Number')
plt.ylabel('Total profit')
plt.plot(x, y)

[<matplotlib.lines.Line2D at 0x7b0ecd676f20>]

q2: Get total profit of all months and show line plot with the following Style properties
Line Style dotted and Line-color should be red

Show legend at the lower right location.

X label name = Month Number

Y label name = Sold units number

Add a circle marker.


Line marker color as red

Line width should be 3


x = df['month_number']
y = df['total_profit']
plt.plot(x, y, 'ro:', linewidth=3)
plt.xlabel('Month Number')
plt.ylabel('Sold units number')
plt.legend(['Units'], loc='lower right')

<matplotlib.legend.Legend at 0x7b0ecd599e70>

q3: Read all product sales data and show it using a multiline plot
Display the number of units sold per month for each product using multiline plots. (i.e., Separate Plotline
for each product ).
x = df['month_number']
y1 = df['facecream']
y2 = df['facewash']
y3 = df['toothpaste']
y4 = df['bathingsoap']
y5 = df['shampoo']
y6 = df['moisturizer']

plt.plot(x, y1, label = 'Facecream')
plt.plot(x, y2, label = 'Facewash')
plt.plot(x, y3, label = 'Toothpaste')
plt.plot(x, y4, label = 'Bathing Soap')
plt.plot(x, y5, label = 'Shampoo')
plt.plot(x, y6, label = 'Moisturizer')
plt.legend()
plt.axis([1, 12, 0, 18000])
plt.xlabel('Month Number')
plt.ylabel('Product units sold')

Text(0, 0.5, 'Product units sold')


q4: Read toothpaste sales data of each month and show it using a scatter plot Also,
add a grid in the plot. gridline style should “–“.
plt.scatter(x, y3)
plt.grid(True, linestyle='--')

q5: Read face cream and facewash product sales data and show it using the bar chart The bar chart should
display the number of units sold per month for each product. Add a separate bar for each product in the
same chart.
plt.bar(x + 0.2, y1, label = 'Facecream', width = 0.4)
plt.bar(x - 0.2, y2, label = 'Facewash', width = 0.4)
plt.xlabel('Month number')
plt.ylabel('Units sold')
plt.title('Product Sales')
plt.legend()

<matplotlib.legend.Legend at 0x7b0ecde840d0>
q6: Read sales data of bathing soap of all months and show it using a bar chart. Save this plot to your hard
disk
plt.bar(x, y4)
plt.xlabel('Month')
plt.ylabel('Units')
plt.legend(['Units sold'])
plt.savefig('MyChart.png')

q7: Read the total profit of each month and show it using the histogram to see the most common profit ranges
y8 = df['total_profit']
plt.hist(y8, rwidth = 0.8, bins = 10)

(array([2., 4., 1., 1., 1., 1., 0., 1., 0., 1.]),
array([183300., 206250., 229200., 252150., 275100., 298050., 321000.,
343950., 366900., 389850., 412800.]),
<BarContainer object of 10 artists>)
q8: Calculate total sale data for last year for each product and show it using a Pie chart
s1 = y1.sum()
s2 = y2.sum()
s3 = y3.sum()
s4 = y4.sum()
s5 = y5.sum()
s6 = y6.sum()

sales = [s1, s2, s3, s4, s5, s6]
prod = ['Facecream', 'Facewash', 'Toothpaste', 'Bathing soap', 'Shampoo', 'Moisturizer']

figureObject, axesObject = plt.subplots()

axesObject.pie(sales, labels = prod, startangle = 90, autopct = '%.2f%%', wedgeprops = {'linewidth': 7})
axesObject.axis('equal')
plt.title('Product Sales')

Text(0.5, 1.0, 'Product Sales')


q9: Read Bathing soap facewash of all months and display it using the Subplot
plt.subplot(2, 2, 1)
plt.plot(x, y4, 'r')
plt.xlabel('Months')
plt.ylabel('Bathing soap units')
plt.title('Bathing Soap Sales')

plt.subplot(2, 2, 2)
plt.plot(x, y2)
plt.xlabel('Months')
plt.ylabel('Facewash units')
plt.title('Facewash Sales')

plt.suptitle('Sales Numbers')

Text(0.5, 0.98, 'Sales Numbers')

q10: Read all product sales data and show it using the stack plot
y = np.vstack([y1, y2, y3, y4, y5, y6])
prod = ['Facecream', 'Facewash', 'Toothpaste', 'Bathing soap', 'Shampoo', 'Moisturizer']

figure, ax = plt.subplots()
ax.stackplot(x, y, labels = prod)
plt.legend()
plt.xlabel('Month number')
plt.ylabel('Units')
plt.title('Product Sales')

Text(0.5, 1.0, 'Product Sales')

#Practice Problems
Program to plot a simple line chart.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)

[<matplotlib.lines.Line2D at 0x7b0ecd630cd0>]
Program to plot a line chart of f(x) = x^3.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 11, 1)
y = x**3
plt.plot(x, y)

[<matplotlib.lines.Line2D at 0x7b0ecd9956c0>]

Program to plot distinct points using scatter function.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(2, 13, 2)
y = x**2
plt.scatter(x, y)

<matplotlib.collections.PathCollection at 0x7b0ecd716290>

Program to plot simple linear graph.


# y = mx+c
import matplotlib.pyplot as plt

x = np.arange(0, 13, 3)
m = int(input("Enter slope of graph: "))
c = int(input("Enter y-intercept: "))
y = m*x + c
plt.plot(x, y)

Enter slope of graph: 2


Enter y-intercept: 12

[<matplotlib.lines.Line2D at 0x7b0ecd32dcf0>]

Program to plot a simple graph for quadratic equation.


# y = ax^2 + bx + c import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 11, 1)
a = int(input("Enter coefficient of x^2: "))
b = int(input("Enter coefficient of x: "))
c = int(input("Enter y-intercept: "))
y = a*x**2 + b*x + c
plt.plot(x, y)

Enter coefficient of x^2: 2


Enter coefficient of x: 3
Enter y-intercept: 7

[<matplotlib.lines.Line2D at 0x7b0ecd1ab5e0>]

Program to plot graph with different colors and shapes.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 11, 2)
y = x + 10
plt.plot(x, y, 'g^:')

[<matplotlib.lines.Line2D at 0x7b0ecd21b1f0>]
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-2, 7, 1)
y = x**4
plt.plot(x, y, 'b--d')

[<matplotlib.lines.Line2D at 0x7b0ecd0b46a0>]

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(12, 23, 2)
y = x*3
plt.plot(x, y, 'r-.')

[<matplotlib.lines.Line2D at 0x7b0ecd11ba30>]
Program to plot multiple graphs while differentiating them with colors.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 11, 1)
y1 = x
y2 = x**2
y3 = x**3
plt.plot(x, y1, 'r--')
plt.plot(x, y2, 'bs:')
plt.plot(x, y3, 'g^-')
# plt.plot(x, y1, 'r--', x, y2, 'bs', x, y3, 'g^') also works instead.

[<matplotlib.lines.Line2D at 0x7b0ecceb8280>]

Program to use legend and title functions in graph.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 6, 1)
y1 = x**2
y2 = x**3
plt.plot(x, y1, 'g^--', label='Parabolic')
plt.plot(x, y2, 'yo:', label='Cubic')
plt.legend()   # to show legend
plt.title('Parabolic v/s Cubic graph')

Text(0.5, 1.0, 'Parabolic v/s Cubic graph')

Program to show grid in graph.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(12, 23, 2)
y = x*3
plt.grid(True)
plt.plot(x, y, 'r-.')

[<matplotlib.lines.Line2D at 0x7b0ecceebf70>]
Program to limit axes in graph.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(12, 23, 2)
y = x*2
plt.plot(x, y, 'r-.')
plt.axis([5, 38, 15, 60])

(5.0, 38.0, 15.0, 60.0)

Program to display labels and title for graph.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(12, 23, 2)
y = x**2
plt.plot(x, y, 'r-.')
plt.xlabel('X - axis')
plt.ylabel('Y - axis')
plt.title('Stonks')

Text(0.5, 1.0, 'Stonks')

Program to plot random points using scatter function.


import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(25)
y = np.random.rand(25)
plt.scatter(x, y)
plt.plot(x, y, 'ro:', markerfacecolor = 'b')

[<matplotlib.lines.Line2D at 0x7b0ecce00100>]

Program to plot bar graph.


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 13, 1)
y = np.arange(500, 6500, 500)

plt.bar(x, y, width = 0.7)
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Bar Graph')

Text(0.5, 1.0, 'Bar Graph')

Program to create a pie chart.


import numpy as np
import matplotlib.pyplot as plt

conts = ['Asia', 'Africa', 'Europe', 'North America', 'South America']
population = [59.69, 16, 9.94, 7.79, 5.68]
explosion = (0.02, 0, 0, 0, 0)

figureObject, axesObject = plt.subplots()

axesObject.pie(population, explode = explosion, labels = conts, autopct = '%.2f%%', startangle = 90, wedgeprops = {'linewidth': 5})
axesObject.axis('equal')

(-1.1199233681493854,
1.1009483364911072,
-1.1066336766829425,
1.1003158893657508)
Program to create a simple histogram.
import numpy as np
import matplotlib.pyplot as plt

val = np.arange(5, 20, 2)
plt.hist(val, rwidth = 0.8, bins = 18)

(array([1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0.,
1.]),
array([ 5. , 5.77777778, 6.55555556, 7.33333333, 8.11111111, 8.88888889, 9.66666667,
10.44444444, 11.22222222, 12. ,
12.77777778, 13.55555556, 14.33333333, 15.11111111, 15.88888889,
16.66666667, 17.44444444, 18.22222222, 19. ]),
<BarContainer object of 18 artists>)
Program to create Stack plot.

import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 7, 8, 9, 16]
y1 = [3, 4, 5, 7, 2, 8]
y2 = [4, 2, 4, 4, 5, 12]
y3 = [2, 1, 4, 1, 9, 7]

lb = ['Y1', 'Y2', 'Y3']
y = np.vstack([y1, y2, y3])

plt.stackplot(x, y, labels = lb)
plt.xlabel('X values')
plt.ylabel('Y values')
plt.legend(loc = 'upper left')
plt.title('Stack Plot')

Text(0.5, 1.0, 'Stack Plot')


Chapter 2- Machine Learning

Machine learning is a type of artificial intelligence (AI) that allows software applications to become more
accurate in predicting outcomes without being explicitly programmed to do so. Machine learning algorithms
use historical data as input to predict new output values.

Python is a general-purpose programming language that is often used for machine learning applications. It
is easy to learn and has a large library of machine learning modules and libraries.

There are two main types of machine learning in Python: supervised learning and unsupervised learning.

• Supervised learning is when the computer is given labelled data, meaning that the output values are
known. The computer then learns to predict the output values for new data based on the labelled data.
• Unsupervised learning is when the computer is given unlabelled data. The computer then learns to
find patterns in the data without any prior knowledge of the output values.

Some of the most common machine learning algorithms in Python include:

• Linear regression: This algorithm is used to predict continuous values, such as the price of a house
or the number of sales.
• Logistic regression: This algorithm is used to predict categorical values, such as whether a customer
will click on an ad or not.
• Decision trees: This algorithm is used to create a decision tree that can be used to classify or predict
data.
• Support vector machines (SVMs): This algorithm is used to find the best hyperplane that separates
two classes of data.
• Neural networks: This algorithm is a type of machine learning that is inspired by the human brain. It
is used to solve complex problems, such as image recognition and natural language processing.

Machine learning with Python is a powerful tool that can be used to solve a wide variety of problems. It is
a relatively easy language to learn, and there are many resources available to help you get started.

Here are some of the advantages of using Python for machine learning:

• Python is a general-purpose language, which means that it can be used for a variety of tasks, not just
machine learning.
• Python is easy to learn and use.
• There are many machine learning libraries and modules available for Python.
• Python is open source, which means that it is free to use and modify.

Linear Regression

Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical
method that is used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. In other words, it finds how the value of the dependent variable changes with the value of the independent variable.

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,

y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = random error
The values for x and y variables are training datasets for Linear Regression model representation.

Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

• Simple Linear Regression: If a single independent variable is used to predict the value of a numerical dependent variable, the algorithm is called Simple Linear Regression. A short sketch follows this list.
• Multiple Linear Regression: If more than one independent variable is used to predict the value of a numerical dependent variable, the algorithm is called Multiple Linear Regression.
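A minimal sketch of simple linear regression with scikit-learn; the tiny dataset below is invented purely for illustration, and the fitted intercept_ and coef_ play the roles of a0 and a1 in the equation above.

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data following y = 1 + 2x with no noise
X = np.array([[1], [2], [3], [4], [5]])   # independent variable (one column)
y = np.array([3, 5, 7, 9, 11])            # dependent variable

model = LinearRegression()
model.fit(X, y)

print(model.intercept_)       # a0 (intercept), about 1.0 here
print(model.coef_)            # a1 (regression coefficient), about 2.0 here
print(model.predict([[6]]))   # predicted y for a new x, about 13.0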

Logistic Regression

• Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
• Logistic regression works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from linear regression in terms of how it is used.
• Logistic regression uses the sigmoid function (also called the logistic function) to map predictions to probabilities; the model is fitted by minimizing a cost function such as log loss. A short sketch of the sigmoid follows this list.
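A small sketch of the sigmoid function itself; the input values are chosen only for illustration.

import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the usual decision boundary
print(sigmoid(4))    # close to 1 (predicted class 1)
print(sigmoid(-4))   # close to 0 (predicted class 0)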

KNN Algorithm

• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
• K-NN assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using K-NN.
• K-NN can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
• K-NN is a non-parametric algorithm, which means it makes no assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at classification time, performs an action on the dataset.
• At the training phase K-NN just stores the dataset, and when it gets new data it classifies that data into the category most similar to the new data; a short sketch follows this list.
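A hedged sketch of K-NN classification with scikit-learn on the built-in iris dataset; the choice of K = 5 and the 70/30 split are arbitrary illustrations.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbours
knn.fit(X_train, y_train)                   # "training" only stores the data
print(knn.score(X_test, y_test))            # accuracy on the held-out 30%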

Support Vector Machine


Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However, primarily, it is used for Classification problems in Machine
Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space
into classes so that we can easily put the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
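A minimal sketch of an SVM classifier with scikit-learn; the linear kernel and the 70/30 split are arbitrary choices for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel='linear')           # fit a linear hyperplane
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))     # classification accuracy
print(len(svm.support_vectors_))     # number of support vectors defining the boundary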

Naïve Bayes Classifier

• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and classifying
articles.

Decision Tree Classification Algorithm

• Decision Tree is a Supervised learning technique that can be used for both classification and Regression
problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where
internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used
to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and
do not contain any further branches.
• The decisions or the test are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given
conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further
branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees; a short sketch follows this list.
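A minimal sketch with scikit-learn's DecisionTreeClassifier, whose training procedure is CART-style as the text describes; the shallow max_depth is an arbitrary choice so the printed rules stay readable.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree))   # the learned decision rules as plain text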

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be
used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which
is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the
model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of
the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one
decision tree, the random forest takes the prediction from each tree and based on the majority votes of predictions,
and it predicts the final output.
A greater number of trees in the forest generally leads to higher accuracy and helps prevent the problem of overfitting. A short sketch follows.
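An illustrative sketch with scikit-learn's RandomForestClassifier; 100 trees is the library default, restated here for clarity, and the data split is arbitrary.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 decision trees
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))   # prediction is the majority vote of the trees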

K-Means Clustering

K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabelled dataset into different
clusters. Here K defines the number of pre-defined clusters that need to be created in the process, as if K=2, there will
be two clusters, and for K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.

It allows us to cluster the data into different groups and a convenient way to discover the categories of groups in the
unlabelled dataset on its own without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to
minimize the sum of distances between the data point and their corresponding clusters.

The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:

• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the particular k-center,
create a cluster.
• Hence each cluster has datapoints with some commonalities, and it is away from other clusters.
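A minimal sketch of K-means with scikit-learn; the six 2-D points are invented so that the two clusters are obvious by eye.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # three points near x = 1
              [10, 2], [10, 4], [10, 0]])    # three points near x = 10

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster assignment of each point
print(km.cluster_centers_)   # the two centroids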

Apriori Algorithm in Machine Learning

The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected. The algorithm uses a breadth-first search and a hash tree to calculate the itemset associations efficiently. It is an iterative process for finding the frequent itemsets in a large dataset.
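As an illustration, one common implementation is the third-party mlxtend library (an assumption on my part; the text does not prescribe a library). The four transactions below are invented.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['bread', 'butter', 'milk'],
                ['bread', 'butter'],
                ['milk', 'eggs'],
                ['bread', 'butter', 'eggs']]

# one-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(df, min_support=0.5, use_colnames=True)        # frequent itemsets
rules = association_rules(frequent, metric='confidence', min_threshold=0.7)
print(rules[['antecedents', 'consequents', 'support', 'confidence']])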

Association Rule Learning

Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item
on another data item and maps accordingly so that it can be more profitable. It tries to find some interesting relations
or associations among the variables of dataset. It is based on different rules to discover the interesting relations between
variables in the database.

Association rule learning is one of the important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Market basket analysis is a technique used by big retailers to discover the associations between items. We can understand it by taking the example of a supermarket, where all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk, so these products are stored on the same shelf or nearby. A small worked example of the support and confidence measures follows.
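Support and confidence are the standard measures behind such rules (the terms are not defined in the text above, so this is an added illustration); the five tiny transactions below are invented.

# Worked illustration of support and confidence for a rule bread -> butter
transactions = [
    {'bread', 'butter'},
    {'bread', 'butter', 'milk'},
    {'bread', 'eggs'},
    {'milk', 'eggs'},
    {'bread', 'butter', 'eggs'},
]

n = len(transactions)
support_bread = sum('bread' in t for t in transactions) / n              # 4/5 = 0.8
support_both = sum({'bread', 'butter'} <= t for t in transactions) / n   # 3/5 = 0.6
confidence = support_both / support_bread                                # 0.75

print(support_both)   # support of the rule bread -> butter
print(confidence)     # estimated P(butter | bread)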
DAY 1 Assignment (Zomato dataset)

1. Read csv
2. Display no. of columns
3. Describe the dataset
4. Check how many columns with null are there
5. Read excel file
6. Merge both files (country code common)
7. Display the final column list
8. Plot piechart (countryvalue vs label)
9. Plot piechart for highest 3 countries
10. Groupby aggregate rating, rating color, rating text
11. Plot bar aggregate rating vs rating count
12. Count plot rating color vs rating count
13. Which currency used by which country (groupby) ?
14. Which country has online delivery options ?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#1 Read csv
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/zomato.csv')
df

Restaurant ID Restaurant Name Country Code City \


0 6317637 Le Petit Souffle 162 Makati City
1 6304287 Izakaya Kikufuji 162 Makati City
2 6300002 Heat - Edsa Shangri-La 162 Mandaluyong City
3 6318506 Ooma 162 Mandaluyong City
4 6314302 Sambo Kojin 162 Mandaluyong City ... ... ...
... ...
9546 5915730 NamlÛ± Gurme 208 ÛÁstanbul
9547 5908749 Ceviz AÛôacÛ± 208 ÛÁstanbul
9548 5915807 Huqqa 208 ÛÁstanbul 9549 5916112 A ô ôk Kahve
208 ÛÁstanbul 9550 5927402 Walter's Coffee Roastery 208 ÛÁstanbul
Address \ 0 Third Floor, Century City Mall,
Kalayaan Avenu...
1 Little Tokyo, 2277 Chino Roces Avenue, Legaspi...
2 Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...
3 Third Floor, Mega Fashion Hall, SM Megamall, O...
4 Third Floor, Mega Atrium, SM Megamall, Ortigas...
... ... 9546 Kemanke ô Karamustafa Pa ôa Mahallesi,
RÛ±htÛ±... 9547 Ko ôuyolu Mahallesi, Muhittin íìstí_ndaÛô Cadd... 9548
Kuruí_e ôme Mahallesi, Muallim Naci Caddesi, N...
9549 Kuruí_e ôme Mahallesi, Muallim Naci Caddesi, N... 9550 CafeaÛôa
Mahallesi, BademaltÛ± Sokak, No 21/B,...

Locality \
0 Century City Mall, Poblacion, Makati City
1 Little Tokyo, Legaspi Village, Makati City
2 Edsa Shangri-La, Ortigas, Mandaluyong City
3 SM Megamall, Ortigas, Mandaluyong City
4 SM Megamall, Ortigas, Mandaluyong City ... ...
9546 Karakí_y 9547 Ko
ôuyolu 9548 Kuruí_e ôme
9549 Kuruí_e ôme 9550
Moda

Locality Verbose Longitude \


0 Century City Mall, Poblacion, Makati City, Mak... 121.027535
1 Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101
2 Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... 121.056831
3 SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.056475
4 SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.057508 ... ...
...
9546 Karakí_y, ÛÁstanbul 28.977392 9547 Ko ôuyolu,
ÛÁstanbul 29.041297 9548 Kuruí_e ôme, ÛÁstanbul 29.034640
9549 Kuruí_e ôme, ÛÁstanbul 29.036019 9550
Moda, ÛÁstanbul 29.026016
Latitude Cuisines ... Currency \
0 14.565443 French, Japanese, Desserts ... Botswana Pula(P)
1 14.553708 Japanese ... Botswana Pula(P)
2 14.581404 Seafood, Asian, Filipino, Indian ... Botswana Pula(P)
3 14.585318 Japanese, Sushi ... Botswana Pula(P)
4 14.584450 Japanese, Korean ... Botswana Pula(P) ... ... ... ...
...
9546 41.022793 Turkish ... Turkish Lira(TL)
9547 41.009847 World Cuisine, Patisserie, Cafe ... Turkish Lira(TL)
9548 41.055817 Italian, World Cuisine ... Turkish Lira(TL)
9549 41.057979 Restaurant Cafe ... Turkish Lira(TL) 9550 40.984776 Cafe ...
Turkish Lira(TL)

Has Table booking Has Online delivery Is delivering now \


0 Yes No No
1 Yes No No
2 Yes No No
3 No No No
4 Yes No No ... ... ... ...
9546 No No No
9547 No No No
9548 No No No
9549 No No No
9550 No No No

Switch to order menu Price range Aggregate rating Rating color \


0 No 3 4.8 Dark Green
1 No 3 4.5 Dark Green
2 No 4 4.4 Green
3 No 4 4.9 Dark Green
4 No 4 4.8 Dark Green ... ... ... ... ...
9546 No 3 4.1 Green
9547 No 3 4.2 Green
9548 No 4 3.7 Yellow 9549 No 4 4.0
Green 9550 No 2 4.0 Green

Rating text Votes


0 Excellent 314
1 Excellent 591
2 Very Good 270
3 Excellent 365
4 Excellent 229 ... ... ...
9546 Very Good 788
9547 Very Good 1034
9548 Good 661
9549 Very Good 901
9550 Very Good 591
[9551 rows x 21 columns]

#2 Display no. of columns
len(df.columns)

21

#3 Describe the dataset
df.describe()

       Restaurant ID  Country Code    Longitude     Latitude
count   9.551000e+03   9551.000000  9551.000000  9551.000000
mean    9.051128e+06     18.365616    64.126574    25.854381
std     8.791521e+06     56.750546    41.467058    11.007935
min     5.300000e+01      1.000000  -157.948486   -41.330428
25%     3.019625e+05      1.000000    77.081343    28.478713
50%     6.004089e+06      1.000000    77.191964    28.570469
75%     1.835229e+07      1.000000    77.282006    28.642758
max     1.850065e+07    216.000000   174.832089    55.976980

       Average Cost for two  Price range  Aggregate rating         Votes
count           9551.000000  9551.000000       9551.000000   9551.000000
mean            1199.210763     1.804837          2.666370    156.909748
std            16121.183073     0.905609          1.516378    430.169145
min                0.000000     1.000000          0.000000      0.000000
25%              250.000000     1.000000          2.500000      5.000000
50%              400.000000     2.000000          3.200000     31.000000
75%              700.000000     2.000000          3.700000    131.000000
max           800000.000000     4.000000          4.900000  10934.000000

#4 Check how many columns with null are there
df.isna().any()

Restaurant ID False
Restaurant Name False
Country Code False
City False
Address False
Locality False
Locality Verbose False
Longitude False
Latitude False
Cuisines True
Average Cost for two False
Currency False
Has Table booking False
Has Online delivery False
Is delivering now False
Switch to order menu False
Price range False
Aggregate rating False
Rating color False
Rating text             False
Votes                   False
dtype: bool

#5 Read excel file
df2 = pd.read_excel('/content/drive/MyDrive/Summer Training/EXCEL files/Country-Code.xlsx')
df2

Country Code Country


0 1 India
1 14 Australia
2 30 Brazil
3 37 Canada
4 94 Indonesia
5 148 New Zealand
6 162 Phillipines
7 166 Qatar
8 184 Singapore
9 189 South Africa
10 191 Sri Lanka
11 208 Turkey
12 214 UAE
13 215 United Kingdom
14 216 United States

#6 Merge both files (country code common)
df3 = pd.merge(df, df2)
df3

Restaurant ID Restaurant Name Country Code City \


0 6317637 Le Petit Souffle 162 Makati City
1 6304287 Izakaya Kikufuji 162 Makati City
2 6300002 Heat - Edsa Shangri-La 162 Mandaluyong City
3 6318506 Ooma 162 Mandaluyong City
4 6314302 Sambo Kojin 162 Mandaluyong City ... ... ...
... ...
9546 5915730 NamlÛ± Gurme 208 ÛÁstanbul
9547 5908749 Ceviz AÛôacÛ± 208 ÛÁstanbul
9548 5915807 Huqqa 208 ÛÁstanbul 9549 5916112 A ô ôk Kahve
208 ÛÁstanbul 9550 5927402 Walter's Coffee Roastery 208 ÛÁstanbul
Address \ 0 Third Floor, Century City Mall,
Kalayaan Avenu...
1 Little Tokyo, 2277 Chino Roces Avenue, Legaspi...
2 Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...
3 Third Floor, Mega Fashion Hall, SM Megamall, O...
4 Third Floor, Mega Atrium, SM Megamall, Ortigas...
... ... 9546 Kemanke ô Karamustafa Pa ôa Mahallesi,
RÛ±htÛ±... 9547 Ko ôuyolu Mahallesi, Muhittin íìstí_ndaÛô Cadd... 9548
Kuruí_e ôme Mahallesi, Muallim Naci Caddesi, N... 9549 Kuruí_e ôme
Mahallesi, Muallim Naci Caddesi, N... 9550 CafeaÛôa Mahallesi, BademaltÛ±
Sokak, No 21/B,...

Locality \
0 Century City Mall, Poblacion, Makati City
1 Little Tokyo, Legaspi Village, Makati City
2 Edsa Shangri-La, Ortigas, Mandaluyong City
3 SM Megamall, Ortigas, Mandaluyong City
4 SM Megamall, Ortigas, Mandaluyong City ... ...
9546 Karakí_y 9547 Ko
ôuyolu 9548 Kuruí_e ôme
9549 Kuruí_e ôme 9550
Moda

Locality Verbose Longitude \


0 Century City Mall, Poblacion, Makati City, Mak... 121.027535
1 Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101
2 Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... 121.056831
3 SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.056475
4 SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.057508 ... ...
...
9546 Karakí_y, ÛÁstanbul 28.977392 9547 Ko ôuyolu,
ÛÁstanbul 29.041297 9548 Kuruí_e ôme, ÛÁstanbul 29.034640
9549 Kuruí_e ôme, ÛÁstanbul 29.036019 9550 Moda,
ÛÁstanbul 29.026016

Latitude Cuisines ... Has Table booking \


0 14.565443 French, Japanese, Desserts ... Yes
1 14.553708 Japanese ... Yes
2 14.581404 Seafood, Asian, Filipino, Indian ... Yes
3 14.585318 Japanese, Sushi ... No
4 14.584450 Japanese, Korean ... Yes ... ... ... ... ...
9546 41.022793 Turkish ... No
9547 41.009847 World Cuisine, Patisserie, Cafe ... No
9548 41.055817 Italian, World Cuisine ... No
9549 41.057979 Restaurant Cafe ... No 9550 40.984776 Cafe ...
No
Has Online delivery Is delivering now Switch to order menu Price range \
0 No No No 3
1 No No No 3
2 No No No 4
3 No No No 4
4 No No No 4 ... ... ... ...
...
9546 No No No 3
9547 No No No 3
9548 No No No 4
9549 No No No 4 9550 No No
No 2
Aggregate rating Rating color Rating text Votes Country
0 4.8 Dark Green Excellent 314 Phillipines
1 4.5 Dark Green Excellent 591 Phillipines
2 4.4 Green Very Good 270 Phillipines
3 4.9 Dark Green Excellent 365 Phillipines
4 4.8 Dark Green Excellent 229 Phillipines ... ... ... ... ... ...
9546 4.1 Green Very Good 788 Turkey
9547 4.2 Green Very Good 1034 Turkey
9548 3.7 Yellow Good 661 Turkey
9549 4.0 Green Very Good 901 Turkey
9550 4.0 Green Very Good 591 Turkey

[9551 rows x 22 columns]

#7 Display the final column list
df3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9551 entries, 0 to 9550
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype
--- ------ -------------- -----
0 Restaurant ID 9551 non-null int64
1 Restaurant Name 9551 non-null object
2 Country Code 9551 non-null int64
3 City 9551 non-null object
4 Address 9551 non-null object
5 Locality 9551 non-null object
6 Locality Verbose 9551 non-null object
7 Longitude 9551 non-null float64
8 Latitude 9551 non-null float64
9 Cuisines 9542 non-null object
10 Average Cost for two 9551 non-null int64
11 Currency 9551 non-null object
12 Has Table booking 9551 non-null object
13 Has Online delivery 9551 non-null object
14 Is delivering now 9551 non-null object
15 Switch to order menu 9551 non-null object
16 Price range 9551 non-null int64
17 Aggregate rating 9551 non-null float64
18 Rating color 9551 non-null object
19 Rating text 9551 non-null object
 20  Votes                 9551 non-null   int64
 21  Country               9551 non-null   object
dtypes: float64(3), int64(5), object(14)
memory usage: 1.7+ MB

#8 Plot piechart (countryvalue vs label)
labels1 = df3.Country.value_counts().index
values = df3.Country.value_counts().values
plt.pie(values, labels = labels1, autopct = "%1.2f%%")

[Pie chart output — share of restaurants by country: India 90.59%, United States 4.54%, United Kingdom 0.84%, Brazil 0.63%, UAE 0.63%, South Africa 0.63%, New Zealand 0.42%, Turkey 0.36%, Australia 0.25%, Phillipines 0.23%, Indonesia 0.22%, Singapore 0.21%, Qatar 0.21%, Sri Lanka 0.21%, Canada 0.04%]
#9 Plot piechart for highest 3 countries

plt.pie(values[:3], labels = labels1[:3], autopct = "%1.2f%%")

[Pie chart output — top 3 countries: India 94.39%, United States 4.73%, United Kingdom 0.87%]
#10 Groupby aggregate rating, rating color, rating text

df4 = df3.groupby(['Aggregate rating', 'Rating color', 'Rating text']).size().reset_index().rename(columns={0: 'Rating Count'})
df4

Aggregate rating Rating color Rating text Rating Count


0 0.0 White Not rated 2148
1 1.8 Red Poor 1
2 1.9 Red Poor 2
3 2.0 Red Poor 7
4 2.1 Red Poor 15
5 2.2 Red Poor 27
6 2.3 Red Poor 47
7 2.4 Red Poor 87
8 2.5 Orange Average 110
9 2.6 Orange Average 191
10 2.7 Orange Average 250
11 2.8 Orange Average 315
12 2.9 Orange Average 381
13 3.0 Orange Average 468
14 3.1 Orange Average 519
15 3.2 Orange Average 522
16 3.3 Orange Average 483
17 3.4 Orange Average 498
18 3.5 Yellow Good 480
19 3.6 Yellow Good 458
20 3.7 Yellow Good 427
21 3.8 Yellow Good 400
22 3.9 Yellow Good 335
23 4.0 Green Very Good 266
24 4.1 Green Very Good 274
25 4.2 Green Very Good 221
26 4.3 Green Very Good 174
27 4.4 Green Very Good 144
28 4.5 Dark Green Excellent 95
29 4.6 Dark Green Excellent 78
30 4.7 Dark Green Excellent 42
31 4.8 Dark Green Excellent 25
32 4.9 Dark Green Excellent 61

#11 Plot bar aggregate rating vs rating count

sns.barplot(data = df4, x = "Aggregate rating", y = "Rating Count")

<Axes: xlabel='Aggregate rating', ylabel='Rating Count'>

#12 Count plot rating color vs rating count

sns.countplot(data = df4, x = 'Rating color', hue = 'Rating Count')

<Axes: xlabel='Rating color', ylabel='count'>


#13 Which currency is used by which country? (groupby)

df5 = df3.groupby(['Country', 'Currency']).size().reset_index()
df5

Country Currency 0
0 Australia Dollar($) 24
1 Brazil Brazilian Real(R$) 60
2 Canada Dollar($) 4
3 India Indian Rupees(Rs.) 8652
4 Indonesia Indonesian Rupiah(IDR) 21
5 New Zealand NewZealand($) 40
6 Phillipines Botswana Pula(P) 22
7 Qatar Qatari Rial(QR) 20
8 Singapore Dollar($) 20
9 South Africa Rand(R) 60
10 Sri Lanka Sri Lankan Rupee(LKR) 20
11 Turkey Turkish Lira(TL) 34
12 UAE Emirati Diram(AED) 60
13 United Kingdom Pounds(£) 80
14 United States Dollar($) 434

#14 Which country has online delivery options?

df6 = df3.groupby(['Country', 'Has Online delivery']).size().reset_index()
df6 = df6[df6['Has Online delivery'] == 'Yes']
df6

Country Has Online delivery 0


4 India Yes 2423
14 UAE Yes 28
DAY 2 Assignment

from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/mle_dataset.csv')
df.head()

Email \
0 [email protected]
1 [email protected]
2 [email protected]
3 [email protected]
4 [email protected]

Address Avatar \
0 835 Frank Tunnel Wrightmouth, MI 82180-9605 Violet
1 4547 Archer Common Diazchester, CA 06566-8576 DarkGreen
2 24645 Valerie Unions Suite 582 Cobbborough Bisque
3 1414 David Throughway Port Jason, OH 22070-1220 SaddleBrown
4 14023 Rodriguez Passage Port Jacobville, PR MediumAquaMarine
Avg. Session Length Time on App Time on Website Length of Membership \
0 34.49 12.65 39.57 4.08
1 31.92 11.10 37.26 2.66
2 33.00 11.33 37.11 4.10
3 34.30 13.71 36.72 3.12
4 33.33 12.79 37.53 4.44
Yearly Amount Spent
0 587.95
1 392.42
2 487.54
3 581.85
4 599.64

df.describe()

Avg. Session Length Time on App Time on Website \
count 22.000000 22.000000 22.000000
mean 33.747091 12.386818 36.794091
std 1.196860 1.199748 1.840602
min 31.920000 10.370000 33.250000
25% 32.767500 11.335000 35.392500
50% 33.555000 12.565000 37.135000
75% 34.689500 13.405000 37.837500
max 35.890000 14.280000 39.740000

Length of Membership Yearly Amount Spent
count 22.000000 22.000000
mean 3.588636 519.849545
std 0.652426 87.161733
min 2.450000 329.930000
25% 3.127500 462.902500
50% 3.620000 550.755000
75% 4.155000 594.272500
max 4.580000 634.180000

grp = sns.JointGrid(data = df, x = 'Time on Website', y = 'Yearly Amount Spent')
grp.plot(sns.scatterplot, sns.histplot)

<seaborn.axisgrid.JointGrid at 0x7a80a03a30d0>

** Do the same but with the Time on App column instead. **


grp2 = sns.JointGrid(data = df, x = 'Time on App', y = 'Yearly Amount Spent')
grp2.plot(sns.scatterplot, sns.histplot)

<seaborn.axisgrid.JointGrid at 0x7a809e01fd60>
** Use jointplot to create a 2D hex bin plot comparing Time on App and Length of Membership.**
grp3 = sns.jointplot(data = df, x = 'Time on App', y = 'Length of Membership', kind = "hex")
grp4 = sns.PairGrid(data = df)
grp4.map_diag(sns.histplot)
grp4.map_offdiag(sns.scatterplot)

<seaborn.axisgrid.PairGrid at 0x7a809de3b5b0>
Create a linear model plot (using seaborn's lmplot) of Yearly Amount Spent vs. Length of Membership.

sns.set_style("whitegrid")
sns.lmplot(data = df, x = "Yearly Amount Spent", y = "Length of Membership")

<seaborn.axisgrid.FacetGrid at 0x7a809c8dfac0>
Training and Testing Data

Now that we've explored the data a bit, let's go ahead and split the data into training and testing sets. ** Set a variable X equal to the numerical features of the customers and a variable y equal to the "Yearly Amount Spent" column. **

from sklearn.model_selection import train_test_split

X = df.iloc[:, 3:-1]
y = df.iloc[:, -1]

** Use model_selection.train_test_split from sklearn to split the data into training and testing sets. Set
test_size=0.3 and random_state=101**
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101)

X_train

Avg. Session Length Time on App Time on Website Length of Membership


1 31.92 11.10 37.26 2.66
12 32.23 13.84 36.62 4.27
18 33.57 13.57 34.95 4.16
0 34.49 12.65 39.57 4.08
5 32.25 14.28 35.46 4.58
14 35.89 10.37 33.58 4.26
4 33.33 12.79 37.53 4.44
8 33.54 13.45 35.12 3.42
13 33.11 11.74 37.94 3.25
9 35.45 10.45 33.25 2.96
15 34.45 13.27 35.92 2.45
21 32.69 13.27 35.37 3.18
6 35.48 13.54 37.26 3.15
17 32.24 11.35 38.82 2.68
11 33.73 11.26 39.74 4.18

Training the Model

Now it's time to train our model on our training data!


** Import LinearRegression from sklearn.linear_model **
from sklearn.linear_model import LinearRegression

Create an instance of a LinearRegression() model named lm.


reg = LinearRegression()

** Train/fit lm on the training data.**


reg.fit(X_train, y_train)

LinearRegression()

Print out the coefficients of the model

print(reg.intercept_)
print(reg.coef_)

-248.07210567565767
[-1.31257805 12.61911111 14.43841382 38.29792225]

Predicting Test Data

Now that we have fit our model, let's evaluate its performance by predicting off the test values!
** Use lm.predict() to predict off the X_test set of the data.**
y_pred = reg.predict(X_test)

** Create a scatterplot of the real test values versus the predicted values. **
plt.scatter(y_test, y_pred)
plt.xlabel("Y Test")
plt.ylabel("Predicted Y")
plt.show()
Evaluating the Model
from sklearn import metrics

print('MeanAbsoluteError:', metrics.mean_absolute_error(y_test, y_pred))
print('MeanSquareError:', metrics.mean_squared_error(y_test, y_pred))
print('RootMeanSquareError:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

MeanAbsoluteError: 88.29997116869201
MeanSquareError: 11725.316780593896
RootMeanSquareError: 108.28350188553146
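Alongside MAE and RMSE, the coefficient of determination R² is a common complement; a minimal illustrative sketch using the same metrics module (y_test and y_pred as above):

# R^2 is the fraction of variance explained; 1.0 is a perfect fit, 0.0 matches predicting the mean.
print('R2 score:', metrics.r2_score(y_test, y_pred))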
KNN Model Assignment

# Manually coding the kNN algorithm.

import numpy as np
from collections import Counter
def euclid_distance(x1, x2):
    distance = np.sqrt(np.sum((x1 - x2)**2))
    return distance

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        predictions = [self._predict(x) for x in X]
        return predictions

    def _predict(self, x):
        # compute the distance to every training point
        distances = [euclid_distance(x, x_train) for x_train in self.X_train]

        # get the closest k neighbours
        k_indices = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices]

        # majority vote
        most_common = Counter(k_nearest_labels).most_common()
        return most_common[0][0]

# Importing necessary libraries and making a euclidean distance function for calculating distances.

from sklearn.metrics import *
from sklearn.neighbors import KNeighborsClassifier
from numpy.linalg import norm

def euclidean(a, b):
    return norm(a - b) # Computing the euclidean distance between a and b

# Testing the kNN model

raw_data = np.array([[1, 2, 1], [3, 2, 1], [2, 4, 1], [3, 3, 1], [2, 5, 1],
                     [-1, -2, 0], [-3, -2, 0], [-2, -4, 0], [-3, -3, 0], [-2, -5, 0]], dtype = float)
X = raw_data[:, :2]
y = raw_data[:, -1]

model = KNeighborsClassifier(n_neighbors = 3, metric = euclidean)
model.fit(X, y)

print("Value: {},\tPrediction: {}".format([1, 0], model.predict(np.array([[1, 0]]))))
print("Value: {},\tPrediction: {}".format([0, 1], model.predict(np.array([[0, 1]]))))
print("Value: {},\tPrediction: {}".format([0, 0], model.predict(np.array([[0, 0]]))))
print("Value: {},\tPrediction: {}".format([-1, 0], model.predict(np.array([[-1, 0]]))))
print("Value: {},\tPrediction: {}".format([0, -1], model.predict(np.array([[0, -1]]))))

Value: [1, 0], Prediction: [1.]
Value: [0, 1], Prediction: [1.]
Value: [0, 0], Prediction: [1.]
Value: [-1, 0], Prediction: [0.]
Value: [0, -1], Prediction: [0.]
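The hand-written KNN class above is defined but never run; a minimal sketch that exercises it on the same toy data, assuming it should agree with the sklearn results printed above:

# Sanity check (illustrative): the manual KNN should match sklearn on this toy data.
manual_knn = KNN(k=3)
manual_knn.fit(X, y)
for point in [[1, 0], [0, 1], [0, 0], [-1, 0], [0, -1]]:
    pred = manual_knn.predict(np.array([point]))
    print("Value: {},\tManual prediction: {}".format(point, pred))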

SVM Model Assignment

# Importing necessary libraries.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# importing and creating dataframe for banana dataset.

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/banana_dataset.csv')

x = df[['At1','At2']].to_numpy()
y = df['Class'].to_numpy()

# Splitting the dataset for training and testing

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.30, random_state = 1)

# Creating a Linear SVM kernel.

linear_svc = SVC(kernel = 'linear')
linear_svc = linear_svc.fit(x_train, y_train)

y_pred = linear_svc.predict(x_test)
print(classification_report(y_test, y_pred))

precision recall f1-score support

-1 0.55 1.00 0.71 869
1 0.00 0.00 0.00 721

accuracy 0.55 1590
macro avg 0.27 0.50 0.35 1590
weighted avg 0.30 0.55 0.39 1590

/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
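The linear kernel collapses to predicting a single class here because the banana dataset is not linearly separable. As a purely illustrative sketch (the grid values are assumptions, not part of the original experiment), one could search over kernels and regularization strength:

# Illustrative hyperparameter search; the grid values below are assumptions.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid'], 'C': [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)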

# Creating a Sigmoid SVM model

sigmoid_svc = SVC(kernel = 'sigmoid')
sigmoid_svc = sigmoid_svc.fit(x_train, y_train)

y_pred = sigmoid_svc.predict(x_test)
print(classification_report(y_test, y_pred))

precision recall f1-score support

-1 0.36 0.38 0.37 869
1 0.21 0.20 0.21 721

accuracy 0.30 1590
macro avg 0.29 0.29 0.29 1590
weighted avg 0.29 0.30 0.30 1590

# Creating a Polynomial SVM model

poly_svc = SVC(kernel = 'poly')
poly_svc = poly_svc.fit(x_train, y_train)

y_pred = poly_svc.predict(x_test)
print(classification_report(y_test, y_pred))

precision recall f1-score support

-1 0.61 0.87 0.72 869
1 0.69 0.34 0.45 721

accuracy 0.63 1590
macro avg 0.65 0.60 0.59 1590
weighted avg 0.65 0.63 0.60 1590

# Creating an RBF SVM model

rbf_svc = SVC(kernel = 'rbf') # note: 'rbf' is the intended kernel here (the original cell reused 'poly')
rbf_svc = rbf_svc.fit(x_train, y_train)

y_pred_rbf = rbf_svc.predict(x_test)
print(classification_report(y_test, y_pred_rbf))

precision recall f1-score support

-1 0.61 0.87 0.72 869
1 0.69 0.34 0.45 721

accuracy 0.63 1590
macro avg 0.65 0.60 0.59 1590
weighted avg 0.65 0.63 0.60 1590

Naïve Bayes Classifier Model Assignment

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Importing the iris dataset

df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
df

Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \


0 1 5.1 3.5 1.4 0.2
1 2 4.9 3.0 1.4 0.2
2 3 4.7 3.2 1.3 0.2
3 4 4.6 3.1 1.5 0.2
4 5 5.0 3.6 1.4 0.2
.. ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3
146 147 6.3 2.5 5.0 1.9
147 148 6.5 3.0 5.2 2.0
148 149 6.2 3.4 5.4 2.3
149 150 5.9 3.0 5.1 1.8

Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica

[150 rows x 6 columns]

# Preprocessing the dataset

X = df[['SepalLengthCm','PetalLengthCm','PetalWidthCm']].to_numpy()

# Import label encoder
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()
df['Species'] = label_encoder.fit_transform(df['Species'])
y = df.Species.to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)

# Training the dataset on the Naive Bayes model

nb_model = GaussianNB()
nb_model = nb_model.fit(X_train, y_train)

y_pred = nb_model.predict(X_test)
print(classification_report(y_test, y_pred))

precision recall f1-score support

0 1.00 1.00 1.00 14
1 0.94 0.94 0.94 18
2 0.92 0.92 0.92 13

accuracy 0.96 45
macro avg 0.96 0.96 0.96 45
weighted avg 0.96 0.96 0.96 45

# Training the dataset on the Logistic Regression model

lg_model = LogisticRegression()
lg_model = lg_model.fit(X_train, y_train)

y_pred = lg_model.predict(X_test)
print(classification_report(y_test, y_pred))

precision recall f1-score support

0 1.00 1.00 1.00 14
1 1.00 0.94 0.97 18
2 0.93 1.00 0.96 13

accuracy 0.98 45
macro avg 0.98 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45

Chapter 3- Deep Learning

Deep learning is a subset of machine learning that uses artificial neural networks to learn from data. Neural
networks are inspired by the human brain, and they are made up of layers of interconnected nodes. Each
node performs a simple calculation, and the output of each node is passed to the nodes in the next layer.

Python is a popular programming language for many different tasks, including data science and machine
learning. It is easy to learn and use, and it has a large and active community of developers. Python also has
many powerful libraries for deep learning, such as TensorFlow, Keras, and PyTorch.

What are the steps involved in deep learning with Python?

The steps involved in deep learning with Python are:

1. Gather data. The first step is to gather the data that you want to train your model on. This data can
be images, text, audio, or any other type of data.
2. Prepare the data. The data needs to be prepared before it can be used to train the model. This involves
cleaning the data, removing any errors or outliers, and transforming the data into a format that the
model can understand.
3. Choose a model. There are many different types of neural networks that can be used for deep
learning. The choice of model depends on the task that you are trying to solve.
4. Train the model. The model is trained by feeding it the prepared data. The model will learn to extract
features from the data and to make predictions.
5. Evaluate the model. Once the model is trained, it needs to be evaluated to see how well it performs.
This is done by testing the model on a held-out set of data that it has not seen before.
6. Deploy the model. Once the model is evaluated and found to be satisfactory, it can be deployed to
production. This means that the model can be used to make predictions on new data.
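These six steps map directly onto a few lines of Keras code. The sketch below is illustrative only: the file name training_data.csv, the binary 'label' column, and all layer sizes are assumptions, not part of the original text.

# A minimal end-to-end sketch of the six steps above (all names are illustrative).
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras

# 1-2. Gather and prepare the data (hypothetical CSV with numeric features and a binary 'label')
df = pd.read_csv('training_data.csv').dropna()
X = df.drop(columns='label').values
y = df['label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 3. Choose a model
model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# 4. Train the model
model.fit(X_train, y_train, epochs=10, verbose=0)

# 5. Evaluate on held-out data
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

# 6. Deploy: persist the trained model so it can serve predictions on new data
model.save('model.h5')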

Deep learning is used in a wide variety of applications, including:

• Computer vision:

Deep learning is used to recognize objects in images and videos. This is used in applications such as
self-driving cars, facial recognition, and image search.

• Natural language processing:


Deep learning is used to understand text and to generate text. This is used in applications such as
machine translation, chatbots, and text summarization.

• Speech recognition:

Deep learning is used to recognize speech. This is used in applications such as voice assistants,
dictation software, and call centers.

• Generative models:

Deep learning is used to create artificial data that is similar to real data. This is used in applications
such as image generation, text generation, and music generation.

Architectures

• Deep Neural Networks

A deep neural network is a neural network with a certain level of complexity, meaning that several hidden layers sit between the input and output layers. Such networks are highly capable of modelling and processing non-linear relationships.

• Deep Belief Networks

A deep belief network (DBN) is a class of deep neural network composed of multiple layers of belief networks. Steps to train a DBN:
o With the help of the Contrastive Divergence algorithm, a layer of features is learned from the visible units.
o Next, the previously trained features are treated as visible units, and a further layer of features is learned from them.
o Lastly, when the learning of the final hidden layer is complete, the whole DBN is trained.

• Recurrent Neural Networks

A recurrent network permits parallel as well as sequential computation and resembles the human brain (a large feedback network of connected neurons). Because recurrent networks can remember the important aspects of the input they have received, they tend to be more precise.

Types of Deep Learning Networks

1. Feed Forward Neural Network


A feed-forward neural network is an artificial neural network in which the nodes do not form a cycle. The perceptrons are organized in layers: the input layer takes the input, and the output layer generates the output. Because the hidden layers do not connect to the outside world, they are called hidden layers. Each perceptron in one layer is connected to every node in the subsequent layer, so all the nodes are fully connected; there are no connections between nodes in the same layer, and there are no back-loops in the network. To minimize the prediction error, the backpropagation algorithm is used to update the weight values, as in the sketch below.
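As a concrete illustration, here is a minimal fully connected feed-forward network in Keras; the layer sizes and the 20-feature input are arbitrary assumptions.

# A minimal sketch of a feed-forward network; sizes are illustrative only.
from tensorflow import keras

ffn = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(20,)),  # hidden layer 1
    keras.layers.Dense(32, activation='relu'),                     # hidden layer 2
    keras.layers.Dense(1, activation='sigmoid'),                   # output layer
])
# Compiling with a gradient-based optimizer lets fit() apply backpropagation to the weights.
ffn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])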

Applications:

• Data Compression
• Pattern Recognition
• Computer Vision
• Sonar Target Recognition
• Speech Recognition
• Handwritten Characters Recognition

2. Recurrent Neural Network


Recurrent neural networks are a variation of feed-forward networks in which each neuron in the hidden layers receives an input with a specific time delay. A recurrent network mainly uses information from preceding iterations: for example, to guess the next word in a sentence, one must know which words were used before it. It not only processes the inputs but also shares weights across time steps, so the size of the model does not grow with the size of the input. The drawbacks are slow computation, the inability to take future inputs into account for the current state, and difficulty remembering information from far in the past.
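To make this concrete, a minimal recurrent model in Keras; the sequence length, feature count, and unit count are illustrative assumptions.

# A minimal sketch of a recurrent network for sequences; shapes are illustrative.
from tensorflow import keras

rnn = keras.Sequential([
    # 10 time steps, 8 features per step; the layer carries a hidden state across steps
    keras.layers.SimpleRNN(32, input_shape=(10, 8)),
    keras.layers.Dense(1),  # e.g. a one-step-ahead time-series prediction
])
rnn.compile(optimizer='adam', loss='mse')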

Applications:

• Machine Translation
• Robot Control
• Time Series Prediction

• Speech Recognition
• Speech Synthesis
• Time Series Anomaly Detection
• Rhythm Learning
• Music Composition

3. Convolutional Neural Network


Convolutional Neural Networks are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. They enable the construction of hierarchical image representations. To achieve the best accuracy on such tasks, deep convolutional neural networks are preferred over any other neural network.
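For illustration, a minimal convolutional stack in Keras; the filter counts, the 64×64 input size, and the 10 output classes are assumptions.

# A minimal sketch of a CNN classifier; filter counts and image size are illustrative.
from tensorflow import keras

cnn = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    keras.layers.MaxPooling2D((2, 2)),  # downsample the feature maps
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),  # e.g. 10 object classes
])
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])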

Applications:

• Identify Faces, Street Signs, Tumors.


• Image Recognition.
• Video Analysis.
• NLP.
• Anomaly Detection.
• Drug Discovery.
• Checkers Game.

4. Restricted Boltzmann Machine


RBMs are a variant of Boltzmann machines. The neurons in the visible (input) layer and the hidden layer have symmetric connections between them, but there are no connections within either layer. In contrast, unrestricted Boltzmann machines do have internal connections inside the hidden layer; removing those connections is the restriction that lets an RBM train efficiently.
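scikit-learn ships a BernoulliRBM trained with contrastive divergence; a minimal sketch on random placeholder data (the data itself is an assumption):

# A minimal sketch using scikit-learn's BernoulliRBM; the input is random placeholder data.
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = (np.random.rand(100, 16) > 0.5).astype(float)  # 100 samples, 16 binary visible units

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20)
rbm.fit(X)                 # trained with persistent contrastive divergence
hidden = rbm.transform(X)  # hidden-unit activation probabilities
print(hidden.shape)        # (100, 8)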
Applications:

• Filtering.
• Feature Learning.
• Classification.
• Risk Detection.
• Business and Economic analysis.

5. Autoencoders
An autoencoder network is another kind of unsupervised machine learning algorithm. The number of hidden cells is smaller than the number of input cells, while the number of input cells equals the number of output cells. The network is trained to reproduce its input at the output, which forces it to find common patterns and generalize the data. Autoencoders are mainly used to obtain a smaller representation of the input and to reconstruct the original data from the compressed form. The algorithm is comparatively simple, as it only requires the output to be identical to the input.

Encoder: Converts the input data into a lower-dimensional representation.

Decoder: Reconstructs the compressed data back to the original form.
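A minimal encoder/decoder pair in Keras makes the idea concrete; the 784-to-32 sizes are illustrative assumptions (e.g. flattened 28×28 images).

# A minimal sketch of an autoencoder; dimensions are illustrative (e.g. flattened 28x28 images).
from tensorflow import keras

autoencoder = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(784,)),  # encoder: compress to 32 dims
    keras.layers.Dense(784, activation='sigmoid'),                  # decoder: reconstruct the input
])
autoencoder.compile(optimizer='adam', loss='mse')
# Training would use the input as its own target: autoencoder.fit(x, x, epochs=10)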

Applications:

• Classification.
• Clustering.
• Feature Compression.

Deep learning applications

• Self-Driving Cars
A self-driving car captures images of its surroundings and processes a huge amount of data to decide which action to take: turn left, turn right, or stop. Making these decisions reliably helps reduce the accidents that happen every year.
• Voice-Controlled Assistance
Siri is the first thing that comes to mind when we talk about voice-controlled assistance. You can tell Siri whatever you want it to do, and it will search for it and display the result for you.
• Automatic Image Caption Generation
Whatever image you upload, the algorithm works in such a way that it generates a caption accordingly. If the image shows a blue-coloured eye, it will display the image with a matching caption at the bottom.
• Automatic Machine Translation
With the help of automatic machine translation backed by deep learning, we are able to convert text from one language into another.

Limitations

• It learns only from the observations it is trained on.
• It can suffer from bias issues.

Advantages

• It lessens the need for feature engineering.
• It eliminates unnecessary costs.
• It can identify defects that are difficult to detect.
• It delivers best-in-class performance on many problems.

Disadvantages

• It requires a very large amount of data.
• It is quite expensive to train.
• It does not have a strong theoretical foundation.

Artificial Neural Networks

Artificial neural networks are computing systems designed to simulate the way the human brain analyzes and processes information. They have self-learning capabilities that enable them to produce better results as more data becomes available: the more data a network is trained on, the more accurate it becomes, because neural networks learn from examples. A neural network can be configured for specific applications such as data classification and pattern recognition.

With the help of neural networks, a lot of technology has evolved, from translating webpages into other languages to having a virtual assistant order groceries online. All of these things are possible because of neural networks. An artificial neural network is, in short, a network of many artificial neurons.

Convolutional Neural Network

Convolutional Neural Networks are a special type of feed-forward artificial neural network in which the connectivity pattern between neurons is inspired by the visual cortex.

The visual cortex encompasses small regions of cells that are sensitive to specific areas of the visual field. Individual neuronal cells in the brain fire only when certain edge orientations are present: some neurons respond when exposed to vertical edges, while others respond to horizontal or diagonal edges. This behaviour is the motivation behind convolutional neural networks.

Convolutional neural networks, also called ConvNets, are simply neural networks that share their parameters.

Recurrent Neural Networks

Recurrent networks are a kind of artificial neural network mainly intended to identify patterns in data sequences, such as text, genomes, handwriting, the spoken word, and numerical time-series data emanating from sensors, stock markets, and government agencies.
DL ASSIGNMENT 1 (XOR gate implementation using perceptron)

import tensorflow as tf
import numpy as np
from tensorflow import keras

x_train = [[0,0], [0,1], [1,0], [1,1]]
y_train = [0, 1, 1, 0]

xor_model = keras.Sequential([
    keras.layers.Dense(units = 2, input_dim = 2, activation = 'sigmoid'),
    keras.layers.Dense(units = 1, activation = 'sigmoid')
])

xor_model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
xor_model.fit(x_train, y_train, epochs = 3000, verbose = 0)

<keras.callbacks.History at 0x7e7c1866ead0>

loss, accuracy = xor_model.evaluate(x_train, y_train)
print(f"Loss is: {loss}, Accuracy is: {accuracy}\n")

1/1 [==============================] - 0s 286ms/step - loss: 0.3394 - accuracy: 1.0000


Loss is: 0.3393554091453552, Accuracy is: 1.0

inp = [1,0]
y_pred = xor_model.predict([inp])
print(f"Prediction of model when entering {inp} is: {y_pred}")

1/1 [==============================] - 0s 110ms/step


Prediction of model when entering [1, 0] is: [[0.74665815]]
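The raw sigmoid output above is a probability rather than a class; a small illustrative post-processing step thresholds it at 0.5 to recover the XOR label.

# Illustrative: convert the sigmoid probability into a hard 0/1 class label.
print(f"Class for {inp}: {int(y_pred[0][0] > 0.5)}")  # expected 1, since XOR([1, 0]) = 1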

DL ASSIGNMENT 2 (Multi Layer Neural Network)

# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense, Dropout, Flatten

# Loading the MNIST dataset for training and testing.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Cast the records into float values
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step

# normalize image pixel values by dividing by 255
x_train /= 255
x_test /= 255

fig, ax = plt.subplots(10, 10)
k = 0
for i in range(10):
    for j in range(10):
        ax[i][j].imshow(x_train[k].reshape(28, 28), aspect='auto')
        k += 1
plt.show()
model = Sequential([
    Flatten(input_shape=(28, 28)),
    # layer 1
    Dense(256, activation='sigmoid'),
    # dropout layer
    Dropout(0.25),
    # layer 2
    Dense(128, activation='sigmoid'),
    # output layer
    Dense(10, activation='sigmoid'),
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history = model.fit(x_train, y_train,
                    epochs=50,
                    batch_size=2000,
                    validation_split=0.2)
model.predict(x_test)

Epoch 1/50
24/24 [==============================] - 3s 66ms/step - loss: 2.0725 - accuracy:
0.3658 - val_loss: 1.7010 - val_accuracy: 0.7056
Epoch 2/50
24/24 [==============================] - 1s 55ms/step - loss: 1.3995 - accuracy:
0.6957 - val_loss: 1.0266 - val_accuracy: 0.8087
Epoch 3/50
24/24 [==============================] - 1s 56ms/step - loss: 0.8940 - accuracy:
0.7964 - val_loss: 0.6658 - val_accuracy: 0.8571
Epoch 4/50
24/24 [==============================] - 2s 101ms/step - loss: 0.6354 - accuracy:
0.8457 - val_loss: 0.4943 - val_accuracy: 0.8842
Epoch 5/50
24/24 [==============================] - 2s 63ms/step - loss: 0.5053 - accuracy:
0.8691 - val_loss: 0.4056 - val_accuracy: 0.8982
Epoch 6/50
24/24 [==============================] - 1s 55ms/step - loss: 0.4326 - accuracy:
0.8846 - val_loss: 0.3558 - val_accuracy: 0.9030
Epoch 7/50
24/24 [==============================] - 1s 56ms/step - loss: 0.3845 - accuracy:
0.8953 - val_loss: 0.3212 - val_accuracy: 0.9116
Epoch 8/50
24/24 [==============================] - 1s 55ms/step - loss: 0.3512 - accuracy:
0.9030 - val_loss: 0.2964 - val_accuracy: 0.9187
Epoch 9/50
24/24 [==============================] - 1s 56ms/step - loss: 0.3283 - accuracy:
0.9074 - val_loss: 0.2774 - val_accuracy: 0.9218
Epoch 10/50
24/24 [==============================] - 1s 56ms/step - loss: 0.3092 - accuracy:
0.9116 - val_loss: 0.2629 - val_accuracy: 0.9240
Epoch 11/50
24/24 [==============================] - 1s 55ms/step - loss: 0.2919 - accuracy:
0.9164 - val_loss: 0.2500 - val_accuracy: 0.9279
Epoch 12/50
24/24 [==============================] - 2s 69ms/step - loss: 0.2778 - accuracy:
0.9199 - val_loss: 0.2391 - val_accuracy: 0.9301
Epoch 13/50
24/24 [==============================] - 2s 95ms/step - loss: 0.2651 - accuracy:
0.9238 - val_loss: 0.2301 - val_accuracy: 0.9328
Epoch 14/50
24/24 [==============================] - 1s 55ms/step - loss: 0.2540 - accuracy:
0.9256 - val_loss: 0.2211 - val_accuracy: 0.9358
Epoch 15/50
24/24 [==============================] - 1s 56ms/step - loss: 0.2438 - accuracy:
0.9287 - val_loss: 0.2145 - val_accuracy: 0.9366
Epoch 16/50
24/24 [==============================] - 1s 55ms/step - loss: 0.2347 - accuracy:
0.9318 - val_loss: 0.2069 - val_accuracy: 0.9399
Epoch 17/50
24/24 [==============================] - 1s 55ms/step - loss: 0.2266 - accuracy:
0.9342 - val_loss: 0.1999 - val_accuracy: 0.9426
Epoch 18/50
24/24 [==============================] - 1s 54ms/step - loss: 0.2178 - accuracy:
0.9359 - val_loss: 0.1943 - val_accuracy: 0.9422
Epoch 19/50
24/24 [==============================] - 1s 55ms/step - loss: 0.2103 - accuracy:
0.9386 - val_loss: 0.1876 - val_accuracy: 0.9456
Epoch 20/50
24/24 [==============================] - 1s 54ms/step - loss: 0.2031 - accuracy:
0.9412 - val_loss: 0.1823 - val_accuracy: 0.9481
Epoch 21/50
24/24 [==============================] - 2s 77ms/step - loss: 0.1958 - accuracy:
0.9429 - val_loss: 0.1768 - val_accuracy: 0.9499
Epoch 22/50
24/24 [==============================] - 2s 83ms/step - loss: 0.1903 - accuracy:
0.9445 - val_loss: 0.1729 - val_accuracy: 0.9513
Epoch 23/50
24/24 [==============================] - 1s 54ms/step - loss: 0.1843 - accuracy:
0.9461 - val_loss: 0.1673 - val_accuracy: 0.9523
Epoch 24/50
24/24 [==============================] - 1s 56ms/step - loss: 0.1772 - accuracy:
0.9480 - val_loss: 0.1635 - val_accuracy: 0.9534
Epoch 25/50
24/24 [==============================] - 1s 56ms/step - loss: 0.1724 - accuracy:
0.9495 - val_loss: 0.1589 - val_accuracy: 0.9542
Epoch 26/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1664 - accuracy:
0.9515 - val_loss: 0.1560 - val_accuracy: 0.9551
Epoch 27/50
24/24 [==============================] - 1s 54ms/step - loss: 0.1628 - accuracy:
0.9521 - val_loss: 0.1525 - val_accuracy: 0.9554
Epoch 28/50
24/24 [==============================] - 1s 54ms/step - loss: 0.1566 - accuracy:
0.9534 - val_loss: 0.1481 - val_accuracy: 0.9572
Epoch 29/50
24/24 [==============================] - 1s 53ms/step - loss: 0.1515 - accuracy:
0.9555 - val_loss: 0.1453 - val_accuracy: 0.9576
Epoch 30/50
24/24 [==============================] - 2s 80ms/step - loss: 0.1471 - accuracy:
0.9573 - val_loss: 0.1422 - val_accuracy: 0.9578
Epoch 31/50
24/24 [==============================] - 2s 81ms/step - loss: 0.1438 - accuracy:
0.9578 - val_loss: 0.1389 - val_accuracy: 0.9594
Epoch 32/50
24/24 [==============================] - 1s 54ms/step - loss: 0.1397 - accuracy:
0.9589 - val_loss: 0.1355 - val_accuracy: 0.9597
Epoch 33/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1360 - accuracy:
0.9605 - val_loss: 0.1331 - val_accuracy: 0.9603
Epoch 34/50
24/24 [==============================] - 1s 56ms/step - loss: 0.1330 - accuracy:
0.9612 - val_loss: 0.1309 - val_accuracy: 0.9615
Epoch 35/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1272 - accuracy:
0.9621 - val_loss: 0.1281 - val_accuracy: 0.9616
Epoch 36/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1243 - accuracy:
0.9641 - val_loss: 0.1258 - val_accuracy: 0.9620
Epoch 37/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1216 - accuracy:
0.9646 - val_loss: 0.1233 - val_accuracy: 0.9638
Epoch 38/50
24/24 [==============================] - 1s 56ms/step - loss: 0.1186 - accuracy:
0.9651 - val_loss: 0.1219 - val_accuracy: 0.9645
Epoch 39/50
24/24 [==============================] - 2s 94ms/step - loss: 0.1163 - accuracy:
0.9663 - val_loss: 0.1193 - val_accuracy: 0.9649
Epoch 40/50
24/24 [==============================] - 2s 73ms/step - loss: 0.1121 - accuracy:
0.9674 - val_loss: 0.1171 - val_accuracy: 0.9646
Epoch 41/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1101 - accuracy:
0.9683 - val_loss: 0.1153 - val_accuracy: 0.9649
Epoch 42/50
24/24 [==============================] - 2s 72ms/step - loss: 0.1062 - accuracy:
0.9695 - val_loss: 0.1135 - val_accuracy: 0.9663
Epoch 43/50
24/24 [==============================] - 1s 55ms/step - loss: 0.1039 - accuracy:
0.9695 - val_loss: 0.1113 - val_accuracy: 0.9669
Epoch 44/50
24/24 [==============================] - 1s 56ms/step - loss: 0.1013 - accuracy:
0.9712 - val_loss: 0.1097 - val_accuracy: 0.9675
Epoch 45/50
24/24 [==============================] - 1s 56ms/step - loss: 0.0984 - accuracy:
0.9722 - val_loss: 0.1092 - val_accuracy: 0.9672
Epoch 46/50
24/24 [==============================] - 1s 54ms/step - loss: 0.0962 - accuracy:
0.9720 - val_loss: 0.1071 - val_accuracy: 0.9688
Epoch 47/50
24/24 [==============================] - 2s 65ms/step - loss: 0.0941 - accuracy:
0.9722 - val_loss: 0.1054 - val_accuracy: 0.9685
Epoch 48/50
24/24 [==============================] - 2s 96ms/step - loss: 0.0911 - accuracy:
0.9738 - val_loss: 0.1034 - val_accuracy: 0.9691
Epoch 49/50
24/24 [==============================] - 1s 57ms/step - loss: 0.0902 - accuracy:
0.9732 - val_loss: 0.1027 - val_accuracy: 0.9696
Epoch 50/50
24/24 [==============================] - 1s 56ms/step - loss: 0.0881 - accuracy:
0.9735 - val_loss: 0.1016 - val_accuracy: 0.9694
313/313 [==============================] - 1s 3ms/step

array([[2.8935158e-01, 3.2773522e-01, 8.3881783e-01, ..., 9.9995828e-01,


4.8661761e-02, 7.5276440e-01],
[5.7332432e-01, 9.6474677e-01, 9.9990517e-01, ..., 2.5647560e-02, 2.4458456e-01, 2.2719693e-04],
[5.1329401e-03, 9.9979448e-01, 5.6902397e-01, ..., 6.9995368e-01,
5.3165650e-01, 2.9427910e-02],
...,
[9.8780505e-03, 4.7454089e-02, 4.4409815e-02, ..., 6.8833405e-01,
6.4510244e-01, 9.7488630e-01],
[2.0966947e-01, 6.1044776e-01, 4.8755817e-03, ..., 1.5530500e-02,
9.5872200e-01, 6.2541336e-02],
[5.0834352e-01, 7.7532850e-02, 6.3098377e-01, ..., 1.2406199e-02,
1.5494260e-01, 4.1652426e-02]], dtype=float32)

Analysis Report
model.summary()

print('the accuracy on 30th epoch is: ', history.history['accuracy'][29])
print('the accuracy on 50th epoch is: ', history.history['accuracy'][49])
print('the loss on 30th epoch is: ', history.history['loss'][29])
print('the loss on 50th epoch is: ', history.history['loss'][49])

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
===============================================================
== flatten (Flatten) (None, 784) 0
dense (Dense) (None, 256) 200960
dropout (Dropout) (None, 256) 0
dense_1 (Dense) (None, 128) 32896
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
_________________________________________________________________ the
accuracy on 30th epoch is: 0.9573125243186951 the accuracy on 50th epoch is:
0.973520815372467 the accuracy on 30th epoch is: 0.14712917804718018 the
accuracy on 50th epoch is: 0.08809798955917358

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
DL ASSIGNMENT 3 (CNN Model)

Importing necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras import layers, models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import cv2
import os, shutil

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Loading the dataset folder

dataset_path = '/content/drive/MyDrive/Summer Training/DATASETS/caltech-256/256_ObjectCategories'
folder_names = [f for f in sorted(os.listdir(dataset_path))]
print(len(folder_names)) # 257 = 256 categories + background

258

Loading some example images

image_path = '/content/drive/MyDrive/Summer Training/DATASETS/caltech-256/256_ObjectCategories/069.fighter-jet/069_0014.jpg'
image = cv2.imread(image_path)
plt.imshow(image)
plt.show()

print(image.shape)

(321, 432, 3)

Creating directories
category_dict = {}
images_per_category_dict = {}
category_images_path_dict = {}

total_images = 0
for i, category in enumerate(folder_names):
    category_dict[i] = category

    folder_path = dataset_path + '/' + category

    #image_names = [os.path.join(folder_path, img) for img in sorted(os.listdir(folder_path))]
    image_names = [img for img in sorted(os.listdir(folder_path))]

    images_per_category_dict[i] = len(image_names)
    category_images_path_dict[i] = image_names

    print('%s: %d' %(category, images_per_category_dict[i]))
    total_images += images_per_category_dict[i]

print('Total images in dataset: %d' %(total_images))


001.ak47: 98
002.american-flag: 97
003.backpack: 151
004.baseball-bat: 127
005.baseball-glove: 148
006.basketball-hoop: 90
007.bat: 106
008.bathtub: 232
009.bear: 102
010.beer-mug: 94
011.billiards: 278
012.binoculars: 216
013.birdbath: 98
014.blimp: 86
015.bonsai-101: 122
016.boom-box: 91
017.bowling-ball: 104
018.bowling-pin: 101
019.boxing-glove: 124
020.brain-101: 83
021.breadmaker: 142
022.buddha-101: 97
023.bulldozer: 110
024.butterfly: 112
025.cactus: 114
026.cake: 106
027.calculator: 100
028.camel: 110
029.cannon: 103
030.canoe: 104
031.car-tire: 90
032.cartman: 101
033.cd: 102
034.centipede: 100
035.cereal-box: 87
036.chandelier-101: 106
037.chess-board: 120
038.chimp: 110
039.chopsticks: 85
040.cockroach: 124
041.coffee-mug: 87
042.coffin: 87
043.coin: 124
044.comet: 121
045.computer-keyboard: 85
046.computer-monitor: 133
047.computer-mouse: 94
048.conch: 103
049.cormorant: 116
050.covered-wagon: 97
051.cowboy-hat: 114
052.crab-101: 85
053.desk-globe: 82
054.diamond-ring: 118
055.dice: 98
056.dog: 103
057.dolphin-101: 106
058.doorknob: 93
059.drinking-straw: 83
060.duck: 87
061.dumb-bell: 102
062.eiffel-tower: 83
063.electric-guitar-101: 122
064.elephant-101: 131
065.elk: 101
066.ewer-101: 83
067.eyeglasses: 83
068.fern: 110
069.fighter-jet: 99
070.fire-extinguisher: 84
071.fire-hydrant: 99
072.fire-truck: 118
073.fireworks: 100
074.flashlight: 115
075.floppy-disk: 83
076.football-helmet: 84
077.french-horn: 92
078.fried-egg: 90
079.frisbee: 99
080.frog: 116
081.frying-pan: 95
082.galaxy: 81
083.gas-pump: 95
084.giraffe: 84
085.goat: 112
086.golden-gate-bridge: 80
087.goldfish: 93
088.golf-ball: 98
089.goose: 110
090.gorilla: 212
091.grand-piano-101: 95
092.grapes: 201
093.grasshopper: 112
094.guitar-pick: 104
095.hamburger: 86
096.hammock: 285
097.harmonica: 89
098.harp: 100
099.harpsichord: 80
100.hawksbill-101: 93
101.head-phones: 138
102.helicopter-101: 88
103.hibiscus: 111
104.homer-simpson: 97
105.horse: 270
106.horseshoe-crab: 87
107.hot-air-balloon: 89
108.hot-dog: 85
109.hot-tub: 156
110.hourglass: 85
111.house-fly: 84
112.human-skeleton: 84
113.hummingbird: 131
114.ibis-101: 120
115.ice-cream-cone: 88
116.iguana: 107
117.ipod: 121
118.iris: 108
119.jesus-christ: 87
120.joy-stick: 130
121.kangaroo-101: 82
122.kayak: 103
123.ketch-101: 111
124.killer-whale: 91
125.knife: 101
126.ladder: 242
127.laptop-101: 128
128.lathe: 105
129.leopards-101: 190
130.license-plate: 91
131.lightbulb: 112
132.light-house: 190
133.lightning: 136
134.llama-101: 119
135.mailbox: 93
136.mandolin: 93
137.mars: 156
138.mattress: 192
139.megaphone: 86
140.menorah-101: 89
141.microscope: 117
142.microwave: 107
143.minaret: 130
144.minotaur: 82
145.motorbikes-101: 798
146.mountain-bike: 82
147.mushroom: 202
148.mussels: 174
149.necktie: 103
150.octopus: 111
151.ostrich: 109
152.owl: 120
153.palm-pilot: 93
154.palm-tree: 103
155.paperclip: 92
156.paper-shredder: 96
157.pci-card: 105
158.penguin: 149
159.people: 209
160.pez-dispenser: 83
161.photocopier: 103
162.picnic-table: 91
163.playing-card: 90
164.porcupine: 101
165.pram: 88
166.praying-mantis: 92
167.pyramid: 86
168.raccoon: 140
169.radio-telescope: 92
170.rainbow: 102
171.refrigerator: 84
172.revolver-101: 99
173.rifle: 106
174.rotary-phone: 84
175.roulette-wheel: 83
176.saddle: 110
177.saturn: 96
178.school-bus: 98
179.scorpion-101: 80
180.screwdriver: 102
181.segway: 100
182.self-propelled-lawn-mower: 120
183.sextant: 100
184.sheet-music: 84
185.skateboard: 103
186.skunk: 81
187.skyscraper: 95
188.smokestack: 88
189.snail: 119
190.snake: 112
191.sneaker: 111
192.snowmobile: 112
193.soccer-ball: 174
194.socks: 112
195.soda-can: 87
196.spaghetti: 104
197.speed-boat: 100
198.spider: 109
199.spoon: 105
200.stained-glass: 100
201.starfish-101: 81
202.steering-wheel: 97
203.stirrups: 91
204.sunflower-101: 80
205.superman: 87
206.sushi: 98
207.swan: 115
208.swiss-army-knife: 109
209.sword: 102
210.syringe: 111
211.tambourine: 95
212.teapot: 136
213.teddy-bear: 101
214.teepee: 139
215.telephone-box: 84
216.tennis-ball: 98
217.tennis-court: 105
218.tennis-racket: 81
219.theodolite: 84
220.toaster: 94
221.tomato: 103
222.tombstone: 91
223.top-hat: 80
224.touring-bike: 110
225.tower-pisa: 90
226.traffic-light: 99
227.treadmill: 147
228.triceratops: 95
229.tricycle: 95
230.trilobite-101: 94
231.tripod: 112
232.t-shirt: 358
233.tuning-fork: 100
234.tweezer: 122
235.umbrella-101: 114
236.unicorn: 97
237.vcr: 90
238.video-projector: 97
239.washing-machine: 84
240.watch-101: 201
241.waterfall: 95
242.watermelon: 93
243.welding-mask: 90
244.wheelbarrow: 91
245.windmill: 91
246.wine-bottle: 101
247.xylophone: 92
248.yarmulke: 84
249.yo-yo: 100
250.zebra: 96
251.airplanes-101: 800
252.car-side-101: 116
253.faces-easy-101: 435
254.greyhound: 95
255.tennis-shoes: 103
256.toad: 108
256_ObjectCategories: 29
257.clutter: 827
Total images in dataset: 30683

# create the directories to use
base_path = './split_dataset'
os.mkdir(base_path)

train_dir = os.path.join(base_path, 'train')
os.mkdir(train_dir)

validation_dir = os.path.join(base_path, 'validation')
os.mkdir(validation_dir)

test_dir = os.path.join(base_path, 'test')
os.mkdir(test_dir)

# create the category folders inside each split directory
for directory in [train_dir, validation_dir, test_dir]:
    for category in folder_names:
        os.mkdir(os.path.join(directory, category))

# calculate the number of images to place in each train/valid/test category folder

total_train = 0
total_validation = 0
total_test = 0
total_train_2 = 0
total_validation_2 = 0
total_test_2 = 0

for i, category in enumerate(folder_names):
    train_number = int(0.7 * images_per_category_dict[i])
    validation_number = int(0.2 * images_per_category_dict[i])
    test_number = images_per_category_dict[i] - train_number - validation_number # for not exceeding maximum number

    # for statistics later
    total_train += train_number
    total_validation += validation_number
    total_test += test_number

    # copy the training images of this category to its folder
    fnames = category_images_path_dict[i][:train_number]
    for fname in fnames:
        src = os.path.join(dataset_path, category, fname)
        dst = os.path.join(train_dir, category, fname)
        shutil.copyfile(src, dst)
    total_train_2 += len(fnames)

    # copy the validation images
    fnames = category_images_path_dict[i][train_number:train_number + validation_number]
    for fname in fnames:
        src = os.path.join(dataset_path, category, fname)
        dst = os.path.join(validation_dir, category, fname)
        shutil.copyfile(src, dst)
    total_validation_2 += len(fnames)

    # copy the test images
    fnames = category_images_path_dict[i][train_number + validation_number:]
    for fname in fnames:
        src = os.path.join(dataset_path, category, fname)
        dst = os.path.join(test_dir, category, fname)
        shutil.copyfile(src, dst)
    total_test_2 += len(fnames)

# print statistics

print('Correct train split: ', total_train == total_train_2)
print('Correct validation split: ', total_validation == total_validation_2)
print('Correct test split: ', total_test == total_test_2)
print()
print('Number of training images: ', total_train)
print('Number of validation images: ', total_validation)
print('Number of test images: ', total_test)
print()
print('Real percentage of training images: ', total_train / total_images)
print('Real percentage of validation images: ', total_validation / total_images)
print('Real percentage of test images: ', total_test / total_images)

Correct train split: True


Correct validation split: True
Correct test split: True

Number of training images: 21308


Number of validation images: 6027
Number of test images: 3273

Real percentage of training images: 0.6961578672242551


Real percentage of validation images: 0.1969093047569263
Real percentage of test images: 0.10693282801881861
Data preprocessing

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 150x150
    target_size=(150, 150),
    batch_size=20,
    # Since we use categorical_crossentropy loss, we need categorical (one-hot) labels
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical')

Found 21308 images belonging to 257 classes.
Found 6027 images belonging to 257 classes.

for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break

data batch shape: (20, 150, 150, 3)
labels batch shape: (20, 257)

Creating a basic CNN model.
# adding a data augmentation layer
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
])

# to make the model reproducible
random.seed(0)

# creating the model
model = Sequential([
    layers.Conv2D(512, (3,3), activation = 'relu', input_shape = (150,150,3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.25),
    layers.Dense(257, activation='softmax')
])

model.compile(
    optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['acc']
)

# fit_generator is deprecated in newer Keras releases; model.fit accepts generators directly
history = model.fit_generator(
    train_generator,
    steps_per_epoch = 10,
    epochs = 50,
    validation_data = validation_generator,
    validation_steps = 80)

Analysis report for the CNN model on the Caltech-256 dataset.

model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
===============================================================
== conv2d (Conv2D) (None, 148, 148, 512) 14336
max_pooling2d (MaxPooling2D (None, 74, 74, 512) 0 )
dropout (Dropout) (None, 74, 74, 512) 0
conv2d_1 (Conv2D) (None, 72, 72, 256) 1179904
max_pooling2d_1 (MaxPooling (None, 36, 36, 256) 0 2D)
dropout_1 (Dropout) (None, 36, 36, 256) 0
conv2d_2 (Conv2D) (None, 34, 34, 256) 590080
max_pooling2d_2 (MaxPooling (None, 17, 17, 256) 0 2D)
dropout_2 (Dropout) (None, 17, 17, 256) 0
conv2d_3 (Conv2D) (None, 15, 15, 128) 295040
max_pooling2d_3 (MaxPooling (None, 7, 7, 128) 0 2D)
flatten (Flatten) (None, 6272) 0
dense (Dense) (None, 512) 3211776
dropout_3 (Dropout) (None, 512) 0
dense_1 (Dense) (None, 257) 131841
=================================================================
Total params: 5,422,977
Trainable params: 5,422,977
Non-trainable params: 0
_________________________________________________________________

print('Accuracy on 30th epoch is: ', history.history['acc'][29])
print('Accuracy on 50th epoch is: ', history.history['acc'][49])

print('Loss on 30th epoch is: ', history.history['loss'][29])
print('Loss on 50th epoch is: ', history.history['loss'][49])

DL ASSIGNMENT 4 (ResNet CNN Model)

import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization
from tensorflow.keras.models import Sequential

data_dir = '/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories'

train_datagen = ImageDataGenerator(
    rescale=1.0/255.0,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)
train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(224, 224), # Resize images to match ResNet's input size
    batch_size=32,
    class_mode='categorical',
    subset='training'
)
validation_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='validation'
)

Found 7378 images belonging to 102 classes.
Found 1793 images belonging to 102 classes.
import cv2
import numpy as np
import os

def normalize_image(image_path, img_shape):
    read_image = cv2.imread(image_path)
    print(image_path)
    image_resized = cv2.resize(read_image, img_shape, interpolation=cv2.INTER_CUBIC)
    image = np.float32(image_resized)
    image = cv2.normalize(image, image, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX) # Change alpha, beta according to the preprocessing you desire
    return image

def getLabel(input_string):
    try:
        i = input_string.index('-101')
        return input_string[4:i]
    except ValueError:
        return input_string[4:]

from sklearn.model_selection import train_test_split

def get_images_and_classes(main_path, partition, img_shape=(128, 128)):
    images_training = []
    y_training = []
    images_testing = []
    y_testing = []
    list = [name for name in os.listdir(main_path) if os.path.isdir(os.path.join(main_path, name))]
    for idx, folder in enumerate(list):
        label = getLabel(folder)
        sub_list = sorted(os.listdir(os.path.join(main_path, folder)))

        local_y = []
        local_images = []
        for i in range(0, len(sub_list)):
            image_path = os.path.join(main_path, folder, sub_list[i])
            image = normalize_image(image_path, img_shape)
            local_images.append(image)
            local_y.append(label)

        X_train, X_test, y_train, y_test = train_test_split(local_images, local_y, test_size = partition)

        images_training = images_training + X_train
        y_training = y_training + y_train
        images_testing = images_testing + X_test
        y_testing = y_testing + y_test

    images_training = np.array(images_training)
    y_training = np.array(y_training)
    images_testing = np.array(images_testing)
    y_testing = np.array(y_testing)
    return images_training, y_training, images_testing, y_testing

path = '/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories'
partition = 0.2
images_training, y_training, images_testing, y_testing = get_images_and_classes(path, partition)

test_datagen = ImageDataGenerator(rescale=1.0/255.0)
batch_size = 32
img_size = (224, 224)

from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder()
y_training = onehotencoder.fit_transform(y_training.reshape(-1, 1)).toarray()
y_testing = onehotencoder.transform(y_testing.reshape(-1, 1)).toarray()

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

Streaming output truncated to the last 5 lines.

/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0035.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0036.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0037.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0038.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0039.jpg
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 [==============================] - 1s 0us/step
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dense(train_generator.num_classes, activation='softmax')
])

# Freeze the layers of the pre-trained ResNet model
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


epochs = 3

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size
)
Epoch 1/3
230/230 [==============================] - 2020s 9s/step - loss: 3.5742 - accuracy: 0.2596 -
val_loss: 7.3548 - val_accuracy: 0.0273
Epoch 2/3
230/230 [==============================] - 2040s 9s/step - loss: 3.0537 - accuracy: 0.3253 -
val_loss: 6.4838 - val_accuracy: 0.0407
Epoch 3/3
230/230 [==============================] - 2020s 9s/step - loss: 2.8196 - accuracy: 0.3718 -
val_loss: 4.5918 - val_accuracy: 0.1635

from torch.utils.data import DataLoader, Dataset

class ImageDataset(Dataset):
    def __init__(self, images, labels=None, transforms=None):
        self.X = images
        self.y = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        data = self.X[i][:]
        if self.transforms:
            data = self.transforms(data)

        if self.y is not None:
            return (data, self.y[i])
        else:
            return data
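The class above follows PyTorch's Dataset protocol. A minimal usage sketch (hypothetical; the notebook itself continues in Keras) showing how it could be wrapped in a DataLoader:

import torch
from torchvision import transforms

# ToTensor converts HxWxC NumPy arrays into CxHxW float tensors
transform = transforms.ToTensor()

train_dataset = ImageDataset(images_training, labels=y_training, transforms=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

for batch_images, batch_labels in train_loader:
    print(batch_images.shape)  # e.g. torch.Size([32, 3, 128, 128])
    break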

test_generator = test_datagen.flow_from_directory(
    data_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False  # Important to keep predictions matched with true labels
)

val_loss, val_accuracy = model.evaluate(validation_generator)
print("Validation Loss:", val_loss)
print("Validation Accuracy:", val_accuracy)

Found 9171 images belonging to 102 classes.


57/57 [==============================] - 381s 7s/step - loss: 4.5949 - accuracy: 0.1567
Validation Loss: 4.594882488250732
Validation Accuracy: 0.1567205786705017

# Get the class names
class_names = sorted(train_generator.class_indices, key=lambda k: train_generator.class_indices[k])

# Predict using the model
predictions = model.predict(test_generator)

# Convert predictions to class names
predicted_classes = np.argmax(predictions, axis=1)
predicted_class_names = [class_names[i] for i in predicted_classes]

287/287 [==============================] - 1807s 6s/step

# Print testing predictions
for filename, predicted_class in zip(test_generator.filenames, predicted_class_names):
    print(f"File: {filename} - Predicted Class: {predicted_class}")

test_loss, test_accuracy = model.evaluate(test_generator)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_accuracy)

Streaming output truncated to the last 5 lines.


File: yin_yang/image_0056.jpg - Predicted Class: yin_yang
File: yin_yang/image_0057.jpg - Predicted Class: yin_yang
File: yin_yang/image_0058.jpg - Predicted Class: Faces
File: yin_yang/image_0059.jpg - Predicted Class: yin_yang
File: yin_yang/image_0060.jpg - Predicted Class: yin_yang
287/287 [==============================] - 1787s 6s/step - loss: 3.6249 - accuracy:
0.2845
Test Loss: 3.62494158744812
Test Accuracy: 0.2844837009906769

Chapter 4- Project Work


JOB PLACEMENT PREDICTION MODEL

Introduction
Due to the growing need for educated and talented individuals, especially in developing
countries, recruiting fresh graduates is a routine practice for organizations. Conventional
recruiting and selection processes are prone to errors, so innovative methods are needed to
optimize the whole process.
Our dataset has 215 records and 13 columns. Using the "Job_Placement_Data.csv" dataset made
available for this project, we analyzed and processed the data and applied machine learning
classification models to achieve our goal.
This project aims to predict job placement for students based on their academic and personal
details using various machine learning algorithms. By analyzing historical data, we can build a
predictive model that helps students understand their likelihood of getting placed, thereby
enabling them to take proactive measures to improve their chances.

Methodology

Libraries used in Implementation:

The project employs the following Python libraries:
• Pandas for data manipulation.
• NumPy for numerical operations.
• Matplotlib and Seaborn for data visualization.
• Scikit-learn for model building, feature scaling, and evaluation metrics.
• TensorFlow and Keras for building and training the neural network.

1. Data Collection and Preprocessing

A) Data Loading:

- The dataset is loaded into a DataFrame for analysis and processing.

B) Data Cleaning:

- The dataset is examined for missing values, which are then handled appropriately (e.g., filling missing values or
dropping incomplete rows).

- Duplicate records are identified and removed to ensure the dataset's integrity.

C) Data Transformation:
- Categorical variables are converted into numerical format using techniques such as one-hot encoding to facilitate
model training.
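As a small illustration of one-hot encoding (hypothetical toy data, not the project dataset; the actual encoding call appears in the CODE section below):

import pandas as pd

demo = pd.DataFrame({"work_experience": ["Yes", "No", "No"]})
# drop_first=True keeps a single indicator column per binary variable
encoded = pd.get_dummies(demo, columns=["work_experience"], drop_first=True)
print(encoded)
# work_experience_Yes column: True/False (or 1/0, depending on the pandas version)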
2. Exploratory Data Analysis (EDA)

A) Visualizations:
- Histograms are created for numerical features to understand their distributions and detect any anomalies.
- Scatter plots are used to visualize relationships between features.
B) Summary Statistics:
- Descriptive statistics are generated to summarize the central tendency.

3. Feature Engineering

A) Scaling:

- Numerical features are standardized to ensure they have a mean of 0 and a standard deviation of 1, which helps
improve model performance and convergence.
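Concretely, each feature x is rescaled as z = (x − mean) / standard deviation. A minimal sketch with scikit-learn's StandardScaler (illustrative values; the project applies the same scaler before the Decision Tree model):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[55.0], [86.5], [75.0], [66.0]])  # e.g. an emp_test_percentage column
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # ~0 and ~1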

4. Model Building and Evaluation

A) Train-Test Split:

- The dataset is split into training and testing sets to evaluate the model's performance on unseen data.

B) Model Training and Evaluation: Various machine learning models are trained and evaluated, including:

- Logistic Regression: A statistical model that predicts the probability of a binary outcome.

- Support Vector Machine (SVM): Different kernels (RBF, linear, sigmoid, polynomial) are used to find the optimal
hyperplane that separates data into classes.

- Decision Tree: A model that splits the data into branches to make predictions.

- Random Forest: An ensemble of decision trees that improves prediction accuracy by averaging multiple trees.

- Naive Bayes: A probabilistic classifier based on Bayes' theorem.

- K-Nearest Neighbors (KNN): A model that classifies data points based on the labels of their nearest neighbors.

- Neural Network (CNN): A deep learning model with multiple layers to capture complex patterns in the data (implemented below as a fully connected network, labelled CNN in the code).

C) Model Comparison:

- Accuracy scores of different models are compared to select the best-performing model.
Dataset used

Job_Placement_Data – sourced from Kaggle

CODE

import numpy as np
import pandas as pd

df=pd.read_csv("Job_Placement_Data.csv")
df

    gender  ssc_percentage ssc_board  hsc_percentage hsc_board hsc_subject  degree_percentage undergrad_degree work_experience  emp_test_percentage specialisation  mba_percent      status
0        M           67.00    Others           91.00    Others    Commerce              58.00         Sci&Tech              No                 55.0         Mkt&HR        58.80      Placed
1        M           79.33   Central           78.33    Others     Science              77.48         Sci&Tech             Yes                 86.5        Mkt&Fin        66.28      Placed
2        M           65.00   Central           68.00   Central        Arts              64.00        Comm&Mgmt              No                 75.0        Mkt&Fin        57.80      Placed
3        M           56.00   Central           52.00   Central     Science              52.00         Sci&Tech              No                 66.0         Mkt&HR        59.43  Not Placed
4        M           85.80   Central           73.60   Central    Commerce              73.30        Comm&Mgmt              No                 96.8        Mkt&Fin        55.50      Placed
..     ...             ...       ...             ...       ...         ...                ...              ...             ...                  ...            ...          ...         ...
210      M           80.60    Others           82.00    Others    Commerce              77.60        Comm&Mgmt              No                 91.0        Mkt&Fin        74.49      Placed
211      M           58.00    Others           60.00    Others     Science              72.00         Sci&Tech              No                 74.0        Mkt&Fin        53.62      Placed
212      M           67.00    Others           67.00    Others    Commerce              73.00        Comm&Mgmt             Yes                59.0        Mkt&Fin        69.72      Placed
213      F           74.00    Others           66.00    Others    Commerce              58.00        Comm&Mgmt              No                 70.0         Mkt&HR        60.23      Placed
214      M           62.00   Central           58.00    Others     Science              53.00        Comm&Mgmt              No                 89.0         Mkt&HR        60.22  Not Placed

215 rows × 13 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 215 non-null object
1 ssc_percentage 215 non-null float64
2 ssc_board 215 non-null object
3 hsc_percentage 215 non-null float64
4 hsc_board 215 non-null object
5 hsc_subject 215 non-null object
6 degree_percentage 215 non-null float64
7 undergrad_degree 215 non-null object
8 work_experience 215 non-null object
9 emp_test_percentage 215 non-null float64
10 specialisation 215 non-null object
11 mba_percent 215 non-null float64
12 status 215 non-null object
dtypes: float64(5), object(8)
memory usage: 22.0+ KB

df.isnull().sum()

dtype: int64

df.shape
(215, 13)

df.duplicated().sum()
0

df.describe()
import matplotlib.pyplot as plt
import seaborn as sns
plt.hist(df["ssc_percentage"],bins=20)
plt.title("SSC Percentage Distribution")
plt.xlabel("SSC Percentage")
plt.ylabel("Frequency")
plt.show()

plt.hist(df["hsc_percentage"],bins=20)
plt.title("HSC Percentage Distribution")
plt.xlabel("HSC Percentage")
plt.ylabel("Frequency")
plt.show()

plt.hist(df["gender"],bins=20)
plt.title("Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Frequency")
plt.show()
print("totals numbers of Female:",df["gender"].value_counts()[1])
print("totals numbers of Male:",df["gender"].value_counts()[0],"\n\n")
totals numbers of Female: 76
totals numbers of Male: 139

df["status"]
status
0 Placed
1 Placed
2 Placed
3 Not Placed
4 Placed
5 Not Placed
6 Not Placed
7 Placed
8 Placed
9 Not Placed
10 Placed
11 Placed
12 Not Placed
13 Placed
14 Not Placed
15 Placed
16 Placed
17 Not Placed
18 Not Placed
19 Placed
20 Placed
21 Placed
22 Placed
23 Placed
24 Placed
25 Not Placed
26 Placed
27 Placed
28 Placed
29 Not Placed
30 Placed
31 Not Placed
32 Placed
33 Placed
34 Not Placed
35 Placed
36 Not Placed
37 Placed
38 Placed
39 Placed
40 Placed
41 Not Placed
42 Not Placed
43 Placed
44 Placed
45 Not Placed
46 Not Placed
47 Placed
48 Placed
49 Not Placed
50 Placed
51 Not Placed
52 Not Placed
53 Placed
54 Placed
55 Placed
56 Placed
57 Placed
58 Placed
59 Placed
60 Placed
61 Placed
62 Placed
63 Not Placed
64 Placed
65 Not Placed
66 Placed
67 Placed
68 Not Placed
69 Placed
70 Placed
71 Placed
72 Placed
73 Placed
74 Placed
75 Not Placed
76 Placed
77 Placed
78 Placed
79 Not Placed
80 Placed
81 Placed
82 Not Placed
83 Placed
84 Placed
85 Placed
86 Placed
87 Not Placed
88 Placed
89 Placed
90 Placed
91 Not Placed
92 Placed
93 Not Placed
94 Placed
95 Placed
96 Placed
97 Not Placed
98 Placed
99 Not Placed
100 Not Placed
101 Placed
102 Placed
103 Placed
104 Placed
105 Not Placed
106 Not Placed
107 Placed
108 Placed
109 Not Placed
110 Placed
111 Not Placed
112 Placed
113 Placed
114 Placed
115 Placed
116 Placed
117 Placed
118 Placed
119 Placed
120 Not Placed
121 Placed
122 Placed
123 Placed
124 Placed
125 Placed
126 Placed
127 Placed
128 Placed
129 Placed
130 Not Placed
131 Placed
132 Placed
133 Placed
134 Placed
135 Placed
136 Not Placed
137 Placed
138 Placed
139 Placed
140 Placed
141 Not Placed
142 Placed
143 Placed
144 Not Placed
145 Placed
146 Placed
147 Placed
148 Placed
149 Not Placed
150 Placed
151 Placed
152 Placed
153 Placed
154 Placed
155 Not Placed
156 Placed
157 Placed
158 Not Placed
159 Not Placed
160 Placed
161 Not Placed
162 Placed
163 Placed
164 Placed
165 Not Placed
166 Placed
167 Not Placed
168 Not Placed
169 Not Placed
170 Not Placed
171 Placed
172 Placed
173 Not Placed
174 Placed
175 Not Placed
176 Placed
177 Placed
178 Placed
179 Not Placed
180 Placed
181 Not Placed
182 Not Placed
183 Placed
184 Not Placed
185 Placed
186 Not Placed
187 Placed
188 Not Placed
189 Not Placed
190 Not Placed
191 Placed
192 Placed
193 Placed
194 Not Placed
195 Placed
196 Placed
197 Placed
198 Not Placed
199 Placed
200 Placed
201 Not Placed
202 Placed
203 Placed
204 Placed
205 Placed
206 Not Placed
207 Placed
208 Not Placed
209 Placed
210 Placed
211 Placed
212 Placed
213 Placed
214 Not Placed

sns.scatterplot(x="ssc_percentage", y="hsc_percentage", data=df, hue="status")
plt.title("SSC vs HSC Percentage by Placement Status")
plt.show()
df["status"].value_counts()

# Convert categorical data into numerical
df = pd.get_dummies(df, columns=["gender", "ssc_board", "hsc_subject", "hsc_board",
                                 "undergrad_degree", "work_experience", "specialisation", "status"],
                    drop_first=True)

df.info()

a = df.columns[5:15]
print(a)

Index(['gender_M', 'ssc_board_Others', 'hsc_subject_Commerce',
       'hsc_subject_Science', 'hsc_board_Others', 'undergrad_degree_Others',
       'undergrad_degree_Sci&Tech', 'work_experience_Yes',
       'specialisation_Mkt&HR', 'status_Placed'],
      dtype='object')

for i in a:
    df[i] = df[i].astype(int)
df.info()

x=df.drop("status_Placed",axis=1)
y=df["status_Placed"]
print(x.shape,"\n\n",y.shape)
(215, 14)

(215,)

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
x_train.shape

(172, 14)

x_test.shape

(43, 14)

y_train.shape

(172,)

y_test.shape

(43,)
LOGISTIC REGRESSION

from sklearn.linear_model import LogisticRegression

lg = LogisticRegression()
lg.fit(x_train, y_train)

y_pred = lg.predict(x_test)
print(y_pred)
[1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1]

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy Score:", accuracy_score(y_test, y_pred)*100)

Accuracy Score: 81.3953488372093
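Beyond the accuracy score, the confusion_matrix and classification_report imported above can be used to inspect per-class performance (a short sketch using those imports; this output is not shown in the original):

print(confusion_matrix(y_test, y_pred))       # rows: actual, columns: predicted
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class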

CNN

import os
import math
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import numpy as np
import cv2
from matplotlib import pyplot as plt

import keras
print("keras version: ", keras.__version__)

import tensorflow as tf
print("tensorflow version: ", tf.__version__)

keras version: 3.4.1
tensorflow version: 2.17.0

config = tf.compat.v1.ConfigProto()

# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True

# Cap total GPU memory allocation at 90%
config.gpu_options.per_process_gpu_memory_fraction = 0.9
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling1D, Flatten, Dense, Conv1D, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# The network below is a fully connected (dense) model on the tabular features
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(x_train.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
num_classes = len(np.unique(y_train))  # Calculate the number of unique classes in y_train
model.add(Dense(num_classes, activation='softmax'))  # Set the number of output neurons

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model (labels converted to one-hot encoding)
history = model.fit(x_train, to_categorical(y_train),
                    validation_data=(x_test, to_categorical(y_test)),
                    epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(x_test, to_categorical(y_test))
print(f'Test Accuracy: {accuracy*100:.2f}%')

Test Accuracy: 76.74%
SVM

from sklearn.svm import SVC  # "Support vector classifier"

classifier_A = SVC(kernel='rbf', random_state=0)
classifier_A.fit(x_train, y_train)

y_pred = classifier_A.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
83.72093023255815
classifier_B = SVC(kernel='linear', random_state=0)
classifier_B.fit(x_train, y_train)

y_pred= classifier_B.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
81.3953488372093

classifier_C = SVC(kernel='sigmoid', random_state=0)
classifier_C.fit(x_train, y_train)

y_pred = classifier_C.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
67.44186046511628

classifier_D = SVC(kernel='poly', random_state=0)
classifier_D.fit(x_train, y_train)

y_pred = classifier_D.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
81.3953488372093
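The four kernels above are compared by hand; an alternative sketch using scikit-learn's GridSearchCV (an assumption, not part of the original notebook) would search kernels and the regularization parameter C with cross-validation:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"kernel": ["rbf", "linear", "sigmoid", "poly"], "C": [0.1, 1, 10]}
grid = GridSearchCV(SVC(random_state=0), param_grid, cv=5, scoring="accuracy")
grid.fit(x_train, y_train)
print(grid.best_params_, grid.best_score_)  # best kernel/C pair and its CV accuracy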

DECISION TREE

# Feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train_np = x_train.to_numpy()  # Convert DataFrame to NumPy array
x_test_np = x_test.to_numpy()    # Convert DataFrame to NumPy array

# Reshape to 2D for scaling
x_train_reshaped = x_train_np.reshape(x_train_np.shape[0], -1)
x_test_reshaped = x_test_np.reshape(x_test_np.shape[0], -1)

x_train_scaled = st_x.fit_transform(x_train_reshaped)
x_test_scaled = st_x.transform(x_test_reshaped)

# Reshape back to original shape
x_train = x_train_scaled.reshape(x_train_np.shape)  # Use the original NumPy array's shape
x_test = x_test_scaled.reshape(x_test_np.shape)     # Use the original NumPy array's shape

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder

classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train_scaled, y_train)

# Predict on the scaled test features (the original cell predicted on the unscaled
# x_test_reshaped, which is inconsistent with the scaled training data)
y_pred = classifier.predict(x_test_scaled)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)*100
69.76744186046511

RANDOM FOREST CLASSIFER

from sklearn.ensemble import RandomForestClassifier

classifier1 = RandomForestClassifier(n_estimators=3, criterion="entropy")
classifier1.fit(x_train_scaled, y_train)

y_pred = classifier1.predict(x_test_scaled)  # predict on scaled features, consistent with training
accuracy_score(y_test, y_pred)*100
69.76744186046511

GAUSSIAN NAÏVE BAYES ALGORITHM

from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()

# GaussianNB expects a 1-D array of class labels
# (the original cell reshaped y_train to 2D and applied argmax, which is unnecessary)
if isinstance(y_train, pd.Series):
    y_train = y_train.to_numpy()
y_train_1d = y_train.ravel()

gnb.fit(x_train_scaled, y_train_1d)

# Predict directly on the scaled 2D test features
y_pred = gnb.predict(x_test_scaled)

# Comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):",
      metrics.accuracy_score(y_test, y_pred)*100)

Gaussian Naive Bayes model accuracy(in %): 69.76744186046511

KNN

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

# Reshape x_train to 2D if it's not already
x_train_reshaped = x_train.reshape(x_train.shape[0], -1)
X_train = sc.fit_transform(x_train_reshaped)

# Reshape x_test to match x_train if needed
x_test_reshaped = x_test.reshape(x_test.shape[0], -1)
X_test = sc.transform(x_test_reshaped)

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy*100:.2f}')
a = accuracy*100
Accuracy: 72.09
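The choice of n_neighbors=3 is arbitrary; a small sweep over k (a sketch, not in the original notebook) can help pick a better value:

for k in range(1, 16):
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, y_train)
    acc_k = accuracy_score(y_test, knn_k.predict(X_test))
    print(f"k={k}: accuracy={acc_k*100:.2f}%")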

data={"Methods":["KNN","CNN","SVM(rbf)","SVM(linear)","SVM(sigmoid)","SVM(poly
)","Logistic Regression","Gaussian Naive Bayes","Decision Tree","Random
Forest"],"Accuracy":["72.09%","81.401%","83.72%","81.39%","67.44%","81.39%","8
1.39%","69.76%","69.76%","69.76%"]}
x=pd.DataFrame(data)
x['Accuracy'] = x['Accuracy'].astype(str)

x['Accuracy'] = x['Accuracy'].str.rstrip('%').astype('float')

x_sorted = x.sort_values(by='Accuracy',
ascending=False).reset_index(drop=True)

x_sorted['Accuracy'] = x_sorted['Accuracy'].astype(str) + '%'

x_sorted
SVM(rbf) gives the best accuracy of 83.72%
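Since classifier_A (the RBF SVM) is the model reused for predictions below, it could also be persisted with joblib so it can be reloaded without retraining (a sketch; the filename is hypothetical and the original notebook does not save the model):

import joblib

joblib.dump(classifier_A, "job_placement_svm_rbf.joblib")  # hypothetical filename
# Later: classifier_A = joblib.load("job_placement_svm_rbf.joblib")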

df.drop("undergrad_degree_Others",axis=1)
while True:
input_data = eval(input("Enter the data: "))

input_data_as_array=np.asarray(input_data)
reshaped_array=input_data_as_array.reshape(1,-1)

prediction =classifier_A.predict(reshaped_array)
print(prediction)

if (prediction[0]==0):
print("Not Placed")
else:
print("Placed")

Enter the data: 74.00,66.00,58.00,70.0,60.23,0,1,1,0,1,0,0,1,1


Placed
Enter the data: 56.00,52.00,52.00,66.0,59.43,1,0,0,1,0,1,0,1,0
Not Placed

Results

The model developed can predict whether a student is placed or not using various machine
learning algorithms: Logistic Regression, KNN, a neural network (CNN), Gaussian Naïve Bayes,
SVM, Decision Tree, and Random Forest.

The comparison shows that the SVM with the RBF kernel is the best technique of all, giving the highest accuracy (83.72%) among the models evaluated.

Conclusion

The project effectively demonstrated the use of various machine learning models to predict job placement outcomes.
By leveraging data preprocessing, feature encoding, and multiple model evaluations, the SVM with the RBF kernel
emerged as the most accurate model. This model is robust for predicting job placements based on student data,
providing valuable insights for educational institutions and placement agencies.

The approach and findings from this project highlight the importance of choosing appropriate models and
hyperparameters, as well as the effectiveness of preprocessing steps in improving model performance.
Chapter 5- Conclusion

In conclusion, the needs and benefits of machine learning and deep learning are undeniable in today's rapidly evolving
technological landscape. These powerful fields of artificial intelligence have revolutionized various industries and
continue to shape our world in profound ways.

Embarking on the journey of the Python programming language, machine learning, and deep learning has been an
immensely beneficial and transformative experience. This training program has not only expanded my technical
skillset but has also opened up new horizons of possibilities in my career and personal growth.

Machine learning and deep learning meet the ever-increasing demand for intelligent solutions by automating tasks,
making predictions, and extracting insights from vast and complex datasets. They have improved efficiency, accuracy,
and decision-making across domains such as healthcare, finance, manufacturing, and transportation. These
technologies have enabled us to tackle previously insurmountable problems, from diagnosing diseases to optimizing
supply chains. The needs and benefits of machine learning and deep learning are clear: they empower us to solve
complex problems, drive innovation, and improve our quality of life. As we continue to advance in these fields, it is
imperative that we do so with a strong commitment to ethics and responsible AI practices, to ensure a bright and
inclusive future powered by intelligent machines.

Delving into machine learning has given me the ability to harness the predictive power of algorithms to extract
valuable insights from data. This has proven invaluable in making data-driven decisions, optimizing business
processes, and gaining a competitive edge in the rapidly evolving digital landscape. Furthermore, deep learning, with
its neural networks and complex architectures, has allowed me to delve into the cutting-edge realms of artificial
intelligence. It has enabled me to work on advanced projects such as job placement prediction, expanding my
capabilities to tackle complex real-world problems.

Beyond the technical skills acquired, this training program has fostered critical thinking, problem-solving, and
adaptability. It has taught me the importance of continuous learning in a rapidly evolving field, where staying
up-to-date with the latest advancements is paramount. Additionally, the experience of collaborating with peers, engaging
in hands-on projects, and seeking guidance from mentors has enriched my learning journey. It has not only broadened
my knowledge but has also exposed me to diverse perspectives and approaches, which are invaluable in a field as
dynamic as technology.

In summary, this training program has been a transformative experience that has equipped me with the skills and
knowledge to navigate the ever-evolving landscape of technology. It has broadened my horizons, enhanced my
problem-solving abilities, and positioned me to make a meaningful impact in the world of Python, machine learning,
and deep learning.
Chapter 6- References

Online Documentation:
• Python Documentation: https://docs.python.org/3/
• TensorFlow Documentation: https://www.tensorflow.org/api_docs/python/tf/keras
• Scikit-learn Documentation: https://scikit-learn.org/stable/user_guide.html
• Kaggle: https://www.kaggle.com/learn
• GeeksforGeeks: https://www.geeksforgeeks.org/machine-learning/
• Javatpoint: https://www.javatpoint.com/machine-learning
