Edited Report PDF
On
AI/ML using Python
Submitted By
RIDDHI BANSAL
20111503122
IT Department
Bharati Vidyapeeth’s College of Engineering, New Delhi – 110063, INDIA
Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.
Python's syntax compared to other languages:
• Python was designed for readability, and has some similarities to the English language with influence from mathematics.
• Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses.
• Python relies on indentation, using whitespace, to define scope, such as the scope of loops, functions and classes. Other programming languages often use curly brackets for this purpose.
Variables
Variables are containers for storing data values. In Python, variables are created when you assign a value to them. They are case-sensitive.
y = "Hello, World!"
You can get the data type of a variable with the type() function.
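For example:
x = 5
y = "Hello, World!"
print(type(x))   # <class 'int'>
print(type(y))   # <class 'str'>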
Comments
Comments start with a #, and Python will render the rest of the line as a comment.
Variables can store data of different types, and different types can do different things.
Python has the following data types built-in by default, in these categories:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
Python Numbers
Complex: Complex numbers are written with a "j" as the imaginary part.
Strings
Strings in Python are surrounded by either single quotation marks or double quotation marks.
You can display a string literal with the print() function. Strings in Python are arrays of bytes representing
Unicode characters. However, Python does not have a character data type, a single character is simply a
string with a length of 1. Square brackets can be used to access elements of the string.
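For example, square-bracket indexing on a string:
a = "Hello, World!"
print(a[1])     # e  (indexing starts at 0)
print(a[2:5])   # llo (slicing returns a substring)
print(len(a))   # 13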
Boolean Values
When you compare two values, the expression is evaluated and Python returns the Boolean answer.
Python Operators
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Identity operators
• Membership operators
• Bitwise operators
There are four collection data types in the Python programming language:
Lists are one of 4 built-in data types in Python used to store collections of data, the other 3 are Tuple, Set,
and Dictionary, all with different qualities and usage.
Lists are created using square brackets. List items can be of any data type.
Tuple
Tuple is one of 4 built-in data types in Python used to store collections of data, the other 3 are List, Set, and
Dictionary, all with different qualities and usage.
Set
Sets are used to store multiple items in a single variable.
Dictionary
A dictionary is a collection which is ordered (as of Python 3.7), changeable, and does not allow duplicate keys.
Dictionary items are presented in key:value pairs, and can be referred to by using the key name.
NumPy
NumPy is a Python library for scientific computing. NumPy stands for Numerical Python. It provides a high-performance multidimensional array object, along with a suite of functions for working with arrays.
NumPy is the foundation of many other Python libraries for scientific computing, including Pandas, SciPy,
and Matplotlib.
Here are some of the key features of NumPy:
• Multidimensional arrays: NumPy arrays are multidimensional, meaning that they can have more than
one dimension. This makes them ideal for storing and manipulating large amounts of data.
• Fast numerical operations: NumPy arrays are designed to be very efficient for numerical operations.
This makes them ideal for scientific computing applications.
• Powerful functions: NumPy provides a wide range of functions for working with arrays. These
functions can be used to perform a variety of operations, such as mathematical operations, statistical
operations, and linear algebra operations.
NumPy is a powerful library that can be used for a wide variety of scientific computing tasks. It is a must-have library for any Python developer who does any kind of numerical computing. Its main benefits:
• Speed: NumPy arrays are much faster than Python lists for numerical operations.
• Power: NumPy provides a wide range of functions for working with arrays.
• Flexibility: NumPy arrays can be used for a variety of tasks, not just scientific computing.
• Portability: NumPy is a well-maintained library that is compatible with many different platforms.
If you are doing any kind of numerical computing in Python, then you should definitely use NumPy. It is a
powerful library that will make your code faster, more powerful, and more flexible.
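A short sketch illustrating these points with core NumPy calls only:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])   # a multidimensional (2-D) array
print(a.shape)    # (2, 3)
print(a * 2)      # fast element-wise arithmetic, no Python loop
print(a.mean())   # built-in statistical function: 3.5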
Pandas
Pandas is a Python library used for data manipulation and analysis. It is one of the most popular Python
libraries for data science, and it is known for its powerful data structures and its easy-to-use data analysis
tools.
Pandas is built on top of the NumPy library, which provides a high-performance multidimensional array
object. Pandas extends NumPy by adding data structures and operations that make it more suitable for data
analysis.
The two main data structures in Pandas are the DataFrame and the Series. A DataFrame is a tabular data
structure that can store data of any type (integer, float, string, etc.). A Series is a one-dimensional data
structure that can store data of a single type.
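A minimal sketch of the two structures (the column names here are illustrative):
import pandas as pd
s = pd.Series([10, 20, 30])                 # one-dimensional, single type
df = pd.DataFrame({"name": ["A", "B"],      # tabular, mixed column types
                   "score": [91.5, 84.0]})
print(df["score"].mean())                   # 87.75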
There are many reasons to use Pandas for data analysis. Here are a few of the most important reasons:
• Pandas is easy to use. The syntax is relatively simple, and there are many tutorials and documentation
available online.
• Pandas is powerful. It can handle large datasets with ease, and it provides a wide range of data
analysis tools.
• Pandas is flexible. It can be used for a variety of data analysis tasks, from simple data cleaning to
complex machine learning.
• Pandas is popular. There is a large community of Pandas users and developers, which means that
there are many resources available to help you learn and use Pandas.
Matplotlib
Matplotlib is a Python library for creating static, animated, and interactive visualizations. It is a popular
choice for data visualization in Python, and is used by a wide range of industries, including scientific
research, engineering, finance, and healthcare.
Matplotlib is a versatile library that can be used to create a variety of different plots, including line plots,
bar charts, histograms, scatter plots, and pie charts. It also supports a variety of customization options, so
you can tailor your plots to your specific needs.
Matplotlib is divided into two main parts: the matplotlib.pyplot module and the matplotlib.artist module.
The matplotlib.pyplot module is a collection of functions that make it easy to create plots. The
matplotlib.artist module provides the underlying objects that are used to create plots.
To use Matplotlib, you first need to import the matplotlib.pyplot module. You can then use the functions in
this module to create your plots.
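For instance, a minimal pyplot session:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
plt.plot(x, y)           # create a line plot
plt.xlabel("x values")
plt.ylabel("y values")
plt.show()               # render the figure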
Day 1 :- Assignment On Python Basics, Types, Expression, Variables, String
Operations
Q1: Find average Marks Write a program to input marks of three tests of a student (all integers). Then
calculate and print the average of all test marks.
a=float(input("enter the marks of first subject:"))
b=float(input("enter the marks of second subject:"))
c=float(input("enter the marks of third subject:"))
avg=(a+b+c)/3
print("average of all test marks=",avg)
Q2: Find X raised to power N. You are given two numbers 'x' (a float) and 'n' (an integer).
Your task is to calculate 'x' raised to power 'n', and return it.
x=float(input("enter number:"))
n=int(input("enter power:"))
a=x**n
print("power of number=",a)
enter number:8
enter power:2
power of number= 64.0
Q3: Check Palindrome Given a string, determine if it is a palindrome, considering only alphanumeric
characters.
def isPalindrome(s):
    # keep only alphanumeric characters and ignore case
    s = ''.join(ch.lower() for ch in s if ch.isalnum())
    return s == s[::-1]
Q4: Consider the string str="Global Warming". Write statements in Python to implement the following:
(a) To display the last four characters.
(b) To display the substring starting from index 4 and ending at index 8.
(c) To check whether the string has alphanumeric characters or not.
(d) To trim the last four characters from the string.
(e) To trim the first four characters from the string.
(f) To display the starting index for the substring "Wa".
(g) To change the case of the given string.
(h) To check if the string is in title case.
(i) To replace all the occurrences of the letter 'a' in the string with '*'.
str="Global Warming"
print(str[-1:-5:-1])
print("substring=",str[4:8])
print(str.isalnum())
print(str.rstrip('ming'))
print(str.lstrip('Glob'))
print(str.index("Wa"))
print(str.swapcase())
print(str.istitle())
print(str.replace("a","*"))
gnim
substring= al W
False
Global War
al Warming
7
gLOBAL wARMING
True
Glob*l W*rming
Write a program which prompts the user for a Celsius temperature, converts the temperature to Fahrenheit, and prints out the converted temperature.
temp=float(input("Enter temperature in Celsius:"))
tempF=(temp*(9/5))+32
print("temperature in celsius=",temp)
print("temperature in fahrenheit=",tempF)
width//2= 8
width/2.0= 8.5
height/3= 4.0
1+2*5= 11
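The cell that produced these four values is not in this excerpt; from the outputs it evaluated the classic expressions exercise, which would look like this with width = 17 and height = 12.0 (the values the outputs imply):
width = 17
height = 12.0
print("width//2=", width//2)     # integer division: 8
print("width/2.0=", width/2.0)   # float division: 8.5
print("height/3=", height/3)     # 4.0
print("1+2*5=", 1+2*5)           # operator precedence: 11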
Write a program that uses input to prompt a user for their name and then welcomes them
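No solution cell for this exercise appears in the excerpt; a minimal one:
name = input("Enter your name: ")
print("Welcome,", name)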
Write a program to prompt the user for hours and rate per hour to compute gross pay
hours=int(input("Enter Hours:"))
rate=float(input("Enter Rate:"))
pay=hours*rate
print("Pay=",pay)
Enter Hours:5
Enter Rate:2.3
Pay= 11.5
Q1: Check number Given an integer n, find if n is positive, negative or 0. If n is positive, print "Positive" If
n is negative, print "Negative" And if n is equal to 0, print "Zero".
n=int(input("Enter a number:"))
if(n>0):
print("positive")
elif(n<0):
print("negative")
else:
print("zero")
Enter a number:34
Positive
Q2: Sum of n numbers Given an integer n, find and print the sum of numbers from 1 to n.
n=int(input("enter a number"))
sum=0
for i in range(1,n+1):
sum=sum+i
print(sum)
enter a number5
15
Q3: Sum of Even Numbers Given a number N, print sum of all even numbers from 1 to N.
n=int(input("enter a number:"))
sum=0
for i in range(2,n+1,2):
sum=sum+i
print(sum)
enter a number:10
30
Q4: Reverse of a number Write a program to generate the reverse of a given number N. Print the
corresponding reverse number. Note : If a number has trailing zeros, then its reverse will not include them.
For e.g., reverse of 10400 will be 401 instead of 00401.
n=int(input("enter a number:"))
rev=0
while(n>0):
last=n%10
rev=last+rev*10
n=n//10
print(rev)
enter a number:1000
1
Q5: Nth Fibonacci Number Provided 'n' you have to find out the n-th Fibonacci Number. Example: Input: 6
Output: 8
n=int(input("enter a number:"))
a=0
b=1
c=0
for i in range(2,n+1):
c=a+b
a=b
b=c
print("required term is",c)
enter a number:7
required term is 13
Q6: Fibonacci Member Given a number N, figure out if it is a member of fibonacci series or not. Return true
if the number is member of fibonacci series else false.
N = int(input("Enter the number you want to check: "))
f3 = 0
f1 = 1
f2 = 1
if (N == 0 or N == 1):
print("Given number is fibonacci number")
else:
while f3 < N:
f3 = f1 + f2
f2 = f1
f1 = f3
if f3 == N:
print("Given number is fibonacci number")
else:
print("No it’s not a fibonacci number")
Q8: Write a Program to calculate simple interest using function interest() that received principal amount,
time and rate and returns calculated simple interest.
def simple_interest(p,t,r):
    si=(p*r*t)/100
    print(si)
simple_interest(1000,2,5)
100.0
Q9: WAP to accept three integers and print the largest of the three
n1=int(input("enter the first number : "))
n2=int(input("enter the second number : "))
n3=int(input("enter the third number : "))
if(n1>n2 and n1>n3):
print(n1," is the largest number")
elif(n2>n1 and n2>n3):
print(n2," is the largest number")
else:
print(n3," is the largest number")
Q10: WAP that inputs three numbers and print sum of non-duplicate numbers. Duplicate numbers are
ignored
def nonduplicate_sum(a, b, c):
    if a != b and b != c and a != c:
        return a + b + c
    elif a == b == c:
        return 0
    elif a == b:
        return c
    elif b == c:
        return a
    elif a == c:
        return b
nonduplicate_sum(1,4,4)
1
Q11: WAP to display a menu for calculating area of circle or perimeter of circle
radius=float(input("enter the radius of the circle:"))
print("1.Calculate area")
print("2.Calculate perimeter")
choice=int(input("enter choice:"))
if(choice==1):
    area=3.14*radius*radius
    print("Area of the circle is ",area)
elif(choice==2):
    perimeter=2*3.14*radius
    print("Perimeter of the circle is ",perimeter)
else:
    print("invalid choice")
Q12: WAP that reads three numbers and prints them in ascending order
a = float(input("Enter a: "))
b = float(input("Enter b: "))
c = float(input("Enter c: "))
if a < b:
if b < c:
print (a, "<", b, "<", c)
else:
if a < c:
print (a, "<", c, "<", b)
else:
print (c, "<", a, "<", b)
else:
if c < b:
print (c, "<", b, "<", a)
else:
if c < a:
print (b, "<", c, "<", a)
else:
print (b, "<", a, "<", c)
Enter a: 12
Enter b: 9
Enter c: 34
9.0 < 12.0 < 34.0
Q13: WAP that prints sum of natural numbers between two numbers taken as input
n1 = int(input("Enter Lower limit:")) n2 =
int(input("Enter Upper limit:")) sum = 0 for i in
range(n1, n2+1):
sum = sum+i
print(f"Sum of natural numbers between {n1} and {n2} is: {sum}") Enter Lower limit:3
Enter Upper limit:6
Sum of natural numbers between 3 and 6 is: 18 Q14: WAP to
Q16: WAP to take String line as input and display following stats: Number of uppercase letters, Number of
lowercase letters, Number of alphabets and Number of Digits
str1 = input("Enter string: ") upper = 0
lower = 0 alphabets = 0 digits = 0
l = len(str1) for i in range (l): if
(str1[i].isupper()):
upper += 1 elif
(str1[i].islower()):
lower += 1 elif (str1[i].isdigit()):
digits += 1 alphabets = lower +
upper
print(f"Number of Uppercase characters: {upper}") print(f"Number of Lowercase
characters: {lower}")
print(f"Number of characters which are alphabets: {alphabets}")
Q17: WAP that reads a line and a substring and displays the number of occurrences of the given substring in the line
str = input("Enter string: ")
substr = input("Enter substring for check: ")
print(f"Number of times '{substr}' occurs in string: {str.count(substr)}")
Enter string: hi hi hi hi hello hello
Enter substring for check: hi
Number of times 'hi' occurs in string: 4
Q18: WAP that takes a string with multiple words and then capitalize the first letter of each word and forms
a new string out of it
str = input("Enter string:") print(f"String in Titlecase is:
{str.title()}")
Practice Problems
Program to show that ids of 2 variables having same value are equal
a=10
b=10
print(f"Id of a: {id(a)}")
print(f"Id of b: {id(b)}")
if (id(a)==id(b)):
    print("Both variables point to the same value.")
Id of a: 133569002291728
Id of b: 133569002291728
Both variables point to the same value.
Program to compare 2 numbers
a = int(input("Enter first number: ")) b =
int(input("Enter second number: ")) if a>b:
print(f"{a} is greater than {b}.") elif a<b:
print(f"{b} is greater than {a}.") else:
print(f"Both {a} and {b} are equal.")
Program to implement string functions
a = 'Hello world'
b = 'Hiii'
print(f"Original strings: {a} and {b}\n")
print(f"Strings in uppercase: '{a.upper()}' and '{b.upper()}'")
print(f"Strings in lowercase: '{a.lower()}' and '{b.lower()}'")
print(f"Strings in titlecase: '{a.title()}' and '{b.title()}'")
print(f"Strings after replacing 'H' with '?': '{a.replace('H','?')}' and '{b.replace('H','?')}'")
Combination in python
def fact(N):
    f = 1
    for i in range(1,N+1):
        f = f*i
    return f
n = int(input("Enter value for n: "))
r = int(input("Enter value for r: "))
x = n-r
cnr = (fact(n)/(fact(r)*fact(x)))
print(f"Combination: {cnr}")
q1: WAP to find minimum element from a list of elements along with its index in the list
l = [24, 23, 10, 2, 1, 56, 14, 34]
min_ele = l[0]
index = 0
print(f"List is: {l}")
for i in range(len(l)):
    if(min_ele > l[i]):
        min_ele = l[i]
        index = i
print(f"Minimum element of list is: {min_ele} and its index is: {index}")
q5: WAP to find frequencies of all elements of a list. Also print the list of unique elements in the list and
duplicate elements in the list
l = []   # list taken from user
l1 = []  # list of unique elements
lb = []  # list of duplicate elements
ln = []  # elements already counted
n = int(input("Enter length of list: "))
for i in range(n):
    a = int(input("Enter element: "))
    l.append(a)
l.sort()
for i in range(n):
    count = 0
    for j in l:
        if j==l[i]:
            count += 1
    if(l[i] not in ln):
        print("Frequency of", l[i], "is ", count)
        ln.append(l[i])
    else:
        continue
    if(count==1):
        l1.append(l[i])
    if(count>1):
        lb.append(l[i])
q6: WAP to calculate and display sum of all odd numbers in the list
l = [1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3]
sum = 0
for i in l:
    if (i%2!=0):
        sum += i
print(f"Sum of all odd elements in list is: {sum}")
Sum of all odd elements in list is: 36
q8: Given a list in Python and provided the positions of the elements, write a program to swap the two
elements in the list.
def swap(list, a, b):
    temp = list[b]
    list[b] = list[a]
    list[a] = temp
MENU
1.Add element to stack.
2.Pop element from stack.
3.Exit.
MENU
1.Add element to queue.
2.Pop element from queue.
3.Exit.
q13. WAP that scans an email address and forms a tuple of user name and domain
email = []
lud = []
def scan(x):
    a = x.split("@")
    lud.append((a[0],a[1]))
q14. WAP that accepts different number of arguments and return sum of only the positive values passed to
it.
num = []
sum = 0
n = int(input("Enter number of elements in list:"))
for i in range(n):
    e = int(input("Enter element: "))
    num.append(e)
    if (e > 0):
        sum += e
print(f"Sum of all positive elements is: {sum}")
List is: [12, 23, 12, 54, 67, 23, 89, 45]
Enter index of first element to swap: 2
Enter index of second element to swap: 3
Values before swapping: [12, 23, 12, 54, 67, 23, 89, 45]
Values after swapping: [12, 23, 54, 12, 67, 23, 89, 45]
PRACTICE PROBLEMS
Program to find smallest and largest element in list.
l = []
ll = int(input("Enter length of list: "))
for i in range(ll):
    inp = int(input("Enter element: "))
    l.append(inp)
for i in range(ll):
    for j in range(0,ll-i-1):
        if (l[j]>l[j+1]):
            temp = l[j]
            l[j] = l[j+1]
            l[j+1] = temp
print(f"\nSorted list is: {l}\nSmallest element is: {l[0]}\nLargest element is: {l[ll-1]}")
Program to take input from user to create list that should have data type string, float, int, boolean.
l = []
ll = int(input("Enter length of list: "))
for i in range(ll):
    inp = input("Enter element: ")   # note: input() returns str, so values are stored as strings
    l.append(inp)
print(f"List is: {l}")
Program to insert element in list using insert method.
list1 = [2, 4, 6, 8, 10]
n = int(input("Enter element: "))
index = int(input("Enter index at which to insert: "))
list1.insert(index, n)
print(f"List after insertion is: {list1}")
Enter element: 5
Enter index at which to insert: 2
List after insertion is: [2, 4, 5, 6, 8, 10]
Program to delete from list using remove method.
list1 = ['A', 'B', 'C', 'D', 'E']
list1.remove('A')
print(f"List after removal is: {list1}")
List after removal is: ['B', 'C', 'D', 'E']
Program to delete from list using pop method.
list1 = ['A', 'B', 'C', 'D', 'E']
list1.pop()
print(f"List after removal is: {list1}")
List after removal is: ['A', 'B', 'C', 'D']
Program to delete from list using the del statement.
list1 = ['A', 'B', 'C', 'D', 'E']
del list1[2]
print(f"List after removal is: {list1}")
List after removal is: ['A', 'B', 'D', 'E']
Program to find min and max elements in 2D list.
l = [ [1,2,3] , [4, 5, 4], [19, 16, 12] ]
min = l[0][0]
max = l[0][0]   # initialise from the list itself rather than from 0
for i in l:
    for j in i:
        if(min > j):
            min = j
        if(max < j):
            max = j
print(f"Minimum value element in 2D list is: {min}")
print(f"Maximum value element in 2D list is: {max}")
n = int(input("Enter k-th number to check for kth smallest and largest number: ")) sno = list2[n-1] lno = list2[-
n]
print(f"{n}-th smallest number is: {sno}") print(f"{n}-th largest number is:
{lno}")
Enter k-th number to check for kth smallest and largest number: 2
2-th smallest number is: 23 2-th largest
number is: 69
Tuple is immutable.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-48-c160cf47212b> in <cell line: 3>()
1 t = ('bvp', 13, 'HI', 42.690)
2 print("Tuple is immutable.")
----> 3 t[2] = 24
4 print(t)
TypeError: 'tuple' object does not support item assignment
Program to print type of tuple elements.
t = ('bvp', 13, 'HI', 42.690)
for i in t:
    print(f"Element {i} is of type {type(i)}")
Tuple after changing value in list: ('bvp', [1, 2, 71], 13, 'HI', 42.69)
Program to count number of unique elements in tuple.
t = (1, 3, 4, 5, 2, 7, 3, 5, 9, 12, 54, 3)
l = []
print(f"Tuple is: {t}")
print(f"Number of elements in tuple is: {len(t)}")
for i in t:
    if i not in l:
        l.append(i)
print(f"Number of unique elements in tuple is: {len(l)}")
Program to concatenate tuples.
t1 = ('racecar', 'kanak', '12321', '521', 'Kratos')
t2 = ('bvp', 13, 'HI', 42.690)
t3 = t1 + t2
print(f"Concatenation of tuples is: {t3}")
Concatenation of tuples is: ('racecar', 'kanak', '12321', '521', 'Kratos', 'bvp', 13, 'HI', 42.69)
Tuple after converting into list and modifying: ['bvp', 13, 'HI', 42.69, 21]
Day 4 :- Assignment On Sets, Dictionaries
q1: Write a Python program to return a new set with unique items from both sets by removing duplicates.
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}
s = set()
s.update(s1)
s.update(s2)
print(f"New set created using 2 sets is: {s}")
New set created using 2 sets is: {1, 2, 3, 4, 5, 7, 8, 43, 23, 56}
q2: Given two Python sets, write a Python program to update the first set with items that exist only in the
first set and not in the second set.
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}
s1 = (s1 - s2)
print(f"Set having only elements in s1 and not in s2: {s1}")
Set having only elements in s1 and not in s2: {1, 2, 43, 5}
q3: WAP to Check if two sets have any elements in common. If yes, display the common elements
s1 = {23, 43, 1, 2, 4, 5, 8}
s2 = {23, 3, 4, 56, 7, 8, 4}
for i in s1:
    if i in s2:
        print(i, end=" ")
4 23 8
q4: WAP to Update set1 by adding items from set2, except common items
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
s2 = {23, 3, 4, 56, 7, 8, 4}
s1.symmetric_difference_update(s2)
print(f"Set 1 updated except for common items: {s1}")
Set 1 updated except for common items: {1, 2, 3, 5, 7, 43, 56}
q5: WAP to get the maximum and minimum element in a set in Python, using the built-in functions of Python
s1 = {23, 43, 1, 2, 4, 5, 8, 4, 2}
maxe = max(s1)
mine = min(s1)
print(f"Maximum element: {maxe}, Minimum element: {mine}")
q7: Write a Python program to create a dictionary from a string. Note: Track the count of the letters from the
string.
Sample string : 'AIMLTraining'
Expected output: {'A': 1, 'I': 1, 'M': 1, 'L': 1, 'T': 1, 'r': 1, 'a': 1, 'i': 2, 'n': 2, 'g': 1}
s = "AIMLTraining"
dict = {} for i in s: if i
in dict: dict[i] += 1
else:
dict[i] = 1 print(dict)
{'A': 1, 'I': 1, 'M': 1, 'L': 1, 'T': 1, 'r': 1, 'a': 1, 'i': 2, 'n': 2, 'g': 1}
q8: Write a Python program to get the top three items in a shop.
Sample data: {'item1': 45.50, 'item2': 35, 'item3': 41.30, 'item4': 55, 'item5': 24}
Expected Output:
item4 55
item1 45.5
item3 41.3
dict1 = {'item1': 45.50, 'item2': 35, 'item3': 41.30, 'item4': 55, 'item5': 24}
dict2 = sorted(dict1.items(), key=lambda x: x[1], reverse=True)
for item, price in dict2[:3]:
    print(item, price)
PRACTICE PROBLEMS
Original dictionary is: {'A': 1000, 'B': 1002, 'C': 1004, 'D': 1006, 'E': 1008, 'F':
1010}
Dictionary after using del method on 'A': {'B': 1002, 'C': 1004, 'D': 1006, 'E': 1008,
'F': 1010}
Dictionary after using pop method on 'B': {'C': 1004, 'D': 1006, 'E': 1008, 'F': 1010}
Dictionary after using clear method: {}
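The cell that produced these outputs is not included; a minimal reconstruction consistent with them:
d = {'A': 1000, 'B': 1002, 'C': 1004, 'D': 1006, 'E': 1008, 'F': 1010}
print(f"Original dictionary is: {d}")
del d['A']      # del removes a key
print(f"Dictionary after using del method on 'A': {d}")
d.pop('B')      # pop removes a key and returns its value
print(f"Dictionary after using pop method on 'B': {d}")
d.clear()       # clear empties the dictionary
print(f"Dictionary after using clear method: {d}")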
q8: Find the mean, median, standard deviation of iris's sepallength (1st column)
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
sl = df["SepalLengthCm"]
print(f"Mean: {sl.mean()}, Median: {sl.median()}, Standard deviation: {sl.std()}")
q9: Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the
minimum has value 0 and maximum has value 1. Use following Normalization Formula -> x normalized
= (x – x minimum) / (x maximum – x minimum)
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
sl = df["SepalLengthCm"]
min = sl.min()
max = sl.max()
nsl = []
for i in sl:
    nsl.append((i-min)/(max-min))
df.assign(NormalizedSepalLength = nsl)
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species  NormalizedSepalLength
0      1            5.1           3.5            1.4           0.2     Iris-setosa               0.222222
1      2            4.9           3.0            1.4           0.2     Iris-setosa               0.166667
2      3            4.7           3.2            1.3           0.2     Iris-setosa               0.111111
3      4            4.6           3.1            1.5           0.2     Iris-setosa               0.083333
4      5            5.0           3.6            1.4           0.2     Iris-setosa               0.194444
..   ...            ...           ...            ...           ...             ...                    ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica               0.666667
146  147            6.3           2.5            5.0           1.9  Iris-virginica               0.555556
147  148            6.5           3.0            5.2           2.0  Iris-virginica               0.611111
148  149            6.2           3.4            5.4           2.3  Iris-virginica               0.527778
149  150            5.9           3.0            5.1           1.8  Iris-virginica               0.444444
q10: Filter the rows of iris data that has petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
fi = df[(df["SepalLengthCm"]<5) & (df["PetalLengthCm"]>1.5)]
fi
q11: Bin the petal length (3rd) column of iris data to form a text array, such that if petal length is:
Less than 3 --> 'small'
3-5 --> 'medium'
>5 --> 'large'
import numpy as np
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
pl = df["PetalLengthCm"]
l = []
for i in pl:
    if (i<3):
        l.append("small")
    elif (i>5):
        l.append("large")
    else:
        l.append("medium")
df.assign(PetalLength = l)
            Species PetalLength
0       Iris-setosa       small
1       Iris-setosa       small
2       Iris-setosa       small
3       Iris-setosa       small
4       Iris-setosa       small
..              ...         ...
145  Iris-virginica       large
146  Iris-virginica      medium
147  Iris-virginica       large
148  Iris-virginica       large
149  Iris-virginica       large
q13: Find the most frequent value of petal length (3rd column) in iris dataset.
import numpy as np
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/Iris.csv')
# mode() returns the most frequent value(s) of the column
print(df["PetalLengthCm"].mode()[0])
01. Create an empty array of 20 0's and replace the 4th object with the number 5
arr1 = np.zeros(20, dtype = int)
arr1[3] = 5
arr1
array([0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
02. Create an array of 20 1's and store it as a variable named array_master. Copy the same array into another variable named array_copy
array_master = np.ones(20, dtype = int)
array_copy = array_master.copy()
array_copy
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
03. Create an array containing 30 1's and broadcast all the one's to the value 100
arr2 = np.ones(30, dtype = int)
arr2[:] = 100
arr2
array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
       100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
04. Create an array of integers starting from 21 until 31 and name it as array1
--- Create an array of integers starting from 11 until 21 and name it array2
array1 = np.arange(21, 32)
array2 = np.arange(11, 22)
diff = array1-array2
diff
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
05. Create an array of all even integers from 2 to 10 and name it a1
a) Use the 2 arrays as rows and create a matrix [ Hint - Use stack function from numpy ]
a1 = np.arange(2, 12, 2)
a2 = np.arange(22, 32, 2)
np.stack((a1, a2))
array([[ 2,  4,  6,  8, 10],
       [22, 24, 26, 28, 30]])
b) Use the 2 arrays as columns and create a matrix [ Hint - Use column_stack function from numpy ]
acc = np.column_stack((a1, a2))
acc
array([[ 2, 22],
       [ 4, 24],
       [ 6, 26],
       [ 8, 28],
       [10, 30]])
06. Create a 5x6 matrix with values ranging from 0 to 29 and retrieve the value intersecting at 2nd row and
3rd column
arr = np.arange(0,30,1).reshape(5, 6)
print(arr[2][3])
15
07. Create an identity matrix of shape 10x10 and replace the 0's with the value 21
arri = np.eye(10, dtype=int)
for i in range(10):
    for j in range(10):
        if (arri[i][j]==0):
            arri[i][j] = 21
arri
array([[ 1, 21, 21, 21, 21, 21, 21, 21, 21, 21],
       [21,  1, 21, 21, 21, 21, 21, 21, 21, 21],
       [21, 21,  1, 21, 21, 21, 21, 21, 21, 21],
       [21, 21, 21,  1, 21, 21, 21, 21, 21, 21],
       [21, 21, 21, 21,  1, 21, 21, 21, 21, 21],
       [21, 21, 21, 21, 21,  1, 21, 21, 21, 21],
       [21, 21, 21, 21, 21, 21,  1, 21, 21, 21],
       [21, 21, 21, 21, 21, 21, 21,  1, 21, 21],
       [21, 21, 21, 21, 21, 21, 21, 21,  1, 21],
       [21, 21, 21, 21, 21, 21, 21, 21, 21,  1]])
08. Display a boolean array output where all values > 0.2 are True, rest are marked as False
ar = np.random.rand(10)
bar = []
for i in ar:
    if (i>0.2):
        bar.append("True")
    else:
        bar.append("False")
print(bar)
# Note: ar > 0.2 would produce a genuine boolean NumPy array directly.
['True', 'True', 'True', 'True', 'True', 'True', 'True', 'True', 'False', 'True']
09. Use NumPy to generate an array matrix of 5x2 random numbers sampled from a standard normal
distribution
m1 = np.random.randn(5, 2)
m1
array([[-0.24505186,  0.07510073],
       [-0.94109451,  1.29147136],
       [-0.03320188,  1.18647596],
       [ 0.12500751, -0.93431175],
       [-1.15417168,  0.5521074 ]])
11. Using the below given Matrix, generate the output for the below questions.
[ [1 2 4 67] [34 55 65 7] [45 66 44 3] [33 79 23 9] ]
a) Retrieve the last 2 rows and first 3 column values of the above matrix using index & selection technique
m1 = np.array([[ 1,  2,  4, 67],
               [34, 55, 65,  7],
               [45, 66, 44,  3],
               [33, 79, 23,  9]])
m1[2:, :3]   # NumPy slicing: rows from index 2 onward, first 3 columns
b) Retrieve the value 55 from the above matrix using index & selection technique
m1[1][1]
55
c) Retrieve the values from the 3rd column in the above matrix
m1[:, 2]
d) Retrieve the values from the 4th row in the above matrix
m1[3, :]
e) Retrieve values from the 2nd & 4th rows in the above matrix
m1[1:4:2, :]
array([[34, 55, 65,  7],
       [33, 79, 23,  9]])
a) Calculate the sum of all the values in the matrix
np.sum(m1)
537
b) Calculate standard deviation of all the values in the matrix
sd = np.std(m1)
sd
26.466886740793676
c) Calculate the variance of all the values in the matrix
np.var(m1)
700.49609375
d) Find the maximum value in the matrix
np.max(m1)
79
Practice Problems
Program to show that numpy array takes less space and less time to form than list array.
import numpy as np
import sys
li_arr = [i for i in range(12)]
np_arr = np.arange(12)
print(np_arr.itemsize*np_arr.size)
print(sys.getsizeof(1)*len(li_arr))
96
336
# arr1 and arr2 are two sequences defined earlier
a = np.array(arr1)
b = np.array(arr2)
for i in range(len(a)):
    for j in range(len(b)):
        if (a[i]==b[j]):
            print(f"Element {a[i]} is present in both arrays at indexes {i} and {j} respectively.")
Program to find max, min elements and their indexes in a numpy array.
import numpy as np
l = np.array([1, 2, 4, 3, 6, 2, 8, 34, 76, 24, 87, 35])
max = np.max(l)
min = np.min(l)
imax = np.argmax(l)
imin = np.argmin(l)
print(f"Maximum element in array is: {max} and its index is: {imax}")
print(f"Minimum element in array is: {min} and its index is: {imin}")
Maximum element in array is: 87 and its index is: 10
Minimum element in array is: 1 and its index is: 0
Program to implement Pandas Series.
import pandas as pd
import numpy as np
labels = ['w', 'x', 'y', 'z']
list = [10, 20, 30, 40]
list = np.array([10, 20, 30, 40])
dict = {'w':10 , 'x':20, 'y':30, 'z':40}
pd.Series(data = list)
0    10
1    20
2    30
3    40
dtype: int64
pd.Series(data = list , index = labels)
w    10
x    20
y    30
z    40
dtype: int64
pd.Series(list, labels)
w    10
x    20
y    30
z    40
dtype: int64
pd.Series(dict)
w    10
x    20
y    30
z    40
dtype: int64
Cricket     1
Football    2
Baseball    3
Golf        4
dtype: int64
sports1['Baseball']
NUMBERS ALPHABETS
0 1 A
1 2 B
2 3 C
3 4 D
4 5 E
5 6 F
6 7 G
7 8 H
8 9 I
9 10 J
10 11 K
11 12 L
12 13 M
13 14 N
14 15 O
15 16 P
16 17 Q
17 18 R
18 19 S
19 20 T
q1: From the given dataset print the first and last five rows
import pandas as pd
df2 = pd.read_csv("/content/drive/MyDrive/Summer Training/csv files/Automobile_data.csv")
print(f"First 5 rows are:\n{df2.head()}")
print(f"\nLast 5 rows are:\n{df2.tail()}")
First 5 rows are:
   index      company   body-style  wheel-base  length engine-type
0      0  alfa-romero  convertible        88.6   168.8        dohc
1      1  alfa-romero  convertible        88.6   168.8        dohc
2      2  alfa-romero    hatchback        94.5   171.2        ohcv
3      3         audi        sedan        99.8   176.6         ohc
4      4         audi        sedan        99.4   176.6         ohc
q2: Clean the dataset and update the CSV file. Replace all column values which contain ?, n.a, or NaN with 0.
df3 = df2.fillna(0)
df3
toyota           7
bmw              6
mazda            5
nissan           5
audi             4
mercedes-benz    4
mitsubishi       4
volkswagen       4
alfa-romero      3
chevrolet        3
honda            3
isuzu            3
jaguar           3
porsche          3
dodge            2
volvo            2
Name: company, dtype: int64
               average-mileage
company
alfa-romero          20.333333
audi                 20.000000
bmw                  19.000000
chevrolet            41.000000
dodge                31.000000
honda                26.333333
isuzu                33.333333
jaguar               14.333333
mazda                28.000000
mercedes-benz        18.000000
mitsubishi           29.500000
nissan               31.400000
porsche              17.000000
toyota               28.714286
volkswagen           31.750000
volvo                23.000000
type(df['mark'])
pandas.core.series.Series
index                   0
company                 0
body-style              0
wheel-base              0
length                  0
engine-type             0
num-of-cylinders        0
horsepower              0
average-mileage         0
price                   3
dtype: int64
Download CSV File from Link and answer the following questions -
https://drive.google.com/file/d/1-PbK5h1Msmw2LRysPWNvJQkNBEZac6YK/view?usp=sharing
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
q1: Read Total profit of all months and show it using a line plot
X label name = Month Number
Y label name = Total profit
x = df['month_number']
y = df['total_profit']
plt.xlabel('Month Number')
plt.ylabel('Total profit')
plt.plot(x, y)
[<matplotlib.lines.Line2D at 0x7b0ecd676f20>]
q2: Get total profit of all months and show line plot with the following Style properties
Line Style dotted and Line-color should be red
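The plotting cell for q2 is not in this excerpt; a minimal sketch matching the requested style, reusing x and y from q1:
plt.plot(x, y, color = 'r', linestyle = ':', label = 'Total profit')
plt.xlabel('Month Number')
plt.ylabel('Total profit')
plt.legend()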
<matplotlib.legend.Legend at 0x7b0ecd599e70>
q3: Read all product sales data and show it using a multiline plot
Display the number of units sold per month for each product using multiline plots. (i.e., Separate Plotline
for each product ).
x = df['month_number']
y1 = df['facecream']
y2 = df['facewash']
y3 = df['toothpaste']
y4 = df['bathingsoap']
y5 = df['shampoo']
y6 = df['moisturizer']
plt.plot(x, y1, label = 'Facecream')
plt.plot(x, y2, label = 'Facewash')
plt.plot(x, y3, label = 'Toothpaste')
plt.plot(x, y4, label = 'Bathing Soap')
plt.plot(x, y5, label = 'Shampoo')
plt.plot(x, y6, label = 'Moisturizer')
plt.legend()
plt.axis([1, 12, 0, 18000])
plt.xlabel('Month Number')
plt.ylabel('Product units sold')
q5: Read face cream and facewash product sales data and show it using the bar chart The bar chart should
display the number of units sold per month for each product. Add a separate bar for each product in the
same chart.
plt.bar(x+0.2, y1, label = 'Facecream', width = 0.4)
plt.bar(x-0.2, y2, label = 'Facewash', width = 0.4)
plt.xlabel('Month number')
plt.ylabel('Units sold')
plt.title('Product Sales')
plt.legend()
<matplotlib.legend.Legend at 0x7b0ecde840d0>
q6: Read sales data of bathing soap of all months and show it using a bar chart. Save this plot to your hard
disk
plt.bar(x, y4)
plt.xlabel('Month')
plt.ylabel('Units')
plt.legend(['Units sold'])
plt.savefig('MyChart.png')
q7: Read the total profit of each month and show it using the histogram to see the most common profit ranges
y8 = df['total_profit']
plt.hist(y8, rwidth = 0.8, bins = 10)
(array([2., 4., 1., 1., 1., 1., 0., 1., 0., 1.]),
 array([183300., 206250., 229200., 252150., 275100., 298050., 321000.,
        343950., 366900., 389850., 412800.]),
 <BarContainer object of 10 artists>)
q8: Calculate total sale data for last year for each product and show it using a Pie chart
s1 = y1.sum()
s2 = y2.sum()
s3 = y3.sum()
s4 = y4.sum()
s5 = y5.sum()
s6 = y6.sum()
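The pie-chart call itself is missing from the excerpt; a minimal completion using the six totals:
labels = ['Facecream', 'Facewash', 'Toothpaste', 'Bathing soap', 'Shampoo', 'Moisturizer']
plt.pie([s1, s2, s3, s4, s5, s6], labels = labels, autopct = '%1.1f%%')
plt.show()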
q10: Read all product sales data and show it using the stack plot
y = np.vstack([y1, y2, y3, y4, y5, y6])
prod = ['Facecream', 'Facewash', 'Toothpaste', 'Bathing soap', 'Shampoo', 'Moisturizer']
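The stack-plot call does not appear in the excerpt; a minimal completion:
plt.stackplot(x, y, labels = prod)
plt.legend(loc = 'upper left')
plt.xlabel('Month Number')
plt.ylabel('Units sold')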
Practice Problems
Program to plot a simple line chart.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x7b0ecd630cd0>]
Program to plot a line chart of f(x) = x^3.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-10, 11, 1)
y = x**3
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x7b0ecd9956c0>]
x = np.arange(0,11, 1)
a = int(input("Enter coefficient of x^2: "))
b = int(input("Enter coefficient of x: "))
c = int(input("Enter y-intercept: "))
y = a*x**2 + b*x + c
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x7b0ecd1ab5e0>]
x = np.arange(0, 11, 2)
y = x + 10
plt.plot(x, y, 'g^:')
[<matplotlib.lines.Line2D at 0x7b0ecd21b1f0>]
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-2, 7, 1)
y = x**4
plt.plot(x, y, 'b--d')
[<matplotlib.lines.Line2D at 0x7b0ecd0b46a0>]
Program to plot multiple graphs while differentiating them with colors.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 11, 1)
y1 = x
y2 = x**2
y3 = x**3
plt.plot(x, y1, 'r--')
plt.plot(x, y2, 'bs:')
plt.plot(x, y3, 'g^-')
#plt.plot(x, y1, 'r--', x, y2, 'bs', x, y3, 'g^') also works instead.
[<matplotlib.lines.Line2D at 0x7b0ecceb8280>]
x = np.arange(-5, 6, 1)
y1 = x**2
y2 = x**3
plt.plot(x, y1, 'g^--', label='Parabolic')
plt.plot(x, y2, 'yo:', label='Cubic')
plt.legend() #to show legend
plt.title('Parabolic v/s Cubic graph')
[<matplotlib.lines.Line2D at 0x7b0ecceebf70>]
Program to limit axes in graph.
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(25)
y = np.random.rand(25)
plt.scatter(x, y)
plt.plot(x, y, 'ro:', markerfacecolor = 'b')
[<matplotlib.lines.Line2D at 0x7b0ecce00100>]
x = np.arange(1, 13, 1)
y = np.arange(500, 6500, 500)
(-1.1199233681493854,
1.1009483364911072,
-1.1066336766829425,
1.1003158893657508)
Program to create a simple histogram.
import numpy as np
import matplotlib.pyplot as plt
(array([1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1.]),
 array([ 5.        ,  5.77777778,  6.55555556,  7.33333333,  8.11111111,
         8.88888889,  9.66666667, 10.44444444, 11.22222222, 12.        ,
        12.77777778, 13.55555556, 14.33333333, 15.11111111, 15.88888889,
        16.66666667, 17.44444444, 18.22222222, 19.        ]),
 <BarContainer object of 18 artists>)
Program to create Stack plot.
import numpy as np
import matplotlib.pyplot as plt
Machine learning is a type of artificial intelligence (AI) that allows software applications to become more
accurate in predicting outcomes without being explicitly programmed to do so. Machine learning algorithms
use historical data as input to predict new output values.
Python is a general-purpose programming language that is often used for machine learning applications. It
is easy to learn and has a large library of machine learning modules and libraries.
There are two main types of machine learning in Python: supervised learning and unsupervised learning.
• Supervised learning is when the computer is given labelled data, meaning that the output values are
known. The computer then learns to predict the output values for new data based on the labelled data.
• Unsupervised learning is when the computer is given unlabelled data. The computer then learns to
find patterns in the data without any prior knowledge of the output values.
Some of the most common machine learning algorithms used in Python include:
• Linear regression: This algorithm is used to predict continuous values, such as the price of a house
or the number of sales.
• Logistic regression: This algorithm is used to predict categorical values, such as whether a customer
will click on an ad or not.
• Decision trees: This algorithm is used to create a decision tree that can be used to classify or predict
data.
• Support vector machines (SVMs): This algorithm is used to find the best hyperplane that separates
two classes of data.
• Neural networks: This algorithm is a type of machine learning that is inspired by the human brain. It
is used to solve complex problems, such as image recognition and natural language processing.
Machine learning with Python is a powerful tool that can be used to solve a wide variety of problems. It is
a relatively easy language to learn, and there are many resources available to help you get started.
Here are some of the advantages of using Python for machine learning:
• Python is a general-purpose language, which means that it can be used for a variety of tasks, not just
machine learning.
• Python is easy to learn and use.
• There are many machine learning libraries and modules available for Python.
• Python is open source, which means that it is free to use and modify.
Linear Regression
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical
method that is used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
Linear regression algorithm shows a linear relationship between a dependent (y) and one or more independent (x) variables, hence called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
Mathematically, the model is:
y = a0 + a1x + ε
Here, y is the dependent variable, x is the independent variable, a0 is the intercept of the line, a1 is the linear regression coefficient (slope), and ε is the random error.
Linear regression can be further divided into two types of algorithm: Simple Linear Regression (one independent variable) and Multiple Linear Regression (more than one independent variable).
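A minimal fitting sketch with scikit-learn, using made-up data points that follow y = 2x + 1:
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4]])     # independent variable (made-up)
y = np.array([3, 5, 7, 9])             # dependent variable
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # approximately [2.] and 1.0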
Logistic Regression
• Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, we have dependent variables in a binary or discrete format, such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or False,
Spam or not spam, etc.
• Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
• Logistic regression uses the sigmoid function, or logistic function, to model the data; the sigmoid maps any real-valued number into a value between 0 and 1.
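The sigmoid itself is one line of code; a quick sketch:
import numpy as np
def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))
print(sigmoid(0))   # 0.5, the usual decision boundary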
KNN Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
• K-NN algorithm assumes the similarity between the new case/data and available cases and puts the new case
into the category that is most similar to the available categories.
• K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means
when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it
stores the dataset and at the time of classification, it performs an action on the dataset.
• KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that
data into a category that is most similar to the new data.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space
into classes so that we can easily put the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• Naïve Bayes Classifier is one of the simple and most effective Classification algorithms which helps in
building the fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and classifying
articles.
• Decision Tree is a Supervised learning technique that can be used for both classification and Regression
problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where
internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used
to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and
do not contain any further branches.
• The decisions or the test are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given
conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further
branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be
used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which
is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the
model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority votes of predictions, predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
K-Means Clustering
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabelled dataset into different
clusters. Here K defines the number of pre-defined clusters that need to be created in the process, as if K=2, there will
be two clusters, and for K=3, there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one group that has similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in the unlabelled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to
minimize the sum of distances between the data point and their corresponding clusters.
The algorithm takes the unlabelled dataset as input, divides the dataset into k-number of clusters, and repeats the
process until it does not find the best clusters. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the particular k-center,
create a cluster.
• Hence each cluster has datapoints with some commonalities, and it is away from other clusters.
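A minimal scikit-learn sketch of this procedure, with made-up 2-D points:
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],        # made-up data: two obvious groups
              [10, 2], [10, 4], [10, 0]])
km = KMeans(n_clusters = 2, n_init = 10, random_state = 0).fit(X)
print(km.labels_)            # cluster assignment of each point
print(km.cluster_centers_)   # the two centroids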
The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on the databases
that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects
are connected. This algorithm uses a Breadth-First Search and Hash Tree to calculate the itemset associations
efficiently. It is the iterative process for finding the frequent itemsets from the large dataset.
Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item
on another data item and maps accordingly so that it can be more profitable. It tries to find some interesting relations
or associations among the variables of dataset. It is based on different rules to discover the interesting relations between
variables in the database.
The association rule learning is one of the very important concepts of machine learning, and it is employed in Market
Basket analysis, Web usage mining, continuous production, etc. Here market basket analysis is a technique used by
various big retailers to discover the associations between items. We can understand it by taking an example of a
supermarket, as in a supermarket, all products that are purchased together are put together.
For example, if a customer buys bread, he most likely can also buy butter, eggs, or milk, so these products are stored
within a shelf or mostly nearby.
DAY 1 Assignment (Zomato dataset)
1. Read csv
2. Display no. of columns
3. Describe the dataset
4. Check how many columns with null are there
5. Read excel file
6. Merge both files (country code common)
7. Display the final column list
8. Plot piechart (countryvalue vs label)
9. Plot piechart for highest 3 countries
10. Groupby aggregate rating, rating color, rating text
11. Plot bar aggregate rating vs rating count
12. Count plot rating color vs rating count
13. Which currency used by which country (groupby) ?
14. Which country has online delivery options ?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#1 Read csv
df = pd.read_csv('/content/drive/MyDrive/Summer Training/csv files/zomato.csv')
df
Locality
0     Century City Mall, Poblacion, Makati City
1     Little Tokyo, Legaspi Village, Makati City
2     Edsa Shangri-La, Ortigas, Mandaluyong City
3     SM Megamall, Ortigas, Mandaluyong City
4     SM Megamall, Ortigas, Mandaluyong City
...   ...
9546  Karaköy
9547  Koşuyolu
9548  Kuruçeşme
9549  Kuruçeşme
9550  Moda
21
       Average Cost for two  Price range  Aggregate rating         Votes
count           9551.000000  9551.000000       9551.000000   9551.000000
mean            1199.210763     1.804837          2.666370    156.909748
std            16121.183073     0.905609          1.516378    430.169145
min                0.000000     1.000000          0.000000      0.000000
25%              250.000000     1.000000          2.500000      5.000000
50%              400.000000     2.000000          3.200000     31.000000
75%              700.000000     2.000000          3.700000    131.000000
max           800000.000000     4.000000          4.900000  10934.000000
Restaurant ID False
Restaurant Name False
Country Code False
City False
Address False
Locality False
Locality Verbose False
Longitude False
Latitude False
Cuisines True
Average Cost for two False
Currency False
Has Table booking False
Has Online delivery False
Is delivering now False
Switch to order menu False
Price range False
Aggregate rating False
Rating color False
Rating text             False
Votes                   False
dtype: bool
Locality
0     Century City Mall, Poblacion, Makati City
1     Little Tokyo, Legaspi Village, Makati City
2     Edsa Shangri-La, Ortigas, Mandaluyong City
3     SM Megamall, Ortigas, Mandaluyong City
4     SM Megamall, Ortigas, Mandaluyong City
...   ...
9546  Karaköy
9547  Koşuyolu
9548  Kuruçeşme
9549  Kuruçeşme
9550  Moda
#8 Plot piechart (country value vs label)
[pie chart output — share of restaurants per country: India 90.59%, United States 4.54%, United Kingdom 0.84%, Brazil 0.63%, UAE 0.63%, South Africa 0.63%, New Zealand 0.42%, Turkey 0.36%, Australia 0.25%, Phillipines 0.23%, Indonesia 0.22%, Singapore 0.21%, Qatar 0.21%, Sri Lanka 0.21%, Canada 0.04%]
#9 Plot piechart for highest 3 countries
plt.pie(values[:3], labels = labels1[:3], autopct = "%1.2f%%")
[pie chart output: India 94.39%, United States 4.73%, United Kingdom 0.87%]
#10 Groupby aggregate rating, rating color, rating text
df4 = df3.groupby(['Aggregate rating', 'Rating color', 'Rating text']).size().reset_index().rename(columns={0: 'Rating Count'})
df4
#11 Plot bar aggregate rating vs rating count
sns.barplot(data = df4, x = "Aggregate rating", y = "Rating Count")
#12 Count plot rating color vs rating count
sns.countplot(data = df4, x = 'Rating color', hue = 'Rating Count')
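The groupby cell for #13 is absent from the excerpt; a sketch consistent with the table below:
df3.groupby(['Country', 'Currency']).size().reset_index()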
           Country                Currency     0
0        Australia               Dollar($)    24
1           Brazil      Brazilian Real(R$)    60
2           Canada               Dollar($)     4
3            India      Indian Rupees(Rs.)  8652
4        Indonesia  Indonesian Rupiah(IDR)    21
5      New Zealand           NewZealand($)    40
6      Phillipines        Botswana Pula(P)    22
7            Qatar         Qatari Rial(QR)    20
8        Singapore               Dollar($)    20
9     South Africa                 Rand(R)    60
10       Sri Lanka   Sri Lankan Rupee(LKR)    20
11          Turkey        Turkish Lira(TL)    34
12             UAE      Emirati Diram(AED)    60
13  United Kingdom               Pounds(£)    80
14   United States               Dollar($)   434
                 Email
0    [email protected]
1    [email protected]
2    [email protected]
3    [email protected]
4    [email protected]
                                             Address            Avatar
0        835 Frank Tunnel Wrightmouth, MI 82180-9605            Violet
1      4547 Archer Common Diazchester, CA 06566-8576         DarkGreen
2         24645 Valerie Unions Suite 582 Cobbborough            Bisque
3    1414 David Throughway Port Jason, OH 22070-1220       SaddleBrown
4        14023 Rodriguez Passage Port Jacobville, PR  MediumAquaMarine
   Avg. Session Length  Time on App  Time on Website  Length of Membership
0                34.49        12.65            39.57                  4.08
1                31.92        11.10            37.26                  2.66
2                33.00        11.33            37.11                  4.10
3                34.30        13.71            36.72                  3.12
4                33.33        12.79            37.53                  4.44
   Yearly Amount Spent
0               587.95
1               392.42
2               487.54
3               581.85
4               599.64
df.describe()
grp = sns.JointGrid(data = df, x = 'Time on Website', y = 'Yearly Amount Spent')
grp.plot(sns.scatterplot, sns.histplot)
<seaborn.axisgrid.JointGrid at 0x7a80a03a30d0>
<seaborn.axisgrid.JointGrid at 0x7a809e01fd60>
** Use jointplot to create a 2D hex bin plot comparing Time on App and Length of Membership.**
grp3 = sns.jointplot(data = df, x = 'Time on App', y = 'Length of Membership', kind = "hex")
grp4 = sns.PairGrid(data = df)
grp4.map_diag(sns.histplot)
grp4.map_offdiag(sns.scatterplot)
<seaborn.axisgrid.PairGrid at 0x7a809de3b5b0>
Create a linear model plot (using seaborn's lmplot) of Yearly Amount Spent vs. Length of Membership.
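The plotting call itself is not shown in the excerpt; it was presumably something like the following sketch, assuming the same df:
sns.lmplot(data = df, x = 'Length of Membership', y = 'Yearly Amount Spent')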
<seaborn.axisgrid.FacetGrid at 0x7a809c8dfac0>
Training and Testing Data
Now that we've explored the data a bit, let's go ahead and split the data into training and testing sets.
** Set a variable X equal to the numerical features of the customers and a variable y equal to the "Yearly Amount Spent" column. **
from sklearn.model_selection import train_test_split
X = df.iloc[:,3:-1]
y = df.iloc[:,-1]
** Use model_selection.train_test_split from sklearn to split the data into training and testing sets. Set
test_size=0.3 and random_state=101**
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101)
X_train
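The cell that creates and fits the model is not shown above; the LinearRegression() line below is presumably its output. A minimal sketch of that missing step:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X_train, y_train)   # fit on the training split created above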
LinearRegression()
print(reg.intercept_)
print(reg.coef_)
-248.07210567565767
[-1.31257805 12.61911111 14.43841382 38.29792225]
Now that we have fit our model, let's evaluate its performance by predicting off the test values!
** Use lm.predict() to predict off the X_test set of the data.**
y_pred = reg.predict(X_test)
** Create a scatterplot of the real test values versus the predicted values. **
plt.scatter(y_test, y_pred)
plt.xlabel("Y Test")
plt.ylabel("Predicted Y")
plt.show()
Evaluating the Model
from sklearn import metrics
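The metric calls themselves are omitted in the excerpt; the three numbers below were presumably produced by something like this sketch, using sklearn's metrics module:
import numpy as np

print("MeanAbsoluteError:", metrics.mean_absolute_error(y_test, y_pred))
print("MeanSquareError:", metrics.mean_squared_error(y_test, y_pred))
print("RootMeanSquareError:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))  # RMSE is the square root of MSE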
MeanAbsoluteError: 88.29997116869201
MeanSquareError: 11725.316780593896
RootMeanSquareError: 108.28350188553146
KNN Model Assignment
import numpy as np
from collections import Counter

def euclid_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

class KNN:
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
    def predict(self, X):
        predictions = []
        for x in X:
            # distances from x to every training point
            distances = [euclid_distance(x, x_train) for x_train in self.X_train]
            k_indices = np.argsort(distances)[:self.k]
            k_nearest_labels = [self.y_train[i] for i in k_indices]
            # majority vote
            most_common = Counter(k_nearest_labels).most_common()
            predictions.append(most_common[0][0])
        return np.array(predictions)
# Importing necessary libraries and making a euclidean distance function for calculating distances.
raw_data = np.array([[1, 2, 1], [3, 2, 1], [2, 4, 1], [3, 3, 1], [2, 5, 1],
                     [-1, -2, 0], [-3, -2, 0], [-2, -4, 0], [-3, -3, 0], [-2, -5, 0]], dtype = float)
X = raw_data[:, :2]
y = raw_data[:, -1]
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors = 3, metric = euclid_distance)  # pass the custom distance defined above
model.fit(X, y)
x = df[['At1','At2']].to_numpy()
y = df['Class'].to_numpy()
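The four fitted SVM classifiers used below (linear_svc, sigmoid_svc, poly_svc, rbf_svc) are not defined in the excerpt; a minimal sketch, assuming a standard train/test split and default SVC parameters:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# split parameters are assumptions
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
linear_svc = SVC(kernel='linear').fit(x_train, y_train)
sigmoid_svc = SVC(kernel='sigmoid').fit(x_train, y_train)
poly_svc = SVC(kernel='poly').fit(x_train, y_train)
rbf_svc = SVC(kernel='rbf').fit(x_train, y_train)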
y_pred = linear_svc.predict(x_test)
print(classification_report(y_test, y_pred))
y_pred = sigmoid_svc.predict(x_test)
print(classification_report(y_test, y_pred))
y_pred = poly_svc.predict(x_test)
print(classification_report(y_test, y_pred))
y_pred_rbf = rbf_svc.predict(x_test)
print(classification_report(y_test, y_pred_rbf))
Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
X = df[['SepalLengthCm','PetalLengthCm','PetalWidthCm']].to_numpy()
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
nb_model = nb_model.fit(X_train, y_train)
Deep learning is a subset of machine learning that uses artificial neural networks to learn from data. Neural
networks are inspired by the human brain, and they are made up of layers of interconnected nodes. Each
node performs a simple calculation, and the output of each node is passed to the nodes in the next layer.
Python is a popular programming language for many different tasks, including data science and machine
learning. It is easy to learn and use, and it has a large and active community of developers. Python also has
many powerful libraries for deep learning, such as TensorFlow, Keras, and PyTorch.
1. Gather data. The first step is to gather the data that you want to train your model on. This data can
be images, text, audio, or any other type of data.
2. Prepare the data. The data needs to be prepared before it can be used to train the model. This involves
cleaning the data, removing any errors or outliers, and transforming the data into a format that the
model can understand.
3. Choose a model. There are many different types of neural networks that can be used for deep
learning. The choice of model depends on the task that you are trying to solve.
4. Train the model. The model is trained by feeding it the prepared data. The model will learn to extract
features from the data and to make predictions.
5. Evaluate the model. Once the model is trained, it needs to be evaluated to see how well it performs.
This is done by testing the model on a held-out set of data that it has not seen before.
6. Deploy the model. Once the model is evaluated and found to be satisfactory, it can be deployed to
production. This means that the model can be used to make predictions on new data.
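A compact Keras sketch of these six steps (the MNIST digits dataset and the layer sizes are illustrative assumptions, not taken from this report):
import tensorflow as tf

# 1-2. Gather and prepare the data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

# 3. Choose a model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# 4. Train the model
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# 5. Evaluate on held-out data the model has not seen before
model.evaluate(x_test, y_test)

# 6. Deploy: use the trained model to make predictions on new data
predictions = model.predict(x_test[:1])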
• Computer vision:
Deep learning is used to recognize objects in images and videos. This is used in applications such as
self-driving cars, facial recognition, and image search.
• Speech recognition:
Deep learning is used to recognize speech. This is used in applications such as voice assistants,
dictation software, and call centers.
• Generative models:
Deep learning is used to create artificial data that is similar to real data. This is used in applications
such as image generation, text generation, and music generation.
Architectures
Applications:
• Data Compression
• Pattern Recognition
• Computer Vision
• Sonar Target Recognition
• Speech Recognition
• Handwritten Characters Recognition
Applications:
• Machine Translation
• Robot Control
• Time Series Prediction
• Speech Recognition
• Speech Synthesis
• Time Series Anomaly Detection
• Rhythm Learning
• Music Composition
Applications:
• Filtering.
• Feature Learning.
• Classification.
• Risk Detection.
• Business and Economic analysis.
5. Autoencoders
An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here the number of
hidden cells is smaller than the number of input cells, while the number of input cells is equal to the number of
output cells. An autoencoder network is trained to reproduce its input at the output, which forces it to find
common patterns and generalize the data. Autoencoders are mainly used to build a smaller representation of the
input, and they help in reconstructing the original data from the compressed form. The algorithm is comparatively
simple, as it only requires the output to be identical to the input.
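A minimal Keras sketch of this idea (the dimensions are illustrative assumptions): the hidden layer is smaller than the input, the output layer matches the input size, and the network is trained to reproduce its own input.
import tensorflow as tf

input_dim, hidden_dim = 784, 64  # e.g. flattened 28x28 images (assumed sizes)
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_dim, activation='relu', input_shape=(input_dim,)),  # encoder: compress
    tf.keras.layers.Dense(input_dim, activation='sigmoid'),                          # decoder: reconstruct
])
autoencoder.compile(optimizer='adam', loss='mse')
# trained with the input as its own target, e.g. autoencoder.fit(x, x, epochs=10)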
Applications:
• Classification.
• Clustering.
• Feature Compression.
• Self-Driving Cars
In self-driving cars, deep learning processes the huge amount of image data captured around the vehicle and
then decides which action to take: turn left, turn right, or stop. Making these decisions reliably helps
reduce the accidents that happen every year.
• Voice Controlled Assistance
When we talk about voice-controlled assistance, Siri is the first thing that comes to mind. You can tell
Siri what you want it to do, and it will search for it and display the results for you.
• Automatic Image Caption Generation
For any image that you upload, the algorithm generates a caption accordingly. If you search for a blue-coloured
eye, it will display a blue-coloured eye with a caption at the bottom of the image.
• Automatic Machine Translation
Automatic machine translation uses deep learning to convert text from one language into another.
Limitations
Advantages
Disadvantages
Artificial Neural Networks are computing systems designed to simulate the way the human brain analyzes
and processes information. They have self-learning capabilities that enable them to produce better results
as more data becomes available: a network trained on more data will be more accurate, because neural
networks learn from examples. A neural network can be configured for specific applications such as data
classification or pattern recognition.
With the help of neural networks, a lot of technology has evolved, from translating webpages into other
languages to having a virtual assistant order groceries online. All of these things are possible because of
neural networks. An artificial neural network is, in essence, a network of various artificial neurons.
Convolutional Neural Networks are a special type of feed-forward artificial neural network in which the
connectivity pattern between neurons is inspired by the visual cortex.
The visual cortex contains small regions of cells that are sensitive to specific areas of the visual field.
Individual neuronal cells in the brain fire only when edges of a certain orientation are present: some neurons
respond when exposed to vertical edges, while others respond to horizontal or diagonal edges. This behaviour
is the motivation behind Convolutional Neural Networks.
Convolutional Neural Networks, also called ConvNets, are essentially neural networks that share their
parameters across space.
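A minimal ConvNet sketch illustrating this parameter sharing (the input size and layer widths are illustrative assumptions):
import tensorflow as tf

# a single bank of 3x3 filters is slid over the whole image, so its weights are
# shared by every spatial location (the parameter sharing described above)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()  # the Conv2D layer has only 320 parameters: 32 filters x (3*3*1 weights + 1 bias)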
Recurrent Networks are a kind of artificial neural network mainly intended to identify patterns in data
sequences, such as text, genomes, handwriting, the spoken word, or numerical time series data emanating
from sensors, stock markets, and government agencies.
DL ASSIGNMENT 1 (XOR gate implementation using perceptron)
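The network definition behind xor_model is not shown in the excerpt; a minimal sketch (the layer sizes, optimizer, and epoch count are assumptions):
import numpy as np
import tensorflow as tf

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)  # XOR truth table

xor_model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(2,)),  # hidden layer: XOR is not linearly separable
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
xor_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
xor_model.fit(X, y, epochs=500, verbose=0)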
<keras.callbacks.History at 0x7e7c1866ead0>
inp = [1, 0]
y_pred = xor_model.predict([inp])
print(f"Prediction of model when entering {inp} is: {y_pred}")
Epoch 1/50
24/24 [==============================] - 3s 66ms/step - loss: 2.0725 - accuracy: 0.3658 - val_loss: 1.7010 - val_accuracy: 0.7056
Epoch 2/50
24/24 [==============================] - 1s 55ms/step - loss: 1.3995 - accuracy: 0.6957 - val_loss: 1.0266 - val_accuracy: 0.8087
... (epochs 3-29 omitted: training loss fell steadily from 0.89 to 0.15 while accuracy rose from 0.80 to 0.96) ...
Epoch 30/50
24/24 [==============================] - 2s 80ms/step - loss: 0.1471 - accuracy: 0.9573 - val_loss: 0.1422 - val_accuracy: 0.9578
... (epochs 31-49 omitted: training loss continued to fall from 0.14 to 0.09) ...
Epoch 50/50
24/24 [==============================] - 1s 56ms/step - loss: 0.0881 - accuracy: 0.9735 - val_loss: 0.1016 - val_accuracy: 0.9694
313/313 [==============================] - 1s 3ms/step
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
dense (Dense)                (None, 256)               200960
dropout (Dropout)            (None, 256)               0
dense_1 (Dense)              (None, 128)               32896
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
_________________________________________________________________
the accuracy on 30th epoch is: 0.9573125243186951
the accuracy on 50th epoch is: 0.973520815372467
the loss on 30th epoch is: 0.14712917804718018
the loss on 50th epoch is: 0.08809798955917358
258
print(image.shape)
(321, 432, 3)
Creating directories
import os, shutil

category_dict = {}
images_per_category_dict = {}
category_images_path_dict = {}
total_images = 0
for i, category in enumerate(folder_names):
    category_dict[i] = category
    # (presumably) record the file names and image count for each category
    image_names = os.listdir(os.path.join(dataset_path, category))
    category_images_path_dict[i] = image_names
    images_per_category_dict[i] = len(image_names)
    total_images += len(image_names)

total_train = 0
total_validation = 0
total_test = 0
total_train_2 = 0
total_validation_2 = 0
total_test_2 = 0
for i, category in enumerate(folder_names):
    train_number = int(0.7 * images_per_category_dict[i])
    validation_number = int(0.2 * images_per_category_dict[i])
    test_number = images_per_category_dict[i] - train_number - validation_number  # for not exceeding maximum number
    # now copy the training slice of each category to its respective folder
    fnames = category_images_path_dict[i][:train_number]
    for fname in fnames:
        src = os.path.join(dataset_path, category, fname)
        dst = os.path.join(train_dir, category, fname)
        shutil.copyfile(src, dst)
    total_train_2 += len(fnames)
    # (the analogous copy loops for the validation and test slices are omitted in the excerpt)
    total_validation_2 += len(fnames)
# print statistics
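train_datagen and test_datagen are not defined in the excerpt; presumably standard Keras generators with pixel rescaling, e.g.:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)  # scale pixel values to [0, 1]
test_datagen = ImageDataGenerator(rescale=1./255)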
train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 150x150
    target_size=(150, 150),
    batch_size=20,
    # Since we use categorical_crossentropy loss, we need categorical labels
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='categorical')
# adding a data augmentation layer
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
])
history = model.fit_generator(
    train_generator,
    steps_per_epoch = 10,
    epochs = 50,
    validation_data = validation_generator,
    validation_steps = 80)
Analysis report for CNN model for caltech 256 dataset.
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                    Output Shape              Param #
=================================================================
conv2d (Conv2D)                 (None, 148, 148, 512)     14336
max_pooling2d (MaxPooling2D)    (None, 74, 74, 512)       0
dropout (Dropout)               (None, 74, 74, 512)       0
conv2d_1 (Conv2D)               (None, 72, 72, 256)       1179904
max_pooling2d_1 (MaxPooling2D)  (None, 36, 36, 256)       0
dropout_1 (Dropout)             (None, 36, 36, 256)       0
conv2d_2 (Conv2D)               (None, 34, 34, 256)       590080
max_pooling2d_2 (MaxPooling2D)  (None, 17, 17, 256)       0
dropout_2 (Dropout)             (None, 17, 17, 256)       0
conv2d_3 (Conv2D)               (None, 15, 15, 128)       295040
max_pooling2d_3 (MaxPooling2D)  (None, 7, 7, 128)         0
flatten (Flatten)               (None, 6272)               0
dense (Dense)                   (None, 512)               3211776
dropout_3 (Dropout)             (None, 512)                0
dense_1 (Dense)                 (None, 257)                131841
=================================================================
Total params: 5,422,977
Trainable params: 5,422,977
Non-trainable params: 0
_________________________________________________________________
print('Accuracy on 30th epoch is: ', history.history['acc'][29])
print('Accuracy on 50th epoch is: ', history.history['acc'][49])
print('Loss on 30th epoch is: ', history.history['loss'][29])
print('Loss on 50th epoch is: ', history.history['loss'][49])
print(image_path)
/content/drive/MyDrive/Summer Training/DATASETS/caltech101/101_ObjectCategories/wrench/image_0035.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech101/101_ObjectCategories/wrench/image_0036.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech101/101_ObjectCategories/wrench/image_0037.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech101/101_ObjectCategories/wrench/image_0038.jpg
/content/drive/MyDrive/Summer Training/DATASETS/caltech-101/101_ObjectCategories/wrench/image_0039.jpg
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 [==============================] - 1s 0us/step
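The construction of base_model is not shown; it presumably loads the ResNet50 "notop" weights downloaded above (the input size is an assumption):
from tensorflow.keras.applications import ResNet50

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
base_model.trainable = False  # freeze the pretrained backbone so only the new head is trained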
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dense(train_generator.num_classes, activation='softmax')
])
Introduction
Due to the growing need for educated and talented individuals, especially in developing
countries, recruiting fresh graduates is a routine practice for organizations. Conventional
recruiting methods and selection processes can be prone to errors, and in order to optimize the
whole process, some innovative methods are needed.
Our data has 215 rows and 13 columns. Using the "Job_Placement_Data.csv" dataset that has
been made available for use, we have analyzed and processed the data and applied machine
learning classification models to achieve our goal.
This project aims to predict job placement for students based on their academic and personal
details using various machine learning algorithms. By analyzing historical data, we can build a
predictive model that helps students understand their likelihood of getting placed, thereby
enabling them to take proactive measures to improve their chances.
Methodology
1. Data Preprocessing
A) Data Loading:
- The "Job_Placement_Data.csv" file is loaded into a pandas DataFrame for analysis.
B) Data Cleaning:
- The dataset is examined for missing values, which are then handled appropriately (e.g., filling missing values or
dropping incomplete rows).
- Duplicate records are identified and removed to ensure the dataset's integrity.
C) Data Transformation:
- Categorical variables are converted into numerical format using techniques such as one-hot encoding to facilitate
model training.
2. Exploratory Data Analysis (EDA)
A) Visualizations:
- Histograms are created for numerical features to understand their distributions and detect any anomalies.
- Scatter plots are used to visualize relationships between features.
B) Summary Statistics:
- Descriptive statistics are generated to summarize the central tendency.
3. Feature Engineering
A) Scaling:
- Numerical features are standardized to ensure they have a mean of 0 and a standard deviation of 1, which helps
improve model performance and convergence.
4. Model Building and Evaluation
A) Train-Test Split:
- The dataset is split into training and testing sets to evaluate the model's performance on unseen data.
B) Model Training and Evaluation: Various machine learning models are trained and evaluated, including:
- Logistic Regression: A statistical model that predicts the probability of a binary outcome.
- Support Vector Machine (SVM): Different kernels (RBF, linear, sigmoid, polynomial) are used to find the optimal
hyperplane that separates data into classes.
- Decision Tree: A model that splits the data into branches to make predictions.
- Random Forest: An ensemble of decision trees that improves prediction accuracy by averaging multiple trees.
- K-Nearest Neighbors (KNN): A model that classifies data points based on the labels of their nearest neighbors.
- Neural Network (CNN): A deep learning model with multiple layers to capture complex patterns in the data.
C) Model Comparison:
- Accuracy scores of different models are compared to select the best-performing model.
Dataset used
CODE
import numpy as np
import pandas as pd
df=pd.read_csv("Job_Placement_Data.csv")
df
    gender  ssc_percentage  ssc_board  hsc_percentage  hsc_board  hsc_subject  degree_percentage  undergrad_degree  work_experience  emp_test_percentage  specialisation  mba_percent  status
0   M       67.00           Others     91.00           Others     Commerce     58.00              Sci&Tech          No               55.0                 Mkt&HR          58.80        Placed
1   M       79.33           Central    78.33           Others     Science      77.48              Sci&Tech          Yes              86.5                 Mkt&Fin         66.28        Placed
2   M       65.00           Central    68.00           Central    Arts         64.00              Comm&Mgmt         No               75.0                 Mkt&Fin         57.80        Placed
3   M       56.00           Central    52.00           Central    Science      52.00              Sci&Tech          No               66.0                 Mkt&HR          59.43        Not Placed
4   M       85.80           Central    73.60           Central    Commerce     73.30              Comm&Mgmt         No               96.8                 Mkt&Fin         55.50        Placed
..  ...     ...             ...        ...             ...        ...          ...                ...               ...              ...                  ...             ...          ...
210 M       80.60           Others     82.00           Others     Commerce     77.60              Comm&Mgmt         No               91.0                 Mkt&Fin         74.49        Placed
211 M       58.00           Others     60.00           Others     Science      72.00              Sci&Tech          No               74.0                 Mkt&Fin         53.62        Placed
212 M       67.00           Others     67.00           Others     Commerce     73.00              Comm&Mgmt         Yes              59.0                 Mkt&Fin         69.72        Placed
213 F       74.00           Others     66.00           Others     Commerce     58.00              Comm&Mgmt         No               70.0                 Mkt&HR          60.23        Placed
214 M       62.00           Central    58.00           Others     Science      53.00              Comm&Mgmt         No               89.0                 Mkt&HR          60.22        Not Placed

215 rows × 13 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 215 non-null object
1 ssc_percentage 215 non-null float64
2 ssc_board 215 non-null object
3 hsc_percentage 215 non-null float64
4 hsc_board 215 non-null object
5 hsc_subject 215 non-null object
6 degree_percentage 215 non-null float64
7 undergrad_degree 215 non-null object
8 work_experience 215 non-null object
9 emp_test_percentage 215 non-null float64
10 specialisation 215 non-null object
11 mba_percent 215 non-null float64
12 status 215 non-null object
dtypes: float64(5), object(8)
memory usage: 22.0+ KB
df.isnull().sum()
dtype: int64
df.shape
(215, 13)
df.duplicated().sum()
0
df.describe()
import matplotlib.pyplot as plt
import seaborn as sns
plt.hist(df["ssc_percentage"],bins=20)
plt.title("SSC Percentage Distribution")
plt.xlabel("SSC Percentage")
plt.ylabel("Frequency")
plt.show()
plt.hist(df["hsc_percentage"],bins=20)
plt.title("HSC Percentage Distribution")
plt.xlabel("HSC Percentage")
plt.ylabel("Frequency")
plt.show()
plt.hist(df["gender"],bins=20)
plt.title("Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Frequency")
plt.show()
print("totals numbers of Female:",df["gender"].value_counts()[1])
print("totals numbers of Male:",df["gender"].value_counts()[0],"\n\n")
totals numbers of Female: 76
totals numbers of Male: 139
df["status"]
0          Placed
1          Placed
2          Placed
3      Not Placed
4          Placed
          ...
210        Placed
211        Placed
212        Placed
213        Placed
214    Not Placed
Name: status, Length: 215, dtype: object
sns.scatterplot(x="ssc_percentage",y="hsc_percentage",data=df,hue="status")
plt.title("Difference for SSC and HSC Person")
plt.show()
df["status"].value_counts()
df.info()
a=df.columns[5:15]
print(a)
for i in a:
df[i]=df[i].astype(int)
df.info()
x=df.drop("status_Placed",axis=1)
y=df["status_Placed"]
print(x.shape,"\n\n",y.shape)
(215, 14)
(215,)
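The split itself is omitted; test_size=0.2 matches the 172/43 shapes below, while the random_state is an assumption:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)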
x_train.shape
(172, 14)
x_test.shape
(43, 14)
y_train.shape
(172,)
y_test.shape
(43,)
LOGISTIC REGRESSION
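The construction of lg is not shown; a minimal sketch (max_iter is an assumption added to ensure convergence):
from sklearn.linear_model import LogisticRegression

lg = LogisticRegression(max_iter=1000)
lg.fit(x_train, y_train)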
y_pred=lg.predict(x_test)
print(y_pred)
[1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1]
CNN
import os
import math
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
# The GPU id to use, usually either "0" or "1"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import numpy as np
import cv2
from matplotlib import pyplot as plt
import keras
print("keras version: ", keras.__version__)
import tensorflow as tf
print("tensoflow version: ", tf.__version__)
config = tf.compat.v1.ConfigProto()
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(x_train.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
num_classes = len(np.unique(y_train))  # calculate the number of unique classes in y_train
model.add(Dense(num_classes, activation='softmax'))  # one output neuron per class
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# note: categorical_crossentropy expects one-hot labels; with integer labels,
# sparse_categorical_crossentropy would be required instead
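classifier_A, classifier_C, and classifier_D are not defined in the code shown; judging from the accuracy table later (rbf 83.72%, sigmoid 67.44%, poly 81.39%), they are presumably the remaining SVC kernels. A minimal sketch, assuming default parameters:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

classifier_A = SVC(kernel='rbf', random_state=0).fit(x_train, y_train)      # rbf kernel
classifier_C = SVC(kernel='sigmoid', random_state=0).fit(x_train, y_train)  # sigmoid kernel
classifier_D = SVC(kernel='poly', random_state=0).fit(x_train, y_train)     # polynomial kernel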
y_pred= classifier_A.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
83.72093023255815
classifier_B = SVC(kernel='linear', random_state=0)
classifier_B.fit(x_train, y_train)
y_pred= classifier_B.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
81.3953488372093
y_pred= classifier_C.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
67.44186046511628
y_pred= classifier_D.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
81.3953488372093
DECISION TREE
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train_np = x_train.to_numpy() # Convert DataFrame to NumPy array
x_test_np = x_test.to_numpy() # Convert DataFrame to NumPy array
x_train_scaled = st_x.fit_transform(x_train_np)
x_test_scaled = st_x.transform(x_test_np)
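classifier and classifier1 are not defined in the excerpt; from the accuracy table later (Decision Tree and Random Forest both at 69.76%), they are presumably:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

classifier = DecisionTreeClassifier(random_state=0).fit(x_train_scaled, y_train)   # decision tree
classifier1 = RandomForestClassifier(random_state=0).fit(x_train_scaled, y_train)  # random forest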
y_pred = classifier.predict(x_test_scaled)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)*100
69.76744186046511
y_pred = classifier1.predict(x_test_scaled)
accuracy_score(y_test, y_pred)*100
69.76744186046511
KNN
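The KNN model code is omitted in the excerpt; a minimal sketch (the value of k is an assumption):
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
print(accuracy_score(y_test, knn.predict(x_test))*100)  # reported accuracy: 72.09%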
data={"Methods":["KNN","CNN","SVM(rbf)","SVM(linear)","SVM(sigmoid)","SVM(poly
)","Logistic Regression","Gaussian Naive Bayes","Decision Tree","Random
Forest"],"Accuracy":["72.09%","81.401%","83.72%","81.39%","67.44%","81.39%","8
1.39%","69.76%","69.76%","69.76%"]}
x=pd.DataFrame(data)
x['Accuracy'] = x['Accuracy'].astype(str)
x['Accuracy'] = x['Accuracy'].str.rstrip('%').astype('float')
x_sorted = x.sort_values(by='Accuracy', ascending=False).reset_index(drop=True)
x_sorted
SVM(rbf) gives the best accuracy of 83.72%
df.drop("undergrad_degree_Others",axis=1)
while True:
    input_data = eval(input("Enter the data: "))
    input_data_as_array = np.asarray(input_data)
    reshaped_array = input_data_as_array.reshape(1, -1)
    prediction = classifier_A.predict(reshaped_array)
    print(prediction)
    if prediction[0] == 0:
        print("Not Placed")
    else:
        print("Placed")
Results
The model developed can predict whether a student is placed or not using various machine
learning algorithms: KNN, CNN, Gaussian Naïve Bayes, SVM, Decision Tree, and the Random
Forest classifier.
The results show that the SVM with an RBF kernel is the best technique of all, as it gives the highest accuracy (83.72%).
Conclusion
The project effectively demonstrated the use of various machine learning models to predict job placement outcomes.
By leveraging data preprocessing, feature encoding, and multiple model evaluations, the SVM with the RBF kernel
emerged as the most accurate model. This model is robust for predicting job placements based on student data,
providing valuable insights for educational institutions and placement agencies.
The approach and findings from this project highlight the importance of choosing appropriate models and
hyperparameters, as well as the effectiveness of preprocessing steps in improving model performance.
Chapter 5- Conclusion
In conclusion, the needs and benefits of machine learning and deep learning are undeniable in today's rapidly evolving
technological landscape. These powerful fields of artificial intelligence have revolutionized various industries and
continue to shape our world in profound ways.
Embarking on the journey of the Python programming language, machine learning, and deep learning has been an
immensely beneficial and transformative experience. This training program has not only expanded my technical
skillset but has also opened up new horizons of possibilities in my career and personal growth.
Machine learning and deep learning meet the ever-increasing demand for intelligent solutions by automating tasks,
making predictions, and extracting insights from vast and complex datasets. They have improved efficiency, accuracy,
and decision-making across domains such as healthcare, finance, manufacturing, and transportation. These
technologies have enabled us to tackle previously insurmountable problems, from diagnosing diseases to optimizing
supply chains. The needs and benefits of machine learning and deep learning are clear: they empower us to solve
complex problems, drive innovation, and improve our quality of life. As we continue to advance in these fields, it is
imperative that we do so with a strong commitment to ethics and responsible AI practices, to ensure a bright and
inclusive future powered by intelligent machines.
Delving into machine learning has given me the ability to harness the predictive power of algorithms to extract
valuable insights from data. This has proven invaluable in making data-driven decisions, optimizing business
processes, and gaining a competitive edge in the rapidly evolving digital landscape. Furthermore, deep learning, with
its neural networks and complex architectures, has allowed me to delve into the cutting-edge realms of artificial
intelligence. It has enabled me to work on advanced projects such as job placement prediction, expanding my
capabilities to tackle complex real-world problems.
Beyond the technical skills acquired, this training program has fostered critical thinking, problem-solving, and
adaptability. It has taught me the importance of continuous learning in a rapidly evolving field, where staying
up-to-date with the latest advancements is paramount. Additionally, the experience of collaborating with peers, engaging
in hands-on projects, and seeking guidance from mentors has enriched my learning journey. It has not only broadened
my knowledge but has also exposed me to diverse perspectives and approaches, which are invaluable in a field as
dynamic as technology.
In summary, this training program has been a transformative experience that has equipped me with the skills and
knowledge to navigate the ever-evolving landscape of technology. It has broadened my horizons, enhanced my
problem-solving abilities, and positioned me to make a meaningful impact in the world of Python, machine learning,
and deep learning.
Chapter 6- References
Online Documentation:
• Python Documentation: https://docs.python.org/3/
• TensorFlow Documentation: https://www.tensorflow.org/api_docs/python/tf/keras
• Scikit-learn Documentation: https://scikit-learn.org/stable/user_guide.html
• Kaggle: https://www.kaggle.com/learn
• GeeksforGeeks: https://www.geeksforgeeks.org/machine-learning/
• Javatpoint: https://www.javatpoint.com/machine-learning