
An Internship Report on Machine Learning with Python

A report submitted in partial fulfillment of the mandatory course of B.Tech

(IV Year – I Semester)

In

Computer Science and Engineering


By

KOLA ANJALI - 20NF1A0523

UNIVERSAL COLLEGE OF ENGINEERING AND TECHNOLOGY


Approved by AICTE, New Delhi & Affiliated to JNTUK, Kakinada
(Accredited by NAAC, B++), Dokiparru(v), Medikonduru(M), Guntur – 522438
Phone No: 0863-2291231, 2290232
E-Mail: ucetguntur@gmail.com
Website: www.ucet.edu.in

2022-2023

UNIVERSAL COLLEGE OF ENGINEERING AND TECHNOLOGY
Approved by AICTE, New Delhi & Affiliated to JNTUK, Kakinada
(Accredited by NAAC, B++), Dokiparru(v), Medikonduru(M), Guntur – 522438
Phone No: 0863-2291231, 2290232
E-Mail: ucetguntur@gmail.com
Website: www.ucet.edu.in

BONAFIDE CERTIFICATE
This is to certify that the Internship Report entitled "MACHINE LEARNING WITH PYTHON", submitted by KOLA ANJALI (20NF1A0523) of B.Tech (Computer Science and Engineering) in the Department of Computer Science of Universal College of Engineering and Technology, in partial fulfillment of the requirements for the coursework of III Year-I Semester of B.Tech in Computer Science and Engineering, is a record of internship work carried out under my guidance and supervision in the academic year 2022-2023.

HEAD OF THE DEPARTMENT

Submitted for Viva voce Examination held on

EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We take this opportunity to express our deepest gratitude and appreciation to all those people who made this internship work easier with words of encouragement, motivation, discipline, and faith, by offering different places to look to expand our ideas, and who helped us towards the successful completion of this internship work.

First and foremost, we express our deep gratitude to Dr. Bhagyaiah Chinnabathini,
Chairman, Universal College of Engineering and Technology for providing necessary
facilities throughout the Computer Science & Engineering program.

We express our sincere thanks to Dr. Ch. Kesava Reddy, Principal, Universal College of Engineering and Technology, for his constant support and cooperation throughout the Computer Science & Engineering program.

We express our sincere gratitude to Dr. G. Guru Kesava Das, B.E. (CSE), M.E. (CSE), Ph.D. (CSE), Professor & HOD, Computer Science & Engineering, Universal College of Engineering and Technology, for his constant encouragement, motivation, and faith in offering different places to look to expand our ideas.

We would like to express our sincere gratitude to the UCET internship cell and our Internship Coordinator, Mrs. B. Sravanthi, for her insightful advice, motivating suggestions, invaluable guidance, help, and support in the successful completion of this internship.

We would like to take this opportunity to express our thanks to the teaching and non-teaching staff in the Department of Computer Science & Engineering, UCET, for their invaluable help and support.

KOLA ANJALI
20NF1A0523

DECLARATION

We hereby declare that the internship report entitled "MACHINE LEARNING AND PYTHON'S FRAMEWORK", which is being submitted to the Jawaharlal Nehru Technological University Kakinada (JNTUK) in fulfillment of the mandatory course of B.Tech (III Year – I Semester) in the Department of Computer Science and Engineering, is a bonafide report of the work carried out by us. The material contained in this internship report has not been submitted to any University or Institution for the award of any degree.

KOLA ANJALI

20NF1A0523

ABSTRACT

The present paper is based on machine learning activities using the Python programming language. There are various types of machine learning algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, which already exist in the field of computer programming. Besides these algorithms, there is deep learning, which plays a significant role in machine learning devices and is part of the family of machine learning methods. Deep learning can be used to intelligently analyze data on a large scale. The paper explores how Python can be applied in ML methods, and a comprehensive overview of the concerned issues is presented. The present research paper explores the history of machine learning, the methods used in machine learning, and its application in different fields of AI. The aim of this study is to transmit the knowledge of machine learning in various fields of AI. For machine learning (ML), knowledge of artificial intelligence (AI) is essential.

INDEX

 Title page
 Certificate
 Acknowledgement
 Declaration
 Abstract

CHAPTER NO   CHAPTER NAME

CHAPTER 1    INTRODUCTION
CHAPTER 2    LITERATURE SURVEY
CHAPTER 3    SYSTEM ANALYSIS AND REQUIREMENTS
             3.1 Artificial Intelligence
             3.2 Machine Learning
             • Supervised Learning
             • Unsupervised Learning
             • Reinforcement Learning
             • Deep Learning
CHAPTER 4    SYSTEM DESIGN
             4.1 System Specifications
             4.2 UML Diagrams
             • Use Case Diagram
             • Class Diagram
             • Sequence Diagram
             • Activity Diagram
             • Communication Diagram
CHAPTER 5    SYSTEM IMPLEMENTATION
             5.1 Sample Code
             5.2 Testing Methodologies
CHAPTER 6    SCREENSHOTS
CHAPTER 7    CONCLUSION
APPENDIX-A   BIBLIOGRAPHY
APPENDIX-B   TECHNOLOGY USED

LIST OF FIGURES

FIGURE NO FIGURE NAME


Fig 4.2.1 Use Case Diagram
Fig 4.2.2 Class Diagram
Fig 4.2.3 Sequence Diagram
Fig 4.2.4 Activity Diagram
Fig 4.2.5 Communication Diagram

CHAPTER-1
1. INTRODUCTION
Artificial Intelligence (AI) is a broad term which is used very frequently in social media, medical fields, agricultural fields, programming languages, and other fields of automation devices. Machine learning is a science that was founded and developed as a subfield of artificial intelligence. Machine learning was first introduced in the 1950s (Çelik, 2018). Although its first steps were taken in the 1950s, no significant research was carried out on ML and its development slowed down. In the 1990s, however, researchers revived the field and made significant contributions to ML. It is now a science that will improve further in the coming years. Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy. It is an important component of the growing field of data science.

CHAPTER-2

LITERATURE SURVEY

A literature review is a systematic and comprehensive analysis of books, scholarly articles, and other sources relevant to a particular topic, providing a base of knowledge on that topic. A literature review is an overview of the previously published works on a particular topic. Literature reviews are designed to identify and critique the existing literature on a topic, justifying the research by exposing gaps in current work.

The concept of machine learning is not new to us; several studies have been made so far. Machine learning is a multidimensional problem, so there are several facets available for designing and analyzing web-based applications in machine learning using Python. Some of the selected studies are explained hereunder:

1. Iqbal H. Sarker (2021):

They made a study on machine learning with special reference to algorithms, real-world applications, and research directions. A comprehensive overview of machine learning algorithms for intelligent data analysis and applications is given in the study. How various types of machine learning methods can be used to solve various real-world issues is briefly discussed. A successful machine learning model depends on both the data and the performance of the learning algorithms. This study is part of the topical collection "Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications", guest edited by Bhanu Prakash K. N. and M. Shivakumar.

2. Sebastian Raschka, Joshua Patterson and Corey Nolet (2020):

They reviewed machine learning in Python. The developments and technology trends in data science, machine learning, and artificial intelligence are explained in the study. The study also reveals some important insights into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. Widely used libraries and concepts are collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

3. Jan Kossmann and Rainer Schlosser (2019):

They made a study on "A Framework for Self-Managing Database Systems" and explored how database systems that autonomously manage their configuration and physical database design face numerous challenges: they need to anticipate future workloads, find satisfactory and robust configurations efficiently, and learn from recent actions. They describe a component-based framework for self-managing database systems that facilitates development and database integration with low overhead by relying on a clear separation of concerns. Their framework results in exchangeable and reusable components, which simplify experiments and promote further research. Furthermore, they propose an LP-based algorithm to find an efficient order in which to tune multiple dependent features in a recursive way (Kossmann & Schlosser, 2019).

4. Shweta J. Patil (2019):

She wrote a research paper on Python using databases and SQL. She mentioned that Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to combine "remarkable power with very clear syntax", and its standard library is large and comprehensive. Python is a programming language that lets you work more quickly and integrate your systems more effectively. The paper reviews available resources and basic information about database modules that are known to be used with Python, as well as how to make the connection between Python and a database. The paper describes different database systems with their standard commands implemented with Python, and identifies the database engine best suited to implementation with Python. She concluded that, during the work on the project, she tried to analyze all the database servers in order to find the most suitable one. After careful consideration, MySQL Server was chosen, since it has many appropriate characteristics to be implemented in Python. Python is one of the best-known advanced programming languages, which it owes mainly to its natural expressiveness as well as to the collection of support modules that extend its advantages; that is why Python fits perfectly well when it comes to developing a stable connection between the program and the database (Patil, 2019).

5. Özer Çelik and Serthan Salih Altunaydin (2018):

They made a study on machine learning methods and their applications. The conceptual and historical background of machine learning is illustrated in their study. They described machine learning algorithms, artificial neural networks, decision trees, single-layer and multilayer artificial neural networks, some decision-making algorithms, and machine learning application areas such as education, health, finance, energy, meteorology, and cyber security. They suggested that the power of information technology and machines must be strictly taken into consideration in such an environment.

6. Amir Mosavi, Pinar Ozturk and Kwok-wing Chau (2018):

They made a study on flood prediction using machine learning models. They presented an overview of machine learning models used in flood prediction and developed a classification scheme to analyze the existing literature. The survey represents the performance analysis and investigation of more than 6000 articles. Among them, they identified 180 original and influential articles where the performance and accuracy of at least two machine learning models were compared. To do so, the prediction models were classified into two categories according to lead time, and further divided into categories of hybrid and single methods. The state of the art of these classes was discussed and analyzed in detail, considering the performance comparison of the methods available in the literature. The performance of the methods was evaluated in terms of R2 and RMSE, in addition to generalization ability, robustness, computation cost, and speed. Despite the promising results already reported in implementing the most popular machine learning methods, e.g., ANNs, SVM, SVR, ANFIS, WNN, and DTs, there remained room for further improvement and advancement. In this context, four major trends were reported in the literature for improving the quality of prediction.

7. Ahmed Othman Eltahawey (2016):

He made a tutorial on databases using Python. In a Python file, you have to first establish a connection between your file and the database. After that, you can add to, search, delete, or update your database. Moreover, you can retrieve data from the database, perform any operation on it, and then re-add it to the database. The database operations are performed using SQL statements. In the first section of the chapter, a set of useful links is provided that could help in downloading the necessary database program and Python connector, along with a link to a short video describing how to create a database using MySQL. In the second section, a description of how to make the connection between Python and the database is provided. In the third section, a quick review of the basic SQL statements is presented. In the fourth section, the main database operations are performed using Python (Eltahawey, 2017).

8. Bhojaraju, G. and Koganurmath, M.M. (2014):

They described database system concepts and design. They expressed the view that an organization must have accurate and reliable data for effective decision making. To this end, the organization maintains records on its various facets, maintaining relationships among them. Such related data are called a database. A database system is an integrated collection of related files, along with details of the interpretation of the data contained therein. Basically, a database system is nothing more than a computer-based record-keeping system, i.e. a system whose overall purpose is to record and maintain information/data. A database management system (DBMS) is a software system that allows access to data contained in a database. The objective of the DBMS is to provide a convenient and effective method of defining, storing, and retrieving the information contained in the database. The DBMS interfaces with the application programs, so that the data contained in the database can be used by multiple applications and users (Gunjal & Koganurmath, 2014).

CHAPTER-3

SYSTEM ANALYSIS AND REQUIREMENTS

3.1 Artificial Intelligence

Artificial Intelligence refers to machines, mostly computers, working like humans. In AI, machines perform tasks like face recognition, learning, and problem-solving. Machines can work and act like a human if they have enough knowledge about the task. So, in artificial intelligence, knowledge engineering plays an important role: relations between objects and their properties are used to implement knowledge engineering. One of the familiar techniques of artificial intelligence is explained below.

3.2 Machine Learning

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy. As explained, machine learning algorithms have the ability to improve themselves through training. There is no error margin in the operations carried out by a computer following the fixed steps of an algorithm. Different from commands written to produce an output from an input, there are situations where the computer makes decisions based on the present sample data. In those situations, computers may make mistakes, just like people, in the decision-making process. That is, machine learning is the process of equipping computers with the ability to learn by using data and experience like a human brain.

1. Supervised Learning:

Supervised learning is a relatively basic machine learning method. It refers to the establishment of corresponding learning goals by people before learning begins. During the initial training of the machine, the machine relies on information technology to learn what is required. In order to collect basic data information, the machine gradually completes the required learning content in a supervised environment.
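As a minimal illustration (a generic sketch using scikit-learn, not taken from this report's projects), a supervised model is trained on labeled examples and evaluated on held-out data:

# A minimal supervised-learning sketch: the model learns from labeled examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                        # features and known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # learn from labeled data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))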
2. Unsupervised Learning:

The counterpart to supervised learning is called unsupervised learning. So-called unsupervised learning means that the machine does not mark the content in a certain direction during the entire learning process, but relies on itself to complete the analysis of the data information. In practice, the operation method is to let the machine learn the basic concepts and content, and then give the machine enough freedom to complete a series of content learning, including concepts and content similar to the basic principles, such as tree roots. In general, the continuous improvement of learning in stages has increased the breadth of machine learning content. At present, unsupervised learning includes algorithms such as deep belief networks and auto-encoders. Such methods are conducive to the solution of clustering problems and have good applications in the development of many industries.
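Since unsupervised learning suits clustering problems, here is a minimal clustering sketch (an illustration, not part of the report's projects): k-means groups unlabeled points without any target labels.

# A minimal unsupervised-learning sketch: k-means clusters unlabeled points.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # unlabeled data points
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                        # cluster assignment for each point
print(kmeans.cluster_centers_)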

3. Reinforcement Learning:

In addition to supervised learning and unsupervised learning, there are also applications of reinforcement learning in machine learning. So-called reinforcement learning is the systematic learning of certain content. In the specific application process, the data collected in the previous period is used; the system organizes and processes the feedback information of a certain part to form a closed loop of data processing. On the whole, reinforcement learning is a type of learning method that expands data collection based on statistics and dynamic learning. Such methods are mainly used to solve control problems in robotics. Its representative learning methods include the Q-learning algorithm and the temporal difference learning algorithm (Jin, 2020). Reinforcement machine learning is a behavioral machine learning model that is similar to supervised learning, but the algorithm isn't trained using sample data. This model learns as it goes by using trial and error. A sequence of successful outcomes is reinforced to develop the best recommendation or policy for a given problem.
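A minimal sketch of the tabular Q-learning update mentioned above, on a hypothetical five-state corridor (a toy environment invented for illustration, not from the report):

# Tabular Q-learning on a toy corridor: actions 0=left, 1=right, reward at the end.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1        # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != n_states - 1:                 # reaching the last state ends an episode
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s_next == n_states - 1 else 0.0
        # trial-and-error feedback: successful outcomes reinforce the action value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))                            # learned values favor moving right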

4. Deep Learning:

Deep learning is the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data. Deep learning uses neural networks that pass data through many processing layers to interpret data features, and deep learning systems are largely self-directed in data analysis once they are put into production.
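As a minimal sketch of a multilayered neural network (an illustration using scikit-learn's MLPClassifier rather than a dedicated deep learning framework):

# A small multilayer neural network: data passes through hidden processing layers.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)                  # 8x8 images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64, 32),     # two hidden layers
                    max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("Test accuracy:", net.score(X_test, y_test))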

CHAPTER-4

SYSTEM DESIGN

4.1 SYSTEM SPECIFICATIONS:

Hardware Requirements:

• System : Pentium i3
• Hard Disk : 500 GB
• Monitor : 14" Colour Monitor
• Mouse : Optical Mouse
• RAM : 4 GB

Software Requirements:
 Operating system : Windows 8/10.
 Coding Language : PYTHON

4.2 UML DIAGRAMS


UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML is comprised of two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems, as well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

4.2.1 USE CASE DIAGRAM
A use case diagram in the Unified Modelling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is to present
a graphical overview of the functionality provided by a system in terms of actors, their goals
(represented as use cases), and any dependencies between those use cases. The main purpose of a use
case diagram is to show what system functions are performed for which actor. Roles of the actors in
the system can be depicted.

Fig 4.2.1 Use Case Diagram

4.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains which
class contains information.

Fig 4.2.2 Class Diagram

4.2.3 SEQUENCE DIAGRAM:
A sequence diagram in Unified Modelling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

Fig 4.2.3 Sequence Diagram

4.2.4 ACTIVITY DIAGRAM
Activity diagrams are graphical representations of workflows of stepwise activities
and actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational step-by-step
workflows of components in a system. An activity diagram shows the overall flow of control.

Fig 4.2.4 Activity Diagram

4.2.5 COLLABORATION DIAGRAM
The collaboration diagram is used to show the relationship between the objects in a system.
Both the sequence and the collaboration diagrams represent the same information but
differently. Instead of showing the flow of messages, it depicts the architecture of the object
residing in the system as it is based on object-oriented programming. An object consists of
several features. Multiple objects present in the system are connected to each other. The
collaboration diagram, which is also known as a communication diagram, is used to portray the
object's architecture in the system.

Fig 4.2.5 Communication Diagram

CHAPTER-5

SYSTEM IMPLEMENTATION

5.1 SAMPLE CODE

MINOR PROJECT: PREDICTING THE SURVIVAL OF A PASSENGER


ABOARD TITANIC SHIP

DESCRIPTION OF PROJECT:
Build a classifier to predict the survival of passengers aboard the Titanic ship. The target is a binary class. Two files are given – train.csv and test.csv. Use the train file for training the model and the test file only for reporting the performance metrics.

DATASET DESCRIPTION:

Survival - Target - 0 = No, 1 = Yes
Pclass - Ticket class - 1 = 1st class, 2 = 2nd class, 3 = 3rd class
Sex - Female or Male
Age - Age in years
SibSp - Number of siblings/spouses aboard the Titanic
Parch - Number of parents/children aboard the Titanic
Ticket - Ticket number
Fare - Passenger fare
Cabin - Cabin number
Embarked - Port of embarkation - C = Cherbourg, Q = Queenstown, S = Southampton

Clearly, it's a classification problem because we need to tell whether the person will survive or not. We will use 5 different classifiers and compare their accuracy. The 5 classifiers are as follows:
 Random Forest Classifier
 Logistic Regression
 K-Nearest Neighbors
 Decision Tree Classifier
 Support Vector Machine

IMPORTING LIBRARIES & LOADING DATASET

Now we’re going to need the following libraries:


import pandas as pd   # needed for read_csv below; missing from the original listing
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn import tree, svm
from sklearn.metrics import accuracy_score

After this, we need to load our dataset to start working. To load our dataset we'll use the read_csv method of the pandas library. Observe the first 10 rows of the dataset. (Press Shift + Enter to run the cell)
train_data = pd.read_csv('/kaggle/input/titanic/train.csv')

# Printing first 10 rows of the dataset


train_data.head(10)

Titanic Dataset 10 rows


print('The shape of our training set: %s passengers and %s features' % (train_data.shape[0], train_data.shape[1]))

train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890

Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

As you can see, we have 891 entries in total, but some of the columns have fewer than 891 entries, which means we have missing values in these columns, namely Age, Cabin & Embarked. So we have to preprocess our data before training our ML model.
# Checking Null Values
train_data.isnull().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64

There are 177 missing entries in Age column. 687 missing entries are in Cabin column and 2
missing are in Embarked.
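The excerpt does not show the imputation step itself; a minimal sketch of one common way to handle these missing values (an illustrative choice, not necessarily the author's exact preprocessing):

# One common way to handle the missing values found above (illustrative choice).
train_data['Age'] = train_data['Age'].fillna(train_data['Age'].median())            # numeric: median
train_data['Embarked'] = train_data['Embarked'].fillna(train_data['Embarked'].mode()[0])
train_data = train_data.drop(columns=['Cabin'])      # too sparse to impute (687/891 missing)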

Exploratory Data Analysis

Now we will analyze our data to see which variables are actually important to predict the
value of the target variable. Hence, we are going to plot a heat map to see the correlation
between the parameters and the target variable (Survived).
heatmap = sns.heatmap(train_data[["Survived", "SibSp", "Parch", "Age",
"Fare"]].corr(), annot = True)

sns.set(rc={'figure.figsize':(12,10)})

Now look at the decimal values in the resulting colour matrix. These are the correlation values; just compare the Survived column with the rest of the columns. The lighter the colour, the stronger the correlation. Comparing Survived with SibSp, you get the value -0.035, which means that SibSp is not correlated with Survived. Parch has a value of 0.082, which shows very little correlation. Age again shows no correlation. Finally, Fare's correlation with Survived is 0.26, which suggests that the higher the fare, the better the chances of survival.
Conclusion: But this does not mean that the other features are useless. We'll explore more about them later.

Moving on, now we will understand all the features one by one. We’ll visualize the impact of
each feature on the target variable. Let us start with SibSp that is the no. of siblings or
spouses a passenger has.

SibSp – Number of Siblings / Spouses aboard the Titanic

To visualize surviving probability with respect to SibSp we will plot a bar graph.
# Finding unique values
train_data['SibSp'].unique()
bargraph_sibsp = sns.catplot(x = "SibSp", y = "Survived", data = train_data,
kind="bar", height = 8)

Conclusion:
 Passengers having 1 or 2 siblings have good chances of survival
 More no. of siblings -> Fewer chances of survival
Age Column

We’ll plot a graph so as to see the distribution of age with respect to target variable.
ageplot = sns.FacetGrid(train_data, col="Survived", height = 7)
ageplot = ageplot.map(sns.distplot, "Age")
ageplot = ageplot.set_ylabels("Survival Probability")

Conclusion: More age -> less chances of survival!
Gender Column

For gender we are simply going to use seaborn and will plot a bar graph.
sexplot = sns.barplot(x="Sex", y="Survived", data=train_data)

You can see from the above graph that men have lower chances of survival than women. (Remember the Titanic scene when everyone was saying "Women and children first!")

Pclass Column

Let us now see whether the class plays any role in survival probability or not.
pclassplot = sns.catplot(x = "Pclass", y="Survived", data = train_data, kind="bar", height = 6)
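The excerpt lists five classifiers to compare but stops before the training step; a minimal sketch of how that comparison could look, reusing the imports from the top of this section (the feature encoding here is a simplifying assumption, not the author's exact code):

# Hedged sketch: encode a few features and compare the five classifiers named above.
X = train_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
X['Sex'] = X['Sex'].map({'male': 0, 'female': 1})        # simple numeric encoding
X['Age'] = X['Age'].fillna(X['Age'].median())            # guard against missing ages
y = train_data['Survived']
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

models = {
    'Random Forest': RandomForestClassifier(random_state=0),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'K-Nearest Neighbors': KNeighborsClassifier(),
    'Decision Tree': tree.DecisionTreeClassifier(random_state=0),
    'Support Vector Machine': svm.SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, ':', accuracy_score(y_val, model.predict(X_val)))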

MAJOR PROJECT: PROVIDING CUSTOMERS WITH ONE YEAR
SUBSCRIPTION PLAN FOR THEIR PRODUCT
DESCRIPTION OF PROJECT:
XYZ is a service-providing company that offers customers a one-year subscription plan for their product. The company wants to know whether the customers will renew the subscription for the coming year or not.
The CSV consists of around 2000 rows and 16 columns (features):
1. Year
2. Customer_id - unique id
3. Phone_no - customer phone number
4. Gender - male or female
5. Age - age of the customer
6. No_of_days_subscribed - the number of days since the subscription
7. Multi_screen - does the customer have a single or multiple screen subscription
8. Mail_subscribed - does the customer receive e-mails or not
9. Weekly_mins_watched - number of minutes watched weekly
10. Minimum_daily_mins - minimum minutes watched
11. Maximum_daily_mins - maximum minutes watched
12. Weekly_max_night_mins - number of minutes watched at night time
13. Videos_watched - total number of videos watched
14. Maximum_days_inactive - days since inactive
15. Customer_support_calls - number of customer support calls
16. Churn (1 = Yes, 2 = No)
Churn is the target variable.
IMPORTING REQUIRED LIBRARIES:
# importing required libraries and reading the dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score

# dropping of unwanted columns (one-hot encoding the categorical columns first)
df = pd.read_csv('data.csv')
df1 = pd.get_dummies(df['mail_subscribed'])
df = pd.concat([df, df1], axis=1).reindex(df.index)
df.drop('mail_subscribed', axis=1, inplace=True)
df2 = pd.get_dummies(df['multi_screen'])
df = pd.concat([df, df2], axis=1).reindex(df.index)
df.drop('multi_screen', axis=1, inplace=True)
df3 = pd.get_dummies(df['gender'])
df = pd.concat([df, df3], axis=1).reindex(df.index)
df.drop('gender', axis=1, inplace=True)
df

Out[1]:
      year  customer_id  phone_no  age  no_of_days_subscribed  weekly_mins_watched  ...
0     2015  100198       409-8743  36   62                     148.35
1     2015  100643       340-5930  39   149                    294.45
2     2015  100756       372-3750  65   126                    87.30
3     2015  101595       331-4902  24   131                    321.30
4     2015  101653       351-8398  40   191                    243.00
...
1995  2015  997132       385-7387  54   75                     182.25
1996  2015  998086       383-9255  45   127                    273.45
1997  2015  998474       353-2080  53   94                     128.85
1998  2015  998934       359-7788  40   94                     178.05
1999  2015  999961       414-1496  37   73                     326.70

2000 rows x 19 columns
# filling null values in the data -- optional but required
df.fillna(0, inplace=True)
y_df = df['churn']
x_df = df.drop(['year', 'customer_id', 'phone_no', 'churn'], axis=1)

In [3]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 19 columns):
#   Column                  Non-Null Count  Dtype
0   year                    2000 non-null   int64
1   customer_id             2000 non-null   int64
2   phone_no                2000 non-null   object
3   age                     2000 non-null   int64
4   no_of_days_subscribed   2000 non-null   int64
5   weekly_mins_watched     2000 non-null   float64
6   minimum_daily_mins      2000 non-null   float64
7   maximum_daily_mins      2000 non-null   float64
8   weekly_max_night_mins   2000 non-null   int64
9   videos_watched          2000 non-null   int64
10  maximum_days_inactive   2000 non-null   float64
11  customer_support_calls  2000 non-null   int64
12  churn                   2000 non-null   float64
13  no                      2000 non-null   uint8
14  yes                     2000 non-null   uint8
15  no                      2000 non-null   uint8
16  yes                     2000 non-null   uint8
17  Female                  2000 non-null   uint8
18  Male                    2000 non-null   uint8
dtypes: float64(5), int64(7), object(1), uint8(6)

memory usage: 215.0+ KB


# performing train and test split
x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.005,
                                                    random_state=0)  # random_state value cut off in the original scan; 0 assumed
LR = LinearRegression()

# model building
LR.fit(x_train, y_train)

# model validation (predictions)
y_prediction = LR.predict(x_test)
print("The prediction is ", y_prediction)
y_prediction = y_prediction.round(0).astype(int)

# Accuracy Score
# (note: the original compares against the last 10 rows of the data, not y_test)
y_original = df['churn'].iloc[1990:2000]
ac = accuracy_score(y_original, y_prediction)
print("The accuracy score is", ac)
print("\n\n")

# confusion matrix
cf = confusion_matrix(y_original, y_prediction)
print("The confusion matrix is ", cf)
print("\n\n")

# recall score
re = recall_score(y_original, y_prediction, average=None)
print("The recall score is ", re)
print("\n\n")

# precision score
pc = precision_score(y_original, y_prediction, average=None)
print("The precision score is ", pc)
print("\n\n")

The prediction is [ 0.08898349  0.06527157  0.1068744   0.21992295  0.07693165  0.06224768
  0.19548169 -0.04951808  0.15539289  0.06575483]
The accuracy score is 0.9

The confusion matrix is [[9 0]
 [1 0]]

The recall score is [1. 0.]

The precision score is [0.9 0. ]

5.2 TESTING METHODOLOGIES
The following are the Testing Methodologies:

o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.

Unit Testing

Unit testing focuses verification effort on the smallest unit of software design, that is, the module. Unit testing exercises specific paths in a module's control structure to ensure complete coverage and maximum error detection. This test focuses on each module individually, ensuring that it functions properly as a unit; hence the name unit testing.

During this testing, each module is tested individually and the module interfaces are verified for consistency with the design specification. All important processing paths are tested for the expected results. All error-handling paths are also tested.
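As a minimal illustration of unit testing in Python (a generic sketch using the standard unittest module; the function under test is hypothetical, not from the report's projects):

# A tiny unit test: one module-level function tested in isolation.
import unittest

def survival_label(prediction: float) -> int:
    """Round a model's raw output to a 0/1 class label."""
    return 1 if prediction >= 0.5 else 0

class TestSurvivalLabel(unittest.TestCase):
    def test_positive_class(self):
        self.assertEqual(survival_label(0.9), 1)

    def test_negative_class(self):
        self.assertEqual(survival_label(0.1), 0)

    def test_boundary(self):
        self.assertEqual(survival_label(0.5), 1)   # boundary path also exercised

if __name__ == "__main__":
    unittest.main()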

Integration Testing

Integration testing addresses the issues associated with the dual problems of verification and program construction. After the software has been integrated, a set of high-order tests is conducted. The main objective of this testing process is to take unit-tested modules and build a program structure that has been dictated by the design.

The following are the types of Integration Testing:

1) Top-Down Integration

This method is an incremental approach to the construction of program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main program module. The modules subordinate to the main program module are incorporated into the structure in either a depth-first or breadth-first manner.

In this method, the software is tested from the main module, and individual stubs are replaced as the test proceeds downwards.

2) Bottom-Up Integration

This method begins the construction and testing with the modules at the lowest level in the program structure. Since the modules are integrated from the bottom up, processing required for modules subordinate to a given level is always available, and the need for stubs is eliminated. The bottom-up integration strategy may be implemented with the following steps:

 The low-level modules are combined into clusters that perform a specific software sub-function.
 A driver (i.e. the control program for testing) is written to coordinate test case input and output.
 The cluster is tested.
 Drivers are removed and clusters are combined, moving upward in the program structure.

The bottom-up approach tests each module individually; each module is then integrated with a main module and tested for functionality.

User Acceptance Testing

User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users at the time of developing and making changes wherever required.
The system developed provides a friendly user interface that can easily be understood even by a
person who is new to the system.

Output Testing

After performing the validation testing, the next step is output testing of the proposed system, since no system could be useful if it does not produce the required output in the specified format. The outputs generated or displayed by the system under consideration are tested by asking the users about the format they require. Hence, the output format is considered in two ways – one is on screen and the other is in printed format.

Validation Checking

Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its size. The text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect entry always flashes an error message.

Numeric Field:

The numeric field can contain only numbers from 0 to 9. An entry of any other character flashes an error message. The individual modules are checked for accuracy and for what they have to perform. Each module is subjected to a test run along with sample data. The individually tested modules are integrated into a single system. Testing involves executing the program with real data; the existence of any program defect is inferred from the output. The testing should be planned so that all the requirements are individually tested. A successful test is one that brings out the defects for inappropriate data and produces an output revealing the errors in the system.
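A minimal sketch of the two field checks described above (illustrative helper functions, not taken from the report):

# Illustrative field validators matching the checks described above.
def validate_text_field(value: str, max_len: int, alphabetic_only: bool = False) -> bool:
    """Accept at most max_len characters; optionally restrict to letters only."""
    if len(value) > max_len:
        return False
    return value.isalpha() if alphabetic_only else value.isalnum()

def validate_numeric_field(value: str) -> bool:
    """Accept only the digits 0 to 9."""
    return value.isdigit()

print(validate_text_field("Anjali", 10, alphabetic_only=True))   # True
print(validate_numeric_field("20NF1A0523"))                      # False: contains letters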

Preparation of Test Data

The above testing is done by taking various kinds of test data. Preparation of test data plays a vital role in system testing. After preparing the test data, the system under study is tested using it. While testing the system with test data, errors are again uncovered and corrected by using the above testing steps, and the corrections are noted for future use.

Using Live Test Data:

Live test data are those that are actually extracted from organization files. After a system is
partially constructed, programmers or analysts often ask users to key in a set of data from
their normal activities. Then, the systems person uses this data as a way to partially test the
system. In other instances, programmers or analysts extract a set of live data from the files
and have them entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing.


And, although it is realistic data that will show how the system will perform for the typical
processing requirement, assuming that the live data entered are in fact typical, such data generally
will not test all combinations or formats that can enter the system. This bias toward typical values
then does not
provide a true systems test and in fact ignores the cases most likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be generated to test all combinations of formats and values. In other words, the artificial data, which can quickly be prepared by a data-generating utility program in the information systems department, make possible the testing of all logic and control paths through the program.

The most effective test programs use artificial test data generated by persons other than those
who wrote the programs. Often, an independent team of testers formulates a testing plan, using the
systems specifications.

The developed work has satisfied all the requirements specified in the software requirement specification and was accepted.

USER TRAINING

Whenever a new system is developed, user training is required to educate them about
the working of the system so that it can be put to efficient use by those for whom the system has been
primarily designed. For this purpose the normal working of the project was demonstrated to
the prospective users. Its working is easily understandable and since the expected users are people
who have good knowledge of computers, the use of this system is very easy.

MAINTENANCE

This covers a wide range of activities including correcting code and design errors. To reduce
the need for maintenance in the long run, we have more accurately defined the user’s requirements
during the process of system development. Depending on the requirements, this system has
been developed to satisfy the needs to the largest possible extent. With development in technology, it
may be possible to add many more features based on the requirements in future. The coding
and designing is simple and easy to understand which will make maintenance easier.

TESTING STRATEGY :

A strategy for system testing integrates system test cases and design techniques into a well-planned series of steps that result in the successful construction of software. The testing strategy must incorporate test planning, test case design, test execution, and the resulting data collection and evaluation. A strategy for software testing must accommodate low-level tests that are necessary to verify that a small source code segment has been correctly implemented, as well as high-level tests that validate major system functions against user requirements.

Software testing is a critical element of software quality assurance and represents the ultimate
review of specification design and coding. Testing represents an interesting anomaly for the
software. Thus, a series of testing are performed for the proposed system before the system is ready
for user acceptance testing.

SYSTEM TESTING:

Software once validated must be combined with other system elements (e.g. Hardware,
people, database). System testing verifies that all the elements are proper and that overall
system function performance is achieved. It also tests to find discrepancies between the
system and its original objective, current specifications and system documentation.

UNIT TESTING:

In unit testing, different modules are tested against the specifications produced during the design of the modules. Unit testing is essential for verification of the code produced during the coding phase; hence the goal is to test the internal logic of the modules. Using the detailed design description as a guide, important control paths are tested to uncover errors within the boundary of the modules. This testing is carried out during the programming stage itself. In this type of testing step, each module was found to be working satisfactorily as regards the expected output from the module.

In due course, the latest technology advancements will be taken into consideration. As part of the technical build-up, many components of the system will be generic in nature so that future projects can either use or interact with them. The future holds a lot to offer to the development and refinement of this project.

CHAPTER-6

SCREENSHOTS

(Execution screenshots of the two projects appear on these pages in the original report.)
CHAPTER-7

CONCLUSION

Python is playing a significant role in our day-to-day life, so there is a need to do more and more work on its use and development. The reason behind this development is the difficulty of analyzing and processing rapidly increasing data. Machine learning is based on the principle of finding the best model for new data among the previous data, and it benefits from this growth in data. Therefore, machine learning research will go on in parallel with the increase in data. We worked with several parts of the Django Framework to get this working: Views, Models, Forms, and Templates.

It would be remiss of us not to address the question of whether a framework is needed at all. It has increasingly become a mantra in certain segments of the community that the web platform has advanced to the point where you don't need additional APIs to make creating web applications easy. Like everything in this series, our response would be: it depends.

While going framework-less can work, and does, it also comes at a cost. Those who advocate the benefits of framework-less JS, accustomed to what we would argue is the Stockholm Syndrome of web technologies, forget that there are multiple sets of rapidly evolving APIs spread over at least three different technologies with three radically different syntaxes. The specifications for the web platform identify over 12,000 APIs, and the Venn diagram of what is actually in browsers shows there are significant gaps.

APPENDIX-A

BIBLIOGRAPHY

[1]. Anand, P., Tri, K., Prof, A., & Pg, A. K. A Framework of Distributed Database Management Systems in the Modern Enterprise and the Uncertainties Removal. International Journal of Advanced Research in Computer Science and Software Engineering, 2(4), 550–555, 2012.
[2]. Çelik, Ö. A Research on Machine Learning Methods and Its Applications. Journal of Educational Technology and Online Learning, 2018. https://doi.org/10.31681/jetol.457046
[3]. Chandiramani, A. Management of Django Web Development in Python. Journal of Management and Service Science (JMSS), 1(2), 1–17, 2021. https://doi.org/10.54060/jmss/001.02.005
[4]. Eltahawey, A. O. (2017). Database Using Python: A Tutorial. December 2016.
[5]. Gunjal, B., & Koganurmath, M. M. (2014). Database System: Concepts and Design. E-Journals by Research Scholars in National Institute of Technology (NIT) Rourkela, 1–19, December 2003.
[6]. Gupta, S. B. B. E. P. A., & Madnick, S. E. A Framework and Comparative Study of Distributed Heterogeneous Database Management Systems. Working paper, 1988.
[7]. Jin, W. Research on Machine Learning and Its Algorithms and Development. Journal of Physics: Conference Series, 1544(1), 2020. https://doi.org/10.1088/1742-6596/1544/1/012003
[8]. Kossmann, J., & Schlosser, R. A Framework for Self-Managing Database Systems. Proceedings - 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), 100–106, 2019. https://doi.org/10.1109/ICDEW.2019.00-27
[9]. Patil, S. J. Python - Using Database and SQL. International Journal of Science and Research (IJSR), 8(2), 83–85, 2019. https://www.ijsr.net/archive/v8i2/ART20194929.pdf
[10]. Raschka, S., Patterson, J., & Nolet, C. Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence. Information (Switzerland), 11(4), 2020. https://doi.org/10.3390/info11040193
[11]. Ren, Y. Python Machine Learning: Machine Learning and Deep Learning with Python. International Journal of Knowledge-Based Organizations, 11(1), 67–70, 2021.
[12]. Sedhain, S. Web Framework for Python: Django Book (PDF version), 1–190, 2007. http://www.gti.bh/Library/assets/djangobookwzy482.pdf
[13]. Shyam, A., & Mukesh, N. A Django Based Educational Resource Sharing Website: Shreic. Journal of Scientific Research, 64(01), 138–152, 2020. https://doi.org/10.37398/jsr.2020.640134
[14]. Suraya, S., & Sholeh, M. Designing and Implementing a Database for Thesis Data Management by Using the Python Flask Framework. International Journal of Engineering, Science and Information Technology, 2(1), 9–14, 2021. https://doi.org/10.52088/ijesty.v2i1.197
[15]. Taneja, S., & Gupta, P. R. Python as a Tool for Web Server Application Development. International Journal of Information, Communication and Computing Technology, 2(1), 77–83, 2014. https://www.jimsindia.org/8i_Journal/VolumeII/Python-as-atool-for-web-server-application-development.pdf

TECHNOLOGY USED

A Python web framework is a collection of packages or modules that allows developers to write web applications or services. With it, developers don't need to handle low-level details like protocols, sockets, or process/thread management. Several frameworks can be used in the Python programming language, like AIOHTTP, Dash, Falcon, Flask, Giotto, and Django. One of the most popular frameworks in Python for web development is Django.

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of web development, so one can focus on writing an app without needing to reinvent the wheel. It is free and open source, so anyone can make use of it. Django can work with any client-side framework and can deliver content in almost any format (including HTML, RSS feeds, JSON, XML, etc.). Django is written in Python, which runs on many platforms; that means you are not tied to any particular server platform and can run your applications on many flavors of Linux, Windows, and Mac OS X. Furthermore, Django is well supported by many web hosting providers, who often provide specific infrastructure and documentation for hosting Django sites.

Being a full-stack framework, Django supports quick web application development with less code to be written. It is popularly known as "the web framework for perfectionists with deadlines". It provides ease in creating web applications with fewer lines of code and is scalable. It includes a built-in server that can be used for developing and testing applications, and the framework comes with comprehensive, well-written documentation. Features of this framework include templates, support for the relational database model (databases include MySQL, SQLite, Oracle, and PostgreSQL), comprehensive security, etc. It is well suited for database-driven applications. The framework is based on the principle of reusability of code and non-redundancy of information (Taneja & Gupta, 2014).

The basics of web development using Django can be learned by building blog applications that have the Create, Read, Update, Delete (CRUD) functionality. Django is a widely used, free, open-source, high-level web development framework. It provides a lot of features to developers "out of the box," so development can be rapid; at the same time, websites built with it are secure, scalable, and maintainable (Chandiramani, 2021).

A high-level web framework is software that eases the pain of building dynamic web sites. It abstracts common problems of web development and provides shortcuts for frequent programming tasks. For clarity, a dynamic web site is one in which pages aren't simply HTML documents sitting on a server's filesystem somewhere. In a dynamic web site, rather, each page is generated by a computer program (a so-called "Web application") that you, the web developer, create. A web application may, for instance, retrieve records from a database or take some action based on user input (Sedhain, 2007).
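As a minimal illustration of the Django style described above (a hypothetical blog model and view sketched for this appendix, not code from the report):

# A minimal Django sketch: a blog post model and a view that lists posts.
# models.py
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()
    created = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title

# views.py
from django.shortcuts import render
from .models import Post   # assumes the model above lives in the same app

def post_list(request):
    posts = Post.objects.order_by('-created')   # newest posts first
    return render(request, 'blog/post_list.html', {'posts': posts})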

Python is one of the most popular programming languages for data science, and thanks to its developer and open-source community, a large number of useful libraries for scientific computing and machine learning have been developed (Ren, 2021). Python offers concise and readable code. While complex algorithms and versatile workflows stand behind machine learning, Python's simplicity allows developers to write reliable systems. Developers get to put all their effort into solving ML problems instead of focusing on the technical nuances of the language. Additionally, Python is appealing to many developers as it is easy to learn. Python code is understandable by its users, which makes it easier to build models for machine learning. Python's extensive selection of machine learning-specific libraries and frameworks simplifies the development process and cuts development time. Python's simple syntax and readability promote rapid testing of complex algorithms and make the language accessible to non-programmers.

