
Intern Report

The document summarizes an internship report submitted by M Siva Badarinath for their internship at PEBIANS Pvt. Ltd., Hyderabad. The 4-week internship included training in Python fundamentals and machine learning. In the final week, the intern worked on a machine learning project to identify fraud in credit card transactions using Python libraries and algorithms. Overall, the internship provided hands-on experience in applying machine learning concepts and helped enhance the intern's technical skills.


 

Machine Learning with Python


INTERNSHIP PROGRAM

An Internship Report submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
M Siva Badarinath, 1215316025
Under the esteemed guidance of
Ramji Bora
Mentor, PEBIANS Pvt. Ltd., Hyderabad

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


GITAM
(Deemed to be University)
VISAKHAPATNAM
MAY 2019

 
 
 

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


GITAM INSTITUTE OF TECHNOLOGY
GITAM
(Deemed to be University)

DECLARATION

I hereby declare that the internship report entitled “Machine Learning with Python” is an
original work done in the Department of Computer Science and Engineering, GITAM Institute of
Technology, GITAM (Deemed to be University), submitted in partial fulfilment of the
requirements for the award of the degree of B.Tech. in Computer Science and Engineering.
The work has not been submitted to any other college or University for the award of any degree
or diploma.

Date: 06 JUNE 2019 Signature

Registration No: 1215316025 M Siva Badarinath

 

 
 

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


GITAM INSTITUTE OF TECHNOLOGY
GITAM
(Deemed to be University)

CERTIFICATE

This is to certify that the internship report entitled “Machine Learning with Python” is a
bonafide record of work carried out by M Siva Badarinath (1215316025), submitted in partial
fulfilment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering.

SUPERVISOR
RAMJI BORA
PEBIANS Pvt. Ltd., Hyderabad

INTERNSHIP REVIEWER
M.N.S.V. Jitendra (Asst. Professor)
MODULE LEAD FACULTY
GITAM, VISAKHAPATNAM

 

 
 

ACKNOWLEDGEMENT

The internship opportunity I had with PEBIANS Pvt. Ltd., Hyderabad was a great chance for
learning and professional development. I consider myself very fortunate to have been given the
opportunity to be a part of it. I am also grateful for having had the chance to meet so many
wonderful people and professionals who led me through this internship period.
I express my gratitude to Mr. Jageeshwar Reddy and Mr. Ramji Bora for providing their valuable
guidance and sharp vision to undertake this four-week internship at PEBIANS Pvt. Ltd.,
Hyderabad. I am grateful to Mr. Ramji Bora and Mr. Jageeshwar Reddy, my supervisors at
PEBIANS Pvt. Ltd., Hyderabad, who guided me throughout the internship. I am also
thankful to Mr. Ramji Bora for helping me overcome the difficulties faced during the internship.
PEBIANS Pvt. Ltd., Hyderabad is a privately held limited liability company that implements
collaborative internal communication in a modern digital workplace.
I would also like to thank Dr. K Thammi Reddy, HOD, Department of Computer Science and
Engineering, GIT, and Sri. M.N.S.V. Jitendra, A.M.C, who helped me a lot in the successful
completion of my internship and internship report.
I am thankful and fortunate to have received constant encouragement, support and guidance from
all my colleagues and staff at PEBIANS Pvt. Ltd., Hyderabad, which helped me in successfully
completing the project work. I would also like to extend my sincere regards to all the laboratory
staff for their timely support.
I perceive this opportunity as a big milestone in my career development. I will strive to use the
skills and knowledge gained in the best possible way, and I will continue to work on improving
them in order to attain my desired career objectives. I hope to continue cooperating with all
of you in the future.

M Siva Badarinath

 

 
 

TABLE OF CONTENTS:

1 Abstract

2 About the Organization

3 Schedule of Internship

4 Training

5 Project

6 Outcomes

7 Conclusion

8 References

 

 
 

ABSTRACT:

I did my internship at PEBIANS Pvt. Ltd., a start-up company in Hyderabad. An
internship-cum-training program is an opportunity to learn new things and apply the knowledge
gained. In this program we underwent four weeks of training cum internship. The purpose of
this program is to enhance our knowledge and use it to build applications.

The first week was spent getting to know Python. We were introduced to Python's philosophy,
which emphasizes code readability and allows programmers to express concepts in fewer lines of
code, making Python a simpler language than many others. In the following weeks, we were
taught the fundamental concepts of Python programming. After getting well versed with the
concepts, we were given practice questions to apply our knowledge and learn more.

The final week was spent on a machine learning project about finding fraud in credit card
transactions. We obtained freely available data from Kaggle, used Python libraries to mine the
data required, and, after pre-processing, applied ML algorithms to identify the fraudulent
transactions.

 

 
 

About Company:

Pebians Private Limited is a tech organization engaged in business with a quality range of
industrial products. It is classified as a non-government company and is registered with the
Registrar of Companies (ROC), Hyderabad. During the internship, the company's mission was to
encourage students to find themselves and learn through their experience. The faculty supported
us throughout the internship.

SCHEDULE OF INTERNSHIP:

WEEK 1: Python introduction and data structures

WEEK 2: Python Functions, Control Flow, OOPs

WEEK 3: Intro to Machine Learning and its Libraries

WEEK 4: Project Building

 

 
 

PYTHON:

Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python’s elegant
syntax and dynamic typing, together with its interpreted nature, make it an ideal language for
scripting and rapid application development in many areas on most platforms.

Python is a cross-platform programming language, meaning it runs on multiple platforms such as
Windows, macOS and Linux, and it has even been ported to the Java and .NET virtual machines. It
is free and open source, and was created by Guido van Rossum; its first version was released in 1991.

Although most of today's Linux and macOS systems come with Python preinstalled, the installed
version might be out of date, so it is always a good idea to install the most current version.

WHY PYTHON:

Python is widely considered the preferred language for teaching and learning ML (Machine
Learning). A few simple reasons are:

1. It's simple to learn. Compared to C, C++ and Java, the syntax is simpler, and Python also
has a lot of code libraries for ease of use.

2. Though it is slower than some other languages, its data handling capacity is great.

3. Open source! Python, along with R, is gaining momentum and popularity in the analytics
domain since both of these languages are open source.

4. It can interact with almost all third-party languages and platforms.

 

 
 
CHARACTERISTICS OF PYTHON:

1) Easy to Learn and Use

2) Expressive Language

3) Interpreted Language

4) Cross-platform Language

Python Run-Modes:
1. Script mode: Popularly known as development mode. In this mode we store a collection of
statements in a file with the .py extension and run it, for example by pressing F5 in the IDLE editor.
2. Interactive mode: In this mode we can run every command independently.
Ex: >>> print("hello")
hello

PYTHON DATA TYPES:

Variables can hold values of different data types. Python is a dynamically typed language, hence
we need not define the type of a variable while declaring it. The interpreter implicitly binds the
value with its type.
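A short illustration of dynamic typing: the same name can be rebound to values of different types, and the built-in type() reports the type the interpreter has bound at that moment. The variable name x is purely illustrative.

```python
# The same name can refer to values of different types over time;
# no type declaration is needed.
x = 10
print(type(x))   # <class 'int'>

x = "ten"
print(type(x))   # <class 'str'>

x = [1, 0]
print(type(x))   # <class 'list'>
```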

Standard data types: A variable can hold different types of values. For example, a
person's name is stored as a string whereas their ID is typically stored as an
integer. Python provides various standard data types that define the storage method for each
of them. The data types defined in Python are given below.

1. Numbers

2. String

3. List

 

 
 
4. Tuple

5. Dictionary

1.NUMBERS:

Number stores numeric values. Python creates Number objects when a number is assigned to a
variable. For example:

>>> a, b = 3, 5 #a and b are number objects

Python 2 supports 4 types of numeric data (Python 3 has only int, float and complex):

1. int (signed integers like 10, 2, 29, etc.; in Python 3, int has unlimited precision)

2. long (long integers used for a higher range of values like 908090800L, -0x1929292L, etc.; this type exists only in Python 2 and was merged into int in Python 3)

3. float (float is used to store floating point numbers like 1.9, 9.902, 15.2, etc.)

4. complex (complex numbers like 2.14j, 2.0 + 2.3j, etc.)
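To make the Python 3 behaviour concrete, here is a small sketch showing that very large integers are still plain int (there is no separate long type) and how the real and imaginary parts of a complex number can be read. The variable names are illustrative only.

```python
i = 29                # int
big = 2 ** 100        # still an int: Python 3 integers have arbitrary precision
f = 9.902             # float
c = 2.0 + 2.3j        # complex

for value in (i, big, f, c):
    print(type(value).__name__, value)

print(c.real, c.imag)  # 2.0 2.3
```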

2. STRINGS:

String is a sequence of Unicode characters. We can use single quotes or double quotes to
represent strings. Multi-line strings can be denoted using triple quotes, ''' or """.

EX: >>> s = "This is a string"

str1 = 'hello javatpoint' #string str1
str2 = ' how are you' #string str2
print(str1[0:2]) #printing the first two characters using the slice operator
print(str1[4]) #printing the character at index 4 (the 5th character)
print(str1*2) #printing the string twice
print(str1 + str2) #printing the concatenation of str1 and str2

Output:

he
o
hello javatpointhello javatpoint
hello javatpoint how are you

String Indexing:

We can access individual characters using indexing and a range of characters using slicing. The
index starts from 0. Python allows negative indexing for its sequences: the index -1 refers to the
last item, -2 to the second-last item, and so on. We can access a range of items in a string by
using the slicing operator (:).

Ex: s = "Internship"
print('s[0] = ', s[0]) #first character: I
print('s[-1] = ', s[-1]) #last character: p
print('s[1:5] = ', s[1:5]) #slicing 2nd to 5th character: nter

Strings are immutable. This means that the characters of a string cannot be changed
once it has been assigned. We can simply reassign a different string to the same
name.

s[1] = 'x'

>> Raises TypeError: 'str' object does not support item assignment
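Because strings are immutable, string "modifications" always produce a new string. The sketch below reuses the s = "Internship" example, catches the TypeError raised by item assignment, and shows the usual workaround of slicing and concatenating into a new string, then rebinding the name.

```python
s = "Internship"

try:
    s[1] = "x"               # strings cannot be modified in place
except TypeError as err:
    print("TypeError:", err)

# String methods return new strings; the original is unchanged.
upper = s.upper()
print(upper)                 # INTERNSHIP
print(s)                     # Internship

# To "change" a character, build a new string and rebind the name.
s = s[:1] + "x" + s[2:]
print(s)                     # Ixternship
```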

3. Lists:

A list can be defined as a collection of values or items of different types. The items in the list are
separated with a comma (,) and enclosed in square brackets []. It can have any number of
items, and they may be of different types (integer, float, string etc.).

my_list = [] #empty list
my_list = [1, 2, 3] #list of integers
L1 = ["John", 102, "USA"] #list of mixed types

1. Accessing: We can use the index operator [] to access an item in a list. The index starts from 0.
2. Negative indexing: Python allows negative indexing for its sequences. The index -1 refers
to the last item.
3. Slicing: We can access a range of items in a list by using the slicing operator (colon).

Ex: print(my_list[1:3]) #[2, 3]

4. Mutable: Lists are mutable, meaning their elements can be changed, unlike strings or tuples.

Ex: odd = [2, 4, 6, 8]
odd[0] = 1 #odd is now [1, 4, 6, 8]
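The four list properties above (indexing, negative indexing, slicing, mutability) can be seen together in one short sketch. It also uses the built-in list methods append() and remove(), which the text does not mention, to show in-place growth and shrinking.

```python
my_list = [1, 2, 3]

print(my_list[0])      # 1  (indexing starts at 0)
print(my_list[-1])     # 3  (negative index counts from the end)
print(my_list[1:3])    # [2, 3]  (slice: the stop index is excluded)

# Lists are mutable: items can be replaced, added and removed in place.
my_list[0] = 10
my_list.append(4)      # add to the end
my_list.remove(2)      # remove the first occurrence of the value 2
print(my_list)         # [10, 3, 4]
```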

4. Tuple:

A tuple is similar to a list, except that the items stored in a list can be changed whereas a
tuple is immutable: the items stored in a tuple cannot be changed.

A tuple is created by placing all the items (elements) inside parentheses (), separated by commas.
A tuple can have any number of items, and they may be of different types (integer, float, list,
string, etc.).

Ex: t = (5, 'program', 1+3j)

Indexing and slicing of tuples are the same as for lists.

len(tuple) - It calculates the length of the tuple.

max(tuple) - It returns the maximum element of the tuple.

min(tuple) - It returns the minimum element of the tuple.

tuple(seq) - It converts the specified sequence to a tuple.
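A quick sketch of the tuple functions listed above. Note one assumption-free caveat: max() and min() need mutually comparable items, so they are applied to an all-integer tuple here; calling max() on the mixed tuple t = (5, 'program', 1+3j) would raise a TypeError.

```python
t = (5, 'program', 1+3j)
print(t[1])                  # program
print(t[-1])                 # (1+3j)
print(len(t))                # 3

nums = tuple([4, 9, 1])      # tuple(seq) converts a list into a tuple
print(nums)                  # (4, 9, 1)
print(max(nums), min(nums))  # 9 1

try:
    t[0] = 6                 # tuples are immutable
except TypeError as err:
    print("TypeError:", err)
```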

Sets:

A set is an unordered collection of items. Every element is unique and must be immutable. Sets
can be used to perform mathematical set operations like union, intersection, symmetric
difference etc.

A set is created by placing all the items (elements) inside curly braces {}, separated by commas,
or by using the built-in function set().

Ex: s = {1, 2, 3}

s2 = set() #empty set; note that {} creates an empty dictionary, not a set
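The mathematical set operations mentioned above map directly onto Python operators. This sketch uses two small illustrative sets; the comments give the resulting set values (set display order is not guaranteed in general).

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)   # union: {1, 2, 3, 4, 5}
print(a & b)   # intersection: {3, 4}
print(a - b)   # difference: {1, 2}
print(a ^ b)   # symmetric difference: {1, 2, 5}

# Duplicates are discarded automatically.
print(set([1, 1, 2]))  # {1, 2}
```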

5. DICTIONARY:

A Python dictionary is a collection of items in which each item is a key: value pair. While other
compound data types have only a value as an element, a dictionary has a key: value pair.
(Dictionaries were unordered in older versions of Python; since Python 3.7 they preserve
insertion order.)

Ex: >>> d = {"apple": "green", "banana": "yellow", "cherry": "red"}

Creating a dictionary is as simple as placing items inside curly braces {}, separated by commas.
An empty dictionary is written as:

>>> empty = {}

Accessing: a key is used to access the elements of a dictionary.

>>> print(d['apple'])
green

MUTABLE: If the key is already present, its value gets updated; otherwise a new key: value pair is
added to the dictionary.

>>> d['apple'] = "red"

DELETE: We can remove a particular item from a dictionary by using the method pop(). This
method removes the item with the provided key and returns its value.

>>> d.pop("apple")
'red'
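The access, update, insert and delete rules above are combined in this small sketch. It also uses dict.get() with a default and dict.items() for iteration, two standard methods not mentioned in the text; the "mango" and "grape" keys are illustrative.

```python
d = {"apple": "green", "banana": "yellow", "cherry": "red"}

print(d["apple"])             # green
print(d.get("mango", "n/a"))  # n/a  (get() avoids a KeyError for missing keys)

d["apple"] = "red"            # existing key: value is updated
d["grape"] = "purple"         # new key: pair is added

removed = d.pop("banana")     # removes the key and returns its value
print(removed)                # yellow

for fruit, colour in d.items():
    print(fruit, colour)
```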

OOPS:

Python is a multi-paradigm programming language, meaning it supports different programming
approaches. One of the popular approaches to solving a programming problem is by creating objects.
This is known as Object-Oriented Programming (OOP). The major principles of object-oriented
programming are given below.

● Object

● Class

● Method

● Inheritance

● Polymorphism

● Data Abstraction

● Encapsulation
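The principles listed above can be seen in one small example. This is a minimal sketch with hypothetical Animal, Dog and Cat classes: the classes and their objects, a method (speak), inheritance (Dog and Cat reuse Animal's constructor), polymorphism (the same call behaves differently per class), abstraction (Animal.speak must be overridden) and encapsulation by convention (Python has no truly private attributes; a leading underscore marks an attribute as internal).

```python
class Animal:
    """Base class with shared state and an abstract method."""

    def __init__(self, name):
        self._name = name          # leading underscore: internal by convention

    @property
    def name(self):
        return self._name          # read-only access to the internal attribute

    def speak(self):
        raise NotImplementedError  # subclasses must provide this behaviour


class Dog(Animal):                 # inheritance: reuses Animal.__init__ and name
    def speak(self):
        return f"{self.name} says Woof"


class Cat(Animal):
    def speak(self):
        return f"{self.name} says Meow"


# Polymorphism: the same method call works on objects of different classes.
pets = [Dog("Rex"), Cat("Tom")]
for pet in pets:
    print(pet.speak())
```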

MACHINE LEARNING:

Machine learning is a type of artificial intelligence (AI) that gives computers the ability
to learn without being explicitly programmed. Machine learning focuses on the development of
computer programs that can change when exposed to new data. In this report, we look at the basics of
machine learning and the implementation of a simple machine learning algorithm using Python.

Machine learning involves training a computer on a given dataset and using this training to
predict the properties of new data. For example, we can train a computer by feeding it
1000 images of cats and 1000 more images that are not of cats, telling it each time
whether the picture is a cat or not. Then, if we show the computer a new image, it should be
able to tell from this training whether the new image is a cat or not.
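The train-then-predict idea can be shown at toy scale. This is a minimal sketch using a 1-nearest-neighbour classifier in plain Python, chosen only for illustration and not the method used later in the project. Real image classifiers work on features extracted from pixels; here each "image" is replaced by two made-up numbers, and all data values are hypothetical.

```python
import math

# Hypothetical labelled training data: (feature vector, label).
training_data = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.2, 0.1), "not cat"),
    ((0.1, 0.3), "not cat"),
]


def predict(features):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(training_data, key=lambda example: math.dist(example[0], features))
    return label


print(predict((0.85, 0.75)))  # cat
print(predict((0.15, 0.2)))   # not cat
```

Here "training" is simply storing labelled examples; the prediction for a new input is the label of its most similar stored example.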

PYTHON LIBRARIES:

1. NUMPY: NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. Among its various features,
these are important: useful linear algebra, Fourier transform, and random number
capabilities.

Array Creations:

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([1, 2, 3, 4]) #evaluating b shows: array([1, 2, 3, 4])
c = np.zeros((3, 3)) #creating a 3x3 array with all zeros
d = np.linspace(0, 5, 10) #creating a sequence of 10 evenly spaced values from 0 to 5
e = np.arange(5) #creating the sequence 0, 1, 2, 3, 4

a.ndim : gives the number of array dimensions

a.shape : gives the shape of the array

a.size : gives the total number of elements in the array
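To see how ndim, shape and size differ, this sketch prints them for the arrays created above plus one extra 2-D array m (a hypothetical addition built with reshape) so that a multi-dimensional case appears.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
c = np.zeros((3, 3))             # 3x3 array of zeros
d = np.linspace(0, 5, 10)        # 10 evenly spaced values, 0 and 5 included
e = np.arange(5)                 # 0, 1, 2, 3, 4
m = np.arange(6).reshape(2, 3)   # 2x3 array built from 0..5

for name, arr in [("a", a), ("c", c), ("d", d), ("e", e), ("m", m)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```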

 

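Operations on Arrays:

On 1D arrays (each is a method call, e.g. a.sum()):

a.sum()   a.mean()   a.max()   a.argmin()

a.prod()  a.std()    a.min()   a.argmax()

a.sort()  a.var()

Note: a.sort() sorts the array in place and returns None.

On 2D arrays:

Add : a+b

Sub : a-b

Mul : a*b (element-wise)

Div : a/b (element-wise)

Dot product : a@b

Transpose : a.T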
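A worked sketch of the operations above on small illustrative arrays. It highlights two easy mistakes: a * b is element-wise (use a @ b for the matrix product), and np.sort() returns a sorted copy whereas a.sort() sorts in place.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5])
print(a.sum(), a.prod(), a.min(), a.max())   # 14 60 1 5
print(a.argmin(), a.argmax())                # 1 4  (index of the first minimum / of the maximum)
print(a.mean(), a.var(), a.std())

s = np.sort(a)        # sorted copy; a itself is unchanged

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
print(x + y)          # element-wise addition
print(x * y)          # element-wise multiplication (not matrix multiplication)
print(x @ y)          # matrix (dot) product
print(x.T)            # transpose
```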

Matplotlib: Matplotlib is an excellent visualization library in Python for 2D plots of arrays.
It is a multi-platform data visualization library built on NumPy arrays.

One of the greatest benefits of visualization is that it gives us visual access to huge amounts of
data in an easily digestible form. Matplotlib provides several kinds of plots, such as line, bar,
scatter and histogram plots.

Importing matplotlib:

from matplotlib import pyplot as plt

Line Plots:

x = [1, 2, 3]
y = [4, 5, 6]
p = [1, 2, 3] #second series; p and q are not defined in the original, these values are illustrative
q = [6, 5, 4]
plt.title('Line plot example')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.grid(True, color='k') #grid lines
plt.plot(x, y, 'g', label='first', linewidth=3)
plt.plot(p, q, 'b', label='second', linewidth=3)
plt.legend()
plt.show()

Bar Plots:

x = [1, 2, 3, 4]
y = [6, 3, 6, 7]
plt.xlabel('x')
plt.ylabel('y')
plt.title('Bar plot example')
plt.bar(x, y, color='g', label='1st')
plt.bar([1, 4, 5], [5, 4, 1], color='r', label='2nd')
plt.legend()
plt.show()

Scatter plot:

plt.scatter(x, y)
plt.show()

Pie Charts:

slices = [10, 20, 30, 40]
activities = ['eat', 'sleep', 'read', 'play']
col = ['r', 'b', 'k', 'm']
plt.pie(slices, labels=activities, colors=col, startangle=90, shadow=True, explode=(0, 0.1, 0, 0), autopct='%1.1f%%')
plt.title('Pie Plot')
plt.show()

PANDAS: Pandas is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming language.

Importing: import pandas as pd

Usage:

xyz = {'Day': [1, 2, 3, 4, 5], 'visitors': [20, 30, 40, 30, 20], 'B_rate': [20, 20, 15, 10, 30]}
a = pd.DataFrame(xyz)
a.set_index('Day', inplace=True) #optional: use Day as the index; with inplace=True, set_index returns None, so do not assign its result
print(a)
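Building on the same xyz data, this sketch shows a few everyday DataFrame operations: head(), a column mean, boolean filtering and describe(). The threshold of 25 visitors is an arbitrary illustrative choice, and the meaning of the B_rate column is not stated in the original.

```python
import pandas as pd

xyz = {'Day': [1, 2, 3, 4, 5],
       'visitors': [20, 30, 40, 30, 20],
       'B_rate': [20, 20, 15, 10, 30]}
df = pd.DataFrame(xyz).set_index('Day')   # set_index returns a new DataFrame

print(df.head(2))                  # first two rows
print(df['visitors'].mean())       # 28.0
busy = df[df['visitors'] > 25]     # boolean filtering: days with more than 25 visitors
print(busy)
print(df.describe())               # count, mean, std, min, quartiles, max per column
```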

 

Project: Credit Card Fraud Detection

Using a dataset of nearly 28,500 credit card transactions and multiple unsupervised anomaly
detection algorithms, we are going to identify transactions with a high probability of being credit
card fraud. In this project, we will build and deploy the following two machine learning
algorithms:

1. Local Outlier Factor (LOF)

2. Isolation Forest Algorithm

We will also use data visualization techniques common in data science, such as parameter
histograms and correlation matrices, to gain a better understanding of the underlying distribution
of data in our dataset.

1. Importing Necessary Libraries: To start, let's print out the version numbers of all the
libraries we will be using in this project. This serves two purposes: it ensures we have installed
the libraries correctly, and it makes this project reproducible.

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
print(np.__version__, pd.__version__, matplotlib.__version__, sns.__version__)

2. Import Data: We will import our dataset from a .csv file as a Pandas DataFrame.
Furthermore, we will begin exploring the dataset to gain an understanding of the type, quantity,
and distribution of data in our dataset. For this purpose, we will use Pandas' built-in describe()
method, as well as parameter histograms and a correlation matrix.
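The report does not show its loading code, so the following is only a sketch under stated assumptions: the hypothetical helper load_transactions wraps pandas' read_csv and, optionally, DataFrame.sample to keep a random fraction of rows. The real file path and any sampling fraction used in the project are unknown; an in-memory CSV with made-up values stands in for the file so the example runs on its own.

```python
import io

import pandas as pd


def load_transactions(source, frac=None, seed=1):
    """Hypothetical helper: read a transactions CSV, optionally keeping a random fraction of rows."""
    data = pd.read_csv(source)          # source may be a file path or a file-like object
    if frac is not None:
        data = data.sample(frac=frac, random_state=seed)
    return data


# Tiny in-memory stand-in for the real file (hypothetical values).
csv_text = """Time,V1,Amount,Class
0,-1.36,149.62,0
0,1.19,2.69,0
1,-1.36,378.66,0
1,-0.97,123.50,1
"""

data = load_transactions(io.StringIO(csv_text))
print(data.shape)        # (4, 4)
print(data.describe())   # summary statistics per column
```

With the real dataset, the call would pass the CSV's path instead of the StringIO object.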

 

3. Plot Histograms:
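The histogram figure itself did not survive extraction. The sketch below shows one way to produce both exploration plots mentioned in step 2: DataFrame.hist() draws one histogram per numeric column, and the correlation matrix from DataFrame.corr() is drawn with Matplotlib's imshow. The report imports seaborn, whose heatmap() is a common alternative for the second plot. The function name plot_overview and the demo data are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_overview(data):
    """Histogram of every numeric column, plus an image of the correlation matrix."""
    axes = data.hist(figsize=(10, 8), bins=20)   # one histogram per column
    plt.tight_layout()

    corr = data.corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax)
    ax.set_title("Correlation matrix")
    return axes, corr


# Hypothetical demo frame; in the project this would be the transactions DataFrame.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Time": rng.uniform(0, 100, 200),
                     "Amount": rng.exponential(50, 200),
                     "Class": rng.integers(0, 2, 200)})
axes, corr = plot_overview(demo)
plt.show()
```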

 

4. Determine Fraud Cases:

# Get all the columns from the DataFrame
columns = data.columns.tolist()
# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]
# Store the variable we'll be predicting on
target = "Class"
X = data[columns]
Y = data[target]
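The code above separates the features from the target but does not count the fraud cases that the step's title refers to. A minimal sketch of that count follows, assuming (as in the Kaggle dataset) that Class is 1 for fraud and 0 for a valid transaction. The fraud-to-valid ratio is a common choice for the contamination parameter of the outlier detectors in the next step; that usage is an assumption, not something the report states. The helper name fraud_summary and the mini-dataset are hypothetical.

```python
import pandas as pd


def fraud_summary(data, target="Class"):
    """Count fraudulent (1) and valid (0) rows and return the fraud/valid ratio."""
    fraud = data[data[target] == 1]
    valid = data[data[target] == 0]
    outlier_fraction = len(fraud) / len(valid)
    return len(fraud), len(valid), outlier_fraction


# Hypothetical mini-dataset: 2 fraud rows out of 10.
data = pd.DataFrame({"Amount": [10, 25, 3, 400, 7, 12, 90, 5, 60, 1],
                     "Class":  [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]})
n_fraud, n_valid, outlier_fraction = fraud_summary(data)
print(n_fraud, n_valid, outlier_fraction)   # 2 8 0.25
```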

5. Unsupervised Outlier Detection: Now that we have processed our data, we can begin
deploying our machine learning algorithms. We will use the following techniques:

Local Outlier Factor (LOF): The anomaly score of each sample is called the Local Outlier Factor.
It measures the local deviation of density of a given sample with respect to its neighbors. It is
local in that the anomaly score depends on how isolated the object is with respect to the
surrounding neighborhood.

Isolation Forest Algorithm: The Isolation Forest ‘isolates’ observations by randomly selecting a
feature and then randomly selecting a split value between the maximum and minimum values of
the selected feature. Since recursive partitioning can be represented by a tree structure, the
number of splits required to isolate a sample is equivalent to the path length from the root
node to the terminating node. This path length, averaged over a forest of such random trees, is a
measure of normality and serves as our decision function. Random partitioning produces noticeably
shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path
lengths for particular samples, those samples are highly likely to be anomalies.
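The report does not include its model code. The sketch below uses scikit-learn's IsolationForest and LocalOutlierFactor, the standard Python implementations of the two techniques described above. Both label inliers +1 and outliers -1, so predictions are mapped to 1 = fraud and 0 = valid before counting errors against the true labels. The hyperparameters (n_estimators=100, n_neighbors=20, the random seed) and the synthetic 2-D data are assumptions for illustration; in the project, X and Y from step 4 would be used instead.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def detect_outliers(X, y, contamination, seed=1):
    """Run both detectors; return {name: (predictions, n_errors)} with 1 = fraud, 0 = valid."""
    iso = IsolationForest(n_estimators=100, contamination=contamination, random_state=seed)
    iso_raw = iso.fit(X).predict(X)            # +1 = inlier, -1 = outlier

    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    lof_raw = lof.fit_predict(X)               # same +1 / -1 convention

    results = {}
    for name, raw in [("Isolation Forest", iso_raw), ("Local Outlier Factor", lof_raw)]:
        pred = (raw == -1).astype(int)         # map -1 (outlier) to 1 (fraud)
        n_errors = int((pred != y).sum())
        results[name] = (pred, n_errors)
        print(f"{name}: {n_errors} errors out of {len(y)}")
    return results


# Hypothetical demo data: 200 "valid" points near the origin plus 5 distant "fraud" points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               [[8, 8], [-8, 8], [8, -8], [-8, -8], [10, 0]]])
y = np.array([0] * 200 + [1] * 5)

outlier_fraction = y.sum() / (y == 0).sum()    # fraud/valid ratio: 5 / 200 = 0.025
results = detect_outliers(X, y, contamination=outlier_fraction)
```

For per-class precision and recall, sklearn.metrics.classification_report(y, pred) can be applied to each model's predictions.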

 

CONCLUSION :

From the metrics, it can be determined that the Isolation Forest algorithm, using the transaction
features such as Time and Amount with Class as the prediction target, was the more accurate model,
and its precision was considerably higher than that of the Local Outlier Factor.

OUTCOMES

The outcome of the project is a model that predicts the number of fraudulent transactions. The
table for the model is as follows:

References:

1. Introduction to ML with Python (book)

2. Edureka Python and ML course

3. GeeksforGeeks, Python Programming Language, https://www.geeksforgeeks.org/python-programming-language/

4. Programiz, https://www.programiz.com/python-programming/

5. Medium articles

 