
Python Implant Training

This document provides an overview of a training program on data analysis using Python. It includes chapters that introduce Python basics like data types and statements, describe key Python libraries for data analysis like NumPy, Pandas and Matplotlib, and present a project analyzing SF salaries data to demonstrate skills learned. The document contains both theoretical concepts and practical examples to provide a comprehensive overview of the Python and data analysis skills covered in the training.

Uploaded by

Damuram Damuram
Copyright © All Rights Reserved

ABSTRACT

This report is about data analysis using Python. We learned the basics of Python and a few
data science libraries. The report contains both theory and practical content, with detailed
explanations. In short, it describes what we learned and practised in this implant training.

Finally, we built a project to demonstrate and test the knowledge we gained in this training.
This project helped us explore how a real project works.
PYTHON PROGRAMMING 18CS66P

TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION TO PYTHON
1.1 INTRODUCTION
1.2 LANGUAGE FEATURES
1.3 INDUSTRIAL IMPORTANCE
1.4 APPLICATION
1.5 PROS & CONS
1.6 DIFFERENT PYTHON IDE
1.7 ADVANTAGES
CHAPTER 2: OBJECT AND DATA STRUCTURES BASICS
2.1 NUMBERS
2.1.1 TYPES OF NUMBERS
2.1.2 BASIC ARITHMETIC OPERATION
2.1.3 DIVISION VS FLOOR DIVISION
2.1.4 SQUARE & SQUARE ROOT
2.2 STRING
2.2.1 STRING INDEXING
2.2.2 STRING SLICING
2.2.3 STRING STEP SIZE
2.2.4 STRING CONCATENATE
2.2.5 STRING METHODS
2.2.6 STRING FORMATTING
2.3 LIST
2.3.1 LIST PROPERTIES
2.3.2 NESTED LISTS
2.3.3 LIST METHODS
2.4 TUPLES
2.4.1 BASIC TUPLE METHODS
2.4.2 IMMUTABILITY
2.4.3 WHEN TO USE TUPLE
2.5 DICTIONARIES
2.5.1 NESTING WITH DICTIONARY
2.5.2 A FEW DICTIONARIES METHODS
2.6 SETS
2.7 BOOLEANS
CHAPTER 3: PYTHON STATEMENT
3.1 INDENTATION
3.2 COMPARISON OPERATORS
3.3 IF, ELIF, ELSE STATEMENTS
3.4 FOR LOOPS
3.5 WHILE LOOPS
3.6 USEFUL OPERATORS
3.6.1 RANGE
3.6.2 ENUMERATE
3.6.3 ZIP
3.6.4 MIN & MAX
CHAPTER 4: METHOD & FUNCTION
4.1 METHODS
4.2 FUNCTION
4.2.1 RETURN VS PRINT
4.2.2 EXAMPLES OF FUNCTIONS
CHAPTER 5: LIBRARIES FOR DATA ANALYSIS
5.1 NumPy
Creating NumPy Arrays
From a Python List
5.2 PANDAS
5.2.1 DATA INPUT AND OUTPUT
5.2.2 DATAFRAMES
5.2.3 SERIES
5.2.4 GROUP BY
5.3 MATPLOTLIB
CHAPTER 6: SF Salaries Project
CHAPTER 7: CONCLUSION
CHAPTER 8: REFERENCE

COMPUTER SCIENCE & ENGG VITH SEM 1



CHAPTER 1: INTRODUCTION TO PYTHON

1.1 INTRODUCTION

Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. It was created by Guido van Rossum, first released in 1991, and has been further
developed by the Python Software Foundation. Its high-level built-in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy-to-learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and packages,
which encourages program modularity and code reuse. The Python interpreter and the extensive
standard library are available in source or binary form without charge for all major
platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased productivity it
provides. Since there is no separate compilation step, the edit-test-debug cycle is very fast.
Debugging Python programs is easy: a bug or bad input in pure Python code does not cause a
segmentation fault. Instead, when the interpreter discovers an error, it raises an exception.
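A minimal sketch of that behaviour: indexing past the end of a list raises an IndexError, which the program can catch and handle instead of crashing. The helper name safe_get is hypothetical and only used for illustration.

```python
def safe_get(items, index):
    """Hypothetical helper: return items[index], or None if the index is out of range."""
    try:
        return items[index]
    except IndexError as err:
        # The interpreter raised an exception instead of crashing the process.
        print(f"Caught an error: {err}")
        return None

numbers = [10, 20, 30]
print(safe_get(numbers, 1))   # 20
print(safe_get(numbers, 5))   # prints the caught error message, then None
```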

Prerequisites:

Knowledge of any programming language can be a plus.

Reasons for its increasing popularity:

1. Emphasis on code readability, shorter code and ease of writing.
2. Programmers can express logical concepts in fewer lines of code in comparison to
languages such as C++ or Java.
3. Python supports multiple programming paradigms, such as object-oriented, imperative,
functional and procedural programming.
4. Built-in functions exist for most frequently used operations.
5. Its philosophy favours simplicity; as “The Zen of Python” puts it, “Simple is better than complex.”

1.2 LANGUAGE FEATURES

• Interpreted
   o There are no separate compilation and execution steps as in C and C++.
   o The program is run directly from the source code.
   o Internally, Python (in its standard CPython implementation) compiles the source code
     into an intermediate form called bytecode, which is then executed by the Python
     virtual machine.
   o No need to worry about linking and loading with libraries, etc.


• Platform Independent
   o Python programs can be developed and executed on multiple operating system
     platforms.
   o Python can be used on Linux, Windows, macOS, Solaris and many more.
• Free and Open Source; Redistributable
• High-level Language
   o In Python, there is no need to take care of low-level details such as managing the
     memory used by the program.
• Simple
   o Closer to the English language; easy to learn.
   o More emphasis on the solution to the problem than on the syntax.
• Embeddable
   o Python can be embedded within a C/C++ program to give scripting capabilities to the
     program's users.
• Robust
   o Exception handling features.
   o Built-in memory management.
• Rich Library Support
   o The Python Standard Library is very extensive.
   o This is known as the “batteries included” philosophy of Python; the library helps
     with regular expressions, documentation generation, unit testing, threading,
     databases, web browsers, CGI, email, XML, HTML, WAV files, cryptography, GUI and
     many more.
   o Besides the standard library, there are various other high-quality libraries, such
     as the Python Imaging Library (maintained today as its fork, Pillow), a remarkably
     simple image manipulation library.
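To see the bytecode step described under “Interpreted”, the standard-library dis module can disassemble a function. This is a sketch for CPython; the instruction names differ between Python versions (for example BINARY_ADD in older releases versus BINARY_OP in 3.11 and later), so the printed listing will vary. The function add is a hypothetical example.

```python
import dis

def add(a, b):
    return a + b

# CPython keeps the compiled bytecode on the function's code object.
print(type(add.__code__.co_code))   # <class 'bytes'>

# Print a human-readable listing of the bytecode instructions.
dis.dis(add)
```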

1.3 INDUSTRIAL IMPORTANCE

Many companies now look for candidates who know Python programming, and knowledge of Python
may improve a candidate's chances of impressing the interviewing panel. So, I would suggest
that beginners start learning Python and excel in it.

Python is a high-level, interpreted, general-purpose, dynamic programming language that
focuses on code readability. Programs usually need fewer lines of code than equivalent
programs in Java or C. It was first released in 1991 by its creator, Guido van Rossum. Python
ranks among the most popular and fastest-growing languages in the world. It is a powerful,
flexible and easy-to-use language, and it has a very active community. It is used in many
organizations because it supports multiple programming paradigms. It also performs automatic
memory management.


1.4 APPLICATION

1. Machine learning
2. Data science & analysis
3. IoT
4. Web development
5. Game development
6. Business applications
7. Software development
8. Web scraping applications

1.5 PROS & CONS

Pros: -

1. Ease of use
2. Multi-paradigm Approach

Cons: -

1. Slow speed of execution compared to C and C++.
2. Often considered less suitable than C or C++ for learning data structures and algorithms (DSA).

1.6 DIFFERENT PYTHON IDE

1. PyCharm
2. Visual Studio Code
3. Atom
4. Spyder
5. Vim
6. Thonny
7. Jupyter Notebook

1.7 ADVANTAGES

1. Presence of third-party modules.
2. Extensive support libraries (NumPy for numerical calculations, Pandas for data
analytics, etc.).
3. Open source and community development.
4. Versatile; easy to read, learn and write.
5. User-friendly data structures.
6. Object-oriented language.
7. Portable and interactive.
8. Interpreted language.


CHAPTER 2: OBJECT AND DATA STRUCTURES BASICS

2.1 NUMBERS

Number data types store numeric values. They are immutable data types, which means that
changing the value of a number variable results in a newly allocated object.
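This short sketch shows what “newly allocated object” means in practice. id() returns an identity for an object, and after x += 1 the name x refers to a different int object instead of a modified one. In CPython, id() happens to be the memory address; that detail is implementation-specific.

```python
x = 5
original_id = id(x)

x += 1  # binds x to a different int object; the object 5 itself is unchanged

print(x)                     # 6
print(id(x) == original_id)  # False: x now names another object
```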

2.1.1 TYPES OF NUMBERS

• int (signed integers) − Often called just integers or ints, they are positive or negative
whole numbers with no decimal point. In Python 3, ints have unlimited size (limited only by
available memory).
• long (long integers) − Python 2 only: longs were integers of unlimited size, written like
integers and followed by an uppercase or lowercase L. Python 3 has no separate long type,
because its int type already has unlimited size.
• float (floating point real values) − Also called floats, they represent real numbers and
are written with a decimal point dividing the integer and fractional parts. Floats may
also be in scientific notation, with E or e indicating the power of 10 (2.5e2 = 2.5 × 10²
= 250).
• complex (complex numbers) − These are of the form a + bJ, where a and b are floats and J
(or j) represents the square root of -1 (which is an imaginary number). The real part of
the number is a, and the imaginary part is b. Complex numbers are not used much in
Python programming.
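The number types above can be checked with the built-in type() function. This is a small sketch with arbitrary example values. It also shows that Python 3 ints grow without limit, so no separate long type is needed.

```python
print(type(42))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type(2 + 3j))    # <class 'complex'>

# Python 3 ints have unlimited precision (no separate long type).
big = 2 ** 100
print(big)             # 1267650600228229401496703205376

# Scientific notation produces a float.
print(2.5e2)           # 250.0

# A complex number exposes its real and imaginary parts as floats.
z = 2 + 3j
print(z.real, z.imag)  # 2.0 3.0
```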


2.1.2 BASIC ARITHMETIC OPERATION
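The original examples for this section are missing. The following is a sketch of Python's basic arithmetic operators using two arbitrary example values.

```python
a, b = 10, 4
print(a + b)   # 14     addition
print(a - b)   # 6      subtraction
print(a * b)   # 40     multiplication
print(a / b)   # 2.5    division (always returns a float)
print(a % b)   # 2      remainder (modulo)
print(a ** b)  # 10000  exponent (a to the power b)
```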

2.1.3 DIVISION VS FLOOR DIVISION

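• Division: - It divides two numbers and gives the result including the decimal part. We use
(/) to perform division; the result is always a float.
• Floor Division: - It divides two numbers and rounds the result down to the nearest whole
number (towards negative infinity), dropping the decimal part. We use (//) to perform floor
division; the result is an int when both operands are ints.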
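The difference between the two operators is easiest to see side by side. This sketch uses arbitrary values and includes the negative case, where “rounding down” is not the same as dropping the decimals.

```python
print(7 / 2)     # 3.5  true division always returns a float
print(6 / 2)     # 3.0  even when the result is a whole number
print(7 // 2)    # 3    floor division of two ints returns an int
print(-7 // 2)   # -4   rounds down (towards negative infinity), not towards zero
print(7.0 // 2)  # 3.0  with a float operand, the result is a float
```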

2.1.4 SQUARE & SQUARE ROOT
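The original examples for this section are missing. This sketch shows common ways to compute a square and a square root, with 16 as an arbitrary value. It uses the exponent operator, the built-in pow(), and math.sqrt() from the standard library.

```python
import math

n = 16
print(n ** 2)        # 256  square using the exponent operator
print(pow(n, 2))     # 256  built-in pow() is equivalent to n ** 2
print(n ** 0.5)      # 4.0  square root as a power of one half
print(math.sqrt(n))  # 4.0  square root from the math module
```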


2.2 STRING

Strings are used in Python to record text information, such as names. Strings in Python are
actually a sequence, which basically means Python keeps track of every element in the string
as a sequence. For example, Python understands the string “hello” to be a sequence of letters
in a specific order. This means we will be able to use indexing to grab particular letters (like
the first letter, or the last letter).

EX: - "hello world!", 'computer', 'I love python', "465CS1900"

2.2.1 STRING INDEXING

We know strings are a sequence, which means Python can use indexes to call parts of the
sequence. In Python, we use brackets [] after an object to call its index. We should also note
that indexing starts at 0 for Python.
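The following sketch shows indexing on an example string. Negative indexes, which count from the end, work the same way.

```python
s = "hello world"
print(s[0])    # h  first character (indexing starts at 0)
print(s[4])    # o
print(s[-1])   # d  negative indexes count from the end
print(len(s))  # 11
```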


2.2.2 STRING SLICING

We can use a colon (:) to perform slicing, which grabs a range of elements. For example, if
s = 'hello', then s[:3] (or s[0:3]) tells Python to grab everything from index 0 up to index
3, giving 'hel'. It doesn't include index 3 itself. You'll notice this a lot in Python, where
slices and ranges are usually in the context of "up to, but not including".
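The following sketch applies a few slices to an example string. In each case the stop index is excluded.

```python
s = "hello world"
print(s[:3])   # hel         from the start up to, but not including, index 3
print(s[0:3])  # hel         same slice with an explicit start
print(s[6:])   # world       from index 6 to the end
print(s[2:5])  # llo         indexes 2, 3 and 4
print(s[:-1])  # hello worl  everything except the last character
```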

2.2.3 STRING STEP SIZE

We can also use index and slice notation to grab elements of a sequence by a specified step
size (the default is 1). For instance, we can use two colons in a row followed by a number
that specifies how often to grab elements: s[::2] grabs every second element, and s[::-1]
returns the sequence reversed.
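The following sketch shows step sizes on an example string. A negative step walks through the sequence backwards.

```python
s = "hello world"
print(s[::1])   # hello world  the default step of 1
print(s[::2])   # hlowrd       every second character (indexes 0, 2, 4, ...)
print(s[::-1])  # dlrow olleh  a negative step reverses the string
```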


2.2.4 STRING CONCATENATE

Strings can be concatenated (joined together) using the + operator. Since strings are immutable, this creates a new string instead of changing the original ones.
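This is a small concatenation sketch with example strings. Non-string values must be converted with str() first, because + does not mix strings and numbers.

```python
first = "Hello"
second = "World"
greeting = first + " " + second
print(greeting)  # Hello World

# + builds a new string; the original string is unchanged.
print(first)     # Hello

# Numbers must be converted with str() before concatenating.
message = "Python " + str(3)
print(message)   # Python 3
```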

2.2.5 STRING METHODS

Strings come with many built-in methods. Some of the popular ones are upper(), lower(), strip(), split(), replace(), find() and join().
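The string methods listed above are shown in this sketch on an example string. Each method returns a new value and leaves the original string unchanged.

```python
s = "  Hello World  "
t = s.strip()                        # 'Hello World': leading/trailing spaces removed
print(t.upper())                     # HELLO WORLD
print(t.lower())                     # hello world
print(t.split())                     # ['Hello', 'World']  split on whitespace
print(t.replace("World", "Python"))  # Hello Python
print(t.find("o"))                   # 4  index of the first match (-1 if absent)
print("-".join(["a", "b", "c"]))     # a-b-c
```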


2.2.6 STRING FORMATTING

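For string formatting, we can use f-strings (formatted string literals, available since Python 3.6). Prefixing a string with f lets us place variables or expressions inside curly braces {}, and their values are inserted into the string. For example, print(f"Hello {name}") prints the value of a variable called name.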
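This is an f-string sketch. The variable names and values are hypothetical. The older str.format() method is included for comparison.

```python
name = "Alice"   # hypothetical example values
age = 21
print(f"{name} is {age} years old")   # Alice is 21 years old
print(f"Next year: {age + 1}")        # Next year: 22  (expressions are allowed)

price = 3.14159
print(f"Price: {price:.2f}")          # Price: 3.14  (format spec after the colon)

# The older str.format() method gives the same result.
print("{} is {} years old".format(name, age))
```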

2.3 LIST

Earlier when discussing strings, we introduced the concept of a sequence in Python. Lists can
be thought of as the most general version of a sequence in Python. Unlike strings, they are
mutable, meaning the elements inside a list can be changed!

Lists are created with square brackets, and indexing and slicing a list work the same way as for a string.
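This sketch uses an example list. Indexing and slicing behave as they do for strings, but unlike strings, an element can be replaced in place.

```python
my_list = ["one", "two", "three", 4, 5]
print(my_list[0])    # one
print(my_list[-1])   # 5
print(my_list[1:3])  # ['two', 'three']

# Lists are mutable: an element can be replaced in place.
my_list[0] = "ONE"
print(my_list)       # ['ONE', 'two', 'three', 4, 5]
```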


2.3.1 LIST PROPERTIES
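The original examples for this section are missing. This sketch shows a few general list properties with example values. A list can mix types, has a length, can be combined with + and repeated with *, and can grow in place.

```python
mixed = [1, "two", 3.0, [4, 5]]  # a list can hold different types
print(len(mixed))                # 4

combined = [1, 2] + [3, 4]       # + concatenates lists into a new list
print(combined)                  # [1, 2, 3, 4]

repeated = ["a"] * 3             # * repeats the elements
print(repeated)                  # ['a', 'a', 'a']

combined.append(5)               # lists can grow in place
print(combined)                  # [1, 2, 3, 4, 5]
```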

2.3.2 NESTED LISTS

A great feature of Python data structures is that they support nesting. This means we can
have data structures within data structures, for example a list inside a list (a common way
to represent a matrix).

We can again use indexing to grab elements, but now there are two levels of index: the first
index selects an item of the matrix object (an inner list), and the second index selects an
item inside that list!
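This sketch shows two-level indexing on a hypothetical 3×3 matrix built as a list of lists.

```python
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

print(matrix[0])     # [1, 2, 3]  first index selects an inner list (a row)
print(matrix[0][0])  # 1          second index selects an item in that row
print(matrix[2][1])  # 8
```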


2.3.3 LIST METHODS

append()   Adds an element at the end of the list
clear()    Removes all the elements from the list
copy()     Returns a (shallow) copy of the list
count()    Returns the number of elements with the specified value
extend()   Adds the elements of a list (or any iterable) to the end of the current list
index()    Returns the index of the first element with the specified value
insert()   Adds an element at the specified position
pop()      Removes the element at the specified position (the last one by default) and returns it
remove()   Removes the first item with the specified value
reverse()  Reverses the order of the list in place
sort()     Sorts the list in place
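The sketch below applies the list methods from the table to one example list in sequence. Each comment shows the list's state after that line.

```python
nums = [3, 1, 2]
nums.append(4)        # [3, 1, 2, 4]
nums.insert(0, 10)    # [10, 3, 1, 2, 4]
nums.extend([5, 6])   # [10, 3, 1, 2, 4, 5, 6]
last = nums.pop()     # removes and returns 6
nums.remove(10)       # [3, 1, 2, 4, 5]
print(nums.index(2))  # 2  position of the first 2
print(nums.count(3))  # 1
nums.sort()           # [1, 2, 3, 4, 5]
nums.reverse()        # [5, 4, 3, 2, 1]
backup = nums.copy()  # independent (shallow) copy
nums.clear()          # []
print(nums, backup)   # [] [5, 4, 3, 2, 1]
```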

2.4 TUPLES

In Python, tuples are very similar to lists; however, unlike lists, they are immutable,
meaning they cannot be changed. You would use tuples to represent things that shouldn't
be changed, such as days of the week or dates on a calendar.


2.4.1 BASIC TUPLE METHODS
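The original examples for this section are missing. Tuples have only two methods, index() and count(), and this sketch uses them on an example tuple. Indexing works as it does for lists.

```python
t = ("a", "b", "a", "c")
print(t.index("b"))  # 1  position of the first "b"
print(t.count("a"))  # 2  how many times "a" appears
print(len(t))        # 4
print(t[0], t[-1])   # a c
```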

2.4.2 IMMUTABILITY
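In this sketch of tuple immutability, assigning to an element raises a TypeError and leaves the tuple unchanged. The example values are arbitrary.

```python
t = (1, 2, 3)
error_message = None
try:
    t[0] = 100  # item assignment is not allowed on a tuple
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

print(t)        # (1, 2, 3)  the tuple is unchanged
```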

2.4.3 WHEN TO USE TUPLE

You may be wondering, "Why bother using tuples when they have fewer available
methods?" To be honest, tuples are not used as often as lists in programming, but are
used when immutability is necessary. If in your program you are passing around an
object and need to make sure it does not get changed, then a tuple becomes your
solution. It provides a convenient source of data integrity.


2.5 DICTIONARIES

In the dictionary data structure we use mapping instead of sequence (indexing). If you're
familiar with other languages, you can think of dictionaries as hash tables. In a
dictionary, every element has a key.

A dictionary stores data values as key: value pairs, unlike other data types that hold only
a single value as an element. Looking up a value by its key is fast because dictionaries
are implemented as hash tables.

So, what are mappings? Mappings are collections of objects that are stored by a key, unlike a
sequence, which stores objects by their relative position. This is an important distinction:
elements of a mapping are accessed by key, not by position. (Since Python 3.7, dictionaries
do remember the order in which keys were inserted, but you still cannot index them by
position.)

It's important to note that dictionaries are very flexible in the data types they can hold:
values can be numbers, strings, lists or even other dictionaries.

Dictionaries are also mutable, so you can access an element by its key and change its value.
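This dictionary sketch uses hypothetical keys and values. It shows lookup by key, values of different types, changing an existing value, and adding a new key.

```python
student = {"name": "Ravi", "marks": [85, 90, 78], "passed": True}
print(student["name"])      # Ravi  look up a value by its key
print(student["marks"][1])  # 90    a value can itself be a list

student["name"] = "Ravi K"  # dictionaries are mutable: change a value
student["year"] = 6         # or add a new key: value pair
print(student)
```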


2.5.1 NESTING WITH DICTIONARY

A dictionary is quite a flexible data structure; we can have a nested dictionary, i.e. a
dictionary inside a dictionary. We access nested values by chaining keys, for example
d['key1']['nestkey'] for a dictionary d.
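The following sketch chains keys through three levels of a hypothetical nested dictionary.

```python
d = {"key1": {"nestkey": {"subnestkey": "value"}}}

print(d["key1"])                          # {'nestkey': {'subnestkey': 'value'}}
print(d["key1"]["nestkey"]["subnestkey"]) # value
```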

2.5.2 A FEW DICTIONARIES METHODS
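The original examples for this section are missing. This sketch shows a few common dictionary methods on an example dictionary. keys(), values() and items() return views, so they are wrapped in list() for display.

```python
d = {"a": 1, "b": 2, "c": 3}
print(list(d.keys()))    # ['a', 'b', 'c']
print(list(d.values()))  # [1, 2, 3]
print(list(d.items()))   # [('a', 1), ('b', 2), ('c', 3)]
print(d.get("z", 0))     # 0  get() returns a default instead of raising KeyError
d.update({"d": 4})       # add or overwrite several pairs at once
removed = d.pop("a")     # remove a key and return its value (1)
print(d)                 # {'b': 2, 'c': 3, 'd': 4}
```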


2.6 SETS

Sets are an unordered collection of unique elements. We can construct them by using
the set() function.

When Python displays a set, it uses curly brackets, for example {1, 2}. This does not indicate
a dictionary, although you can draw an analogy between a set and a dictionary with only keys.
A set has only unique entries: adding an element that is already present has no effect.
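This sketch shows that duplicates are ignored. The set is printed through sorted() because a set's display order is not guaranteed.

```python
s = set()
s.add(1)
s.add(2)
s.add(2)               # adding a duplicate has no effect
print(len(s))          # 2

unique = set([1, 1, 2, 3, 3, 3])
print(sorted(unique))  # [1, 2, 3]  duplicates removed

print(type({}))        # <class 'dict'>  empty braces create a dict, not a set
```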

2.7 BOOLEANS

Python comes with Booleans: the predefined values True and False, which behave like the
integers 1 and 0 (bool is a subclass of int). It also has a placeholder object called None,
used to represent the absence of a value.

We can also use comparison operators to create Booleans.
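This short sketch shows that True and False behave like 1 and 0, that comparisons produce Booleans, and how None is tested with is.

```python
print(type(True))             # <class 'bool'>
print(isinstance(True, int))  # True   bool is a subclass of int
print(True + True)            # 2
print(1 > 2)                  # False  comparison operators produce Booleans

result = None                 # placeholder for a value we do not have yet
print(result is None)         # True
```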


CHAPTER 3: PYTHON STATEMENT

3.1 INDENTATION

Here is some pseudo-code to indicate the use of whitespace and indentation in Python:

Other Languages

if (x) {
    if (y) {
        code-statement;
    }
}
else {
    another-code-statement;
}

Python

if x:
    if y:
        code-statement
else:
    another-code-statement

Python relies on indentation and whitespace, rather than braces, to mark which statements
belong to which block. This means that code readability is a core part of the design of the
Python language.
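This sketch shows that indentation is part of Python's syntax. Compiling source in which the if body is not indented raises an IndentationError. The two source strings are illustrative, and the exact error message wording differs between Python versions.

```python
good = "if True:\n    print('indented')\n"  # body indented under the if
bad = "if True:\nprint('not indented')\n"   # body not indented

exec(compile(good, "<good>", "exec"))       # prints: indented

caught = False
try:
    compile(bad, "<bad>", "exec")
except IndentationError as err:
    caught = True
    print("IndentationError:", err.msg)     # wording varies by Python version
```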

3.2 COMPARISON OPERATORS

These operators allow us to compare variables and output a Boolean value (True or False).
If you have any sort of background in math, these operators should be very
straightforward.

TABLE OF COMPARISON OPERATORS

(The Example column assumes that a is less than b, for instance a = 10 and b = 20.)

Operator | Description                                                              | Example
==       | If the values of the two operands are equal, the condition becomes true.    | (a == b) is not true.
!=       | If the values of the two operands are not equal, the condition becomes true.| (a != b) is true.
>        | If the value of the left operand is greater than the value of the right      | (a > b) is not true.
         | operand, the condition becomes true.                                         |
<        | If the value of the left operand is less than the value of the right         | (a < b) is true.
         | operand, the condition becomes true.                                         |
>=       | If the value of the left operand is greater than or equal to the value of    | (a >= b) is not true.
         | the right operand, the condition becomes true.                               |
<=       | If the value of the left operand is less than or equal to the value of the   | (a <= b) is true.
         | right operand, the condition becomes true.                                   |
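The table's Example column can be reproduced with this sketch. It assumes the hypothetical values a = 10 and b = 20, so that a < b.

```python
a, b = 10, 20      # hypothetical values with a < b
print(a == b)      # False
print(a != b)      # True
print(a > b)       # False
print(a < b)       # True
print(a >= b)      # False
print(a <= b)      # True
print(1 < a < 15)  # True  comparisons can also be chained
```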

3.3 IF, ELIF, ELSE STATEMENTS

If statements in Python allow us to tell the computer to perform alternative actions based on
a certain set of results.

Verbally, we can imagine we are telling the computer:

"Hey if this case happens, perform some action"

We can then expand the idea further with elif and else statements, which allow us to tell the
computer:

"Hey if this case happens, perform some action. Else, if another case happens, perform some
other action. Else, if none of the above cases happened, perform this action."

Let's look at the syntax format for if statements to get a better idea of this:

if case1:
    perform action1
elif case2:
    perform action2
else:
    perform action3
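The following is a runnable version of the syntax above. The variable x and the temperature thresholds are hypothetical. Only the first branch whose condition is True runs.

```python
x = 20  # hypothetical temperature in degrees Celsius

if x > 30:
    label = "hot"
elif x > 15:
    label = "warm"   # this branch runs, since 20 > 30 is False and 20 > 15 is True
else:
    label = "cold"

print(label)  # warm
```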


Multiple Branches: -

We write this out in a nested structure. Take note of how the if, elif, and else line up in the
code. This can help you see which if is related to which elif or else statements.

Note how the conditions are checked in order until one evaluates to True, which causes the
code below it to run; the remaining branches are then skipped. You should also note that you
can put in as many elif statements as you want before you close off with an else.
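This multi-branch sketch uses a hypothetical location string. The elif branches are checked in order, and the else catches everything else.

```python
loc = "Bank"  # hypothetical location

if loc == "Auto Shop":
    message = "Welcome to the Auto Shop!"
elif loc == "Bank":
    message = "Welcome to the bank!"
elif loc == "Store":
    message = "Welcome to the store!"
else:
    message = "Where are you?"

print(message)  # Welcome to the bank!
```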

3.4 FOR LOOPS

A for loop iterates in Python: it goes through the items that are in a sequence or any
other iterable object. Objects that we've learned about that we can iterate over include strings,
lists, tuples, and even built-in iterables for dictionaries, such as keys or values.
Here's the general format for a for loop in Python:

for item in object:
    statements to do stuff

The variable name used for the item is completely up to the coder, so use your best judgment
and choose a name that makes sense and that you will be able to understand when revisiting
your code. This item name can then be referenced inside your loop, for example if you
wanted to use if statements to perform checks.
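This for-loop sketch uses an example list. The loop variable num takes each value in turn. Strings can be looped over in the same way.

```python
list1 = [1, 2, 3, 4, 5]
total = 0
for num in list1:     # num takes each value of list1 in turn
    total += num
print(total)          # 15

for letter in "abc":  # strings are iterable too
    print(letter)     # a, then b, then c on separate lines
```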


For loops can also contain if-else statements, so that each item is handled differently depending on a condition.
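This sketch combines a for loop with if-else and splits the numbers 1 to 10 into even and odd lists. The % operator gives the remainder of a division.

```python
evens = []
odds = []
for num in range(1, 11):
    if num % 2 == 0:   # remainder 0 means the number is even
        evens.append(num)
    else:
        odds.append(num)

print(evens)  # [2, 4, 6, 8, 10]
print(odds)   # [1, 3, 5, 7, 9]
```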

Tuples have a special quality when it comes to for loops. If you are iterating through a
sequence that contains tuples, each tuple can be unpacked into separate variables directly in
the for statement; this is an example of tuple unpacking. During the for loop, we unpack each
tuple inside the sequence and can access the individual items inside that tuple!
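This sketch uses an example list of pairs. It contrasts looping over whole tuples with unpacking each tuple into two names.

```python
pairs = [(1, 2), (3, 4), (5, 6)]

for pair in pairs:    # without unpacking, each item is the whole tuple
    print(pair)       # (1, 2), then (3, 4), then (5, 6)

firsts = []
for a, b in pairs:    # tuple unpacking: a and b receive the tuple's elements
    firsts.append(a)
print(firsts)         # [1, 3, 5]
```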

3.5 WHILE LOOPS

The while statement in Python is one of the most general ways to perform iteration.

A while statement will repeatedly execute a single statement or group of statements as long
as the condition is true. The reason it is called a 'loop' is because the code statements are
looped through over and over again until the condition is no longer met.
The general format of a while loop is:

while test:
    code statements
else:
    final code statements

In a counting loop such as while x < 10, where x grows by 1 on each pass, the loop keeps going
while the condition is True and stops as soon as it becomes False, which happens once x == 10.
It's important to note that the optional else block then runs once, after the loop has
finished normally.
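This is a runnable sketch of the counting loop just described. The else block runs because the condition became False, not because of a break.

```python
x = 0
while x < 10:
    print("x is currently:", x)
    x += 1
else:
    # Runs once, after the condition x < 10 becomes False.
    print("All done!")

print(x)  # 10
```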

BREAK, CONTINUE, PASS

We can use break, continue, and pass statements in our loops to add additional
functionality for various cases. The three statements are defined as follows:

break: Breaks out of the current closest enclosing loop.

continue: Goes to the top of the closest enclosing loop.

pass: Does nothing at all.


Thinking about break and continue statements, the general format of the while loop
looks like this:

while test:
    code statement
    if test:
        break
    if test:
        continue
else:
    final code statements

break and continue statements can appear anywhere inside the loop's body, but we will
usually nest them in conjunction with an if statement to perform an action based on some
condition.

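Let's put break in a while loop. When break runs, the loop stops immediately: the rest of the
loop body is skipped, and the loop's else block is not executed, because else only runs when
the loop ends through its condition becoming False. After these brief but simple examples,
you should feel comfortable using while statements in your code.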
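This sketch uses all three statements with arbitrary stopping values (skip 3, stop at 6). Note that the loop's else block is skipped because the loop ends with break.

```python
x = 0
seen = []
while x < 10:
    x += 1
    if x == 3:
        continue  # skip the rest of this iteration
    if x == 6:
        break     # leave the loop immediately
    seen.append(x)
else:
    print("not printed, because the loop ended with break")

print(seen)  # [1, 2, 4, 5]

for item in [1, 2, 3]:
    pass  # placeholder: a loop body must contain at least one statement
```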

3.6 USEFUL OPERATORS

There are a few built-in functions and "operators" in Python that don't fit well into any
category but are quite useful in loops and if-else statements. Let's look at a few of the most
popular ones.


3.6.1 RANGE
The range() function allows you to quickly generate a sequence of integers; this comes in
handy a lot, so take note of how to use it! There are three parameters you can pass: a start,
a stop, and a step size. The stop value itself is not included.

Note that range() does not build a list: it returns a range object that produces its values
lazily, without storing them all in memory, so to actually get a list out of it we need to
cast it with list(). This lazy behaviour is similar to that of a generator, a special type of
function that generates values one at a time instead of saving them all to memory. We haven't
talked about functions or generators yet, so just keep this in your notes for now; we will
discuss them in more detail later in the training!
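This sketch shows range() with one, two and three arguments. Each result is cast with list() so the values are visible.

```python
print(range(5))                  # range(0, 5)  a lazy range object, not a list
print(list(range(5)))            # [0, 1, 2, 3, 4]  stop value excluded
print(list(range(2, 8)))         # [2, 3, 4, 5, 6, 7]
evens = list(range(0, 11, 2))    # start, stop, step
print(evens)                     # [0, 2, 4, 6, 8, 10]
print(list(range(5, 0, -1)))     # [5, 4, 3, 2, 1]  a negative step counts down
```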

3.6.2 ENUMERATE
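enumerate() is a very useful function to use with for loops. Imagine that, while looping over a
sequence, we also want to know the position of each item: without enumerate() we would have to
create a counter variable such as index_count (or loop_count) before the loop and increase it
by one on every pass.

Keeping track of how many loops you've gone through is so common that enumerate() was
created so you don't need to worry about creating and updating this index_count or
loop_count variable: it gives you each item together with its index.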
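This sketch puts the manual index_count pattern and the enumerate() version side by side on an example string.

```python
word = "abc"

# Manual counter, the pattern that enumerate() replaces:
index_count = 0
for letter in word:
    print(f"At index {index_count} the letter is {letter}")
    index_count += 1

# The same loop with enumerate(), which yields (index, item) tuples:
for i, letter in enumerate(word):
    print(f"At index {i} the letter is {letter}")

print(list(enumerate(word)))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```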
3.6.3 ZIP
Lists of tuples are very common in Python, especially when working with outside libraries.
You can use the zip() function to quickly create one by "zipping" together two lists. In
Python 3, zip() returns an iterator, so we cast it with list() to see the tuples.
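This sketch zips two example lists. When the inputs have different lengths, zip() stops at the shortest one.

```python
mylist1 = [1, 2, 3]
mylist2 = ["a", "b", "c"]
pairs = list(zip(mylist1, mylist2))  # cast the iterator to a list
print(pairs)                         # [(1, 'a'), (2, 'b'), (3, 'c')]

# zip() stops at the shortest input.
print(list(zip([1, 2, 3], ["x"])))   # [(1, 'x')]
```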


3.6.4 MIN & MAX

The built-in min() and max() functions return the smallest and largest value in a sequence such
as a list, tuple or string (more generally, in any iterable). They can also be given several
separate values, such as min(3, 7, 1).
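This sketch uses min() and max() with example values. It also shows the optional key argument, which compares items by a computed value such as their length.

```python
nums = [10, 20, 5, 40]
print(min(nums))      # 5
print(max(nums))      # 40
print(max("python"))  # y  characters compare by their character codes
print(min(3, 7, 1))   # 1  also works with several separate arguments

words = ["apple", "fig", "banana"]
print(max(words, key=len))  # banana  the longest word
```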


CHAPTER 4: METHOD & FUNCTION

4.1 METHODS

We've already seen a few examples of methods when learning about Object and Data
Structure Types in Python. Methods are essentially functions built into objects. Later on in
the course we will learn about how to create our own objects and methods using Object
Oriented Programming (OOP) and classes.

Methods perform specific actions on an object and can also take arguments, just like a
function. This section serves as a brief introduction to methods and gets you thinking about
overall design methods that we will touch back upon when we reach OOP in the course.

Methods are in the form:

object.method(arg1, arg2, etc...)

For example, a list object has methods such as append(), count() and pop().

You can always use Shift+Tab in the Jupyter Notebook to get more help about a method. In plain
Python you can use the help() function, for example help(list.insert).
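This sketch calls a few list methods in the object.method(arguments) form. The list name mylst is hypothetical.

```python
mylst = [1, 2, 3]
mylst.append(4)        # object.method(argument)
print(mylst)           # [1, 2, 3, 4]
print(mylst.pop())     # 4  pop() with no argument removes the last item
print(mylst.count(2))  # 1
print(mylst)           # [1, 2, 3]
```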


4.2 FUNCTION

Formally, a function is a useful device that groups together a set of statements so they can be
run more than once. They can also let us specify parameters that can serve as inputs to the
functions.

On a more fundamental level, functions allow us to avoid writing the same code again and
again. If you think back to the lessons on strings and lists, we used the function len() to get
the length of a string. Since checking the length of a sequence is a common task, you would
want a function that can do this repeatedly on command.

Functions are one of the most basic levels of reusing code in Python, and they will also allow us
to start thinking about program design (we will dive much deeper into the ideas of design when
we learn about Object Oriented Programming).
Why even use functions?
Put simply, you should use functions when you plan on using a block of code multiple times.
The function will allow you to call the same block of code without having to write it multiple
times. This in turn will allow you to create more complex Python scripts. To really
understand this though, we should actually write our own functions!

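We begin with the def keyword, then a space, followed by the name of the function, its
parameters in parentheses, and a colon. Try to keep names relevant; for example, len() is a good
name for a length function. A string in triple quotes ('''docstring''') placed as the first
statement of the function body becomes the function's docstring, its built-in documentation.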
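A minimal function definition following these rules. The name say_hello and its default argument are hypothetical, chosen only for illustration.

```python
def say_hello(name='World'):
    '''Return a greeting for the given name.'''
    return f'Hello, {name}!'


print(say_hello())          # Hello, World!
print(say_hello('Python'))  # Hello, Python!
print(say_hello.__doc__)    # Return a greeting for the given name.
```

Calling help(say_hello) would show the same docstring.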


4.2.1 RETURN VS PRINT


The return keyword allows you to save the result of a function as a variable. The print()
function simply displays the output to you, but doesn't save it for future use.


Be careful! Notice how print_result() doesn't let you save the result to a variable! It only
prints it out; because the function has no return statement, it returns None, so None is what
gets assigned.
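A sketch of the comparison described above. The name print_result comes from the text, but its body is assumed here, and return_result is a hypothetical counterpart written for contrast.

```python
def print_result(a, b):
    # Displays the sum but has no return statement, so it returns None
    print(a + b)


def return_result(a, b):
    # Hands the sum back to the caller
    return a + b


saved = return_result(10, 5)
print(saved)                   # 15

result = print_result(10, 5)   # prints 15 as a side effect
print(result)                  # None
```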

4.2.2 EXAMPLES OF FUNCTIONS
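A few small illustrative functions (hypothetical examples, not necessarily the ones used in the training) that combine return, docstrings, a list comprehension, and tuple unpacking:

```python
def is_even(number):
    '''Return True if number is divisible by 2.'''
    return number % 2 == 0


def check_even_list(num_list):
    '''Return a new list containing only the even numbers from num_list.'''
    return [number for number in num_list if is_even(number)]


def most_hours(work_hours):
    '''Return the (name, hours) pair with the largest number of hours.'''
    best_name, best_hours = '', 0
    for name, hours in work_hours:      # tuple unpacking
        if hours > best_hours:
            best_name, best_hours = name, hours
    return best_name, best_hours


print(check_even_list([1, 2, 3, 4, 5, 6]))                        # [2, 4, 6]
print(most_hours([('Abby', 100), ('Billy', 4000), ('Cassie', 800)]))  # ('Billy', 4000)
```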


CHAPTER 5: LIBRARIES FOR DATA ANALYSIS

5.1 NumPy

NumPy is a linear algebra library for Python. It is so important for data science with Python
because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main
building blocks. NumPy is also very fast, because its core routines are implemented in C.

In order to use NumPy, we need to import the NumPy package:


import numpy as np

NumPy arrays are the main way we will use NumPy throughout the course. NumPy arrays
essentially come in two flavors: vectors and matrices. Vectors are strictly 1-d arrays and
matrices are 2-d (but note that a matrix can still have only one row or one column).

Creating NumPy Arrays

From a Python List

We can create an array by directly converting a list or list of lists:
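A minimal sketch of converting a list and a list of lists (illustrative values), plus two other common NumPy constructors, np.arange() and np.zeros():

```python
import numpy as np

# 1-d array (vector) from a list
my_list = [1, 2, 3]
vector = np.array(my_list)

# 2-d array (matrix) from a list of lists
my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(my_matrix)

print(vector.shape)   # (3,)
print(matrix.shape)   # (3, 3)
print(matrix[1, 2])   # 6  (row 1, column 2)

# Other common constructors
evens = np.arange(0, 10, 2)   # [0 2 4 6 8]
zeros = np.zeros((2, 3))      # 2 x 3 array filled with 0.0
```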


5.2 PANDAS

Pandas is an open-source library made mainly for working with relational or labeled data easily
and intuitively. It is built on top of the NumPy library, offers high performance and
productivity, and is mostly used for data analysis and visualization tasks.

In order to use Pandas, we need to import the Pandas package:


import pandas as pd

Pandas provides two main data structures for manipulating data:

• Series
• DataFrame

5.2.1 DATA INPUT AND OUTPUT

Pandas can read a variety of file types using its pd.read_ functions (for example pd.read_csv(),
pd.read_excel(), and pd.read_html()) and write them back with matching DataFrame methods
such as to_csv().
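A minimal round-trip sketch with a tiny illustrative DataFrame. It writes to an in-memory text buffer (io.StringIO) so it runs anywhere; passing a filename such as 'example.csv' instead would write a real file.

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Write the DataFrame as CSV text (index=False skips the row labels)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read it back with pd.read_csv
buffer.seek(0)
df2 = pd.read_csv(buffer)
print(df2)
```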


5.2.2 DATAFRAMES

DataFrames are the workhorse of pandas and are directly inspired by the R programming
language. We can think of a DataFrame as a bunch of Series objects put together to share the
same index.
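A sketch of common DataFrame operations on a small illustrative table built from np.arange(). The row labels A to D and column names W to Z are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),   # values 0..11 in 4 rows, 3 columns
    index=['A', 'B', 'C', 'D'],
    columns=['W', 'X', 'Y'],
)

col = df['W']                   # a single column is a Series sharing the index
df['Z'] = df['W'] + df['Y']     # new column computed from existing ones
row = df.loc['B']               # select a row by its label
subset = df[df['X'] > 4]        # conditional selection keeps rows C and D
print(df)
```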


5.2.3 SERIES

A Series is very similar to a NumPy array (in fact it is built on top of the NumPy array
object). What differentiates a NumPy array from a Series is that a Series can have axis
labels, meaning it can be indexed by a label instead of just a number location. It also doesn't
need to hold numeric data; it can hold any arbitrary Python object.
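A sketch of these properties with illustrative data: label-based lookup, the default integer index, a Series of arbitrary objects, and arithmetic that aligns on labels.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
data = [10, 20, 30]

# A Series built with explicit axis labels can be indexed by label
ser = pd.Series(data=data, index=labels)
print(ser['b'])  # 20

# Without an index, pandas assigns a default integer index 0, 1, 2
arr_ser = pd.Series(np.array(data))

# A Series can hold arbitrary Python objects, e.g. built-in functions
funcs = pd.Series([sum, print, len])

# Arithmetic aligns on labels; labels missing from either side give NaN
s1 = pd.Series([1, 2], index=['x', 'y'])
s2 = pd.Series([10, 20], index=['y', 'z'])
total = s1 + s2   # x: NaN, y: 12.0, z: NaN
```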

5.2.4 GROUP BY

The groupby method allows you to group rows of data together and call aggregate functions on each group.


Now you can use the .groupby() method to group rows together based on a column name.
For instance, let's group by Company. This will create a DataFrameGroupBy object:
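A sketch with a small hypothetical sales table (the Company, Person, and Sales values are invented for illustration). Selecting the numeric Sales column before aggregating matters in pandas 2.x, where calling .mean() on groups that still contain text columns raises a TypeError.

```python
import pandas as pd

data = {
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
}
df = pd.DataFrame(data)

by_comp = df.groupby('Company')         # DataFrameGroupBy object

# Aggregate only the numeric column for each company
mean_sales = by_comp['Sales'].mean()
total_sales = by_comp['Sales'].sum()
summary = by_comp['Sales'].describe()   # count, mean, std, min, quartiles, max
print(mean_sales)
```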


5.3 MATPLOTLIB

Matplotlib is the "grandfather" library of data visualization with Python. It was created by
John Hunter to replicate MATLAB's plotting capabilities in Python. It allows you to create
reproducible figures programmatically.

It is an excellent 2D and 3D graphics library for generating scientific figures.

Some of the major pros of Matplotlib are:

• Generally easy to get started for simple plots
• Support for custom labels and texts
• Great control of every element in a figure
• High-quality output in many formats
• Very customizable in general

In Jupyter notebooks, the %matplotlib inline command displays plots inside the notebook. That
line only works in Jupyter; if you are using another editor, call plt.show() at the end of all
your plotting commands to have the figure pop up in another window.

Example
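As a small example of the pros listed above (explicit control over each element and high-quality output in many formats), here is a sketch using Matplotlib's object-oriented interface. The data and titles are illustrative, and the figure is saved to an in-memory buffer rather than to a file.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)

# Object-oriented interface: create the figure and axes explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x ** 2, label='x squared')
ax.plot(x, x ** 3, label='x cubed')

# Every element can be customised through the axes object
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Two curves on one axes')
ax.legend()

# Save as PNG; pass a filename such as 'plot.png' to write to disk instead
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=100)
```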


Basic Matplotlib Commands


We can create a very simple line plot using the following:
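A minimal sketch of such a line plot using the pyplot functions; the data (x squared) and the label strings are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)   # 11 evenly spaced points from 0 to 5
y = x ** 2

lines = plt.plot(x, y, 'r')   # 'r' draws a red line
plt.xlabel('X Axis Title Here')
plt.ylabel('Y Axis Title Here')
plt.title('String Title Here')
plt.show()                    # needed outside Jupyter notebooks
```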


CHAPTER 6: SF Salaries Project

In this project we apply the concepts that we learned during this implant training (Python).
Our final project is a data analysis and visualization of San Francisco employee salaries.
PROCEDURE: -
1. Download the Salaries.csv dataset file from Kaggle.com.
2. Download the starter file given by our instructor.
3. Import the required libraries and the .csv file in Jupyter Notebook.
4. After setting everything up, start writing commands to analyze the data.
TOOLS USED FOR THIS PROJECT: -
• EDITOR: JUPYTER NOTEBOOK
• OTHER: EXCEL, WEB BROWSER

PYTHON LIBRARY: -
• PANDAS
• MATPLOTLIB

PURPOSE OF THIS PROJECT: -


This project was done to gain insights into San Francisco employee salaries and to extract
useful information that the organization can use to summarize its employee salary details.


PROGRAM: -

Import pandas as pd.

In [2]: import pandas as pd
        import matplotlib.pyplot as plt
        import matplotlib.ticker
        %matplotlib inline

Read Salaries.csv as a DataFrame called sal.

In [3]: sal=pd.read_csv('Salaries.csv')

Check the head of the DataFrame.

In [4]: sal.head()

Out[4]:

Use the .info() method to find out how many entries there are.

In [5]: sal.info()


What is the average BasePay?

In [6]: sal['BasePay'].mean()

Out[6]: 66325.44884050643

What is the highest amount of OvertimePay in the dataset?

In [7]: sal['OvertimePay'].max()

Out[7]: 245131.88

What is the job title of JOSEPH DRISCOLL? Note: use all caps, otherwise you may get an
answer that doesn't match up (there is also a lowercase Joseph Driscoll).

In [8]: sal[sal['EmployeeName'] == 'JOSEPH DRISCOLL']
        # sal['EmployeeName'] == 'JOSEPH DRISCOLL'

Out[8]:
    Id  EmployeeName     JobTitle                   BasePay    OvertimePay  OtherPay  Benefits  TotalPay
24  25  JOSEPH DRISCOLL  CAPTAIN, FIRE SUPPRESSION  140546.86  97868.77     31909.28  NaN       270324.91

How much does JOSEPH DRISCOLL make (including benefits)?

In [9]: sal[sal['EmployeeName']=='JOSEPH DRISCOLL']['TotalPayBenefits']

Out[9]: 24 270324.91
Name: TotalPayBenefits, dtype: float64

What is the name of the highest paid person (including benefits)?

In [10]: # sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]
         sal.loc[sal['TotalPayBenefits'].idxmax()]

Out[10]:


What is the name of the lowest paid person (including benefits)? Do you notice something
strange about how much he or she is paid?

In [11]: #sal[sal['TotalPayBenefits']==sal['TotalPayBenefits'].min()]
sal.loc[sal['TotalPayBenefits'].idxmin()]

Out[11]: Id 148654
EmployeeName Joe Lopez
JobTitle Counselor, Log Cabin Ranch
BasePay 0.0
OvertimePay 0.0
OtherPay -618.13
Benefits 0.0
TotalPay -618.13
TotalPayBenefits -618.13
Year 2014
Notes NaN
Agency San Francisco
Status NaN
Name: 148653, dtype: object
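The idxmax()/idxmin() pattern used in In [10] and In [11] can be checked on a tiny hypothetical stand-in for Salaries.csv (three invented rows, not real data): idxmax() returns the index label of the largest value, and .loc[] fetches that whole row.

```python
import pandas as pd

# Hypothetical three-row stand-in for Salaries.csv (invented values)
sal = pd.DataFrame({
    'EmployeeName': ['Employee A', 'Employee B', 'Employee C'],
    'TotalPayBenefits': [50000.0, 250000.0, -100.0],
})

# idxmax()/idxmin() return the index label of the largest/smallest value
top_label = sal['TotalPayBenefits'].idxmax()
low_label = sal['TotalPayBenefits'].idxmin()

# .loc[label] fetches that whole row as a Series
top_row = sal.loc[top_label]
print(top_row['EmployeeName'])   # Employee B

# The commented-out boolean-mask version returns a DataFrame instead,
# and keeps every row if several employees tie for the maximum
top_rows = sal[sal['TotalPayBenefits'] == sal['TotalPayBenefits'].max()]
```

On ties, idxmax() returns only the first matching label, which is the main practical difference from the boolean-mask version.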

What was the average (mean) BasePay of all employees per year (2011-2014)?

In [12]: sal.groupby('Year')['BasePay'].mean()

Out[12]: Year
2011 63595.956517
2012 65436.406857
2013 69630.030216
2014 66564.421924
Name: BasePay, dtype: float64

How many unique job titles are there?

In [13]: sal['JobTitle'].nunique()

Out[13]: 2159

What are the top 5 most common jobs?

In [14]: sal['JobTitle'].value_counts().head()

Out[14]: Transit Operator                7036
         Special Nurse                   4389
         Registered Nurse                3736
         Public Svc Aide-Public Works    2518
         Police Officer 3                2421
         Name: JobTitle, dtype: int64


How many job titles were represented by only one person in 2013 (i.e. job titles with only one
occurrence in 2013)?

In [15]: sum(sal[sal['Year']==2013]['JobTitle'].value_counts() == 1)

Out[15]: 202
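The counting trick in In [15] works because comparing value_counts() with 1 gives a boolean Series, and summing booleans counts the True values. A sketch on hypothetical toy data (not the real dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for Salaries.csv
sal = pd.DataFrame({
    'Year': [2013, 2013, 2013, 2013, 2014],
    'JobTitle': ['Nurse', 'Nurse', 'Clerk', 'Janitor', 'Clerk'],
})

# Occurrences of each job title in 2013: Nurse 2, Clerk 1, Janitor 1
counts = sal[sal['Year'] == 2013]['JobTitle'].value_counts()

single = counts == 1          # boolean Series: True where a title occurs once
n_single = int(single.sum())  # True counts as 1, False as 0
print(n_single)               # 2
```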

How have salaries (BasePay) changed over the years?

In [16]: salYear = sal.groupby('Year')['BasePay'].mean()

In [18]: # to plot the graph
         plt.plot(salYear.index, list(salYear))
         plt.xlabel('years')
         plt.ylabel('Salaries (BasePay)')
         plt.title('Salaries changed over year')

         # to remove the decimal point from the x-axis tick labels
         locator = matplotlib.ticker.MultipleLocator()   # one tick per year
         plt.gca().xaxis.set_major_locator(locator)
         formatter = matplotlib.ticker.StrMethodFormatter("{x:.0f}")
         plt.gca().xaxis.set_major_formatter(formatter)

         # to display the graph
         plt.show()
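A self-contained sketch of the same line graph that runs without Salaries.csv. It rebuilds salYear from the yearly means printed in Out[12] (rounded to two decimals here) instead of recomputing them, and uses the object-oriented axes interface rather than plt.gca().

```python
import matplotlib.pyplot as plt
import matplotlib.ticker
import pandas as pd

# Yearly mean BasePay copied (rounded) from Out[12]
salYear = pd.Series(
    [63595.96, 65436.41, 69630.03, 66564.42],
    index=pd.Index([2011, 2012, 2013, 2014], name='Year'),
    name='BasePay',
)

fig, ax = plt.subplots()
ax.plot(salYear.index, salYear.values, marker='o')
ax.set_xlabel('years')
ax.set_ylabel('Salaries (BasePay)')
ax.set_title('Salaries changed over year')

# One tick per year, printed without a decimal point
ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(1))
formatter = matplotlib.ticker.StrMethodFormatter('{x:.0f}')
ax.xaxis.set_major_formatter(formatter)

plt.show()
```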


OUTPUT SCREENSHOTS

Screenshot 1: First five rows of the data file (CSV)

Screenshot 2: Output of info() for the data in the CSV file


Screenshot 3: The name of the highest paid person (including benefits)

Screenshot 4: Line graph of how salaries changed over the years


CHAPTER 7: CONCLUSION

In this implant training we learnt the basic syntax of Python, some data structures, and a few
Python libraries for data analysis. We did several exercises to test our learning and deepen
our understanding of Python. Finally, we completed a data analysis project that let us apply
what we had learned.


CHAPTER 8: REFERENCE

1. https://www.anaconda.com/
2. https://www.geeksforgeeks.org/
3. https://www.w3schools.com/
4. https://www.kaggle.com/datasets/kaggle/sf-salaries
5. https://www.tutorialspoint.com/
