MC4103 Python Programming - Unit-II
Lists, Tuples, Sets, Strings, Dictionary, Modules: Module Loading and Execution – Packages –
Making Your Own Module – The Python Standard Libraries.
Python - Lists
The most basic data structure in Python is the sequence. Each element of a sequence is assigned
a number - its position or index. The first index is zero, the second index is one, and so forth.
Python has several built-in sequence types, but the most common ones are lists and tuples, which
are covered in this unit.
There are certain things you can do with all sequence types. These operations include indexing,
slicing, adding, multiplying, and checking for membership. In addition, Python has built-in
functions for finding the length of a sequence and for finding its largest and smallest elements.
Python Lists
The list is the most versatile datatype available in Python. It is written as a sequence of comma-
separated values (items) between square brackets. An important property of a list is that its items
need not be of the same type.
Creating a list is as simple as putting different comma-separated values between square brackets.
For example −
Similar to string indices, list indices start at 0, and lists can be sliced, concatenated and so on.
To access values in lists, use square brackets with an index to obtain the value at that position,
or with a slice to obtain a range of values. For example −
#!/usr/bin/python
list1[0]: physics
list2[1:5]: [2, 3, 4, 5]
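The listing that produced the output above is missing from these notes. A minimal sketch of what it might look like follows; the contents of list1 and list2 are assumed sample values, chosen only so that the printed lines match.

```python
# Sample lists (assumed values, chosen to reproduce the output shown above)
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]

first_item = list1[0]   # indexing starts at 0
middle = list2[1:5]     # slice: items at indices 1, 2, 3 and 4

print("list1[0]:", first_item)
print("list2[1:5]:", middle)
```

Note that the slice stops before index 5, so four items are returned.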
Updating Lists
You can update single or multiple elements of lists by giving the slice on the left-hand side of the
assignment operator, and you can add to elements in a list with the append() method. For
example −
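As a small illustration of the update operations just described, the sketch below replaces one element, replaces a slice, and appends an item. The list name and values are assumptions made for the example.

```python
numbers = [10, 20, 30, 40, 50]   # assumed sample data

numbers[2] = 99                  # replace a single element
numbers[0:2] = [1, 2, 3]         # replace a slice; the list length may change
numbers.append(60)               # add one element at the end

print(numbers)                   # [1, 2, 3, 99, 40, 50, 60]
```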
To remove a list element, you can use either the del statement if you know exactly which
element(s) you are deleting (that is, their index) or the remove() method if you know only the
value. For example −
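The two ways of deleting can be sketched side by side. The list contents here are assumed sample values.

```python
subjects = ['physics', 'chemistry', 1997, 2000]   # assumed sample data

del subjects[2]                # delete by position: removes 1997
subjects.remove('chemistry')   # delete by value: removes the first match

print(subjects)                # ['physics', 2000]
```

remove() raises a ValueError when the value is not in the list, so it is worth checking membership with in first if the value may be absent.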
In fact, lists respond to all of the general sequence operations we used on strings in the prior
chapter.
Because lists are sequences, indexing and slicing work the same way for lists as they do for
strings.
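The general sequence operations mentioned above (length, concatenation, repetition, membership, indexing and slicing) might be exercised on lists like this. All names and values are illustrative assumptions.

```python
a = [1, 2, 3]
b = [4, 5, 6]

length = len(a)                      # number of items
joined = a + b                       # concatenation builds a new list
repeated = ['Hi!'] * 3               # repetition
found = 3 in a                       # membership test
largest, smallest = max(b), min(b)   # largest and smallest elements
last = b[-1]                         # negative indices count from the right
tail = joined[1:]                    # slice from index 1 to the end

print(length, joined, repeated, found, largest, smallest, last, tail)
```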
A tuple is a collection of objects which is ordered and immutable. Tuples are sequences, just like
lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and
that tuples use parentheses, whereas lists use square brackets.
Creating a tuple is as simple as writing comma-separated values. Optionally, you can put
these comma-separated values between parentheses. For example, the empty tuple is written as two
parentheses containing nothing −
tup1 = ()
To write a tuple containing a single value you have to include a comma, even though there is
only one value −
tup1 = (50,)
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.
To access values in a tuple, use square brackets with an index to obtain the value at that
position, or with a slice to obtain a range of values. For example −
#!/usr/bin/python
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)
Updating Tuples
Tuples are immutable which means you cannot update or change the values of tuple elements.
You are able to take portions of existing tuples to create new tuples as the following example
demonstrates −
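A short sketch of both points: assigning to a tuple element fails, while combining existing tuples into a new one works. The tuple names and values are assumptions for illustration.

```python
tup1 = (12, 34.56)        # assumed sample data
tup2 = ('abc', 'xyz')

try:
    tup1[0] = 100         # not allowed: tuples are immutable
except TypeError as err:
    error_message = str(err)

tup3 = tup1 + tup2        # allowed: builds a brand-new tuple

print(error_message)
print(tup3)               # (12, 34.56, 'abc', 'xyz')
```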
Removing individual tuple elements is not possible. There is, of course, nothing wrong with
putting together another tuple with the undesired elements discarded.
To explicitly remove an entire tuple, just use the del statement. For example −
This produces the following result. Note that an exception is raised, because after del tup the
tuple no longer exists −
Tuples respond to the + and * operators much like strings; they mean concatenation and
repetition here too, except that the result is a new tuple, not a string.
In fact, tuples respond to all of the general sequence operations we used on strings in the prior
chapter −
Because tuples are sequences, indexing and slicing work the same way for tuples as they do for
strings. Assuming following input −
No Enclosing Delimiters
Any set of multiple objects, comma-separated, written without identifying symbols, i.e., brackets
for lists, parentheses for tuples, etc., default to tuples, as indicated in these short examples −
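The rule that bare comma-separated values form a tuple can be checked with a few lines. The variable names are illustrative assumptions.

```python
point = 1, 2       # no parentheses, still a tuple
x, y = point       # the same syntax on the left unpacks the tuple
single = 50,       # a trailing comma makes a one-element tuple

print(type(point).__name__, point)   # tuple (1, 2)
print(x, y)
print(single)
```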
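Mathematically, a set is a collection of items not in any particular order. A Python set follows
this mathematical definition with these additional conditions: the elements of a set are not
duplicated; the elements must be immutable, although the set itself can be changed; and no index
is attached to any element, so sets do not support indexing or slicing.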
Set Operations
Sets in Python are typically used for mathematical operations like union, intersection,
difference and complement. We can create a set, access its elements and carry out these
mathematical operations as shown below.
Creating a set
A set is created by using the set() function or placing all the elements within a pair of curly
braces.
Example
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat","Sun"])
Months={"Jan","Feb","Mar"}
Dates={21,22,17}
print(Days)
print(Months)
print(Dates)
Output
When the above code is executed, it produces the following result. Please note how the order of
the elements has changed in the result.
Example
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat","Sun"])
for d in Days:
    print(d)
Output
Wed
Sun
Fri
Tue
Mon
Thu
Sat
We can add elements to a set by using add() method. Again as discussed there is no specific
index attached to the newly added element.
Example
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat"])
Days.add("Sun")
print(Days)
Output
We can remove elements from a set by using the discard() method. If the element is not in the
set, discard() does nothing and raises no error, as in the example below.
Example
Days=set(["Mon","Tue","Wed","Thu","Fri","Sat"])
Days.discard("Sun")
print(Days)
Output
Union of Sets
The union operation on two sets produces a new set containing all the distinct elements from
both the sets. In the below example the element “Wed” is present in both the sets.
Example
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA|DaysB
print(AllDays)
Output
When the above code is executed, it produces the following result. Please note the result has only
one “Wed”.
Intersection of Sets
The intersection operation on two sets produces a new set containing only the common elements
from both the sets. In the below example the element “Wed” is present in both the sets.
Example
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA & DaysB
print(AllDays)
Output
When the above code is executed, it produces the following result. Only “Wed” appears, because
it is the one element common to both sets.
{'Wed'}
Difference of Sets
The difference operation on two sets produces a new set containing only the elements from the
first set and none from the second set. In the below example the element “Wed” is present in
both the sets so it will not be found in the result set.
Example
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Wed","Thu","Fri","Sat","Sun"])
AllDays = DaysA - DaysB
print(AllDays)
Output
When the above code is executed, it produces the following result. The result contains no “Wed”,
because that element also belongs to the second set. The order of the printed elements may vary.
{'Mon', 'Tue'}
Compare Sets
We can check if a given set is a subset or superset of another set. The result is True or False
depending on the elements present in the sets.
Example
DaysA = set(["Mon","Tue","Wed"])
DaysB = set(["Mon","Tue","Wed","Thu","Fri","Sat","Sun"])
SubsetRes = DaysA <= DaysB
SupersetRes = DaysB >= DaysA
print(SubsetRes)
print(SupersetRes)
Output
True
True
Python - Strings
Strings are amongst the most popular types in Python. We can create them simply by enclosing
characters in quotes. Python treats single quotes the same as double quotes. Creating strings is as
simple as assigning a value to a variable. For example −
Python does not support a character type; these are treated as strings of length one, thus also
considered a substring.
To access substrings, use the square brackets for slicing along with the index or indices to obtain
your substring. For example −
#!/usr/bin/python
var1[0]: H
var2[1:5]: ytho
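The listing behind the output above is missing from these notes. A minimal sketch that would print the same two lines is shown here; the string values are assumptions chosen to match the output.

```python
# Assumed sample strings, chosen so the output matches the lines above
var1 = 'Hello World!'
var2 = "Python Programming"

print("var1[0]:", var1[0])       # first character
print("var2[1:5]:", var2[1:5])   # characters at indices 1 to 4
```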
Updating Strings
You can "update" an existing string by (re)assigning a variable to another string. The new value
can be related to its previous value or to a completely different string altogether. For example −
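The word "update" is in quotes because strings are immutable: the variable is rebound to a new string and the old one is left untouched. A minimal sketch, with assumed sample values:

```python
var1 = 'Hello World!'             # assumed sample data

updated = var1[:6] + 'Python'     # build a new string from a slice of the old one

try:
    var1[0] = 'J'                 # in-place change is not allowed
except TypeError as err:
    error_message = str(err)

print("Updated String :", updated)   # Updated String : Hello Python
print(error_message)
```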
Escape Characters
The following table lists escape or non-printable characters that can be represented with
backslash notation. An escape character is interpreted in single-quoted as well as double-quoted
strings.
Backslash notation    Hexadecimal character    Description
\a                    0x07                     Bell or alert
\b                    0x08                     Backspace
\cx                                            Control-x
\C-x                                           Control-x
\e                    0x1b                     Escape
\f                    0x0c                     Formfeed
\M-\C-x                                        Meta-Control-x
\n                    0x0a                     Newline
\nnn                                           Octal notation, where n is in the range 0-7
\r                    0x0d                     Carriage return
\s                    0x20                     Space
\t                    0x09                     Tab
\v                    0x0b                     Vertical tab
\x                                             Character x
\xnn                                           Hexadecimal notation, where n is in the range 0-9, a-f, or A-F
Note that \cx, \C-x, \M-\C-x, \e and \s are not escape sequences in Python string literals;
Python leaves an unrecognized sequence such as \s in the string as a backslash followed by the
character.
Assume string variable a holds 'Hello' and variable b holds 'Python', then −
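The operator table that belongs here did not survive in these notes. As a hedged stand-in, the sketch below applies the usual string operators to the two variables named above; the result names are assumptions for illustration.

```python
a = 'Hello'
b = 'Python'

concatenated = a + b      # 'HelloPython'
repeated = a * 2          # 'HelloHello'
char = a[1]               # 'e'
piece = a[1:4]            # 'ell'
contains_h = 'H' in a     # True
lacks_m = 'M' not in a    # True

print(concatenated, repeated, char, piece, contains_h, lacks_m)
```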
One of Python's most convenient features is the string format operator %. This operator is unique
to strings and makes up for the lack of functions from C's printf() family. Following is a
simple example −
Here is the list of complete set of symbols which can be used along with % −
Other supported symbols and functionality are listed in the following table −
Symbol    Functionality
*         argument specifies width or precision
-         left justification
+         display the sign
<sp>      leave a blank space before a positive number
#         add the octal leading zero ('0') or hexadecimal leading '0x' or '0X', depending on whether 'x' or 'X' were used
0         pad from left with zeros (instead of spaces)
%         '%%' leaves you with a single literal '%'
(var)     mapping variable (dictionary arguments)
m.n       m is the minimum total width and n is the number of digits to display after the decimal point (if applicable)
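To make the % operator and its flags concrete, here is a small sketch. The values being formatted are assumptions chosen for illustration; each comment shows the string that results.

```python
message = "My name is %s and weight is %d kg!" % ('Zara', 21)

padded = "%05d" % 42          # '00042'    0 flag: pad with zeros
price = "%8.2f" % 3.14159     # '    3.14' m.n: width 8, 2 decimals
left = "%-6s|" % 'ab'         # 'ab    |'  - flag: left justification
signed = "%+d" % 7            # '+7'       + flag: always show the sign
star = "%*d" % (5, 42)        # '   42'    * takes the width from the arguments
hexed = "%#x" % 255           # '0xff'     # flag: add the 0x prefix
mapping = "%(name)s is %(age)d" % {'name': 'Zara', 'age': 7}
percent = "%d%%" % 50         # '50%'      %% gives a literal percent sign

print(message)
print(padded, price, left, signed, star, hexed, mapping, percent)
```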
Triple Quotes
Python's triple quotes allow strings to span multiple lines, including verbatim NEWLINEs, TABs,
and any other special characters.
The syntax for triple quotes consists of three consecutive single or double quotes.
When the above code is executed, it produces the following result. Note how every single special
character has been converted to its printed form, right down to the last NEWLINE at the end of
the string between the "up." and closing triple quotes. Also note that NEWLINEs occur either
with an explicit carriage return at the end of a line or its escape code (\n) −
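The original listing and its output are missing here. The sketch below is an assumed stand-in that shows the same behaviour: line breaks typed inside the triple quotes and the \n escape both become NEWLINE characters in the string.

```python
# Assumed sample text; \t and \n inside the literal are interpreted as escapes
para_str = """this is a long string that spans
several lines and contains a TAB ( \t ) and an escaped
NEWLINE ( \n ) as well.
"""

newline_count = para_str.count('\n')   # typed line breaks plus the \n escape
lines = para_str.splitlines()

print(para_str)
print(newline_count, len(lines))
```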
Raw strings do not treat the backslash as a special character at all. Every character you put into a
raw string stays the way you wrote it. First, an ordinary string, where \\ is an escape for a single
backslash −
#!/usr/bin/python
print('C:\\nowhere')
C:\nowhere
Now let's make use of a raw string by putting the expression in r'expression' as follows −
#!/usr/bin/python
print(r'C:\\nowhere')
C:\\nowhere
Unicode String
In Python 2, normal strings are stored internally as 8-bit byte strings, while Unicode strings
hold Unicode characters and must be marked explicitly. In Python 3, every str is already a
Unicode string. Unicode allows for a more varied set of characters, including special characters
from most languages in the world. The treatment of Unicode strings here is restricted to the
following −
#!/usr/bin/python
Hello, world!
As you can see, Unicode strings use the prefix u, just as raw strings use the prefix r.
2   center(width, fillchar)
    Returns a space-padded string with the original string centered to a total of width columns.
3   count(str, beg=0, end=len(string))
    Counts how many times str occurs in string, or in a substring of string if starting index beg and ending index end are given.
4   decode(encoding='UTF-8', errors='strict')
    Decodes the string using the codec registered for encoding. encoding defaults to the default string encoding.
5   encode(encoding='UTF-8', errors='strict')
    Returns encoded string version of string; on error, default is to raise a ValueError unless errors is given with 'ignore' or 'replace'.
6   endswith(suffix, beg=0, end=len(string))
    Determines if string, or a substring of string (if starting index beg and ending index end are given), ends with suffix; returns True if so and False otherwise.
7   expandtabs(tabsize=8)
    Expands tabs in string to multiple spaces; defaults to 8 spaces per tab if tabsize is not provided.
8   find(str, beg=0, end=len(string))
    Determines if str occurs in string, or in a substring of string if starting index beg and ending index end are given; returns the index if found and -1 otherwise.
9   index(str, beg=0, end=len(string))
    Same as find(), but raises an exception if str is not found.
10  isalnum()
    Returns True if string has at least 1 character and all characters are alphanumeric, and False otherwise.
11  isalpha()
    Returns True if string has at least 1 character and all characters are alphabetic, and False otherwise.
12  isdigit()
    Returns True if string contains only digits and False otherwise.
13  islower()
    Returns True if string has at least 1 cased character and all cased characters are in lowercase, and False otherwise.
14  isnumeric()
    Returns True if a unicode string contains only numeric characters and False otherwise.
15  isspace()
    Returns True if string contains only whitespace characters and False otherwise.
16  istitle()
    Returns True if string is properly "titlecased" and False otherwise.
17  isupper()
    Returns True if string has at least one cased character and all cased characters are in uppercase, and False otherwise.
18  join(seq)
    Merges (concatenates) the string representations of elements in sequence seq into a string, with separator string.
19  len(string)
    Returns the length of the string.
20  ljust(width[, fillchar])
    Returns a space-padded string with the original string left-justified to a total of width columns.
21  lower()
    Converts all uppercase letters in string to lowercase.
22  lstrip()
    Removes all leading whitespace in string.
23  maketrans()
    Returns a translation table to be used in the translate function.
24  max(str)
    Returns the max alphabetical character from the string str.
25  min(str)
    Returns the min alphabetical character from the string str.
26  replace(old, new[, max])
    Replaces all occurrences of old in string with new, or at most max occurrences if max is given.
27  rfind(str, beg=0, end=len(string))
    Same as find(), but searches backwards in string.
28  rindex(str, beg=0, end=len(string))
    Same as index(), but searches backwards in string.
29  rjust(width[, fillchar])
    Returns a space-padded string with the original string right-justified to a total of width columns.
30  rstrip()
    Removes all trailing whitespace of string.
31  split(str="", num=string.count(str))
    Splits string according to delimiter str (space if not provided) and returns a list of substrings; splits into at most num substrings if given.
32  splitlines(num=string.count('\n'))
    Splits string at all (or num) NEWLINEs and returns a list of each line with NEWLINEs removed.
33  startswith(str, beg=0, end=len(string))
    Determines if string, or a substring of string (if starting index beg and ending index end are given), starts with substring str; returns True if so and False otherwise.
34  strip([chars])
    Performs both lstrip() and rstrip() on string.
35  swapcase()
    Inverts case for all letters in string.
36  title()
    Returns the "titlecased" version of string, that is, all words begin with uppercase and the rest are lowercase.
37  translate(table, deletechars="")
    Translates string according to translation table str (256 chars), removing those in the del string.
38  upper()
    Converts lowercase letters in string to uppercase.
39  zfill(width)
    Returns original string left-padded with zeros to a total of width characters; intended for numbers, zfill() retains any sign given (less one zero).
40  isdecimal()
    Returns True if a unicode string contains only decimal characters and False otherwise.
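A few of the methods listed above can be tried out as follows. This is only a sketch: the sample string is an assumption, and each comment shows the value produced.

```python
s = "  hello world  "                       # assumed sample data

stripped = s.strip()                        # 'hello world'
titled = stripped.title()                   # 'Hello World'
count_o = stripped.count('o')               # 2
position = stripped.find('world')           # 6
missing = stripped.find('xyz')              # -1 when not found
parts = stripped.split()                    # ['hello', 'world']
joined = '-'.join(parts)                    # 'hello-world'
replaced = stripped.replace('l', 'L', 2)    # 'heLLo world' (at most 2 replacements)
centered = 'hi'.center(6, '*')              # '**hi**'
zero_padded = '-42'.zfill(5)                # '-0042' (the sign is kept)
starts = stripped.startswith('hello')       # True

print(stripped, titled, count_o, position, missing)
print(parts, joined, replaced, centered, zero_padded, starts)
```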
Python - Dictionary
Each key is separated from its value by a colon (:), the items are separated by commas, and the
whole thing is enclosed in curly braces. An empty dictionary without any items is written with
just two curly braces, like this: {}.
Keys are unique within a dictionary while values may not be. The values of a dictionary can be
of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
To access dictionary elements, you can use the familiar square brackets along with the key to
obtain its value. Following is a simple example −
#!/usr/bin/python
dict['Name']: Zara
dict['Age']: 7
If we attempt to access a data item with a key, which is not part of the dictionary, we get an error
as follows −
#!/usr/bin/python
dict['Alice']:
Traceback (most recent call last):
File "test.py", line 4, in <module>
print "dict['Alice']: ", dict['Alice'];
KeyError: 'Alice'
Updating Dictionary
You can update a dictionary by adding a new entry or a key-value pair, modifying an existing
entry, or deleting an existing entry as shown below in the simple example −
#!/usr/bin/python
dict['Age']: 8
dict['School']: DPS School
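The listing for this output is missing from the notes. A minimal sketch that would produce the same two values is given below; the dictionary contents are assumed, and the variable is named student rather than dict so that the built-in dict type is not shadowed.

```python
# Assumed sample data, chosen to match the output above
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

student['Age'] = 8                  # modify an existing entry
student['School'] = 'DPS School'    # add a new entry

print("student['Age']:", student['Age'])
print("student['School']:", student['School'])
```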
You can either remove individual dictionary elements or clear the entire contents of a dictionary.
You can also delete entire dictionary in a single operation.
To explicitly remove an entire dictionary, just use the del statement. Following is a simple
example −
This produces the following result. Note that an exception is raised because, after del dict, the
dictionary no longer exists −
dict['Age']:
Traceback (most recent call last):
File "test.py", line 8, in <module>
print "dict['Age']: ", dict['Age'];
TypeError: 'type' object is unsubscriptable
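The three kinds of deletion can be sketched as follows, using an assumed dictionary named student. The traceback above shows a TypeError rather than a NameError because the original example names its variable dict: once that variable is deleted, the name dict refers to the built-in type again.

```python
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}   # assumed sample data

del student['Name']            # remove the entry with key 'Name'
remaining = dict(student)      # copy kept only to show the intermediate state

student.clear()                # remove all entries; the empty dictionary still exists
is_empty = (student == {})

del student                    # remove the variable itself

try:
    student['Age']
except NameError as err:
    error_name = type(err).__name__

print(remaining, is_empty, error_name)
```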
Dictionary values have no restrictions. They can be any arbitrary Python object, either standard
objects or user-defined objects. However, the same is not true for the keys.
(a) More than one entry per key is not allowed, which means no duplicate key is allowed. When
duplicate keys are encountered during assignment, the last assignment wins. For example −
#!/usr/bin/python
dict['Name']: Manni
(b) Keys must be immutable, which means you can use strings, numbers or tuples as dictionary
keys, but something like ['key'] is not allowed. Following is a simple example −
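Both key rules can be demonstrated in a few lines. The names and values below are assumptions for illustration.

```python
# (a) Duplicate keys: the last assignment wins
person = {'Name': 'Zara', 'Age': 7, 'Name': 'Manni'}

# (b) Immutable keys: strings, numbers and tuples are accepted
valid = {'name': 'string key', 42: 'number key', ('x', 1): 'tuple key'}

try:
    invalid = {['Name']: 'Zara'}     # a list is mutable, so it cannot be a key
except TypeError as err:
    error_message = str(err)         # the message mentions an unhashable type

print("person['Name']:", person['Name'])
print(error_message)
```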
Description
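Python dictionary method cmp() compares two dictionaries based on keys and values. cmp() exists
only in Python 2; it was removed in Python 3, where dictionaries are compared for equality with ==.
Syntax
cmp(dict1, dict2)
Parameters
dict1 − This is the first dictionary to be compared with dict2.
dict2 − This is the second dictionary to be compared with dict1.
Return Value
This method returns 0 if both dictionaries are equal, -1 if dict1 < dict2 and 1 if dict1 > dict2.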
Example
#!/usr/bin/python
Return Value : -1
Return Value : 1
Return Value : 0
Python dictionary copy() Method
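Description
Python dictionary method copy() returns a shallow copy of the dictionary.
Syntax
dict.copy()
Parameters
NA
Return Value
This method returns a shallow copy of the dictionary.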
Example
Description
Python dictionary method items() returns the dict's (key, value) tuple pairs: a list in Python 2,
a view object in Python 3.
Syntax
dict.items()
Parameters
NA
Return Value
Example
Description
Python dictionary method keys() returns all the available keys in the dictionary: a list in
Python 2, a view object in Python 3.
Syntax
dict.keys()
Parameters
NA
Return Value
This method returns all the available keys in the dictionary (a list in Python 2, a view object in
Python 3).
Example
Description
Python dictionary method values() returns all the values available in a given dictionary: a list
in Python 2, a view object in Python 3.
Syntax
dict.values()
Parameters
NA
Return Value
This method returns all the values available in a given dictionary (a list in Python 2, a view
object in Python 3).
Example
Description
Python dictionary method update() adds dictionary dict2's key-value pairs to dict, overwriting
the values of keys that already exist. This function does not return anything.
Syntax
dict.update(dict2)
Parameters
Return Value
Example
#!/usr/bin/python
dict.update(dict2)
print("Value : %s" % dict)
Description
Python dictionary method fromkeys() creates a new dictionary with keys from seq and values set
to value.
Syntax
dict.fromkeys(seq[, value])
Parameters
seq − This is the sequence of values used as the dictionary keys.
value − This is optional; if provided, every key is set to this value.
Return Value
Example
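The example listings for the dictionary methods above are missing from these notes. The sketch below exercises copy(), keys(), values(), items(), update() and fromkeys() in Python 3; the dictionary contents are assumptions. In Python 3 the keys(), values() and items() results are view objects, so they are wrapped in list() here to get plain lists.

```python
info = {'Name': 'Zara', 'Age': 7}          # assumed sample data

snapshot = info.copy()                     # shallow copy, independent of later changes
key_list = list(info.keys())               # ['Name', 'Age']
value_list = list(info.values())           # ['Zara', 7]
item_list = list(info.items())             # [('Name', 'Zara'), ('Age', 7)]

result = info.update({'Sex': 'female'})    # changes info in place and returns None

seq = ('key1', 'key2', 'key3')
defaults = dict.fromkeys(seq)              # every value is None
tens = dict.fromkeys(seq, 10)              # every value is 10

print(snapshot, key_list, value_list, item_list)
print(info, result)
print(defaults, tens)
```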
A module allows you to logically organize your Python code. Grouping related code into a
module makes the code easier to understand and use. A module is a Python object with
arbitrarily named attributes that you can bind and reference.
Simply, a module is a file consisting of Python code. A module can define functions, classes and
variables. A module can also include runnable code.
Example
The Python code for a module named aname normally resides in a file named aname.py. Here's
an example of a simple module, support.py
You can use any Python source file as a module by executing an import statement in some other
Python source file. The import has the following syntax −
import module1[, module2[,... moduleN]]
When the interpreter encounters an import statement, it imports the module if the module is
present in the search path. A search path is a list of directories that the interpreter searches before
importing a module. For example, to import the module support.py, you need to put the
following command at the top of the script −
#!/usr/bin/python
Hello : Zara
A module is loaded only once, regardless of the number of times it is imported. This prevents the
module execution from happening over and over again if multiple imports occur.
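The support.py listing is missing from these notes. The sketch below is a self-contained stand-in: it writes an assumed support.py (with a hypothetical print_func function, chosen to match the "Hello : Zara" output) into a temporary directory, imports it twice, and shows that the module's top-level code runs only once.

```python
import os
import sys
import tempfile

# Assumed contents of support.py; print_func is a hypothetical name
module_source = '''
print("support module is being loaded")

def print_func(par):
    print("Hello :", par)
'''

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "support.py"), "w") as handle:
        handle.write(module_source)
    sys.path.insert(0, folder)      # make the folder part of the search path
    try:
        import support              # first import: the top-level code runs
        support.print_func("Zara")
        import support              # second import: nothing is executed again
        cached = sys.modules["support"] is support
    finally:
        sys.path.remove(folder)

print(cached)
```

The message "support module is being loaded" appears only once, because the second import finds the module already stored in sys.modules.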
Python's from statement lets you import specific attributes from a module into the current
namespace. The from...import has the following syntax −
from modname import name1[, name2[, ... nameN]]
For example, to import the function fibonacci from the module fib, use the following statement −
from fib import fibonacci
This statement does not import the entire module fib into the current namespace; it just
introduces the item fibonacci from the module fib into the global symbol table of the importing
module.
It is also possible to import all names from a module into the current namespace by using the
following import statement −
from modname import *
This provides an easy way to import all the items from a module into the current namespace;
however, this statement should be used sparingly.
Locating Modules
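When you import a module, the Python interpreter searches for the module in the following
sequence −
The current directory.
If the module isn't found, each directory in the shell variable PYTHONPATH.
If all else fails, the installation-dependent default path.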
The module search path is stored in the system module sys as the sys.path variable. The sys.path
variable contains the current directory, PYTHONPATH, and the installation-dependent default.
Variables are names (identifiers) that map to objects. A namespace is a dictionary of variable
names (keys) and their corresponding objects (values).
A Python statement can access variables in a local namespace and in the global namespace. If a
local and a global variable have the same name, the local variable shadows the global variable.
Each function has its own local namespace. Class methods follow the same scoping rule as
ordinary functions.
Python makes educated guesses on whether variables are local or global. It assumes that any
variable assigned a value in a function is local.
Therefore, in order to assign a value to a global variable within a function, you must first use the
global statement.
The statement global VarName tells Python that VarName is a global variable. Python stops
searching the local namespace for the variable.
For example, we define a variable Money in the global namespace. Within the function AddMoney,
we assign Money a value, so Python treats Money as a local variable. However, we
access the value of the local variable Money before setting it, so an UnboundLocalError is the
result. Uncommenting the global statement fixes the problem.
#!/usr/bin/python
Money = 2000
def AddMoney():
    # Uncomment the following line to fix the code:
    # global Money
    Money = Money + 1
print(Money)
AddMoney()
print(Money)
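The dir() built-in function returns a sorted list of strings containing the names defined by a
module. The list contains the names of all the modules, variables and functions that are defined
in a module. Following is a simple example −
#!/usr/bin/python
# Import the built-in module math
import math
content = dir(math)
print(content)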
Here, the special string variable __name__ is the module's name, and __file__ is the filename
from which the module was loaded.
The globals() and locals() functions can be used to return the names in the global and local
namespaces depending on the location from where they are called.
If locals() is called from within a function, it will return all the names that can be accessed
locally from that function.
If globals() is called from within a function, it will return all the names that can be accessed
globally from that function.
The return type of both these functions is dictionary. Therefore, names can be extracted using the
keys() function.
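A minimal sketch of both functions follows. The names Money, show_names and rate are assumptions made for the example.

```python
Money = 2000                       # a global name (assumed sample data)

def show_names():
    rate = 5                       # a local name
    local_names = sorted(locals().keys())   # names local to this call so far
    sees_global = 'Money' in globals()      # globals() is a dictionary too
    return local_names, sees_global

local_names, sees_global = show_names()
print(local_names)    # ['rate']
print(sees_global)    # True
```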
When the module is imported into a script, the code in the top-level portion of a module is
executed only once.
Therefore, if you want to re-execute the top-level code in a module, you can use the reload()
function, which imports a previously imported module again. reload() is a built-in function only
in Python 2; in Python 3 it is available as importlib.reload(). The syntax of the
reload() function is this −
reload(module_name)
Here, module_name is the name of the module you want to reload and not the string containing
the module name. For example, to reload hello module, do the following −
reload(hello)
Packages in Python
A package is a hierarchical file directory structure that defines a single Python application
environment that consists of modules and subpackages and sub-subpackages, and so on.
Consider a file Pots.py available in the Phone directory. This file has the following source code −
#!/usr/bin/python
def Pots():
    print("I'm Pots Phone")
Similarly, there are two more files, each defining a function named after its file −
Phone/Isdn.py, which defines the function Isdn()
Phone/G3.py, which defines the function G3()
Now create one more file, __init__.py, in the Phone directory −
Phone/__init__.py
To make all of your functions available when you've imported Phone, you need to put explicit
import statements in __init__.py as follows −
from Pots import Pots
from Isdn import Isdn
from G3 import G3
In Python 3, write these as relative imports, for example from .Pots import Pots.
After you add these lines to __init__.py, all of these functions are available when you import
the Phone package.
#!/usr/bin/python
# Now import your Phone Package.
import Phone
Phone.Pots()
Phone.Isdn()
Phone.G3()
In the above example, each file holds a single function, but you can keep
multiple functions in your files. You can also define different Python classes in those files and
then create your packages out of those classes.
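The Phone package can be tried without creating files by hand. The sketch below builds it in a temporary directory and imports it under Python 3. Everything except the Pots message is an assumption: the bodies of Isdn.py and G3.py are generated from the same template, the functions return their message instead of printing it, and __init__.py uses Python 3 relative imports.

```python
import os
import sys
import tempfile

# One-function modules; the Isdn and G3 messages are assumed by analogy with Pots
template = 'def {0}():\n    return "I\'m {0} Phone"\n'
files = {name + ".py": template.format(name) for name in ("Pots", "Isdn", "G3")}

# __init__.py re-exports the three functions (relative imports, Python 3 style)
files["__init__.py"] = (
    "from .Pots import Pots\n"
    "from .Isdn import Isdn\n"
    "from .G3 import G3\n"
)

with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, "Phone")
    os.mkdir(package_dir)
    for filename, source in files.items():
        with open(os.path.join(package_dir, filename), "w") as handle:
            handle.write(source)
    sys.path.insert(0, root)        # the parent of Phone must be on the search path
    try:
        import Phone
        messages = [Phone.Pots(), Phone.Isdn(), Phone.G3()]
    finally:
        sys.path.remove(root)

print(messages)
```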
Packages help us to structure packages and modules in an organized hierarchy. Let's see how to
create packages in Python.
Creating Packages
A directory needs an __init__.py file to tell Python that the directory is a package. Whenever you
want to create a package, include an __init__.py file in the directory. You can write code inside
it or leave it blank as you wish; Python accepts either.
Create a directory and include a __init__.py file in it to tell Python that the current
directory is a package.
Include other sub-packages or files you want.
Next, access them with the valid import statements.
Package (university)
    __init__.py
    student.py
    faculty.py
Go to any directory on your computer and create the above folder structure. After
creating it, include the following code in the respective files.
Example
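# student.py
class Student:
    # __init__ is missing from the source; reconstructed here by analogy with Faculty
    def __init__(self, student):
        self.name = student['name']
        self.gender = student['gender']
        self.year = student['year']
    def get_student_details(self):
        return f"Name: {self.name}\nGender: {self.gender}\nYear: {self.year}"
# faculty.py
class Faculty:
    def __init__(self, faculty):
        self.name = faculty['name']
        self.subject = faculty['subject']
    def get_faculty_details(self):
        return f"Name: {self.name}\nSubject: {self.subject}"
We have the above in the student.py and faculty.py files. Let's create another file to access
those classes. Inside the package directory, create a file named testing.py and
include the following code.
Example
# testing.py
# importing the Student and Faculty classes from respective files
from student import Student
from faculty import Faculty
# sample data chosen to match the output below (the original lines are missing)
student = Student({'name': 'John', 'gender': 'Male', 'year': 3})
faculty = Faculty({'name': 'Emma', 'subject': 'Programming'})
print(student.get_student_details())
print(faculty.get_faculty_details())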
If you run the testing.py file, then you will get the following result.
Output
Name: John
Gender: Male
Year: 3
Name: Emma
Subject: Programming
We have seen how to create and to access a package in Python. And this is a simple package.
There might be plenty of sub-packages and files inside a package. Let's see how to access
subpackage modules.
Create a directory with the following structure −
Package (university)
    __init__.py
    Subpackage (student)
        __init__.py
        main.py
        ...
    testing.py
Copy the above student code and place it here. Now, let's see how to access it in the testing.py
file. Add the following in the testing.py file.
Example
# testing.py
from student.main import Student
If you run the testing.py file, then you will get the following result.
Output
Name: John
Gender: Male
Year: 3
We have accessed the Student class from the main.py file inside the subpackage student using
a dot (.). You can go as deep as the package structure requires.
Python Pandas - Introduction
In 2008, developer Wes McKinney started developing pandas when he needed a high-performance,
flexible tool for data analysis.
Prior to Pandas, Python was mainly used for data munging and preparation. It contributed very
little to data analysis itself. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of the origin of
the data: load, prepare, manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial fields,
including finance, economics, statistics and analytics.
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
The standard Python distribution doesn't come bundled with the Pandas module. A lightweight
way to get it is to install Pandas using the popular Python package installer, pip −
pip install pandas
If you install the Anaconda Python distribution, Pandas is installed by default on both Windows
and Linux.
On Linux, the package managers of the respective distributions can also be used to install one or
more packages in the SciPy stack.
Numeric, the ancestor of NumPy, was developed by Jim Hugunin. Another package, Numarray,
was also developed, with some additional functionality. In 2005, Travis Oliphant created the
NumPy package by incorporating the features of Numarray into the Numeric package. There are
many contributors to this open source project.
NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib
(plotting library). This combination is widely used as a replacement for MATLAB, a popular
platform for technical computing. Python, the language behind this MATLAB alternative, is now
seen as a more modern and complete programming language.
This section introduces Scikit-Learn (Sklearn): its origin, the community and contributors
responsible for its development and maintenance, its prerequisites, its installation and its
features.
Scikit-learn (Sklearn) is a widely used and robust library for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, via a consistent interface in
Python. This library, which is largely written in Python, is built upon NumPy, SciPy and
Matplotlib.
Origin of Scikit-Learn
It was originally called scikits.learn and was initially developed by David Cournapeau as a
Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael Varoquaux,
Alexandre Gramfort, and Vincent Michel, from INRIA (French Institute for Research in
Computer Science and Automation), took the project to another level and made the first public
release (v0.1 beta) on 1st Feb. 2010.
Scikit-learn is a community effort and anyone can contribute to it. The project is hosted on
https://github.com/scikit-learn/scikit-learn. The following people are among the core contributors
to Sklearn’s development and maintenance −
Joris Van den Bossche (Data Scientist)
Thomas J Fan (Software Developer)
Alexandre Gramfort (Machine Learning Researcher)
Olivier Grisel (Machine Learning Expert)
Nicolas Hug (Associate Research Scientist)
Andreas Mueller (Machine Learning Scientist)
Hanmin Qin (Software Engineer)
Adrin Jalali (Open Source Developer)
Nelle Varoquaux (Data Science Researcher)
Roman Yurchak (Data Scientist)
Various organisations like Booking.com, JP Morgan, Evernote, Inria, AWeber, Spotify and
many more are using Sklearn.
Prerequisites
Python (>=3.5)
NumPy (>= 1.11.0)
SciPy (>= 0.17.0)
Joblib (>= 0.11)
Matplotlib (>= 1.5.1) is required for Sklearn plotting capabilities.
Pandas (>= 0.18.0) is required for some of the scikit-learn examples using data structure
and analysis.
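Before installing, it can help to see which of these prerequisites are already present. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_versions is our own, and the sketch only reports versions, it does not compare them against the minimums listed above.

```python
from importlib import metadata

# Distribution names as published on PyPI, mirroring the list above.
PREREQUISITES = ["numpy", "scipy", "joblib", "matplotlib", "pandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PREREQUISITES).items():
    print(f"{name}: {version or 'not installed'}")
```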
Installation
If you have already installed NumPy and SciPy, the following are the two easiest ways to install
scikit-learn −
Using pip
On the other hand, if NumPy and SciPy are not yet installed on your Python workstation, you
can install them using either pip or conda.
Another option is to use a Python distribution such as Canopy or Anaconda,
because both ship with a recent version of scikit-learn.
Features
Rather than focusing on loading, manipulating and summarising data, the Scikit-learn library
focuses on modeling the data. Some of the most popular groups of models provided by Sklearn
are as follows −
Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like
Linear Regression, Support Vector Machine (SVM), Decision Tree etc., are part of
scikit-learn.
Unsupervised Learning algorithms − It also has the popular
unsupervised learning algorithms, from clustering, factor analysis and PCA (Principal Component
Analysis) to unsupervised neural networks.
Cross Validation − It is used to estimate the accuracy of supervised models on unseen data.
Dimensionality Reduction − It is used for reducing the number of attributes in data, which can
then be used for summarisation, visualisation and feature selection.
Ensemble methods − As the name suggests, these are used for combining the predictions of multiple
supervised models.
Feature extraction − It is used to extract features from data, defining the attributes in image
and text data.
Open Source − It is an open source library and is also commercially usable under the BSD license.
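To make a few of these features concrete, here is a minimal sketch, assuming scikit-learn is installed. It combines dimensionality reduction (PCA), a supervised model (SVM) and cross validation on the iris data set bundled with the library; the choice of two components and five folds is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Load a small bundled data set: 150 samples, 4 attributes each.
X, y = load_iris(return_X_y=True)

# Dimensionality reduction (4 attributes down to 2) followed by a supervised model.
model = make_pipeline(PCA(n_components=2), SVC())

# 5-fold cross validation: each fold is scored on data the model was not fitted on.
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print(scores.mean())
```

Every estimator exposes the same fit and predict style of interface, which is why the two steps can be chained in a single pipeline and passed to cross_val_score as one model.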
Python Libraries – Python Standard Library & List of Important Libraries
After Modules and Python Packages, we shift our discussion to Python Libraries.
In this Python library tutorial, we will discuss the Python Standard Library and several
third-party libraries available for Python: Matplotlib, SciPy, NumPy, etc.
We know that a module is a file with some Python code, and a package is a directory for
sub-packages and modules. But the line between a package and a Python library is quite blurred.
A Python library is a reusable chunk of code that you may want to include in your programs or
projects.
Unlike in languages like C++ or C, the term 'library' does not have a specific technical meaning in
Python. Here, a ‘library’ loosely describes a collection of core modules.
Essentially, then, a library is a collection of modules. In the packaging sense, a package is a
library that can be installed using a package manager such as pip (the Python counterpart of
rubygems or npm).
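The difference between a module and a package can be sketched with the standard library alone. The example below writes a one-file module and a small package into a temporary directory and imports both; the names demo_greet and demo_shapes are hypothetical and chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # A module is a single file of Python code.
    (root / "demo_greet.py").write_text(
        "def hello(name):\n    return f'Hello, {name}!'\n"
    )

    # A package is a directory (here with an __init__.py) that holds modules.
    pkg = root / "demo_shapes"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "square.py").write_text("def area(side):\n    return side * side\n")

    # Make the temporary directory importable, then import both.
    sys.path.insert(0, str(root))
    try:
        greet = importlib.import_module("demo_greet")
        square = importlib.import_module("demo_shapes.square")
        message = greet.hello("world")
        result = square.area(3)
    finally:
        sys.path.remove(str(root))

print(message)
print(result)
```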
The Python Standard Library is the collection of modules that comes bundled with the core
Python distribution. We mentioned this when we began with an
introduction.
In CPython, parts of it are written in C and the rest in Python, and it handles functionality like
I/O along with other core modules. All this functionality together makes Python the language it is.
More than 200 core modules sit at the heart of the standard library.
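The set of standard library modules can be inspected from Python itself. This is a minimal sketch that relies on sys.stdlib_module_names, which exists only on Python 3.10 and later; on older versions the sketch falls back to an empty set, and the exact count varies between releases.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set otherwise.
names = getattr(sys, "stdlib_module_names", frozenset())

# Keep only public top-level names (those not starting with an underscore).
public = sorted(name for name in names if not name.startswith("_"))

print(len(public))
print(public[:5])
```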
But in addition to this library, you can also access a large and growing collection of
packages from the Python Package Index (PyPI). We mentioned it earlier.
Next, we will look at a list of twenty Python libraries that will take you places in your journey
with Python.
1. Matplotlib
Matplotlib is a numerical plotting library that helps with data analysis. We talked about it in
Python for Data Science.
2. Pandas
3. Requests
Requests is a Python Library that lets you send HTTP/1.1 requests, add headers, form data,
multipart files, and parameters with simple Python dictionaries.
It also lets you access the response data in the same way.
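As a minimal sketch of the dictionary-based interface, assuming the requests package is installed, the example below builds a request from plain dictionaries and prepares it without sending anything over the network. The URL and values are hypothetical.

```python
import requests

# Hypothetical endpoint and values, used only for illustration; nothing is sent.
request = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "python", "page": 2},       # query parameters as a dictionary
    headers={"Accept": "application/json"},  # headers as a dictionary
)
prepared = request.prepare()

print(prepared.method)
print(prepared.url)
print(prepared.headers["Accept"])
```

Sending the request, for example with requests.get, returns a response object whose headers can be read through a dictionary-like interface in the same way.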
4. NumPy
5. SQLAlchemy
6. BeautifulSoup
Although it may be a bit slow, BeautifulSoup is an excellent XML- and HTML-parsing library for
beginners.
7. Pyglet
Pyglet is a windowing and multimedia library for developing games. It also finds use in
developing other visually rich applications for Mac OS X, Windows, and Linux.
Pyglet has been used to build simple Minecraft-style clones in Python; Minecraft itself is
written in Java.
8. SciPy
Next up is SciPy, one of the libraries we have been talking so much about. It has a number of
user-friendly and efficient numerical routines.
9. Scrapy
If your goal is fast, high-level screen scraping and web crawling, go for Scrapy.
You can use it for purposes from data mining to monitoring and automated testing.
10. PyGame
PyGame provides an extremely easy interface to the Simple DirectMedia Layer (SDL)
platform-independent graphics, audio, and input libraries.
11. Twisted
An event-driven networking engine, Twisted is written in Python and licensed under the
open-source MIT license.
12. Pillow
Pillow is a friendly fork of PIL (Python Imaging Library), but is more user-friendly.
13. pywin32
As the name suggests, this provides useful methods and classes for interacting with Windows.
14. wxPython
15. IPython
The IPython library has an architecture that facilitates parallel and distributed computing.
With it, you can develop, execute, debug, and monitor parallel applications.
16. Nose
Nose provides an alternative test discovery and running process for unittest. It aims to mimic
py.test’s behavior as much as it can.
17. Flask
A web framework, Flask is built with a small core and many extensions.
18. SymPy
With very simple and comprehensible code that is easily extensible, SymPy is a full-fledged
Computer Algebra System (CAS).
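What a computer algebra system does is easiest to see by example. The following is a minimal sketch, assuming SymPy is installed; it manipulates expressions symbolically rather than numerically.

```python
import sympy as sp

x = sp.symbols("x")

# Symbolic expansion of a product.
expanded = sp.expand((x + 1) ** 2)

# Symbolic differentiation with respect to x.
derivative = sp.diff(sp.sin(x) * x, x)

# Exact solutions of x**2 - 4 = 0.
roots = sp.solve(x**2 - 4, x)

print(expanded)
print(derivative)
print(roots)
```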
19. Fabric
Along with being a library, Fabric is a command-line tool for streamlining the use of SSH for
application deployment or systems administration tasks.
With it, you can execute local or remote shell commands, upload/download files, and even
prompt the running user for input, or abort execution.
20. PyGTK
PyGTK lets you easily create programs with a GUI (Graphical User Interface) with Python.