Python


Introduction:

Python was originally designed as a general-purpose language, but it has become a popular choice for data science thanks to:

- Strong community support
- Dedicated libraries for data analysis and predictive modeling

Table of Contents

1. Basics of Python for Data Analysis

2. Python libraries and data structures

3. Exploratory analysis in Python using Pandas

4. Data Munging in Python using Pandas

5. Building a Predictive Model in Python


1. Basics of Python for Data Analysis
- Open source – free to install
- Awesome online community
- Very easy to learn
- Can become a common language for data science and for producing web-based analytics products

Drawback: it is an interpreted language rather than a compiled one, so it might take up more CPU time.

Python 2.7 vs 3.6/3.7


- This is one of the most debated topics in Python.
- You will invariably cross paths with it, especially if you are a beginner. There is no right or wrong choice here; it depends entirely on the situation and your needs.

Why Python 2.7?


- Community support
- Python 2 was released in late 2000 and has been in use for more than 17 years.
- Plethora of third-party libraries! Though many libraries now provide 3.x support, a large number of modules still work only on 2.x versions. If you plan to use Python for specific applications like web development with a high reliance on external modules, you might be better off with 2.7.
- Some features of the 3.x versions have been back-ported and work with the 2.7 version.

Why Python 3.6/3.7?

- Cleaner and faster! Python developers have fixed some inherent glitches and minor drawbacks in order to set a stronger foundation for the future. These might not be very relevant initially, but will matter eventually.
- It is the future! 2.7 is the last release of the 2.x family, and eventually everyone will have to shift to the 3.x versions. Python 3 has released stable versions for the past 5 years and will continue to do so.
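As a minimal sketch of the kind of differences involved, here are two changes that most often trip up beginners (the examples are illustrative; run them under Python 3):

```python
# Two beginner-facing differences between Python 2 and Python 3.

# 1. print: a statement in Python 2 (print "Hello"), a function in Python 3.
print("Hello")

# 2. Division: in Python 2, 7 / 2 gives 3 (integer division);
#    in Python 3, / is true division and // is floor division.
print(7 / 2)   # 3.5 under Python 3
print(7 // 2)  # 3 under both versions
```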

How to install Python?

There are 2 approaches to install Python:


You can download Python directly from its project site
(https://www.python.org/download/releases/2.7/) and install the individual components and libraries you want. Alternatively, you can download and install a package which comes with pre-installed libraries.
Recommended: Anaconda (https://www.continuum.io/downloads).

Another option is Enthought Canopy Express (https://www.enthought.com/downloads/).

The second method provides a hassle-free installation. Its limitation is that you have to wait for the entire package to be upgraded, even if you are only interested in the latest version of a single library. This should not matter unless you are doing cutting-edge statistical research.

Choosing a development environment

The 3 most common options:

- Terminal / Shell based
- IDLE (the default environment)
- iPython notebook

iPython:

A few things to note:

- You can start the iPython notebook by typing "ipython notebook" in your terminal / cmd, depending on the OS you are working on.
- You can rename an iPython notebook by simply clicking on its name (e.g. "Untitled0") at the top of the notebook.
- The interface shows In [*] for inputs and Out[*] for outputs.
- You can execute code by pressing "Shift + Enter", or "ALT + Enter" if you want to insert an additional cell after it.
2. Python libraries and Data Structures

Python Data Structures

Following are some data structures used in Python. You should be familiar with them in order to use them appropriately.

1. Lists – Lists are one of the most versatile data structures in Python. A list can simply be defined by writing comma-separated values in square brackets. Lists may contain items of different types, but usually the items all have the same type. Python lists are mutable: individual elements of a list can be changed.

2. Strings – Strings can simply be defined by the use of single ('), double (") or triple (''') quotes. Strings enclosed in triple quotes can span multiple lines and are used frequently in docstrings (Python's way of documenting functions). \ is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string.

3. Tuples – A tuple is represented by a number of values separated by commas. Tuples are immutable, and the output is surrounded by parentheses so that nested tuples are processed correctly. Additionally, even though tuples are immutable, they can hold mutable data if needed. Since tuples are immutable, they are faster to process than lists. Hence, if your list is unlikely to change, you should use a tuple instead.
4. Dictionary – A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}.
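The four structures above can be sketched in a few lines (all names and values here are illustrative):

```python
# Lists: mutable, defined with comma-separated values in square brackets
squares = [1, 4, 9, 16]
squares[0] = 0            # individual elements can be changed
squares.append(25)

# Strings: immutable; triple quotes can span multiple lines
greeting = 'Hello'
doc = """A multi-line
string, as used in docstrings."""

# Tuples: immutable, written with parentheses
point = (3, 4)
# point[0] = 5            # would raise a TypeError

# Dictionaries: key: value pairs with unique keys; {} is an empty dictionary
ages = {'Alice': 30, 'Bob': 25}
ages['Carol'] = 35        # adding a new key: value pair

print(squares, greeting, point, ages)
```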

Python Iteration and Conditional Constructs

Like most languages, Python also has a for loop, which is the most widely used method for iteration. It has a simple syntax:

for i in [Python Iterable]:
    expression(i)
Here “Python Iterable” can be a list, tuple or other advanced data structures which we will
explore in later sections. Let’s take a look at a simple example, determining the factorial of a
number.

N = 5  # the number whose factorial we want
fact = 1
for i in range(1, N+1):
    fact *= i

Coming to conditional statements, these are used to execute code fragments based on a condition. The most commonly used construct is if-else, with the following syntax:

if [condition]:
    __execution if true__
else:
    __execution if false__

For instance, if we want to print whether the number N is even or odd:

if N % 2 == 0:
    print('Even')
else:
    print('Odd')

Let’s take a step further. What if you have to perform the following tasks?

1. Multiply 2 matrices
2. Find the root of a quadratic equation
3. Plot bar charts and histograms
4. Make statistical models
5. Access web-pages

There are many libraries for these purposes.

For example, consider the factorial example we just saw. We can do that in a single step as:

import math
math.factorial(N)
Python Libraries

Let's take one step ahead in our journey to learn Python by getting acquainted with some useful libraries. The first step is obviously to learn to import them into our environment. There are several ways of doing so in Python:

import math as m

from math import *

In the first manner, we have defined an alias m for the math library. We can now use the various functions from the math library (e.g. factorial) by referencing them with the alias, as in m.factorial().
In the second manner, you have imported the entire namespace of math, i.e. you can directly use factorial() without referring to math.

Tip: Google recommends the first style of importing libraries, as you will always know where each function has come from.
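The two styles side by side (factorial(5) is just an illustrative call):

```python
# Style 1: alias import – the origin of each function stays visible
# at the call site.
import math as m
print(m.factorial(5))    # 120

# Style 2: wildcard import – shorter calls, but it is no longer obvious
# where factorial() came from, and names can silently collide.
from math import *
print(factorial(5))      # 120
```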

Following is a list of libraries you will need for any scientific computations and data analysis:

- NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random-number capabilities and tools for integration with other low-level languages like Fortran, C and C++.

- SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transform, linear algebra, optimization and sparse matrices.

- Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in the ipython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you ignore the inline option, pylab converts the ipython environment to an environment very similar to Matlab. You can also use LaTeX commands to add math to your plot.

- Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to Python and has been instrumental in boosting Python's usage in the data science community.

- Scikit-learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.

- Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

- Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.

- Bokeh for creating interactive plots, dashboards and data applications in modern web browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.

- Blaze for extending the capabilities of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.

- Scrapy for web crawling. It is a very useful framework for extracting specific patterns of data. It has the capability to start at a website's home URL and then dig through the web pages within the website to gather information.

- SymPy for symbolic computation. It has wide-ranging capabilities, from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability to format the results of computations as LaTeX code.

- Requests for accessing the web. It works similarly to the standard Python library urllib2, but is much easier to code. You will find subtle differences from urllib2, but for beginners, Requests might be more convenient.
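As a small first taste of the two libraries at the top of this list, here is a minimal sketch (the array and table values are arbitrary):

```python
import numpy as np
import pandas as pd

# NumPy: n-dimensional arrays with fast element-wise operations
a = np.array([[1, 2], [3, 4]])
print(a.shape)           # (2, 2)
print(a.mean())          # 2.5 – mean over all elements

# Pandas: labeled, table-like data structures (DataFrame)
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25]})
print(df['age'].max())   # 30
```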

Additional libraries you might need:

- os for operating-system and file operations
- networkx and igraph for graph-based data manipulations
- regular expressions (re) for finding patterns in text data
- BeautifulSoup for scraping the web. It is less powerful than Scrapy, as it will extract information from just a single webpage in a run.
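As an example of regular expressions via the standard re module, a minimal pattern search (the text and pattern are illustrative):

```python
import re

# Find all 4-digit numbers (e.g. years) in a piece of text
text = 'Python 2 was released in 2000, Python 3 in 2008.'
years = re.findall(r'\b\d{4}\b', text)
print(years)    # ['2000', '2008']
```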

Now that we are familiar with Python fundamentals and additional libraries, let's take a deep dive into problem solving through Python. Yes, making a predictive model! In the process, we use some powerful libraries and also come across the next level of data structures. We will take you through the 3 key phases:

1. Data Exploration – finding out more about the data we have.
2. Data Munging – cleaning the data and playing with it to make it better suit statistical modeling.
3. Predictive Modeling – running the actual algorithms and having fun.
3. Exploratory analysis in Python using Pandas
