MC4103_Python Programming

Unit- IV | MODULES, PACKAGES AND FRAMEWORKS

Modules: Introduction – Module Loading and Execution – Packages – Making Your Own Module – The Python Libraries for data processing, data mining and visualization: NumPy, Pandas, Matplotlib, Plotly – Frameworks: Django, Flask, Web2Py

4.1 Modules: Introduction

• A module is a file containing Python definitions and statements.
• A module is stored in a file named modulename.py.
• Modules allow Python code to be organized logically.
• A module can be user defined or built in.
• To perform mathematical operations such as square root and exponentiation, we import the math module.

Example: For positive numbers

num = 8
# To take the input from the user
#num = float(input('Enter a number: '))
num_sqrt = num ** 0.5
print('The square root of %0.3f is %0.3f' % (num, num_sqrt))

OUTPUT: The square root of 8.000 is 2.828
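
The notes mention the math module; the following is a minimal sketch that computes the same square root with math.sqrt (input value reused from the example above):

import math

num = 8
num_sqrt = math.sqrt(num)   # sqrt() accepts non-negative numbers
print('The square root of %0.3f is %0.3f' % (num, num_sqrt))

OUTPUT: The square root of 8.000 is 2.828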

For real or complex numbers


import cmath
num = 1+2j
# To take input from the user
#num = eval(input('Enter a number: '))
num_sqrt = cmath.sqrt(num)
print('The square root of {0} is {1:0.3f}+{2:0.3f}j'.format(num, num_sqrt.real, num_sqrt.imag))

OUTPUT : The square root of (1+2j) is 1.272+0.786j

We use the sqrt() function in the cmath (complex math) module.


Note: If we want to take a complex number such as 3+4j directly as input, we have to use the eval() function instead of float(). eval() converts the entered text into a complex object in Python.


4.2 Types of Modules

Basically, there are two types of modules in Python.

4.2.1 User Defined Modules


Creation of Module
1. Create a file/module having the extension .py
2. Open the Python shell and import the module created above.
3. Access its functions using the module name.

Example 1
1. PrintMsg.py
def fun(usr):
    print("Welcome", usr)
2. Mainprogram.py
import PrintMsg
PrintMsg.fun("Srini")
Output:
Welcome Srini
Example 2
Support.py:
def add(a, b):
    print("The result is", a + b)
    return

def display(p):
    print("welcome", p)
    return

The Support.py file can be imported as a module into another Python source file, and its functions can be called from the new file as shown in the following code:

>>> import Support
>>> Support.add(3, 4)
The result is 7
>>> Support.add('a', 'b')
The result is ab
>>> Support.add("srini", "vasan")
The result is srinivasan
>>> Support.display('I MCA Students')
welcome I MCA Students
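
Functions can also be imported by name with the from ... import form; a minimal sketch reusing the Support module defined above:

>>> from Support import add, display
>>> add(3, 4)
The result is 7
>>> display('I MCA Students')
welcome I MCA Students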
4.2.2 Built in Modules
OS Module
• The OS module in Python provides functions for interacting with the operating system.
• To access them, we have to import the os module in our program.


4.3 Module Loading and Execution

>>> import os
>>> print(os.name)
nt
>>> op=os.environ['HOME']
>>> print(op)
C:\Users\SUCCESS
>>> os.getcwd()
'C:\\Python34'
4.3.1 Sys Module
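
The detailed sys module material is not reproduced here; the following is a minimal sketch of commonly used sys attributes, based on the standard library rather than the original notes:

import sys

print(sys.platform)   # name of the platform, e.g. 'win32' or 'linux'
print(sys.path)       # directories searched when importing modules
print(sys.argv)       # command-line arguments passed to the script
print(sys.version)    # version string of the Python interpreter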


4.4 Packages
• A package is a collection of Python modules. Packages are namespaces which can contain sub-packages and modules.
• They are simply directories. A package must include a file called __init__.py to differentiate it from an ordinary directory.
• Packages can be nested to any depth, provided that the corresponding directories contain their own __init__.py file.

4.4.1 Creating Packages


Steps for creating packages (the resulting layout is sketched below):
• Create a folder named 'calc' in the working drive.
• Inside that folder create an __init__.py file (empty file).
• Create a folder named add inside the calc folder.
• Similarly create sub, mul, div, … folders inside the calc folder.
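
The resulting folder layout would look like this (a sketch of the calc package described above; the module file names match Example 1 below, and each subfolder carries its own __init__.py):

calc/
    __init__.py
    add/
        __init__.py
        addition.py
    sub/
        __init__.py
        subtraction.py
    mul/
        __init__.py
        multiplication.py
    div/
        __init__.py
        division.py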
Example 1:
#addition.py
def add_fun(a, b):
    return a + b
#subtraction.py
def sub_fun(a, b):
    return a - b
#multiplication.py
def mul_fun(a, b):
    return a * b
#division.py
def div_fun(a, b):
    return a / b

Writing a program that invokes all the sub-packages:

import calc.add.addition
import calc.sub.subtraction
import calc.mul.multiplication
import calc.div.division
print("addition of 10 & 20 is", calc.add.addition.add_fun(10, 20))
print("subtraction of 10 & 20 is", calc.sub.subtraction.sub_fun(10, 20))
print("multiplication of 10 & 20 is", calc.mul.multiplication.mul_fun(10, 20))
print("division of 10 & 20 is", calc.div.division.div_fun(10, 20))


4.5 The Python Libraries for data processing, data mining and visualization
4.5.1 Data Mining

Scrapy
• One of the most popular Python data science libraries, Scrapy helps to build crawling programs (spider bots) that can retrieve structured data from the web – for example, URLs or contact info. It's a great tool for scraping data used in, for example, Python machine learning models.
• Developers use it for gathering data from APIs. This full-fledged framework follows the Don't Repeat Yourself principle in the design of its interface. As a result, the tool inspires users to write universal code that can be reused for building and scaling large crawlers.

BeautifulSoup
• BeautifulSoup is another really popular library for web crawling and data scraping. If you want to collect data that's available on some website but not via a proper CSV or API, BeautifulSoup can help you scrape it and arrange it into the format you need (a short sketch follows).
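
A minimal sketch of BeautifulSoup usage (the URL is a placeholder, and the requests and beautifulsoup4 packages are assumed to be installed):

import requests
from bs4 import BeautifulSoup

# Download the page (placeholder URL)
response = requests.get('https://example.com')

# Parse the HTML and pull out pieces of interest
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.string)            # the page's <title> text
for link in soup.find_all('a'):     # every <a> tag on the page
    print(link.get('href'))         # its href attribute, if any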

• Web scraping has become an effective way of extracting information from the web for
decision making and analysis. It has become an essential part of the data science toolkit.
• Data scientists should know how to gather data from web pages and store that data in
different formats for further analysis.
• Any web page you see on the internet can be crawled for information and anything
visible on a web page can be extracted.
• Every web page has its own structure and web elements, because of which you need to write your web crawlers/spiders according to the web page being extracted.
• Scrapy provides a powerful framework for extracting the data, processing it and then saving it (see the spider sketch below).
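
A minimal Scrapy spider sketch (the spider name, start URL and CSS selectors below are illustrative, not taken from the notes):

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'                               # unique name of the spider
    start_urls = ['https://quotes.toscrape.com']  # example start page

    def parse(self, response):
        # Yield one structured item per quote block found on the page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }

Such a spider can be run with the scrapy runspider command and its items written to a feed file, e.g. scrapy runspider quotes_spider.py -o quotes.json.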


• Scrapy uses spiders, which are self-contained crawlers that are given a set of instructions.
• Scrapy makes it easier to build and scale large crawling projects by allowing developers to reuse their code.

• The data flow in Scrapy is controlled by the execution engine, and goes like this:
• The Engine gets the initial Requests to crawl from the Spider.
• The Engine schedules the Requests in the Scheduler and asks for the next Requests to
crawl.
• The Scheduler returns the next Requests to the Engine.
• The Engine sends the Requests to the Downloader, passing through the Downloader
Middlewares (see process_request()).
• Once the page finishes downloading the Downloader generates a Response (with that
page) and sends it to the Engine, passing through the Downloader
Middlewares (see process_response()).
• The Engine receives the Response from the Downloader and sends it to the Spider for
processing, passing through the Spider Middleware (see process_spider_input()).
• The Spider processes the Response and returns scraped items and new Requests (to
follow) to the Engine, passing through the Spider
Middleware (see process_spider_output()).
• The Engine sends processed items to Item Pipelines, then sends processed Requests to the Scheduler and asks for possible next Requests to crawl.
• The process repeats (from step 1) until there are no more requests from the Scheduler.


Components
• Scrapy Engine
– The engine is responsible for controlling the data flow between all components of
the system, and triggering events when certain actions occur.
• Scheduler
– The Scheduler receives requests from the engine and enqueues them for feeding
them later (also to the engine) when the engine requests them.
• Downloader
– The Downloader is responsible for fetching web pages and feeding them to the
engine which, in turn, feeds them to the spiders.
• Spiders
– Spiders are custom classes written by Scrapy users to parse responses and
extract items from them or additional requests to follow.
• Item Pipeline
– The Item Pipeline is responsible for processing the items once they have been
extracted (or scraped) by the spiders. Typical tasks include cleansing, validation
and persistence (like storing the item in a database).

• Downloader middlewares
– Downloader middlewares are specific hooks that sit between the Engine and the
Downloader and process requests when they pass from the Engine to the
Downloader, and responses that pass from Downloader to the Engine.
– Use a Downloader middleware if you need to do one of the following:
– process a request just before it is sent to the Downloader (i.e. right before Scrapy
sends the request to the website);
– change received response before passing it to a spider; send a new Request instead
of passing received response to a spider; pass response to a spider without fetching
a web page; silently drop some requests.
• Spider middlewares
– Spider middlewares are specific hooks that sit between the Engine and the Spiders
and are able to process spider input (responses) and output (items and requests).
– Use a Spider middleware if you need to post-process output of spider callbacks -
change/add/remove requests or items;


– post-process start_requests; handle spider exceptions; call errback instead of callback for some of the requests based on response content.
• Event-driven networking
– Scrapy is written with Twisted, a popular event-driven networking framework for Python. Thus, it's implemented using non-blocking (aka asynchronous) code for concurrency.

4.5.2 Data Processing and Modeling


NumPy
– One of the most fundamental packages in Python, NumPy is a general-purpose
array-processing package. It provides high-performance multidimensional array
objects and tools to work with the arrays.
– NumPy is an efficient container of generic multi-dimensional data. NumPy's main object is the homogeneous multidimensional array. It is a table of elements or numbers of the same datatype, indexed by a tuple of positive integers.
– In NumPy, dimensions are called axes and the number of axes is called rank. NumPy's array class is called ndarray, also known as array.
• When to use?
– NumPy is used to process arrays that store values of the same datatype. NumPy
facilitates math operations on arrays and their vectorization. This significantly
enhances performance and speeds up the execution time correspondingly.
• What can you do with NumPy?
– Basic array operations: add, multiply, slice, flatten, reshape, index arrays
– Advanced array operations: stack arrays, split into sections, broadcast arrays
– Work with DateTime or linear algebra; basic slicing and advanced indexing in NumPy (a short sketch follows).
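
A minimal NumPy sketch of the basic operations listed above (array values are illustrative):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array: 2 axes, shape (2, 3)
print(a.shape)                          # (2, 3)
print(a * 2)                            # element-wise multiplication
print(a.reshape(3, 2))                  # same data, 3 rows and 2 columns
print(a[:, 1])                          # slice the second column -> [2 5]
print(np.vstack((a, a)))                # stack two arrays vertically, shape (4, 3)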
Pandas
– Pandas is an open-source Python package that provides high-performance, easy-to-use data structures and data analysis tools for labeled data in the Python programming language. Pandas stands for Python Data Analysis Library.
• When to use?
– Pandas is a perfect tool for data wrangling or munging.
– It is designed for quick and easy data manipulation, reading, aggregation, and
visualization.


– Pandas takes data from a CSV or TSV file or a SQL database and creates a Python object with rows and columns called a data frame.
– The data frame is very similar to a table in statistical software, say Excel or SPSS.
• What can you do with Pandas?
– Indexing, manipulating, renaming, sorting, and merging data frames
– Updating, adding, and deleting columns of a data frame; imputing missing values; handling missing data or NaNs; plotting data with histograms or box plots (a short sketch follows)
– This makes Pandas a foundation library in learning Python for Data Science.
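
A minimal Pandas sketch (the column names and values are illustrative; a CSV file could equally be loaded with pd.read_csv):

import pandas as pd

# Build a small data frame in memory
df = pd.DataFrame({'name': ['Anu', 'Bala', 'Chitra'],
                   'marks': [78, 85, 91]})

print(df.head())                    # first rows of the data frame
print(df['marks'].mean())           # aggregate a column
df['passed'] = df['marks'] >= 80    # add a derived column
print(df.sort_values('marks'))      # sort rows by a column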
4.5.3 Data Visualization

Matplotlib
• This is a quintessential Python library. You can create stories with the data visualized
with Matplotlib. Another library from the SciPy Stack, Matplotlib plots 2D figures.
• When to use?
• Matplotlib is the plotting library for Python that provides an object-oriented API for embedding plots into applications. It closely resembles MATLAB embedded in the Python programming language.
• What can you do with Matplotlib?
• From histograms, bar plots, scatter plots and area plots to pie plots, Matplotlib can depict a wide range of visualizations. With a bit of effort, you can create just about any visualization (a short sketch follows the list):
• Line plots
• Scatter plots
• Area plots
• Bar charts and Histograms
• Pie charts
• Stem plots
• Contour plots
• Quiver plots
• Spectrograms
• Matplotlib also facilitates labels, grids, legends, and other formatting entities. Basically, everything that can be drawn!
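
A minimal Matplotlib sketch of a labelled line plot (the data points are illustrative):

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker='o')    # line plot with point markers
plt.xlabel('x')               # axis labels
plt.ylabel('x squared')
plt.title('Simple line plot')
plt.grid(True)                # background grid
plt.legend(['y = x**2'])
plt.show()                    # display the figure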


Plotly
• Plotly is a quintessential graph plotting library for Python. Users can import, copy, paste, or stream data that is to be analyzed and visualized. Plotly offers a sandboxed Python environment, that is, one where the Python code you run is limited in what it can do, and Plotly makes working with it easy.
• When to use?
• You can use Plotly if you want to create and display figures, update figures, or hover over text for details. Plotly also has an additional feature of sending data to cloud servers. That's interesting!
• What can you do with Plotly?
• The Plotly graph library has a wide range of graphs that you can plot:
• Basic Charts: Line, Pie, Scatter, Bubble, Dot, Gantt, Sunburst, Treemap, Sankey, Filled
Area Charts
• Statistical and Seaborn Styles: Error, Box, Histograms, Facet and Trellis Plots, Tree plots,
Violin Plots, Trend Lines
• Scientific charts: Contour, Ternary, Log, Quiver, Carpet, Radar, Heat maps, Windrose and Polar Plots
– Financial Charts
– Maps
– Subplots
– Transforms
– Jupyter Widgets Interaction
– As said before, Plotly is the quintessential plotting library. Think of a visualization and Plotly can do it (a short sketch follows)!
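
A minimal Plotly sketch using the plotly.express interface (the data values are illustrative, and the plotly package is assumed to be installed):

import plotly.express as px

# Scatter plot from plain lists; fig.show() renders it in a browser or notebook
fig = px.scatter(x=[1, 2, 3, 4], y=[10, 11, 12, 13],
                 title='Simple scatter plot')
fig.show()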
