Python Units 4 Notes
To read a complex number as input, we have to use the eval() function instead of float(). eval() can convert a complex number entered as input into a complex object in Python.
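A minimal sketch of this idea (the input string is made up for illustration):
>>> s = "3+4j"     # stands in for s = input("Enter a complex number: ")
>>> c = eval(s)    # float(s) would raise ValueError; eval() builds a complex object
>>> c
(3+4j)
>>> type(c)
<class 'complex'>
>>> c.real, c.imag
(3.0, 4.0)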
Example 1
1. PrintMsg.py
def fun(usr):
    print("Welcome", usr)
2. Mainprogram.py
import PrintMsg
PrintMsg.fun("Srini")
Output:
Welcome Srini
Example 2
Support.py:
def add(a,b):
    print("The result is ", a+b)
    return
def display(p):
    print("welcome ", p)
    return
The Support.py file can be imported as a module into another Python source file, and its functions can be called from the new file as shown in the following code:
>>> import Support
>>> Support.add(3,4)
The result is 7
>>> Support.add('a','b')
The result is ab
>>> Support.add("srini","vasan")
The result is srinivasan
>>> Support.display('I MCA Students')
welcome I MCA Students
4.2.2 Built-in Modules
OS Module
• The OS module in Python provides functions for interacting with the operating system
• To access the OS module, we have to import it in our program
>>> import os
>>> print(os.name)
nt
>>> op=os.environ['HOME']
>>> print(op)
C:\Users\SUCCESS
>>> os.getcwd()
'C:\\Python34'
4.3.1 Sys Module
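The sys module provides access to variables and functions that interact with the Python interpreter itself. A short interactive sketch (the outputs shown are abbreviated and depend on the installation):
>>> import sys
>>> sys.platform      # identifies the platform, e.g. 'win32' on Windows
'win32'
>>> sys.argv          # list of command-line arguments passed to the script
['']
>>> sys.path          # directories searched when importing modules
['', 'C:\\Python34\\Lib', ...]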
4.4 Packages
• A package is a collection of Python modules. Packages are namespaces which contain multiple packages and modules.
• They are simply directories. A package must include a file called __init__.py to differentiate a package from an ordinary directory.
• Packages can be nested to any depth, provided that the corresponding directories contain their own __init__.py file.
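A minimal sketch of a nested package layout (all names here are hypothetical):
Graphics/                top-level package directory
    __init__.py          marks Graphics as a package
    draw.py              defines a function line()
    ThreeD/              nested sub-package
        __init__.py      marks ThreeD as a package
        render.py        defines a function scene()
The modules can then be imported with dot notation:
>>> from Graphics import draw
>>> draw.line()
>>> from Graphics.ThreeD import render
>>> render.scene()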
4.5 The Python Libraries for data processing, data mining and visualization
4.5.1 Data Mining
Scrapy
• One of the most popular Python data science libraries, Scrapy helps to build crawling programs (spider bots) that can retrieve structured data from the web – for example, URLs or contact info. It's a great tool for scraping data used in, for example, Python machine learning models.
• Developers use it for gathering data from APIs. This full-fledged framework follows the
Don't Repeat Yourself principle in the design of its interface. As a result, the tool inspires
users to write universal code that can be reused for building and scaling large crawlers.
BeautifulSoup
• BeautifulSoup is another really popular library for web crawling and data scraping. If you want to collect data that's available on some website but not via a proper CSV or API, BeautifulSoup can help you scrape it and arrange it into the format you need (a short scraping sketch follows this list).
• Web scraping has become an effective way of extracting information from the web for
decision making and analysis. It has become an essential part of the data science toolkit.
• Data scientists should know how to gather data from web pages and store that data in
different formats for further analysis.
• Any web page you see on the internet can be crawled for information and anything
visible on a web page can be extracted.
• Every web page has its own structure and web elements, because of which you need to write your web crawlers/spiders according to the web page being extracted.
• Scrapy provides a powerful framework for extracting the data, processing it and then saving it.
• Scrapy uses spiders, which are self-contained crawlers that are given a set of instructions.
• Scrapy makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
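As referenced above, a minimal BeautifulSoup sketch (assumes the requests and bs4 packages are installed; example.com is a placeholder site):
import requests
from bs4 import BeautifulSoup

# Download a page and parse its HTML
html = requests.get("http://example.com").text
soup = BeautifulSoup(html, "html.parser")

# Extract every link: the visible text and the href attribute
for a in soup.find_all("a"):
    print(a.get_text(), a.get("href"))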
• The data flow in Scrapy is controlled by the execution engine, and goes like this:
• The Engine gets the initial Requests to crawl from the Spider.
• The Engine schedules the Requests in the Scheduler and asks for the next Requests to
crawl.
• The Scheduler returns the next Requests to the Engine.
• The Engine sends the Requests to the Downloader, passing through the Downloader
Middlewares (see process_request()).
• Once the page finishes downloading, the Downloader generates a Response (with that page) and sends it to the Engine, passing through the Downloader Middlewares (see process_response()).
• The Engine receives the Response from the Downloader and sends it to the Spider for
processing, passing through the Spider Middleware (see process_spider_input()).
• The Spider processes the Response and returns scraped items and new Requests (to
follow) to the Engine, passing through the Spider
Middleware (see process_spider_output()).
• The Engine sends processed items to Item Pipelines, then sends processed Requests to the Scheduler and asks for possible next Requests to crawl.
• The process repeats (from step 1) until there are no more requests from the Scheduler.
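A minimal spider sketch that exercises this flow (quotes.toscrape.com is the Scrapy tutorial's practice site; the CSS selectors match its markup):
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # The initial Requests the Engine gets from the Spider (step 1 above)
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):
        # Called with each downloaded Response; yields scraped items
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Yielding a new Request sends it back to the Scheduler to crawl next
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
The spider can be run with: scrapy runspider quotes_spider.py -o quotes.json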
Components
• Scrapy Engine
– The engine is responsible for controlling the data flow between all components of
the system, and triggering events when certain actions occur.
• Scheduler
– The Scheduler receives requests from the engine and enqueues them for feeding
them later (also to the engine) when the engine requests them.
• Downloader
– The Downloader is responsible for fetching web pages and feeding them to the
engine which, in turn, feeds them to the spiders.
• Spiders
– Spiders are custom classes written by Scrapy users to parse responses and
extract items from them or additional requests to follow.
• Item Pipeline
– The Item Pipeline is responsible for processing the items once they have been
extracted (or scraped) by the spiders. Typical tasks include cleansing, validation
and persistence (like storing the item in a database).
• Downloader middlewares
– Downloader middlewares are specific hooks that sit between the Engine and the
Downloader and process requests when they pass from the Engine to the
Downloader, and responses that pass from Downloader to the Engine.
– Use a Downloader middleware if you need to do one of the following:
– process a request just before it is sent to the Downloader (i.e. right before Scrapy
sends the request to the website);
– change received response before passing it to a spider; send a new Request instead
of passing received response to a spider; pass response to a spider without fetching
a web page; silently drop some requests.
• Spider middlewares
– Spider middlewares are specific hooks that sit between the Engine and the Spiders
and are able to process spider input (responses) and output (items and requests).
– Use a Spider middleware if you need to post-process the output of spider callbacks – change/add/remove requests or items.
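A minimal downloader-middleware sketch (the class name and header value are hypothetical; it would be enabled through the DOWNLOADER_MIDDLEWARES setting):
class CustomHeaderMiddleware:
    # Runs as each Request passes from the Engine to the Downloader
    def process_request(self, request, spider):
        request.headers["User-Agent"] = "my-crawler/1.0"
        return None  # None means: continue handling this request normally

    # Runs as each Response passes from the Downloader back to the Engine
    def process_response(self, request, response, spider):
        spider.logger.info("Fetched %s (%d)", response.url, response.status)
        return response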
4.5.2 Data Processing
Pandas
– Pandas takes data from a CSV or TSV file or a SQL database and creates a Python object with rows and columns called a data frame.
– The data frame is very similar to a table in statistical software, say Excel or SPSS.
• What can you do with Pandas?
– Indexing, manipulating, renaming, sorting, merging data frames
– Update, add, delete columns from a data frame
– Impute missing values, handle missing data or NaNs
– Plot data with histogram or box plot
– This makes Pandas a foundation library in learning Python for Data Science.
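A minimal sketch of these operations (the data values are made up):
import pandas as pd

# Build a data frame from a dictionary of columns
df = pd.DataFrame({"name": ["Anu", "Ravi", "Mala"],
                   "marks": [78, None, 91]})

df = df.rename(columns={"marks": "score"})     # rename a column
df["grade"] = ["B", "C", "A"]                  # add a column
df["score"] = df["score"].fillna(0)            # impute missing values (NaNs)
df = df.sort_values("score", ascending=False)  # sort rows
print(df)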
4.5.3 Data Visualization
Matplotlib
• This is a quintessential Python library. You can tell stories with data visualized using Matplotlib. Another library from the SciPy Stack, Matplotlib plots 2D figures.
• When to use?
• Matplotlib is the plotting library for Python that provides an object-oriented API for embedding plots into applications. It closely resembles the MATLAB plotting interface, embedded in the Python programming language.
• What can you do with Matplotlib?
• From histograms, bar plots and scatter plots to area plots and pie plots, Matplotlib can depict a wide range of visualizations. With a bit of effort, you can create just about any visualization:
• Line plots
• Scatter plots
• Area plots
• Bar charts and Histograms
• Pie charts
• Stem plots
• Contour plots
• Quiver plots
• Spectrograms
• Matplotlib also facilitates labels, grids, legends, and some more formatting entities. Basically, everything that can be drawn!
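A minimal line-plot sketch with labels, a grid and a legend (the data points are made up):
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y, label="y = x**2")  # line plot
plt.xlabel("x")                   # axis labels
plt.ylabel("y")
plt.grid(True)                    # grid
plt.legend()                      # legend
plt.show()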
Plotly
• Plotly is a quintessential graph plotting library for Python. Users can import, copy, paste, or stream data that is to be analyzed and visualized. Plotly also offers a sandboxed Python environment (one where the Python code you run is limited in what it can do), and it makes working with that sandbox easy.
• When to use?
• You can use Plotly if you want to create and display figures, update figures, and hover over text for details. Plotly also has an additional feature of sending data to cloud servers. That's interesting!
• What can you do with Plotly?
• The Plotly graph library has a wide range of graphs that you can plot:
• Basic Charts: Line, Pie, Scatter, Bubble, Dot, Gantt, Sunburst, Treemap, Sankey, Filled
Area Charts
• Statistical and Seaborn Styles: Error, Box, Histograms, Facet and Trellis Plots, Tree plots,
Violin Plots, Trend Lines
• Scientific charts: Contour, Ternary, Log, Quiver, Carpet, Radar, Heat maps, Windrose and Polar Plots
– Financial Charts
– Maps
– Subplots
– Transforms
– Jupyter Widgets Interaction
– Told you, Plotly is the quintessential plots library. Think of a visualization and Plotly can do it!
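A minimal Plotly sketch using the plotly.graph_objects API (the data points are made up):
import plotly.graph_objects as go

fig = go.Figure(
    data=go.Scatter(x=[1, 2, 3, 4], y=[10, 15, 13, 17], mode="lines+markers")
)
fig.update_layout(title="A simple line chart", xaxis_title="x", yaxis_title="y")
fig.show()  # opens the interactive figure, e.g. in a browser or notebook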
***** SUCCESS!*****