Numpy - Python Package For Data

This document provides an introduction and overview of the Python NumPy library for data science. It discusses NumPy's N-dimensional array data structure (ndarray) which supports efficient array operations and broadcasting. The document also introduces other key Python libraries for data science like pandas, matplotlib, SciPy, scikit-learn, and tools like IPython notebook and Jupyter. It describes how NumPy will be covered in depth in this course on using NumPy for data science applications.

Uploaded by

Daniel N Sherine Foo

Python NumPy - Course Introduction

Python is emerging as one of the favorite tools in the field of data science.
Powerful data science libraries such as NumPy, SciPy, pandas, matplotlib, and
scikit-learn, tools such as the IPython notebook, and ease of programming are making
Python a preferred language for organizations.

This course introduces some of the libraries that are useful for data science and
then takes a deep dive into NumPy.

Data Science
Data Science is an interdisciplinary field that extracts insights from data
available in multiple forms.

To master Data Science, one needs knowledge of all of the following fields.

Computer Science

Artificial Intelligence and Machine Learning

Statistics and Mathematics

Domain Knowledge

Knowledge Discovery Process

The Knowledge Discovery Process extracts meaningful insights from raw data. It
involves the following series of steps.

Problem Definition

Data Collection

Data Preprocessing

Data Transformation

Data Mining

Data Analysis

Data Visualization

Python provides many powerful libraries that can be used to perform the tasks
described above.

Python Libraries for Data Science

NumPy
An essential library for scientific computing in Python.
Holds data in N-dimensional array (ndarray) objects, which can store data in
multiple dimensions.
Supports efficient array operations through its broadcasting feature.

pandas
Provides functionality for working with structured data.
Stores data in two primary data structures: Series and DataFrame. (The Panel
structure listed in older material has been removed from current pandas releases.)

matplotlib
Widely used for data visualization.
Used to generate various types of plots.

SciPy
A collection of efficient numerical algorithms for numerical integration,
signal processing, and optimization.

NLTK
Performs different tasks related to Natural Language Processing.

scikit-learn
Python library used for machine learning.

Jupyter
Provides a web-based interactive computational environment.
Combines code, rich text, plots, media, and mathematical equations.

Bokeh
Offers interactive web visualization features.

PyMongo
The PyMongo distribution comprises tools for working with MongoDB.
MongoDB is a highly scalable and robust NoSQL database.

Scientific Distributions
Without a distribution, a data scientist has to manually install all the Python
libraries required for the various tasks of the Knowledge Discovery Process.

Drawbacks of Manual Installation

Installing some libraries may require installing other dependencies first.

It is a time-consuming task.

Installation of some libraries may fail.

It is prone to manual errors.

Scientific Distribution

All the drawbacks of manual installation can be overcome by using one of the
available scientific distributions.

A scientific distribution is a collection of Python libraries that provides a
ready-to-use Python environment.

A scientific distribution is easy to download, install, and use.

A few popular distributions are Anaconda, Enthought Python, PythonXY, and WinPython.

In this course, you will learn about Anaconda.

Anaconda
Anaconda is a popular high-performance platform used for data science.

The base version is open source and contains more than 100 packages from Python, R,
and Scala.

Additionally, it provides access to more than 700 packages that can be installed and
managed using conda.

Anaconda is available for 32-bit and 64-bit operating systems: Windows, Linux, and
Mac OS X.

Installing Anaconda
Steps for installing Anaconda

Identify your system's OS and its architecture, i.e., 32-bit or 64-bit.

Go to Anaconda's downloads page.

Select the download section of your OS.

Choose the Python version, i.e., 3.x or 2.x, based on your needs.

Download the installer based on your system architecture.

Optional: Verify the integrity of the downloaded installer with its MD5 or SHA-256
checksum.

Install the downloaded file.
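
The optional integrity check in the steps above can be sketched in Python with the
standard hashlib module. This is only an illustration: the helper names
sha256_of_file and matches_published_hash are hypothetical, and the published hash
is assumed to be the hexadecimal SHA-256 string shown on the download page.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that a large installer is never loaded fully into memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the computed digest with the hash published for the download."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

If matches_published_hash returns False, the download is incomplete or altered and
should be repeated before installing.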

Anaconda Navigator
Provides access to the various components of the Anaconda distribution.

The following windows appear on the left side of Anaconda Navigator.

Home

Environment

Projects

Learning

Community

Home and Environment Windows

Home Window

Opened by default with the root environment.

Enables launching a working environment in various modes, such as Jupyter Notebook,
Jupyter QtConsole, and the Spyder IDE.

Environment Window

Shows information about the available environments.

Details of the packages installed in each available environment are viewable.

Projects, Learning, and Community Windows

Project Window

Provides tools for managing Anaconda projects.

Learning Window

Provides access to popular Data Science Resources.

Community Window

Provides links to popular Data Science Events, Forums, Blogs, etc.

Anaconda Prompt is the command-line tool provided by the Anaconda distribution.

You can access Anaconda's default Python interactive interpreter using the command
'python'.

You can also work with conda, Anaconda's package manager.

Command for checking conda's version:

conda --version

Command for viewing available environments:

conda info --envs

Creating a new environment

By default, Anaconda comes with the root environment.

A new environment testenv, with Python 2.7, can be created using the command below.

conda create --name testenv python=2.7

Command for activating testenv (on conda 4.4 and later, the recommended form is
conda activate testenv):

activate testenv

Command for viewing the packages installed in testenv:

conda list

Importing the numpy package in the new testenv results in an ImportError, because
the package is not installed yet.

You can install the package using conda install.

conda install numpy

You can now verify that numpy is available with the conda list command.

After successful installation, you can import numpy from testenv without any
errors.
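
The availability check described above can also be done from inside Python. The
sketch below uses the standard importlib module; the helper name is_installed is
hypothetical and is not part of conda or NumPy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in the active environment."""
    return importlib.util.find_spec(package_name) is not None


for name in ("numpy",):
    if is_installed(name):
        print(name, "is available")
    else:
        print(name, "is missing; try: conda install", name)
```

Run it once before and once after conda install numpy to see the message change.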

IPython
IPython provides an interactive working environment that is highly convenient and
efficient.

Its major components are:

An interactive Python shell.

A Jupyter kernel that allows working with Python code in various interactive front
ends.

Features of IPython
Python statements and system commands can be executed in IPython.

IPython supports tab completion.

With magic commands, IPython makes many tasks easy to perform.

IPython caches input and output history.

IPython supports parallel computing.

Launching Jupyter QtConsole

The GIF illustrates the following:

How to open IPython in Jupyter QtConsole from Anaconda Navigator.

How to execute Python statements in IPython.

How to run system commands in IPython.

How to get information about an object or a method.

How to use tab completion.

Understanding Magic Commands

Magic commands (sometimes called magic methods) begin with a single % or a double
%% symbol.

Line magic: a magic command starting with one % symbol.

A line magic applies only to a single line of code.

Cell magic: a magic command starting with two %% symbols.

A cell magic applies to multiple lines of code written in a single cell.

Starting Jupyter Notebook Server

The Jupyter Notebook server can be launched from the Anaconda Navigator Home window.
The Notebook server opens in a browser and displays the contents of the starting
folder.

The displayed page contains the following three tabs.

Files displays the folders and files present in the starting folder.

Running holds information about the notebooks that are running.

Clusters contains information about notebooks running in parallel mode.

Creating a Folder
A folder can be created using the Folder option under the New section.

The GIF illustrates the following.

Creating an Untitled folder.

Renaming it to MyJupyterNoteBooks.

Changing the working directory to the MyJupyterNoteBooks folder.

Starting a Jupyter Notebook

A Jupyter Notebook can be created by choosing an available kernel.

The kernel provides the environment required for executing the code snippets.

The GIF illustrates the following.

Creating an Untitled notebook.

Renaming it to MyFirstNoteBook.

Checking its running status in the Files and Running tabs.

Shutting down the notebook MyFirstNoteBook.

About a Notebook Cell

The basic element of a notebook is the cell.

A cell can hold either code snippets or Markdown text.

Markdown text can be used to embed normal text, headers, unordered and ordered
lists, hyperlinks, tables, images, videos, HTML content, and other useful elements
inside the notebook.

Markdown Basics
In this section, you will write the following elements in Markdown.

Headers: One to six consecutive hash symbols are used to create headers.

Emphasizing Text: Asterisks * or underscores _ are used to emphasize text in
bold or italic.

Unordered Lists: Any of the symbols asterisk *, hyphen -, or plus + can be used.

Ordered Lists: Numbers followed by a dot . and a space are used.

Nested Unordered Lists: Nested lists are indented with a minimum of four spaces,
followed by one of the list symbols.

Justifying Text of a List Element: Two spaces at the end of a line force a line
break, which keeps multiple lines of text aligned within one list element.

Code Snippets: A pair of three backquotes encloses the snippet.

Hyperlinks: Text written in a pair of square brackets is linked to the URL
specified in the pair of parentheses that follows.

Reference Links: The text and the reference label are written in two separate pairs
of square brackets.

HTML Content: HTML tags can be used directly in Markdown.
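
To make several of the rules above concrete, the sketch below builds a small
Markdown sample as a Python string and prints it, so that the output can be pasted
into a Markdown cell. The sample text and the variable name sample are illustrative
only.

```python
# Each list entry is one line of Markdown; the output is meant for a Markdown cell
sample = "\n".join([
    "# Top-level header",
    "### Third-level header",
    "",
    "This is *italic* and this is **bold**.",
    "",
    "- first item",
    "- second item",
    "    - nested item, indented four spaces",
    "",
    "1. step one",
    "2. step two",
    "",
    "[NumPy](https://numpy.org) is an inline hyperlink.",
    "",
    "[NumPy docs][ref] is a reference link.",
    "",
    "[ref]: https://numpy.org/doc/",
])
print(sample)
```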

Writing Your First Notebook

The GIF performs the following tasks in the notebook MyFirstNoteBook.

Defines the string 's' with the value Welcome to Jupyter Notebooks!!!.

Displays the string 's'.

Provides the required description.

The next GIF illustrates the following additional tasks in MyFirstNoteBook.

Determines the length of 's'.

Obtains the slice Jupyter Notebooks from 's'.

Finds the number of vowels in 's'.

Filters the words starting with either 'J' or 'N'.

Provides titles as required.
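
Since the GIFs are not reproduced here, the code cells for the tasks listed above
might look roughly like the sketch below. Only the string 's' and its value come
from the course text; the other variable names are assumptions.

```python
s = "Welcome to Jupyter Notebooks!!!"
print(s)

# Length of the string
length = len(s)

# Slice out "Jupyter Notebooks" by locating where it starts
start = s.index("Jupyter")
piece = s[start:start + len("Jupyter Notebooks")]

# Count the vowels, ignoring case
vowel_count = sum(1 for ch in s.lower() if ch in "aeiou")

# Keep the words that start with 'J' or 'N'
jn_words = [word for word in s.split() if word.startswith(("J", "N"))]

print(length, piece, vowel_count, jn_words)
```

Note that splitting on whitespace keeps the trailing exclamation marks attached to
the last word.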

NumPy
NumPy is a Python library that supports efficient numerical operations on arrays
holding numeric data.

These arrays are known as N-dimensional arrays or ndarrays.

ndarrays are capable of holding data elements in multiple dimensions.

Each data element of an ndarray is of a fixed size.

All elements of an ndarray are of the same data type.
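
As a small illustration of these properties, the sketch below adds a 1-D array to
every row of a 2-D array through broadcasting, and shows that mixed input values
end up with one common data type. The array values are arbitrary examples.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([10, 20, 30])     # shape (3,)

# b is broadcast across both rows of a; no explicit loop is needed
total = a + b
print(total)

# An int and a float in the same array are stored with a single common dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)
```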

N-dimensional array (ndarray)

An N-dimensional array is an object capable of holding data elements of the same
type and of a fixed size in multiple dimensions.

Creation of a 1-D array of five elements from a list is shown in Example 1.

Example 1

import numpy as np
x = np.array([5, 8, 9, 10, 11])  # create the array with the 'array' function

type(x)  # displays the type of array 'x'

Output

numpy.ndarray

Creation of a 2-D array from a list of lists is shown in Example 2.

Example 2

y = np.array([[6, 9, 5],
              [10, 82, 34]])
print(y)

Output

[[ 6  9  5]
 [10 82 34]]

ndarray Attributes
Some of the important attributes of an ndarray are:

ndim : Number of dimensions.

shape : Shape of the array, as a tuple.

size : Total number of elements.

dtype : Data type of each element.

itemsize : Size of each element in bytes.

nbytes : Total bytes consumed by all elements.

Example 3

print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)

Output

2 (2, 3) 6 int32 4 24

The default integer type depends on the platform and NumPy version; where it is
int64, the last three values are int64 8 48.

NumPy dtypes
NumPy supports various data types, which differ in the number of bytes each data
element requires.

The data type can be specified explicitly with the dtype argument.

An ndarray holding float values is defined in Example 4.

Example 4

y = np.array([[6, 9, 5],
              [10, 82, 34]],
             dtype='float64')
print(y)
print(y.dtype)

Output

[[ 6.  9.  5.]
 [10. 82. 34.]]
float64
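
To see how the choice of dtype changes the memory needed per element, the following
sketch creates the same four values with several dtypes and records itemsize and
nbytes for each. The values and the dictionary name sizes are illustrative.

```python
import numpy as np

values = [1, 2, 3, 4]
sizes = {}
for name in ("int8", "int32", "int64", "float32", "float64"):
    arr = np.array(values, dtype=name)
    # itemsize is bytes per element, nbytes is bytes for the whole array
    sizes[name] = (arr.itemsize, arr.nbytes)
    print(name, arr.itemsize, arr.nbytes)

# astype returns a converted copy; the original array keeps its dtype
ints = np.array(values, dtype="int32")
floats = ints.astype("float64")
```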

++++
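import numpy as np

def array_operations(l):
    # Convert the input list to an ndarray
    x = np.array(l)
    # Report the type, then the number of dimensions, shape, and size
    print(type(x))
    print(x.ndim, x.shape, x.size)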

NumPy Array Creation

N-dimensional arrays (ndarrays) can be created in multiple ways in NumPy.

Let us now focus on creating an ndarray:

From Python built-in data types: lists or tuples.

Using NumPy array creation functions such as ones, ones_like, zeros, and zeros_like.

Using NumPy numeric sequence generators.

Using the NumPy random module.

By reading data from a file.
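
Each of the approaches listed above can be sketched in one or two lines. The course
text does not name the sequence generators or the file reader at this point, so
arange, linspace, and loadtxt are used here as assumed examples, and an in-memory
io.StringIO object stands in for a real file.

```python
import io

import numpy as np

from_tuple = np.array((1, 2, 3))         # from a built-in tuple
ones = np.ones((2, 3))                   # 2x3 array filled with 1.0
zeros_like = np.zeros_like(from_tuple)   # zeros with the shape and dtype of from_tuple

evens = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)          # 5 evenly spaced values from 0 to 1

rng = np.random.default_rng(seed=0)      # seeded random number generator
samples = rng.random((2, 2))             # uniform values in [0, 1)

# io.StringIO stands in for a text file with whitespace-separated numbers
fake_file = io.StringIO("1 2 3\n4 5 6")
from_file = np.loadtxt(fake_file)
print(from_file.shape, from_file.dtype)
```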
