Python For Data Analysis
Chapter 8
Prof. Priya Mathurkar
INTRODUCTION
• Data is the new oil. This statement captures how every modern IT system is driven by capturing, storing and analysing data for various needs.
• Be it making decisions for a business, forecasting the weather, studying protein structures in biology or designing a marketing campaign, all of these scenarios involve a multidisciplinary approach that combines mathematical models, statistics, graphs, databases and, of course, the business or scientific logic behind the data analysis.
DATA SCIENCE
• Data science is the process of deriving knowledge and insights from a huge and diverse set of data by organizing, processing and analyzing it.
• It involves many different disciplines, such as mathematical and statistical modelling, extracting data from its source and applying data visualization techniques.
• It often also involves handling big data technologies to gather both structured and unstructured data.
• Below are some example scenarios where data science is used:
• Recommendation systems
• Financial Risk management
• Improvement in Health Care services
THE ROLE OF A DATA ANALYST
• A data analyst uses programming tools to mine large amounts of complex data and find relevant information in it.
• In short, an analyst is someone who derives meaning from messy data. To be useful in the workplace, a data analyst needs skills in the following areas:
• Domain Expertise — In order to mine data and come up with insights that are relevant to their workplace, an
analyst needs to have domain expertise.
• Programming Skills — As a data analyst, you will need to know the right libraries to use in order to clean data, mine it, and gain insights from it.
• Statistics — An analyst might need to use some statistical tools to derive meaning from data.
• Visualization Skills — A data analyst needs to have great data visualization skills, in order to summarize and
present data to a third party.
• Storytelling — Finally, an analyst needs to communicate their findings to a stakeholder or client. This means that
they will need to create a data story, and have the ability to narrate it.
WHY LEARN PYTHON FOR DATA ANALYSIS?
• Here are some reasons in favour of learning Python:
• Open Source – free to install
• Awesome online community
• Very easy to learn
• Can become a common language for data science and production of web based
analytics products.
• A simple and easy-to-learn language which achieves results in fewer lines of code than similar languages such as R. Its simplicity also makes it robust enough to handle complex scenarios with minimal code and much less confusion about the general flow of the program.
• It is cross-platform, so the same code works in multiple environments without needing any change. That makes it perfect for multi-environment setups.
• It generally executes faster than other languages commonly used for data analysis, such as R and MATLAB.
• Its excellent memory management, especially garbage collection, makes it versatile at gracefully handling very large volumes of data during transformation, slicing, dicing and visualization.
• Most importantly, Python has a very large collection of libraries which serve as special-purpose analysis tools. For example, the NumPy package deals with scientific computing, and its array needs much less memory than a conventional Python list for managing numeric data. The number of such packages is continuously growing.
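• A rough illustration of that memory claim (a minimal sketch; exact numbers vary by platform and Python version):

import sys
import numpy as np

data = list(range(1000))
arr = np.array(data)

# Approximate memory of the list: the list object plus its separate int objects.
list_bytes = sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data)
# NumPy stores all the values in one contiguous buffer.
arr_bytes = arr.nbytes

print(list_bytes, arr_bytes)  # the NumPy buffer is typically several times smaller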
• Python has packages which can directly use code from other languages like Java or C. This helps in optimizing performance by reusing existing code from other languages whenever it gives a better result.
INSTALLING THE SCIPY STACK
• The best way to enable the required packages is to use an installable binary package specific to your operating system. These binaries contain the full SciPy stack (NumPy, SciPy, Matplotlib, IPython, SymPy and nose, along with core Python).
• Windows
• Anaconda (from www.continuum.io) is a free Python distribution for SciPy stack. It is also available for
Linux and Mac.
• Python (x,y): a free Python distribution with the SciPy stack and the Spyder IDE for Windows. (Downloadable from www.python-xy.github.io/)
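• If a working Python with pip is already available, the individual packages can also be installed directly (a sketch; these are the package names on PyPI):

pip install numpy scipy matplotlib ipython sympy pandas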
LIBRARIES FOR SCIENTIFIC COMPUTATIONS AND DATA ANALYSIS
• Following is a list of libraries you will need for any scientific computations and data analysis:
• NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with other low-level languages like Fortran, C and C++.
• SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transforms, linear algebra, optimization and sparse matrices.
• Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat maps.
• Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to Python and has been instrumental in boosting Python's usage in the data science community.
• Scikit-learn for machine learning. Built on NumPy, SciPy and Matplotlib, this library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction.
• Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
• Seaborn for statistical data visualization. Seaborn is a library for making
attractive and informative statistical graphics in Python. It is based on
matplotlib. Seaborn aims to make visualization a central part of exploring and
understanding data.
• Bokeh for creating interactive plots, dashboards and data applications on
modern web-browsers. It empowers the user to generate elegant and concise
graphics in the style of D3.js. Moreover, it has the capability of high-
performance interactivity over very large or streaming datasets.
• Blaze for extending the capability of Numpy and Pandas to distributed and
streaming datasets. It can be used to access data from a multitude of
sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables,
etc. Together with Bokeh, Blaze can act as a very powerful tool for creating
effective visualizations and dashboards on huge chunks of data.
• Scrapy for web crawling. It is a very useful framework for extracting specific patterns of data. It can start at a website's home URL and then dig through the web pages within the website to gather information.
• SymPy for symbolic computation. It has wide-ranging capabilities from basic
symbolic arithmetic to calculus, algebra, discrete mathematics and quantum
physics. Another useful feature is the capability of formatting the result of
the computations as LaTeX code.
• Requests for accessing the web. It works similarly to the standard Python library urllib2 but is much easier to code. You will find subtle differences from urllib2, but for beginners Requests is more convenient (see the sketch below).
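• A minimal Requests sketch (assumes network access; the URL is just an example):

import requests

response = requests.get("https://example.com")
print(response.status_code)   # HTTP status code, e.g. 200
print(response.text[:100])    # first 100 characters of the response body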
PYTHON - PANDAS
• Below are some of the important features of Pandas that are used specifically for data processing and data analysis work:
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group by data for aggregation and transformations (see the sketch after this list).
• High performance merging and joining of data.
• Time Series functionality.
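• A quick sketch of the group-by feature (the data here is invented for illustration):

import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B"], "score": [10, 20, 30]})
print(df.groupby("team")["score"].mean())   # mean score per team: A -> 15.0, B -> 30.0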
• If you have Python and PIP already installed on a system, then installation of
Pandas is very easy.
• Install it using this command:
• C:\Users\Your Name>pip install pandas
DIMENSION & DESCRIPTION
• The best way to think of these data structures is that the higher dimensional
data structure is a container of its lower dimensional data structure. For
example, DataFrame is a container of Series, Panel is a container of
DataFrame.
Data Structure | Dimensions | Description
Series         | 1          | 1D labeled homogeneous array, size-immutable.
DataFrame      | 2          | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
• Example:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
LABELS
• If nothing else is specified, the values are labeled with their index number. First value has index 0,
second value has index 1 etc.
• This label can be used to access a specified value.
print(myvar[0])
Create Labels
• With the index argument, you can name your own labels.
• Example
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
print(myvar["y"])
KEY/VALUE OBJECTS AS SERIES
• You can also use a key/value object, like a dictionary, when creating a Series.
import pandas as pd

# The keys of the dictionary become the labels.
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

# To select only some of the items in the dictionary, list the wanted keys in the index argument.
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
DATAFRAME
The table represents the data of a sales team of an organization with their overall
performance rating. The data is represented in rows and columns. Each column
represents an attribute and each row represents a person.
DATA TYPE OF COLUMNS
Column | Type
Name   | String
Age    | Integer
Gender | String
Rating | Float
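• A minimal sketch of such a table as a DataFrame (the names and values here are invented for illustration):

import pandas as pd

sales = pd.DataFrame({
    "Name": ["Tom", "Lee"],         # String
    "Age": [25, 32],                # Integer
    "Gender": ["Male", "Female"],   # String
    "Rating": [4.23, 3.98]          # Float
})
print(sales.dtypes)   # object, int64, object, float64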
Example:2
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data) #Create a DataFrame from Lists
print(df)
Example:3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age']) #Create a DataFrame from Lists
print(df)
Example:4
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print(df)
Example:5
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
• Locate Row
• Pandas uses the loc attribute to return one or more specified rows.
• print(df.loc[0]) # returns row 0, when the index is the default integer index
• print(df.loc[[0, 1]]) # returns rows 0 and 1 as a DataFrame
• Named Indexes
• With the index argument, you can name your own indexes.
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df.loc["day2"]) # refer to the row by its named index
PANDAS READ CSV
• If your data sets are stored in a file, Pandas can load them into a DataFrame.
• Example
• Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
print(df.to_string()) # use to_string() to print the entire DataFrame
PANDAS - MISSING DATA
• Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. Missing data is also referred to as NA (Not Available) values in pandas.
• Checking for missing values using isnull() and notnull():
• In order to check for missing values in a Pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values in a series.
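• A short sketch of these checks (using the 'dirtydata.csv' file from the cleaning examples below):

import pandas as pd

df = pd.read_csv('dirtydata.csv')
print(df["Calories"].isnull())    # True where the value is NaN
print(df["Calories"].notnull())   # True where a value is present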
PANDAS - CLEANING EMPTY CELLS
Remove Rows
• One way to deal with empty cells is to remove rows that contain empty cells.
• This is usually OK, since data sets can be very big, and removing a few rows
will not have a big impact on the result.
• df.dropna()
• If you want to change the original DataFrame, use the inplace = True
argument:
• df.dropna(inplace = True)
Note: In our cleaning examples we will be using a CSV file called 'dirtydata.csv'.
Replace Empty Values:
• Another way of dealing with empty cells is to insert a new value instead.
• This way you do not have to delete entire rows just because of some empty
cells.
• The fillna() method allows us to replace empty cells with a value:
• df.fillna(130, inplace = True)
• Calculate the MEDIAN, and replace any empty values with it:
• x = df["Calories"].median()
df["Calories"].fillna(x, inplace = True)
• Calculate the MODE, and replace any empty values with it:
• x = df["Calories"].mode()[0]
df["Calories"].fillna(x, inplace = True)
• Discovering Duplicates
• Duplicate rows are rows that have been registered more than one time.
• To discover duplicates, we can use the duplicated() method.
• The duplicated() method returns a Boolean value for each row:
• print(df.duplicated())
• df.drop_duplicates(inplace = True)
PANDAS - FIXING WRONG DATA
Wrong Data
• "Wrong data" does not have to be "empty cells" or "wrong format", it can just be
wrong, like if someone registered "199" instead of "1.99".
• Sometimes you can spot wrong data by looking at the data set, because you have an
expectation of what it should be.
• If you take a look at our data set, you can see that in row 7, the duration is 450, but
for all the other rows the duration is between 30 and 60.
• It doesn't have to be wrong, but taking into consideration that this is the data set of someone's workout sessions, we can conclude that this person did not work out for 450 minutes.
Replacing Values
• One way to fix wrong values is to replace them with something else.
• In our example, it is most likely a typo, and the value should be "45" instead
of "450", and we could just insert "45" in row 7:
• df.loc[7, 'Duration'] = 45
• For small data sets you might be able to replace the wrong data one by one,
but not for big data sets.
• To replace wrong data for larger data sets you can create some rules, e.g. set
some boundaries for legal values, and replace any values that are outside of
the boundaries.
• Example
• for x in df.index:
if df.loc[x, "Duration"] > 120:
df.loc[x, "Duration"] = 120
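• For this capping rule, a vectorized alternative (a sketch) avoids the explicit loop:

df["Duration"] = df["Duration"].clip(upper=120) # cap every value at 120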
• Removing Rows
• Another way of handling wrong data is to remove the rows that contain wrong data.
• This way you do not have to find out what to replace them with, and there is a
good chance you do not need them to do your analyses.
• Delete rows where "Duration" is higher than 120:
• for x in df.index:
if df.loc[x, "Duration"] > 120:
df.drop(x, inplace = True)
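• The same result with boolean filtering (a sketch), which is usually faster on large data sets:

df = df[df["Duration"] <= 120] # keep only the rows with a legal Duration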
PYTHON - MATPLOTLIB
• NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing. However, this Python-based alternative to MATLAB is now seen as a more modern and complete programming language.
• Matplotlib is a Python library used to create 2D graphs and plots from Python scripts.
• It has a module named pyplot which makes plotting easy by providing features to control line styles, font properties, axis formatting, etc.
• It supports a very wide variety of graphs and plots, namely histograms, bar charts, power spectra, error charts, etc.
• It is used along with NumPy to provide an environment that is an effective open source alternative to MATLAB.
• It can also be used with graphics toolkits like PyQt and wxPython.
• Conventionally, the package is imported into the Python script by adding the following
statement −
• from matplotlib import pyplot as plt
MATPLOTLIB EXAMPLE
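• A minimal sketch of a pyplot line plot (the values are invented for illustration):

from matplotlib import pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y)                    # draw a line through the points
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Matplotlib Example")
plt.show()                        # display the figure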