Data Analysis With Pandas
Before we dive into a specific Python data analysis library like pandas, let us try to understand
what data analysis is. Then we will move on to pandas and hands-on Python
programming.
Based on height data alone, you cannot decide what sizes of pants/jeans to make. To decide
that, we need data analysis, and this is where it comes into the picture.
Let's say the range is 35 cm (this is just sample data I created) and the average value is about 155
centimetres.
The range is simply the difference between the maximum height and the minimum height.
So this range is where the heights of most of the city's people lie, whereas the average value
of 155 centimetres
is calculated from that data.
Based on this analysis, you can start manufacturing your next product.
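A minimal sketch of the height example in pandas. The heights below are made-up sample values (an assumption, not the lecture's data), chosen so that the range comes out to 35 cm and the average to 155 cm.

```python
import pandas as pd

# Hypothetical sample heights in centimetres (made-up data for illustration)
heights = pd.Series([140, 150, 155, 155, 175], name="height_cm")

# Range = maximum height - minimum height
height_range = heights.max() - heights.min()

# Average (mean) height
mean_height = heights.mean()

print("Range:", height_range, "cm")
print("Average:", mean_height, "cm")
```

With these values the script reports a range of 35 cm and an average of 155.0 cm.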
So that is all about the data analysis.
Take the example of Netflix: Netflix knows which movies or programs
you watch and which programs you may like next. That is where data analysis plays a very
important role.
With raw data alone you cannot do much. You first need to do a whole lot of different work,
such as data preparation, data cleaning and data munging, before the data
is ready for analysis.
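The cleaning steps mentioned above can be sketched in pandas roughly as follows. The table, names and values are hypothetical; the sketch only shows three common cleaning operations: removing duplicate rows, dropping missing values and converting text to numbers.

```python
import pandas as pd

# Hypothetical raw data: one duplicated row, one missing value, numbers stored as text
raw = pd.DataFrame({
    "name": ["Asha", "Ben", "Ben", "Chen", "Dina"],
    "height_cm": ["150", "162", "162", None, "171"],
})

clean = (
    raw.drop_duplicates()                # remove the repeated "Ben" row
       .dropna(subset=["height_cm"])     # drop rows with no height recorded
       .astype({"height_cm": int})       # convert the text values to integers
       .reset_index(drop=True)           # renumber the rows 0, 1, 2, ...
)
print(clean)
```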
After that, predictive modelling algorithms, such as machine learning and deep learning
algorithms, are applied to the prepared data.
How do we achieve these data analysis tasks? Various different
libraries are available for data analysis.
Earlier, people mainly did these data analysis tasks in the R programming language,
but in the last couple of years the pandas library, which is built on top of the Python
programming language, has become widely used for them.
Throughout this course we will learn what the pandas library is and how you can perform
data analysis tasks with the help of the different functionality available in pandas.
INTRODUCTION TO PANDAS
What is Pandas: Pandas is a library built on top of the Python programming language.
It is an open source project released under the BSD license. It is a
high-performance data analysis library and one of the best tools available
in the Python programming language for data analysis.
BSD: Berkeley Software Distribution (BSD) licenses are a family of permissive licenses with few
restrictions and requirements. They are used for the distribution of many freeware, shareware
and open source software projects.
You can think of Pandas as something like Excel working inside Python.
Because you are working with the data through the Python programming language, you have much more
control over it. The project is sponsored by NumFOCUS, a nonprofit
organization.
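To make the "Excel inside Python" comparison concrete, here is a minimal sketch with a small made-up sales table: a computed column plays the role of an Excel formula column, and a boolean filter plays the role of an Excel filter. The column names and numbers are hypothetical.

```python
import pandas as pd

# A small table laid out like rows and columns in a spreadsheet (made-up data)
sales = pd.DataFrame({
    "product": ["Jeans", "Pants", "Shirts"],
    "units": [120, 80, 200],
    "price": [40.0, 35.0, 20.0],
})

# Like an Excel formula column: revenue = units * price, computed for every row
sales["revenue"] = sales["units"] * sales["price"]

# Like an Excel filter: keep only the rows with revenue above 3000
top = sales[sales["revenue"] > 3000]
print(top)
```

Unlike a spreadsheet, every step here is code, so the same transformation can be rerun on new data.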
You can visit the NumFOCUS page "numfocus.org/sponsored-project" to see which other projects
they sponsor and why pandas was created on top of the Python programming
language.
Python has long been very good at data munging and data preparation, but less so at data analysis
and data modelling tasks. So, generally, after munging and preparing the data, we had to use
another data analysis tool like “R”.
Every time, data scientists and analysts had to export the output of these
munging and preparation steps to a dedicated analysis tool like “R” and then bring
the data back into Python. Pandas solves this problem and can replace the “R”
programming language for these tasks.
NOTE: Pandas can replace R for many tasks, but some data analysis tasks are still done in the R
programming language.
NOTE: You no longer need to switch back and forth between Python and R for every
single data analysis, data preparation or data munging step. All of these operations can now be
performed inside the Python environment.
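As a minimal sketch of keeping the whole workflow in Python (load, clean, analyse), assuming a small made-up CSV dataset held in a string instead of a real file:

```python
import io

import pandas as pd

# Hypothetical raw CSV data; in practice this would usually come from a file
csv_text = """city,height_cm
Pune,150
Pune,160
Delhi,
Delhi,170
Delhi,164
"""

# 1. Load the data
df = pd.read_csv(io.StringIO(csv_text))

# 2. Prepare / clean: drop rows where the height is missing
df = df.dropna(subset=["height_cm"])

# 3. Analyse: average height per city, without leaving Python
summary = df.groupby("city")["height_cm"].mean()
print(summary)
```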
Features of the Pandas library:
The Pandas library provides 3 very important data types:
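1. Series (1-dimensional)
2. DataFrame (2-dimensional)
3. Panel (3-dimensional data, used to perform operations on multiple tabular structures at once)
NOTE: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so current versions provide
only Series and DataFrame; 3-dimensional data is now usually stored in a DataFrame with a MultiIndex.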
We will look at these data types in great detail throughout this course.
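A short sketch of the data structures listed above. Because Panel is not available in current pandas versions, the last part uses a DataFrame with a MultiIndex as one common substitute; all data and labels are made up.

```python
import pandas as pd

# Series: a 1-dimensional labelled array
s = pd.Series([155, 162, 171], index=["a", "b", "c"])

# DataFrame: a 2-dimensional labelled table
df = pd.DataFrame({"height_cm": [155, 162], "weight_kg": [52, 60]},
                  index=["a", "b"])

# Panel-like 3-dimensional data (table x row x column) stored in a
# DataFrame whose row index has two levels: "table" and "row"
idx = pd.MultiIndex.from_product([["table1", "table2"], ["a", "b"]],
                                 names=["table", "row"])
panel_like = pd.DataFrame({"height_cm": [155, 162, 158, 170]}, index=idx)

print(s.ndim, df.ndim)            # number of dimensions of each object
print(panel_like.loc["table2"])   # select one "table" out of the 3-D data
```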
Install Anaconda (Windows, macOS, Linux):
https://docs.anaconda.com/anaconda/install/windows/
https://docs.anaconda.com/anaconda/install/mac-os/
https://docs.anaconda.com/anaconda/install/linux/
Update Anaconda from an old version to a new version:
https://docs.anaconda.com/anaconda/install/update-version/
JupyterLab Documentation:
https://jupyterlab.readthedocs.io/en/stable/
Import Libraries:
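A minimal sketch of the usual imports. The aliases np and pd are a widespread community convention rather than a requirement.

```python
# Conventional import aliases used throughout the NumPy/pandas community
import numpy as np
import pandas as pd

# Check which versions are installed in the current environment
print("NumPy version:", np.__version__)
print("pandas version:", pd.__version__)
```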
Python Crash course:
Python Exercise:
NUMPY: NumPy is the fundamental package for scientific computing in Python. It is a Python library
that provides a multidimensional array object, various derived objects (such as masked arrays and
matrices), and an assortment of routines for fast operations on arrays.
NOTE: NumPy is used for linear algebra and matrix manipulation. Many major data
science libraries, including pandas, are written on top of NumPy.
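A minimal sketch of the multidimensional array object described above, plus a one-line check that a pandas DataFrame can hand its values back as a NumPy array. The numbers are arbitrary.

```python
import numpy as np
import pandas as pd

# A 2-dimensional NumPy array: a matrix with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)    # number of dimensions
print(a.shape)   # (rows, columns)
print(a.T)       # transpose, a basic matrix operation

# pandas builds on NumPy: a DataFrame's values can be returned as an ndarray
df = pd.DataFrame(a, columns=["x", "y", "z"])
print(type(df.to_numpy()))
```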
3. Some_More_Numpy_Functions
4. Linear_Algebra_Fnction_With_Numpy
Inverse matrix:
Matrix multiplication:
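A minimal sketch of matrix multiplication and the inverse matrix with NumPy, using two arbitrary 2x2 example matrices (assumed values, chosen so that A is invertible). It also contrasts the @ operator with the element-wise * operator, which is a common source of mistakes.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
B = np.array([[1.0, 0.0],
              [3.0, 1.0]])

# Matrix multiplication: the @ operator (equivalent to np.matmul)
product = A @ B

# Element-wise product: NOT matrix multiplication
elementwise = A * B

# Inverse matrix: only defined for square matrices with a non-zero determinant
A_inv = np.linalg.inv(A)

# A multiplied by its inverse gives the identity matrix (up to floating-point error)
identity = A @ A_inv

print(product)
print(elementwise)
print(np.round(identity, 10))
```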
5. ListVsNumpy Array
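One practical difference between a Python list and a NumPy array is how arithmetic operators behave. A minimal sketch with arbitrary values:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# * 2 repeats a list, but multiplies every element of an array
repeated = py_list * 2                  # list repetition
doubled = arr * 2                       # element-wise multiplication

# + concatenates lists, but adds arrays element by element
joined = py_list + [10, 20, 30]         # list concatenation
summed = arr + np.array([10, 20, 30])   # element-wise addition

print(repeated, doubled)
print(joined, summed)
```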
6. Views Vs Copy
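A minimal sketch of views versus copies in NumPy: basic slicing returns a view that shares memory with the original array, while .copy() returns an independent array. The variable names are illustrative.

```python
import numpy as np

original = np.arange(6)                  # [0 1 2 3 4 5]

sliced_view = original[1:4]              # basic slicing returns a view
sliced_copy = original[1:4].copy()       # .copy() returns an independent array

sliced_view[0] = 100                     # writes through to `original`
sliced_copy[1] = 200                     # does not affect `original`

print(original)
print(np.shares_memory(original, sliced_view))   # the view shares memory
print(np.shares_memory(original, sliced_copy))   # the copy does not
```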
7. Insert_Upadte_Delete_Operations_On_Numpys
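A minimal sketch of insert, update and delete operations on a NumPy array. Note that np.insert and np.delete return new arrays rather than modifying the input, while updates through indexing change an array in place.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

# Insert: np.insert returns a NEW array with 15 placed before index 1
inserted = np.insert(arr, 1, 15)       # [10 15 20 30 40]

# Update: assign in place through an index or a boolean mask
updated = arr.copy()                   # work on a copy so `arr` stays unchanged
updated[2] = 35                        # [10 20 35 40]
updated[updated > 30] = 0              # [10 20  0  0]

# Delete: np.delete returns a NEW array without the element at index 0
deleted = np.delete(arr, 0)            # [20 30 40]

print(inserted, updated, deleted, arr)
```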