Unit 2
DATA MANIPULATION
PYTHON SHELL – JUPYTER NOTEBOOK – IPYTHON MAGIC COMMANDS –
NUMPY ARRAYS – UNIVERSAL FUNCTIONS – AGGREGATIONS – COMPUTATION
ON ARRAYS – FANCY INDEXING – SORTING ARRAYS – STRUCTURED DATA –
DATA MANIPULATION WITH PANDAS – DATA INDEXING AND SELECTION –
HANDLING MISSING DATA – HIERARCHICAL INDEXING – COMBINING
DATASETS – AGGREGATION AND GROUPING – STRING OPERATIONS –
WORKING WITH TIME SERIES – HIGH PERFORMANCE
Python Shell
Python is an interpreted language: the interpreter executes code statement
by statement rather than compiling the whole program first.
Python provides a Python shell, which executes a single Python command and
displays the result.
It is also known as a REPL (Read, Evaluate, Print, Loop): it reads a
command, evaluates it, prints the result, and loops back to read the next
command.
It provides an easy and interactive way to write and test small pieces of
Python code.
It can be a useful tool for debugging code. For example, if you are having an
issue with a larger program, you can use the Python shell to test out specific
lines of code or to try out different approaches to solving a problem.
It can be a good way to learn about the various built-in functions and
modules in Python.
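As a small illustration of exploring a module interactively, the sketch below uses the standard math module (chosen here only as an example). Each statement could equally be typed one at a time at the >>> prompt.

```python
import math

# dir() lists the names a module exposes; drop the private ones.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# The docstring is the text that help(math.sqrt) would display in the shell.
print(math.sqrt.__doc__)

# Try out a single expression in isolation, as you would when debugging.
result = math.sqrt(16)
print(result)
```

In the shell itself, typing an expression such as math.sqrt(16) prints its value directly, without a call to print().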
Jupyter Notebook
1. Dictionary method:
np.dtype({'names': ('name', 'age', 'weight'),
          'formats': ('U10', 'i4', 'f8')})
Output: dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
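A minimal sketch of how such a compound dtype might be used to build a structured array. The age field is given the format 'i4', matching the '<i4' in the output above. The names person_dtype and people, and the sample values, are made up for illustration.

```python
import numpy as np

# Compound dtype built with the dictionary method.
person_dtype = np.dtype({'names': ('name', 'age', 'weight'),
                         'formats': ('U10', 'i4', 'f8')})

# An empty structured array with three records (illustrative data).
people = np.zeros(3, dtype=person_dtype)
people['name'] = ['Alice', 'Bob', 'Cathy']
people['age'] = [25, 45, 37]
people['weight'] = [55.0, 85.5, 68.0]

# Fields are accessed by name; Boolean masks work as on ordinary arrays.
young = people['name'][people['age'] < 40]
print(young)
```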
Numerical types can be specified with
Python types or NumPy dtypes instead:
np.dtype({'names': ('name', 'age', 'weight'),
df.drop_duplicates()
• Drop duplicates in the first name column, but take the last
observation in the duplicated set
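The two operations above can be sketched as follows. The DataFrame contents and the column name first_name are assumptions for illustration, since the original dataset is not shown.

```python
import pandas as pd

# Illustrative data: rows 0 and 1 are exact duplicates.
df = pd.DataFrame({
    'first_name': ['Jason', 'Jason', 'Tina', 'Jake'],
    'last_name': ['Miller', 'Miller', 'Ali', 'Milner'],
    'age': [42, 42, 36, 24],
})

# Drop rows that are duplicated across all columns (keeps the first copy).
all_cols = df.drop_duplicates()

# Drop duplicates judged on first_name only, keeping the last observation.
last_by_name = df.drop_duplicates(subset=['first_name'], keep='last')
print(last_by_name)
```

Neither call modifies df; each returns a new DataFrame.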
import pandas as pd
date = ['21.07.2020']
print(pd.to_datetime(date, dayfirst=True))
Using pandas.to_datetime() with a date and time
import pandas as pd
print(pd.to_datetime(date))
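A hedged sketch of parsing a string that carries both a date and a time. The sample timestamp and the name stamps are invented for illustration. An explicit format is passed so that the day.month.year order is not left to inference.

```python
import pandas as pd

# Illustrative input: day.month.year followed by hours:minutes:seconds.
stamps = ['21.07.2020 11:31:01']

# An explicit format avoids any ambiguity between day and month.
parsed = pd.to_datetime(stamps, format='%d.%m.%Y %H:%M:%S')
print(parsed)
```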
Missing Data
print(dataset.describe())
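One way describe() helps with missing data is that its count row includes only non-missing values. The sketch below uses a small made-up dataset (the original one is not shown) and adds three common pandas calls for locating and handling NaN entries.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column.
dataset = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                        'b': [4.0, 5.0, np.nan]})

# The 'count' row reports non-missing values only.
print(dataset.describe())

# Number of missing values per column.
missing_per_column = dataset.isnull().sum()

# Two simple strategies: drop incomplete rows, or fill with a constant.
dropped = dataset.dropna()
filled = dataset.fillna(0)
```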
Hierarchical Indexing
In [6]: fifa19
Aggregation and Grouping
• Pandas aggregation methods are as follows:
a) count(): Total number of items
b) first(), last(): First and last item
c) mean(), median(): Mean and median
d) min(), max(): Minimum and maximum
e) std(), var(): Standard deviation and variance
f) mad(): Mean absolute deviation (removed in pandas 2.0)
g) prod(): Product of all items
h) sum(): Sum of all items
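A short sketch of a few of these aggregations, applied first to a whole column and then per group with groupby(). The DataFrame and its column names key and data are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': [0, 1, 2, 3, 4, 5]})

# Aggregations over the whole column.
total = df['data'].sum()
average = df['data'].mean()
print(df['data'].count(), total, average)

# The same methods applied within each group.
per_group = df.groupby('key')['data'].agg(['min', 'max', 'sum'])
print(per_group)

# Mean absolute deviation computed by hand, because Series.mad()
# is no longer available in pandas 2.x.
mad = (df['data'] - df['data'].mean()).abs().mean()
print(mad)
```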
Pivot Tables
• A pivot table is a similar operation that is commonly seen in
spreadsheets and other programs that operate on tabular data.
The pivot table takes simple column-wise data as input, and
groups the entries into a two-dimensional table that provides a
multidimensional summarization of the data.
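A minimal sketch of a pivot table built with DataFrame.pivot_table(). The data and the column names sex, class, and survived are invented for illustration. The equivalent groupby() formulation is included for comparison.

```python
import pandas as pd

# Illustrative column-wise input data.
df = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'female', 'male'],
    'class': ['First', 'Second', 'First', 'Second', 'First', 'Second'],
    'survived': [1, 1, 0, 0, 1, 1],
})

# Rows are grouped by sex, columns by class; each cell holds the mean.
table = df.pivot_table(values='survived', index='sex',
                       columns='class', aggfunc='mean')
print(table)

# The same summary written as a two-key groupby followed by unstack().
same = df.groupby(['sex', 'class'])['survived'].mean().unstack()
```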