Pandas
Creating DataFrame:
Syntax: pd.DataFrame(data, index=row_labels, columns=column_labels) (where pd is the usual alias from import pandas as pd)
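As a minimal sketch of the syntax above, the following builds a small DataFrame from a dictionary. The student names and marks are made-up sample data, and pd is assumed to be the usual alias for pandas.

```python
import pandas as pd

# Hypothetical sample data: marks of three students in two subjects.
data = {"Maths": [85, 72, 90], "Science": [78, 88, 95]}

# index supplies the row labels; columns selects and orders the columns.
df = pd.DataFrame(data, index=["Asha", "Ravi", "Meena"], columns=["Maths", "Science"])
print(df)
```

If index is omitted, pandas falls back to the default row labels 0, 1, 2, and so on.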
Attributes of DataFrame:
1. index
2. columns
3. axes
4. dtypes
5. size
6. shape
7. values
8. ndim
9. empty
10. T
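The attributes listed above can be inspected as in this short sketch. The sample data is hypothetical; note that on a DataFrame the data-type attribute is spelled dtypes (one type per column).

```python
import pandas as pd

# Hypothetical sample data used only to demonstrate the attributes.
df = pd.DataFrame(
    {"Maths": [85, 72, 90], "Science": [78, 88, 95]},
    index=["Asha", "Ravi", "Meena"],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # list holding the row labels and the column labels
print(df.dtypes)   # data type of each column
print(df.size)     # total number of elements (rows x columns)
print(df.shape)    # tuple (number of rows, number of columns)
print(df.values)   # the data as a NumPy array
print(df.ndim)     # number of dimensions (2 for a DataFrame)
print(df.empty)    # True only if the DataFrame has no elements
print(df.T)        # transpose: rows become columns and vice versa
```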
Selecting/Accessing DataFrame:
Note: When slicing with iloc[ ], the end index is excluded, but when slicing with loc[ ], the end label is included.
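The difference between the two slicing rules can be seen in this small sketch (the row labels and marks are made-up sample values):

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame({"Maths": [85, 72, 90, 60]}, index=["a", "b", "c", "d"])

# iloc slices by position: the end position 2 is excluded.
by_position = df.iloc[0:2]

# loc slices by label: the end label "c" is included.
by_label = df.loc["a":"c"]

print(by_position)
print(by_label)
```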
Note: You may also use dot ( . ) notation to modify an existing column, e.g. DF.col_name = [list of values]. Dot notation cannot add a new column (it only sets an attribute on the object); to add one, use DF['col_name'] = [list of values].
With DF['col_name'], if the column name does not exist in the DataFrame, a new column is added; if a column with the same name already exists, the values of that column are overwritten.
If you are assigning a list of values, the number of values must match the number of rows in the DataFrame, otherwise it will raise an error, i.e. ValueError.
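A short sketch of adding and modifying columns follows, using hypothetical sample data. It uses bracket notation to add a column, since pandas only creates columns that way, and dot notation to overwrite a column that already exists.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Maths": [85, 72, 90]})

# Bracket notation adds a new column when the name does not exist yet.
df["Science"] = [78, 88, 95]

# Dot notation overwrites a column that already exists.
df.Maths = [80, 70, 95]

# A list whose length differs from the number of rows raises ValueError.
error_message = None
try:
    df["English"] = [66, 74]  # only 2 values for 3 rows
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```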
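The assign( ) function can be used to add/modify the values of multiple columns at once. Column names are passed as keyword arguments, so they must be written without quotation marks. assign( ) returns a new DataFrame and leaves the original unchanged.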
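As an illustration of assign( ), the sketch below modifies one column and adds another in a single call. The column names and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Maths": [85, 72, 90], "Science": [78, 88, 95]})

# Column names are keyword arguments, so they carry no quotation marks.
# Maths already exists and is overwritten; English is new and is added.
updated = df.assign(Maths=[80, 70, 95], English=[66, 74, 81])

print(updated)
print(df)  # the original DataFrame is not changed by assign()
```

Because assign( ) returns a new object, store the result (here in updated) if you want to keep the changes.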
DF.loc['row_label'] = [list of values] will add a row, or overwrite its values, with the given values
DF.loc['row_label'] = single_value will add/overwrite a row with every value set to that single value
DF.loc['row_label', start:end] = [list of values] will modify only the given slice of the row
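The three forms of row assignment can be sketched as follows, with made-up row labels and marks:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"Maths": [85, 72], "Science": [78, 88], "English": [66, 74]},
    index=["Asha", "Ravi"],
)

# The label "Meena" does not exist yet, so a new row is added.
df.loc["Meena"] = [90, 95, 81]

# A single value is repeated across the whole existing row.
df.loc["Ravi"] = 50

# Only the columns from Maths to Science (inclusive) are modified.
df.loc["Asha", "Maths":"Science"] = [99, 98]

print(df)
```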
Renaming Rows/Columns:
Boolean Indexing: For this, first create the DF with a Boolean index, i.e. with True/False values as the row labels. The rows can then be selected with DF.loc[True] or DF.loc[False].
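A minimal sketch of Boolean indexing with hypothetical marks is given below. It also shows the related and more common technique of filtering with a Boolean condition, which is an addition to these notes and does not need a Boolean index.

```python
import pandas as pd

# Hypothetical sample data with True/False as the row labels.
df = pd.DataFrame({"Maths": [85, 72, 90, 60]}, index=[True, False, True, False])

# Select every row whose label is True.
true_rows = df.loc[True]

# Related technique: keep the rows where a condition holds.
high_scores = df[df["Maths"] > 80]

print(true_rows)
print(high_scores)
```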
Note: In to_csv( ), the default value of sep is a comma (meaning the data items will be separated by the comma character). Pass sep explicitly, e.g. sep=' ', to use a different separator.
Note: By default, the value of the header and index arguments is True. If you don't want the column labels/row index to be stored in the CSV file, you can set them to False.
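The effect of sep, header and index can be sketched with to_csv( ). Calling it without a file name returns the CSV text as a string, which makes the result easy to inspect. The data is hypothetical, and note that the default separator in to_csv( ) is a comma.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"Maths": [85, 72], "Science": [78, 88]},
    index=["Asha", "Ravi"],
)

# Defaults: comma separator, column labels and row index both written.
default_csv = df.to_csv()

# Custom separator, with column labels and row index left out.
bare_csv = df.to_csv(sep=";", header=False, index=False)

print(default_csv)
print(bare_csv)
```

To write to disk instead, pass a file name as the first argument, e.g. a hypothetical df.to_csv("marks.csv").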
*********************
PANDAS – 2
Descriptive Statistics:
Data Aggregations: Aggregation means summarising a dataset by computing a single value from an array of values (for example, one value per column).
Aggregation can be applied to one or more columns together. Aggregate functions include max(), min(), sum(), count(), std() and var().
>>> df.aggregate('max') will calculate max for each column
>>> df.aggregate(['max','count']) will calculate max and the number of non-null items for each column
>>> df['Maths'].aggregate(['max','min'],axis=0) will calculate max and min value of Maths column
>>> df[['Maths','Science']].aggregate('sum',axis=1) will calculate sum of Maths and Science in each row.
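The four calls above can be tried on a small DataFrame as in this sketch. The marks are made-up sample data; the column names Maths and Science follow the examples above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Maths": [85, 72, 90], "Science": [78, 88, 95]})

# One aggregate per column: returns a Series indexed by column name.
col_max = df.aggregate("max")

# Several aggregates per column: returns a DataFrame with one row per function.
max_and_count = df.aggregate(["max", "count"])

# Aggregates of a single column: returns a Series indexed by function name.
maths_range = df["Maths"].aggregate(["max", "min"])

# axis=1 aggregates across the columns, giving one value per row.
row_totals = df[["Maths", "Science"]].aggregate("sum", axis=1)

print(col_max)
print(max_and_count)
print(maths_range)
print(row_totals)
```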