Fundamental - Python

This document provides an overview of key NumPy and Pandas functions for working with arrays and DataFrames. It covers NumPy functions like arange, zeros, random, and reshape. For Pandas, it discusses Series, DataFrames, indexing, filtering, grouping, merging, joining, and input/output functions like read_csv and to_excel.

Uploaded by

sai.kanthamneni

format() - print formatting

NUMPY

np.arange() - the NumPy equivalent of Python's range()


np.zeros(), np.ones()
np.linspace() - evenly spaced values within the specified range
np.linspace(0, 5, 10) - 10 evenly spaced values from 0 to 5 (both endpoints included)
np.eye() - identity matrix: a square matrix with 1's on the diagonal and 0's everywhere else
np.random.rand() - gives random numbers between 0 and 1 (uniform distribution) in an
array of the specified shape
np.random.randn() - gives random numbers from the standard normal distribution in an
array of the specified shape
np.random.randint() - gives random integers from low (inclusive) to high (exclusive)
arr.reshape() - reshape the array without disturbing the values
m x n = total number of elements in the array - condition for reshaping the array
arr.max() - max value of the array
arr.argmax() - index value of the largest value in the array
arr.min() - min value of the array
arr.argmin() - index value of the smallest value in the array
arr.shape - returns the shape of the array (an attribute, so no parentheses)
arr.dtype - data type of the array elements (also an attribute)
mat.sum() - Get the sum of all the values in mat
mat.std() - Get the standard deviation of the values in mat
mat.sum(axis = 0) - Get the sum of each column in mat (adds up down the rows)
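
The NumPy calls above can be tried together in one short run. This is only a sketch: the shapes, ranges and the seed are arbitrary choices for illustration, not values from these notes.

```python
import numpy as np

# Array creation
a = np.arange(0, 10)             # integers 0..9, like range(10)
z = np.zeros((2, 3))             # 2x3 array filled with 0.0
lin = np.linspace(0, 5, 10)      # 10 evenly spaced values, endpoints included
eye = np.eye(3)                  # 3x3 identity matrix

# Random arrays (the seed is an arbitrary choice to make reruns repeatable)
np.random.seed(0)
u = np.random.rand(2, 3)             # uniform samples in [0, 1)
n = np.random.randn(4)               # standard normal samples
r = np.random.randint(1, 100, 5)     # 5 integers, 1 <= value < 100

# Reshape: 3 * 4 must equal the 12 elements available
mat = np.arange(12).reshape(3, 4)

# shape and dtype are attributes, so they take no parentheses
print(mat.shape)                 # (3, 4)
print(mat.dtype)
print(mat.max(), mat.argmax())   # 11 11
print(mat.min(), mat.argmin())   # 0 0
print(mat.sum())                 # 66
print(mat.sum(axis=0))           # [12 15 18 21]
```

Trying np.arange(12).reshape(5, 3) raises a ValueError, because 5 x 3 does not equal 12.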

PANDAS

pd.Series(data, index)
pd.DataFrame(data, index, columns)
df[column_name] - returns the column as a Series
A DataFrame is a collection of Series that share the same index
df[[col1, col2]] - returns a DataFrame with the listed columns
df.drop(column name, axis = 1) - will drop the column
This returns a new DataFrame and leaves df unchanged. To remove the column from df
itself, use the inplace=True argument or assign the result back to df
axis = 0 - rows
axis = 1 - columns
df.loc['row name'] - access rows by index label
df.iloc[index of the row] - access rows by integer position
df > 0 - returns a df of boolean values
df[df > 0] - returns the df with the values greater than 0; every other position
becomes NaN
& - to combine conditions we don't use the 'and' operator, because 'and' can only
compare single boolean values; '&' compares element by element
df[(df['W'] > 0) & (df['Y'] > 1)] - returns only the rows where both conditions
are True
| - pipe operator to get the 'or' operation
df.reset_index() - the old index is moved into a separate column and the new index
runs from 0 to n-1
df.set_index(column name) - makes that column the index; not permanent unless we use
the inplace argument or assign the result back
df.loc[outer label].loc[inner label] - used in the multi-level dataframes
df.xs(key, level = 'level name') - cross-section: selects the rows with that key at the named index level
df.dropna() - drops the rows that contain NaN values
- axis = 1: drops the columns that contain NaN values instead
- thresh = 2: keeps only the rows that have at least 2 non-NaN values
df.fillna() - fills the NaN values (can fill with the mean values of the dataframe)
groupby - groups the rows by a column so aggregate functions (sum, mean, count, ...) can be applied per group
groupby().describe() - gives the summary statistics per group (count, mean, std, min,
25%, 50%, 75%, max)
transpose() - swaps the rows and the columns
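
A short sketch of the selection, filtering, missing-data, multi-level and groupby calls listed above. The data, the row labels A/B/C, the G1/G2 groups and the company names are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three rows labelled A, B, C with one missing value
df = pd.DataFrame(
    {"W": [1.0, -2.0, 3.0], "X": [0.5, np.nan, -1.5], "Y": [2.0, 0.0, 4.0]},
    index=["A", "B", "C"],
)

w = df["W"]                      # single column -> Series
wy = df[["W", "Y"]]              # list of columns -> DataFrame
first_by_label = df.loc["A"]     # row by index label
first_by_pos = df.iloc[0]        # same row by integer position

no_x = df.drop("X", axis=1)      # new DataFrame; df still has column X

# Conditions are combined element by element with & and |
both = df[(df["W"] > 0) & (df["Y"] > 1)]     # rows A and C
either = df[(df["W"] < 0) | (df["Y"] > 3)]   # rows B and C

positive = df[df > 0]            # NaN wherever the value is not > 0

complete_rows = df.dropna()      # drops row B, which holds a NaN
filled = df.fillna(df.mean())    # NaN in X replaced by the mean of X

flat = df.reset_index()          # old labels move to a column named 'index'

# Multi-level index: outer level 'Group', inner level 'Num'
idx = pd.MultiIndex.from_tuples(
    [("G1", 1), ("G1", 2), ("G2", 1), ("G2", 2)], names=["Group", "Num"]
)
multi = pd.DataFrame({"A": [10, 20, 30, 40]}, index=idx)
g1_row1 = multi.loc["G1"].loc[1]     # outer label first, then inner label
num1 = multi.xs(1, level="Num")      # the Num == 1 row from every group

# groupby: one aggregate value per company
sales = pd.DataFrame(
    {"Company": ["GOOG", "GOOG", "MSFT"], "Sales": [200, 120, 340]}
)
mean_sales = sales.groupby("Company")["Sales"].mean()
```

Writing the first filter with 'and' instead of '&' raises a ValueError, because the truth value of a whole Series is ambiguous.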

concatenation: dimensions should match along the axis
pd.concat([df1, df2]) - stacks the dataframes on top of each other (axis = 0 is the default)
pd.concat(___, axis = 1) - joins the dataframes along the columns; positions that
have no value get NaN
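
As a rough illustration of the two concat directions, here is a sketch with two small made-up dataframes (df1 and df2 and their values are assumptions, not data from these notes).

```python
import pandas as pd

# Two hypothetical dataframes with the same columns but different index labels
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# Default axis=0: rows of df2 are stacked under the rows of df1
rows = pd.concat([df1, df2])

# axis=1: columns are placed side by side and aligned on the index.
# df1 has no rows 2 and 3 and df2 has no rows 0 and 1, so those cells are NaN.
cols = pd.concat([df1, df2], axis=1)

print(rows.shape)   # (4, 2)
print(cols.shape)   # (4, 4)
```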

merge: merges DataFrames together, the same way tables are joined in SQL
pd.merge(df1, df2, how=___, on=__)
The two DataFrames share a key column with matching values, and 'how' takes one of
'left', 'right', 'inner' or 'outer', as in SQL

joining: combining the columns of differently indexed DataFrames into a single result
DataFrame, matching on the index
df1.join(df2, how=__)
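
To make the difference between merge and join concrete, the sketch below runs both on the same made-up data. The names left, right and the column key are hypothetical; merge matches on the key column, join matches on the index.

```python
import pandas as pd

# Hypothetical tables sharing a 'key' column; K2 and K3 appear on one side only
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": [10, 20, 30]})

# merge: match rows on the key column
inner = pd.merge(left, right, how="inner", on="key")   # keys present in both
outer = pd.merge(left, right, how="outer", on="key")   # keys present in either

# join: match rows on the index instead of a column
left_i = left.set_index("key")
right_i = right.set_index("key")
joined = left_i.join(right_i, how="left")   # keeps every row of left_i

print(inner["key"].tolist())   # ['K0', 'K1']
print(len(outer))              # 4
print(joined.index.tolist())   # ['K0', 'K1', 'K2']
```

In joined, the row K2 has NaN in column B because right_i has no K2.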

df[col_name].unique() - gives the unique values in the form of an array
df[col_name].nunique() - gives the number of unique values in the column
df[col_name].value_counts() - gives how many times each unique value occurs in the
column
apply() - takes a function (custom, built-in or lambda) and applies it to each
value
def times2(x):
    return x*2
df['col1'].apply(times2)
df.drop(row_label, axis = 0) - will drop the rows (axis = 1 drops columns)
df.columns - will return the list of columns
df.index - gives details of the indexes like start, stop and step count
df.sort_values(col_name) - sorts the values
df.isnull() - returns a boolean df that is True wherever a value is null
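
The column operations above, run on a small made-up dataframe (the column names col1, col2 and the values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [444, 555, 666, 444]})

uniques = df["col2"].unique()        # array of the distinct values
n_unique = df["col2"].nunique()      # how many distinct values there are
counts = df["col2"].value_counts()   # occurrences of each distinct value


def times2(x):
    return x * 2


doubled = df["col1"].apply(times2)               # named function
doubled_too = df["col1"].apply(lambda x: x * 2)  # same thing with a lambda

by_col2 = df.sort_values("col2")     # rows ordered by col2, ascending
nulls = df.isnull()                  # boolean DataFrame, all False here

print(uniques.tolist())    # [444, 555, 666]
print(n_unique)            # 3
print(doubled.tolist())    # [2, 4, 6, 8]
```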

df.pivot_table(values, index, columns) - creates pivot tables like the ones in Excel
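
A minimal pivot table sketch on made-up data. The column names A, B, D are hypothetical, and the averaging of duplicate cells is the pandas default (aggfunc is mean unless another one is passed).

```python
import pandas as pd

# Hypothetical long-format data; ('foo', 'one') appears twice
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar"],
        "B": ["one", "one", "two", "one", "two"],
        "D": [1, 3, 3, 2, 5],
    }
)

# Rows come from A, columns from B, cells hold D.
# Cells with several source rows are averaged by default.
pt = df.pivot_table(values="D", index="A", columns="B")

print(pt.loc["foo", "one"])   # 2.0, the mean of 1 and 3
print(pt.loc["bar", "two"])   # 5.0
```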

Data input and output - libraries to install:
sqlalchemy
lxml
xlrd
html5lib
BeautifulSoup4

CSV
pd.read_csv() - read a csv file
pd.read_html() - read the tables in an html page

df.to_csv(__, index= False) - writes the df to a csv file; index=False leaves the
index out of the file
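
The effect of index=False is easiest to see by looking at the text that gets written. This sketch keeps everything in memory (to_csv returns a string when no path is given, and io.StringIO stands in for a file), so it touches no files; the column names are made up.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path, to_csv returns the csv text instead of writing a file
with_index = df.to_csv()                 # index written as an unnamed first column
without_index = df.to_csv(index=False)   # only the data columns

print(with_index.splitlines()[0])      # ,a,b
print(without_index.splitlines()[0])   # a,b

# Reading the text back; io.StringIO plays the role of the csv file
back = pd.read_csv(io.StringIO(without_index))
```

Reading back a file that was saved with its index gives an extra column called 'Unnamed: 0', which is the usual reason for passing index=False.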

EXCEL
pandas can only read the data in the excel sheets, it cannot import tables or pictures
each excel sheet is read as one dataframe
pd.read_excel(__, sheet_name=__)
df.to_excel(__, sheet_name=__)

HTML
pandas tries to find each table element in the html and convert it to a dataframe;
the result is a list of dataframes
data = pd.read_html(__)

SQL
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('__', engine)
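
A round trip through an in-memory database can be sketched as below. Note the swap: instead of the SQLAlchemy engine used in these notes, this sketch passes a standard-library sqlite3 connection, which pandas also accepts for SQLite. The table name my_table and the data are made up.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# In-memory SQLite database; it disappears when the connection is closed
con = sqlite3.connect(":memory:")

# index=False keeps the DataFrame index out of the table
df.to_sql("my_table", con, index=False)

# Read the table back with a plain SQL query
back = pd.read_sql("SELECT * FROM my_table", con)
con.close()

print(back)
```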

MATPLOTLIB - (https://matplotlib.org/)

%matplotlib inline - shows the plots in the jupyter notebook

Types of Plots:
1. Functional:
plt.plot(x, y)
[plt.xlabel(), plt.ylabel(), plt.title()] - set the axis labels and the title
plt.subplot(rows, columns, number of the plot being referred to) - to create multiple plots
in the same canvas
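
The functional style can be sketched as follows. The data is made up, and the Agg backend line is only an assumption so the sketch also runs without a display; inside a notebook with %matplotlib inline it is not needed.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is required

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

# 1 row, 2 columns, now drawing into plot number 1
plt.subplot(1, 2, 1)
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("y = x squared")

# same grid, now drawing into plot number 2
plt.subplot(1, 2, 2)
plt.plot(y, x)
plt.title("axes swapped")

fig = plt.gcf()  # the figure that the plt.* calls have been drawing on
```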

2. Object-oriented (better way to create)
fig = plt.figure() - creates the figure object
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) - adds axes [left, bottom, width, height], given as fractions of the figure
axes.plot(x,y) - draws the plot
[axes.set_xlabel(), axes.set_ylabel(), axes.set_title()]

fig, axes = plt.subplots(nrows = rows, ncols = columns, figsize = (width, height)) - creates the figure and a grid of axes in one call
plt.tight_layout() - fixes the overlaps in the subplots
dpi - dots per inch/ pixels per inch
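
A sketch of plt.subplots with a loop over the returned axes. The figure size, titles and data are arbitrary choices, and the Agg backend is assumed only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is required

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

# One row, two columns; axes is an array holding one Axes object per plot
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

for ax, power in zip(axes, (2, 3)):
    ax.plot(x, x ** power)
    ax.set_xlabel("x")
    ax.set_title(f"x to the power {power}")

plt.tight_layout()  # spread the subplots so labels do not overlap
```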

fig = plt.figure(figsize = (3,2)) <--- width and height in inches
ax = fig.add_axes([0,0,1,1])
ax.plot(x,y)
fig.savefig('my_picture.png', dpi = 200) <-- save the plot to a file

ax.plot(x, x**2, label = 'X Squared')
ax.plot(x, x**3, label = 'X cubed')
ax.legend() - creates the legend from the label given to each plot
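
Putting the object-oriented calls together: figure, axes, labelled lines, legend and saving. In this sketch the image is saved into an in-memory buffer rather than to 'my_picture.png', purely so that no file is left behind; the axes rectangle and the data are arbitrary, and the Agg backend is assumed for running without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is required

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig = plt.figure(figsize=(3, 2))            # width and height in inches
ax = fig.add_axes([0.15, 0.2, 0.8, 0.7])    # [left, bottom, width, height]

ax.plot(x, x ** 2, label="X Squared")
ax.plot(x, x ** 3, label="X cubed")
legend = ax.legend()                        # one entry per labelled line

# Save as PNG at 200 dots per inch; a file name works in place of the buffer
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=200)
png_bytes = buf.getvalue()
```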

plot appearances
1. color = 'purple' <-- color of the plot
2. linewidth = 0.3 / lw = 0.3 <-- width of the line
3. alpha = 0.5 <-- transparency of the line
4. linestyle = '--' / ls = '--' <-- style of the line
5. marker = 'o' <-- marks each data point on the line (markersize to
specify the size)
6. markerfacecolor = 'yellow' <-- colour of the marker
7. markeredgewidth = 3 <-- changes the width of the marker outline
8. markeredgecolor = 'green' <-- changes the colour of the marker border

ax.set_xlim([0,1]) <-- set the x axis limits
ax.set_ylim([0,2]) <-- set the y axis limits
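
All of the appearance arguments can be passed in a single plot call. The sketch below uses made-up data and sizes, and assumes the Agg backend only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is required

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig, ax = plt.subplots()

# plot returns a list of lines; the trailing comma unpacks the single line
line, = ax.plot(
    x,
    x ** 2,
    color="purple",
    linewidth=2,            # lw=2 is the short form
    alpha=0.5,
    linestyle="--",         # ls="--" is the short form
    marker="o",
    markersize=8,
    markerfacecolor="yellow",
    markeredgewidth=3,
    markeredgecolor="green",
)

# Show only the part of the curve inside these limits
ax.set_xlim([0, 1])
ax.set_ylim([0, 2])
```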
