EDS - Python Cheat Sheet
Importing Data
Any kind of data analysis starts with getting hold of some data. Pandas gives you plenty of options for
getting data into your Python workbook:
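As a minimal sketch of a few of those options, the snippet below loads the same small table three ways. The CSV and JSON strings and the column names are made up for illustration; in practice you would usually pass a file path or URL to `pd.read_csv` or `pd.read_json` instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "name,score\nana,0.9\nben,0.4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON records standing in for a .json file or API response
json_text = '[{"name": "ana", "score": 0.9}, {"name": "ben", "score": 0.4}]'
df_json = pd.read_json(io.StringIO(json_text))

# Build a dataframe directly from a Python dict (keys become column names)
df_dict = pd.DataFrame({"name": ["ana", "ben"], "score": [0.9, 0.4]})

print(df_csv)
```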
Exploring Data
Once you have imported your data into a Pandas dataframe, you can use these methods to get a sense of
what the data looks like:
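The sketch below shows a handful of commonly used inspection calls on a small, made-up dataframe (the `city` and `temp` columns are illustrative only).

```python
import pandas as pd

# Small illustrative dataframe; the values are invented for this example
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Kyiv"],
    "temp": [3.0, 19.5, 4.5, 8.0],
})

first_rows = df.head(2)              # First 2 rows
n_rows, n_cols = df.shape            # Number of rows and columns
summary = df.describe()              # Summary statistics for numeric columns
counts = df["city"].value_counts()   # Unique values and how often each occurs
df.info()                            # Prints index, dtype and memory information
```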
Selecting
Often, you might need to select a single element or a certain subset of the data to inspect it or perform
further analysis. These methods will come in handy:
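A minimal sketch of the usual selection patterns follows, using an invented dataframe with a string index so that label-based (`loc`) and position-based (`iloc`) access are easy to tell apart.

```python
import pandas as pd

# Illustrative data; the index labels "a", "b", "c" are arbitrary
df = pd.DataFrame(
    {"name": ["ana", "ben", "cy"], "age": [31, 25, 47]},
    index=["a", "b", "c"],
)

ages = df["age"]              # One column, returned as a Series
subset = df[["name", "age"]]  # Several columns, returned as a new dataframe
first_row = df.iloc[0]        # Row selected by position
row_b = df.loc["b"]           # Row selected by index label
cell = df.iloc[0, 1]          # Single element: first row, second column
```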
Data Cleaning
If you’re working with real-world data, chances are you’ll need to clean it up. These are some helpful
methods:
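As a rough illustration, the snippet below counts missing values, drops or fills them, renames a column, and converts a type. The dataframe and the new column name `amount` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Illustrative dataframe with one missing value in each column
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", "y", None]})

missing_per_col = df.isnull().sum()           # Count of missing values per column
dropped = df.dropna()                         # Drop every row that has a missing value
filled = df.fillna({"A": df["A"].mean()})     # Fill missing A values with the column mean
renamed = df.rename(columns={"A": "amount"})  # Rename a column
as_int = dropped["A"].astype(int)             # Convert a column to another dtype
```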
In [ ]: df[df[col] > 0.5] # Rows where the col column is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] # Rows where 0.5 < col < 0.7
df.sort_values(col1) # Sorts values by col1 in ascending order
df.sort_values(col2, ascending=False) # Sorts values by col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]) # Sorts by col1 ascending, then by col2 descending
df.groupby(col) # Returns a groupby object for values from one column
df.groupby([col1, col2]) # Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() # Mean of the values in col2, grouped by the values in col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc='mean') # Pivot table grouped by col1 holding the means of col2 and col3
df.groupby(col1).agg('mean') # Average of the remaining columns (which must be numeric) for each unique value in col1
df.apply(np.mean) # Applies a function to each column
df.apply(np.max, axis=1) # Applies a function to each row
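To make the filter, sort, and group calls above concrete, here is a small sketch on invented data, with `team` and `score` standing in for the `col` placeholders.

```python
import pandas as pd

# Illustrative data; "team" and "score" are hypothetical column names
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [0.6, 0.9, 0.3, 0.55],
})

# Boolean filtering: keep rows where 0.5 < score < 0.7
in_range = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by score, highest first
ranked = df.sort_values("score", ascending=False)

# Mean score per team, as a Series indexed by team
team_means = df.groupby("team")["score"].mean()

# The same aggregation expressed as a pivot table
pivot = df.pivot_table(index="team", values="score", aggfunc="mean")
```

Each condition in a combined filter needs its own parentheses, because `&` binds more tightly than the comparison operators.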
In [ ]: pd.concat([df1, df2]) # Adds the rows of df2 to the end of df1 (columns should be identical); replaces df1.append(df2), which was removed in pandas 2.0
pd.concat([df1, df2], axis=1) # Adds the columns of df2 to the end of df1 (rows should be identical)
df1.join(df2, on=col1, how='inner') # SQL-style join matching col1 of df1 against the index of df2; how can be 'left', 'right', 'outer' or 'inner'
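The sketch below runs the three ways of combining dataframes on tiny, made-up tables. The names `extra`, `lookup`, and `other` are hypothetical helpers introduced only for this example. Note that `join` with `on=` matches a column of the left dataframe against the index of the right one, whereas `merge` matches column to column.

```python
import pandas as pd

# Illustrative inputs
df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "x": [3, 4]})

# Stack rows: df2 goes below df1, with a fresh 0..n-1 index
stacked = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side, aligned on the row index
extra = pd.DataFrame({"y": [10, 20]})
side_by_side = pd.concat([df1, extra], axis=1)

# join: match df1["key"] against the index of lookup, keeping only matches
lookup = pd.DataFrame({"label": ["alpha", "gamma"]}, index=["a", "c"])
joined = df1.join(lookup, on="key", how="inner")

# merge: match on a column that exists in both dataframes
other = pd.DataFrame({"key": ["a", "c"], "z": [True, False]})
merged = df1.merge(other, on="key", how="inner")
```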
Writing Data
And finally, when you have produced results with your analysis, there are several ways you can export your
data:
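A minimal sketch of two common export formats follows. To keep it self-contained it writes to strings and an in-memory buffer; passing a file path (for example a hypothetical `results.csv`) as the first argument would write to disk instead.

```python
import io
import json

import pandas as pd

# Illustrative results to export
df = pd.DataFrame({"name": ["ana", "ben"], "score": [0.9, 0.4]})

# With no path given, to_csv returns the CSV text; index=False omits the row index
csv_text = df.to_csv(index=False)

# to_json likewise returns a string; orient="records" gives a list of row objects
json_text = df.to_json(orient="records")
records = json.loads(json_text)

# Any file-like object works as a target too
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the CSV text back recovers the same table
round_trip = pd.read_csv(io.StringIO(csv_text))
```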
Machine Learning
The Scikit-Learn library contains useful methods for training and applying machine learning models. Our
Scikit-Learn tutorial provides more context for the code below.
For a complete list of the Supervised Learning, Unsupervised Learning, Dataset Transformation, and
Model Evaluation modules in Scikit-Learn, please refer to its user guide.
clf.fit(X_train, y_train)
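For context, here is one way the `clf.fit(X_train, y_train)` call might sit in a complete workflow. This is only a sketch: the iris dataset, the decision tree classifier, and the 80/20 split are illustrative choices and are not taken from the original cheat sheet.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (assumed here purely for illustration)
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Any Scikit-Learn estimator exposes the same fit/predict interface
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Predict labels for unseen rows and score them against the true labels
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in a different model generally means changing only the line that constructs `clf`.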
Conclusion
We’ve barely scratched the surface of what you can do with Python and data science, but we hope
this cheatsheet has given you a taste of what is possible!
This post was kindly provided by our friend Kara Tan. Kara is a cofounder of Altitude Labs, a full-service app
design and development agency that specializes in data-driven design and personalization.