Python Basics
Getting started with Python Cheat Sheet
Learn Python online at www.DataCamp.com

> How to use this cheat sheet

Python is one of the most popular programming languages in data science. It is easy to learn and comes with a wide array of powerful libraries for data analysis. This cheat sheet gives beginners and intermediate users a guide to getting started with Python. Use it to jump-start your journey with Python. If you want more detailed Python cheat sheets, check out the following:

Importing data in python
Data wrangling in pandas

> Accessing help and getting object types

1 + 1 # Everything after the hash symbol is ignored by Python
help(max) # Display the documentation for the max function
type('a') # Get the type of an object, this returns str

> Getting started with lists

A list is an ordered and changeable sequence of elements. It can hold integers, characters, floats, strings, and even objects.

Creating lists

# Create lists with [], elements separated by commas
x = [1, 3, 2]

List functions and methods

sorted(x) # Return a sorted copy of the list e.g., [1,2,3]
x.sort() # Sort the list in-place (replaces x)
reversed(x) # Return an iterator over x in reverse order; list(reversed(x)) gives e.g., [2,3,1]
x.reverse() # Reverse the list in-place
x.count(2) # Count the number of element 2 in the list

Selecting list elements

Python lists are zero-indexed (the first element has index 0). For ranges, the first element is included but the last is not.

# Define the list
x = ['a', 'b', 'c', 'd', 'e']
x[0] # Select the 0th element in the list
x[-1] # Select the last element in the list
x[1:3] # Select 1st (inclusive) to 3rd (exclusive)
x[2:] # Select the 2nd to the end
x[:3] # Select 0th to 3rd (exclusive)

Concatenating lists

# Define the x and y lists
x = [1, 3, 6]
y = [10, 15, 21]
x + y # Returns [1, 3, 6, 10, 15, 21]
3 * x # Returns [1, 3, 6, 1, 3, 6, 1, 3, 6]

> Getting started with characters and strings

# Create a string with double or single quotes
"DataCamp"
# Embed a quote in string with the escape character \
"He said, \"DataCamp\""
# Create multi-line strings with triple quotes
"""
A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning
Citation: https://mdsr-book.github.io/haikus.html
"""
# In the lines below, str is a string variable such as str = "Jack and Jill" (this name shadows Python's built-in str type)
str[0] # Get the character at a specific position
str[0:2] # Get a substring from starting to ending index (exclusive)

Combining and splitting strings

"Data" + "Framed" # Concatenate strings with +, this returns 'DataFramed'
3 * "data " # Repeat strings with *, this returns 'data data data '
"beekeepers".split("e") # Split a string on a delimiter, returns ['b', '', 'k', '', 'p', 'rs']

Mutate strings

# Strings are immutable: each method below returns a new string and leaves str unchanged
str = "Jack and Jill" # Define str
str.upper() # Convert a string to uppercase, returns 'JACK AND JILL'
str.lower() # Convert a string to lowercase, returns 'jack and jill'
str.title() # Convert a string to title case, returns 'Jack And Jill'
str.replace("J", "P") # Replace matches of a substring with another, returns 'Pack and Pill'

> Getting started with dictionaries
A dictionary stores data values in key-value pairs. That is, unlike lists which are indexed by position, dictionaries are indexed by their keys, the names of which must be unique.

Creating dictionaries

# Create a dictionary with {}
{'a': 1, 'b': 4, 'c': 9}

Dictionary functions and methods

x = {'a': 1, 'b': 2, 'c': 3} # Define the x dictionary
x.keys() # Get the keys of a dictionary, returns dict_keys(['a', 'b', 'c'])
x.values() # Get the values of a dictionary, returns dict_values([1, 2, 3])

Selecting dictionary elements

x['a'] # Get a value from a dictionary by specifying the key, returns 1

> Importing packages

Python packages are a collection of useful tools developed by the open-source community. They extend the capabilities of the Python language. To install a new package (for example, pandas), you can go to your command prompt and type in pip install pandas. Once a package is installed, you can import it as follows.

import pandas # Import a package without an alias
import pandas as pd # Import a package with an alias
from pandas import DataFrame # Import an object from a package

> The working directory

The working directory is the default file path that Python reads or saves files into. An example of the working directory is "C://file/path". The os library is needed to set and get the working directory.

import os # Import the operating system package
os.getcwd() # Get the current directory
os.chdir("new/working/directory") # Set the working directory to a new file path

> Getting started with DataFrames

Pandas is a fast and powerful package for data analysis and manipulation in Python. To import the package, you can use import pandas as pd. A pandas DataFrame is a structure that contains two-dimensional data stored as rows and columns. A pandas series is a structure that contains one-dimensional data.

Creating DataFrames

# Create a dataframe from a dictionary (np is NumPy, imported with import numpy as np)
pd.DataFrame({
    'a': [1, 2, 3],
    'b': np.array([4, 4, 6]),
    'c': ['x', 'x', 'y']
})

# Create a dataframe from a list of dictionaries
pd.DataFrame([
    {'a': 1, 'b': 4, 'c': 'x'},
    {'a': 1, 'b': 4, 'c': 'x'},
    {'a': 3, 'b': 6, 'c': 'y'}
])

Selecting DataFrame Elements

Select a row, column or element from a dataframe. Remember: all positions are counted from zero, not one.

# Select the row at position 3 (the fourth row)
df.iloc[3]
# Select one column by name
df['col']
# Select multiple columns by names
df[['col1', 'col2']]
# Select the column at position 2 (the third column)
df.iloc[:, 2]
# Select the element at row position 3, column position 2
df.iloc[3, 2]

Manipulating DataFrames

# Concatenate DataFrames vertically
pd.concat([df, df])
# Concatenate DataFrames horizontally
pd.concat([df, df], axis="columns")
# Get rows matching a condition
df.query('logical_condition')
# Drop columns by name
df.drop(columns=['col_name'])
# Rename columns
df.rename(columns={"oldname": "newname"})
# Add a new column
df.assign(temp_f=9 / 5 * df['temp_c'] + 32)
# Calculate the mean of each column
df.mean()
# Get summary statistics by column
df.agg(aggregation_function)
# Get unique rows
df.drop_duplicates()
# Sort by values in a column
df.sort_values(by='col_name')
# Get rows with largest values in a column
df.nlargest(n, 'col_name')

> NumPy arrays

NumPy is a Python package for scientific computing. It provides multidimensional array objects and efficient operations on them. To import NumPy, you can run this Python code: import numpy as np

Creating arrays

# Convert a python list to a NumPy array
np.array([1, 2, 3]) # Returns array([1, 2, 3])
# Return a sequence from start (inclusive) to end (exclusive)
np.arange(1,5) # Returns array([1, 2, 3, 4])
# Return a stepped sequence from start (inclusive) to end (exclusive)
np.arange(1,5,2) # Returns array([1, 3])
# Repeat each value n times
np.repeat([1, 3, 6], 3) # Returns array([1, 1, 1, 3, 3, 3, 6, 6, 6])
# Repeat the whole array n times
np.tile([1, 3, 6], 3) # Returns array([1, 3, 6, 1, 3, 6, 1, 3, 6])

Math functions and methods

All functions take an array as the input.

np.log(x) # Calculate natural logarithm
np.exp(x) # Calculate exponential
np.max(x) # Get maximum value
np.min(x) # Get minimum value
np.sum(x) # Calculate sum
np.mean(x) # Calculate mean
np.quantile(x, q) # Calculate q-th quantile
np.round(x, n) # Round to n decimal places
np.var(x) # Calculate variance
np.std(x) # Calculate standard deviation

> Operators

Arithmetic operators

102 + 37 # Add two numbers with +
102 - 37 # Subtract a number with -
4 * 6 # Multiply two numbers with *
22 / 7 # Divide a number by another with /
22 // 7 # Integer divide a number with //
3 ** 4 # Raise to the power with ** (in Python, ^ is bitwise XOR)
22 % 7 # Get the remainder after division with %, returns 1

Assignment operators

a = 5 # Assign a value to a
x[0] = 1 # Change the value of an item in a list

Numeric comparison operators

3 == 3 # Test for equality with ==
3 != 3 # Test for inequality with !=
3 > 1 # Test greater than with >
3 >= 3 # Test greater than or equal to with >=
3 < 4 # Test less than with <
3 <= 4 # Test less than or equal to with <=

Logical operators

not (2 == 2) # Logical NOT with not (use ~ for element-wise NOT on NumPy arrays and pandas Series)
(1 != 1) & (1 < 1) # Logical AND with &
(1 >= 1) | (1 < 1) # Logical OR with |
(1 != 1) ^ (1 < 1) # Logical XOR with ^
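
The operator symbols above are easy to mix up, so here is a minimal sketch in plain Python (no packages needed) that contrasts them. The variable names are illustrative only and do not come from the cheat sheet. It assumes ordinary Python integers and booleans; on NumPy arrays and pandas Series, ~, & and | act element by element instead.

```python
# Exponentiation uses **, while ^ is bitwise XOR on integers.
power = 3 ** 4                   # 3 * 3 * 3 * 3
xor = 3 ^ 4                      # 0b011 XOR 0b100

# Integer division and remainder go together.
quotient = 22 // 7
remainder = 22 % 7

# On plain booleans, use the keywords not / and / or.
logical_not = not (2 == 2)
both = (1 != 1) and (1 < 1)
either = (1 >= 1) or (1 < 1)

# ~ is bitwise inversion: applied to the integer 1 it gives -2,
# which Python treats as true, so it is not a logical NOT for scalars.
bitwise_not = ~int(2 == 2)

print(power, xor)                # 81 7
print(quotient, remainder)       # 3 1
print(logical_not, both, either) # False False True
print(bitwise_not)               # -2
```

The keywords not, and, and or are the usual choice for single True/False values, while the symbol forms are the ones to reach for when filtering arrays or DataFrame columns.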