2.1 Importing Python Data

The document is a cheat sheet for importing various data types in Python, including Excel, pickled files, HDF5, SAS, Matlab, Stata, and relational databases using libraries like pandas, NumPy, and SQLAlchemy. It provides code snippets for reading data from different file formats and accessing their contents. Additionally, it includes tips for navigating the filesystem and using context managers for file operations.

Python For Data Science Cheat Sheet
Importing Data
Learn Python for Data Science Interactively

Importing Data in Python
Most of the time, you'll use either NumPy or pandas to import your data:

>>> import numpy as np
>>> import pandas as pd

Help
>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Excel Spreadsheets
>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
                           skiprows=[0],
                           names=['Country',
                                  'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
                           usecols=[0],
                           skiprows=[0],
                           names=['Country'])

To access the sheet names, use the sheet_names attribute:
>>> data.sheet_names

Pickled Files
>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
        pickled_data = pickle.load(file)

HDF5 Files
>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')

SAS Files
>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
        df_sas = file.to_data_frame()

Matlab Files
>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)
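The pickle pattern above assumes `pickled_fruit.pkl` already exists on disk. A minimal self-contained round trip looks like this (the dictionary contents and filename are illustrative, not from the original data):

```python
import os
import pickle
import tempfile

# A small dictionary to pickle (contents are illustrative).
fruit = {"apples": 3, "bananas": 5}
path = os.path.join(tempfile.mkdtemp(), "pickled_fruit.pkl")

# Write the object out in binary mode...
with open(path, "wb") as file:
    pickle.dump(fruit, file)

# ...and load it back, mirroring the cheat sheet's pattern.
with open(path, "rb") as file:
    pickled_data = pickle.load(file)

print(pickled_data)  # {'apples': 3, 'bananas': 5}
```

Note that pickle files must be opened in binary mode (`'rb'`/`'wb'`), not text mode.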
Text Files
Plain Text Files
>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')   Open the file for reading
>>> text = file.read()                Read a file's contents
>>> print(file.closed)                Check whether file is closed
>>> file.close()                      Close file
>>> print(text)

Using the context manager with
>>> with open('huck_finn.txt', 'r') as file:
        print(file.readline())        Read a single line
        print(file.readline())
        print(file.readline())

Stata Files
>>> data = pd.read_stata('urbanpop.dta')

Relational Databases
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///Northwind.sqlite')

Use the table_names() method to fetch a list of table names:
>>> table_names = engine.table_names()

Querying Relational Databases
>>> con = engine.connect()
>>> rs = con.execute("SELECT * FROM Orders")
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()

Using the context manager with
>>> with engine.connect() as con:
        rs = con.execute("SELECT OrderID FROM Orders")
        df = pd.DataFrame(rs.fetchmany(size=5))
        df.columns = rs.keys()

Exploring Dictionaries
Accessing Elements with Functions
>>> print(mat.keys())        Print dictionary keys
>>> for key in data.keys():  Print dictionary keys
        print(key)
    meta
    quality
    strain
>>> pickled_data.values()    Return dictionary values
>>> print(mat.items())       Returns items in list format of (key, value) tuple pairs

Accessing Data Items with Keys
>>> for key in data['meta'].keys():   Explore the HDF5 structure
        print(key)
    Description
    DescriptionURL
    Detector
    Duration
    GPSstart
    Observatory
    Type
    UTCstart
>>> print(data['meta']['Description'][()])   Retrieve the value for a key
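The relational-database pattern above assumes a `Northwind.sqlite` file on disk and SQLAlchemy installed. For SQLite, `pd.read_sql_query` also accepts a standard-library `sqlite3` connection, so a self-contained sketch (table name and rows are illustrative) is:

```python
import sqlite3

import pandas as pd

# A throwaway in-memory database standing in for Northwind
# (the table name and rows are illustrative).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER, Customer TEXT)")
con.executemany("INSERT INTO Orders VALUES (?, ?)",
                [(1, "Alfreds"), (2, "Ana")])
con.commit()

# The same query pattern as above, without needing SQLAlchemy.
df = pd.read_sql_query("SELECT * FROM Orders", con)
con.close()

print(df.shape)  # (2, 2)
```

For non-SQLite databases, pandas still expects a SQLAlchemy engine or connection string.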

Table Data: Flat Files
Importing Flat Files with numpy
Files with one data type
>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
                      delimiter=',',   String used to separate values
                      skiprows=2,      Skip the first 2 lines
                      usecols=[0,2],   Read the 1st and 3rd column
                      dtype=str)       The type of the resulting array

Files with mixed data types
>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
                         delimiter=',',
                         names=True,   Look for column header
                         dtype=None)
>>> data_array = np.recfromcsv(filename)

Importing Flat Files with pandas
>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
                       nrows=5,        Number of rows of file to read
                       header=None,    Row number to use as col names
                       sep='\t',       Delimiter to use
                       comment='#',    Character marking comments
                       na_values=[""]) String to recognize as NA/NaN

Querying relational databases with pandas
>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Exploring Your Data
NumPy Arrays
>>> data_array.dtype    Data type of array elements
>>> data_array.shape    Array dimensions
>>> len(data_array)     Length of array

pandas DataFrames
>>> df.head()                Return first DataFrame rows
>>> df.tail()                Return last DataFrame rows
>>> df.index                 Describe index
>>> df.columns               Describe DataFrame columns
>>> df.info()                Info on DataFrame
>>> data_array = data.values Convert a DataFrame to a NumPy array

Navigating Your FileSystem
Magic Commands
!ls       List directory contents of files and directories
%cd ..    Change current working directory
%pwd      Return the current working directory path

os Library
>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()          Store the name of current directory in a string
>>> os.listdir(wd)            Output contents of the directory in a list
>>> os.chdir(path)            Change current working directory
>>> os.rename("test1.txt",    Rename a file
              "test2.txt")
>>> os.remove("test1.txt")    Delete an existing file
>>> os.mkdir("newdir")        Create a new directory
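The flat-file snippets above assume files such as `mnist.txt` sit on disk. `np.loadtxt` also reads from any file-like object, so the same keyword arguments can be tried on inline data (the contents below are illustrative):

```python
import io

import numpy as np

# An inline "flat file" standing in for mnist.txt (contents illustrative):
# two comment lines, then comma-separated rows.
flat_file = io.StringIO("# header line\n# units\n1,2,3\n4,5,6\n")

# Same keywords as the loadtxt call above: skiprows drops the two
# leading lines, usecols keeps the 1st and 3rd columns as strings.
data = np.loadtxt(flat_file,
                  delimiter=",",
                  skiprows=2,
                  usecols=[0, 2],
                  dtype=str)

print(data.shape)  # (2, 2)
```

Wrapping sample text in `io.StringIO` like this is a quick way to test delimiter and column settings before pointing the call at a real file.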
