Python offers several ways to import different file types for data analysis. This cheat sheet covers Excel, CSV, plain text, HDF5, pickled, MATLAB, SAS, Stata, and database files, using NumPy, pandas, h5py, pickle, SciPy, sas7bdat, and SQLAlchemy. NumPy and pandas handle most array and table data, while the other libraries cover specific file types. Once loaded, the data can be explored and queried within Python.
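As a minimal sketch of the NumPy and pandas split described above, the example below reads the same small table both ways. The CSV text, its column names, and the use of io.StringIO in place of a file on disk are assumptions made so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
CSV_TEXT = "country,year,population\nA,1960,1.5\nB,1960,2.5\nC,1960,4.0\n"

# pandas handles mixed column types; the header row becomes the column names.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# NumPy expects one dtype, so skip the header and keep only the numeric columns.
arr = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1, usecols=[1, 2])

print(df.dtypes)
print(arr.shape)
```

With a real file, the StringIO object would simply be replaced by the file name.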
Python For Data Science Cheat Sheet
Importing Data
Learn Python for data science interactively at www.DataCamp.com

Importing Data in Python

Most of the time, you'll use either NumPy or pandas to import your data:

>>> import numpy as np
>>> import pandas as pd

Help

>>> np.info(np.ndarray.dtype)
>>> help(pd.read_csv)

Excel Spreadsheets

>>> file = 'urbanpop.xlsx'
>>> data = pd.ExcelFile(file)
>>> df_sheet2 = data.parse('1960-1966',
...                        skiprows=[0],
...                        names=['Country', 'AAM: War(2002)'])
>>> df_sheet1 = data.parse(0,
...                        usecols=[0],  # parse_cols in older pandas releases
...                        skiprows=[0],
...                        names=['Country'])

To access the sheet names, use the sheet_names attribute:

>>> data.sheet_names

SAS Files

>>> from sas7bdat import SAS7BDAT
>>> with SAS7BDAT('urbanpop.sas7bdat') as file:
...     df_sas = file.to_data_frame()

Pickled Files

>>> import pickle
>>> with open('pickled_fruit.pkl', 'rb') as file:
...     pickled_data = pickle.load(file)

HDF5 Files

>>> import h5py
>>> filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
>>> data = h5py.File(filename, 'r')

Matlab Files

>>> import scipy.io
>>> filename = 'workspace.mat'
>>> mat = scipy.io.loadmat(filename)
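The pickled-file snippet above assumes a file such as pickled_fruit.pkl already exists. A minimal round-trip sketch, using only the standard library, shows both the writing and the reading side; the dictionary contents and the file name fruit_example.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical dictionary; the cheat sheet's pickled_fruit.pkl is not available here.
fruit = {"apples": 3, "pears": 2}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fruit_example.pkl")

    # Serialize the object; pickle files are opened in binary mode ('wb').
    with open(path, "wb") as f:
        pickle.dump(fruit, f)

    # Deserialize it again in binary read mode ('rb'), as in the cheat sheet.
    with open(path, "rb") as f:
        pickled_data = pickle.load(f)

print(pickled_data)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.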
Text Files

Plain Text Files

>>> filename = 'huck_finn.txt'
>>> file = open(filename, mode='r')  # Open the file for reading
>>> text = file.read()               # Read a file's contents
>>> print(file.closed)               # Check whether file is closed
>>> file.close()                     # Close file
>>> print(text)

Using the context manager with

>>> with open('huck_finn.txt', 'r') as file:
...     print(file.readline())  # Read a single line
...     print(file.readline())
...     print(file.readline())

Table Data: Flat Files

Importing Flat Files with numpy

Files with one data type

>>> filename = 'mnist.txt'
>>> data = np.loadtxt(filename,
...                   delimiter=',',   # String used to separate values
...                   skiprows=2,      # Skip the first 2 lines
...                   usecols=[0, 2],  # Read the 1st and 3rd column
...                   dtype=str)       # The type of the resulting array

Files with mixed data types

>>> filename = 'titanic.csv'
>>> data = np.genfromtxt(filename,
...                      delimiter=',',
...                      names=True,   # Look for column header
...                      dtype=None)
>>> data_array = np.recfromcsv(filename)

The default dtype of the np.recfromcsv() function is None. Note that np.recfromcsv() is deprecated as of NumPy 2.0; np.genfromtxt() with delimiter=',' covers the same use.

Importing Flat Files with pandas

>>> filename = 'winequality-red.csv'
>>> data = pd.read_csv(filename,
...                    nrows=5,          # Number of rows of file to read
...                    header=None,      # Row number to use as column names (None: no header row)
...                    sep='\t',         # Delimiter to use
...                    comment='#',      # Character that marks the start of a comment
...                    na_values=[""])   # String to recognize as NA/NaN

Stata Files

>>> data = pd.read_stata('urbanpop.dta')

Relational Databases

>>> from sqlalchemy import create_engine, inspect, text
>>> engine = create_engine('sqlite:///Northwind.sqlite')

To fetch a list of table names, use the inspector (engine.table_names() was removed in SQLAlchemy 2.0):

>>> table_names = inspect(engine).get_table_names()

Querying Relational Databases

>>> con = engine.connect()
>>> rs = con.execute(text("SELECT * FROM Orders"))
>>> df = pd.DataFrame(rs.fetchall())
>>> df.columns = rs.keys()
>>> con.close()

Using the context manager with

>>> with engine.connect() as con:
...     rs = con.execute(text("SELECT OrderID FROM Orders"))
...     df = pd.DataFrame(rs.fetchmany(size=5))
...     df.columns = rs.keys()

Querying relational databases with pandas

>>> df = pd.read_sql_query("SELECT * FROM Orders", engine)

Exploring Your Data

NumPy Arrays

>>> data_array.dtype  # Data type of array elements
>>> data_array.shape  # Array dimensions
>>> len(data_array)   # Length of array

pandas DataFrames

>>> df.head()                 # Return first DataFrame rows
>>> df.tail()                 # Return last DataFrame rows
>>> df.index                  # Describe index
>>> df.columns                # Describe DataFrame columns
>>> df.info()                 # Info on DataFrame
>>> data_array = data.values  # Convert a DataFrame to a NumPy array

Exploring Dictionaries

Accessing Elements with Functions

>>> print(mat.keys())        # Print dictionary keys
>>> for key in data.keys():  # Print dictionary keys
...     print(key)
meta
quality
strain
>>> pickled_data.values()    # Return dictionary values
>>> print(mat.items())       # Return items as (key, value) tuple pairs

Accessing Data Items with Keys

>>> for key in data['meta'].keys():  # Explore the HDF5 structure
...     print(key)
Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
>>> print(data['meta']['Description'][()])  # Retrieve the value for a key (.value in h5py before 3.0)

Navigating Your FileSystem

Magic Commands

!ls     List directory contents of files and directories
%cd ..  Change current working directory
%pwd    Return the current working directory path

os Library

>>> import os
>>> path = "/usr/tmp"
>>> wd = os.getcwd()                     # Store the name of current directory in a string
>>> os.listdir(wd)                       # Output contents of the directory in a list
>>> os.chdir(path)                       # Change current working directory
>>> os.rename("test1.txt", "test2.txt")  # Rename a file
>>> os.remove("test1.txt")               # Delete an existing file
>>> os.mkdir("newdir")                   # Create a new directory
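The relational database workflow above depends on a Northwind.sqlite file and on SQLAlchemy. As a self-contained sketch of the same steps, the example below swaps in the standard-library sqlite3 module with an in-memory database, which pandas also accepts as a connection. The Orders table here is a hypothetical stand-in with made-up rows, not the real Northwind schema.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical Orders table (not the real Northwind schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer TEXT)")
con.executemany(
    "INSERT INTO Orders (OrderID, Customer) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)
con.commit()

# List the table names, the sqlite3 counterpart of inspecting an engine.
tables = [
    row[0]
    for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Manual route: execute, fetch the rows, then attach the column names.
cur = con.execute("SELECT * FROM Orders")
df_manual = pd.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])

# One-step route: let pandas run the query and build the DataFrame.
df = pd.read_sql_query("SELECT * FROM Orders", con)

con.close()
print(tables)
print(df)
```

Both routes produce the same table; the one-step pd.read_sql_query call is usually the shorter choice when the result is going straight into a DataFrame.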