Commonly used file formats in Data Science
Last Updated :
15 Dec, 2022
What is a File Format
A file format defines how a specific type of information is stored in a file. It also tells the computer how to display or process the file's content. Common file formats include CSV, XLSX, ZIP, and TXT.
If you plan to work as a data scientist, you need to understand the different types of file formats. Data science is all about data and its processing, and working with data becomes difficult if you do not understand the format it is stored in. It is therefore important to be familiar with the common file formats.
Different types of file formats:
CSV: CSV stands for comma-separated values. As the name suggests, a CSV file uses commas to separate values. Each line in a CSV file is a data record, and each record consists of one or more fields separated by commas.
Code: Python code to read csv file in pandas
python3
import pandas as pd
df = pd.read_csv("file_path/file_name.csv")
print(df)
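To make the record and field structure described above concrete, here is a minimal sketch that parses a small CSV string with Python's standard csv module instead of pandas. The sample data and variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: a header line followed by two records.
raw = "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n"

# csv.reader splits each line (record) into its comma-separated fields.
rows = list(csv.reader(io.StringIO(raw)))
header, records = rows[0], rows[1:]

# Pair each field with its column name to see the structure of a record.
for record in records:
    print(dict(zip(header, record)))
```

Every field comes back as a string here; pandas goes one step further and infers column types when it reads the same data.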
XLSX: An XLSX file is a Microsoft Excel Open XML Format spreadsheet file. It can store any type of tabular data, but it is mainly used for financial data and for building mathematical models.
Code: Python code to read xlsx file in pandas
python3
import pandas as pd
df = pd.read_excel("file_path/name.xlsx")
print(df)
Note:
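Install openpyxl before reading an .xlsx file in Python to avoid an import error. pandas uses openpyxl for this format; xlrd 2.0 and later reads only the older .xls format. You can install openpyxl using the following command.
pip install openpyxl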
ZIP: ZIP files are used as data containers: they store one or more files in compressed form. The format is widely used on the internet. After you download a ZIP file, you need to unpack its contents in order to use them.
Code: Python code to read zip file in pandas
python3
import pandas as pd
# pandas infers ZIP compression from the extension; the archive must contain exactly one data file
df = pd.read_csv("file_path/file_name.zip")
print(df)
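When an archive holds more than one file, it has to be unpacked, as described above, before the individual files can be read. The following is a minimal sketch using Python's standard zipfile module; the archive is built in memory, and the member name data.csv and its contents are hypothetical.

```python
import io
import zipfile

# Build a small archive in memory so the sketch needs no files on disk.
# The member name "data.csv" and its contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", "id,value\n1,10\n2,20\n")

# Reopen the archive, list its members, and read one back as text.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    text = archive.read("data.csv").decode("utf-8")

print(members)
print(text)
```

For an archive on disk, the same ZipFile object also offers extractall() to unpack every member into a folder.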
TXT: TXT files store information as plain text with no special formatting such as fonts or font styles. They can be opened by any text editor and by most other software programs.
Code: Python code to read txt file in pandas
python3
import pandas as pd
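# read_csv expects comma-separated fields by default; pass sep (for example sep="\t") for other delimiters
df = pd.read_csv("file_path/file_name.txt")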
print(df)
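A text file that is not organised into delimited columns can also be read line by line with plain Python. This is a minimal sketch; the file name notes.txt and its contents are made up, and the file is written to a temporary directory so the example is self-contained.

```python
import os
import tempfile

# Hypothetical content for a small plain-text file.
lines_out = ["first line", "second line", "third line"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")

    # Write the lines; a TXT file holds only characters and line breaks.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines_out) + "\n")

    # Read the file back and split it into lines again.
    with open(path, encoding="utf-8") as handle:
        lines_in = handle.read().splitlines()

print(lines_in)
```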
JSON: JSON stands for JavaScript Object Notation. It is a standard text-based format for representing structured data, based on JavaScript object syntax.
Code: Python code to read json file in pandas
python3
import pandas as pd
df = pd.read_json("file_path/file_name.json")
print(df)
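To illustrate what "structured data" means here, this minimal sketch parses a small JSON document with Python's standard json module. The sample records are hypothetical; nested objects like the one shown may need flattening before they fit neatly into a DataFrame.

```python
import json

# Hypothetical JSON text: an array of objects, one of them with a nested object.
raw = """
[
  {"id": 1, "name": "Asha", "tags": ["a", "b"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Delhi"}}
]
"""

# json.loads maps JSON arrays to Python lists and JSON objects to dicts.
records = json.loads(raw)

print(type(records).__name__, len(records))
print(records[1]["address"]["city"])
```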
HTML: HTML stands for Hyper Text Markup Language and is used for creating web pages. We can read the tables in an HTML file with pandas using the read_html() function.
Code: Python code to read html file in pandas
python3
import pandas as pd
# read_html returns a list of DataFrames, one per table found in the file
tables = pd.read_html("file_path/file_name.html")
print(tables[0])
Note:
pandas needs an HTML parser to handle files with the '.html' extension: install lxml, or html5lib together with beautifulsoup4.
pip install lxml
pip install html5lib
pip install beautifulsoup4
PDF: PDF stands for Portable Document Format. This file format is used when we need to save files that are not meant to be easily modified but still need to be easy to share and view.
Code: Python code to read pdf in pandas
python3
import tabula
# read_pdf returns a list of DataFrames, one per table detected (first page only by default)
tables = tabula.read_pdf("file_path/file_name.pdf")
print(tables)
Note:
You need to install the "tabula-py" package to read tables from files with the '.pdf' extension. tabula-py also requires a Java runtime to be installed on the machine.
pip install tabula-py