How to merge multiple Excel files into a single file with Python?
Last Updated :
07 Mar, 2022
When working with Excel files, we often come across a scenario where we need to merge multiple Excel files into one. The traditional method is a VBA macro inside Excel, which does the job but is a multi-step process and not easy to understand. Another method is manually copying the contents of long Excel files into one, which is time-consuming, tedious, and error-prone.
This task can be done quickly with a few lines of Python using the Pandas module. First, we need to install the module with pip. Pandas reads and writes .xlsx files through the openpyxl package, so install that as well if it is not already present.
Use the following command in the terminal:
pip install pandas openpyxl
Method 1: Using dataframe.append()
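The Pandas DataFrame.append() function appends the rows of another dataframe to the end of the given dataframe and returns a new dataframe object. Columns that are not in the original dataframe are added as new columns, and the new cells are populated with NaN. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so this method only works on older versions; on current versions, use pandas.concat() (Method 2).
Syntax : DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)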
Parameters :
- other : DataFrame or Series/dict-like object, or list of these
- ignore_index : If True, do not use the index labels. default False.
- verify_integrity : If True, raise ValueError on creating index with duplicates. default False.
- sort : Sort columns if the columns of self and other are not aligned. default False.
Returns: appended DataFrame
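To make the NaN-filling behaviour described above concrete, here is a minimal sketch on two small, made-up dataframes (the column names and values are assumptions for illustration only). Because DataFrame.append() is not available in pandas 2.0 and later, the sketch uses pd.concat(), which should produce the same row-wise result.

```python
import pandas as pd

# Two small, made-up frames with partly different columns.
first = pd.DataFrame({"item": ["apple", "bread"], "qty": [3, 1]})
second = pd.DataFrame({"item": ["milk"], "price": [0.99]})

# Row-wise stacking; a column missing from one frame is filled with NaN.
# On pandas < 2.0, first.append(second, ignore_index=True) gives the same rows.
stacked = pd.concat([first, second], ignore_index=True)

print(stacked)
```

The "price" cells for the rows taken from `first` and the "qty" cell for the row taken from `second` come out as NaN, and ignore_index=True renumbers the rows from 0.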
Example:
Excel Used: FoodSales1-1, FoodSales2-1
Python3
# importing the required modules
import glob
import pandas as pd

# specifying the path to the Excel files
path = "C:/downloads"

# Excel files in the path
file_list = glob.glob(path + "/*.xlsx")

# list of Excel files we want to merge.
# pd.read_excel(file_path) reads the Excel
# data into a pandas dataframe.
excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

# create a new dataframe to store the
# merged Excel data.
excl_merged = pd.DataFrame()

for excl_file in excl_list:
    # appends the data to the excl_merged dataframe.
    # DataFrame.append() requires pandas < 2.0.
    excl_merged = excl_merged.append(
        excl_file, ignore_index=True)

# exports the dataframe to an Excel file with
# the specified name.
excl_merged.to_excel('total_food_sales.xlsx', index=False)
Output :
'total_food_sales.xlsx'
Method 2: Using pandas.concat()
The pandas.concat() function concatenates Pandas objects along an axis, with optional set logic (union or intersection) applied to the indexes, if any, on the other axes.
Syntax: concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
Parameters:
- objs: Series or DataFrame objects
- axis: axis to concatenate along; default = 0 (along rows)
- join: way to handle indexes on other axis; default = ‘outer’
- ignore_index: if True, do not use the index values along the concatenation axis; default = False
- keys: sequence to add an identifier to the result indexes; default = None
- levels: specific levels (unique values) to use for constructing a MultiIndex; default = None
- names: names for the levels in the resulting hierarchical index; default = None
- verify_integrity: check whether the new concatenated axis contains duplicates; default = False
- sort: sort non-concatenation axis if it is not already aligned when join is ‘outer’; default = False
- copy: if False, do not copy data unnecessarily; default = True
Returns: a pandas dataframe with concatenated data.
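The effect of the join, ignore_index, keys and names parameters is easier to see on a tiny example. The sketch below uses two made-up dataframes (the column names, values and bank labels are assumptions, not the data from the files used later).

```python
import pandas as pd

# Made-up data: the second frame has an extra "volume" column.
a = pd.DataFrame({"date": ["2022-01-03", "2022-01-04"], "close": [10.0, 10.5]})
b = pd.DataFrame({"date": ["2022-01-03"], "close": [20.0], "volume": [1500]})

# join="outer" (the default) keeps the union of columns; gaps become NaN.
outer = pd.concat([a, b], ignore_index=True)

# join="inner" keeps only the columns present in every frame.
inner = pd.concat([a, b], join="inner", ignore_index=True)

# keys labels each input frame, producing a MultiIndex of
# (label, original row index); names sets the level names.
labelled = pd.concat([a, b], keys=["BankA", "BankB"], names=["bank", "row"])

print(outer.columns.tolist())
print(inner.columns.tolist())
print(labelled.loc["BankB"])
```

Using keys is one way to keep track of which input each row came from after the merge, at the cost of a two-level index.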
Example:
In the last example, we worked on only two Excel files with a few rows. Let's try merging more files, each containing approximately 5000 rows and 7 columns. We have 5 files, BankE, BankD, BankC, BankB and BankA, holding historical stock data for the respective banks. Let's merge them into a single 'Bank_Stocks.xlsx' file. Here we are using the pandas.concat() method.
Python3
# importing the required modules
import glob
import pandas as pd

# specifying the path to the Excel files
path = "C:/downloads"

# Excel files in the path
file_list = glob.glob(path + "/*.xlsx")

# list of Excel files we want to merge.
# pd.read_excel(file_path) reads the
# Excel data into a pandas dataframe.
excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

# concatenate all DataFrames in the list
# into a single DataFrame, returns a new
# DataFrame.
excl_merged = pd.concat(excl_list, ignore_index=True)

# exports the dataframe to an Excel file
# with the specified name.
excl_merged.to_excel('Bank_Stocks.xlsx', index=False)
Output :
Bank_Stocks.xlsx
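Once the rows are merged, nothing in the output says which bank file a row came from. One possible variation, sketched below, records the source file name in an extra column. The helper name `merge_files`, its `reader` parameter and the "source_file" column are hypothetical names chosen for this illustration; they are not part of pandas.

```python
from pathlib import Path

import pandas as pd


def merge_files(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Read every matching file in `folder` and stack the rows.

    `merge_files`, `reader` and the "source_file" column are
    illustrative names, not part of pandas.
    """
    frames = []
    # sorted() keeps the row order reproducible between runs.
    for file_path in sorted(Path(folder).glob(pattern)):
        frame = reader(file_path)
        # Remember which file each row came from.
        frame["source_file"] = file_path.name
        frames.append(frame)
    # pd.concat() raises a ValueError on an empty list, so fail
    # earlier with a clearer message when nothing matched.
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    return pd.concat(frames, ignore_index=True)
```

With the folder from the example above, `merge_files("C:/downloads").to_excel("Bank_Stocks.xlsx", index=False)` would write the merged data. If the output file is saved into the same folder that is being read, a later run would pick it up as an input, so writing it elsewhere is the safer choice.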