How to merge multiple Microsoft Excel files into a single workbook with Python

How to merge multiple Microsoft Excel files into a single workbook with Python. Combining multiple excel sheets into one workbook has never been this easy, now with the use of python just with a few lines of code it can be achieved.

The python programming language is an invaluable addition to the tech space, especially when you want to work with structured data. Since data is mostly stored in Excel files, it is compulsory to consolidate multiple files to save time and effort.

Normally, we’re working with Excel files, and we surely have come across a scenario where we need to merge multiple Excel files into one. The traditional method has always been using a VBA code inside excel which does the job but is a multi-step process and is not so easy to understand. Another method is manually copying long Excel files into one which is not only time-consume, troublesome but also error-prone. 

Python’s Pandas is an excellent tool for beginners and advanced users alike. The library is used extensively by developers who want to master Python. Even if you are a beginner, you can benefit immensely by learning the nuances of Pandas and how the library is used within Python.

In this guide, you will need to install and use the Pandas libraries to import data into Python before merging.

Install Pandas Libraries in Python

Pandas is a third-party library that you can install in Python. Some IDEs already have Pandas installed in them. If you are using an IDE version that doesn’t come with pre-installed Pandas, rest assured, that you can install it directly in Python.

Here’s how to install Pandas:

pip install pandas

If you’re using Jupyter Notebook, you can install Pandas directly with the PIP command. Mostly, when you have installed Jupyter with Anaconda, there are high chances of already having Pandas available for direct use.

If you cannot call Pandas, you can use the above command to install them directly.

How to simply combine Excel files with Python

First, you need to create a folder in your preferred location with all the Excel files. Once the folder is ready, you can start writing the code to import the libraries.

You will use two variables in this code:

  1. Pandas: The Pandas library provides the data frames to store Excel files.
  2. OS: The library is beneficial for reading data from your machine’s folder

To import these libraries, use these commands:

Import Pandas as pd
Import OS
  • Import: Python syntax used to import the libraries in Python
  • Pandas: Name of the library
  • pd: Alias given to the library
  • OS: A library to access the system folder

Once you have imported the libraries, create two variables to store the input and output file path. The input file path is needed to access the files’ folder. The output file path is necessary as the combined file will be exported there.

If you are using Python, make sure you change the backslash to forward-slash (\ to /)

input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"
output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"

Append the / at the end as well to complete the paths.

The folder’s files are available in a list. Create a list to store all the file references of the input folder using the listdir function from the OS library.

If you are unsure of the functions available within a library, you can use the dir function with the library name. For example, to check the exact version of the listdir function, you can use the command as follows:

dir(OS)

The output will consist of all the associated functions available within the OS library. The listdir function is one of the many functions available within this library.

Create a new variable to store the input files from the folder.

excel_file_list = os.listdir(input_file_path)

Print this variable to see the names of the files stored within the folder. All files stored within the folder are displayed once you use the print function.

print (excel_file_list)

Next, you need to add a new data frame to store each Excel file. Imagine a data frame as a container for storing data. Here’s the command for creating a data frame.

df = pd.DataFrame()
  • df: Variable to store the value of the DataFrame
  • pd: Alias for the Pandas library
  • DataFrame: Default syntax for adding a data frame

The input folder has three .xlsx files in this example. The file names are:

File1_excel.xlsx
File2_excel.xlsx
File3_excel.xlsx

To open each file from this folder, you need to run a loop. The loop will run for each of the files in the list created above.

Here’s how you can do it:

for excel_files in excel_file_list:

Next, it’s necessary to check the extensions of the files since the code will open XLSX files only. To check these files, you can use an If statement.

Use the endswith function for this purpose, as follows:

for excel_files in excel_file_list:

if excel_files.endswith(".xlsx"):
  • excel_files: List with all the file values
  • endswith: Function to check the extension of the files
  • (“.xlsx”): This string value can change, depending on what you want to search for

Now that you’ve identified the Excel files, you can create a new data frame to read and store the files individually.

for excel_files in excel_file_list:

if excel_files.endswith(".xlsx"):

df1 = pd.read_excel(input_file_path+excel_files)
  • df1: New data frame
  • pd: Pandas library
  • read_excel: Function to read Excel files within the Pandas library
  • input_file_path: Path of the folder where the files are stored
  • excel_files: Any variable which is used in the for loop

To start appending the files, you need to use the append function.

for excel_files in excel_file_list:

if excel_files.endswith(".xlsx"):

df1 = pd.read_excel(input_file_path+excel_files)
df = df.append(df1)

Finally, now that the consolidated data frame is ready, you can export it to the output location. In this case, you are exporting the data frame to an XLSX file.

df.to_excel(output_file_path+"Consolidated_file.xlsx")
  • df: Dataframe to export
  • to_excel: Command used to export the data
  • output_file_path: Path defined for storing the output
  • Consolidated_file.xlsx: Name of the consolidated file

Now, let’s look at the final code:

#Pandas is used as a dataframe to handle Excel files
import pandas as pd
import os

# change the slash from “\” to “/”, if you are using Windows devices

input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"
output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"

#create a list to store all the file references of the input folder using the listdir function from the os library.
#To see the contents of a library (like the listdir function, you can use the dir function on the library name).
#Use dir(library_name) to list contents

excel_file_list = os.listdir(input_file_path)

#print all the files stored in the folder, after defining the list
excel_file_list


#Once each file opens, use the append function to start consolidating the data stored in multiple files

#create a new, blank dataframe, to handle the excel file imports
df = pd.DataFrame()

#Run a for loop to loop through each file in the list
for excel_files in excel_file_list:
#check for .xlsx suffix files only
if excel_files.endswith(".xlsx"):
#create a new dataframe to read/open each Excel file from the list of files created above
df1 = pd.read_excel(input_file_path+excel_files)
#append each file into the original empty dataframe
df = df.append(df1)

#transfer final output to an Excel (xlsx) file on the output path
df.to_excel(output_file_path+"Consolidated_file.xlsx")

Leave A Reply

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.

AllEscort