Working with Excel files in Python is not that much hard as you might think. In this tutorial, we are going to learn how to create, read and modify .xlsx files using python.
Introduction
Xlsx files are the most widely used documents in the technology field. Data Scientists uses spreadsheets more than anyone else in the world and obivously they don't do it manually.
We will need a module called openpyxl which is used to read, create and work with .xlsx files in python. There are some other modules like xlsxwriter, xlrd, xlwt, etc., but, they don't have methods for performing all the operations on excel files. To install the openpyxl module run the following command in command line:
pip install openpyxl
Let's see what all operations one can perform using the openpyxl module after importing the module in our code, which is simple:
import openpyxl
Once we have imported the module in our code, all we have to do is use various methods of the module to rad, write and create .xlsx files.
Creating New .xlsx File
In this program, we are going to create a new .xlsx file.
import openpyxl
## CREATING XLSX FILE
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating an active sheet to enter data
sheet = xlsx.active
## entering data into the A1 and B1 cells in the sheet
sheet['A1'] = 'Studytonight'
sheet['B1'] = 'A Programming Site'
## saving the xlsx file using 'save' method
xlsx.save('sample.xlsx')
The above program creates an .xlsx file with the name sample.xlsx in the present working directory.
Writing to a Cell
There are to ways to write to a cell. The first method is the one which we used in the program above and the second method is using the cell()
method by passing the row and column numbers.
Let's see how the second way works:
import openpyxl
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating an active sheet to enter data
sheet = xlsx.active
## entering data into the cells using 1st method
sheet['A1'] = 'Studytonight'
sheet['B2'] = 'Cell B2'
## entering data into the cells using 2nd method
sheet.cell(row = 1, column = 2).value = 'A Programming Site'
sheet.cell(row = 2, column = 1).value = "B1"
## saving the xlsx file using 'save' method
xlsx.save('write_to_cell.xlsx')
In the second method above, we are getting the cell
with the row and column values. After getting the cell, we are assigning a value to it using the value
variable.
Appending Data to a .xlsx file
The append()
method is used to append the data to any cell. Here is an example:
import openpyxl
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating an active sheet to enter data
sheet = xlsx.active
## creating data to append
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]
]
## appending row by row to the sheet
for row in data:
## append method is used to append the data to a cell
sheet.append(row)
## saving the xlsx file using 'save' method
xlsx.save('appending.xlsx')
Using the above program, we have appended 4 rows and 3 column values in our .xlsx file. You can also use tuples
or any iteratable object instead of lists.
Reading Data from a Cell
We are now going to learn how to read data from a cell in a xlsx file. We will use the previously created .xlsx file to read data from the cell.
import openpyxl
## opening the previously created xlsx file using 'load_workbook()' method
xlsx = openpyxl.load_workbook('sample.xlsx')
## getting the sheet to active
sheet = xlsx.active
## getting the reference of the cells which we want to get the data from
name = sheet['A1']
tag = sheet.cell(row = 1, column = 2)
## printing the values of cells
print(name.value)
print(tag.value)
Output of above Program:
Studytonight
A Programming Site
Reading Data from Multiple Cells
Now we are going to use the appending.xlsx file to read data. It contains numeric values from 1 to 12 saved in the cells in form of 4 rows and 3 columns.
import openpyxl
## opening the previously created xlsx file using 'load_workbook()' method
xlsx = openpyxl.load_workbook('appending.xlsx')
## getting the sheet to active
sheet = xlsx.active
## getting the reference of the cells which we want to get the data from
values = sheet['A1' : 'C4']
## printing the values of cells
for c1, c2, c3 in values:
print("{} {} {}".format(c1.value, c2.value, c3.value))
Output of above Program:
1 2 3
4 5 6
7 8 9
10 11 12
Slicing method applied on cells returns a tuple
containing each row as a tuple. And we can print all the cell's data using a loop.
Getting Dimensions of an .xlsx Sheet
Getting the dimensions of an .xlsx sheet is also possible and super-easy using the dimensions
method.
import openpyxl
## opening the previously created xlsx file using 'load_workbook()' method
xlsx = openpyxl.load_workbook('appending.xlsx')
## getting the sheet to active
sheet = xlsx.active
## getting the reference of the cells which we want to get the data from
dimensions = sheet.dimensions
## printing the dimensions of the sheet
print(dimensions)
Output of above Program:
A1:C4
The output of the dimensions
method is the range of the sheet from which cell to which cell the data is present.
Getting Data from Rows of .xlsx File
We can also get data from all the rows of an xlsx file by using the rows
method.
import openpyxl
## opening the previously created xlsx file using 'load_workbook()' method
xlsx = openpyxl.load_workbook('appending.xlsx')
## getting the sheet to active
sheet = xlsx.active
## getting the reference of the cells which we want to get the data from
rows = sheet.rows
## printing the values of cells using rows
for row in rows:
for cell in row:
print(cell.value, end = ' ')
print("\n")
Output of above Program:
1 2 3
4 5 6
7 8 9
10 11 12
rows
method returns a generator
which contains all rows of the sheet.
Getting Data from Columns of .xlsx File
We can get data from all the columns of an xlsx file by using the columns
method.
import openpyxl
## opening the previously created xlsx file using 'load_workbook()' method
xlsx = openpyxl.load_workbook('appending.xlsx')
## getting the sheet to active
sheet = xlsx.active
## getting the reference of the cells which we want to get the data from
columns = sheet.columns
## printing the values of cells using rows
for column in columns:
for cell in column:
print(cell.value, end = ' ')
print("\n")
Output of above Program:
1 4 7 10
2 5 8 11
3 6 9 12
columns
method returns a generator
which contains all the columns of the sheet.
Working with Excel Sheets
In this section we will see how we can create more sheets, get name of sheets and even change the name of any given sheet etc.
1. Changing the name of an Excel Sheet
We can also change the name of a given excel sheet using the title
variable. Let's see an example:
import openpyxl
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating an active sheet to enter data
sheet = xlsx.active
## entering data into the A1 and B1 cells in the sheet
sheet['A1'] = 'Studytonight'
sheet['B1'] = 'A Programming Site'
## setting the title for the sheet
sheet.title = "Sample"
## saving the xlsx file using 'save' method
xlsx.save('sample.xlsx')
2. Getting Excel Sheet name
Getting the names of all the sheets present in xlsx file is super easy using the openpyxl module. We can use the method called get_sheet_names()
to get names of all the sheets present in the excel file.
import openpyxl
## initializing the xlsx
xlsx = openpyxl.load_workbook('sample.xlsx')
## getting all sheet names
names = xlsx.get_sheet_names()
print(names)
Output of above Program:
['Sample']
3. Creating more than one Sheet in an Excel File
When creating our first xlsx file, we only created one sheet. Let's see how to create multiple sheets and give names to them.
import openpyxl
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating sheets
xlsx.create_sheet("School")
xlsx.create_sheet("College")
xlsx.create_sheet("University")
## saving the xlsx file using 'save' method
xlsx.save('multiple_sheets.xlsx')
As you can see in the snapshot above that a new excel file is created with 3 new sheets with different names provided by us in the program.
4. Adding Data to Multiple Sheets
Entering data into different sheets present in xlsx file can also be done easily. In the program below, we will get the sheets by their names individually or all at once as we have done above. Let's see how to get sheets using their name and enter data into them.
We will use previously created xlsx file called multiple_sheets.xlsx to enter the data into sheets.
import openpyxl
## initializing the xlsx
xlsx = openpyxl.Workbook()
## creating sheets
xlsx.create_sheet("School")
xlsx.create_sheet("College")
xlsx.create_sheet("University")
## getting sheet by it's name
school = xlsx.get_sheet_by_name("School")
school['A1'] = 1
## getting sheet by it's name
college = xlsx.get_sheet_by_name("College")
college['A1'] = 2
## getting sheet by it's name
university = xlsx.get_sheet_by_name("University")
university['A1'] = 3
## saving the xlsx file using 'save' method
xlsx.save('multiple_sheets.xlsx')
We can't modify the existing xlsx file using the openpyxl module but we can read data from it.
Conclusion
We have seen different methods which can be used while working with xlsx files using Python. If you want to explore more methods available in the openpyxl module, then you can try them using the dir()
method to get information about all methods of openpyxl module.
You can also see other modules like xlsxwriter, xlrd, xlwt, etc., for more functionalities. If you have already used any of these modules, do share your experience with us through comments section.
You may also like: