Working with Excel using Python: Filtering and combining data frames

Ugur Comlekcioglu (PhD)
4 min readApr 16, 2022

--

In many real-life situations, the data we want to use arrives at us in a very complex way. Firstly, it is best to save the data under an Excel file to analyse it. However, if the data in Excel is too much, Python provides very important conveniences in analysing this data. Many operations such as filtering the data on Excel, recombining the filtered data, and visualising this data can be done using Python. Especially with Pandas library, it is convenient to easily combine Series or DataFrame with various set logic for indexes and relational algebra functionality. Also, pandas provide utilities to compare two Series or DataFrame and summarise their differences.

In this blog, you can find the codes for filtering the data in Excel and visualising the filtered data using boxplot. The flow we will use while writing the codes is as follows.

Workflow

1- Reading the Excel file.
2- Filtering by specific column in Excel
3- Filtering according to certain parameters in a certain column in Excel
4- Combining the filtered data
5- Displaying the filtered data as box plot

Codes

To read Excel, first import Pandas. Then define the Excel file as a data frame to the df variable by applying the following code. If you do not specify a sheet name, this code will read the first sheet of your excel file. You can view the data frame as follows.

import pandas as pd
df = pd.read_excel('C:/database.xlsx')
df.head(5)

You can use the code below to filter the data frame according to the data in certain columns. Here, I define the rows containing “Control” in the Treatment column and “6” in the Time column into a new data frame.

df_6 = df[(df['Treatment']=='Control') & (df['Time']==6)] #Filtering dataframedf_6.head(5)

Now, I do the same to filter the rows containing 18, 30 and 42 in the Time column.

df_18 = df[(df['Treatment']=="Control") & (df['Time']==18)] 
df_30 = df[(df['Treatment']=="Control") & (df['Time']==30)]
df_42 = df[(df['Treatment']=="Control") & (df['Time']==42)]

In the “Time” column, I created 4 data frames, each containing 6, 18, 30 and 42 separately. Now I create a single data frame by combining these data frames with the code below.

result = pd.concat([df_6, df_18, df_30, df_42])
display(result)

With a box plot graph, let’s display the results at the 6th, 18th, 30th and 42nd hours in the Control group using the last created data frame. To do this, first import the Seaborn library.

import seaborn as sns
sns.boxplot(y= "Result", x= "Time", data = result, palette= "inferno")

As you can see, there is an outlier data that prevented the boxplot plot from being displayed correctly. Let’s create a new data frame to solve this problem by filtering the outlier data.

filtered_result = result[result["Result"].between(0.0, 0.50)]
sns.boxplot(y= "Result", x= "Time", data = filtered_result, palette= "rocket")

You can get graphics in different colours using different palettes. The “rocket” palette is used in the chart above. Now, if we write the same code with the “pastel” palette, we can see the colour difference.

sns.boxplot(y= "Result", x= "Time", data = filtered_result, palette= "pastel")

Below you can find all the codes written above combined.

import pandas as pd
import seaborn as sns
df = pd.read_excel('C:/database.xlsx')
df_18 = df[(df['Treatment']=="Control") & (df['Time']==18)] #Filtering dataframe
df_30 = df[(df['Treatment']=="Control") & (df['Time']==30)] #Filtering dataframe
df_42 = df[(df['Treatment']=="Control") & (df['Time']==42)] #Filtering dataframe
result = pd.concat([df_6, df_18, df_30, df_42])
filtered_result = result[result["Result"].between(0.0, 0.50)]
sns.boxplot(y= "Result", x= "Time", data = filtered_result, palette= "inferno")

Conclusion

Excel is a great program for working with data. By combining the power of Excel with the power of Python, it is possible to make great analyses. Python libraries like Pandas are designed so you can easily export your data from Excel to Python and back to Excel. You can also visualise data with the most beautiful graphics with Seaborn. As a result, you can strengthen your Excel analysis and report with Python knowledge. In this blog, I will continue to present the operations that can be done using Python and Excel in small workflows. Therefore, you can be informed about new content by following this blog.

--

--

Ugur Comlekcioglu (PhD)

You find articles about science, environment and critical thinking here.