Working with Excel in Python: Transpose rows and columns
Data analysis is about organising collected data and generating actionable predictions. The data analysis process includes various stages from data set creation, cleaning and transformation to producing consistent results and predictions with appropriate models.
Can use various data analysis tools such as Excel, SQL and Tableau in data analysis. Apart from these, Python is a widely-used programming language for data analysis. One of the critical advantages of Python is that it has built-in mathematical libraries and functions that make it easy to calculate mathematical problems and perform data analysis.
Python supports object-oriented programming, structured programming, and functional programming models. Furthermore, Python can handle everything in one unified language, from data mining to website construction to running embedded systems. Also, Python’s being a flexible and open-source language has its advantages. Its huge libraries can be used for data manipulation and, therefore, very easy to learn even for a beginner data analyst.
Data analysis using Python from scratch
Because of these features, I wondered if I could use Python in data analysis from scratch. When using Python to process and analyze data, I find Python’s scripting language easier to learn than other languages. The fact that Python is also intuitive has made my work easier. But most importantly, I was able to find the solution to every problem I encountered while using Python by searching Google.
Am I a professional in Python? Nope! As I said, I started from scratch and had to search the internet for hours while writing each code. Then, I thought I would share the codes I use for data analysis here. So I could help Python beginners like me by sharing the codes I need while working. For this reason, I will present the codes I use. And this is my first blog post about Python, and follow me for the codes that I share in this blog.
Workflow
I used Jupiter Notebook while writing these codes. I am using Python 3.7 and Anaconda installed on my computer. The flow I needed for my work was as follows.
1- Reading the Excel file containing the data,
2- Filtering data in a particular range,
3- Removing rows and columns that I don’t need,
4- Changing the location of rows and columns,
5- Writing the newly created dataset to a new excel file.
Python Codes
I wrote the codes below to provide the workflow above with Python. First, import Pandas, which is required to read the Excel file in Python.
import pandas as pd
Then use the read_excel command to read the Excel file from its location. For example, you can see the first two lines of the Excel file with the df.head(5).
df = pd.read_excel("C:/microbiology.xlsx")
df.head(5)
Use the drop command to delete the “taxlevel” and “daughterlevels” columns from the data frame you see. Set axis to 1 to delete columns. You can see the final version of the data frame.
df.drop(["taxlevel", "daughterlevels"], axis=1, inplace=True)
df.head(5)
Then use the drop command again to delete the lines you don’t want, but this time make sure the axis is 0. If you notice, I have specified the line I want to delete with the index number. You can see the latest version of the data frame.
df.drop([0], axis=0, inplace=True)
df.head(5)
I want to filter the values in the “total” column in my Excel file to be between 0.1 and 1.0. I assign 0.1 and 1.0 values to min and max variables, respectively. You can see the latest status of the data frame after filtering.
min = 0.01
max = 1.0
df2 = df[df["total"].between(vmin, vmax)]
df2.head(5)
Replace rows and columns in Excel file with the following code. You can see the new data frame created. However, if you notice, the index numbers in the data frame have become the header line of the newly created data frame.
df3=df2.T
df3.head(5)
If you do not want this header line in the Excel file, define the header as None when printing the Excel file with a new name. So your new Excel file will be as follows.
df3.to_excel("C:/final_microbiology.xlsx", index=True, header=False)
You can easily do the above workflow using Python in an Excel file. I will share the codes that may be needed for data analysis using Python on this blog. If you are a beginner in Python like me, I hope these codes will be helpful to you. So don’t forget to follow me.