8 Ways to Drop Columns in Pandas
We often need to remove columns from a pandas dataframe, either to discard unnecessary data or to prepare a dataset for model building. Pandas offers several ways to do this; for instance, the df.drop method drops selected columns. In this tutorial we will learn how to drop columns from a pandas dataframe in the following 8 ways:
1. Making use of “columns” parameter of drop method
2. Using a list of column names and axis parameter
3. Select columns by indices and drop them: Pandas drop unnamed columns
4. Pandas slicing columns by index: Pandas drop columns by index
5. Pandas slicing columns by name
6. Python’s “del” keyword
7. Selecting columns with regex patterns to drop them
8. Dropna : Dropping columns with missing values
This detailed tutorial shows how to drop pandas columns by index, how to drop unnamed columns, how to drop multiple columns, and how to use the pandas drop method. Method 8 also shows several uses of the pandas dropna method to drop columns with missing values.
Let’s get started.
First, let’s understand the pandas drop method and its parameters.
Pandas Dataframe's drop() method
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Parameters:
labels: a single label or a list of labels (column names or row index values) to drop.
axis: 0 or “index” to drop rows; 1 or “columns” to drop columns.
index: row labels to drop (equivalent to labels with axis=0).
columns: column names to drop (equivalent to labels with axis=1).
level: the level to drop labels from, for multi-index dataframes.
inplace: if True, modifies the original dataframe and returns None.
errors: if set to ‘ignore’, suppresses the error raised when a label (e.g. a column name) does not exist in the dataframe.
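To make the inplace and errors parameters concrete, here is a minimal sketch. The dataframe `demo` and its column names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Small illustrative dataframe; the column names are hypothetical.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# By default drop() returns a new dataframe and leaves the original untouched.
without_b = demo.drop(columns=["b"])

# errors="ignore" skips labels that are not present instead of raising KeyError.
safe = demo.drop(columns=["b", "missing"], errors="ignore")

# inplace=True modifies the dataframe itself, and the call returns None.
mutated = demo.copy()
result = mutated.drop(columns=["c"], inplace=True)
```

Because the inplace call returns None, assigning its result back to the dataframe variable would lose the data, so it is usually safer to pick one style and stick to it.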
Code to drop one or multiple columns in pandas in 8 ways
import pandas as pd
# Let's create a pandas dataframe
df = pd.DataFrame({"Name": ['Joyce', 'Joy', 'Ram', 'Maria'],
                   "Age": ['19', '18', '20', '19'],
                   "Hobbies": ['Yoga', 'Dance', 'Sports', 'Reading'],
                   "Favorite Food": ['Pizza', 'Pasta', 'Dosa', 'Idly']},
                  columns=['Name', 'Age', 'Hobbies', 'Favorite Food'])
df
| | Name | Age | Hobbies | Favorite Food |
---|---|---|---|---|
0 | Joyce | 19 | Yoga | Pizza |
1 | Joy | 18 | Dance | Pasta |
2 | Ram | 20 | Sports | Dosa |
3 | Maria | 19 | Reading | Idly |
## No 1. Drop Columns by their names using columns parameter
df.drop(columns = ['Age', 'Hobbies'])
| | Name | Favorite Food |
---|---|---|
0 | Joyce | Pizza |
1 | Joy | Pasta |
2 | Ram | Dosa |
3 | Maria | Idly |
## No 2. Drop columns by their names, don't forget to set axis=1
df.drop(['Age', 'Hobbies'], axis=1)
| | Name | Favorite Food |
---|---|---|
0 | Joyce | Pizza |
1 | Joy | Pasta |
2 | Ram | Dosa |
3 | Maria | Idly |
## No 3. Drop columns using column indices
df.drop(df.columns[1:3], axis=1)
| | Name | Favorite Food |
---|---|---|
0 | Joyce | Pizza |
1 | Joy | Pasta |
2 | Ram | Dosa |
3 | Maria | Idly |
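The same positional idea helps with "Unnamed" columns, which typically appear when a CSV file was saved together with its index. The sketch below is only an illustration: the dataframe `raw` and its "Unnamed: 0" column are hypothetical stand-ins for such a file.

```python
import pandas as pd

# Hypothetical dataframe mimicking a CSV that was saved with its index column.
raw = pd.DataFrame({"Unnamed: 0": [0, 1],
                    "Name": ["Joyce", "Joy"],
                    "Age": [19, 18]})

# Option A: pick the column label by position, then pass it to drop().
by_position = raw.drop(columns=raw.columns[[0]])

# Option B: pick every column whose name starts with "Unnamed".
unnamed = raw.columns[raw.columns.str.startswith("Unnamed")]
by_prefix = raw.drop(columns=unnamed)
```

Option B does not depend on where the unnamed column sits, which makes it a little more robust when the column order is not known in advance.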
## No 4. Drop columns using index slicing (every second column, starting at position 1)
df.drop(df.columns[1::2], axis=1)
| | Name | Hobbies |
---|---|---|
0 | Joyce | Yoga |
1 | Joy | Dance |
2 | Ram | Sports |
3 | Maria | Reading |
## No 5. Dropping columns with column names and slicing with step size 2
df.drop(df.loc[:, 'Name':'Favorite Food':2].columns, axis = 1)
| | Age | Favorite Food |
---|---|---|
0 | 19 | Pizza |
1 | 18 | Pasta |
2 | 20 | Dosa |
3 | 19 | Idly |
## No 6. Drop columns using Python's del keyword. Be careful, as this modifies the original dataframe.
del df['Hobbies']
df
| | Name | Age | Favorite Food |
---|---|---|---|
0 | Joyce | 19 | Pizza |
1 | Joy | 18 | Pasta |
2 | Ram | 20 | Dosa |
3 | Maria | 19 | Idly |
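Since del changes the dataframe in place, one way to keep the original intact is to delete from a copy. This is a minimal sketch; the dataframe `people` is hypothetical.

```python
import pandas as pd

# Hypothetical two-column dataframe for illustration.
people = pd.DataFrame({"Name": ["Joyce", "Joy"],
                       "Hobbies": ["Yoga", "Dance"]})

# Work on a copy so that the original keeps all of its columns.
trimmed = people.copy()
del trimmed["Hobbies"]
```

After this, `people` still has both columns, while `trimmed` only has Name.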
## No 7. Drop selected columns using regex. Here we remove the columns that have 'Food' in their name.
regex_str = 'Food'
df.drop(df.columns[df.columns.str.contains(regex_str)], axis=1)
| | Name | Age |
---|---|---|
0 | Joyce | 19 |
1 | Joy | 18 |
2 | Ram | 20 |
3 | Maria | 19 |
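The pattern passed to str.contains is treated as a regular expression by default and is case sensitive. The sketch below shows a case-insensitive variant; the dataframe `foods` and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical dataframe with two food-related columns in different cases.
foods = pd.DataFrame({"Name": ["Ram"],
                      "Favorite Food": ["Dosa"],
                      "food_budget": [20],
                      "Age": [20]})

# case=False makes the match case-insensitive, so both columns are caught.
mask = foods.columns.str.contains("food", case=False)

# Drop the matching columns by label.
trimmed = foods.drop(columns=foods.columns[mask])
```

An equivalent way to keep only the non-matching columns is `foods.loc[:, ~mask]`, which selects instead of dropping.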
## No 8. Drop columns which have missing values
# Let's create a dataframe which has missing values
import numpy as np
df = pd.DataFrame({'Name': ['Gullu', 'Pinto', 'Roger', 'Dan', 'Billo', np.nan],
                   'Age': [23, 32, 12, 32, 34, 13],
                   'Job': ['Painter', 'Singer', np.nan, np.nan, 'Coder', np.nan],
                   'City': [np.nan, 'Paris', np.nan, np.nan, 'Tokyo', np.nan],
                   'Zipcode': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]},
                  columns=['Name', 'City', 'Age', 'Job', 'Zipcode'])
df
| | Name | City | Age | Job | Zipcode |
---|---|---|---|---|---|
0 | Gullu | NaN | 23 | Painter | NaN |
1 | Pinto | Paris | 32 | Singer | NaN |
2 | Roger | NaN | 12 | NaN | NaN |
3 | Dan | NaN | 32 | NaN | NaN |
4 | Billo | Tokyo | 34 | Coder | NaN |
5 | NaN | NaN | 13 | NaN | NaN |
df.dropna(axis='columns')
| | Age |
---|---|
0 | 23 |
1 | 32 |
2 | 12 |
3 | 32 |
4 | 34 |
5 | 13 |
## Keep columns which have at least 50% non-null values and drop the others
df.dropna(thresh=len(df)*.5, axis='columns')
| | Name | Age | Job |
---|---|---|---|
0 | Gullu | 23 | Painter |
1 | Pinto | 32 | Singer |
2 | Roger | 12 | NaN |
3 | Dan | 32 | NaN |
4 | Billo | 34 | Coder |
5 | NaN | 13 | NaN |
## Drop Columns where all values are missing
df.dropna(how='all', axis='columns')
| | Name | City | Age | Job |
---|---|---|---|---|
0 | Gullu | NaN | 23 | Painter |
1 | Pinto | Paris | 32 | Singer |
2 | Roger | NaN | 12 | NaN |
3 | Dan | NaN | 32 | NaN |
4 | Billo | Tokyo | 34 | Coder |
5 | NaN | NaN | 13 | NaN |
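Before choosing a threshold for dropna, it can help to look at how much of each column is actually missing. The sketch below is one possible way to do that with isna().mean(); the dataframe `sample` is a smaller, hypothetical version of the one above, and the 50% cut-off is only an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Smaller, hypothetical version of the dataframe with missing values.
sample = pd.DataFrame({
    "Name": ["Gullu", "Pinto", "Roger", "Dan", "Billo", np.nan],
    "City": [np.nan, "Paris", np.nan, np.nan, "Tokyo", np.nan],
    "Age": [23, 32, 12, 32, 34, 13],
})

# Fraction of missing values per column (the mean of a boolean mask).
missing_share = sample.isna().mean()

# Keep only the columns where at most half of the values are missing.
kept = sample.loc[:, missing_share <= 0.5]
```

Here City is missing in 4 of 6 rows, so it is the only column removed.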
Conclusion
To sum up, in this tutorial we learned 8 different ways to remove columns from a pandas dataframe in Python. We explored the df.drop method, the df.dropna method, and Python’s del keyword, and learned how to use their parameters. I hope you found this tutorial helpful.