Often you need to modify a pandas dataframe to remove unnecessary columns or to prepare the dataset for model building. Pandas offers many ways to manipulate columns; for instance, the df.drop method drops selected columns. In this tutorial we will learn how to drop columns from a pandas dataframe in the following 8 ways:

1. Using the “columns” parameter of the drop method

2. Using a list of column names and the axis parameter

3. Selecting columns by index and dropping them (also useful for unnamed columns)

4. Slicing columns by index position

5. Slicing columns by name

6. Python’s “del” keyword

7. Selecting columns with regex patterns to drop them

8. Dropna: dropping columns with missing values

This detailed tutorial shows how to drop pandas columns by index, how to drop unnamed columns, how to drop multiple columns, and how to use the pandas drop method. Method 8 also shows several uses of the pandas dropna method to drop columns with missing values.

Let’s get started.

First, let’s understand the pandas drop method and its parameters.

 

Pandas Dataframe's drop() method

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Parameters:

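labels:         a single label or a list of labels (column names or row index values) to drop; interpreted according to axis.

axis:            0 or “index” for rows. 1 or “columns” for columns.

index:         row labels to drop (an alternative to labels with axis=0)

columns:    column names to drop (an alternative to labels with axis=1)

level:          for multi-index dataframes, the level from which the labels are removed

inplace:      if set to True, modifies the original dataframe and returns None

errors:        if set to ‘ignore’, labels that do not exist in the dataframe are skipped instead of raising a KeyError.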
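To make the inplace and errors parameters concrete, here is a minimal sketch. The small dataframe and the non-existent “Salary” column are made up for illustration only.

```python
import pandas as pd

# Illustrative data; "Salary" below is deliberately NOT a column of this dataframe
people = pd.DataFrame({"Name": ["Joyce", "Joy"],
                       "Age": ["19", "18"],
                       "Hobbies": ["Yoga", "Dance"]})

# By default drop() returns a new dataframe and leaves the original untouched
trimmed = people.drop(columns=["Age"])

# A missing label normally raises KeyError; errors="ignore" skips it instead
safe = people.drop(columns=["Age", "Salary"], errors="ignore")

# inplace=True modifies the original dataframe and returns None
result = people.drop(columns=["Hobbies"], inplace=True)
```

Because inplace=True returns None, assigning its result back to the dataframe variable is a common mistake worth watching for.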

 

Code to drop one or multiple columns in pandas in 8 ways

import pandas as pd

# Let's create a pandas dataframe
df = pd.DataFrame({"Name": ['Joyce', 'Joy', 'Ram', 'Maria'], 
                   "Age": ['19', '18', '20', '19'],
                   "Hobbies": ['Yoga', 'Dance', 'Sports', 'Reading'],
                   "Favorite Food": ['Pizza', 'Pasta', 'Dosa', 'Idly']},
                   columns = ['Name', 'Age', 'Hobbies', 'Favorite Food'])
df
    Name  Age  Hobbies  Favorite Food
0  Joyce   19     Yoga          Pizza
1    Joy   18    Dance          Pasta
2    Ram   20   Sports           Dosa
3  Maria   19  Reading           Idly
## No 1. Drop Columns by their names using columns parameter
df.drop(columns = ['Age', 'Hobbies'])
    Name  Favorite Food
0  Joyce          Pizza
1    Joy          Pasta
2    Ram           Dosa
3  Maria           Idly
## No 2. Drop columns by their names, don't forget to set axis=1
df.drop(['Age', 'Hobbies'], axis=1)
    Name  Favorite Food
0  Joyce          Pizza
1    Joy          Pasta
2    Ram           Dosa
3  Maria           Idly
## No 3. Drop columns using column indices (positions 1 and 2)
df.drop(df.columns[1:3], axis=1)
    Name  Favorite Food
0  Joyce          Pizza
1    Joy          Pasta
2    Ram           Dosa
3  Maria           Idly
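Unnamed columns usually appear when a CSV file was saved together with its index and is then read back. The sketch below simulates that with an in-memory CSV (the CSV text is made up for illustration) and drops the resulting column either by its “Unnamed” prefix or by its position.

```python
import io
import pandas as pd

# Illustrative CSV whose first header field is empty, as happens when an index is saved
csv_text = ",Name,Age\n0,Joyce,19\n1,Joy,18\n"
raw = pd.read_csv(io.StringIO(csv_text))
# pandas names the empty header field "Unnamed: 0"

# Option 1: drop every column whose name starts with "Unnamed"
unnamed = raw.columns[raw.columns.str.startswith("Unnamed")]
cleaned = raw.drop(columns=unnamed)

# Option 2: drop the column by its position
by_position = raw.drop(columns=raw.columns[0])
```

Matching on the prefix is the safer choice when you do not know in advance how many unnamed columns a file contains.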
## No 4. Drop columns using index slicing (every second column, starting at position 1)
df.drop(df.columns[1::2], axis=1)
    Name  Hobbies
0  Joyce     Yoga
1    Joy    Dance
2    Ram   Sports
3  Maria  Reading
## No 5. Drop columns using a label slice with step size 2 (selects 'Name' and 'Hobbies')
df.drop(df.loc[:, 'Name':'Favorite Food':2].columns, axis=1)
   Age  Favorite Food
0   19          Pizza
1   18          Pasta
2   20           Dosa
3   19           Idly
## No 6. Drop columns using Python's del keyword. Be careful, as this modifies the original dataframe.
del df['Hobbies']
df
    Name  Age  Favorite Food
0  Joyce   19          Pizza
1    Joy   18          Pasta
2    Ram   20           Dosa
3  Maria   19           Idly
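If you want to remove a column in place but still keep its values, the dataframe's pop method is a close relative of del. A minimal sketch on a made-up dataframe, contrasted with drop, which leaves the original untouched:

```python
import pandas as pd

# Illustrative data, separate from the tutorial's df
people = pd.DataFrame({"Name": ["Joyce", "Joy"],
                       "Age": ["19", "18"],
                       "Hobbies": ["Yoga", "Dance"]})

# pop() removes the column in place, like del, but returns it as a Series
hobbies = people.pop("Hobbies")

# drop() returns a new dataframe; people still contains "Age" afterwards
without_age = people.drop(columns="Age")
```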
## No 7. Drop selected columns using a regex. Here we remove the column whose name contains 'Food'.
regex_str = 'Food'
df.drop(df.columns[df.columns.str.contains(regex_str)], axis=1)
    Name  Age
0  Joyce   19
1    Joy   18
2    Ram   20
3  Maria   19
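Note that str.contains is case sensitive by default, so the pattern 'Food' would not match a column named “food_budget”. The sketch below, on a made-up dataframe with such a column, shows a case-insensitive match and an anchored pattern used through the dataframe's filter method.

```python
import pandas as pd

# Illustrative data; "food_budget" is a made-up column for this example
meals = pd.DataFrame({"Name": ["Joyce"],
                      "Age": ["19"],
                      "Favorite Food": ["Pizza"],
                      "food_budget": [20]})

# Case-insensitive match: both "Favorite Food" and "food_budget" are dropped
mask = meals.columns.str.contains(r"food", case=False, regex=True)
no_food = meals.loc[:, ~mask]

# filter(regex=...) selects the matching columns; passing them to drop removes them
no_fav = meals.drop(columns=meals.filter(regex=r"^Fav").columns)
```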
## No 8. Drop columns which have missing values
# Let's create a dataframe which has missing values
import numpy as np
df = pd.DataFrame({'Name': ['Gullu', 'Pinto', 'Roger', 'Dan', 'Billo', np.nan],
                   'Age': [23, 32, 12, 32, 34, 13],
                   'Job': ['Painter', 'Singer', np.nan, np.nan, 'Coder', np.nan],
                   'City': [np.nan, 'Paris', np.nan, np.nan, 'Tokyo', np.nan],
                   'Zipcode': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}, columns = ['Name', 'City', 'Age', 'Job', 'Zipcode'])

df
    Name   City  Age      Job  Zipcode
0  Gullu    NaN   23  Painter      NaN
1  Pinto  Paris   32   Singer      NaN
2  Roger    NaN   12      NaN      NaN
3    Dan    NaN   32      NaN      NaN
4  Billo  Tokyo   34    Coder      NaN
5    NaN    NaN   13      NaN      NaN
df.dropna(axis='columns')
   Age
0   23
1   32
2   12
3   32
4   34
5   13
## Keep columns that have at least 50% non-null values and drop the others
df.dropna(thresh=len(df)*.5, axis='columns')
    Name  Age      Job
0  Gullu   23  Painter
1  Pinto   32   Singer
2  Roger   12      NaN
3    Dan   32      NaN
4  Billo   34    Coder
5    NaN   13      NaN
## Drop Columns where all values are missing
df.dropna(how='all', axis='columns')
    Name   City  Age      Job
0  Gullu    NaN   23  Painter
1  Pinto  Paris   32   Singer
2  Roger    NaN   12      NaN
3    Dan    NaN   32      NaN
4  Billo  Tokyo   34    Coder
5    NaN    NaN   13      NaN
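The thresh argument counts non-null values, so a percentage has to be converted into a count first. Below is a sketch of one way to wrap that conversion. The helper name drop_sparse_columns is hypothetical, not part of pandas, and rounding up with math.ceil is an assumption about what “at least X%” should mean when the row count does not divide evenly.

```python
import math

import numpy as np
import pandas as pd


def drop_sparse_columns(frame, min_fraction):
    """Hypothetical helper: keep columns with at least min_fraction non-null values."""
    # dropna's thresh is a count of non-null values, so convert the fraction to a count
    thresh = math.ceil(len(frame) * min_fraction)
    return frame.dropna(axis="columns", thresh=thresh)


# Same data as the tutorial's dataframe with missing values
data = pd.DataFrame({"Name": ["Gullu", "Pinto", "Roger", "Dan", "Billo", np.nan],
                     "City": [np.nan, "Paris", np.nan, np.nan, "Tokyo", np.nan],
                     "Age": [23, 32, 12, 32, 34, 13],
                     "Job": ["Painter", "Singer", np.nan, np.nan, "Coder", np.nan],
                     "Zipcode": [np.nan] * 6})

half = drop_sparse_columns(data, 0.5)    # keeps Name, Age, Job
strict = drop_sparse_columns(data, 0.9)  # keeps only Age

# Equivalent idea with a boolean mask: keep columns where at most 50% is missing
by_mask = data.loc[:, data.isna().mean() <= 0.5]
```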

Conclusion

To sum up, in this tutorial we learned 8 different ways to remove columns from a pandas dataframe. We explored the df.drop method, the df.dropna method and Python’s del keyword, and learned how to use their parameters. I hope you found this tutorial helpful.
