One Hot Encoding | Dummies
One hot encoding means splitting a categorical variable into multiple binary variables.
“One hot” means that only one of the split features is hot (active) at a time.
A categorical feature is split into as many new features as it has categories.
In the example below, COLOR is a categorical feature with 3 categories, namely red, blue and green,
so the new features formed from COLOR are COLOR_RED, COLOR_GREEN and COLOR_BLUE.
If the value of COLOR for an entry is red, then COLOR_RED is assigned the value 1 and
COLOR_GREEN and COLOR_BLUE are assigned the value 0.
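To make the split concrete, here is a minimal pure-Python sketch of the idea. The helper name `one_hot` is hypothetical and used only for illustration; it is not part of pandas or any other library.

```python
def one_hot(values, prefix):
    """Return one dict per entry with a 0/1 flag for every category seen."""
    # Sort the distinct categories so the new features have a stable order
    categories = sorted(set(values))
    return [
        # The flag is 1 only for the category that matches this entry
        {f"{prefix}_{cat.upper()}": int(value == cat) for cat in categories}
        for value in values
    ]


colors = ["red", "blue", "green"]
encoded = one_hot(colors, prefix="COLOR")

print(encoded[0])  # {'COLOR_BLUE': 0, 'COLOR_GREEN': 0, 'COLOR_RED': 1}
```

Each entry ends up with exactly one flag set to 1, which is where the name “one hot” comes from.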
In Pandas, categorical features can be one hot encoded with the ‘get_dummies’ function.
# Imports
import pandas as pd
# Let's create a dataframe
data = {'Color': ['Red', 'Red', 'Green', 'Blue', 'Red', 'Green'],
        'Shape': ['Circle', 'Square', 'Square', 'Triangle', 'Circle', 'Triangle'],
        'Value': [1, 1, 2, 1, 3, 3]}
df = pd.DataFrame(data)
df
| | Color | Shape | Value |
|---|---|---|---|
| 0 | Red | Circle | 1 |
| 1 | Red | Square | 1 |
| 2 | Green | Square | 2 |
| 3 | Blue | Triangle | 1 |
| 4 | Red | Circle | 3 |
| 5 | Green | Triangle | 3 |
# Color is a categorical feature with the categories Red, Blue and Green
# Shape is a categorical feature with the categories Circle, Square and Triangle
# One hot encode Color and Shape; the numeric Value column is left unchanged
# dtype=int gives 0/1 output (pandas 2.0 and later default to True/False)
pd.get_dummies(df, dtype=int)
| | Value | Color_Blue | Color_Green | Color_Red | Shape_Circle | Shape_Square | Shape_Triangle |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | 2 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3 | 0 | 0 | 1 | 1 | 0 | 0 |
| 5 | 3 | 0 | 1 | 0 | 0 | 0 | 1 |
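As a quick sanity check on the “one hot” property, the sketch below encodes only the Color column and counts how many Color flags are active in each row. It rebuilds the same dataframe so it runs on its own; the variable names `color_cols` and `active_per_row` are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Color": ["Red", "Red", "Green", "Blue", "Red", "Green"],
    "Shape": ["Circle", "Square", "Square", "Triangle", "Circle", "Triangle"],
    "Value": [1, 1, 2, 1, 3, 3],
})

# Encode only the Color column; Shape and Value pass through unchanged.
# dtype=int asks for 0/1 integers rather than booleans.
encoded = pd.get_dummies(df, columns=["Color"], dtype=int)

# Pick out the columns created from Color and count active flags per row
color_cols = [c for c in encoded.columns if c.startswith("Color_")]
active_per_row = encoded[color_cols].sum(axis=1)

print(color_cols)               # ['Color_Blue', 'Color_Green', 'Color_Red']
print(active_per_row.tolist())  # [1, 1, 1, 1, 1, 1]
```

Every row sums to 1 across the Color columns, so exactly one of them is hot for each entry. The `columns` argument is handy when only some of the categorical features should be encoded.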
That’s all for this mini tutorial. To sum it up, we learned how to do one hot encoding in Pandas with ‘get_dummies’.
Hope it was easy, cool and simple to follow. Now it’s on you.