Label Encoding | Encode Categorical features

  • Label Encoding means converting the values of a categorical feature into integer codes, with one integer per distinct category.

  • Features whose values are categories are called categorical variables, e.g. Color (red, blue, green) or Gender (Male, Female). Machine learning models expect features to be floats or integers, so categorical features like color and gender need to be converted to numbers first. A label encoder does this by mapping each category to an integer between 0 and n_classes - 1.
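Under the hood, a label encoder is just a lookup table from each distinct category to an integer. The sketch below builds that table by hand in plain Python. The helper name `label_encode` is hypothetical. Assigning codes in sorted order is an assumption chosen to match scikit-learn's LabelEncoder, which stores its sorted categories in `classes_`.

```python
def label_encode(values):
    """Hypothetical helper: map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))                    # distinct categories, sorted
    mapping = {category: code for code, category in enumerate(classes)}
    codes = [mapping[v] for v in values]             # replace each value by its code
    return codes, classes

codes, classes = label_encode(['red', 'blue', 'green', 'blue'])
print(classes)  # ['blue', 'green', 'red']
print(codes)    # [2, 0, 1, 0]
```

The codes imply an order (blue < green < red) that the colors do not actually have. Keep this in mind when feeding label-encoded features to models that compare magnitudes.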

# Imports
import pandas as pd
from sklearn.preprocessing import LabelEncoder


# Let's make a sample dataframe
df = pd.DataFrame({'Name': ['Spidy','Hulk','Captain','Vision'], 'Likes': ['candy', 'pizza', 'burger', 'candy'],
                   'Favorite number': ['1', '3','2','7'],'Target': [1,2,2,1]},
                   columns=['Name','Likes','Favorite number', 'Target'])
df
      Name   Likes  Favorite number  Target
0    Spidy   candy                1       1
1     Hulk   pizza                3       2
2  Captain  burger                2       2
3   Vision   candy                7       1
# Likes and Favorite number are categorical variables (Favorite number is stored as strings)
# They need to be converted to numerical form so that a machine learning model can use them
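The favorite numbers were typed as strings ('1', '3', ...), so pandas stores that column as text rather than as numbers. Whether to treat it as a category or as a quantity is a modeling choice. Assuming it is a quantity, a plain type conversion with pd.to_numeric is enough, as in this minimal sketch.

```python
import pandas as pd

df = pd.DataFrame({'Likes': ['candy', 'pizza', 'burger', 'candy'],
                   'Favorite number': ['1', '3', '2', '7']})

# The numbers were entered as strings, so the column is not numeric yet
was_numeric = pd.api.types.is_numeric_dtype(df['Favorite number'])
print(was_numeric)  # False

# Assumption: the column is a quantity, so convert the text to integers directly
df['Favorite number'] = pd.to_numeric(df['Favorite number'])
print(df['Favorite number'].tolist())  # [1, 3, 2, 7]
```

If the numbers are really labels rather than amounts, label-encoding the column the same way as Likes is the alternative.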

# Using LabelEncoder to convert categorical values to integers
le = LabelEncoder()
df['Likes_encoded'] = le.fit_transform(df['Likes'])
df
      Name   Likes  Favorite number  Target  Likes_encoded
0    Spidy   candy                1       1              1
1     Hulk   pizza                3       2              2
2  Captain  burger                2       2              0
3   Vision   candy                7       1              1
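The step above only encodes Likes. Below is a minimal sketch that does two more things:

- It encodes both categorical columns, keeping one fitted encoder per column so each mapping can be inverted later.
- It fits a LogisticRegression model on the resulting integer columns.

The `encoders` dict and the `_encoded` column names are illustrative choices. With only four rows, the fitted model is a toy that demonstrates the API, not a meaningful classifier.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'Name': ['Spidy', 'Hulk', 'Captain', 'Vision'],
                   'Likes': ['candy', 'pizza', 'burger', 'candy'],
                   'Favorite number': ['1', '3', '2', '7'],
                   'Target': [1, 2, 2, 1]})

# One fitted encoder per column, so each mapping can be inverted later
encoders = {}
for col in ['Likes', 'Favorite number']:
    encoders[col] = LabelEncoder()
    df[col + '_encoded'] = encoders[col].fit_transform(df[col])

print(df[['Likes_encoded', 'Favorite number_encoded']])

# The encoded columns are plain integers, so a model can be fitted on them
X = df[['Likes_encoded', 'Favorite number_encoded']]
y = df['Target']
model = LogisticRegression()
model.fit(X, y)
preds = model.predict(X)
print(preds)
```

scikit-learn's documentation describes LabelEncoder as intended for target labels (y). For input features it offers OrdinalEncoder, which applies the same kind of integer mapping to several columns at once. Either way, a linear model such as LogisticRegression will read the codes as ordered quantities, which may not be meaningful for a feature like Likes.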
# To get the original labels back from encoded values
le.inverse_transform([1,0,0,2])
array(['candy', 'burger', 'burger', 'pizza'], dtype=object)
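The mapping behind inverse_transform lives in the encoder's `classes_` attribute: a category's position in that array is its code. This short sketch refits on the same Likes values and shows three things:

- the mapping itself;
- a round trip through transform and inverse_transform;
- what happens with a category the encoder never saw during fit (scikit-learn raises a ValueError).

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['candy', 'pizza', 'burger', 'candy'])

# classes_ holds the sorted categories; a category's index is its code
print(le.classes_.tolist())                      # ['burger', 'candy', 'pizza']
print(le.transform(['pizza', 'burger']).tolist())  # [2, 0]

# Round trip: encode, then decode back to the original labels
codes = le.transform(['candy', 'pizza'])
decoded = le.inverse_transform(codes).tolist()
print(decoded)                                   # ['candy', 'pizza']

# A category that was not seen during fit cannot be encoded
unseen_raised = False
try:
    le.transform(['salad'])
except ValueError as err:
    unseen_raised = True
    print('ValueError:', err)
```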

There are other ways to handle non-numeric data. For text data, bag-of-words representations such as word counts (CountVectorizer) and TF-IDF weights (TfidfVectorizer) are commonly used. Check them out.

That’s all for this mini tutorial. To sum it up, we learned about Label Encoding | Encode Categorical features.

Hope it was easy, cool and simple to follow. Now it’s on you.
