• Label Encoding means converting categorical features into numerical values.

  • Features that define a category are called categorical variables, e.g. Color (red, blue, green) or Gender (Male, Female). Most machine learning models expect features to be floats or integers, so categorical features like color and gender need to be converted to numerical values. A label encoder converts each category of a feature to an integer.
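To make the idea concrete, here is a minimal pure-Python sketch of what a label encoder does: collect the unique categories, sort them, and use each category's position as its integer code. The helper names (`fit_labels`, `encode`, `decode`) are made up for illustration and are not part of scikit-learn; this mirrors the idea, not the library's internals.

```python
# Hypothetical helpers that illustrate label encoding; not scikit-learn's API.

def fit_labels(values):
    """Return the sorted unique categories; a category's position is its code."""
    return sorted(set(values))

def encode(values, classes):
    """Map each value to the index of its category in `classes`."""
    index = {category: code for code, category in enumerate(classes)}
    return [index[value] for value in values]

def decode(codes, classes):
    """Map integer codes back to their category names."""
    return [classes[code] for code in codes]

likes = ['candy', 'pizza', 'burger', 'candy']
classes = fit_labels(likes)      # ['burger', 'candy', 'pizza']
codes = encode(likes, classes)   # [1, 2, 0, 1]

print(classes)
print(codes)
print(decode(codes, classes))    # ['candy', 'pizza', 'burger', 'candy']
```

Note that the codes follow alphabetical order only, so the integers carry no real ranking between the categories.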

# Imports
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder


# Let's make a sample dataframe
df = pd.DataFrame({'Name': ['Spidy','Hulk','Captain','Vision'], 'Likes': ['candy', 'pizza', 'burger', 'candy'],
                   'Favorite number': ['1', '3','2','7'],'Target': [1,2,2,1]},
                   columns=['Name','Likes','Favorite number', 'Target'])
df
      Name   Likes Favorite number  Target
0    Spidy   candy               1       1
1     Hulk   pizza               3       2
2  Captain  burger               2       2
3   Vision   candy               7       1
# Likes and Favorite number hold string values, so they are categorical variables
# They need to be converted to numerical form so that a machine learning model can use them

# Using LabelEncoder to convert a categorical column to integers
le = LabelEncoder()
df['Likes_encoded'] = le.fit_transform(df['Likes'])
df
      Name   Likes Favorite number  Target  Likes_encoded
0    Spidy   candy               1       1              1
1     Hulk   pizza               3       2              2
2  Captain  burger               2       2              0
3   Vision   candy               7       1              1
# To get labels from values 
le.inverse_transform([1,0,0,2])
array(['candy', 'burger', 'burger', 'pizza'], dtype=object)
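If you want to see the full category-to-integer mapping, a fitted encoder stores the sorted categories in its `classes_` attribute. The snippet below is a small self-contained sketch that rebuilds the encoder on the same values as the Likes column; the `mapping` variable and the unseen value `'taco'` are just illustrative. It also shows that transforming a category the encoder has not seen raises a `ValueError`.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['candy', 'pizza', 'burger', 'candy'])

# classes_ holds the sorted categories; the position of each one is its code
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)  # {'burger': 0, 'candy': 1, 'pizza': 2}

# A category that was not present during fit cannot be encoded
try:
    le.transform(['taco'])
    unseen_failed = False
except ValueError:
    unseen_failed = True
    print('taco was not seen during fit, so it cannot be encoded')
```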

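There are other ways to handle non-numeric data. Note that the scikit-learn documentation describes LabelEncoder as a tool for encoding target labels; for input features it also offers OrdinalEncoder and OneHotEncoder. For free text, TF-IDF and count vectorizer (bag of words) are commonly used. Check them out.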
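As a quick taste of the text-oriented options, here is a minimal sketch using scikit-learn's `CountVectorizer` and `TfidfVectorizer`. The two sample sentences are made up for illustration; with default settings, each column of the result corresponds to one word of the learned vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up sample sentences, only for illustration
docs = ['candy and pizza', 'pizza and burger']

# Bag of words: one column per word, values are word counts
cv = CountVectorizer()
counts = cv.fit_transform(docs)
print(sorted(cv.vocabulary_.items(), key=lambda item: item[1]))
print(counts.toarray())

# TF-IDF: same columns, but counts are reweighted so that words
# appearing in every document get less weight
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
print(weights.shape)
```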

That’s all for this mini tutorial. To sum it up, we learned how to use label encoding to turn categorical features into integers.

Hope it was easy, cool and simple to follow. Now it’s over to you.
