Label Encoding means converting categorical features into numerical values.
Features that define a category are called categorical variables, e.g. Color (red, blue, green) or Gender (Male, Female). Machine learning models expect features to be floats or integers, so categorical features like color and gender need to be converted to numerical values. Scikit-learn's LabelEncoder converts a categorical feature into integers.
# Imports
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Let's make a sample dataframe
df = pd.DataFrame({'Name': ['Spidy', 'Hulk', 'Captain', 'Vision'],
                   'Likes': ['candy', 'pizza', 'burger', 'candy'],
                   'Favorite number': ['1', '3', '2', '7'],
                   'Target': [1, 2, 2, 1]},
                  columns=['Name', 'Likes', 'Favorite number', 'Target'])
df
|   | Name    | Likes  | Favorite number | Target |
|---|---------|--------|-----------------|--------|
| 0 | Spidy   | candy  | 1               | 1      |
| 1 | Hulk    | pizza  | 3               | 2      |
| 2 | Captain | burger | 2               | 2      |
| 3 | Vision  | candy  | 7               | 1      |
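A quick aside: 'Favorite number' is stored as strings ('1', '3', ...) in this sample dataframe. A column like that doesn't need label encoding, a plain cast to integers is enough (a small extra step, not part of the encoding workflow below):
# 'Favorite number' holds numeric strings, so casting is enough; no encoding needed
df['Favorite number'] = df['Favorite number'].astype(int)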
# 'Likes' is a categorical variable
# It needs to be converted to numerical form so that a machine learning model can understand it
# Using LabelEncoder to convert categorical values to numerical values
le = LabelEncoder()
df['Likes_encoded'] = le.fit_transform(df['Likes'])
df
|   | Name    | Likes  | Favorite number | Target | Likes_encoded |
|---|---------|--------|-----------------|--------|---------------|
| 0 | Spidy   | candy  | 1               | 1      | 1             |
| 1 | Hulk    | pizza  | 3               | 2      | 2             |
| 2 | Captain | burger | 2               | 2      | 0             |
| 3 | Vision  | candy  | 7               | 1      | 1             |
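If you want to see which integer was assigned to which label, the fitted encoder keeps the sorted labels in its classes_ attribute; the position of a label in that array is its encoded value:
# Labels are assigned integers in alphabetical order
le.classes_
array(['burger', 'candy', 'pizza'], dtype=object)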
# To get the original labels back from encoded values
le.inverse_transform([1,0,0,2])
array(['candy', 'burger', 'burger', 'pizza'], dtype=object)
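If several columns need encoding, a handy pattern is to fit one encoder per column and keep them around so each column can be inverse-transformed later. A minimal sketch (the encoders dict is just an illustrative name):
# One LabelEncoder per categorical column, stored for later inverse_transform
encoders = {}
for col in ['Name', 'Likes']:
    encoders[col] = LabelEncoder()
    df[col + '_encoded'] = encoders[col].fit_transform(df[col])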
There are other ways to handle non-numeric data. TF-IDF and CountVectorizer (bag of words) are commonly used for free-form text data. Check them out.
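For a quick taste, here is a minimal bag-of-words sketch with CountVectorizer on a made-up list of sentences (just for illustration, not part of the dataframe above):
from sklearn.feature_extraction.text import CountVectorizer
# Bag of words: each column is a word, each row counts how often it appears
texts = ['candy is sweet', 'pizza is cheesy', 'candy and pizza']
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())
print(counts.toarray())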
That's all for this mini tutorial. To sum it up, we learned about Label Encoding and how to encode categorical features.
Hope it was easy, cool and simple to follow. Now it's on you.
Related Resources:
- One Hot Encoding | What is one hot encoding?
- Boston Dataset | Scikit learn datasets
- Iris Dataset – A Detailed Tutorial
- 8 Ways to Drop Columns in Pandas | A Detailed Guide
- Building Adaboost classifier model in Python
- Build Decision Tree classification model in Python
- Digits Dataset | Scikit learn datasets
- PCA Principal Component Analysis
- Build Logistic Regression classifier model in Python
- Build K Nearest Neighbors classifier model in Python