Spot Checking ML algorithms
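Spot checking means quickly trying several different algorithms on a problem to see which ones are worth pursuing further.
There is no ‘one algorithm fits all’ in machine learning: an algorithm that works well on one problem may perform badly on another, so it is worth checking a few before committing to one.
Let’s do it in a simple way, using the SMS Spam Collection dataset to classify text messages as spam or ham.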
```python
# Imports
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
```
```python
# Load dataset
# Dataset can be found at: https://www.kaggle.com/uciml/sms-spam-collection-dataset
df = pd.read_csv('spam.csv', encoding='latin-1')
# Keep only the necessary columns: v2 holds the message text, v1 the label
df = df[['v2', 'v1']]
# Rename columns
df.columns = ['SMS', 'Type']
# View the top 5 rows of the loaded dataset
df.head()
```
|   | SMS | Type |
|---|---|---|
| 0 | Go until jurong point, crazy.. Available only … | ham |
| 1 | Ok lar… Joking wif u oni… | ham |
| 2 | Free entry in 2 a wkly comp to win FA Cup fina… | spam |
| 3 | U dun say so early hor… U c already then say… | ham |
| 4 | Nah I don’t think he goes to usf, he lives aro… | ham |
```python
# Count how many spam and ham messages there are
df.Type.value_counts()
```

```
ham     4825
spam     747
Name: Type, dtype: int64
```
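The counts show the classes are imbalanced: a model that always answers "ham" would already be right about 86.6% of the time (4825 / 5572). When spot checking, it helps to score such a trivial baseline alongside the real candidates. The sketch below is an addition, not part of the original tutorial; it uses scikit-learn's `DummyClassifier` on synthetic labels that only reproduce the class counts above, with a constant placeholder feature column.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic labels with the same class counts as the dataset: 4825 ham, 747 spam
y = np.array(['ham'] * 4825 + ['spam'] * 747)
# The dummy model ignores the features, so a single constant column is enough
X = np.zeros((len(y), 1))

# Always predict the most frequent class ('ham') and score it with 5-fold CV
baseline = cross_val_score(DummyClassifier(strategy='most_frequent'), X, y, cv=5).mean()
print("Majority-class baseline: {}%".format(round(baseline * 100, 3)))
```

In the tutorial itself you could pass the real features and labels instead, e.g. `cross_val_score(DummyClassifier(strategy='most_frequent'), cdf, df.Type.values, cv=5)`, and compare every model's score against that number.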
```python
# Turn the SMS text into n-gram count features (bag of words)
countvec = CountVectorizer(ngram_range=(1, 4), stop_words='english', strip_accents='unicode', max_features=1000)
# Alternative: TF-IDF features (needs: from sklearn.feature_extraction.text import TfidfVectorizer)
# countvec = TfidfVectorizer(ngram_range=(1, 2), stop_words='english', strip_accents='unicode', max_features=100)
cdf = countvec.fit_transform(df.SMS)
```
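```python
# Instantiate the candidate algorithms
lr = LogisticRegression(penalty='l2')
dt = DecisionTreeClassifier(class_weight="balanced")
mnb = MultinomialNB()
rf = RandomForestClassifier(n_jobs=-1)
ests = {'Logistic Regression': lr, 'Decision tree': dt, 'Random forest': rf, 'Naive Bayes': mnb}

# Score each algorithm with 5-fold cross-validation (default metric: accuracy).
# All four estimators accept the sparse count matrix directly, so no .toarray() is needed.
for name, est in ests.items():
    scores = cross_val_score(est, X=cdf, y=df.Type.values, cv=5)
    print("{} score: {}%".format(name, round(scores.mean() * 100, 3)))
```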
```
Naive Bayes score: 98.187%
Decision tree score: 94.311%
Random forest score: 97.469%
Logistic Regression score: 97.864%
```
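One caveat: `countvec` is fitted on all of `df.SMS` before `cross_val_score` splits the data, so the vocabulary (and the `max_features` selection) has already seen the held-out folds. The effect is usually small, but a cleaner spot check puts the vectorizer inside a scikit-learn `Pipeline` so it is re-fitted on each training fold. The sketch below does that and also reports the standard deviation across folds, which shows how stable each score is. The `spot_check` helper, the toy messages, `cv=3`, `max_iter=1000` and `random_state=0` are illustrative assumptions, not part of the original tutorial.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier


def spot_check(texts, labels, cv=5):
    """Hypothetical helper: return (mean, std) of cross-validated accuracy per model.

    The vectorizer lives inside each pipeline, so it is fitted on the
    training folds only and never sees the held-out fold's words.
    """
    candidates = {
        'Logistic Regression': LogisticRegression(max_iter=1000),
        'Decision tree': DecisionTreeClassifier(class_weight='balanced', random_state=0),
        'Random forest': RandomForestClassifier(random_state=0),
        'Naive Bayes': MultinomialNB(),
    }
    results = {}
    for name, clf in candidates.items():
        pipe = make_pipeline(
            CountVectorizer(ngram_range=(1, 4), stop_words='english',
                            strip_accents='unicode', max_features=1000),
            clf,
        )
        scores = cross_val_score(pipe, texts, labels, cv=cv)
        results[name] = (scores.mean(), scores.std())
    return results


# Hypothetical toy messages, just to show the helper running end to end
spam = ["win a cash prize now", "claim your free reward today", "urgent you have won cash",
        "winner claim prize by text", "free entry to win a holiday", "cash bonus waiting claim now"]
ham = ["are we still on for lunch", "running late see you soon", "meeting moved to tomorrow morning",
       "can you pick up milk", "dinner at mum's tonight", "ok lar talk later"]
texts = spam + ham
labels = ['spam'] * len(spam) + ['ham'] * len(ham)

results = spot_check(texts, labels, cv=3)
for name, (mean, std) in results.items():
    print("{}: {:.3f} (+/- {:.3f})".format(name, mean, std))
```

On the real data you would call `spot_check(df.SMS, df.Type.values, cv=5)`. Because the raw text goes into `cross_val_score`, the separate `countvec.fit_transform` step is no longer needed.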
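That’s all for this mini tutorial. To sum it up, we spot-checked four algorithms (logistic regression, a decision tree, a random forest and multinomial Naive Bayes) on the same bag-of-words features using 5-fold cross-validation. On this dataset Naive Bayes scored highest (98.187%) and the decision tree lowest (94.311%).
Hope it was easy, cool and simple to follow. Now it’s on you.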