Naive Bayes Tutorial

1- Overview

Machine Learning is a very impressive field in general. But I find Naive Bayes breathtakingly fascinating. Let me explain why.

Imagine a Machine Learning algorithm built on a formula that was devised 250+ years ago by a Presbyterian minister contemplating God’s involvement in daily events, a probabilistic formula that went on to found the whole field of Bayesian statistics. Now, that’s just the history part. Also imagine this formula working so fast that it’s still commonly used in the 2020s. On top of that, depending on the dataset, Naive Bayes can perform with very high accuracy.

Also, it’s one of the easiest Machine Learning algorithms to build a model with, even without well-known machine learning libraries such as Scikit-Learn. As you know, we’re talking about the Naive Bayes algorithm.
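To illustrate how little is needed without a library, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. The function names (fit_gaussian_nb, predict_gaussian_nb) and the toy data are made up for this example, and the smoothing term is a simplified constant rather than the scheme Scikit-Learn uses.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate a class prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # transpose: one tuple per feature
        means = [sum(col) / n for col in columns]
        variances = [
            # Assumption: a small constant keeps the variance above zero
            sum((v - m) ** 2 for v in col) / n + var_smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = {"prior": n / n_total, "means": means, "variances": variances}
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        for v, m, var in zip(row, params["means"], params["variances"]):
            # Log of the Gaussian density, summed over features
            # (the "naive" assumption that features are independent)
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): two features, two classes
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # a
print(predict_gaussian_nb(model, [3.1, 4.0]))  # b
```

Training only computes counts, means and variances, and prediction only reads those stored values back, which is why the whole thing fits in a few dozen lines.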

Different Naive Bayes implementations can be categorized under these titles:

  • Gaussian NB
  • Multinomial NB
  • Complement NB
  • Bernoulli NB
  • Categorical NB
  • Out-of-core NB (incremental fitting for data that does not fit in memory)

Why Naive Bayes Algorithm?

2- Naive Bayes Benefits

Very fast.

Accurate.

Simple to implement.

Scales well.

Interpretable, produces probabilistic reports.

Naive Bayes Pros

Depending on the problem, you might want to be the one who decides on the ultimate classification or regression outcome, based on a Machine Learning algorithm’s findings.

In those cases it makes a huge difference to have probability reports on the decisions. Interpreting the probability results of a machine learning algorithm is one thing, and having them at all is another, …

For more details, you can read Naive Bayes Advantages.

Naive Bayes Cons

You might be dealing with a safety-critical problem where you don’t accept results below 99% certainty, or you might be working on an investment portfolio with a high risk appetite and be comfortable with results above 60% probability.

Cases vary, but what often helps is having a machine learning algorithm that provides probability reports.
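The idea of applying your own certainty requirement to reported probabilities can be sketched in a few lines. The decide function, the class names and the probability values below are hypothetical and only serve as an illustration.

```python
def decide(probabilities, threshold):
    """Return the most probable class if it clears the threshold, else None."""
    label = max(probabilities, key=probabilities.get)
    return label if probabilities[label] >= threshold else None


# Hypothetical class probabilities for one sample, as a classifier might report them
probabilities = {"safe": 0.83, "unsafe": 0.17}

print(decide(probabilities, threshold=0.99))  # None: not certain enough for a safety-critical use
print(decide(probabilities, threshold=0.60))  # safe: acceptable with a higher risk appetite
```

Returning None here stands for "no automatic decision", which leaves the final call to a person or to a stricter downstream check.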

For more details, you can read Naive Bayes Disadvantages.

Application Areas

3- Key Industries

  1. Finance
  2. Medicine
  3. Cyber Security
  4. Retail
  5. E-commerce

Who Found Naive Bayes?

4- Naive Bayes History

Thomas Bayes formulated Bayes’ Theorem sometime in the mid-18th century and can be regarded as the father of the Naive Bayes algorithm as well as the whole field of Bayesian Statistics. He published only two works during his life, both in the 1730s, and the original paper on Bayes’ Theorem was edited and published by his friend Richard Price in 1763, after Bayes’ death.

You can access the original Bayes Theorem paper in the article below as well as read more details about the history of Naive Bayes algorithm:

Is Naive Bayes Fast?

5- Naive Bayes Complexity

Naive Bayes is one of the fastest Machine Learning algorithms we have today. This is the result of its simple training formula, which needs only a handful of basic arithmetic operations such as counts, means and variances. Also, the Naive Bayes algorithm stores the prior and conditional parameters to be used during the prediction (or inference) phase, which makes it very fast during prediction as well. As a result Naive Bayes learns fast, predicts fast and scales well.

The training time complexity of Naive Bayes algorithms is O(N*P), where N is the number of samples and P is the number of features. In Big O terms this is linear in N when P is fixed; it only approaches Quadratic or Cubic Complexity if the number of features grows along with the data (P=N or P=N^2, which is quite rare). For the prediction phase it is O(P) per sample for a fixed number of classes, i.e. Linear Complexity in the number of features.

We ran some tests to demonstrate the performance of Naive Bayes algorithms using Scikit-Learn in Python.

Runtime Performance

  • Naive Bayes (1 Million rows with 2 features): 0.2236 seconds (or 223.6 milliseconds)
  • Naive Bayes (1 Million rows with 50+ features): 0.5 seconds

Tests were done on an 8th Gen i7 processor with 16GB RAM.
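A timing test of this kind could be reproduced along the following lines. This is a sketch under assumptions, not the script behind the numbers above: the time_gaussian_nb helper, the synthetic data and the default sizes are made up here, and results will differ from machine to machine.

```python
import time

import numpy as np
from sklearn.naive_bayes import GaussianNB


def time_gaussian_nb(n_rows=100_000, n_features=2, seed=0):
    """Fit and predict on synthetic data, returning (fit_seconds, predict_seconds)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_rows)
    # Shift the feature means by class so the two classes are separable
    X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n_rows, n_features))

    model = GaussianNB()
    start = time.perf_counter()
    model.fit(X, y)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X)
    predict_seconds = time.perf_counter() - start
    return fit_seconds, predict_seconds


fit_s, predict_s = time_gaussian_nb()
print(f"fit: {fit_s:.4f} s, predict: {predict_s:.4f} s")
```

Calling time_gaussian_nb(n_rows=1_000_000, n_features=50) gives a rough comparison with the larger case listed above, provided the data fits in memory.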

You can read a more detailed tutorial about Naive Bayes Complexity and Runtime Performance in the article below:

How to Use Naive Bayes?

6- Scikit-Learn Naive Bayes Implementation

Naive Bayes is a fast classifier that can be accurate and reliable when it is chosen for a suitable project and used with the right kind of Naive Bayes model and appropriate parameters.

We have created a tutorial that shows a simple initialization of different Naive Bayes models using Scikit-Learn and Python. Naive Bayes can only be used to make classification predictions.
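As a quick orientation, here is a minimal sketch of creating, fitting and using a Gaussian Naive Bayes classifier with Scikit-Learn. The choice of the built-in iris dataset and of the split settings is an assumption made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = GaussianNB()
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # hard class labels
probabilities = model.predict_proba(X_test)  # one column of probabilities per class
print("accuracy:", model.score(X_test, y_test))
print("first row probabilities:", probabilities[0])
```

The other models in sklearn.naive_bayes follow the same fit, predict and predict_proba pattern, so swapping GaussianNB for another variant mostly comes down to matching the model to the type of feature values.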

There are also a number of Naive Bayes classifiers, each suited to a specific type of feature values. You can read about the differences between Naive Bayes models such as Gaussian, Multinomial, Complement, Categorical and Bernoulli Naive Bayes in the tutorial below:

How can I improve Naive Bayes?

7- Naive Bayes Optimization

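Naive Bayes models can be optimized using their hyperparameters. In Scikit-Learn you can make them behave according to your project by assigning predefined prior class probabilities through the priors parameter (GaussianNB) or the class_prior parameter (count-based models such as MultinomialNB). You can also adjust the variance smoothing of GaussianNB with the var_smoothing parameter, or increase or decrease the additive smoothing degree of the count-based models with the alpha parameter.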
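The following sketch shows where these hyperparameters go when the models are created with Scikit-Learn. The toy arrays and the specific values chosen for the priors and smoothing are made up for illustration and are not tuning recommendations.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Toy continuous data (made up): class 0 is three times as frequent as class 1
X_cont = np.array([[1.0], [1.1], [0.9], [1.2], [0.8], [1.0], [3.0], [3.2]])
y_cont = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# priors overrides the class frequencies that would otherwise be learned from the data;
# var_smoothing adds a fraction of the largest feature variance to all variances
gnb = GaussianNB(priors=[0.5, 0.5], var_smoothing=1e-9).fit(X_cont, y_cont)
print(gnb.class_prior_)

# Toy count data (made up), e.g. word counts per document
X_counts = np.array([[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 2]])
y_counts = np.array([0, 0, 1, 1])

# alpha is the additive smoothing strength for count-based models
mnb = MultinomialNB(alpha=0.5, class_prior=[0.7, 0.3]).fit(X_counts, y_counts)
print(mnb.predict(np.array([[1, 0, 0]])))
```

In practice, candidate values for var_smoothing or alpha are usually compared with cross-validation rather than set by hand.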

You can see a detailed tutorial about optimizing Naive Bayes with Scikit-Learn below:

Is there a Naive Bayes Implementation Example?

8- Naive Bayes Example

Of course, the best way to understand a machine learning model and gain actionable skills is to experiment with data, programming and implementing machine learning algorithms. Naive Bayes is no exception. To help pave the way, we have created a well-rounded Naive Bayes example where you can learn the basics of a Naive Bayes implementation as well as Naive Bayes Visualization and Machine Learning Model Evaluation.

You can use this example as a starting point and refer to it, replicate it or advance it if you’d like. It’s in the tutorial below: