Binary Neural Network Part 1

What you should know:

Before delving into what Binary Neural Networks (BNNs) are, a foundational understanding of forward and back-propagation used in Deep Neural Networks (DNNs) is important (no sweat if not, I will try to keep it simple).

Tip: A great source for learning is the "Deep Learning AI" courses taught by Dr. Andrew Ng. His online courses on Coursera and YouTube will cover everything you need to get started in machine learning and/or deep learning.

What You'll Learn Here (Hopefully):

This blog post aims to be your ticket to understanding what BNNs are all about, how they're different from the regular Full-Precision networks (don't worry, I'll explain that in a second), and why they're causing such a buzz in the world of deep learning research.

Introduction:

In the vast landscape of deep learning, there exists a variety of neural network architectures, each designed to tackle specific tasks or data types. Some popular architectures include the multi-layer perceptron (MLP), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and many more.

These architectures operate by following a common structural pattern:

  1. Input layer: This is where the data to be processed, whether for training or testing, is initially fed into the network. It acts as the entry point for information.

  2. Hidden layer/s: The hidden layers, located between the input and output layers, serve as the powerhouse where the bulk of computations and transformations take place. They extract and process features from the input data, enabling the network to learn complex patterns and relationships within the data.

  3. Output layer: Finally, the processed information from the hidden layers is distilled and presented through the output layer. The output layer provides the network's predictions or results based on the patterns learned during training.

Each of the above-mentioned layers is composed of multiple nodes (neurons), as shown in the picture below:

The complexity of a neural network is commonly determined by two primary factors: the number of hidden layers and the number of neurons within each layer.

Within these networks, the term "weights" plays a pivotal role. Weights are essentially parameters that the network adjusts during the training phase to produce certain "activations" across the layers. These activations are generated through a combination of weighted inputs, bias terms, and activation functions. The weights in neural networks are fundamental as they represent the strength of connections between neurons in different layers.
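To make this concrete, here's a minimal NumPy sketch of how weighted inputs, bias terms, and an activation function combine to produce a layer's activations. The layer sizes and the tanh activation here are arbitrary choices, purely for illustration:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # hidden layer: weighted inputs plus a bias term, passed through an activation
    h = np.tanh(W1 @ x + b1)
    # output layer: a single linear neuron (e.g., for regression)
    return W2 @ h + b2

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 hidden neurons, randomly initialized
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 1 output neuron
print(forward(x, W1, b1, W2, b2))
```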

At the outset of the training phase, a neural network is initialized with random weights. These weights serve as the knobs that the network tweaks throughout its learning process to predict outputs accurately from given inputs.

During the training phase, the network attempts to predict outcomes based on the provided inputs. Once the prediction is made, it calculates the "error" between its predicted output and the actual observed output. This discrepancy, often quantified using a loss or cost function, represents how far off the predicted value is from the true value.

The real magic happens during "back-propagation." This phase involves the network's adjustment of its weights to minimize the prediction error. Think of it as a learning process where the network acknowledges its mistake (the prediction being far from the actual value) and tries to rectify it.

For instance, suppose the network is tasked with predicting the cost of a 3-bedroom house. If the network initially estimates it to be $100k, yet the actual value stands at $200k, the network recognizes the $100k difference and initiates the necessary weight adjustments during "back-propagation".

By continuously adjusting these weights, the network gradually reduces the prediction error. It repeats this process iteratively, converging towards a point where the predicted value aligns closely with the actual value. Ultimately, the aim is to minimize the prediction error as much as possible, ideally matching the network's predictions with the observed outcomes.

This iterative process of forward-propagation (making predictions) and back-propagation (adjusting weights based on the error) is the fundamental mechanism through which neural networks learn and improve their predictive capabilities.
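To ground the house-price example above, here's a tiny gradient-descent loop in plain Python: a hypothetical one-weight linear model with a squared-error loss (all the numbers are made up for illustration):

```python
# toy example: predict a house price from the number of bedrooms
x, y_true = 3.0, 200.0   # 3 bedrooms, actual price of $200k
w = 33.3                 # initial weight, predicting roughly $100k
lr = 0.01                # learning rate

for step in range(200):
    y_pred = w * x              # forward-propagation: make a prediction
    error = y_pred - y_true     # how far off are we?
    grad = error * x            # gradient of the loss 0.5 * error**2 w.r.t. w
    w -= lr * grad              # back-propagation: adjust the weight

print(round(w * x, 1))  # converges towards 200.0
```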

Now that this is out of the way, let's actually talk about BNNs.

Binary Neural Networks:

Now, here's the catch with many neural networks: they're big on memory consumption. How big? Well, they typically use 32 bits to store each weight. Imagine your network having over 100,000 of these weights! At 32 bits each, that's roughly 400 KB just for the weights, not to mention the other parameters the network needs to keep track of.

For high-powered computers or workstations, this might not be a big deal. But for less powerful devices like IoT gadgets or those with limited computing power, this becomes a big obstacle. These devices simply can't afford to gulp down that much memory. It's like fitting a sumo wrestler into a mini car—it just won't work!

This is where Binary Neural Networks come into play. Instead of storing each weight in 32 bits, these networks drastically reduce weight precision to just 1 bit! But that's not all; they take it a step further by also quantizing activations to 1 bit using either stochastic or deterministic binarization functions.

The pioneering work on such networks and the extreme quantization of weights and activations can be found in this paper. I highly recommend that anyone genuinely interested in Binary Neural Networks (BNNs) begin with that paper. From there, exploring survey papers can provide a broader understanding of the advancements and scope of BNNs. These resources serve as excellent starting points for comprehending the intricacies and developments in the field.

However, I acknowledge that not everyone will have free access to those academic journals, so I highly encourage watching this series of videos on YouTube for further understanding.

Binary Weights and Activations:

In Binary Neural Networks (BNNs), activation functions, like the sign function, play a crucial role. They're the ones responsible for generating a binary output for each neuron in the hidden layers.

Here's how it works: imagine each neuron's output is like a light switch, either "on" or "off," represented as +1 or -1. This decision is made using a simple threshold. For instance, if a neuron's pre-activation (the weighted input plus the bias term) is greater than or equal to 0, its output is set to +1; if it's less than 0, it's flipped to -1.
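Here's a small NumPy sketch of that thresholding. The deterministic version is just the sign function; the stochastic version, as described in the BNN literature, binarizes with a probability given by a "hard sigmoid" of the input. Treat this as an illustration, not the paper's exact code:

```python
import numpy as np

def binarize_deterministic(x):
    # >= 0 maps to +1, < 0 maps to -1
    return np.where(x >= 0, 1.0, -1.0)

def binarize_stochastic(x, rng=np.random.default_rng(0)):
    # probability of outputting +1 given by the "hard sigmoid" clip((x + 1) / 2, 0, 1)
    p = np.clip((x + 1.0) / 2.0, 0.0, 1.0)
    return np.where(rng.random(x.shape) < p, 1.0, -1.0)

pre_activations = np.array([0.7, -0.2, 0.0, -1.4])
print(binarize_deterministic(pre_activations))  # [ 1. -1.  1. -1.]
print(binarize_stochastic(pre_activations))
```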

While binarizing the weights and activations in Binary Neural Networks (BNNs) proves effective, a critical issue arises from the choice of activation function. Functions like the sign function, commonly used in BNNs, present a challenge: their gradient is zero almost everywhere, as illustrated in the picture below.

Why is this a problem? Well, in neural networks, gradients play a pivotal role during backpropagation. They guide the network on how to adjust its weights and parameters, ensuring continual improvement in performance. Without gradients, it's like navigating in the dark; the network lacks the essential cues needed to learn from its mistakes and enhance its accuracy.

The authors were able to overcome this problem by using the straight-through estimator (STE) technique. If you're interested in knowing more about how this technique works, you can read this article.
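As a rough sketch of the idea: the forward pass binarizes as usual, but the backward pass pretends the binarization was the identity function, optionally cancelling the gradient where the input saturates beyond ±1. Here's what that could look like with TensorFlow's @tf.custom_gradient (an illustration of the technique, not the authors' original code):

```python
import tensorflow as tf

@tf.custom_gradient
def binarize_ste(x):
    # forward: hard binarization to +1 / -1
    out = tf.where(x >= 0, tf.ones_like(x), -tf.ones_like(x))

    def grad(dy):
        # backward: pass the gradient straight through,
        # zeroed where |x| > 1 (the saturation region)
        return dy * tf.cast(tf.abs(x) <= 1.0, dy.dtype)

    return out, grad

x = tf.constant([0.3, -0.7, 1.5])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(binarize_ste(x))
print(tape.gradient(y, x))  # [1., 1., 0.]
```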

Sign function (source)

As discussed earlier, during the back-propagation process, neural networks adjust their weights to improve accuracy. However, in Binary Neural Networks (BNNs), there's a twist: the real-valued weights are clipped to stay within the interval [-1, +1] throughout the entire training phase.

This means that as the network learns and updates its real-valued weights to minimize errors, those weights never drift outside the [-1, +1] boundaries. This constraint keeps the real-valued weights close to their binarized counterparts, which is a unique characteristic of BNN training.
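In code, this is typically just a clip applied to the real-valued weights right after each update. A minimal sketch, assuming plain SGD:

```python
import numpy as np

def sgd_step_with_clip(w_real, grad, lr=0.01):
    # update the real-valued "latent" weights...
    w_real = w_real - lr * grad
    # ...then clip them so they never leave [-1, +1]
    return np.clip(w_real, -1.0, 1.0)

w = np.array([0.95, -0.6])
print(sgd_step_with_clip(w, grad=np.array([-20.0, 5.0])))  # [ 1.   -0.65]
```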

Binary Neural Networks Perks:

Given that Binary Neural Networks (BNNs) utilize binary weights, the significant advantage lies in their compact representation, requiring just 1 bit of storage instead of the traditional 32-bit allocation per weight. This translates to a staggering 32x decrease in the memory needed to store the network's weights.

The implications of this memory reduction extend beyond mere storage benefits. With less memory required, the frequency of memory accesses also decreases. This reduction in memory access not only enhances the network's speed but also plays a pivotal role in reducing power consumption.

Moreover, the binary nature of weights and activations in Binary Neural Networks (BNNs) leads to a game-changing simplification in computational operations. The traditionally expensive dot product computations between full-precision weights and activations are streamlined into an incredibly simple operation known as the XNOR operation.

This XNOR operation is a fundamental building block in BNNs. It efficiently replaces the computationally intensive dot-product operations. How? Well, think of it this way: instead of performing complex multiplications and additions of full-precision numbers, each element-wise multiplication of ±1 values becomes a single bit-level XNOR (with -1 encoded as 0 and +1 encoded as 1), and the additions become a popcount (counting the set bits), as shown in the table below.

| a | w | a × w | a (bit) | w (bit) | XNOR |
| --- | --- | --- | --- | --- | --- |
| -1 | -1 | +1 | 0 | 0 | 1 |
| -1 | +1 | -1 | 0 | 1 | 0 |
| +1 | -1 | -1 | 1 | 0 | 0 |
| +1 | +1 | +1 | 1 | 1 | 1 |
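Here's a quick NumPy demonstration of that equivalence (bit-level only in spirit; real BNN kernels pack bits into machine words, which this sketch doesn't do). For ±1 vectors of length n, the dot product equals 2 × popcount(XNOR) − n:

```python
import numpy as np

a = np.array([ 1, -1,  1,  1])   # binarized activations
w = np.array([ 1,  1, -1,  1])   # binarized weights
n = len(a)

dot = int(a @ w)                 # full-precision dot product

a_bits = (a > 0).astype(int)     # encode -1 -> 0, +1 -> 1
w_bits = (w > 0).astype(int)
xnor = 1 - (a_bits ^ w_bits)     # XNOR = NOT(XOR)
popcount = int(xnor.sum())       # count the 1s

print(dot, 2 * popcount - n)     # both print 0
```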

Summary:

Binary Neural Networks (BNNs) take quantization to its extreme by constraining weights and activations to +1 or -1. This transformation yields a multitude of advantages, including:

  1. Reduced Memory Footprint: By compressing weights to 1-bit representations, BNNs drastically reduce memory requirements, facilitating efficient network operation even in resource-constrained environments.

  2. Lower Energy Consumption: The decreased memory access and simplified computations through binary operations significantly cut down energy consumption, making BNNs ideal for devices with limited hardware and power resources.

  3. Simplified Computational Complexity: By replacing expensive dot-product computations with straightforward XNOR binary operations, BNNs streamline computational processes, offering high-speed and efficient information processing within the network.

These advantages position BNNs as a game-changer for deploying deep neural networks across various devices, from smart fridges to IoT gadgets and much more!

In the upcoming part 2, I'll walk through a practical demonstration using Python and TensorFlow. The demonstration will showcase the deployment of BNNs, highlighting their accuracy and resource utilization in comparison to full precision networks. Stay tuned for a hands-on exploration into the practical implementation and performance assessment of BNNs!
