Deep Residual Learning For Image Recognition

Deep neural networks are difficult to train. A deep residual learning framework eases the training of networks that are substantially deeper than those used before. The layers are explicitly reformulated so that they learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Let's have a look at how deep residual learning benefits image recognition.

Extensive empirical evidence, including practical experience at companies like Tooliqa and Netguru, shows that these residual networks are easier to optimize and gain accuracy from substantially increased depth.

On the ImageNet dataset, residual nets with a depth of up to 152 layers have been evaluated. That is 8 times deeper than VGG nets, yet the residual nets still have lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.

In addition, the depth of representations plays a vital role in many visual recognition tasks. Owing to the extremely deep representations, the approach shows a 28% relative improvement on the COCO object detection dataset.

Introduction to deep residual learning

Image recognition is an engineering application of machine learning. Deep convolutional neural networks have led to a series of breakthroughs in image classification. Deep networks integrate low-, mid-, and high-level features and classifiers in an end-to-end multi-layer fashion, and the levels of features can be enriched by the number of stacked layers.

Network depth plays a crucial role: the leading results on the challenging ImageNet dataset all use very deep models, with depths ranging from sixteen to thirty layers.

Many other non-trivial recognition tasks have also benefited greatly from very deep models. The significance of depth raises a question: is learning better networks as easy as stacking more layers? One obstacle is the problem of vanishing/exploding gradients, which hinders convergence from the beginning.

This problem has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging under stochastic gradient descent (SGD) with backpropagation.

  • Residual network

A residual network addresses the degradation problem by using shortcut (skip) connections. These connections enable very deep networks to be built and trained.


When deeper networks start converging, a degradation problem appears: as network depth increases, accuracy saturates and then degrades rapidly. This degradation is not caused by overfitting, because adding more layers to a suitably deep model leads to higher training error, not only higher test error.

This degradation indicates that not all systems are equally easy to optimize. Deep residual nets are easy to optimize, whereas the corresponding plain nets (which simply stack layers) exhibit higher training error as depth increases.

Deep residual nets also gain accuracy from increased depth, which plain nets do not. On the ImageNet classification dataset, extremely deep residual nets give far more accurate results.

Further, the 152-layer residual net was the deepest network presented on ImageNet at the time, yet it still has lower complexity than VGG nets [40]. The residual learning principle is generic and can therefore be applied to vision as well as non-vision problems.

Related Work in deep residual learning

1. Residual representations

In image recognition, VLAD is a commonly used representation that encodes by the residual vectors with respect to a dictionary, and the Fisher Vector can be formulated as a probabilistic version of VLAD. Both are powerful representations for image retrieval as well as classification.

For vector quantization, encoding residual vectors is more effective than encoding the original vectors. In low-level vision and computer graphics, the Multigrid method for solving partial differential equations reformulates the system as subproblems at multiple scales, where each subproblem is responsible for the residual solution between a coarser and a finer scale.

An alternative to Multigrid is hierarchical basis preconditioning, which relies on variables that represent residual vectors between two scales. These solvers converge much faster than standard solvers that are unaware of the residual nature of the solutions. These methods suggest that a good reformulation or preconditioning can simplify the optimization.

2. Shortcut connections

An early practice in training multi-layer perceptrons (MLPs) was to add a linear layer connected from the network input to the output. In other work, some intermediate layers are connected directly to auxiliary classifiers to address vanishing/exploding gradients.

Moreover, an inception layer is composed of a shortcut branch and a few deeper branches. Highway networks have shortcut connections with gating functions, where the gates are data-dependent and have parameters.

In contrast, the identity shortcuts are parameter-free. When a gated shortcut is closed, the layers in a highway network represent non-residual functions. The identity shortcuts are never closed: all information is always passed through, and only the residual functions remain to be learned. In addition, highway networks have not demonstrated accuracy gains with extremely increased depth.
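
To make the contrast concrete, here is a minimal scalar sketch of a gated shortcut next to an identity shortcut. The gate form y = H(x)·T(x) + x·(1 − T(x)) is the usual highway-network formulation and is not spelled out in this article, so treat it as an assumption. The function names and the layer function `h` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(x, transform, gate_bias, gate_weight=1.0):
    """y = H(x) * T(x) + x * (1 - T(x)), with a data-dependent gate T(x)."""
    t = sigmoid(gate_weight * x + gate_bias)   # the gate has its own parameters
    return transform(x) * t + x * (1.0 - t)

def residual_layer(x, residual):
    """y = F(x) + x: the shortcut has no gate and no parameters."""
    return residual(x) + x

def h(x):
    """Hypothetical layer function, used as both H and F in this demo."""
    return 2.0 * x

x = 1.0
closed = highway_layer(x, h, gate_bias=50.0)    # T close to 1: shortcut closed
open_ = highway_layer(x, h, gate_bias=-50.0)    # T close to 0: layer copies x
res = residual_layer(x, h)                      # always F(x) + x
```

With the shortcut closed, the highway layer returns roughly H(x) alone, which is the non-residual case. The residual layer has no such setting: x is always added to the output.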


Deep residual learning

1. Residual learning

Consider H(x) as an underlying mapping to be fit by a few stacked layers, where x denotes the input to the first of these layers. If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions, then it is equivalent to hypothesize that they can asymptotically approximate the residual functions, that is, H(x) − x (assuming the input and output have the same dimensions).

So rather than expecting the stacked layers to approximate H(x) directly, these layers are made to approximate a residual function F(x) = H(x) − x. The original function then becomes F(x) + x. Both forms can asymptotically approximate the desired functions, but the ease of learning may differ.
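
The reformulation can be sketched in a few lines of plain Python. This is only an illustration: `residual_branch` is a hypothetical stand-in for the stacked layers F (two linear layers with a ReLU between them), and the sizes and weights are made up.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_branch(x, w1, w2):
    """Hypothetical F(x): two linear layers with a ReLU in between."""
    return matvec(w2, relu(matvec(w1, x)))

def residual_block(x, w1, w2):
    """H(x) = F(x) + x, using an identity shortcut and element-wise addition."""
    fx = residual_branch(x, w1, w2)
    return [f + xi for f, xi in zip(fx, x)]

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights, F(x) = 0, so the block is exactly the identity mapping.
identity_out = residual_block(x, zeros, zeros)
```

This is one way to see why the ease of learning can differ: if an identity mapping happens to be the best fit, the residual form only needs its weights pushed toward zero, while a plain stack of nonlinear layers would have to learn the identity itself.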

2. Identity mapping by shortcuts

A building block is defined as y = F(x, {Wi}) + x, where x and y are the input and output vectors of the layers considered. The function F(x, {Wi}) is the residual mapping to be learned. The operation F + x is carried out by a shortcut connection and element-wise addition. The dimensions of x and F must be equal; when they are not, the shortcut connection can apply a linear projection Ws to match the dimensions:

                                     y = F(x, {Wi}) + Ws x
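
The two forms of the building block can be sketched as follows, assuming small dense layers in place of convolutions. The helper names (`block_identity`, `block_projection`) and all weight values are hypothetical and serve only to show how Ws changes the dimension of the shortcut.

```python
def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def block_identity(x, w):
    """y = F(x, {Wi}) + x, where F is one linear layer + ReLU (an assumption)."""
    fx = relu(matvec(w, x))
    assert len(fx) == len(x), "identity shortcut needs matching dimensions"
    return [f + xi for f, xi in zip(fx, x)]

def block_projection(x, w, ws):
    """y = F(x, {Wi}) + Ws x, where Ws projects x to F's output dimension."""
    fx = relu(matvec(w, x))
    shortcut = matvec(ws, x)
    return [f + s for f, s in zip(fx, shortcut)]

x = [1.0, 2.0]

# Same dimensions (2 -> 2): the identity shortcut adds x unchanged.
w_same = [[1.0, 0.0], [0.0, -1.0]]
y_same = block_identity(x, w_same)

# Dimension increase (2 -> 3): Ws is a 3x2 matrix applied on the shortcut.
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y_up = block_projection(x, w_up, ws)
```

The identity form adds no parameters. The projection form adds one extra weight matrix, so it is needed only when the dimensions differ.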

3. Network architectures

Plain network: The convolutional layers in the plain baselines mostly use 3×3 filters and follow two design rules:

(i) Layers with the same output feature map size have the same number of filters.

(ii) If the feature map size is halved, the number of filters is doubled, which preserves the time complexity per layer.
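
A quick arithmetic check of rule (ii), as a sketch. It assumes the usual multiply-add count for a convolution (output height × output width × input channels × output channels × kernel area). The layer sizes are illustrative, and `conv_multiply_adds` is a hypothetical helper.

```python
def conv_multiply_adds(height, width, in_channels, out_channels, kernel=3):
    """Multiply-add count of one conv layer producing a height x width output."""
    return height * width * in_channels * out_channels * kernel * kernel

# Illustrative sizes: a layer inside one stage, and a layer inside the next
# stage after the feature map is halved and the filter count is doubled.
before = conv_multiply_adds(56, 56, 64, 64)
after = conv_multiply_adds(28, 28, 128, 128)

print(before, after)
```

Halving both height and width divides the count by 4, and doubling both the input and output channels multiplies it by 4, so the two layers cost the same.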


Residual network

Inserting shortcut connections into the plain network turns it into its residual counterpart. The identity shortcut can be used directly when the input and output have the same dimensions. When the dimensions increase, there are two options:

(i) The shortcut still performs identity mapping, with extra zero entries padded for the increased dimensions. This option introduces no extra parameters.

(ii) A projection shortcut (the Ws form given earlier) is used to match the increased dimensions, typically through 1×1 convolutions.
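
The two options can be sketched on a tiny feature map stored as nested lists in [channel][row][column] order. The function names and values are hypothetical, and the sketch ignores any change in spatial size between input and output.

```python
def zero_pad_shortcut(fmap, out_channels):
    """Option (i): keep existing channels and append all-zero channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    extra = out_channels - len(fmap)
    return fmap + [[[0.0] * w for _ in range(h)] for _ in range(extra)]

def projection_shortcut(fmap, weights):
    """Option (ii): 1x1 convolution, i.e. a per-pixel linear map over channels.

    weights has shape [out_channels][in_channels].
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]

# One input channel of size 2x2, expanded to two output channels.
fmap = [[[1.0, 2.0], [3.0, 4.0]]]

padded = zero_pad_shortcut(fmap, out_channels=2)
projected = projection_shortcut(fmap, weights=[[1.0], [0.5]])

extra_params_padding = 0            # option (i) is parameter-free
extra_params_projection = 2 * 1     # out_channels * in_channels for a 1x1 conv
```

Zero padding keeps the shortcut free of parameters, while the projection adds out_channels × in_channels weights per shortcut.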


The ImageNet implementation follows the practice in [21, 40]. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation [40]. The standard color augmentation in [21] is used. The weights are initialized as in [12], and this initialization is used to train all plain as well as residual nets. For comparison studies, standard 10-crop testing [2] is used, whereas for best results the fully convolutional form as in [40, 12] is used.
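
As a rough sketch of the scale augmentation step, the helper below only computes the target size for an image whose shorter side is sampled in [256, 480]. The function name and the example image size are hypothetical, and the actual resizing would be done by whatever image library is in use.

```python
import random

def scale_augment_size(width, height, low=256, high=480, rng=random):
    """Return a new (width, height) whose shorter side is sampled in [low, high].

    The aspect ratio is preserved; the result is rounded to whole pixels.
    """
    target = rng.randint(low, high)          # inclusive on both ends
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

rng = random.Random(0)                        # fixed seed for a repeatable demo
new_w, new_h = scale_augment_size(640, 480, rng=rng)
```

Sampling a new target on every call means the network sees the same image at different scales over the course of training.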

Quick queries for this insight

What are deep neural networks?

Deep neural networks are a type of machine learning model loosely inspired by the brain. They are composed of layers of interconnected processing nodes, or neurons, that learn to recognize patterns in input data. Deep neural networks can perform complex tasks such as identifying objects in images or recognizing faces. They can also learn from unlabeled data, which makes them well suited to applications such as unsupervised learning and anomaly detection. Although deep neural networks have shown great promise, research to improve their performance is ongoing.

What is a deep residual network?

A deep residual network (ResNet) is a type of neural network designed to make very deep networks easier to train. ResNets are built on the idea of skip connections, which are shortcuts that let information bypass one or more layers and be added directly to the output of later layers. This approach addresses the degradation problem, in which accuracy saturates and then drops as plain networks get deeper, and it also helps gradients flow through very deep models. The ResNet architecture was first proposed in 2015, and has since been shown to be effective for a variety of tasks, including image classification, object detection, and semantic segmentation.
