What neural network model for segmentation?

In the previous part we looked at some of the different state-of-the-art segmentation neural network models, in particular FCN, PSP and DeepLabV3. In this part we are going to see if we can improve PSP and DeepLabV3, moving DeepLabV3 in the direction of DeepLabV3+ and PSP towards PSP+. One of the weakest parts of the original implementations of both PSP and DeepLabV3 is the loss of information caused by the down-scaling in the encoder network, followed by simply upsampling the final label image to the original resolution. As a result, a lot of small objects and fine detail get lost in the crude upsampling.

Instead, we tried adding a very simple decoder network (a deconvolution layer with batch normalization and ReLU activation) between the final convolution layer and the upsampling layer. This should reduce the resolution loss to half of the original input image. We use the same nine test images as in the first part and check how the results compare to the original networks.
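To make the resolution argument concrete, here is a minimal sketch in plain Python of how a single deconvolution (transposed convolution) layer changes the size of the feature map before the final upsampling. The input size, the encoder stride of 8, and the deconvolution kernel and stride are illustrative assumptions and not the values used for the networks in this article. The helper name `conv_transpose_out` is hypothetical as well.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# All numbers below are illustrative assumptions, not values from this article.
input_size = 512      # hypothetical input height/width in pixels
encoder_stride = 8    # assumed down-scaling factor of the encoder
feature_size = input_size // encoder_stride

# Without a decoder, the coarse prediction is interpolated straight to full size.
upsample_factor_plain = input_size // feature_size

# With an assumed stride-4 deconvolution (kernel 4, no padding), the prediction
# is made at a higher resolution, so the final interpolation step is smaller.
decoded_size = conv_transpose_out(feature_size, kernel=4, stride=4)
upsample_factor_decoder = input_size // decoded_size

print("encoder output:", feature_size, "-> final upsampling x", upsample_factor_plain)
print("decoder output:", decoded_size, "-> final upsampling x", upsample_factor_decoder)
```

Under these assumptions, the last interpolation step only has to bridge a factor of 2 instead of 8, which is why small regions such as eyes or teeth have a better chance of surviving.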

First batch of images

Here are the results for the first three test images: three different ethnicities, with relatively simple backgrounds and compositions. Note that wearable items should be marked as background, as they are in the training set.

Source image  Source example image 1  Source example image 2  Source example image 3
Original PSP ResNet50  PSP50 result image 1  PSP50 result image 2  PSP50 result image 3
Original PSP ResNet101  PSP101 result image 1  PSP101 result image 2  PSP101 result image 3
PSP+ ResNet50  PSP+50 result image 1  PSP+50 result image 2  PSP+50 result image 3
PSP+ ResNet101  PSP+101 result image 1  PSP+101 result image 2  PSP+101 result image 3
DeepLabV3 ResNet50  DeepLabV3 Res50 result image 1  DeepLabV3 Res50 result image 2  DeepLabV3 Res50 result image 3
DeepLabV3 ResNet101  DeepLabV3 Res101 result image 1  DeepLabV3 Res101 result image 2  DeepLabV3 Res101 result image 3
DeepLabV3+ ResNet50  DeepLabV3+ Res50 result image 1  DeepLabV3+ Res50 result image 2  DeepLabV3+ Res50 result image 3
DeepLabV3+ ResNet101  DeepLabV3+ Res101 result image 1  DeepLabV3+ Res101 result image 2  DeepLabV3+ Res101 result image 3

From the first three test images, the improvement from adding even a very simple decoder network is quite substantial. Far more of the small details (eyes, eyebrows, mouth) are labeled correctly now, and a lot of the detail that was lost in the original network implementations is recovered. What is also interesting is that the version with the smaller base network easily exceeds the original version, even the one with the much deeper and larger base network. In fact, there seems to be little benefit in having a larger, deeper base network for either PSP+ or DeepLabV3+ compared to the original versions. The other noteworthy thing is that the beard in the second source image no longer gets classified at all.

Second batch of images

The second batch of examples starts off with a really tough one: a woman whose hair covers most of her face and therefore occludes many parts of it. The other two are a frontal view of a face and a side portrait, with different amounts of detail visible.

Source image  Source example image 4  Source example image 5  Source example image 6
PSP ResNet50  PSP50 result image 4  PSP50 result image 5  PSP50 result image 6
PSP ResNet101  PSP101 result image 4  PSP101 result image 5  PSP101 result image 6
PSP+ ResNet50  PSP+50 result image 4  PSP+50 result image 5  PSP+50 result image 6
PSP+ ResNet101  PSP+101 result image 4  PSP+101 result image 5  PSP+101 result image 6
DeepLabV3 ResNet50  DeepLabV3 Res50 result image 4  DeepLabV3 Res50 result image 5  DeepLabV3 Res50 result image 6
DeepLabV3 ResNet101  DeepLabV3 Res101 result image 4  DeepLabV3 Res101 result image 5  DeepLabV3 Res101 result image 6
DeepLabV3+ ResNet50  DeepLabV3+ Res50 result image 4  DeepLabV3+ Res50 result image 5  DeepLabV3+ Res50 result image 6
DeepLabV3+ ResNet101  DeepLabV3+ Res101 result image 4  DeepLabV3+ Res101 result image 5  DeepLabV3+ Res101 result image 6

Adding a simple decoder did not help much for the first test image. As already mentioned in the first part, this probably indicates that there are not enough similar cases in the training set. For the other two, as with the first batch, we can see a good improvement. Especially in the second image, the teeth are now correctly classified by the PSP+ and DeepLabV3+ networks, which was not the case before. Again, the benefit of the deeper base network in the case of PSP+ seems marginal.

Third batch of images

Another diverse batch, with varying difficulty, diverse headwear, and different head poses and lighting conditions.

Source image  Source example image 7  Source example image 8  Source example image 9
PSP ResNet50  PSP50 result image 7  PSP50 result image 8  PSP50 result image 9
PSP ResNet101  PSP101 result image 7  PSP101 result image 8  PSP101 result image 9
PSP+ ResNet50  PSP+50 result image 7  PSP+50 result image 8  PSP+50 result image 9
PSP+ ResNet101  PSP+101 result image 7  PSP+101 result image 8  PSP+101 result image 9
DeepLabV3 ResNet50  DeepLabV3 Res50 result image 7  DeepLabV3 Res50 result image 8  DeepLabV3 Res50 result image 9
DeepLabV3 ResNet101  DeepLabV3 Res101 result image 7  DeepLabV3 Res101 result image 8  DeepLabV3 Res101 result image 9
DeepLabV3+ ResNet50  DeepLabV3+ Res50 result image 7  DeepLabV3+ Res50 result image 8  DeepLabV3+ Res50 result image 9
DeepLabV3+ ResNet101  DeepLabV3+ Res101 result image 7  DeepLabV3+ Res101 result image 8  DeepLabV3+ Res101 result image 9

This batch shows some interesting differences. First, it is one of the few examples where the larger base network gives PSP some advantage. For DeepLabV3 the larger base network also helps, but not always (check the second image in the batch).

Summary

  • Even a simple decoder network can dramatically increase the quality of the labeled results, especially if you have small areas or fine details.
  • The benefit of a deeper base network for PSP+ is small. If you have to trade off computation, reduce the base network complexity and add a decoder network instead. Definitely go for the decoder network.
  • DeepLabV3+ mostly seems to outperform PSP+.

In the next part we are going to look at some alternative network solutions, before looking at alternative base networks for PSP and DeepLabV3 and further improvements to the decoder network.

If you are interested in any of the trained models, get in contact with us: simply write an email to