What neural network model for segmentation?
In the previous part we looked at some of the different state-of-the-art segmentation neural network models, in particular FCN, PSP and DeepLabV3. In this part we are going to see if we can improve PSP and DeepLabV3, moving DeepLabV3 in the direction of DeepLabV3+ and PSP towards PSP+. One of the weakest parts of the original implementations of both the PSP and DeepLabV3 network models is the loss of context information due to the down-scaling in the encoder network, followed by simply up-sampling the final label image to the original resolution. As a result, a lot of small objects and fine detail get lost in the crude up-sampling. Instead, we tried adding a very simple decoder network (a deconvolution layer with batch normalization and ReLU activation) in between the final convolution layer and the up-sampling layer. This should reduce the resolution loss to half of the original input image resolution. We use the same nine test images from the first part and check how the results compare to the original networks.
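To make the resolution argument a bit more concrete, here is a minimal sketch of the output-size arithmetic for a deconvolution (transposed convolution) layer. It uses the standard size formula for dilation 1. The input size, encoder down-scaling factor, kernel size, stride and padding are illustrative assumptions, since none of them are stated in this post, and the function name is hypothetical.

```python
def deconv_output_size(size, kernel, stride, padding=0, output_padding=0):
    """Output length along one spatial axis of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Assumed values for illustration only (not taken from the trained models).
input_size = 512      # assumed square input image
encoder_stride = 8    # assumed down-scaling factor of the encoder
feature_size = input_size // encoder_stride  # 64

# A kernel-4, stride-2, padding-1 deconvolution doubles the feature map.
doubled = deconv_output_size(feature_size, kernel=4, stride=2, padding=1)

# A kernel-8, stride-4, padding-2 deconvolution quadruples it, which under
# these assumptions reaches half of the input resolution.
quadrupled = deconv_output_size(feature_size, kernel=8, stride=4, padding=2)

print(feature_size, doubled, quadrupled)  # prints: 64 128 256
```

Whatever resolution the decoder reaches, the remaining gap to the input size is still closed by plain up-sampling; the decoder only shortens that last step.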
First batch of images
Here are the results for the first three test images: three different ethnicities with relatively simple backgrounds and composition. Note that wearable items should be marked as background, as they are in the training set.
(Image grid, one column per test image. Rows: source image, original PSP ResNet50, original PSP ResNet101, PSP+ ResNet50, PSP+ ResNet101, DeepLabV3 ResNet50, DeepLabV3 ResNet101, DeepLabV3+ ResNet50, DeepLabV3+ ResNet101.)
From the first three test images, the improvement from even a very simple decoder network is quite substantial. Far more of the small details (eyes, eyebrows, mouth) are now labeled correctly, and a lot of detail that got lost in the original network implementation is recovered. What is also interesting is that the version with the smaller base network easily exceeds the original version, even the one with the much deeper/larger base network. In fact, there seems to be little benefit in a larger/deeper base network for either PSP+ or DeepLabV3+, compared to the original versions. The other noteworthy thing is that the beard in the second source image no longer gets classified at all.
Second batch of images
The second batch of examples starts off with a really tough example: a woman whose hair covers most of her face and therefore occludes many parts of it. The other two are a frontal pose of a face and a side portrait, with different amounts of detail visible.
(Image grid, one column per test image. Rows: source image, PSP ResNet50, PSP ResNet101, PSP+ ResNet50, PSP+ ResNet101, DeepLabV3 ResNet50, DeepLabV3 ResNet101, DeepLabV3+ ResNet50, DeepLabV3+ ResNet101.)
Adding a simple decoder did not help much for the first test image. As already mentioned in the first part, this probably indicates that there are not enough similar cases in the training set. For the other two, as with the first batch, we can see a good improvement. Especially in the second image, the teeth are now correctly classified by the PSP+ and DeepLabV3+ networks, which was not the case before. Again, the benefit of the deeper base network seems marginal in the case of PSP+.
Third batch of images
Another diverse batch here, with varying difficulty, diverse headwear, different head poses and lighting conditions.
(Image grid, one column per test image. Rows: source image, PSP ResNet50, PSP ResNet101, PSP+ ResNet50, PSP+ ResNet101, DeepLabV3 ResNet50, DeepLabV3 ResNet101, DeepLabV3+ ResNet50, DeepLabV3+ ResNet101.)
This batch shows some interesting differences. First, this is one of the few examples where the larger base network gives PSP some advantage. In the case of DeepLabV3 the larger base network also helps, but not always (check the second image in the batch).
Summary
- Even a simple decoder network can increase the quality of the labeled results dramatically, especially if you have small areas or fine details
- The benefit of a deeper base network for PSP+ is small. If you have to trade off computation, reduce the base network complexity and add a decoder network instead. Definitely go for the decoder network
- DeepLabV3+ mostly seems to outperform PSP+
In the next part we are going to look at some alternative network solutions, before looking at alternative base networks for PSP and DeepLabV3 and at further improvements to the decoder network.
If you are interested in any of the trained models, get in contact with us: simply write an email to