Humorous examples: Why loss alone is not a good metric for training success

Here is a cautionary tale for people who run extensive DL training sessions and assume that as long as the global loss keeps dropping, things are going in the right direction. Right?

Well, not quite. Your network might learn something, but not necessarily what you think it will. Here is an example: I modified a segmentation encoder/decoder-type network in the hope of improving it further, but without realizing it I made a small mistake in the encoder part of the new structure, not knowing it would change the whole network.
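Purely for illustration (the real architecture isn't shown in this post, so the toy layers below are made up), the mistake can be as small as a single wrong stride value, and PyTorch will happily build and train the model anyway:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegNet(nn.Module):
    """Toy encoder/decoder producing a 1-channel segmentation mask."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        # Intended: stride=2. The typo below downsamples twice as hard,
        # which changes the resolution of everything after it.
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=4, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=4), nn.ReLU())
        self.dec2 = nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1)

    def forward(self, x):
        y = self.dec2(self.dec1(self.enc2(self.enc1(x))))
        # Resizing back to the input resolution means even a wrong
        # intermediate size never raises an error.
        return F.interpolate(y, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

net = TinySegNet()
out = net(torch.randn(2, 3, 128, 128))
print(out.shape)  # torch.Size([2, 1, 128, 128]) -- builds, trains, loss drops
```

Nothing here raises an error, which is exactly the problem: the framework checks shapes, not intent.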

So I started a full training session, and the loss was going down nicely. It started off a bit lower than what I was used to seeing from previous networks, but I didn't think much of it at the time.

In the end, the loss reached similar or even better levels than with the previous network layout. So, all great!

But when I actually ran some footage through the newly trained network, something was rather odd: things did not look quite in place where they should be, as you can see from these examples:

[Images: strange results from the test batch]

Suspicious, I checked some of the training material to see what the network produces:

[Images: strange results from the training batch]

It looks like the network intrinsically learned some strange warping function that turns the face segmentations into caricature face segmentations or children's drawings.
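The moral: don't trust the loss curve alone. A minimal sketch of what would have caught this much earlier, assuming a binary-mask setup and hypothetical `model` / `val_loader` names: track a task metric such as IoU alongside the loss.

```python
import torch

@torch.no_grad()
def mean_iou(model, val_loader, threshold=0.5, eps=1e-7):
    """Mean intersection-over-union over a validation set of binary masks."""
    model.eval()
    ious = []
    for images, masks in val_loader:  # masks: float tensors in {0, 1}
        preds = (torch.sigmoid(model(images)) > threshold).float()
        inter = (preds * masks).sum(dim=(1, 2, 3))
        union = ((preds + masks) > 0).float().sum(dim=(1, 2, 3))
        ious.append(((inter + eps) / (union + eps)).mean())
    return torch.stack(ious).mean().item()
```

Even simpler: periodically dump a few predicted masks with torchvision.utils.save_image and eyeball them; the caricature faces would have shown up long before training finished.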