Damaged car estimation model


I work at an insurance company and have been given the task of building a model that recognizes car damage and estimates the repair cost. In other words, I have to predict how much it will cost to repair the car. I am currently in the middle of the third lesson of this course, so if your advice draws on later lessons, please say so.

I think we have to divide the task into several parts.

  1. we have to find the location of the damaged parts.
  2. we have to estimate the repair cost for each part.
  3. we have to sum it up.

I’ve researched this task a lot and found this paper (https://www.ee.iitb.ac.in/student/~kalpesh.patil/material/car_damage.pdf), which I thought would be a good starting point. The main idea is to divide the damage into types such as Bumper Dent, Door Dent, Glass Shatter and so on (you can see examples below) and train a classifier. Right now we are working on task 1, finding the locations of the damaged parts.

The paper uses 8 types (outputs), but I decided to take 3 just for testing. So in my case I have Head Lamp Broken, Tail Lamp Broken and No Damage.

Back Lamp Broken - 80 pictures (train set), 31 pictures (valid set)
Head Lamp Broken - 145 pictures (train set), 52 pictures (valid set)
No Damage - 243 pictures (train set), 81 pictures (valid set)

Examples from the training set for each type are shown below.
All the original images have 480 x 640 resolution; I cropped the damaged parts at 320 x 320 resolution (± 20).

Head Lamp Broken

Back Lamp Broken

No Damage

I’ve used the lesson 1 material. The final results are below (I used resnet34). The accuracy is 0.926829268292683.
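
As a quick sanity check on that number, here is a small sketch that relates the reported accuracy to the validation-set sizes listed above. It assumes the accuracy was computed over all validation images; the variable names are only illustrative.

```python
# Validation-set sizes quoted earlier in this post.
valid_counts = {"Back Lamp Broken": 31, "Head Lamp Broken": 52, "No Damage": 81}

total = sum(valid_counts.values())

# Accuracy of a trivial model that always predicts the largest class.
majority_baseline = max(valid_counts.values()) / total

reported_accuracy = 0.926829268292683
# Assumption: the accuracy is (correct predictions) / (all validation images).
correct = round(reported_accuracy * total)
errors = total - correct

print(f"validation images: {total}")
print(f"majority-class baseline: {majority_baseline:.3f}")
print(f"correct: {correct}, errors: {errors}")
```

Under that assumption, the reported figure corresponds to a whole number of correctly classified images, which is a useful check that the metric was computed on the full validation set. The baseline shows how much of the accuracy comes from the model rather than from class imbalance.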


Now we want to localize the damaged part. As the paper says, "For each pixel in the test image, we crop a region of size 100×100 around it, resize it to 224×224 and predict the class posteriors. A damage is considered to be detected if the probability value is above certain threshold (0.9)". I think doing this for each pixel would be too much, so I took a 100x100 square and shifted it 20 pixels at a time (sliding window). I predict the result for each 100x100 image, and if the probability is more than 0.9 I save the image.
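
The sliding-window procedure described above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for the results in this post: `predict_damage_prob` is a hypothetical callable standing in for the trained classifier (resizing the patch to the network's input size is assumed to happen inside it), and `toy_predict` is a made-up stand-in so the example runs without a model.

```python
import numpy as np

def sliding_windows(image, window=100, stride=20):
    """Yield (top, left, patch) for every window that fits fully inside the image."""
    height, width = image.shape[:2]
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left, image[top:top + window, left:left + window]

def detect_damage(image, predict_damage_prob, threshold=0.9, window=100, stride=20):
    """Return (top, left, prob) for windows whose probability exceeds the threshold.

    predict_damage_prob is a hypothetical stand-in for the trained classifier:
    it takes one patch and returns a damage probability in [0, 1].
    """
    hits = []
    for top, left, patch in sliding_windows(image, window, stride):
        prob = predict_damage_prob(patch)
        if prob > threshold:
            hits.append((top, left, prob))
    return hits

def toy_predict(patch):
    """Toy stand-in for a real model: treats bright pixels as 'damage'."""
    return float(patch.mean())

# Synthetic 480 x 640 image with one bright 100 x 100 "damaged" square.
image = np.zeros((480, 640), dtype=np.float32)
image[200:300, 400:500] = 1.0

n_windows = sum(1 for _ in sliding_windows(image))
hits = detect_damage(image, toy_predict)
print(n_windows, hits)
```

With a stride of 20 a 480 x 640 image yields a few hundred windows instead of one per pixel, which is what makes this tractable with a plain classifier.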

Below are some examples of this process.

Starting Image :

100x100 images prob > 0.9

Final result to localize the damaged parts:

In this case it works fine, but in the list of 100x100 pictures the 4th picture is not damaged even though its probability is 0.95. Also, this image was not an original (480 x 640) but a cropped one (320 x 320), so the result on originals might not be as good. So let’s take a look at the 480 x 640 case.

First Case :
Starting image :

100x100 images prob > 0.9

final image localization :


In this case it works fine, but there is still an error: the snow on the ground has prob > 0.9.
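
One idea that might help with isolated false positives like this (it is not from the paper, just a common trick, so treat it as an assumption to test) is to average the probabilities of all overlapping windows into a per-pixel heatmap instead of keeping every window above 0.9. A real damaged region tends to be confirmed by many neighbouring windows, while a single confident outlier gets diluted. The function name `damage_heatmap` and the toy scores below are made up for illustration.

```python
import numpy as np

def damage_heatmap(shape, window_probs, window=100):
    """Average the probabilities of all windows covering each pixel.

    window_probs: iterable of (top, left, prob) for every scanned window,
    not only for the windows above the threshold.
    """
    prob_sum = np.zeros(shape, dtype=np.float64)
    coverage = np.zeros(shape, dtype=np.float64)
    for top, left, prob in window_probs:
        prob_sum[top:top + window, left:left + window] += prob
        coverage[top:top + window, left:left + window] += 1
    # Pixels that no window covers keep a score of 0.
    return np.divide(prob_sum, coverage, out=np.zeros(shape), where=coverage > 0)

# Toy scores on a 400 x 400 image scanned with stride 20 (made-up numbers).
scores = {(top, left): 0.05
          for top in range(0, 301, 20)
          for left in range(0, 301, 20)}
# A 3 x 3 cluster of confident, overlapping windows (a plausible real damage)...
for top in (40, 60, 80):
    for left in (40, 60, 80):
        scores[(top, left)] = 0.95
# ...and one isolated confident window (like the snow patch).
scores[(220, 220)] = 0.95

heat = damage_heatmap((400, 400), [(t, l, p) for (t, l), p in scores.items()])
cluster_peak = heat[130, 130]
isolated_peak = heat[220:320, 220:320].max()
print(cluster_peak, isolated_peak)
```

In this toy setup the clustered detections keep a clearly higher averaged score than the isolated one, so a threshold on the heatmap could separate them. Whether that carries over to real predictions would need to be checked on the actual images.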

Second Case :

100x100 images prob > 0.9 :

result localization :

As you can see, in the second case the result is worse. It can’t detect the broken lamp but detects the undamaged middle part of the car. In some cases I have even worse results, but there are also good results …

So there are several things to do next. I have to add more damage types (the problem here is that some pictures will contain more than one damage type), I have to collect more data, I somehow have to differentiate between well localized and badly localized parts (both have > 0.9 probability and I don’t know how to tell them apart), and I have to start the second task, which is to estimate the repair costs (I have no idea how to do this).

So any tips, advice and help from those who have some experience would be great. Thanks in advance!

(Kevin Bird) #2

Would it be possible to gather more data by using a site like Mechanical Turk to outsource the labelling of damaged parts? That could help you get a more robust dataset. It looks like you’re on a good track. I’m surprised you have a model that accurate, nice job!

(Benedikt Brandt) #3

I agree with KevinB. Your dataset is very small. Getting more data (at least 400 examples per category) is important. In the meantime, try heavy data augmentation (beyond what is available in fast.ai): for example, take a look at https://github.com/aleju/imgaug and create several augmented versions of each of your images. That way you should have at least 1k images for each class to feed into fast.ai (ideally use imgaug to do augmentations that are different from the augmentations available in fast.ai).
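
To give a rough idea of the offline augmentation step, here is a minimal NumPy-only sketch. It deliberately does not use imgaug (check its documentation for the real augmenters); the transforms, their ranges, and the helper names `augment` and `augment_dataset` are all assumptions chosen for illustration.

```python
import numpy as np

def augment(image, rng):
    """Return one randomly augmented copy of an H x W x C uint8 image."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]                        # horizontal flip
    out = out * rng.uniform(0.7, 1.3)             # brightness scaling
    out = out + rng.normal(0.0, 5.0, out.shape)   # mild Gaussian noise
    shift = int(rng.integers(-10, 11))
    out = np.roll(out, shift, axis=1)             # small horizontal shift (wraps around)
    return np.clip(out, 0, 255).astype(np.uint8)

def augment_dataset(images, copies, seed=0):
    """Create `copies` augmented versions of every image."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(copies)]

# Small placeholder arrays standing in for the 80 Back Lamp Broken training crops.
images = [np.full((64, 64, 3), 128, dtype=np.uint8) for _ in range(80)]
augmented = augment_dataset(images, copies=12)
print(len(images) + len(augmented))
```

With 12 copies per image, even the smallest class ends up above 1k training images. The augmented copies should only go into the training set, so that the validation set still measures performance on real photos.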

Also how are you training the classifier that is used on the cropped (100x100) pictures? Can you be more specific on your training procedure for that classifier?


Also check out lessons 8 and 9.

I guess you need to look at a multi-label output vector, so that you output a probability for each individual damage class.
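
A minimal sketch of what a multi-label output means in practice: each class gets its own sigmoid, so several damage types can be "on" at once, whereas a softmax forces the classes to compete. The logits below are made-up numbers, not model output, and the class list is just three of the types mentioned in this thread.

```python
import numpy as np

CLASSES = ["Head Lamp Broken", "Tail Lamp Broken", "Bumper Dent"]

def sigmoid(x):
    """Independent per-class probabilities (multi-label)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Probabilities that sum to 1 across classes (single-label)."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical raw scores for one image with two damaged lamps.
logits = np.array([3.0, 2.5, -4.0])

multi_label = sigmoid(logits)
single_label = softmax(logits)

# With independent sigmoids, every class above the threshold is reported.
predicted = [name for name, p in zip(CLASSES, multi_label) if p > 0.9]
print(predicted)
print(single_label.round(3))
```

With these example scores the sigmoid outputs flag both lamp classes, while under softmax no class reaches 0.9 because the two lamp classes split the probability mass.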


Thanks, I will check it out. The original images are 480 x 640, so I manually cropped 320 x 320 regions out of them. My training set images are therefore all 320 x 320 (± 20), and I trained the model on those images. Now, when I try to find the location of the damaged part, I slide a 100x100 square over those images (sliding over 320 x 320 gives better results than sliding over 480 x 640; both examples are above), and if the probability is greater than 0.9 I save this 100x100 image (this is the location of the damaged part) …

(Benedikt Brandt) #6

I think there might be an issue with this approach. Your model is trained on 320x320 crops and, as you can see from your confusion matrix, it is doing a pretty good job at classifying. However, you then ask it to make predictions on 100x100 crops. I see two problems with this:

  1. There is about 10 times less context (by area) in a 100x100 crop vs a 320x320 crop. Your neural net was able to see the damaged location and its surroundings (hood, wheel, part of the chassis) during training. Now you ask it to tell you whether a part is damaged when you feed it a picture that is a small part of the car (e.g. just the surface of the hood). This is a similar but different problem from the one you trained your net on.

  2. When you are trying to localize the damaged location, you are only interested in damaged vs. not damaged. This is a binary classification problem. You can of course use a multi-label classifier, but it only makes things harder.

Imo you should build a training set of 100x100 crops and train a binary classifier on it. So you use your original neural net to classify what type of damage it is, and then the binary classification neural net trained on the 100x100 crops to determine the location. The good news is that you can create a large training data set for the 100x100 crops. With a stride of 20 you should get over 100 cropped images (12 x 12 = 144 window positions) for each of the original 320x320 crops.
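
A rough sketch of how such a patch dataset could be generated is below. Note the assumptions: `damage_box` is a hypothetical manual annotation of the damaged area (you would need to create these), and the rule "label a patch as damaged when at least half of it overlaps the box" is just one possible choice, not something taken from the paper.

```python
import numpy as np

def make_patches(crop, damage_box, window=100, stride=20, min_overlap=0.5):
    """Cut a crop into window x window patches with binary damage labels.

    damage_box = (top, left, bottom, right) is a hypothetical annotation of the
    damaged area. A patch gets label 1 when at least `min_overlap` of its area
    lies inside the box (an assumed labelling rule).
    """
    box_top, box_left, box_bottom, box_right = damage_box
    height, width = crop.shape[:2]
    patches, labels = [], []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            overlap_h = max(0, min(top + window, box_bottom) - max(top, box_top))
            overlap_w = max(0, min(left + window, box_right) - max(left, box_left))
            patches.append(crop[top:top + window, left:left + window])
            labels.append(int(overlap_h * overlap_w >= min_overlap * window * window))
    return np.stack(patches), np.array(labels)

# Placeholder 320 x 320 crop with a made-up damage annotation.
crop = np.zeros((320, 320, 3), dtype=np.uint8)
patches, labels = make_patches(crop, damage_box=(100, 100, 220, 220))
print(patches.shape, int(labels.sum()))
```

Keep in mind that most patches from a crop will be negatives, so the resulting binary dataset is likely to be imbalanced and may need resampling or class weighting.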

However, you definitely want to take a look at lessons 9 and 10, as dmto suggested. There are some techniques in there that you definitely want to try as well. You can skip over all the NLP stuff in the deep learning 1 course and only study the image classification related material. And then move to lessons 9 and 10 (= lessons 1 and 2 in dl 2).


Thanks, I will look over the lessons and test some cases including your advice. I will update this post later.