Today I was able to build a very basic traffic sign recognizer based on neural nets (using Selective Search and a deep convolutional neural net, so essentially an R-CNN[2] approach). I used the annotated files of the “German Traffic Sign Detection Benchmark”[1] to train my net, which I implemented with the deep learning framework Caffe. However, after augmenting the training data (to get a larger dataset) and training the net, I got results like this image:
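The post does not spell out which augmentations I used, so here is a minimal sketch of the kind of transforms that are typical for this step, assuming images are numpy `uint8` arrays; the function name and the concrete parameters (brightness range, crop ratio) are illustrative, not the actual pipeline:

```python
import numpy as np

def augment(image, rng):
    """Produce a few simple variants of a (H, W, 3) uint8 image:
    horizontal mirror, brightness shift, and a random 90% crop
    resized back via nearest-neighbour sampling."""
    variants = [image]
    # mirror along the width axis
    variants.append(image[:, ::-1])
    # brightness: add a random offset, clipped to the valid range
    offset = rng.integers(-40, 41)
    variants.append(np.clip(image.astype(np.int16) + offset, 0, 255).astype(np.uint8))
    # random crop of 90% side length, scaled back up by index mapping
    h, w = image.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    crop = image[y0:y0 + ch, x0:x0 + cw]
    ys = np.arange(h) * ch // h   # nearest-neighbour row indices
    xs = np.arange(w) * cw // w   # nearest-neighbour column indices
    variants.append(crop[ys][:, xs])
    return variants

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
out = augment(img, rng)
print(len(out))  # 4 variants per input image
```

Note that mirroring is actually questionable for traffic signs (a mirrored speed limit digit is no longer a valid sign), which is one of the design choices to think about when augmenting this kind of data.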
Two big problems are easy to see. The first is that the net finds “Roundabout” signs practically everywhere, even when the selected region is just black. The second is the misclassification of the speed limit sign.
The explanation for the ubiquitous “Roundabout” detections was quick and easy to figure out. I just had to take a look at my training set, where I noticed two things:
- The whole training set contains almost no low-light images
- There are only eight different “Roundabout” signs in the whole dataset, and one of these eight looks like this
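Counting examples per class is easy with the benchmark’s ground truth, which ships as a semicolon-separated `gt.txt` (fields: filename, left, top, right, bottom, class ID; class 40 is, to my knowledge, “roundabout mandatory” in the GTSRB/GTSDB class list). The sample lines below are made up for illustration:

```python
from collections import Counter

# Hypothetical gt.txt excerpt in GTSDB format: filename;x1;y1;x2;y2;classID
sample_gt = """\
00000.ppm;774;411;815;446;11
00001.ppm;983;388;1024;432;40
00001.ppm;386;494;442;552;38
00002.ppm;892;476;1006;592;40
"""

def class_counts(gt_text):
    """Count annotated boxes per class ID."""
    counts = Counter()
    for line in gt_text.strip().splitlines():
        *_, class_id = line.split(";")
        counts[int(class_id)] += 1
    return counts

counts = class_counts(sample_gt)
print(counts[40])  # 2 boxes of class 40 in the sample above
```

Running this over the real `gt.txt` is how an imbalance like “only eight roundabout signs” shows up immediately.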
This one of the eight images is nearly unreadable and quite dark. Combined with the fact that there are almost no dark images in the dataset, this makes the net go crazy and see “Roundabout” signs everywhere. I will try to fix this by enriching the dataset with my own collected data. So far, some friends and I have labeled all traffic signs in approximately 3,000 images; about 2,000 are still left to annotate. Nevertheless, I will try to train a CNN on this unfinished training data to see whether it can already boost performance. Once we have finished annotating and I have checked the files for privacy problems, we will upload the annotated images and share them for free in our dataset section.
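Besides collecting new data, the low-light gap can also be narrowed synthetically by darkening existing training images. A minimal sketch, assuming `uint8` numpy images; the gamma and scale values are arbitrary choices, not tuned parameters:

```python
import numpy as np

def darken(image, gamma=2.2, scale=0.6):
    """Simulate a low-light capture of a (H, W, 3) uint8 image:
    gamma-compress the intensities, then scale them down."""
    x = image.astype(np.float32) / 255.0
    x = (x ** gamma) * scale   # crushes shadows and lowers overall brightness
    return (x * 255.0).round().astype(np.uint8)

img = np.full((32, 32, 3), 200, dtype=np.uint8)  # a bright dummy image
dark = darken(img)
print(img.mean(), dark.mean())  # the darkened copy has a clearly lower mean
```

This is no substitute for real night-time images (noise and colour casts differ), but it gives the net at least some dark training examples.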
The second problem was that the net did not recognize the speed limit sign correctly (it predicted limit 120 instead of 70). After taking a look at my CNN architecture, the source of this problem seemed obvious. My trained net was about nine layers deep (3 convolutional + 4 inception layers), but layer one already reduced the input image by a factor of two (so 64×64 became 32×32). Then I applied pooling (32×32 → 16×16). Then came the second layer, followed by another pooling layer (8×8 → 4×4). And last but not least, after convolution layer three there was one more pooling layer (4×4 → 2×2). You may ask: “Why did you design the net this way?” The answer is simple: I did not design it for this project at all; I copied an old net of mine and only changed the input and output sizes. The images my other project was fed were 200×200, so the repeated halving still left about 25×25 pixels, which was enough there (I wanted to get high performance). But for 64×64 crops this obviously will not work.
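Tracing the side length through the stack makes the problem concrete. The sizes listed above imply five halvings in total for the old net (the step from 16×16 down to 8×8 presumably happened inside the second layer); assuming each conv/pool step simply halves the resolution, a tiny helper shows where it ends up:

```python
def trace_sizes(input_size, halving_steps):
    """Track the spatial side length through a stack of layers
    that each halve the resolution (stride-2 conv or 2x2 pooling)."""
    sizes = [input_size]
    for _ in range(halving_steps):
        sizes.append(sizes[-1] // 2)
    return sizes

# Old net on 64x64 crops: five halvings end at a tiny 2x2 feature map.
print(trace_sizes(64, 5))  # [64, 32, 16, 8, 4, 2]

# Planned net with only two pooling steps (assuming no other striding):
print(trace_sizes(64, 2))  # [64, 32, 16]
```

A 2×2 map simply cannot carry enough spatial detail to tell a “70” from a “120”, whereas a 16×16 map still can.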
For tomorrow I plan to train my revised net (only two pooling steps overall; 2 convolutional + 3 inception layers, but more filters per layer) and hopefully I will also be able to generate some good first results.
I am really looking forward to my first tests with the new CNN architecture and the enriched training dataset! 🙂