Project

Vietnam Traffic Signs Detection using Faster RCNN

Created a new Vietnam Traffic Signs Dataset and trained a Faster RCNN model on 28 classes.

Project Details

Role: Student Researcher, Team Lead
Team Size: 3
Duration: Sep 2017 - Dec 2017

Overview

Object detection is a critical task in computer vision, enabling machines to identify the position and type of objects within an image. Addressing the need to improve traffic solutions in Vietnam, this project focuses on detecting road traffic signs. This initiative is part of a larger effort to implement self-driving cars, aiming to reduce human errors in complex traffic environments like those in Vietnam.

Fun Fact

This was a student project I undertook in 2017. It marked one of my very first forays into Machine Learning and Computer Vision, sparking my enthusiasm to pivot into this field.

The Challenge

Detecting traffic signs in images or video of road scenes means locating each sign and identifying its type. Given the complex traffic situation in Vietnam, it is difficult for drivers to focus on the road and watch for traffic signs at the same time, and many traffic violations and accidents have been caused by missed signs. A tool that notifies drivers of traffic signs is therefore needed. This project involved building a dataset of Vietnam traffic signs and demonstrating machine learning methods on it.

Faster RCNN Method

This project utilizes Faster RCNN, a method that combines a Convolutional Neural Network (CNN) to extract image features with a Region Proposal Network (RPN) to suggest potential object locations.

Key Steps

Feature Extraction: Using a CNN (specifically the ZF model due to hardware limitations) to build a feature map from the entire image.
Region Proposal Network (RPN): A small network slides over the feature map. At each position it scores a set of reference boxes (anchors) for objectness and regresses bounding box offsets, producing the proposal regions.
Classification: Cropping the feature map based on RPN proposals and performing classification to identify the traffic sign class and refine the bounding box.
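
To make the anchor idea in the RPN step concrete, here is a minimal sketch in plain Python. It is not the project's Caffe code. The stride of 16 and the 3 scales x 3 aspect ratios are the commonly cited Faster RCNN defaults and are assumed here, and the function names make_anchors and iou are made up for illustration.

```python
import math

def make_anchors(feat_w, feat_h, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes in image coordinates.

    One set of len(scales) * len(ratios) anchors is centred on every
    feature-map cell. stride, scales and ratios are assumed defaults.
    """
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            # Centre of this feature-map cell, projected back to the image.
            cx = gx * stride + stride / 2.0
            cy = gy * stride + stride / 2.0
            for s in scales:
                for r in ratios:
                    # Keep the area at s*s while setting height/width = r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

During training, the RPN typically labels an anchor as positive when its IoU with a ground-truth box is high and as background when it is low. With the smallest default anchor at 128 pixels, a distant traffic sign only a few dozen pixels wide overlaps poorly with every anchor, which is one reason anchor sizes matter for this task.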

Dataset Construction

A significant portion of this project was dedicated to creating the Vietnam Traffic Sign Dataset (VNTSDB).

Collection Process

Method: Recorded video of routes in Ho Chi Minh City using motorbikes.
Annotation: Developed a custom tool to extract images from videos and annotate them with sign IDs, flags, and bounding boxes.

Dataset Statistics

Total Classes: 69
Images per Class: Varied (average 22, max 164, min 1)
Format: JPG

To make existing tools usable with the data, a converter was built to transform VNTSDB into the format of the German Traffic Sign Detection Benchmark (GTSDB).
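
A converter of this kind could look roughly like the sketch below. The VNTSDB record layout is not described here, so the dictionary keys (filename, sign_id, x, y, width, height) are hypothetical, as are both function names. The output follows the GTSDB ground-truth layout as I understand it: one semicolon-separated line per sign with file name, left, top, right, bottom and an integer class ID.

```python
def build_class_index(sign_ids):
    """Map sign IDs such as '131a' to consecutive integers (sorted order)."""
    return {sid: i for i, sid in enumerate(sorted(set(sign_ids)))}

def to_gtsdb_line(record, class_index):
    """Format one annotation as 'file;left;top;right;bottom;classId'.

    `record` uses a hypothetical layout: top-left corner plus width/height.
    """
    left = record["x"]
    top = record["y"]
    right = left + record["width"]
    bottom = top + record["height"]
    return "{};{};{};{};{};{}".format(
        record["filename"], left, top, right, bottom,
        class_index[record["sign_id"]])
```

Sign IDs such as 131a are not integers, so some mapping to numeric class IDs is needed. The sorted-order mapping here is only one possible choice.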

Official Dataset for Training

For the Faster RCNN experiment, a subset of the data was used:

Classes: 28
Training Images: 424 panoramic street scenes
Testing Images: 106 panoramic street scenes
Configuration: Resized to 1280x720, 80-20 train-test split.
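
The configuration above implies two small preprocessing steps, sketched below under stated assumptions: a shuffled 80-20 split (the source does not say how images were assigned to each side, so the seeded shuffle is an assumption) and rescaling of bounding boxes when frames are resized to 1280x720. The function names are illustrative.

```python
import random

def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed, then cut into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def scale_box(box, src_size, dst_size=(1280, 720)):
    """Rescale an (x1, y1, x2, y2) box from src_size to dst_size.

    Sizes are (width, height). Boxes must be rescaled together with the
    image, otherwise annotations no longer line up with the signs.
    """
    sx = dst_size[0] / float(src_size[0])
    sy = dst_size[1] / float(src_size[1])
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

With 530 images in total, an 80-20 split yields the 424 training and 106 testing images listed above.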

Results

The model was trained for 70,000 iterations using the ZF architecture.

Model Accuracy: Evaluated mean accuracy over 28 classes
Analysis:
- Signs with more training data (e.g., 130, 131a, 102) showed higher accuracy.
- Confusion occurred between visually similar signs (e.g., 130 vs 131a).
- Some classes with few images or complex details (e.g., 205d, 208) had 0% accuracy.
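
For reference, a mean accuracy over classes can be computed as in the sketch below. How detections were matched to ground-truth signs in the original evaluation is not stated, so the input is assumed to be a list of (true sign ID, predicted sign ID) pairs, with None standing for a missed sign. The function names are illustrative.

```python
from collections import Counter

def per_class_accuracy(pairs):
    """Return {sign_id: fraction of that class's signs predicted correctly}."""
    total = Counter()
    correct = Counter()
    for true_id, pred_id in pairs:
        total[true_id] += 1
        if pred_id == true_id:
            correct[true_id] += 1
    return {c: correct[c] / float(total[c]) for c in total}

def mean_accuracy(pairs):
    """Average the per-class accuracies so every class counts equally."""
    acc = per_class_accuracy(pairs)
    return sum(acc.values()) / len(acc) if acc else 0.0

def confusions(pairs):
    """Count (true, predicted) pairs that disagree, e.g. ('130', '131a')."""
    return Counter((t, p) for t, p in pairs if t != p)
```

Averaging per class rather than per image means a class with a single test image weighs as much as one with a hundred, so classes at 0% pull the mean down sharply.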

Conclusion & Future Work

The project successfully established a baseline for Vietnam traffic sign detection. While the accuracy highlights the difficulty of the task given the limited dataset and hardware constraints at the time, it paved the way for further research.

Improvement Directions:

Utilizing deeper CNN backbones (VGG16, ResNet-101).
Optimizing RPN anchor sizes for small objects.
Expanding the dataset with more collected images.
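
One simple way to approach the anchor-size direction is to derive scales from the annotated boxes themselves. The sketch below uses a plain quantile heuristic, which was not part of this project; the function name and the chosen quantiles are assumptions for illustration.

```python
import math

def suggest_anchor_scales(boxes, quantiles=(0.1, 0.5, 0.9)):
    """Pick anchor scales from quantiles of sqrt(box area).

    `boxes` are (x1, y1, x2, y2) ground-truth boxes in pixels, measured at
    the resolution the network is trained on.
    """
    sizes = sorted(math.sqrt((x2 - x1) * (y2 - y1))
                   for x1, y1, x2, y2 in boxes)
    if not sizes:
        raise ValueError("need at least one box")
    scales = []
    for q in quantiles:
        # Nearest-rank style index, clamped to the last element.
        idx = min(len(sizes) - 1, int(q * len(sizes)))
        scales.append(sizes[idx])
    return scales
```

If most signs turn out to be far smaller than the default anchors, scales chosen this way give the RPN reference boxes that overlap the signs much better.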

Key Achievements

Created a new Vietnam Traffic Signs Dataset by manually recording and annotating street video
Trained a Faster RCNN detection model on 28 traffic sign classes
Published source code and dataset for educational purposes

Technologies Used

Python 2.7, Faster RCNN, Caffe, CUDA, CuDNN

Skills Applied

Object Detection, Computer Vision, Dataset Creation