Proposing A Route Recommendation Algorithm For Vehicles Based On Receiving Video
Phat Nguyen Huu1, Phuong Tong Thi Quynh1, Thien Pham Ngoc1, Quang Tran Minh2,3
1 School of Electrical and Electronics Engineering, Hanoi University of Science and Technology (HUST), Hanoi City, Vietnam
2 Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Ho Chi Minh City, Vietnam
3 Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam
Corresponding Author:
Phat Nguyen Huu
School of Electrical and Electronics Engineering, Hanoi University of Science and Technology (HUST)
Hanoi, Vietnam
Email: [email protected]
1. INTRODUCTION
The population and the demand for traffic and transportation are increasing, especially in big cities [1], [2].
This causes serious regional traffic jams in urban areas. Traffic congestion remains a problem not only in Vietnam
but also in major cities around the world. It leads to many unfortunate consequences, such as slowed economic
development, environmental pollution, and especially social and security problems. Therefore, this is an issue
that needs to be solved with high priority in sustainable development plans.
Currently, many systems that detect traffic status and navigate users around congestion are widely used
around the world, such as Google Maps and Waze. In Vietnam, the research and development of similar systems
have also received much attention. The most recent is UTraffic, an urban traffic congestion warning system built
on community-contributed data, combining analysis of historical traffic conditions [3], community data sources [4],
and urban traffic conditions mined from crowdsourced data [1]. The system is currently being deployed for the user
community in Ho Chi Minh City. It collects traffic data from multiple sources and communities through a mobile
application, analyzes the data, and applies machine learning techniques to estimate and predict traffic conditions.
We have found that collecting data from the community is a practical and useful solution. Its
disadvantage is that aggregating and analyzing data from many different sources takes a lot of time. Therefore, we
propose a system that detects traffic status in the urban transport network, suggests routes to avoid congestion,
and finds the shortest path for road users using data extracted from cameras, without accessing user data. To
solve the problem, we design a system that detects congestion points in the urban traffic network and proposes the
shortest and most convenient way to avoid congestion for traffic participants. The proposed system has two
new features. First, we use the you only look once (YOLOv5) model based on [5], a new model for
vehicle detection and traffic status determination based on videos extracted from cameras at intersections.
Second, we apply a vehicle dataset collected in Vietnam to retrain the YOLOv5 model to improve detection
performance in real-time applications. The paper builds a real-time algorithm to display and detect
traffic conditions at intersections accurately and to propose optimal routes that help users avoid traffic jams.
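As a rough illustration of the second feature, the sketch below shows how a YOLOv5 model could be fine-tuned on a locally collected vehicle dataset using the documented train.py command-line interface of the public ultralytics/yolov5 repository. The dataset file name, image size, and epoch count are placeholders, not the settings used in this paper.

```python
# Hedged sketch: fine-tune YOLOv5 on a custom vehicle dataset with the
# documented train.py CLI of the ultralytics/yolov5 repository (assumed to be
# cloned and used as the current working directory). All values are illustrative.
import subprocess

subprocess.run(
    [
        "python", "train.py",
        "--img", "640",                 # training image size
        "--batch", "16",                # batch size
        "--epochs", "100",              # number of fine-tuning epochs
        "--data", "vehicles_vn.yaml",   # hypothetical custom dataset definition
        "--weights", "yolov5s.pt",      # start from COCO-pretrained weights
    ],
    check=True,
)
```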
The rest of the paper is organized as follows. Section 2 presents several related works. Section 3
proposes the route recommendation algorithm. In section 4, we implement the algorithm and evaluate and
analyze the results. The final section gives conclusions and future work.
2. RELATED WORK
Currently, there are many methods to determine the traffic condition at a point, such as counting the
number of vehicles, classifying vehicles, calculating vehicle speed and vehicle density, calculating the area
occupied by vehicles on the road, and classifying images from surveillance cameras. Supporting technologies in
this process include convolutional neural network (CNN) models such as the region-based convolutional neural
network (R-CNN) [5], the deep convolutional neural network (DCNN) [6], Fast R-CNN [7], and Faster R-CNN [8].
These models have achieved many positive results when applied to traffic congestion detection.
In [9], the authors use a selective search method to select the candidate regions among possible regions. In [5],
the R-CNN model is used with this candidate-region approach. In [7], the Fast R-CNN model reduces the
number of candidate regions; however, the algorithm is not able to learn from the context. In [8], the
authors use Faster R-CNN; however, it is difficult to use for object detection in real-time applications.
In [10], an intelligent traffic congestion detection system based on a CNN model is introduced by leveraging image
classification methods. It uses 1,000 images to train for road traffic conditions; the authors only resized the
images and converted them to 100x100 grayscale. The model is proposed for deployment in a future congestion detection
system using closed-circuit television (CCTV) cameras that record images at specific locations in real-time.
In [11], the authors use a support vector machine (SVM) and two different deep learning techniques
(YOLO and DCNN) to compare the accuracy of classifying congestion images from surveillance cameras. The
entire image extracted from the camera is used. Since DCNN models need millions of training images to avoid
overfitting, the authors combine the SVM model with data augmentation and dropout. They use the oriented FAST
and rotated BRIEF (ORB) detector to find the key points of each image and then select the top N points based on
the Harris corner measure. Currently, the you only look once (YOLO) model [12] is used to detect traffic by
predicting bounding boxes. In [13], the authors use the YOLOv3 model [14] in combination with the Lucas-Kanade
(LK) method [15] to identify the vehicles in the region of interest (RoI) and calculate their speed. Therefore,
it is possible to determine the traffic status at urban intersections, as illustrated in Figure 1.
In Figure 1, the RoI is selected to crop the image, improving processing speed and accuracy during
recognition. The RoI mask is obtained as a binary mask of the original image. Vehicles in the RoI are detected
using the YOLOv3 model. The four corners of the bounding boxes obtained by YOLOv3 are the optical-flow inputs
for tracking and calculating vehicle speed. Traffic status is then determined based on the travel speed of the
vehicles: if the speed is less than a specified threshold, the traffic is considered congested. However, vehicle
speed is very low during the red-light waiting period, which makes it difficult to distinguish a traffic jam.
Therefore, the authors use the signal-light period to distinguish the continuous speed and determine the final
traffic state. This method achieves positive results when compared with the kernel-based fuzzy c-means clustering
algorithm (KFCM) [16] and Bayesian [17] algorithms. In the context of traffic in Vietnam, however, the method is
not suitable in several cases, such as vehicles passing a red light or moving before the signal changes, and it
takes a full signal cycle to measure vehicle speed. Our recommendation system instead uses the YOLOv5 model to
detect and count the vehicles in the RoI for higher accuracy than the YOLOv3 model. Congestion identification is
also made simpler by analyzing the variability of the data obtained with YOLOv5, as sketched below.
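To make the counting idea concrete, the following minimal sketch (not the authors' exact pipeline) loads a pretrained YOLOv5 model from torch.hub and counts the detected vehicles whose bounding-box centres fall inside a rectangular RoI on each frame. The RoI coordinates, vehicle class list, confidence threshold, and video path are assumptions for illustration only.

```python
# Minimal sketch: per-frame vehicle counting inside a rectangular RoI using a
# COCO-pretrained YOLOv5 model from torch.hub. All constants are illustrative.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4                                  # confidence threshold (assumed)
VEHICLES = {"car", "motorcycle", "bus", "truck"}  # COCO class names of interest
ROI = (100, 200, 900, 700)                        # (x1, y1, x2, y2), hypothetical

def count_vehicles_in_roi(frame_bgr):
    """Return the number of detected vehicles whose box centre lies in the RoI."""
    results = model(frame_bgr[..., ::-1])         # BGR -> RGB for the hub model
    det = results.pandas().xyxy[0]                # detections as a DataFrame
    x1, y1, x2, y2 = ROI
    count = 0
    for _, row in det.iterrows():
        if row["name"] not in VEHICLES:
            continue
        cx = (row["xmin"] + row["xmax"]) / 2
        cy = (row["ymin"] + row["ymax"]) / 2
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            count += 1
    return count

cap = cv2.VideoCapture("intersection.mp4")        # video path is an assumption
counts = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    counts.append(count_vehicles_in_roi(frame))
cap.release()
```

The per-frame counts collected this way are the raw data from which the mean and variability used later for congestion identification can be computed.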
3. PROPOSED SYSTEM
3.1. Overview
Currently, many traffic congestion avoidance routing systems have been deployed and have shown good
results, such as Google Maps [18], congestion prediction and navigation models based on dynamic traffic
networks and equilibrium Markov chains [19], or a dynamic vehicle navigation system using mobile phone
positioning [20]. Instead of using GPS user positioning to collect data for congestion detection like these
systems, our proposed system works as follows. For congestion prediction, we utilize live data from
surveillance cameras at intersections and apply the YOLOv5 model to analyze the videos, detect and count
vehicles, and determine the traffic status. For routing, we apply the A* algorithm to find the optimal path after
removing the congestion points from the map. Figure 2 shows the proposed system.
The proposed system includes two modules with four main functions. Module 1 (traffic condition
detection) includes three parts, namely detecting vehicles, counting vehicles, and predicting the traffic
condition. Detecting vehicles detects and classifies vehicles. Counting vehicles calculates the number of
vehicles collected in the predefined RoI. Predicting the traffic condition identifies traffic congestion based on
the average number and the fluctuation of vehicles in the RoI. In module 2 (routing), the analyzed traffic status
data at the intersections are updated on the urban traffic map, and the algorithm then finds the most optimal
path while avoiding congested nodes, as sketched below. The input to the system is videos extracted from
cameras at traffic intersections, and the output is one or more suitable paths.
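A compact sketch of the routing step follows, under assumed names: the road network is modeled as a weighted graph whose nodes store coordinates, the congested intersections reported by module 1 are removed, and networkx's A* search returns the shortest remaining path. The toy graph and the attribute names ('pos', 'length') are illustrative, not the paper's actual data model.

```python
# Minimal routing sketch: A* over a road graph after removing congested nodes.
import math
import networkx as nx

def shortest_route(graph, source, target, congested_nodes):
    """A* path on a copy of the graph with congested intersections removed."""
    g = graph.copy()
    g.remove_nodes_from(n for n in congested_nodes if n not in (source, target))

    def heuristic(u, v):
        # straight-line distance between intersections, using the assumed
        # 'pos' node attribute for coordinates
        (x1, y1), (x2, y2) = g.nodes[u]["pos"], g.nodes[v]["pos"]
        return math.hypot(x2 - x1, y2 - y1)

    return nx.astar_path(g, source, target, heuristic=heuristic, weight="length")

# Toy 4-node network; all names and numbers are illustrative.
G = nx.Graph()
G.add_node("A", pos=(0, 0)); G.add_node("B", pos=(1, 0))
G.add_node("C", pos=(1, 1)); G.add_node("D", pos=(0, 1))
G.add_edge("A", "B", length=1.0); G.add_edge("B", "C", length=1.0)
G.add_edge("A", "D", length=1.0); G.add_edge("D", "C", length=1.0)
print(shortest_route(G, "A", "C", congested_nodes={"B"}))  # -> ['A', 'D', 'C']
```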
In the congested state, the number of vehicles in the RoI is high with low variability, since the vehicles
mostly do not move. The average volume of vehicles in the common (slow) traffic state lies between the smooth and
congested volumes, with higher variability due to the inter-vehicle movement in the RoI area under slow traffic.
The traffic condition is therefore determined by two factors, namely the average number of vehicles per frame and
the coefficient of variation (CV) of vehicles entering the RoI. The thresholds for the mean number of vehicles and
for the variability are denoted M and CV_ε, respectively. These values are determined as shown in Figure 4.
[Figure: parameters of the YOLOv5 network models — parameters, FLOPs, inference speed, and mAP versus input size (pixels)]
For the average number of vehicles (mean): a video is a collection of many frames that appear
consecutively, one after another. Assume the input video of the system has n frames, equivalent to n samples,
and that x_i vehicles are counted in frame i. The average number of vehicles per frame, \bar{X}, is calculated by (1):

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i .   (1)

For the coefficient of variation (CV): the CV is used to determine the dispersion of data points and to compare
the volatility of datasets with different mean values. The CV is calculated as

CV = \frac{\sigma}{\mu} ,   (2)

\sigma = \sqrt{\frac{\sum_{i=1}^{m} (x_i - \bar{X})^2}{m - 1}} ,   (3)

where m is the number of data points in the dataset and the average value \mu = \bar{X} has been calculated in (1).
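A small sketch of how (1)-(3) and the threshold rule above could be combined: the per-frame vehicle counts from the RoI are reduced to a mean and a coefficient of variation and then mapped to a traffic state. The threshold values below are placeholders chosen to be roughly consistent with Table 3; the paper determines M and CV_ε experimentally (Figure 4).

```python
# Minimal sketch of equations (1)-(3) and the threshold rule; m_threshold and
# cv_threshold stand in for M and CV_eps and are not the paper's values.
from statistics import mean, stdev

def traffic_state(counts, m_threshold=10.0, cv_threshold=0.2):
    """Map per-frame vehicle counts in the RoI to a traffic state."""
    x_bar = mean(counts)                       # equation (1)
    sigma = stdev(counts)                      # equation (3), sample std (m - 1)
    cv = sigma / x_bar if x_bar > 0 else 0.0   # equation (2)
    if x_bar < m_threshold:                    # few vehicles -> free-flowing
        return "normal"
    # many vehicles: low variability suggests they barely move (congestion),
    # higher variability suggests slow but still flowing traffic
    return "traffic congestion" if cv < cv_threshold else "slow traffic"

print(traffic_state([6, 7, 7, 8, 6]))          # -> normal
print(traffic_state([14, 13, 14, 14, 13]))     # -> traffic congestion
print(traffic_state([10, 15, 18, 9, 16]))      # -> slow traffic
```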
4.3. Results
After running the test of the traffic detection module, we achieved several results. The average number
of vehicles, the coefficient of variation, and the execution time of the vehicle counting process in the
RoI area are given in Table 3. The accuracy of the YOLOv5 model in detecting objects is relatively high for
normal and slow traffic, and relatively low for congested traffic. YOLOv5 misses several objects when they are
adjacent or partially obscured. We suggest raising the camera angle and pre-training the YOLOv5 model with
datasets of vehicles in Vietnam to solve this issue. Figure 5 shows the number of vehicles in the RoI.
In Figure 5, the diagrams show the vehicle traffic in the RoI area over time. For normal traffic (Video 1),
the number of vehicles remains low, as shown in Figure 5(a). For slow traffic (Video 3), the number of vehicles
varies widely, exceeding 18 vehicles in the middle of the video and dropping below 10 vehicles at the beginning
and end, as shown in Figure 5(b). For congested traffic (Video 5), the number of vehicles is high and stays
fairly uniform between 13 and 15 vehicles, as shown in Figure 5(c).
Table 3. Evaluation of the parameters for testing with three types of traffic
Traffic type         Video     Mean    CV     Processing time (second)
Normal               Video 1   6.867   0.302  26.294
Normal               Video 2   2.521   0.511  18.506
Slow traffic         Video 3   11.410  0.292  24.029
Slow traffic         Video 4   15.951  0.350  25.762
Traffic congestion   Video 5   13.738  0.133  25.741
Traffic congestion   Video 6   17.60   0.126  24.966
Figure 5. Vehicle traffic through the RoI area for: (a) video 1, (b) video 3, and (c) video 5
Table 4. Evaluation of the parameters for testing with three types of traffic
Type     Video     Mean    CV     Processing time (second)  Traffic status       Result
Type 1   Video 1   2.590   0.515  22.790                    Normal               True
Type 1   Video 2   2.583   0.829  22.872                    Normal               True
Type 1   Video 3   0.885   0.925  24.751                    Normal               True
Type 1   Video 4   1.393   0.684  25.893                    Normal               True
Type 2   Video 1   20.129  0.260  24.405                    Slow traffic         True
Type 2   Video 2   40.393  0.088  24.333                    Traffic congestion   False
Type 2   Video 3   14.295  0.223  25.778                    Slow traffic         True
Type 2   Video 4   13.647  0.256  21.636                    Slow traffic         True
Type 3   Video 1   16.450  0.180  25.383                    Traffic congestion   True
Type 3   Video 2   33.355  0.179  27.016                    Traffic congestion   True
Type 3   Video 3   33.295  0.127  24.707                    Traffic congestion   True
Type 3   Video 4   37.672  0.085  23.545                    Traffic congestion   True
Average accuracy (%): 91.67
5. CONCLUSION
The main purpose of this work is to build an application that suggests appropriate routes in urban
traffic. It is worth noting that this paper mainly focuses on traffic situation awareness for routing. A new
model, namely YOLOv5, is utilized to detect vehicles and then determine traffic conditions based on videos
extracted from traffic cameras. Besides, we use a vehicle dataset collected in Vietnam to retrain the YOLOv5
model to improve detection performance in real applications. In the future, we will take steps to improve the
accuracy of the YOLOv5 model so that it can be deployed on web/app platforms for real-world applications.
ACKNOWLEDGEMENTS
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant
number NCM2021-20-02.
REFERENCES
[1] H. Mai-Tan, H. N. Pham-Nguyen, N. X. Long, and Q. T. Minh, “Mining Urban Traffic Condition from Crowd-Sourced Data,” SN
Computer Science, vol. 1, no. 4, 2020, doi: 10.1007/s42979-020-00244-6.
[2] Q. T. Minh, E. Kamioka, and S. Yamada, “CFC-ITS: Context-Aware Fog Computing for Intelligent Transportation Systems,” IT
Professional, vol. 20, no. 6, pp. 35–44, 2018, doi: 10.1109/MITP.2018.2876978.
[3] H. M. Tan, H. N. Pham-Nguyen, Q. T. Minh, and P. Nguyen Huu, “Traffic Condition Estimation Based on Historical Data
Analysis,” ICCE 2020 - 2020 IEEE 8th International Conference on Communications and Electronics, pp. 256–261, 2021, doi:
10.1109/ICCE48956.2021.9352107.
[4] Q. Tran Minh, H. N. Pham-Nguyen, H. Mai Tan, and N. Xuan Long, “Traffic Congestion Estimation Based on Crowd-Sourced
Data,” Proceedings - 2019 International Conference on Advanced Computing and Applications, ACOMP 2019, pp. 119–126, 2019,
doi: 10.1109/ACOMP.2019.00026.
[5] K. Li and L. Cao, “A review of object detection techniques,” Proceedings - 2020 5th International Conference on Electromechanical
Control Technology and Transportation, ICECTT 2020, pp. 385–390, 2020, doi: 10.1109/ICECTT50890.2020.00091.
[6] N. Aburaed, A. Panthakkan, M. Al-Saad, S. A. Amin, and W. Mansoor, “Deep Convolutional Neural Network (DCNN) for Skin
Cancer Classification,” ICECS 2020 - 27th IEEE International Conference on Electronics, Circuits and Systems, Proceedings,
2020, doi: 10.1109/ICECS49266.2020.9294814.
[7] R. Girshick, “Fast R-CNN,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 International
Conference on Computer Vision, ICCV 2015, pp. 1440–1448, 2015, doi: 10.1109/ICCV.2015.169.
[8] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017, doi:
10.1109/TPAMI.2016.2577031.
[9] J. R. R. Uijlings, K. E. A. Van De Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,”
International Journal of Computer Vision, vol. 104, no. 2, pp. 154–171, 2013, doi: 10.1007/s11263-013-0620-5.
[10] J. Kurniawan, C. K. Dewa, and Afiahayati, “Traffic Congestion Detection: Learning from CCTV Monitoring Images using
Convolutional Neural Network,” Procedia Computer Science, vol. 144, pp. 291–297, 2018, doi: 10.1016/j.procs.2018.10.530.
[11] P. Chakraborty, Y. O. Adu-Gyamfi, S. Poddar, V. Ahsani, A. Sharma, and S. Sarkar, “Traffic Congestion Detection from Camera
Images using Deep Convolution Neural Networks,” Transportation Research Record, vol. 2672, no. 45, pp. 222–231, 2018, doi:
10.1177/0361198118777631.
[12] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-December, pp. 779–788, 2016, doi:
10.1109/CVPR.2016.91.
[13] X. Yang, F. Wang, Z. Bai, F. Xun, Y. Zhang, and X. Zhao, “Deep learning-based congestion detection at urban intersections,”
Sensors, vol. 21, no. 6, pp. 1–14, 2021, doi: 10.3390/s21062052.
[14] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” 2018, [Online]. Available: http://arxiv.org/abs/1804.02767.
[15] A. Ranjan and M. J. Black, “Optical flow estimation using a spatial pyramid network,” Proceedings - 30th IEEE Conference on
Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 2720–2729, 2017, doi: 10.1109/CVPR.2017.291.
[16] Z. Z. Y. Liu, F. Liu, T. Hou, “Fuzzy C-means clustering algorithm to optimize kernel parameters,” J. Jilin Univ, vol. 46, pp. 246–
251, 2016, doi: 10.1109/ICCIMA.2003.1238099.
[17] S. Wang, W. Huang, and H. K. Lo, “Traffic parameters estimation for signalized intersections based on combined shockwave
analysis and Bayesian Network,” Transportation Research Part C: Emerging Technologies, vol. 104, pp. 22–37, 2019, doi:
10.1016/j.trc.2019.04.023.
[18] J. Cui and X. Wang, “Research on Google map algorithm and implementation,” Journal of Information and Computational Science,
vol. 5, no. 3, pp. 1191–1200, 2008.
[19] Y. Zheng, Y. Li, C. M. Own, Z. Meng, and M. Gao, “Real-time predication and navigation on traffic congestion model with
equilibrium Markov chain,” International Journal of Distributed Sensor Networks, vol. 14, no. 4, 2018, doi:
10.1177/1550147718769784.
[20] A. Shahzada and K. Askar, “Dynamic vehicle navigation: An A* algorithm based approach using traffic and road information,”
ICCAIE 2011 - 2011 IEEE Conference on Computer Applications and Industrial Electronics, pp. 514–518, 2011, doi:
10.1109/ICCAIE.2011.6162189.
[21] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” 2020, [Online].
Available: http://arxiv.org/abs/2004.10934.
[22] G. Jocher, A. Stoken, J. Borovec, and et al., “Ultralytics/YOLOv5: v5.0 - YOLOv5-P6 1280 models, AWS, supervisely and youtube
integrations,” 2021, doi: 10.5281/zenodo.4679653.
[23] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8693 LNCS, no. PART 5, pp. 740–755, 2014, doi:
10.1007/978-3-319-10602-1_48.
[24] G. Jocher, “ultralytics / YOLOv5,” 2021, [Online]. Available: https://github.com/ultralytics/YOLOv5.
[25] R. P. N. Rao and D. H. Ballard, “Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-
field effects,” Nature Neuroscience, vol. 2, no. 1, pp. 79–87, 1999, doi: 10.1038/4580.
BIOGRAPHIES OF AUTHORS
Phat Nguyen Huu received his B.E. (2003) and M.S. (2005) degrees in
Electronics and Telecommunications at Hanoi University of Science and Technology
(HUST), Vietnam, and a Ph.D. degree (2012) in Computer Science at Shibaura Institute of
Technology, Japan. Currently, he is a lecturer at HUST, Vietnam. His research interests
include digital image and video processing, wireless networks, ad hoc and sensor networks,
intelligent traffic systems (ITS), and the internet of things (IoT). He received the best
conference paper award at SoftCOM (2011), the best student grant award at APNOMS
(2011), and the Hisayoshi Yanai Honorary Award from Shibaura Institute of Technology, Japan
in 2012. He can be contacted at email: [email protected].