Abstract + Introduction
Abstract:- In today's rapidly developing urban areas, vehicle congestion has become
a persistent issue, making traffic monitoring, parking lot management, and many other
tasks challenging. Traditional techniques such as loop detectors and ultrasonic sensors
not only struggle to manage this congestion effectively but also drive up costs.
Therefore, the real-time vehicle counting system based on YOLOv7 has emerged as a
crucial tool for monitoring traffic congestion and managing parking lots, highway
traffic, smart city initiatives, and incident detection. This system serves two critical
functions: detecting and counting vehicles. Additionally, it includes features such as
sending alerts when the vehicle count surpasses a predetermined threshold and
generating reports, at regular intervals, that detail the number of vehicles present at a
specific time. Key phases of this system involve computer vision, feature extraction,
categorization, and counting algorithms. The primary goal of this system is to provide
real-time vehicle detection and counting, thereby enabling informed decision-making. The paper
demonstrates real-time vehicle detection and counting in videos using
the YOLOv7 model, which offers high speed and accuracy. By leveraging this technology,
it becomes possible to gain insights into traffic patterns and make well-informed
decisions based on real-time data. This innovative approach shows great potential in
tackling the challenges presented by urban traffic congestion, parking lot management,
incident detection, and more, ultimately enhancing overall management efficiency.
Deep Learning (DL) has demonstrated its superiority over traditional Machine Learning
(ML) algorithms in numerous tasks, particularly within the domain of Computer Vision (CV).
ML plays a pivotal role in CV by virtue of its capacity to discern patterns within images and
categorize objects captured by cameras. In the past, a CV system necessitated a preprocessing
and feature extraction step before it could effectively detect, classify, or recognize objects
within an image using ML algorithms. Different objects or scenarios required distinct
techniques in preprocessing and feature extraction, thereby constraining the capabilities of a
conventional CV model to detect or recognize only specific objects. In contrast, DL, with its
expansive and intricate networks, autonomously preprocesses and extracts image features
within its networks, subsequently classifying the image class and even detecting the precise
locations of individual objects within the image. However, it's important to note that DL
demands high-specification hardware and substantial amounts of data to train the networks
and optimize their performance.
The vehicle counting system is a prime example of an application that leverages Computer
Vision (CV) technology for various tasks, such as intelligent traffic light systems, parking lot
monitoring, smart city initiatives, and more. Each of these tasks begins with the crucial step
of detecting and counting each vehicle, underscoring the pivotal role of object detection and
counting algorithms in this domain. Object detection, a fundamental computer vision
technique, entails the identification and localization of objects within images or videos.
Unlike simple image classification, object detection not only recognizes the contents of an
image but also precisely pinpoints the location of each object. Typically, object detection
algorithms output the coordinates of a bounding box around the object, along with a label
identifying the type of object detected. This technology finds applications in diverse fields,
ranging from surveillance and security systems to autonomous vehicles and augmented
reality, serving as a foundational element in enabling machines to perceive and comprehend
their surroundings.
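To make this concrete, the short Python sketch below shows one way a detector's per-object result could be represented; the record layout and the sample values are illustrative assumptions, not output from any particular detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a bounding box, a class label, and a confidence score."""
    x1: float  # left edge of the bounding box, in pixels
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge
    label: str         # e.g. "car", "bus", "truck"
    confidence: float  # detector's confidence, in [0, 1]

# A hypothetical frame might yield a list of such records:
detections = [
    Detection(120.0, 240.0, 310.0, 400.0, "car", 0.93),
    Detection(450.5, 210.0, 640.0, 420.0, "truck", 0.87),
]
print(f"{len(detections)} vehicles detected in this frame")
```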
In the context of the vehicle counting system, the counting algorithm is utilized to accurately
tally the number of vehicles present at a specific instance within a designated area. By
integrating sophisticated algorithms and camera systems, this counting process yields
valuable insights into traffic flow, parking lot occupancy, and overall transportation patterns.
By accurately detecting and counting objects, computer vision systems can gather
information to make informed decisions and take appropriate actions. These techniques find
applications in various fields, including robotics and autonomous vehicles, enabling these
systems to operate efficiently and effectively. YOLOv7 stands out as a popular algorithm for
object detection due to its speed, accuracy, flexibility, and user-friendly nature. However,
other algorithms, such as Faster R-CNN and SSD, may be more suitable for specific
applications. For instance, Faster R-CNN might be better suited for applications prioritizing
higher accuracy at the expense of speed, while SSD could be more appropriate for real-time
applications requiring low latency. Ultimately, the selection of an object detection algorithm
hinges on the specific requirements of the application, encompassing factors such as desired
speed, accuracy, and the categories of objects to be detected. Evaluating the performance of
different algorithms on the target dataset and application requirements is crucial for making
an informed choice.
Acknowledged as a pivotal addition to the arsenal of tools aimed at urban and highway traffic
analysis and planning [1], computer vision's forte encompasses the realm of vehicle detection
and recognition, offering a plethora of possibilities within the domain of automated driving.
However, the challenges inherent in real-time image acquisition via onboard cameras on road
vehicles are notably influenced by various factors, including camera angles and inter-object
distances.
The conventional trajectory of employing Traditional Machine Learning approaches
mandates preprocessing methods to fulfill the task's requisites. Techniques such as image
greyscale conversion, binarization, and background subtraction, at times complemented by
edge detection, are deployed. Yet, this approach encounters limitations; instances such as the
presence of shadows cast by vehicles can impede precise detection. Similarly, alterations in
road surfaces due to repairs, damage, or the presence of obstacles disrupt the image
subtraction process, leading to inaccuracies in detection.
Contrarily, the advent of Deep Learning (DL) heralds a paradigm shift by affording a more
adaptable performance sans the need for extensive image preprocessing or feature extraction
through multiple methodologies. Although computationally intensive, the DL approach
transcends these preprocessing demands, albeit requiring substantial volumes of data for
network training. Moreover, the evolution of DL architectures, trained on extensive datasets
encompassing millions of instances, has significantly facilitated the development of
Computer Vision (CV) systems, rendering the process more streamlined and accessible.
The advent of Deep Learning has significantly reshaped the landscape, empowering computer
vision systems with greater adaptability and resilience, thereby mitigating some of the
traditional challenges encountered in vehicle detection and recognition within dynamic urban
and highway environments.
The initial concept of the algorithm employed in this paper is rooted in the RCNN (Region-
Based Convolutional Neural Network) algorithm [7]. This algorithm serves as the foundation
for enhancing the YOLO (You Only Look Once) method by adopting the anchor mechanism of
Faster R-CNN. In a manner similar to YOLOv3, the algorithm generates anchors at
multiple scales on the feature map and creates prior boxes on the feature map for making
predictions [8]. Currently, the YOLO method stands out as the most widely used technique
for target identification and finds extensive application in industrial manufacturing. Hence, in
this paper, the YOLO algorithm framework is chosen as the basis for implementation.
According to Wang et al. in 2022 [9], this algorithm has achieved remarkable performance
with YOLOv7. However, there is still room for improvement in terms of the algorithm's
detection precision.
The primary objective of this study is to bridge the gaps and overcome the limitations
observed in previous research by developing a real-time vehicle detection and counting
system based on the YOLOv7 model. To achieve this, novel techniques or enhancements
will be incorporated to enhance the system's performance and robustness in real-world
scenarios. The study aims to provide a comprehensive solution that integrates accurate
vehicle detection and reliable counting in real-time video streams.
The proposed system holds significant potential across various domains, including
transportation management, traffic monitoring, and surveillance applications. By offering
real-time insights into vehicle movement patterns and traffic congestion, as well as providing
accurate vehicle counts, the system can greatly contribute to better decision-making and
resource allocation. These insights can aid in identifying potential traffic violations and
improving overall road safety.
Furthermore, the outcomes of this study have the potential to enhance the efficiency of traffic
management systems and facilitate urban planning. By accurately detecting and counting
vehicles, the system can assist in optimizing traffic flow, implementing effective safety
measures, and supporting the development of well-planned urban infrastructures.
By critically assessing and identifying the gaps and limitations in previous research, our study
aims to contribute to the field of real-time vehicle detection and counting based on the
YOLOv7 algorithm. The proposed enhancements, novel techniques, and comprehensive
evaluation of the system can potentially address the identified limitations and provide a more
accurate and robust solution for real-world scenarios.
Literature survey:-
Before undertaking our research, we reviewed prior work related to this topic.
Numerous methodologies have been proposed for vehicle detection and counting, each
offering distinct approaches to tackle this complex task. Li et al. [1] introduced a real-time
system that encompasses several stages. Initially, an adaptive background subtraction
technique identifies moving objects within video frames. Subsequent binarization and
morphological operations refine the foreground area, eliminating noise and shadows. To
prevent over-segmentation, the resulting foreground image is combined with the frame's edge
image before undergoing hole filling. The system then employs a virtual road-based detector
for vehicle identification and counting, followed by blob tracking across frames to monitor
vehicle movement.
In a similar vein, Bhaskar and Yong [2] devised a method employing Gaussian mixture model
(GMM) and blob detection. GMM models the background, extracting foreground pixels
based on Mahalanobis distance. Morphological operations are applied for noise removal and
blob aggregation, subsequently facilitating blob analysis for vehicle identification. Counting
and tracking mechanisms are then employed for comprehensive vehicle monitoring.
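While the exact pipelines of [1] and [2] are not reproduced here, the general GMM-plus-blob-analysis idea can be sketched with OpenCV as follows; the kernel size, area threshold, and video path are illustrative assumptions.

```python
import cv2

# GMM-based background model (OpenCV's MOG2 implementation)
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # assumed size
MIN_BLOB_AREA = 500  # assumed threshold separating vehicles from noise

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Suppress shadows (marked as gray, value 127, by MOG2) and weak pixels
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # Morphological operations remove noise and merge fragmented blobs
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Blob analysis: each sufficiently large contour is treated as a vehicle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    blobs = [c for c in contours if cv2.contourArea(c) > MIN_BLOB_AREA]
    print(f"candidate vehicles in frame: {len(blobs)}")
cap.release()
```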
Contrastingly, Kryjak et al. [3] engineered a hardware-software system tailored for road
intersections. This system diverges from conventional techniques like background subtraction
and optical flow, opting instead for similarity measurements in consecutive frames. Patch
analysis is employed to detect vehicles at red signals, enabling effective vehicle counting.
Liu et al. [4] introduced a real-time counting method centered on virtual detection lines and
spatio-temporal contour techniques. Leveraging GMM, moving foreground pixels along the
detection line are detected, enabling the construction of vehicle contours across multiple
frames in the spatio-temporal domain. These contours are meticulously analyzed to ascertain
the number of vehicles present.
Additionally, shadow detection and removal algorithms have been proposed in [5,6],
highlighting a specialized area of focus within the realm of vehicle detection. These
algorithms contribute to refining the accuracy and precision of vehicle counting systems by
addressing shadow-related challenges inherent in image processing.
Each of these methodologies represents a distinct approach, showcasing the diversity and
innovation within the field of vehicle detection and counting. Their individual strengths and
novel techniques contribute to the ongoing advancements in this domain.
In a research study conducted by Md Abdur Rouf, Qing Wu, and their team, they scrutinized
the conventional methods used for vehicle detection, such as RADAR, LiDAR, RFID, or
LASAR. These methods have some serious drawbacks—they're not only slow but also quite
expensive and require a lot of human effort. Moreover, these techniques have limitations
when it comes to accurately classifying different types of vehicles or gathering detailed data
about them, like how many vehicles of each kind are on the road and in which direction they
are moving.
The researchers, instead of relying on these traditional methods, worked on enhancing a more
sophisticated algorithm called RCNN (Region-based Convolutional Neural Network). They
modified and improved the YOLO (You Only Look Once) technique by adopting the anchor
mechanism of Faster R-CNN, creating a more efficient and accurate approach for
vehicle detection. YOLO, a prevalent method widely used in industrial settings for
identifying targets, served as the primary algorithm framework in their study.
To achieve this goal, the team analyzed the network's memory access behavior, looking into
ways to reduce the time it takes to process information. They explored aspects such as the
number of connections in the network architecture and how it handles various computations
to make it faster and more efficient. One key aspect they examined was the choice of
activation functions, which is crucial to how the network interprets and reacts to data.
While focusing on enhancing the algorithm, they also critically evaluated past studies. They
discovered that some studies lacked comprehensive real-world testing, especially in scenarios
with different lighting conditions and environments. Many of these studies concentrated
solely on detecting vehicles and missed out on tracking and accurately counting them. This
gap in research highlighted potential limitations in handling challenges like obscured views,
changes in size, and coping with complex traffic scenarios.
By addressing these limitations, the researchers are striving to develop a smarter system that
not only identifies vehicles but also effectively monitors, tracks, and precisely counts them.
This innovative approach aims to make these smart systems more adaptable and reliable,
functioning accurately across diverse real-world situations.
In the research conducted by Aman Preet Singh Gulati, a sophisticated vehicle detection and
counting system was developed employing the robust OpenCV library and the efficient haar
cascade algorithm. This system was designed to accurately detect and count vehicles within
both images and video streams. Leveraging the comprehensive capabilities of OpenCV, the
study utilized a range of image processing operations crucial for this task. Particularly, the
implementation involved the use of specific car and bus haar cascade classifiers, which
proved instrumental in effectively identifying and enumerating cars and buses in the visual
data. Through this methodical approach, the system demonstrated a commendable ability to
discern and tally vehicles, showcasing its potential for diverse applications in traffic
management, surveillance, and beyond.
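A minimal sketch of such a Haar-cascade detector with OpenCV is shown below; the cascade file name (cars.xml) and the detection parameters are assumptions, since pretrained vehicle cascades circulate in community repositories rather than shipping with OpenCV itself.

```python
import cv2

# Load a pretrained car cascade (hypothetical file name; vehicle cascades
# such as "cars.xml" are distributed by community repositories)
car_cascade = cv2.CascadeClassifier("cars.xml")

img = cv2.imread("street.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors are typical starting values, not tuned results
cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in cars:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"cars detected: {len(cars)}")
cv2.imwrite("street_annotated.jpg", img)
```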
The research outlined in this paper emphasizes the implementation of YOLOv3 (You Only
Look Once) for the pivotal tasks of vehicle detection, classification, and counting. The
primary objective was to accurately discern various vehicle types—specifically, cars, Heavy
Motor Vehicles (HMVs), and Light Motor Vehicles (LMVs) traversing a roadway. The
authors meticulously employed the capabilities of YOLOv3, a state-of-the-art object detection
system, to achieve this multifaceted goal. By leveraging this advanced neural network
architecture, the study aimed to precisely identify and categorize diverse vehicles present on
the road while concurrently tallying the total count of vehicles navigating through the given
area. This research marks a significant stride in enhancing automated surveillance, traffic
monitoring, and transportation management systems by providing a robust methodology for
real-time vehicle analysis and enumeration.
In the research conducted by Atharva Musale, the focus was on employing advanced
algorithms for vehicle detection and tracking within visual data. Specifically, the study
utilized the YOLOv3 algorithm, renowned for its precision in object detection tasks, to
accurately identify vehicles within the given scenes. Furthermore, the implementation of the
Deep Sort Algorithm played a crucial role in tracking these detected vehicles over
consecutive frames. The integration of YOLOv3 and the Deep Sort Algorithm enabled not
only the identification but also the continuous monitoring and tracking of vehicles, ensuring
their trajectories could be followed across time and space within the video or image
sequences. This methodological combination highlights a sophisticated approach towards
comprehensive vehicle analysis, contributing significantly to enhanced surveillance systems,
traffic management, and logistical tracking solutions.
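The study's exact implementation is not reproduced here, but the detector-to-tracker hand-off can be sketched against the community deep_sort_realtime package (an assumption; other Deep SORT implementations exist), with a hard-coded detection list standing in for YOLOv3 output.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=30)  # drop tracks unseen for ~30 frames

# In a real pipeline these would come from the detector; format expected by
# deep_sort_realtime: ([left, top, width, height], confidence, class_name)
detections = [
    ([100, 200, 180, 150], 0.91, "car"),
    ([420, 190, 200, 160], 0.84, "truck"),
]
frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a video frame

tracks = tracker.update_tracks(detections, frame=frame)
for track in tracks:
    if not track.is_confirmed():
        continue  # skip tentative tracks until they are confirmed
    print(f"track {track.track_id}: box={track.to_ltrb()}")
```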
This work discusses methods used in vehicle object detection, primarily divided into
conventional machine vision techniques and complex deep learning strategies. Traditional
techniques use a vehicle's motion to separate it from the background image, employing three
main methods: background subtraction, continuous video frame contrast, and optical flow.
These methods detect moving foreground areas by analyzing differences between frames or
the motion region in videos. Furthermore, vehicle detection methods utilizing features like
SIFT and SURF have been widely used, including 3D models for classification. Deep
Convolutional Neural Networks (CNNs) have significantly advanced vehicle detection by
learning image features and performing tasks like classification and regression. The detection
methods are broadly categorized into two: the two-stage method, involving candidate box
generation and classification using CNNs, and the one-stage method, directly converting
object positioning into a regression problem. Various models like R-CNN, SPP-Net, R-FCN,
FPN, Mask R-CNN, SSD, and YOLO, each with its unique approach, have improved feature
extraction, object selection, and classification capabilities of CNNs. However, while the
traditional machine vision techniques offer faster detection speeds, they struggle with
changing image conditions and complex scenarios. On the other hand, advanced CNNs may
struggle with scale changes and precise detection of small objects. The current methods also
lack precision in recognizing objects of different sizes belonging to the same category, and
the use of image pyramids or multi-scale input images, although effective, requires
significant computational resources. (Prof. Pallavi Hiwarkar, Damini Bambal, Rishabh Roy)
The research conducted by Ravula Arun Kumar, D. Sai Tharun Kumar, K. Kalyan, and B.
Rohan Ram Reddy focuses on developing an intelligent vehicle counting system to address
increased traffic congestion. This study explores various methods, including blob analysis,
background subtraction, image enhancement, sensor-based systems, image segmentation, and
pedestrian detection using neural networks. They've designed software that processes video
input to count vehicles by performing tasks like image segmentation, vehicle tracking,
detection, and blob analysis for traffic surveillance. Their approach also includes utilizing
convolutional neural networks (CNNs) for real-time vehicle counting with high accuracy.
Additionally, they've explored techniques like virtual coils and CNNs for precise vehicle
counting, especially on highways. The research emphasizes achieving high accuracy in
vehicle counting using background subtraction alongside virtual collectors and morphological
operations for tracking and counting vehicles on roads and highways.
Previous research predominantly focused on testing videos from highways frequented solely
by cars, buses, or trucks, omitting motorcycles. Notably, buses or trucks were generically
categorized as 'cars' without finer distinctions as 'bus' or 'truck' during counting. However,
certain traffic monitoring systems necessitate more detailed vehicle information, specifying
car, truck, bus, or motorcycle types. The earlier studies referenced in [6]–[9] were mostly
conducted in favorable traffic conditions with responsible driving behaviors to ensure
accurate counting. Consequently, our focus in this work is to develop a system that not only
counts vehicles crossing roads but also categorizes them—car, bus, truck, or motorcycle—
utilizing a Deep Learning algorithm employing YOLOv3 architecture.
The KITTI dataset, created by the Karlsruhe Institute of Technology and the Toyota
Technological Institute at Chicago, is designed for self-driving cars, offering extensive
traffic-scene images that aid 3D object detection. However, TRANCOS, another dataset
capturing traffic jams through surveillance
cameras, suffers from occlusions and lacks vehicle type records, limiting its broader
application. Deep learning networks follow a two-step method involving candidate box
generation and subsequent sample classification, seen in algorithms like RCNN, Fast R-CNN,
and Faster R-CNN, known for improved feature extraction but slower detection. In contrast,
one-step methods like SSD efficiently locate objects with default anchors at different
resolutions, handling various scales, while the YOLO series divides images into grids for
swift object prediction, with YOLOv3 utilizing logistic regression and multiple scales for
rapid and accurate detection. Overall, training deep learning models using vehicle datasets
leads to highly performing vehicle detection models.
The research conducted by Huansheng Song, Haoxiang Liang, Huaiyu Li, Zhe Dai, and Xu
Yun delves into the realm of vehicle object detection, exploring both traditional machine
vision methods and the more intricate deep learning techniques.
The traditional approaches primarily rely on the motion exhibited by vehicles to distinguish
them from a static background image. This methodology encompasses three primary
categories: background subtraction, continuous video frame difference, and optical flow.
Each of these methods operates by detecting variations in pixel values across consecutive
frames or utilizing the motion regions in the video, allowing for the identification of moving
objects. These techniques, while foundational, come with limitations. They might struggle
with abrupt changes in lighting, scenes with periodic motion, or scenarios involving slow-
moving vehicles.
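As a concrete illustration of the continuous-frame-difference idea, the minimal OpenCV sketch below flags moving regions between consecutive frames; the binarization threshold and video path are assumptions.

```python
import cv2

cap = cv2.VideoCapture("highway.mp4")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that changed between consecutive frames mark moving regions
    diff = cv2.absdiff(prev_gray, gray)
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # assumed threshold
    print(f"moving pixels: {cv2.countNonZero(motion_mask)}")
    prev_gray = gray
cap.release()
```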
Conversely, the study emphasizes the prowess of Deep Convolutional Neural Networks
(CNNs) in the domain of vehicle detection. CNNs exhibit remarkable capabilities in learning
intricate image features and performing various tasks like classification and bounding box
regression. The research navigates through the landscape of detection methodologies,
categorizing them into two primary approaches: two-stage methods (such as R-CNN) and
one-stage methods (like YOLOv3).
Two-stage methods, represented by R-CNN, involve generating candidate boxes for objects
through intricate algorithms, followed by classification via a convolutional neural network.
Although these methods offer high precision, they are computationally intensive and demand
significant storage memory. On the other hand, one-stage methods, exemplified by YOLOv3,
directly convert object localization into a regression problem, resulting in faster detection
speeds. However, they may sacrifice some precision, especially concerning smaller objects or
intricate scenes.
The research underscores the trade-offs between these methodologies. Traditional methods
might offer faster detection but struggle in challenging conditions, while advanced CNNs
provide accuracy but face challenges in handling scale variations. It also highlights the need
for more adaptable and precise approaches that can handle a diverse range of scenarios.
To address these challenges, the study suggests leveraging multi-scale input images, enabling
the models to handle various object sizes and complexities. By incorporating this approach,
the research aims to overcome the limitations posed by traditional methods and enhance the
adaptability and precision of advanced CNNs in vehicle object detection.
Methodology:-
1. Data Collection and Preparation
For an accurate system, we start by collecting a diverse dataset of images or videos
containing various traffic scenarios. These datasets should encompass different weather
conditions, lighting variations, diverse vehicle types (cars, trucks, buses), and traffic
densities. Each image or video segment in the dataset needs careful annotation, marking the
locations of vehicles. This annotated data helps the model learn and identify vehicles
accurately.
Data Acquisition:
Capturing real-world footage or images using cameras or sensors in locations with varying
traffic densities is critical. These recordings should encapsulate different environmental
conditions, such as daylight, nighttime, adverse weather, or diverse traffic patterns like
congested urban roads or highways.
Annotation and Labeling:
The collected video frames or images need meticulous annotation or labeling. Each frame
requires a manual or automated process to identify vehicles and accurately count their
numbers. This annotation includes marking vehicles and specifying the vehicle count per
frame.
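For YOLO-family models such as YOLOv7, annotations are conventionally stored as one plain-text label file per image, with one line per object in normalized coordinates; the sketch below shows this format and a trivial per-frame count (the file name is hypothetical).

```python
# YOLO label format: one line per object ->
#   <class_id> <x_center> <y_center> <width> <height>
# with all coordinates normalized to [0, 1] relative to the image size.
#
# Example contents of frame_0001.txt (hypothetical):
#   0 0.512 0.634 0.180 0.210
#   1 0.250 0.480 0.220 0.260

def count_objects(label_path: str) -> int:
    """Count annotated objects in one YOLO-format label file."""
    with open(label_path) as f:
        return sum(1 for line in f if line.strip())

print(count_objects("frame_0001.txt"))  # -> vehicle count for that frame
```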
Gathering data that captures scenarios with high vehicle density is equally important. This
data significantly influences the system's performance, reducing false positives and ensuring
the system reacts precisely to threshold breaches. Ultimately, this robust training enables the
AI system to reliably identify high-traffic instances in real time, enhancing the overall
efficiency and reliability of the vehicle counting system.
2. Understanding YOLOv7
Analyze how YOLOv7 processes data and detects vehicles. This model identifies objects in
an image and outputs bounding boxes around them, along with their class probabilities. The
logic for vehicle counting is embedded within this process.
This involves grasping a clever way computers learn to recognize things like vehicles in
pictures or videos. YOLO stands for "You Only Look Once," and the "v7" refers to its version
number, which means it's a more advanced and improved version of the original YOLO.
Imagine a computer trying to spot and count different types of cars, trucks, or bikes in a busy
street photo. YOLOv7 is like a smart detective – it looks at the entire picture once and
quickly figures out where the vehicles are and what types they might be. It's fast, accurate,
and efficient.
The "You Only Look Once" part means it doesn't need to take lots of glances or look
repeatedly at different parts of the picture. Instead, it takes one glance and uses a network of
patterns it learned from many other images to instantly recognize the vehicles and count
them.
YOLOv7 is a bit like having a super-smart friend who's seen tons of cars, trucks, and bikes,
and can spot them in a picture without needing to check the same spot multiple times.
This version, "v7," is an improved and updated version of this smart system. It's even better
at recognizing vehicles accurately and quickly than the previous versions. It's learned from
more pictures, refined its skills, and now it's faster and more accurate in identifying vehicles
in pictures or videos.
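A minimal inference sketch is shown below; it assumes the WongKinYiu/yolov7 repository's torch.hub entry point, a downloaded yolov7.pt checkpoint, and the standard COCO class indices for vehicles, so treat it as an illustration rather than the paper's exact code.

```python
import torch

# Load YOLOv7 through torch.hub (assumes the WongKinYiu/yolov7 repository's
# hub entry point and a local "yolov7.pt" checkpoint)
model = torch.hub.load("WongKinYiu/yolov7", "custom", "yolov7.pt")

# COCO class indices for vehicles: car=2, motorcycle=3, bus=5, truck=7
VEHICLE_CLASSES = {2, 3, 5, 7}

results = model("street.jpg")  # hypothetical input image
# Each detection row: x1, y1, x2, y2, confidence, class_id
boxes = results.xyxy[0]
vehicles = [row for row in boxes.tolist() if int(row[5]) in VEHICLE_CLASSES]
print(f"vehicles in frame: {len(vehicles)}")
```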
3. Model Training
Model training is like teaching a computer to recognize when there are lots of vehicles in a
picture or video. To do this, we're tweaking the computer's learning so that when it sees too
many vehicles, it does something specific. This involves changing the computer's instructions
so that it can notice when there's a high number of vehicles and then start the system that
alerts or makes reports about it. It's like giving the computer a new skill – noticing crowded
streets and taking action when there are too many vehicles.
Model training based on labeled data is a fundamental aspect of supervised learning in
machine learning. Labeled data refers to a dataset where each input (such as an image or a
video frame) is paired with its corresponding output label (like the count of vehicles in that
image). For vehicle counting, this means having images or frames where the number of
vehicles has been manually annotated or marked.
Model training is essentially teaching an AI system to recognize specific patterns or
situations. In this case, it's about making the system understand what a scene looks like when
there are a lot of vehicles present. To train the system, we use a lot of examples—pictures or
video frames that show different scenarios with varying vehicle counts.
The computer learns from these examples. It figures out what details and features indicate a
high number of vehicles. It might notice things like dense clusters of cars, long queues, or
crowded intersections. These examples help the AI understand the context in which the
vehicle count is considered high.
Once the computer learns these patterns, we adjust its instructions to make it respond when it
recognizes this scenario. It's like saying, "Hey, when you see lots of vehicles, let me know."
This involves tweaking the system's code so that it can detect these situations accurately.
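In code terms, that adjustment reduces to comparing each frame's vehicle count against a configurable threshold and invoking an action when it is crossed; the threshold value and callback below are assumptions for illustration.

```python
VEHICLE_THRESHOLD = 20  # assumed congestion threshold

def on_threshold_breach(count: int) -> None:
    """Placeholder action; in the full system this triggers the email alert."""
    print(f"ALERT: {count} vehicles detected, threshold of {VEHICLE_THRESHOLD} exceeded")

def check_frame(vehicle_count: int) -> None:
    """Compare one frame's count against the threshold and react if exceeded."""
    if vehicle_count > VEHICLE_THRESHOLD:
        on_threshold_breach(vehicle_count)

# Simulated per-frame counts from the detector
for count in [12, 18, 23, 19]:
    check_frame(count)
```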
Integration and Deployment
Integration and deployment mark the stage where all the parts come together. We take our
well-tested email alert and reporting features and combine them into the actual live system.
It's like assembling different pieces of a puzzle to create the complete picture.
We carefully place the email alert and reporting mechanisms into the system's framework,
making sure they fit seamlessly without causing any disruptions. This integration step is
critical because it ensures that these new components work smoothly with the existing
system.
But before this goes live, we conduct thorough checks to ensure everything is compatible and
functioning as expected within the real-time environment. It's a bit like trying out a new
component in a machine to see if it works well with the rest and doesn't throw off the entire
system.
The goal here is to make sure that when the system goes live, these added features of email
alerts and reporting operate harmoniously, enhancing the system's capabilities without
causing any glitches or slowdowns. This integration and compatibility check are essential to
guarantee a seamless and efficient system once it's up and running.
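As one plausible realization of the email-alert component, the sketch below uses Python's standard smtplib; the SMTP host, credentials, and addresses are placeholders rather than values from the actual deployment.

```python
import smtplib
from email.message import EmailMessage

def send_alert(vehicle_count: int, threshold: int) -> None:
    """Send a congestion alert email (all connection details are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = f"Traffic alert: {vehicle_count} vehicles (threshold {threshold})"
    msg["From"] = "monitor@example.com"
    msg["To"] = "operator@example.com"
    msg.set_content(
        f"The real-time counter observed {vehicle_count} vehicles, "
        f"exceeding the configured threshold of {threshold}."
    )
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()                                      # encrypt the session
        server.login("monitor@example.com", "app-password")    # placeholder credentials
        server.send_message(msg)
```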
9. Maintenance and Updates
Maintenance and updates are like taking care of a garden. Once the system is running, it
needs constant attention to stay in top shape. We keep an eye on how well it's working—sort
of like checking plants for signs of wilting.
Regularly, we check the system's performance to confirm it's doing its job properly. If we
notice any hiccups, like the system missing an alert or failing to detect a vehicle, we
step in and make tweaks. It's akin to adjusting watering times if the plants seem thirsty or
flooded.
Also, just like how new flowers are planted in a garden to keep it vibrant, we update our
system regularly. This might involve adding new features or improving existing ones to keep
it up to date and efficient.
Maintenance and updates are crucial to ensure the system stays reliable and effective. It's
about nurturing the system, making sure it runs smoothly day in and day out.
[Figure: Flow of the proposed system. Input video → video processing (frame extraction) →
feature extraction (applying the YOLOv7 model to each frame) → decision: does the detected
object belong to a vehicle class? If YES, the vehicle is counted; if NO, the detection is
discarded and processing continues with the next frame.]