Roboflow’s Post

Roboflow reposted this

Piotr Skalski

Open Source Lead @ Roboflow | Computer Vision | Vision Language Models

YOLOv5 vs. YOLO11: Comparing Detection Quality 🔴 🟢 🔵

I'm working on a new annotator for supervision-0.26.0 that lets you compare predictions from two detection models. While working on example visualizations, I noticed that YOLO11 struggles to detect objects further from the camera. Is this a known limitation or something specific to my data?

Here's how I'm visualizing the comparisons:

🔴 - detected only by YOLOv5m
🟢 - detected only by YOLO11m
🔵 - detected by both models

⮑ 🔗 use supervision for benchmarking: https://lnkd.in/d57M3c32

And, of course, you can compare any pair of detectors. More comparisons below 👇🏻

#computervision #objectdetection
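
Here's a rough sketch of how that red/green/blue overlay can be reproduced with supervision and Ultralytics. It is not the new 0.26.0 annotator itself; the checkpoint names, image path, and 0.5 IoU matching threshold are placeholder assumptions:

```python
# Sketch: color boxes by which model found them, using IoU matching.
# Placeholder assumptions: checkpoint names, "frame.jpg", IoU threshold 0.5.
import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

model_a = YOLO("yolov5mu.pt")  # YOLOv5m (placeholder checkpoint name)
model_b = YOLO("yolo11m.pt")   # YOLO11m (placeholder checkpoint name)

image = cv2.imread("frame.jpg")
det_a = sv.Detections.from_ultralytics(model_a(image)[0])
det_b = sv.Detections.from_ultralytics(model_b(image)[0])

def iou_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) arrays of xyxy boxes."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

iou = iou_matrix(det_a.xyxy, det_b.xyxy)
matched_a = iou.max(axis=1) > 0.5 if len(det_b) else np.zeros(len(det_a), dtype=bool)
matched_b = iou.max(axis=0) > 0.5 if len(det_a) else np.zeros(len(det_b), dtype=bool)

red = sv.BoxAnnotator(color=sv.Color(r=255, g=0, b=0))    # only model A
green = sv.BoxAnnotator(color=sv.Color(r=0, g=255, b=0))  # only model B
blue = sv.BoxAnnotator(color=sv.Color(r=0, g=0, b=255))   # both models

annotated = image.copy()
annotated = red.annotate(annotated, det_a[~matched_a])
annotated = green.annotate(annotated, det_b[~matched_b])
annotated = blue.annotate(annotated, det_a[matched_a])
cv2.imwrite("comparison.jpg", annotated)
```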

Piotr Skalski

Open Source Lead @ Roboflow | Computer Vision | Vision Language Models

1w

YOLOv9 vs. YOLO11 object detection

Piotr Skalski

Open Source Lead @ Roboflow | Computer Vision | Vision Language Models

1w

YOLOv8 vs. YOLO11 instance segmentation

Deepti Prasad

Dentist, Data Analyst, Data Scientist, SAS Programmer, Statistician with R programming and SAS

1w

Aren't you overfitting your data by superimposing two models (YOLO11 and YOLOv5) on the same video? The reason I feel this is that the detections attributed to both models actually seem to be mostly YOLO11. Did you try YOLOv5 alone versus the YOLOv5 + YOLO11 combination, and does it still give the same detections?

Jens Schneider

Technology Specialist - M365 Copilot bei Microsoft | Microsoft 365

1w

Thank you for the comparison, Piotr. We are currently using a YOLOv3 model with the DeepStream TensorRT framework and Gst-nvtracker to detect and track an object multiple times within a video stream. The object can appear up to 50 times in a single frame. The model was trained in 2020, and while its accuracy is still quite good, we’ve observed that detection quality decreases when objects are closer to the camera. We are currently exploring which YOLO version to adopt next. Would you recommend upgrading to the latest YOLO11 version? Our challenge lies in balancing improved detection accuracy with performance, particularly given the number of channels we need to process in our framework.

Naman Makkar

CEO, Vayuvahana Technologies Private Limited | Building SOTA Vision AI models

1w

YOLO11 is trained for 600 epochs on the COCO dataset, compared to 500 epochs for YOLOv5 and YOLOv8. I think it is simply overfitting to the COCO benchmark. Can you compare the new D-FINE model on the same video as well?

It looks to me like YOLOv5 is doing a slightly better job, since red is more frequent than green and all the red predictions look like true positives. Do you have any stats for both models? Regarding your question, I'd also pay attention to people walking away from the camera compared to those walking towards it. Even though people walking towards the camera are closer than those walking away, their predictions still seem uncertain. It looks like once a person is detected at close range, the model keeps detecting them as they walk away, while people approaching the camera are picked up less reliably. I would try running detection on the same video played in reverse to check the assumption that it detects people facing the camera better than those facing away, as sketched below.
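
If it helps, a minimal way to set up that reverse test with plain OpenCV (the file paths are placeholders; then run the same detector on both files and compare):

```python
# Sketch: write the clip backwards, then run the same detector on
# "input.mp4" and "reversed.mp4" (placeholder paths) and compare results.
import cv2

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

h, w = frames[0].shape[:2]
out = cv2.VideoWriter("reversed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
for frame in reversed(frames):
    out.write(frame)
out.release()
```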

I am working on a facial recognition system project. Would you recommend using YOLO11 for that, or maybe other versions?

Fatih Ors, MSc

Data Scientist / AI Specialist

1w

For detecting objects outside the COCO set, YOLOv8 still rocks. I tried again on my last project, and v11 is ~10-15% worse.

Piotr, you should consider a ratio metric to quantify and validate in more detail some of these "discrepancies" across models. I think the 600 epochs for YOLO11 on COCO versus YOLOv5 or v8, as astutely pointed out by Naman Makkar, is the factor, but what's important here is the "effect size" of such cross-model discrepancies, especially when one is truly making performance comparisons across models. And yes, I mean the statistical effect size of such discrepancies! I would rather not measure the differences by relying on my impression of the clips :) Daniele
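
For instance, one simple effect-size measure would be Cohen's h on the two models' detection rates; the counts below are placeholders, not measured values:

```python
# Sketch: Cohen's h as an effect size between two detection rates.
# The counts are placeholders, not measurements from the clips.
from math import asin, sqrt

def cohens_h(p1: float, p2: float) -> float:
    """Effect size between two proportions; ~0.2 small, ~0.5 medium, ~0.8 large."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

rate_yolov5 = 180 / 220  # placeholder: fraction of objects YOLOv5m detected
rate_yolo11 = 160 / 220  # placeholder: fraction of objects YOLO11m detected
print(f"Cohen's h = {cohens_h(rate_yolov5, rate_yolo11):.3f}")
```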

Georgios Faltakas

Research Affiliate @WiMoTS | MSc CSIoT @UTH | BSc in Digital Systems

1w

I noticed the same with YOLO11 while using an aerial video for traffic analysis, trying to replicate your related tutorial. The "height" and quality of the video I used resembled the one you used, but v11 evidently struggled to make detections compared to your results with v8.
