YOLOv5 vs. YOLO11: Comparing Detection Quality 🔴 🟢 🔵

I'm working on a new annotator for supervision-0.26.0 that lets you compare predictions from two detection models. While working on example visualizations, I noticed that YOLO11 struggles to detect objects further from the camera. Is this a known limitation or something specific to my data?

Here's how I'm visualizing the comparisons:
🔴 - detected only by YOLOv5m
🟢 - detected only by YOLO11m
🔵 - detected by both models

⮑ 🔗 use supervision for benchmarking: https://lnkd.in/d57M3c32

And, of course, you can compare any pair of detectors. More comparisons below 👇🏻

#computervision #objectdetection
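For anyone who wants to reproduce this kind of overlay before the new annotator lands, here is a minimal sketch of one way to build it with the current supervision + ultralytics APIs. The checkpoint names (yolov5mu.pt, yolo11m.pt), the 0.5 IoU matching threshold, and the box_iou helper are assumptions of mine, not the upcoming comparison annotator itself.

```python
import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) arrays of xyxy boxes."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = np.maximum(a[:, None, :2], b[None, :, :2])   # top-left of intersection
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])   # bottom-right of intersection
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


# assumed checkpoint names; swap in whichever two detectors you want to compare
model_a = YOLO("yolov5mu.pt")   # YOLOv5m
model_b = YOLO("yolo11m.pt")    # YOLO11m

image = cv2.imread("frame.jpg")
det_a = sv.Detections.from_ultralytics(model_a(image)[0])
det_b = sv.Detections.from_ultralytics(model_b(image)[0])

# a box counts as "detected by both" when the two models overlap with IoU > 0.5
iou = box_iou(det_a.xyxy, det_b.xyxy)
matched_a = (iou > 0.5).any(axis=1)
matched_b = (iou > 0.5).any(axis=0)

red = sv.BoxAnnotator(color=sv.Color.from_hex("#ff0000"))    # only YOLOv5m
green = sv.BoxAnnotator(color=sv.Color.from_hex("#00ff00"))  # only YOLO11m
blue = sv.BoxAnnotator(color=sv.Color.from_hex("#0000ff"))   # both models

annotated = image.copy()
annotated = red.annotate(annotated, det_a[~matched_a])
annotated = green.annotate(annotated, det_b[~matched_b])
annotated = blue.annotate(annotated, det_a[matched_a])
cv2.imwrite("comparison.jpg", annotated)
```

The sketch matches boxes on IoU only; matching on class as well, or looping the same logic over video frames, follows the same pattern.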
Aren't you overfitting to your data by superimposing the two models (YOLO11 and YOLOv5) on the same video? The reason I felt this is that most of the detections attributed to both models actually seem to come from YOLO11. Did you try YOLOv5 alone versus the YOLOv5 + YOLO11 combination, and does it still give the same detections?
Thank you for the comparison, Piotr. We are currently using a YOLOv3 model with the DeepStream TensorRT framework and Gst-nvtracker to detect and track an object that can appear multiple times within a video stream, up to 50 instances in a single frame. The model was trained in 2020, and while its accuracy is still quite good, we've observed that detection quality decreases when objects are closer to the camera. We are currently exploring which YOLO version to adopt next. Would you recommend upgrading to the latest YOLO11 release? Our challenge lies in balancing improved detection accuracy with performance, particularly given the number of channels we need to process in our framework.
YOLO11 is trained for 600 epochs on the COCO dataset, compared to 500 epochs for YOLOv5 and YOLOv8. I think it is simply overfitting on the COCO benchmark. Can you compare the new D-FINE model on the same video as well?
It looks to me like YOLOv5 is doing a slightly better job, since red is more frequent than green and all the red predictions look like true positives. Do you have any stats for both models? Regarding your question, I'd also pay attention to people walking away from the camera compared to those walking toward it. Even though the people walking toward the camera are closer than those walking away, their predictions seem uncertain. It looks like once a person is detected at close range, the model keeps detecting them as they walk away, whereas people approaching the camera are less reliable. I would try running detection on the same video played in reverse to check the assumption that it detects people facing the camera better than those facing away.
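If you want to try that reversed-clip experiment, here is a small sketch that re-encodes a video with the frame order flipped using OpenCV; the file names and the mp4v codec are placeholders.

```python
import cv2


def reverse_video(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with the frame order reversed."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()

    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(
        dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height)
    )
    for frame in reversed(frames):
        writer.write(frame)
    writer.release()


reverse_video("people.mp4", "people_reversed.mp4")
```

Worth keeping in mind that a purely per-frame detector produces identical boxes regardless of playback direction, so any difference on the reversed clip would point to tracking or temporal smoothing in the pipeline rather than the detector itself.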
I am working on a facial recognition system project. Do you recommend using YOLO11 for that, or maybe another version?
For detecting objects outside the COCO set, YOLOv8 still rocks. I tried again on my last project and v11 is ~10-15% worse.
Piotr, you should consider a ratio metric to quantify and validate some of these "discrepancies" across models in more detail. I think the 600 epochs on COCO for YOLO11, versus 500 for YOLOv5 or YOLOv8, as Naman Makkar pointed out, is the factor, but what matters here is the "effect size" of such cross-model discrepancies, especially when one is truly making performance comparisons across models. And yes, I mean the statistical effect size of those discrepancies! I would rather not measure the differences by relying on my impression of the clips : ) Daniele
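For what it's worth, here is a minimal sketch of what such a ratio metric and effect size could look like, assuming the red / green / blue boxes have already been counted per frame; the function name, the zero-guard, and the paired Cohen's d formulation are my own choices, not something from the thread, and the example counts are made up purely for illustration.

```python
import numpy as np


def discrepancy_stats(only_a: np.ndarray, only_b: np.ndarray, both: np.ndarray) -> dict:
    """Summarise per-frame counts of model-A-only, model-B-only and shared boxes."""
    total = only_a + only_b + both
    agreement_ratio = both.sum() / max(total.sum(), 1)    # share of boxes both models found
    diff = only_a.astype(float) - only_b.astype(float)    # per-frame surplus of model A
    cohens_d = diff.mean() / (diff.std(ddof=1) + 1e-9)    # paired-samples effect size
    return {
        "agreement_ratio": float(agreement_ratio),
        "mean_per_frame_diff": float(diff.mean()),
        "cohens_d": float(cohens_d),
    }


# made-up counts from a five-frame clip, purely for illustration
print(discrepancy_stats(
    only_a=np.array([3, 2, 4, 1, 2]),    # red: YOLOv5m only
    only_b=np.array([1, 0, 1, 0, 1]),    # green: YOLO11m only
    both=np.array([10, 11, 9, 12, 10]),  # blue: both models
))
```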
I noticed the same with YOLO11 while using an aerial video for traffic analysis, trying to replicate your related tutorial. The camera "height" and quality of my video resembled the one you used, but v11 evidently struggled to make detections compared to your results with v8.
YOLOv9 vs. YOLO11 object detection