It's our final Elder Research Commercial Team data challenge of the year, and folks were given an image classification task: Is Santa in this image[1]? As usual, we saw lots of different approaches, from ResNets to vision transformers to LLMs. (DALL-E can generate some eerie-looking Santas, too.)

The tools have improved a lot since I last looked at image classification! For this task, an off-the-shelf, zero-shot classifier (OpenAI's clip-vit-large-patch14 via HuggingFace[2]) performed super competitively, finishing just behind a coworker's fine-tuned ResNet-50 (F1 = 0.960 vs. 0.966).

[There are lots of interesting subtleties in these results, too. For one, are these images actually new to the various pre-trained models, or were they a part of that pre-training? We also observe a nonzero level of corruption, or at least ambiguity, in the ground-truth labels, which brings down the maximum possible performance and adds some extra intrigue.]

My first attempt at classifying with CLIP actually ended a little worse, with F1 = 0.85, and for an interesting reason. I had, effectively, asked the model to decide between two captions (Santa and not-Santa) and then made decisions based on a learned threshold for P(Santa). This was supposed to be a contrastive approach. But plot the model's estimated log odds for the not-Santa query against those for its Santa counterpart, and it jumps out at you: CLIP can't tell much difference between Santa and not-Santa![3] The model thinks not-Santa is less likely on average in these data, but the log odds of not-Santa increase at almost the same rate as those of the Santa query; the model can't handle the negation. (Thanks, Robert Robison!)

I did much better in a second attempt by dropping the not-Santa query completely and simply making decisions based on the log odds of Santa. So, things are pretty great at the moment for a simple case like this!
More things are getting to be automatic, but these little surprises still get you from time to time; gotta pay attention.

[1]: https://lnkd.in/eYjukzB7
[2]: https://lnkd.in/eZMNY7_7
[3]: Red diamonds are images labeled as Santa; blue points represent images labeled as not-Santa.
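
For anyone who wants to poke at the two decision rules described above, here is a minimal sketch of the comparison. It is not the code used in the challenge. It assumes you already have one score per image for each caption (for example, the image-text logits that a CLIP model returns), and the function names and the toy numbers are made up purely for illustration. The toy scores are built so that the not-Santa score rises along with the Santa score, which is the pattern the plot showed.

```python
import numpy as np


def two_caption_p_santa(santa_score, not_santa_score):
    """P(Santa) from a softmax over the two captions.

    With only two captions, the softmax reduces to a sigmoid of the
    score difference, so only the gap between the two scores matters.
    """
    diff = np.asarray(santa_score, dtype=float) - np.asarray(not_santa_score, dtype=float)
    return 1.0 / (1.0 + np.exp(-diff))


def f1(labels, preds):
    """Plain F1 score for boolean labels and predictions."""
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(preds, dtype=bool)
    tp = np.sum(labels & preds)
    fp = np.sum(~labels & preds)
    fn = np.sum(labels & ~preds)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def best_threshold(scores, labels):
    """Scan the observed scores and return (threshold, F1) with the best F1.

    An image is predicted as Santa when its score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    best = (None, -1.0)
    for t in np.unique(scores):
        current = f1(labels, scores >= t)
        if current > best[1]:
            best = (float(t), current)
    return best


# Toy, hand-made scores (NOT the challenge data): three Santa images, three not.
labels = np.array([1, 1, 1, 0, 0, 0])
santa = np.array([24.0, 22.0, 26.0, 18.0, 19.0, 17.0])
not_santa = np.array([22.5, 21.0, 24.0, 17.5, 17.0, 16.0])  # tracks the Santa score

# First attempt: threshold on P(Santa) from the two-caption softmax.
_, f1_two_caption = best_threshold(two_caption_p_santa(santa, not_santa), labels)

# Second attempt: drop the not-Santa caption and threshold the Santa score alone.
threshold_single, f1_single = best_threshold(santa, labels)

print(f"two-caption F1: {f1_two_caption:.2f}")
print(f"Santa-only F1:  {f1_single:.2f} at threshold {threshold_single}")
```

On these toy numbers the Santa-only rule separates the classes cleanly while the two-caption rule does not, because subtracting a score that moves in step with the Santa score cancels most of the signal. In practice the threshold should be learned on held-out labeled images rather than on the images being scored.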