Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors
Working paper / research report
We present a comprehensive experimental study on pretrained feature
extractors for visual out-of-distribution (OOD) detection. We examine several
setups based on the availability of labels or image captions, using
different combinations of in- and out-distributions. Intriguingly, we find that
(i) contrastive language-image pretrained models achieve state-of-the-art
unsupervised out-of-distribution detection performance using nearest-neighbor feature
similarity as the OOD detection score, (ii) supervised state-of-the-art OOD
detection performance can be obtained without in-distribution fine-tuning,
(iii) even top-performing billion-scale vision transformers trained with
natural language supervision fail at detecting adversarially manipulated OOD
images. Finally, based on our experiments, we discuss whether new benchmarks for
visual anomaly detection are needed. Using the largest publicly available
vision transformer, we achieve state-of-the-art performance across all $18$
reported OOD benchmarks, including an AUROC of 87.6\% (9.2\% gain,
unsupervised) and 97.4\% (1.2\% gain, supervised) for the challenging task of
CIFAR100 $\rightarrow$ CIFAR10 OOD detection. The code will be open-sourced.
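The unsupervised detector of finding (i) reduces to a few lines of code. Below is a minimal sketch, assuming the open_clip library and a ViT-B-32 checkpoint for brevity (not the billion-scale backbone reported above); the helper names are illustrative choices, not necessarily the authors' exact setup.

```python
# Minimal sketch of the nearest-neighbor feature-similarity OOD score,
# assuming open_clip; model/checkpoint choice is illustrative only.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()


@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized CLIP image embeddings for a preprocessed batch."""
    feats = model.encode_image(images)
    return feats / feats.norm(dim=-1, keepdim=True)


@torch.no_grad()
def knn_ood_score(test_feats: torch.Tensor, train_feats: torch.Tensor) -> torch.Tensor:
    """Cosine similarity to the nearest in-distribution embedding.

    Higher scores indicate in-distribution inputs; low scores flag OOD.
    """
    sims = test_feats @ train_feats.T   # (n_test, n_train) cosine similarities
    return sims.max(dim=-1).values      # 1-NN similarity as the detection score
```

Thresholding this score, or computing AUROC over paired in- and out-distribution test sets (e.g. CIFAR100 vs. CIFAR10), recovers the unsupervised setup; the supervised variant in finding (ii) would additionally use in-distribution labels, still without fine-tuning the backbone.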