We trained 1500 ResNet models to compute label memorization scores for CIFAR-10N human noisy labels.
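A held-out memorization estimate of this kind (in the style of Feldman & Zhang's influence/memorization estimator) compares how often models that saw an example predict its label versus models that held it out. A minimal sketch, assuming precomputed per-model training masks and prediction outcomes (the function name and inputs here are illustrative, not the paper's code):

```python
import numpy as np

def memorization_scores(in_masks, correct):
    """Held-out estimate of label memorization.

    in_masks[m, i] is True when example i was in model m's training subset;
    correct[m, i] is True when model m predicted example i's (noisy) label.
    The score for example i is
        P(correct | i in train) - P(correct | i held out).
    """
    in_masks = np.asarray(in_masks, dtype=bool)
    correct = np.asarray(correct, dtype=float)
    scores = np.empty(in_masks.shape[1])
    for i in range(in_masks.shape[1]):
        trained = correct[in_masks[:, i], i]     # models that trained on i
        held_out = correct[~in_masks[:, i], i]   # models that held i out
        scores[i] = trained.mean() - held_out.mean()
    return scores
```

A score near 1 means the label is only predicted when the example itself is memorized during training, which is the signal used to flag hard human noisy labels.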
Human label noise challenges deep learning models, often degrading
performance more than synthetic label noise.
We study label memorization in CIFAR-10N using held-out estimation to
understand this effect. Whereas the recently adopted
PMD noise model generates feature-dependent noise along a model’s decision
boundary, our findings reveal that challenging human noisy labels can
form tight subclusters in CLIP feature space.
Leveraging these insights, we propose Cluster-Based Noise (CBN), a
method to simulate human label noise. Finally, we introduce Soft
Neighbor Label Sampling (SNLS), which improves performance on CBN.
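Under the subcluster observation above, cluster-based noise can be illustrated by corrupting labels in tight neighborhoods of CLIP feature space rather than uniformly at random. The sketch below is an assumption-laden illustration, not the paper's exact CBN procedure; the function name, `cluster_size` parameter, and cosine-similarity choice are all hypothetical:

```python
import numpy as np

def cluster_based_noise(features, labels, num_classes, noise_rate,
                        cluster_size=50, seed=0):
    """Illustrative cluster-based label noise (NOT the paper's exact CBN):
    repeatedly pick a seed point, take its tight neighborhood in (assumed
    CLIP) feature space, and assign the whole subcluster one wrong label,
    until roughly `noise_rate` of the dataset has been corrupted."""
    rng = np.random.default_rng(seed)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    noisy = labels.copy()
    n_target = int(noise_rate * len(labels))
    flipped = np.zeros(len(labels), dtype=bool)
    while flipped.sum() < n_target:
        center = rng.integers(len(labels))          # seed a new noise cluster
        sims = f @ f[center]                        # cosine similarity to seed
        idx = np.argsort(-sims)[:cluster_size]      # tight neighborhood
        idx = idx[~flipped[idx]][: n_target - flipped.sum()]
        # One shared wrong label for the whole subcluster.
        wrong = (labels[center] + rng.integers(1, num_classes)) % num_classes
        noisy[idx] = wrong
        flipped[idx] = True
    return noisy
```

In contrast to PMD noise, which concentrates flips along a classifier's decision boundary, this kind of corruption produces spatially coherent pockets of identically mislabeled points, mimicking the tight subclusters observed in human noisy labels.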
Comparison of noise functions at the same noise rate
SNLS generates a soft label distribution by leveraging the labels of the 100 nearest neighbors in CLIP space. Under our cluster-based noise assumption, incorporating richer label information from more distant neighbors in the CLIP feature space can provide signals about the true label.
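The neighbor-based soft labeling step can be sketched as follows, assuming precomputed CLIP embeddings and cosine similarity as the distance measure (the function name and exact normalization are assumptions; the paper's implementation pairs this with LRA-Diffusion):

```python
import numpy as np

def soft_neighbor_labels(features, noisy_labels, num_classes, k=100):
    """Sketch of SNLS-style soft labels: for each point, the soft label is
    the empirical distribution of the observed (possibly noisy) labels of
    its k nearest neighbors in the (assumed CLIP) feature space."""
    # Normalize so that dot products equal cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                       # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)      # exclude each point itself
    nn = np.argsort(-sims, axis=1)[:, :k]  # k most similar neighbors
    soft = np.zeros((len(f), num_classes))
    for i, idx in enumerate(nn):
        counts = np.bincount(noisy_labels[idx], minlength=num_classes)
        soft[i] = counts / counts.sum()  # empirical neighbor label distribution
    return soft
```

Because a noisy subcluster is small relative to a 100-neighbor window, the aggregated neighbor votes tend to put most of their mass on the true class even when the point's own label is corrupted.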
We implement SNLS with LRA-Diffusion and evaluate against several other Learning with Noisy Labels (LNL) methods on CIFAR-10 and CIFAR-100 datasets with varying levels of CBN and PMD noise.
| Method | CIFAR-10 35% PMD | CIFAR-10 35% CBN | CIFAR-10 70% PMD | CIFAR-10 70% CBN | CIFAR-100 35% PMD | CIFAR-100 35% CBN | CIFAR-100 70% PMD | CIFAR-100 70% CBN |
|---|---|---|---|---|---|---|---|---|
| Standard | 84.40 ± 0.18 | 75.44 ± 0.13 | 46.59 ± 0.33 | 27.22 ± 0.21 | 63.42 ± 0.15 | 46.17 ± 0.08 | 47.13 ± 0.13 | 17.48 ± 0.24 |
| Co-teaching+ | 67.08 ± 0.20 | 60.98 ± 0.45 | 35.35 ± 0.70 | 18.32 ± 0.14 | 55.09 ± 0.15 | 39.08 ± 0.11 | 39.36 ± 0.03 | 12.18 ± 0.09 |
| GCE | 84.70 ± 0.10 | 77.73 ± 0.28 | 39.06 ± 0.66 | 25.16 ± 0.45 | 63.08 ± 0.25 | 39.60 ± 0.56 | 43.00 ± 0.25 | 12.59 ± 0.41 |
| PLC | 86.11 ± 0.02 | 80.51 ± 0.19 | 42.66 ± 2.08 | 23.06 ± 4.08 | 62.23 ± 0.17 | 42.67 ± 0.15 | 47.86 ± 0.24 | 12.69 ± 0.37 |
| LRA-Diffusion | 97.12 ± 0.10 | 91.74 ± 0.48 | 47.17 ± 2.00 | 18.60 ± 1.29 | 77.86 ± 0.43 | 50.34 ± 0.34 | 57.18 ± 0.81 | 11.76 ± 0.24 |
| LRA-Diffusion+SNLS | 97.31 ± 0.03 | 92.77 ± 0.18 | 49.16 ± 2.01 | 19.05 ± 0.49 | 78.89 ± 0.28 | 58.80 ± 0.51 | 62.41 ± 0.51 | 15.13 ± 0.20 |
If you use our research in your work, please cite us:
@inproceedings{lim25snls,
author = {Lim, Gordon and Larson, Stefan and Leach, Kevin},
title = {Robust Testing for Deep Learning using Human Label Noise},
year = {2025},
series = {DeepTest '25}
}