Listed so far:
- Symbols (MNIST, SVHN, NIST SD, CASIA-HWDB, GTSRB),
- Textures (CUReT, KTH-TIPS, UIUC texture, ALOT, FMD),
- Easy natural (CIFAR, STL, Caltech, SUN),
- Hard natural (PASCAL VOC, ImageNet, MS COCO).
Symbols recognition - one class per image
MNIST (1998) - handwritten digits, old and ~solved
web, paper
10 mutually exclusive classes // 0, ..., 9
70,000 images, 28x28 grayscale (~12MB)
Baseline: 2013 (0.21% error)
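A quick sanity check on the sizes quoted here: the raw pixel volume of a fixed-size dataset is just images x height x width x channels. The sketch below assumes 8-bit pixels; the helper name `raw_size_bytes` is made up for illustration. For MNIST it gives about 55 MB, so the ~12MB above is presumably the compressed download (an assumption on my side, not something stated in the list).

```python
def raw_size_bytes(n_images, height, width, channels=1, bytes_per_value=1):
    """Uncompressed pixel volume, assuming every image has the same shape."""
    return n_images * height * width * channels * bytes_per_value


# MNIST as listed above: 70,000 grayscale images of 28x28, assumed 8 bits per pixel.
mnist_bytes = raw_size_bytes(70_000, 28, 28)
print(f"MNIST raw pixels: {mnist_bytes / 1e6:.2f} MB")
```

The same helper works for any other entry in the list with a fixed resolution, by passing `channels=3` for color images.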
SVHN (2011) - street view house numbers
web, paper
10 mutually exclusive classes // 0, ..., 9
~100,000 images, 32x32, color (~250MB)
+531131 additional images
Baseline: 2013 (1.94% error)
See also:
- NIST SD 19 (characters)
- CASIA-HWDB (Chinese characters)
- GTSRB (traffic signs)
Texture datasets - one class per image
Usually images are large and are subject to some transformations, or captured from different angles.
- CUReT (1999) - web, 61 classes, 12,505 images
- KTH-TIPS (2004) - web, 10 classes, 810 images
- KTH-TIPS2 (~2004) - web, 11 classes, 4,752 images
- UIUC texture (2004) - web, 25 classes, 1,000 images
- ALOT (2009) - web, 250 classes, 27,500 images
- FMD (2014) - web, 10 classes, 1,000 images
See also:
- http://www.cfar.umd.edu/~fer/website-texture/texture.htm
- http://www.iai.uni-bonn.de/~gall/download/jgall_materialBTF_eccv14.pdf
CIFAR - natural images, no occlusions, one class per image
web, paper
CIFAR-10 (2009)
10 mutually exclusive classes // airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
60,000 images, 32x32, color (~170 MB)
Baseline: 2011 (94% accuracy)
CIFAR-100 (2009)
100 mutually exclusive classes (20 superclasses)
60,000 images, 32x32, color (~170 MB) - same as CIFAR-10
Baseline: 2013 (64.32% accuracy)
STL-10 (2011) - natural images, no occlusions, one class per image, ~small
web, paper
10 mutually exclusive classes // airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck
5,000 training, 8,000 test images, 96x96 (~2.5GB)
+100,000 unlabelled for unsupervised training
Baseline: 2013 (~70% accuracy)
CALTECH - natural images, no occlusions, one class per image
CALTECH-101 (2003)
web, paper
102 mutually exclusive classes
9,144 images, ~130MB, ~200x300, color
from 30 to 800 images per class (median: 59)
Baseline: ???
CALTECH-256 (2006)
web, paper
257 mutually exclusive classes
30,607 images, ~1.2GB, color
from 80 to 800 images per class (median: 100)
Baseline: ???
SUN - scene classification, one class per image
web, paper
SUN-397 (2010)
web
397 mutually exclusive categories
108,754 images, ~37GB, different sizes
at least 100 images per class
Baseline: ???
PASCAL VOC 2012 - natural images, many classes per image
web, paper
20 classes, not always mutually exclusive
11,530 training images, ~2GB, color
from 300 to 4000 images per class
// there is also an action classification task
Baselines: retrospective, submission server
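The difference between "mutually exclusive classes" and "many classes per image" shows up directly in how the labels are encoded. A minimal sketch: with exclusive classes a label is a one-hot vector, while for a dataset like PASCAL VOC an image gets a multi-hot vector with a 1 for every class present. The three-class list `CLASSES` and both helper names are made up for illustration and are not the real VOC label set or tooling.

```python
CLASSES = ["cat", "dog", "person"]  # hypothetical subset, for illustration only


def one_hot(label, classes):
    """Exactly one class per image: a single 1 at the label's position."""
    if label not in classes:
        raise ValueError(f"unknown class: {label}")
    return [1 if c == label else 0 for c in classes]


def multi_hot(present, classes):
    """Several classes per image: a 1 for every class that appears."""
    unknown = set(present) - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


print(one_hot("dog", CLASSES))                # single-label case (MNIST, CIFAR style)
print(multi_hot({"dog", "person"}, CLASSES))  # multi-label case (PASCAL VOC style)
```

In the multi-label case the vector can contain any number of ones, so a classifier is usually evaluated per class rather than by a single overall accuracy.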
ImageNet - natural images, many classes per image, hard
web, paper
LSVRC 2012-2014 (5 guesses)
web
1000 categories, 1.2 million images
Baseline 2012 (0.15315 error), 2013 (0.11197 error), 2014 (0.06656 error)
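The "5 guesses" rule means the errors above are top-5 errors: a prediction counts as correct if the true class is among the five highest-scored classes. A minimal pure-Python sketch of that metric follows; the function name `top_k_error` and the toy scores are illustrative, not taken from the challenge's evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scored classes.

    scores: list of per-class score lists, one list per sample
    labels: list of true class indices, one per sample
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the k best guesses.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 3 classes and 2 samples.
toy_scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
toy_labels = [2, 1]
print(top_k_error(toy_scores, toy_labels, k=1))
print(top_k_error(toy_scores, toy_labels, k=2))
```

In the toy example both samples are wrong with a single guess but right with two, which is the reason top-5 numbers look much better than top-1 on a 1000-class problem.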
MS COCO - natural images, many classes per image, hard
web, paper
~70 categories, ~300,000 images
// with per-pixel segmentation
Baseline: not yet available
Not really classification datasets:
- many Caltech old datasets: web
- many Oxford old datasets: web
- LabelMe (polygons in natural images): paper
- MSRC-21 (per pixel segmentation on 591 natural images): paper
- Middlebury Stereo (2002)
- UIUC Cars (2004)
- FERET Faces (1998)
- CMU/VASC Faces (1998)
- Caltech-UCSD Birds-200-2011
- Video KTH human action (2004)
- Video Sign Language (2008)
Yeah, I know, the whole post was about images and I use none :)