Image classification datasets

Here I collect the most popular datasets for image classification (not segmentation, not detection, just classification). Each entry gives the basic statistics of the dataset and the relevant links.

Listed so far:
  • Symbols (MNIST, SVHN, NIST SD, CASIA-HWDB, GTSRB),
  • Textures (CUReT, KTH-TIPS, UIUC texture, ALOT, FMD),
  • Easy natural (CIFAR, STL, Caltech, SUN),
  • Hard natural (PASCAL VOC, ImageNet, MS COCO).

Symbol recognition - one class per image

MNIST (1998) - handwritten digits, old and ~solved
web, paper

10 mutually exclusive classes // 0, ..., 9
70,000 images, 28x28 grayscale (~12MB)

Baseline: 2013 (0.21% error)

SVHN (2011) - street view house numbers
web, paper

10 mutually exclusive classes // 0, ..., 9
~100,000 images, 32x32, color (~250MB)
+531,131 additional images

Baseline: 2013 (1.94% error)

Texture datasets - one class per image

Images are usually large and are either subject to some transformations or captured from different angles.
  • CUReT (1999) - web, 61 classes, 12,505 images
  • KTH-TIPS (2004) - web, 10 classes, 810 images
  • KTH-TIPS2 (~2004) - web, 11 classes, 4,752 images
  • UIUC texture (2004) - web, 25 classes, 1,000 images
  • ALOT (2009) - web, 250 classes, 27,500 images
  • FMD (2014) - web, 10 classes, 1,000 images
Baselines: ???

CIFAR - natural images, no occlusions, one class per image

web, paper

CIFAR-10 (2009)
10 mutually exclusive classes // airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
60,000 images, 32x32, color (~170 MB)

Baseline: 2011 (94%)

CIFAR-100 (2009)
100 mutually exclusive classes (20 superclasses)
60,000 images, 32x32, color (~170 MB) - same as CIFAR-10

Baseline: 2013 (64.32%)
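
As a rough sanity check on the sizes quoted above, the raw footprint of a fixed-size dataset is just images x width x height x channels. The sketch below is only a back-of-the-envelope helper: the name raw_size_mb is hypothetical, and it assumes uncompressed pixels at one byte per channel value, ignoring labels and file headers.

```python
def raw_size_mb(n_images, width, height, channels):
    """Uncompressed pixel data in MB (10**6 bytes), assuming 1 byte per channel value."""
    return n_images * width * height * channels / 1e6

# CIFAR-10 / CIFAR-100: 60,000 color images, 32x32
cifar_mb = raw_size_mb(60_000, 32, 32, 3)   # 184.32
# MNIST: 70,000 grayscale images, 28x28
mnist_mb = raw_size_mb(70_000, 28, 28, 1)   # 54.88

print(f"CIFAR raw pixels: {cifar_mb:.2f} MB")
print(f"MNIST raw pixels: {mnist_mb:.2f} MB")
```

CIFAR comes out close to the ~170 MB quoted above (184.32 MB is about 176 MiB, so the gap is presumably a matter of rounding or unit convention). MNIST's raw pixels are several times larger than ~12MB, which suggests that figure refers to the compressed download.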

STL-10 (2011) - natural images, no occlusions, one class per image, ~small

web, paper

10 mutually exclusive classes // airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck
5,000 training, 8,000 test images, 96x96 (~2.5GB)
+100,000 unlabelled for unsupervised training

Baseline: 2013 (~70%)

CALTECH - natural images, no occlusions, one class per image

CALTECH-101 (2003)
web, paper

102 mutually exclusive classes (101 object categories + background)
9,144 images, ~130MB, ~200x300, color
from 30 to 800 images per class (median: 59)

Baseline: ???

CALTECH-256 (2006)
web, paper

257 mutually exclusive classes (256 object categories + clutter)
30,607 images, ~1.2GB, color
from 80 to 800 images per class (median: 100)

Baseline: ???
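
The "from ... to ... images per class (median: ...)" lines above are simple per-class counts. Here is a minimal sketch of how such numbers could be recomputed, assuming the dataset is unpacked as one subdirectory per class with the images inside (an assumption about the layout, so check your copy). Both function names are made up for illustration.

```python
import os
import statistics

def summarize_counts(counts):
    """Return (min, max, median) of a {class_name: n_images} mapping."""
    values = list(counts.values())
    return min(values), max(values), statistics.median(values)

def count_images_per_class(root):
    """Count files in each immediate subdirectory of root.

    Assumed layout: root/<class_name>/<image_file>
    """
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, f))
            )
    return counts

# Toy example with made-up class names and counts (not real Caltech numbers)
toy = {"class_a": 31, "class_b": 59, "class_c": 800}
print(summarize_counts(toy))  # (31, 800, 59)
```

On a real copy you would call summarize_counts(count_images_per_class(path_to_dataset)). The median is worth reporting alongside min and max because these datasets are quite unbalanced.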

SUN - scene classification, one class per image

web, paper

SUN-397 (2010) web

397 mutually exclusive categories
108,754 images, ~37GB, different sizes
at least 100 images per class

Baseline: ???

PASCAL VOC 2012 - natural images, many classes per image

web, paper

20 classes, not always mutually exclusive
11,530 training images, ~2GB, color
from 300 to 4000 images per class
// there is also an action classification task

Baselines: retrospective, submission server
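
Since the 20 classes are not mutually exclusive, a single class index per image is not enough here. One common way to represent the target is a 0/1 vector with one slot per class. The sketch below illustrates that encoding: the helper multi_hot is hypothetical (it is not part of the VOC devkit), and the class names are listed in alphabetical order.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def multi_hot(labels, classes=VOC_CLASSES):
    """Encode a collection of class names as a 0/1 vector with one slot per class."""
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]

# Hypothetical image showing a person riding a horse next to a car
target = multi_hot(["person", "horse", "car"])
print(sum(target))  # 3
```

For the one-class-per-image datasets earlier in the list, the same vector would always contain exactly one 1.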

ImageNet - natural images, many classes per image, hard

web, paper

ILSVRC 2012-2014 (top-5 error: 5 guesses per image)
web

1000 categories, 1.2 million images

Baseline: 2012 (0.15315 error), 2013 (0.11197 error), 2014 (0.06656 error)
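
The "5 guesses" scoring means an image counts as correct if the true label appears anywhere among the model's five highest-ranked classes, and the reported error is the fraction of images where it does not. Here is a minimal pure-Python sketch of that metric. The function name and the toy scores are made up for illustration, and the official evaluation code may differ in details such as tie handling.

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of images whose true label is NOT among the k highest-scoring classes.

    scores: list of per-image lists, one score per class
    true_labels: list of correct class indices, one per image
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    misses = 0
    for image_scores, truth in zip(scores, true_labels):
        # Class indices sorted from highest to lowest score
        ranked = sorted(range(len(image_scores)),
                        key=lambda c: image_scores[c], reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)

# Toy example: 2 images, 6 classes, made-up scores
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 0 has the lowest score
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 1 is ranked second
]
print(top_k_error(scores, [0, 1], k=5))  # 0.5
print(top_k_error(scores, [0, 1], k=1))  # 1.0
```

With k=1 this reduces to the ordinary classification error used for the datasets earlier in the list.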

MS COCO - natural images, many classes per image, hard

web, paper

~70 categories, ~300,000 images
// with per-pixel segmentation

Baseline: not yet available

Not really classification datasets:
  • many Caltech old datasets: web
  • many Oxford old datasets: web
  • LabelMe (polygons in natural images): paper
  • MSRC-21 (per pixel segmentation on 591 natural images): paper
  • Middlebury Stereo (2002)
  • UIUC Cars (2004)
  • FERET Faces (1998)
  • CMU/VASC Faces (1998)
  • Caltech-UCSD Birds-200-2011
  • Video KTH human action (2004)
  • Video Sign Language (2008)

Current results on MNIST, CIFAR, SVHN and STL-10: web

Yeah, I know, the whole post was about images and I use none :)
