Marco Filax, Tim Gonschorek, Frank Ortmeier: Data for Image Recognition Tasks: An Efficient Tool for Fine-Grained Annotations. In: Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, 2019.

Abstract

Using large datasets is essential for machine learning. In practice, training a machine learning algorithm requires hundreds of samples. Multiple off-the-shelf datasets from the scientific domain exist to benchmark new approaches. However, when machine learning algorithms transition to industry, e.g., for a particular image classification problem, hundreds of special-purpose images must be collected and annotated in laborious manual work.

In this paper, we present a novel system to decrease the effort of annotating such large image sets. To this end, we generate 2D bounding boxes from minimal 3D annotations using the known location and orientation of the camera. We annotate a particular object of interest in 3D once and project these annotations onto every frame of a video stream.

The proposed approach is designed to work with off-the-shelf hardware. We demonstrate its applicability with a real-world example. We generated a more extensive dataset than those available in prior work for a particular industrial use case: fine-grained recognition of items within grocery stores. Further, we make our dataset of over 60,000 images available to the interested vision community. Some images were taken under ideal conditions for training, while others were taken with the proposed approach in the wild.
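The core idea of the paper, projecting a 3D annotation into each frame using the known camera pose, can be sketched with a standard pinhole camera model. The function below is illustrative only; its name, signature, and the choice of an axis-aligned enclosing box are assumptions, not details from the paper.

```python
import numpy as np

def project_box(corners_world, R, t, K):
    """Project 3D box corners (world frame) into the image and
    return the enclosing axis-aligned 2D bounding box.

    corners_world: (8, 3) array of 3D corner coordinates
    R, t: camera extrinsics (world -> camera), R is 3x3, t is (3,)
    K: 3x3 camera intrinsics matrix
    """
    # Transform corners from the world frame into the camera frame.
    pts_cam = corners_world @ R.T + t
    # Pinhole projection: apply intrinsics, then divide by depth.
    pts_img = pts_cam @ K.T
    pts_img = pts_img[:, :2] / pts_img[:, 2:3]
    # The enclosing axis-aligned 2D box over all projected corners.
    x_min, y_min = pts_img.min(axis=0)
    x_max, y_max = pts_img.max(axis=0)
    return x_min, y_min, x_max, y_max
```

With a single 3D annotation and per-frame poses (e.g., from visual-inertial odometry), calling this once per frame yields a 2D bounding box for every image in the video stream without further manual labeling.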

BibTeX

@inproceedings{Filax2019,
title = {Data for Image Recognition Tasks: An Efficient Tool for Fine-Grained Annotations},
author = {Marco Filax and Tim Gonschorek and Frank Ortmeier},
url = {https://bitbucket.org/cse_admin/md_groceries http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0007688709000907},
year  = {2019},
date = {2019-02-19},
booktitle = {Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods},
abstract = {Using large datasets is essential for machine learning. In practice, training a machine learning algorithm requires hundreds of samples. Multiple off-the-shelf datasets from the scientific domain exist to benchmark new approaches. However, when machine learning algorithms transition to industry, e.g., for a particular image classification problem, hundreds of special-purpose images must be collected and annotated in laborious manual work.

In this paper, we present a novel system to decrease the effort of annotating such large image sets. To this end, we generate 2D bounding boxes from minimal 3D annotations using the known location and orientation of the camera. We annotate a particular object of interest in 3D once and project these annotations onto every frame of a video stream.

The proposed approach is designed to work with off-the-shelf hardware. We demonstrate its applicability with a real-world example. We generated a more extensive dataset than those available in prior work for a particular industrial use case: fine-grained recognition of items within grocery stores. Further, we make our dataset of over 60,000 images available to the interested vision community. Some images were taken under ideal conditions for training, while others were taken with the proposed approach in the wild.
},
keywords = {Augmented Reality, Fine-Grained Recognition, VIOL},
pubstate = {published},
tppubtype = {inproceedings}
}