Type

Conference Proceedings

Authors

Eric Arazo Sanchez
Sean Quinn
Noel E. O'Connor
Kevin McGuinness
Yvette Graham
Alan F. Smeaton

Subjects

Computer Science

Topics
artificial intelligence, digital video, multimedia systems, video understanding, video to language, video captioning, machine learning, semantic similarity

Exploring the impact of training data bias on automatic generation of video captions (2018)

Abstract

A major issue in machine learning is the availability of training data. Historically this meant having a sufficient volume of training data; more recently it has come to mean having sufficient unbiased training data. In this paper we focus on the effect of training data bias on an emerging multimedia application, the automatic captioning of short video clips. We use subsets of the same training data to build different video captioning models with the same machine learning technique, and we evaluate the models trained on each subset using a well-known video caption benchmark, TRECVid. We train on the MSR-VTT video-caption pairs and prune the set of captions describing each video so that it is more homogeneously similar or more diverse, or we prune randomly. We then assess the effectiveness of the caption-generating models trained with these variations using automatic metrics as well as direct assessment by human assessors. Our findings are preliminary and show that randomly pruning captions from the training data yields the worst performance, and that pruning to make the data more homogeneous or more diverse improves performance slightly compared to random pruning. Our work points to the need for more training data: more video clips and, more importantly, more captions for those videos.
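
The three pruning strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which similarity measure or selection rule was used, so the Jaccard token overlap, the "mean similarity to the other captions" score, and the names `jaccard`, `mean_similarity` and `prune_captions` are all assumptions made for illustration only.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Token-overlap similarity between two captions (assumed stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_similarity(captions):
    """Mean similarity of each caption to the other captions of the same video."""
    n = len(captions)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = jaccard(captions[i], captions[j])
        totals[i] += s
        totals[j] += s
    return [t / (n - 1) if n > 1 else 0.0 for t in totals]


def prune_captions(captions, keep, strategy, seed=0):
    """Keep `keep` captions for one video using a hypothetical pruning rule.

    strategy: "homogeneous" keeps the captions most similar to the rest,
              "diverse" keeps the captions least similar to the rest,
              "random" keeps a random subset.
    """
    if strategy not in ("homogeneous", "diverse", "random"):
        raise ValueError(f"unknown strategy: {strategy}")
    if keep >= len(captions):
        return list(captions)
    if strategy == "random":
        return random.Random(seed).sample(list(captions), keep)
    scores = mean_similarity(captions)
    # Sort indices by score: highest first for homogeneous, lowest first for diverse.
    order = sorted(range(len(captions)), key=lambda i: scores[i],
                   reverse=(strategy == "homogeneous"))
    kept = sorted(order[:keep])  # restore the original caption order
    return [captions[i] for i in kept]
```

Under these assumptions, each video's caption set would be pruned independently before training, and the same captioning model would then be trained once per pruned variant so that only the training captions differ between runs.
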
Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> DCU Faculties and Centres = DCU Faculties and Schools: Faculty of Engineering and Computing: School of Computing
Ireland -> Dublin City University -> Subject = Computer Science: Artificial intelligence
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Multimedia systems
Ireland -> Dublin City University -> Subject = Computer Science: Digital video
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: INSIGHT Centre for Data Analytics
Ireland -> Dublin City University -> Subject = Computer Science: Machine learning

Experts in our system

1. Eric Arazo Sanchez, Dublin City University (Total Publications: 3)
2. Sean Quinn, Dublin City University (Total Publications: 5)
3. Noel E. O'Connor, Dublin City University (Total Publications: 474)
4. Kevin McGuinness, Dublin City University (Total Publications: 93)
5. Yvette Graham, Dublin City University (Total Publications: 25)
6. Alan F. Smeaton, Dublin City University (Total Publications: 492)