Using the visual denotations of image captions for semantic inference