Early Demo on Textual Image Search

This demonstrator showcases text-to-image search. The overall goal is to match textual search queries such as “rotes Kleid” (English: “red dress”) to a ranked list of fashion products. As the key innovation, we present a neural information retrieval approach that learns to embed both textual queries and product images as real-valued vectors in a shared, high-dimensional vector space. We train the approach so that texts and images with similar semantics are embedded close to each other in this shared space, which allows us to score text-image pairs by the cosine similarity of their embeddings and to rank products accordingly. This report accompanies the demonstrator, which forms the basis for our future work on extending the textual component with NLP techniques and multilingual support.
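
To make the retrieval step concrete, the following is a minimal sketch of ranking products by cosine similarity in a shared embedding space. It is an illustration under stated assumptions, not the demonstrator's implementation: the function names `cosine_similarity` and `rank_products`, the file names, and the 4-dimensional vectors are all hypothetical, and the vectors are made-up stand-ins for the outputs of the trained text and image encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_products(query_vec, product_vecs):
    """Return (product, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in product_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: in a real system these would come from the
# trained text encoder (query) and image encoder (products).
query_embedding = [1.0, 0.0, 1.0, 0.0]  # stands in for "rotes Kleid"
product_embeddings = {
    "red_dress.jpg": [0.9, 0.1, 0.8, 0.0],   # nearly parallel to the query
    "blue_jeans.jpg": [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    "red_skirt.jpg": [0.5, 0.5, 0.5, 0.5],   # partially aligned
}

ranking = rank_products(query_embedding, product_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on the angle between two vectors, the ranking is unaffected by the magnitude of the embeddings. For large catalogues, one would typically normalise the image embeddings once and use a vector index rather than scoring every product per query.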