There is plenty of literature on the use of deep learning for detecting logos. So, in addition to sharing a couple of algorithms to get you started with one-shot logo detection, this project aims to develop a flexible architecture that facilitates the comparison of different algorithms for one-shot object recognition.
The pipeline supports one or two stages. It is possible to only perform object-recognition, or to first perform object-detection and then object-recognition in a second stage.
The idea is that you can use a generic detector for a single class of objects (e.g. logos, traffic signs or faces) and then compare each of its detections with the exemplar, i.e., the sub-class that you are trying to recognize, to determine if both belong to the same sub-class (e.g. a concrete brand, a stop sign or the face of a loved one). To get started, we include two algorithms that you can play with. Both have a Faster-RCNN [1] in the first stage that performs object-detection and they differ in the second stage that performs object-recognition.
As a baseline, we bring the exemplars and the detections from the first stage to the same latent space
(this mitigates the curse of dimensionality) and then simply measure the Euclidean or the cosine distance
between both embeddings for object-recognition. Both inputs are considered to belong to the same sub-class
if their distance is below a threshold determined in a preliminary analysis of the training dataset.
The code also provides functionality to add various transformations, so you have the option to augment
each exemplar with different transformations if you want. Simply add one or more exemplars into
the data/exemplars folder that is generated after you’ve followed the installation instructions below,
and you are good to go.
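The baseline comparison described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code shipped in logodetect: the function names, the plain-list embeddings and the threshold value are assumptions made for the example.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus the cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def same_subclass(
    exemplar: Vector,
    detection: Vector,
    threshold: float,
    metric: Callable[[Vector, Vector], float] = cosine_distance,
) -> bool:
    """Return True if the two embeddings are closer than the threshold.

    The threshold is hypothetical here; in practice it would come from
    an analysis of the training dataset.
    """
    if len(exemplar) != len(detection):
        raise ValueError("embeddings must have the same dimensionality")
    return metric(exemplar, detection) < threshold
```

Passing `metric=euclidean_distance` switches the comparison to the Euclidean variant; the threshold then has to be chosen on that metric's scale.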
As a first reference against the baseline, we also provide a modified ResNet [2] for object-recognition that directly takes the exemplars and the detections from the first stage and predicts if both belong to the same sub-class. Similar to [3], this network infers a distance metric after being trained with examples of different sub-classes. However, instead of sharing the same weights and processing each input in a separate pass as in [4], it concatenates both inputs and processes them in one pass. This approach more closely follows the architecture proposed in [5], where the assumption is that the exemplars often have more high-frequency components than the detections, and therefore the model can increase its accuracy by learning a separate set of weights for each input.
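To make the "concatenate and process in one pass" idea concrete, here is a small sketch of the data layout only. It assumes that the two inputs are stacked along the channel axis, which the text above does not state explicitly, and it uses nested Python lists in place of tensors; `concat_channels` is a hypothetical helper, not part of logodetect.

```python
from typing import List

# An image as channels x height x width, using nested lists for illustration.
Image = List[List[List[float]]]


def concat_channels(exemplar: Image, detection: Image) -> Image:
    """Stack two images of equal height and width along the channel axis.

    Two RGB inputs (3 channels each) become one 6-channel input, so the
    first layer of a network can learn separate weights for each input.
    """
    def spatial_shape(img: Image) -> tuple:
        return (len(img[0]), len(img[0][0]))

    if spatial_shape(exemplar) != spatial_shape(detection):
        raise ValueError("both inputs must share height and width")
    return [*exemplar, *detection]


# Example: two tiny 3-channel, 2x2 images.
exemplar = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
detection = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
stacked = concat_channels(exemplar, detection)
```

In contrast, a Siamese setup as in [4] would pass `exemplar` and `detection` through the same network separately and compare the two outputs afterwards.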
The models that we are including in the repo achieved a reasonable performance after a few training epochs.
However, if you would like to improve their performance, you can find pointers to various datasets in [6],
which can be used in the training part of this project.
Enjoy the code!
[1] Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2016)
[2] He et al. Deep Residual Learning for Image Recognition (2016)
[3] Hsieh et al. One-Shot Object Detection with Co-Attention and Co-Excitation (2019)
[4] Koch et al. Siamese Neural Networks for One-shot Image Recognition (2015)
[5] Bhunia et al. A Deep One-Shot Network for Query-based Logo Retrieval (2019)
[6] Hoi et al. LOGO-Net: Large-scale Deep Logo Detection and Brand Recognition with Deep Region-based Convolutional Networks (2015)
This library is available on PyPI, so you can simply run pip install logodetect to install it.
If you want to build logodetect from source, run
git clone git@github.com:Heldenkombinat/logodetect.git
cd logodetect
pip install -e ".[tests, dev]"
Depending on your system and setup, you might have to run the install command as sudo.
After successful installation, a CLI tool called logodetect becomes available to you. If you invoke logodetect
without any arguments, you will get help on how to use it. To automatically download all models and data needed
to test the application, first run:
logodetect init
which will download all files to ~/.hkt/logodetect. Note that if you prefer to download the data to another folder,
you can set the environment variable LOGOS_RECOGNITION. For instance, if you want to install models and data relative
to your clone of this repository, use
export LOGOS_RECOGNITION=path/to/this/folder
before running logodetect init, or consider putting this variable in your .bashrc, .zshrc or an equivalent
configuration file on your system.
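The lookup order described above (the environment variable first, the default folder otherwise) can be pictured with the following sketch. It is an illustration of the behaviour, not the actual logodetect implementation; `resolve_data_dir` and `DEFAULT_DATA_DIR` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default download location mentioned in this README.
DEFAULT_DATA_DIR = Path.home() / ".hkt" / "logodetect"


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the folder that holds models and data.

    Uses LOGOS_RECOGNITION if it is set and non-empty, otherwise falls
    back to the default folder in the user's home directory.
    """
    env = os.environ if env is None else env
    custom = env.get("LOGOS_RECOGNITION")
    if custom:
        return Path(custom).expanduser()
    return DEFAULT_DATA_DIR


print(resolve_data_dir())
```

Passing a dictionary instead of relying on `os.environ` makes the helper easy to try out without changing your shell configuration.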
The logodetect CLI tool comes with two main commands, namely video and image, both of which work
fairly similarly. In each case you need to provide the input data for which you would like to detect logos,
and the logo exemplars that you want to detect in the footage. To get you started, we’ve provided
some demo data that you can use out of the box. That means you can simply run:
logodetect video
which should output the following text:
Rendering video: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 707.42it/s]
Moviepy - Building video /path/data/test_videos/test_video_small_50ms_output.mp4.
Moviepy - Writing video /path/data/test_videos/test_video_small_50ms_output.mp4
Moviepy - Done !
Moviepy - video ready /path/data/test_videos/test_video_small_50ms_output.mp4
All done! ✨ 🍰 ✨
This runs one-shot detection on an example video. Alternatively, you can run
logodetect image
to do so for an example image, which results in the following output:
Saved resulting image as /path/data/test_images/test_image_small_output.png.
All done! ✨ 🍰 ✨
If you want to use another video, you can do so with the -v option. Images can be provided
with the -i option and custom exemplars are configured with the -e option. That means, if you want to run detection
for custom video data with custom exemplars, you should use
logodetect video -v <path-to-video> -e <path-to-exemplars-folder>
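If you would rather drive the CLI from a Python script, for example to process several files in a loop, a thin wrapper around `subprocess` is enough. The sketch below only uses the -v, -i and -e options documented in this README; `build_command` and `run_logodetect` are hypothetical helper names, and the file paths are placeholders.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(
    mode: str,
    input_path: Optional[str] = None,
    exemplars: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `logodetect image` or `logodetect video`."""
    if mode not in ("image", "video"):
        raise ValueError("mode must be 'image' or 'video'")
    command = ["logodetect", mode]
    if input_path is not None:
        # -v selects a video file, -i an image file.
        command += ["-v" if mode == "video" else "-i", input_path]
    if exemplars is not None:
        command += ["-e", exemplars]
    return command


def run_logodetect(command: List[str]) -> subprocess.CompletedProcess:
    """Run the command, failing early if the CLI is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("logodetect is not on PATH; install it first")
    return subprocess.run(command, check=True)


# Placeholder paths; replace them with your own data.
command = build_command("video", "my_video.mp4", "my_exemplars/")
print(" ".join(command))
```

Calling `run_logodetect(command)` then executes the same invocation you would type in a terminal.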
To run a small web app locally in your browser to upload images to recognize, simply run
python app.py
and navigate to http://localhost:5000
in the browser of your choice. Also, we’ve hosted an online
demo for you here.
On top of that, the aws folder explains in detail how to host this application yourself on Amazon Web
Services. This minimalistic application can of course be extended to your own needs at any point.
In the last section we have already discussed the three commands exposed to users through the logodetect
CLI tool, namely init, image, and video. While init does not take any parameters, the other two
need a bit more explanation. Below you find the complete API reference from the respective help pages
of our CLI.
logodetect image --help
Usage: logodetect image [OPTIONS]

Options:
  -i, --image_filename TEXT   path to your input image
  -o, --output_appendix TEXT  string appended to your resulting file
  -e, --exemplars TEXT        path to your exemplars folder
  --help                      Show this message and exit.
logodetect video --help
Usage: logodetect video [OPTIONS]

Options:
  -v, --video_filename TEXT   path to your input video
  -o, --output_appendix TEXT  string appended to your resulting file
  -e, --exemplars TEXT        path to your exemplars folder
  --help                      Show this message and exit.
The specific parameter settings of the algorithms used in logodetect can be modified by adapting
the file constants.py, which has options for all of our detectors, classifiers, data augmenters and
system devices used.
Note: modifying the configuration is for power users only and needs a deeper understanding of the
implementation details of logodetect. Please make sure you know what you’re doing before touching
constants.py.
If you prefer to work with Docker, build an image and run it like this:
docker build . -t logodetect
docker run -e LOGOS_RECOGNITION=/app -p 5000:5000 -t logodetect
Important: this assumes that you have previously downloaded all data and models right next to
the Dockerfile in the local copy of this repo.
This project uses black for code formatting. To install the git pre-commit hook for black,
simply run
pre-commit install
from the base of this repository. This runs black each time you make a commit and fails the commit in case of grave errors.
Once CI is up for this project, we will ensure this hook runs on each CI pass. To manually use black
on a file, use black <path-to-file>.
Run all tests with pytest, or just run the quicker unit test suite with
pytest -m unit
or all longer-running integration tests with
pytest -m integration