400 Project Research

Collection of research for 'Face detection using reinforcement learning'

TODO

Add which dataset each paper uses
Add more description of important topics covered by some papers (i.e. topics to then cite)
We use a learning system prior to the RL: 'We don't want to stub our toe lots to learn vision' (Geoff Hinton, https://www.youtube.com/watch?v=N0ER1MC9cqM, ~28 mins)
Why is ResNet faster than VGG? https://stats.stackexchange.com/questions/280179/why-is-resnet-faster-than-vgg/280338

Research papers

Title / Author / URL / Summary

Active Object Localization with Deep Reinforcement Learning

J. C. Caicedo

https://ieeexplore.ieee.org/document/7410643

Object detection using a novel reinforcement learning approach. Uses a pretrained CNN to extract features, and a reinforcement learning 'localisation policy' based on a Deep Q-Network (DQN).
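A minimal sketch of a box-transformation step and per-step reward for this kind of localisation policy. Assumptions not taken from these notes: boxes are `(x1, y1, x2, y2)` tuples, the action names and the step fraction `alpha = 0.2` are illustrative, and the reward is the sign of the change in IoU with the ground-truth box. The DQN that actually chooses the action is omitted.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def apply_action(box: Box, action: str, alpha: float = 0.2) -> Box:
    """Shift or rescale the box by a fraction alpha of its size (hypothetical action set)."""
    x1, y1, x2, y2 = box
    dw, dh = alpha * (x2 - x1), alpha * (y2 - y1)
    moves = {
        "right": (dw, 0, dw, 0),
        "left": (-dw, 0, -dw, 0),
        "down": (0, dh, 0, dh),
        "up": (0, -dh, 0, -dh),
        "bigger": (-dw / 2, -dh / 2, dw / 2, dh / 2),
        "smaller": (dw / 2, dh / 2, -dw / 2, -dh / 2),
    }
    if action not in moves:
        raise ValueError(f"unknown action: {action}")
    d = moves[action]
    return (x1 + d[0], y1 + d[1], x2 + d[2], y2 + d[3])


def step_reward(old: Box, new: Box, target: Box) -> int:
    """Sign of the IoU change: +1 if closer to the target box, -1 if further, 0 if unchanged."""
    gain = iou(new, target) - iou(old, target)
    return int(gain > 0) - int(gain < 0)
```

For example, moving the box `(0, 0, 10, 10)` "right" towards a target at `(2, 0, 12, 10)` gives a reward of 1, while moving it "left" gives -1.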

You Only Look Once: Unified, Real-Time Object Detection

J. Redmon

https://ieeexplore.ieee.org/document/7780460

State-of-the-art object detection network. Frames detection as a single regression from the input image directly to bounding-box coordinates and class probabilities. The network only needs to look at the input once, and uses confidence scores to decide which predicted bounding boxes to keep.
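Deciding which boxes are "good ones" is commonly done with confidence thresholding followed by greedy non-maximum suppression (NMS). A minimal sketch, assuming `(x1, y1, x2, y2)` boxes with one confidence score each; the thresholds 0.5 (confidence) and 0.45 (IoU) are illustrative assumptions, not values from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], scores: List[float],
                      conf_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[int]:
    """Indices of boxes kept after confidence thresholding and greedy NMS."""
    # Candidates above the confidence threshold, highest confidence first.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap too much with a higher-confidence kept box.
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping boxes, the lower-scoring one is suppressed; boxes below the confidence threshold are dropped before NMS runs.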

SkipNet: Learning Dynamic Routing in Convolutional Networks

X. Wang

https://arxiv.org/abs/1711.09485

Shallower networks are sufficient for many inputs, so the paper proposes a model that learns to skip convolutional layers on a per-input basis.
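A simplified sketch of the per-input skipping idea: a small gate looks at the block's input and decides whether to execute the residual block or pass the input straight through. Assumptions: the gate is a single logistic unit on globally average-pooled features with a hard 0.5 threshold at inference, and `layer`, `gate_w` and `gate_b` are hypothetical. SkipNet trains its gates with a hybrid of supervised and reinforcement learning, which is not shown here.

```python
import numpy as np


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


def gated_residual_block(x, layer, gate_w, gate_b, threshold=0.5):
    """Run a residual block only when the gate predicts it is needed for this input.

    x: feature map of shape (C, H, W); layer: callable returning the same shape.
    Returns (output, executed).
    """
    # Gate: global average pooling over H and W, then a single logistic unit.
    pooled = x.mean(axis=(1, 2))
    p_execute = sigmoid(float(pooled @ gate_w) + gate_b)
    if p_execute >= threshold:
        return x + layer(x), True  # block executed: usual residual update
    return x, False                # block skipped: identity shortcut only
```

Skipping costs only the gate's computation, which is where the savings on "easy" inputs come from.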

Deep Sheep: kinship assignment in livestock from facial images

L. Szymanski

https://ieeexplore.ieee.org/document/9290558

A look into the viability of using deep learning to assign kinship between sheep from their facial images.

An Introduction to Deep Reinforcement Learning

V. François-Lavet

https://arxiv.org/abs/1811.12560

An in-depth look into how deep reinforcement learning works.

TODO: note important topics covered

Human-level control through deep reinforcement learning

V. Mnih

https://www.nature.com/articles/nature14236

The Deep Q-Network (DQN) paper: DeepMind's well-known 2015 Atari project.
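The core DQN update regresses Q(s, a) towards the Bellman target y = r + γ max_a' Q̂(s', a'), where Q̂ is a separate, periodically updated target network and y = r at terminal states. A minimal NumPy sketch of the target and loss for one minibatch, assuming Q-values are already computed as arrays of shape (batch, n_actions); the networks, replay memory and optimiser are omitted, and a plain squared error is used for simplicity.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'), with y = r at terminal states."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    max_next = np.max(next_q_target, axis=1)  # best action value in each next state
    return rewards + gamma * (1.0 - dones) * max_next


def dqn_loss(q_values, actions, targets):
    """Mean squared TD error between Q(s, a) for the actions taken and the targets."""
    q_taken = q_values[np.arange(len(actions)), actions]
    return float(np.mean((targets - q_taken) ** 2))
```

Only the Q-value of the action actually taken receives a training signal; the other outputs of the network are left untouched by this loss.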

Collaborative Deep Reinforcement Learning for Joint Object Search

X. Kong

https://ieeexplore.ieee.org/document/8100231

Proposes a novel multi-agent deep Q-learning algorithm with joint exploitation sampling. Essentially, multiple agents collaborate so that related objects are found together (e.g. 'person on a bicycle holding a cup').

Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning

S. Yun

https://ieeexplore.ieee.org/document/8099631

Tracking with light computation: a reinforcement learning agent learns a sequence of actions that move the bounding box to follow the object.

Deep reinforcement learning based lane detection and localization

Z. Zhao

https://www.sciencedirect.com/science/article/abs/pii/S0925231220310833

Integrates deep reinforcement learning into coarse lane detection models for accurate lane detection and localization. A CNN finds bounding boxes, and reinforcement learning then uses those boxes to produce a good estimate of the lane's curve.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

S. Ren

https://arxiv.org/abs/1506.01497

Uses a Region Proposal Network (RPN) to produce nearly cost-free region proposals, which the Fast R-CNN detector then uses. State-of-the-art detection rates, but only about 5 fps on a GPU.
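A minimal sketch of how RPN anchors are laid out: at every feature-map position, boxes of several scales and aspect ratios are centred on the corresponding input-image location, and the RPN scores and refines each one. The defaults below (3 scales and 3 aspect ratios, so 9 anchors per position) follow the Faster R-CNN paper; the stride of 16 pixels is an assumption matching a VGG-16-style backbone.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales: Tuple[int, ...] = (128, 256, 512),
                 ratios: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Generate anchors centred on every feature-map cell, in input-image pixels."""
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this cell in the input image.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Area stays s*s while the height/width ratio equals r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Because anchors are fixed, the RPN only has to predict an objectness score and a small offset per anchor, which is what makes the proposals so cheap.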

Multi-shot Pedestrian Re-identification via Sequential Decision Making

J. Zhang

https://ieeexplore.ieee.org/document/8578807

Pedestrian re-identification using reinforcement learning. At each step the agent either outputs a result (same or different) or requests another pair of images (delaying the decision until it is more certain). Achieves a good trade-off between speed and accuracy by tuning the reward for the unsure action.

(Could be useful for sheep kinship verification?)
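The speed/accuracy trade-off comes from how the 'unsure' action is rewarded. A hedged sketch of one possible reward scheme; the +1/-1 values and the `unsure_penalty` parameter are hypothetical, not the paper's exact reward.

```python
from typing import Iterable


def reid_reward(action: str, same_person: bool, unsure_penalty: float = 0.1) -> float:
    """Per-step reward for a sequential same/different decision (hypothetical values)."""
    if action == "unsure":
        # Request another image pair: a small cost discourages waiting when already confident.
        return -unsure_penalty
    if action not in ("same", "different"):
        raise ValueError(f"unknown action: {action}")
    correct = (action == "same") == same_person
    return 1.0 if correct else -1.0


def episode_return(actions: Iterable[str], same_person: bool,
                   unsure_penalty: float = 0.1) -> float:
    """Total reward for a run of 'unsure' steps ending in a final decision."""
    return sum(reid_reward(a, same_person, unsure_penalty) for a in actions)
```

A larger `unsure_penalty` pushes the agent to decide sooner (faster, possibly less accurate); a smaller one lets it wait for more image pairs.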

Deep Reinforcement Learning

Y. Li

https://arxiv.org/abs/1810.06339

An overview of deep reinforcement learning.

LocNet: Improving Localization Accuracy for Object Detection

S. Gidaris

https://ieeexplore.ieee.org/document/7780461/

Novel method that estimates, for each row and each column of a search region, how likely it is to belong to the object. This yields two 1D probability curves that together suggest a bounding box.

https://i.imgur.com/TxCetFs.png
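A simplified sketch of turning per-row and per-column "inside the object" probabilities into a box: pick the contiguous interval that maximises the likelihood of everything inside being "in" and everything outside being "out". This assumes independent probabilities per row/column and brute-force search; the paper's exact model and inference may differ.

```python
import math
from typing import List, Tuple


def best_interval(p_in: List[float], eps: float = 1e-6) -> Tuple[int, int]:
    """Most likely [start, end] range, given per-index probabilities of lying inside the object."""
    log_in = [math.log(max(p, eps)) for p in p_in]
    log_out = [math.log(max(1.0 - p, eps)) for p in p_in]
    best, best_score = (0, 0), -math.inf
    for start in range(len(p_in)):
        for end in range(start, len(p_in)):
            # Indices inside [start, end] should be "in", all others "out".
            score = (sum(log_in[start:end + 1])
                     + sum(log_out[:start]) + sum(log_out[end + 1:]))
            if score > best_score:
                best, best_score = (start, end), score
    return best


def infer_box(row_probs: List[float], col_probs: List[float]) -> Tuple[int, int, int, int]:
    """Combine the row and column intervals into (left, top, right, bottom) cell indices."""
    top, bottom = best_interval(row_probs)
    left, right = best_interval(col_probs)
    return left, top, right, bottom
```

Working on two 1D curves instead of a full 2D score map keeps the search cheap: the rows and columns are solved separately.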

Attention-aware Deep Reinforcement Learning for Video Face Recognition

Y. Rao

https://ieeexplore.ieee.org/document/8237686

Attention-aware deep reinforcement learning (ADRL) method for video face recognition. Poses the problem as a Markov decision process. Information from both the image space and the feature space is used.

Adaptive Object Detection Using Adjacency and Zoom Prediction

Y. Lu

https://arxiv.org/abs/1512.07711

Uses an Adjacency and Zoom Network to suggest sub-regions worth examining more closely; it "directs computational resources to sub-regions likely to contain objects".

Attention to Scale: Scale-aware Semantic Image Segmentation

L.-C. Chen

https://ieeexplore.ieee.org/document/7780765

An attention mechanism that learns to softly weight the multi-scale features at each pixel location.
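A minimal sketch of the soft per-pixel weighting: a softmax over scales at each pixel, followed by a weighted sum of the per-scale score maps. Assumptions: the score maps have already been resized to a common H × W, one attention logit per scale per pixel comes from some attention head (not shown), and a single class channel is used for brevity.

```python
import numpy as np


def attention_fuse(score_maps: np.ndarray, attention_logits: np.ndarray) -> np.ndarray:
    """Fuse per-scale score maps with per-pixel softmax weights over scales.

    score_maps, attention_logits: arrays of shape (S, H, W), one slice per scale.
    Returns an (H, W) map.
    """
    # Softmax over the scale axis, shifted by the max for numerical stability.
    logits = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)  # sums to 1 at each pixel
    return (weights * score_maps).sum(axis=0)
```

Equal logits average the scales; a large logit for one scale at a pixel makes that scale dominate there.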

YOLO-face: a real-time face detector

W. Chen

https://link.springer.com/article/10.1007/s00371-020-01831-7

A real-time face detector based on YOLO. Reports better results than YOLOv2, YOLOv3 and Faster R-CNN.

Fast face detection on mobile devices by leveraging global and local facial characteristics

H. Zhang

https://www.sciencedirect.com/science/article/abs/pii/S0923596518303989

Proposes a faster proposal generation model, and therefore faster face detection. Uses both global and local facial characteristics.

A fast face detection method via convolutional neural network

G. Guo

https://arxiv.org/abs/1803.10103

Uses discriminative complete features (DCFs) to replace the image pyramid employed by standard CNN detectors, improving the efficiency of face detection.

Reinforcement Learning for Improving Object Detection

S. Nayak

https://arxiv.org/pdf/2008.08005.pdf

Uses RL to learn image preprocessing that improves object detection rates. Standard detectors are highly nonlinear and respond very differently to image parameters such as brightness and contrast; an RL agent learns to adjust these parameters to compensate.
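A hedged sketch of what the environment side of such an agent could look like: discrete actions that nudge brightness or contrast, with the reward defined as the change in a detector's score. The action set, step sizes and the `detector_score` callable are hypothetical; they are not taken from Nayak et al.

```python
import numpy as np

# Hypothetical discrete action set: each action nudges brightness or contrast.
ACTIONS = {
    0: ("brightness", +0.1),
    1: ("brightness", -0.1),
    2: ("contrast", +0.1),
    3: ("contrast", -0.1),
}


def apply_preprocessing(image: np.ndarray, action: int) -> np.ndarray:
    """Apply one brightness or contrast adjustment to an image with values in [0, 1]."""
    kind, delta = ACTIONS[action]
    if kind == "brightness":
        out = image + delta
    else:
        # Scale deviations from the mean intensity to raise or lower contrast.
        mean = image.mean()
        out = (image - mean) * (1.0 + delta) + mean
    return np.clip(out, 0.0, 1.0)


def preprocessing_step(detector_score, image: np.ndarray, action: int):
    """Return (reward, new_image); reward is the change in the detector's score."""
    new_image = apply_preprocessing(image, action)
    return detector_score(new_image) - detector_score(image), new_image
```

Any RL algorithm with a discrete action space (e.g. a DQN) could then be trained on `preprocessing_step`; the detector itself stays frozen.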

Efficient Object Detection in Large Images using Deep Reinforcement Learning

B. Uzkent

https://arxiv.org/pdf/1912.03966.pdf

From abstract: Proposes a reinforcement learning agent that adaptively selects the spatial resolution of each image that is provided to the detector.

Datasets

Name / Link

Pascal VOC 2007/2012 / http://host.robots.ox.ac.uk/pascal/VOC/

ImageNet

CAS-PEAL Face Database

WiderFACE

LFW (Labelled Faces in the Wild) / http://vis-www.cs.umass.edu/lfw/

Websites and notes

Explanation of YOLO vs SSD: https://www.linkedin.com/pulse/yolo-really-better-than-ssd-chandrakala-busireddy/
Need to make it work better as a single-shot problem, but still use reinforcement learning to see whether it can improve the detection rates?
Future work: use reinforcement learning to learn how the convolution step moves? Is this feasible?
A good example of using TensorFlow Keras with keras-rl: https://soygema.github.io/starcraftII_machine_learning/#5
