When we need to perform object detection, what are some of the options that we can have in terms of the network architecture?
Instructor solution
We can build a CNN that has 2 branches: one of them performs a classification of the object and one of them finds the specifications of the bounding box. This architecture won't be good when we have multiple objects in the photo, and this is why we move to YOLO architectures where each part of the image would be studied alone in order to detect the different objects.
Peer Responses
Mask R-CNN:
Extension of Faster R-CNN.
Adds a mask prediction branch for instance segmentation.
High accuracy and versatility.
YOLO (You Only Look Once):
Single-stage detector.
Real-time detection.
Unified detection and classification.
Think you've got it?
How does YOLO perform object detection?
- A.
By using a cascade of classifiers to refine the detection
- B.
By dividing the image into a grid and predicting bounding boxes and probabilities for each grid cell
- C.
By scanning an image with a sliding window
- D.
By first identifying objects then dividing the image into a grid
You may exit out of this review and return later without penalty.