I'm studying Andrew NG's Convolutional Neural Networks and am in Week 3 of the course which deals with object detection using YOLO algorithm . I don't understand one section in the programming assignment that uses a function called 'scale_boxes' . This is what is described about the function in the course materials.
"*There're a few ways of representing boxes, such as via their corners or via their midpoint and height/width. YOLO converts between a few such formats at different times, using the following functions (which we have provided):
boxes = yolo_boxes_to_corners(box_xy, box_wh) which converts the yolo box coordinates (x,y,w,h) to box corners' coordinates (x1, y1, x2, y2) to fit the input of yolo_filter_boxes
boxes = scale_boxes(boxes, image_shape) YOLO's network was trained to run on 608x608 images. If you are testing this data on a different size image--for example, the car detection dataset had 720x1280 images--this step rescales the boxes so that they can be plotted on top of the original 720x1280 image.*"
And the function scale_boxes itself is defined as :
def scale_boxes(boxes, image_shape): """ Scales the predicted boxes in order to be drawable on the image""" height = image_shape[0] width = image_shape[1] image_dims = K.stack([height, width, height, width]) image_dims = K.reshape(image_dims, [1, 4]) boxes = boxes * image_dims return boxes
It is used in the following function 'yolo_eval' :
def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5): """ Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes. Arguments: yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors: box_confidence: tensor of shape (None, 19, 19, 5, 1) box_xy: tensor of shape (None, 19, 19, 5, 2) box_wh: tensor of shape (None, 19, 19, 5, 2) box_class_probs: tensor of shape (None, 19, 19, 5, 80) image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype) max_boxes -- integer, maximum number of predicted boxes you'd like score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box iou_threshold -- real value, "intersection over union" threshold used for NMS filtering Returns: scores -- tensor of shape (None, ), predicted score for each box boxes -- tensor of shape (None, 4), predicted box coordinates classes -- tensor of shape (None,), predicted class for each box """ ### START CODE HERE ### # Retrieve outputs of the YOLO model (≈1 line) box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs # Convert boxes to be ready for filtering functions (convert boxes box_xy and box_wh to corner coordinates) boxes = yolo_boxes_to_corners(box_xy, box_wh) # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line) scores, boxes, classes = yolo_filter_boxes(box_confidence,boxes,box_class_probs,score_threshold) # Scale boxes back to original image shape. boxes = scale_boxes(boxes, image_shape) # Use one of the functions you've implemented to perform Non-max suppression with # maximum number of boxes set to max_boxes and a threshold of iou_threshold (≈1 line) scores, boxes, classes = yolo_non_max_suppression(scores,boxes,classes,max_boxes,iou_threshold) ### END CODE HERE ### return scores, boxes, classes
I don't understand the need for the function 'scale_boxes' . There doesn't seem to be any answers/attention to this in the discussion forums as well , which is why I'm posting this question here .
Can someone please explain in detail what this function does exactly and why it is required ?