[YOLO][k-means聚类]寻找锚点

发表于 2020-05-30 更新于 2024-01-21 分类于目标检测/object-detect 阅读次数：

本文字数： 4.5k 阅读时长 ≈ 8 分钟

YOLOv2在YOLO的基础上进行了很多的调整，其中一个就是设置锚点框。相对于Faster R-CNN/SSD通过人工方式设置锚点的尺度和长宽比，YOLOv2提出了通过k-means聚类的方式自动计算出锚点的大小

k-means聚类

k-means聚类算法通过计算距离作为相似度来衡量两个对象是否属于同一簇

锚点计算

数据

输入数据：训练数据集中各个GT
输出数据：锚点框

改进

在原始的k-means聚类算法中，通过欧式距离来进行判断。在锚点计算过程中，这会导致更大的边界框会产生更多的误差，所以YOLOv2选择了计算聚类中心和对象的IoU作为评判标准

\[ d(box, centroid) = 1- IOU(box, centroid) \]

属性

锚点边界框由中心点坐标和长宽组成：\([center_{x}, center_{y}, w, h]\)。在训练过程中，中心点坐标已经固定（各个栅格中心），所以仅需计算长/宽即可

实现

仓库lars76/kmeans-anchor-boxes对YOLOv2的k-means锚点计算过程进行了很好的复现。关键代码如下

#############################  kmeans.py
import numpy as np


def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k, 0) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_


def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])


def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)


def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: distance function
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]

    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)

        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():
            break

        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters
########################### examples.py
import glob
import xml.etree.ElementTree as ET

import numpy as np

from kmeans import kmeans, avg_iou

ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 5

def load_dataset(path):
	dataset = []
	for xml_file in glob.glob("{}/*xml".format(path)):
		tree = ET.parse(xml_file)

		height = int(tree.findtext("./size/height"))
		width = int(tree.findtext("./size/width"))

		for obj in tree.iter("object"):
			xmin = int(obj.findtext("bndbox/xmin")) / width
			ymin = int(obj.findtext("bndbox/ymin")) / height
			xmax = int(obj.findtext("bndbox/xmax")) / width
			ymax = int(obj.findtext("bndbox/ymax")) / height

			dataset.append([xmax - xmin, ymax - ymin])

	return np.array(dataset)


data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
print("Boxes:\n {}".format(out))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

整体代码还是比较容易理解的。首先提取训练数据集的标注框长宽坐标（比例形式）；然后进行k-means聚类计算，每一轮都重新计算所有标注框和聚类中心的IoU；最后输出计算得到的\(k\)个聚类中心，也就是锚点长宽（比例形式）

使用了VOC 2007进行了测试，其实现结果如下：

############# k = 5
Accuracy: 60.42%
Boxes:
 [[0.1        0.16266667]
 [0.044      0.07733333]
 [0.184      0.33866667]
 [0.796      0.77333333]
 [0.38       0.528     ]]
Ratios:
 [0.54, 0.57, 0.61, 0.72, 1.03]
############# k = 9
Accuracy: 67.29%
Boxes:
 [[0.4        0.70570571]
 [0.51       0.37      ]
 [0.102      0.10149254]
 [0.06       0.18318318]
 [0.114      0.31466667]
 [0.228      0.46246246]
 [0.824      0.7957958 ]
 [0.038      0.068     ]
 [0.2        0.19466667]]
Ratios:
 [0.33, 0.36, 0.49, 0.56, 0.57, 1.0, 1.03, 1.04, 1.38]

其Average IoU和论文结果一致：

大海

[YOLO][k-means聚类]寻找锚点

k-means聚类

锚点计算

数据

改进

属性

实现

相关阅读