Radial Distortion Correction
Because a camera lens projects the 3D world onto a plane, radial distortion can appear. Sometimes the distortion is deliberate, as with wide-angle or fisheye lenses.
Radial distortion makes the straight edges of rectangles in the image bow outward or inward; it is a nonlinear distortion. Since the code below stitches images with homographies, this nonlinear distortion must be corrected first.
We project the input image onto a sphere whose radius equals the focal length, parameterized by Euler angles: the original x axis becomes the yaw angle and the y axis the phi angle. Seen from the origin the pattern is unchanged, but under parallel projection the image has been radially warped.
A real camera's radial distortion can be more complex, however. For a more accurate correction, we further adjust with parameters k1 and k2, the coefficients of a Taylor expansion of the radial distortion function:
$$x_1 = x_0(1 + k_1 r^2 + k_2 r^4), \quad y_1 = y_0(1 + k_1 r^2 + k_2 r^4)$$
Here r is the radial distance to the image center; the values of k1 and k2 are determined by the lens itself.
```python
import cv2
import numpy as np


def computeSphericalWarpMappings(dstShape, f, k1, k2):
    # calculate minimum y value
    vec = np.zeros(3)
    vec[0] = np.sin(0.0) * np.cos(0.0)
    vec[1] = np.sin(0.0)
    vec[2] = np.cos(0.0) * np.cos(0.0)
    min_y = vec[1]

    # calculate spherical coordinates:
    # (x, y) are the spherical image coordinates;
    # (xf, yf) are the spherical angles, e.g., xf is the angle theta
    # and yf is the angle phi
    one = np.ones((dstShape[0], dstShape[1]))
    xf = one * np.arange(dstShape[1])
    yf = one.T * np.arange(dstShape[0])
    yf = yf.T
    xf = (xf - 0.5 * dstShape[1]) / f
    yf = (yf - 0.5 * dstShape[0]) / f - min_y

    # apply the spherical correction: compute the Euclidean coordinates,
    # project the point onto the z = 1 plane at (xt/zt, yt/zt, 1),
    # then distort with the radial distortion coefficients k1 and k2.
    # xf, yf are the inputs and xt, yt the outputs; all have shape
    # (img_height, img_width)
    zt = np.cos(xf) * np.cos(yf)
    xt = np.sin(xf) * np.cos(yf) / zt
    yt = np.sin(yf) / zt
    r2 = xt ** 2 + yt ** 2
    xt = xt * (1 + k1 * r2 + k2 * r2 ** 2)
    yt = yt * (1 + k1 * r2 + k2 * r2 ** 2)

    # convert back to regular pixel coordinates
    xn = 0.5 * dstShape[1] + xt * f
    yn = 0.5 * dstShape[0] + yt * f
    uvImg = np.dstack((xn, yn))
    return uvImg
```
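A minimal usage sketch, continuing from the block above: the per-pixel source coordinates returned by computeSphericalWarpMappings can be handed directly to cv2.remap. The focal length and distortion coefficients below are placeholder values, not calibrated ones.

```python
img = cv2.imread('input.jpg')
f, k1, k2 = 700.0, -0.21, 0.26  # hypothetical lens parameters

uv = computeSphericalWarpMappings(img.shape, f, k1, k2)
# remap samples img at the (x, y) coordinates stored in uv
warped = cv2.remap(img,
                   uv[:, :, 0].astype(np.float32),
                   uv[:, :, 1].astype(np.float32),
                   cv2.INTER_LINEAR)
cv2.imwrite('warped.jpg', warped)
```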
Image Alignment
Feature Detection and Matching
We use Harris corner detection as before and obtain a set of matches. These matches cannot be perfectly accurate; in particular, corners in the non-overlapping parts of the two images are easily matched incorrectly.
Computing the Transformation Matrix
Given a set of point correspondences, we need to estimate the best transformation matrix. For an affine transformation the mapping is linear, so we can simply write down the equations for these points and solve by least squares. Since there are not many feature points here, the best fit can be computed directly from the normal equations.
$$ \mathop{\arg\min}_{t} \ \left\|At-b\right\|_2 $$
$$ A^T At=A^T b, \quad t=(A^T A)^{-1} A^T b$$
For a pure translation, the least-squares solution turns out to be exactly the mean of the translation vectors.
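A quick sanity check of that claim on synthetic data (the point sets and offsets below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pts1 = rng.uniform(0, 100, (20, 2))                       # points in image 1
pts2 = pts1 + np.array([10.0, 4.0]) + rng.normal(0, 0.1, (20, 2))

# for a pure translation the normal equations reduce to the mean offset
t = np.mean(pts2 - pts1, axis=0)
print(t)  # close to [10, 4]
```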
For the more general homography the situation is more complicated, because the mapping is no longer linear.
We can still write down the corresponding equations from the point pairs. For a match between $a = (a_x, a_y)$ in image 1 and $b = (b_x, b_y)$ in image 2, the relation $b \simeq Ha$ (up to scale) yields two linear equations in the entries $h_1, \dots, h_9$ of $H$:

$$\begin{aligned} a_x h_1 + a_y h_2 + h_3 - b_x(a_x h_7 + a_y h_8 + h_9) &= 0 \\ a_x h_4 + a_y h_5 + h_6 - b_y(a_x h_7 + a_y h_8 + h_9) &= 0 \end{aligned}$$
This gives the optimization problem $$ \mathop{\arg\min}_{h} \ \left\|Ah\right\|_2 \quad \text{s.t.} \ \left\|h\right\|_2 = 1. $$
If d matched pairs are used, A is a matrix with 2d rows and 9 columns. Scaling h does not affect the result, so we constrain h to unit norm. Mathematically, this is the nontrivial least-squares solution of the homogeneous system.
By the geometric meaning of eigenvalues, the desired h is the eigenvector of \(A^T A\) corresponding to its smallest eigenvalue. Since A is not square, we compute this via the singular value decomposition and take the last right singular vector of A: \((V, \Lambda) = \operatorname{eig}(A^T A), \ h = v_n\) where \(\lambda_n \le \lambda_i\) for \(i = 1, \dots, n-1\).
Code:
```python
def computeHomography(f1, f2, matches, A_out=None):
    '''
    Input:
        f1 -- list of cv2.KeyPoint objects in the first image
        f2 -- list of cv2.KeyPoint objects in the second image
        matches -- list of cv2.DMatch objects
            DMatch.queryIdx: The index of the feature in the first image
            DMatch.trainIdx: The index of the feature in the second image
            DMatch.distance: The distance between the two features
        A_out -- ignore this parameter. If computeHomography is needed
                 in other TODOs, call computeHomography(f1, f2, matches)
    Output:
        H -- 2D homography (3x3 matrix)

    Takes two lists of features, f1 and f2, and a list of feature
    matches, and estimates a homography from image 1 to image 2 from
    the matches.
    '''
    num_matches = len(matches)

    # Dimensions of the A matrix in the homogeneous linear equation Ah = 0
    num_rows = 2 * num_matches
    num_cols = 9
    A_matrix_shape = (num_rows, num_cols)
    A = np.zeros(A_matrix_shape)

    for i in range(len(matches)):
        m = matches[i]
        (a_x, a_y) = f1[m.queryIdx].pt
        (b_x, b_y) = f2[m.trainIdx].pt

        # fill in the two rows of A contributed by this match
        A[2 * i] = [a_x, a_y, 1, 0, 0, 0, -b_x * a_x, -b_x * a_y, -b_x]
        A[2 * i + 1] = [0, 0, 0, a_x, a_y, 1, -b_y * a_x, -b_y * a_y, -b_y]

    # s is a 1-D array of singular values sorted in descending order.
    # U, Vt are unitary matrices.
    # Rows of Vt are the eigenvectors of A^T A.
    # Columns of U are the eigenvectors of A A^T.
    U, s, Vt = np.linalg.svd(A)

    if A_out is not None:
        A_out[:] = A

    # the homography is the right singular vector with the smallest
    # singular value, i.e. the last row of Vt
    H = Vt[8, :].reshape((3, 3))

    return H
```
RANSAC (Random Sample Consensus)
As noted above, the matched points are not very accurate, and outliers can shift the estimate unnecessarily. Moreover, systematic errors can introduce multicollinearity into the matches, which seriously affects a linear regression.
To avoid this problem, we count inliers instead. For a candidate fit we set a threshold: the more points lie within this threshold of the fit, the better. The final regression then uses only the inliers.
But a fit satisfying this criterion is hard to compute directly, so we use the RANSAC algorithm.
RANSAC: randomly sample a few matches, estimate the transformation, count the inliers, and after a number of iterations keep the best result. Its effectiveness rests on one assumption: all inliers agree on the motion (e.g., the translation vector), while the few outliers contradict one another. So as long as inliers exceed 50% of the matches and the threshold is set to the expected inlier noise (e.g., Gaussian in Euclidean distance with a given variance), RANSAC in theory converges to the correct result exponentially fast.
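The usual way to make "exponentially fast" concrete (a standard back-of-the-envelope bound, not something computed in the code below): assuming an inlier fraction $w$ and a minimal sample size $s$, the probability that all $N$ samples contain at least one outlier is $(1-w^s)^N$, so reaching success probability $p$ requires

$$ N \ge \frac{\log(1-p)}{\log(1-w^s)} $$

For example, with $w = 0.5$, $s = 4$, and $p = 0.99$, about $N \approx 72$ iterations suffice.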
Note that each sample should use the minimal number of point pairs: a translation is determined by a single pair, while a homography needs 4. The final estimate should then use all of the inliers.
```python
def alignPair(f1, f2, matches, m, nRANSAC, RANSACthresh):
    '''
    Input:
        f1 -- list of cv2.KeyPoint objects in the first image
        f2 -- list of cv2.KeyPoint objects in the second image
        matches -- list of cv2.DMatch objects
            DMatch.queryIdx: The index of the feature in the first image
            DMatch.trainIdx: The index of the feature in the second image
            DMatch.distance: The distance between the two features
        m -- MotionModel (eTranslate, eHomography)
        nRANSAC -- number of RANSAC iterations
        RANSACthresh -- RANSAC distance threshold
    Output:
        M -- inter-image transformation matrix

    Repeat for nRANSAC iterations:
        Choose a minimal set of feature matches.
        Estimate the transformation implied by these matches
        and count the number of inliers.
    For the transformation with the maximum number of inliers,
    compute the least squares motion estimate using the inliers,
    and return it as a transformation matrix M.
    '''
    ret = []
    for i in range(nRANSAC):
        # sample a minimal set of matches (1 for translation, 4 for a
        # homography), without replacement so the matches are distinct
        mat = np.random.choice(matches, 1 if m == eTranslate else 4,
                               replace=False)
        H = np.eye(3)
        if m == eTranslate:
            mat = mat[0]
            H[:2, 2] = np.array(f2[mat.trainIdx].pt) - np.array(f1[mat.queryIdx].pt)
        else:
            H = computeHomography(f1, f2, mat)
        inliers = getInliers(f1, f2, matches, H, RANSACthresh)
        if len(inliers) > len(ret):
            ret = inliers
    M = leastSquaresFit(f1, f2, matches, m, ret)
    return M


def getInliers(f1, f2, matches, M, RANSACthresh):
    '''
    Output:
        inlier_indices -- inlier match indices (indexes into 'matches')

    Transform the matched features in f1 by M. Store the match index
    of features in f1 for which the transformed feature is within
    Euclidean distance RANSACthresh of its match in f2. Return the
    array of the match indices of these features.
    '''
    inlier_indices = []
    for i in range(len(matches)):
        # transform the ith matched feature in f1 by M (homogeneous
        # coordinates); keep the match if the projected point lands
        # within RANSACthresh of its match in f2
        p1 = M @ np.array(f1[matches[i].queryIdx].pt + (1,))
        if np.linalg.norm(p1[:2] / p1[2]
                          - np.array(f2[matches[i].trainIdx].pt)) <= RANSACthresh:
            inlier_indices.append(i)
    return inlier_indices
```
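alignPair calls a leastSquaresFit helper that is not shown above. Here is a sketch consistent with the call site; the signature is inferred from alignPair, and the eHomography branch is an assumption based on the motion models named in the docstring:

```python
def leastSquaresFit(f1, f2, matches, m, inlier_indices):
    '''
    Sketch only: compute the final motion estimate from all inliers.
    '''
    M = np.eye(3)
    if m == eTranslate:
        # for a translation, the least-squares estimate is the mean
        # offset over the inliers (see the note above)
        offsets = np.array([
            np.array(f2[matches[i].trainIdx].pt)
            - np.array(f1[matches[i].queryIdx].pt)
            for i in inlier_indices])
        M[:2, 2] = offsets.mean(axis=0)
    else:
        # for a homography, rerun the SVD fit on the inliers only
        M = computeHomography(f1, f2, [matches[i] for i in inlier_indices])
    return M
```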
Image Blending
Once the transformation between the images is known, they can be blended. To make the final image read as a single whole, stitching seams should be minimized.
Alpha Blending (Feathering)
Adjusting transparency near the image edges is a simple and effective way to blend. Here only the overlapping region of the two images is blended, so simple alpha blending is sufficient.
The acc weight varies linearly over blendWidth near the boundary. To compute the final image, at each pixel the acc-weighted colors of all overlapping images are summed and divided by the sum of their acc weights. Note that the black regions introduced by warping must be excluded.
Pay attention to how the image warping and the acc channel are implemented. A naive double loop is slow; using vectorized library functions instead can be 10-20x faster.
```python
def imageBoundingBox(img, M):
    """
    This is a useful helper function that you might choose to implement
    that takes an image and a transform, and computes the bounding box
    of the transformed image.
    """
    h, w = img.shape[:2]
    # transform the four corners (homogeneous coordinates) by M
    corner = np.array([[0, 0, 1], [0, h - 1, 1],
                       [w - 1, 0, 1], [w - 1, h - 1, 1]], dtype=float) @ M.T
    for i in range(4):
        corner[i] /= corner[i][2]
    corner = corner.T
    minX, maxX = np.min(corner[0]), np.max(corner[0])
    minY, maxY = np.min(corner[1]), np.max(corner[1])
    return int(minX), int(minY), int(maxX), int(maxY)


def accumulateBlend(img, acc, M, blendWidth):
    """
    INPUT:
        img: image to add to the accumulator
        acc: portion of the accumulated image where img should be added
        M: the transformation mapping the input image to the accumulator
        blendWidth: width of blending function (horizontal hat function)
    OUTPUT:
        modify acc with a weighted copy of img, where the first three
        channels of acc record the weighted sum of the pixel colors and
        the fourth channel records the sum of the weights
    """
    h, w = img.shape[:2]
    ha, wa = acc.shape[:2]

    # Naive inverse-sampling version, kept for reference; it is 10-20x
    # slower than the vectorized version below. (Caveats from the
    # original draft: its weights use the accumulator x coordinate only,
    # and it truncates coordinates instead of interpolating.)
    #
    # minX, minY, maxX, maxY = imageBoundingBox(img, M)
    # M1 = np.linalg.inv(M)
    # for i in range(minX, maxX):
    #     for j in range(minY, maxY):
    #         p = M1 @ np.array([i, j, 1])
    #         ox, oy = p[0] / p[2], p[1] / p[2]
    #         if ox < 0 or ox >= w or oy < 0 or oy >= h:
    #             continue
    #         weight = 1.0
    #         if minX <= ox < minX + blendWidth:
    #             weight = (ox - minX) / blendWidth
    #         if maxX - blendWidth < ox <= maxX:
    #             weight = (maxX - ox) / blendWidth
    #         acc[j, i, 3] += weight
    #         for k in range(3):
    #             acc[j, i, k] += img[int(oy), int(ox), k] * weight

    # vectorized version: attach the hat-function weights as an alpha
    # channel and let cv2.warpPerspective do the resampling
    imgRGBA = np.ones((h, w, 4))
    imgRGBA[:, :, :3] = img
    xs = np.arange(w)
    # horizontal hat function: ramps up over blendWidth pixels at the
    # left edge, down at the right edge, and is 1 in between
    blend_alpha = np.fmin(
        np.fmin((1 + xs) / (blendWidth + 1), (w - xs) / (blendWidth + 1)), 1)
    # broadcast to the full image; give zero weight to the black pixels
    # introduced by warping
    blend_alpha = blend_alpha * (np.sum(img, axis=2) > 0)
    for i in range(4):
        imgRGBA[:, :, i] *= blend_alpha
    dst = cv2.warpPerspective(imgRGBA, M, (wa, ha), flags=cv2.INTER_LINEAR)
    acc += dst


def normalizeBlend(acc):
    """
    INPUT:
        acc: input image whose alpha channel (4th channel) contains
             normalizing weight values
    OUTPUT:
        img: image with the r, g, b values of acc normalized
    """
    h, w = acc.shape[:2]
    img = np.zeros((h, w, 3))
    # avoid division by zero where no image contributed
    acc[:, :, 3] += (acc[:, :, 3] == 0)
    for i in range(3):
        img[:, :, i] = acc[:, :, i] / acc[:, :, 3]
    img = np.uint8(img)
    return img
```
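For context, a hedged sketch of how these three helpers could be composed into a full blending pass. The function name blendImages and the shift of the accumulator origin are assumptions for illustration, not part of the code above:

```python
def blendImages(imgs, Ms, blendWidth):
    # the accumulator must cover the union of all warped bounding boxes
    minX = minY = np.inf
    maxX = maxY = -np.inf
    for img, M in zip(imgs, Ms):
        x0, y0, x1, y1 = imageBoundingBox(img, M)
        minX, minY = min(minX, x0), min(minY, y0)
        maxX, maxY = max(maxX, x1), max(maxY, y1)

    # shift everything so the accumulator starts at (0, 0)
    shift = np.array([[1, 0, -minX], [0, 1, -minY], [0, 0, 1]], dtype=float)
    acc = np.zeros((int(maxY - minY) + 1, int(maxX - minX) + 1, 4))

    for img, M in zip(imgs, Ms):
        accumulateBlend(img, acc, shift @ M, blendWidth)
    return normalizeBlend(acc)
```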
Other Improvements
Feathered seams are fairly smooth, but if the source images differ in exposure, visible brightness changes remain. The exposures need to be adjusted so that the stitched images roughly match.
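A minimal sketch of one such adjustment (per-image gain compensation), assuming a boolean mask of the overlap region is already available; the helper name and interface are hypothetical:

```python
def exposureGain(img1, img2, overlap_mask):
    # ratio of mean intensities over the shared region; multiplying
    # img2 by this gain roughly matches its exposure to img1
    m1 = img1[overlap_mask].astype(float).mean()
    m2 = img2[overlap_mask].astype(float).mean()
    return m1 / max(m2, 1e-6)
```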
Other blending methods, such as Gaussian pyramid blending and Poisson image editing, can do better on fine details. Another problem with feathering is that unwanted moving objects leave "ghosts" in the result, which need to be removed by some means.