CV Lab 4 - Stereo Vision

Basic Idea

The goal is to capture three-dimensional information about an object by various means.

A single photograph is a two-dimensional projection of the three-dimensional world, so in theory the original depth information cannot be recovered from it alone. Techniques such as photometric stereo and plane sweep therefore capture multiple images under controlled conditions in order to recover the original three-dimensional structure.

There are also more active approaches to stereo reconstruction, such as structured-light scanning and laser scanning, which are common when higher accuracy or speed is required.

Basic Stereo Algorithm

Translate the camera to capture two different pictures, then match each pixel along the epipolar line (the direction of translation) with SSD; by the disparity relation, depth follows from the baseline and the focal length. However, smooth regions in the images make this algorithm fail.
One improvement: since matching single pixels is error-prone, match windows instead (a sketch follows below).
A better improvement: use energy minimization, adding a smoothness constraint between neighboring pixels to the depth estimate. The resulting energy function can be minimized with dynamic programming.
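
A minimal sketch of the window-matching variant, assuming rectified float32 grayscale images; the function name and the parameters `max_disp` and `win` are illustrative, not part of the lab's API:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssd_disparity(left, right, max_disp=64, win=5):
    """Window-based SSD matching along horizontal epipolar lines."""
    h, w = left.shape
    best_cost = np.full((h, w), np.inf, dtype=np.float32)
    disparity = np.zeros((h, w), dtype=np.float32)
    for d in range(max_disp):
        # shift the right image by the candidate disparity d
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :w - d]
        # mean squared difference over a win x win window
        cost = uniform_filter((left - shifted) ** 2, size=win)
        better = cost < best_cost
        best_cost[better] = cost[better]
        disparity[better] = d
    return disparity
```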

Photometric Stereo

Photometric stereo uses a fixed camera to photograph an object under lighting from different directions, and determines surface normal directions from the reflected intensity of the object's surface. The light direction and intensity must be known for each shot. Once the normals are determined, the 3D shape can be reconstructed.

For an ideal object with a single matte (rough) surface, the reflection of incident light can be treated as diffuse and described by the Lambertian equation:
$$I = k_d\, N \cdot L$$
where \(k_d\) is the albedo at the point, \(N\) is the surface normal there, \(L\) is the direction of the incident light, and \(I\) is the observed image intensity at that point.
Given several incident light directions \(L\) and the corresponding observed intensities \(I\), we can solve for the albedo \(k_d\) and normal \(N\) at each point. The equations above form a linear system in the unknown \(G = k_d N\). With exactly three light sources,
$$G = L^{-1} I, \qquad k_d = \|G\|, \quad N = G / k_d$$

With data from more light sources, least squares applies:
$$L^T L\, G = L^T I, \qquad G = (L^T L)^{-1} L^T I$$

In the implementation, for a multi-channel image such as RGB, each channel at each pixel has its own albedo, but the normal direction is shared by all channels.

```python
import numpy as np

def compute_photometric_stereo_impl(lights, images):
    """
    Given a set of images taken from the same viewpoint and a corresponding set
    of directions for light sources, this function computes the albedo and
    normal map of a Lambertian scene.

    If the computed albedo for a pixel has an L2 norm less than 1e-7, then set
    the albedo to black and set the normal to the 0 vector.

    Normals should be unit vectors.

    Input:
        lights -- N x 3 array.  Rows are normalized and are to be interpreted
                  as lighting directions.
        images -- list of N images.  Each image is of the same scene from the
                  same viewpoint, but under the lighting condition specified in
                  lights.
    Output:
        albedo -- float32 height x width x 3 image with dimensions matching the
                  input images.
        normals -- float32 height x width x 3 image with dimensions matching
                   the input images.
    """
    h, w, chan = images[0].shape
    # Least-squares pseudo-inverse of the light matrix: G = (L^T L)^{-1} L^T I
    Mat = np.linalg.inv(lights.T @ lights) @ lights.T
    # Stack the N images into an h x w x chan x N array
    imgs = np.stack(images, axis=3)
    albedo = np.zeros((h, w, chan), dtype=np.float32)
    normals = np.zeros((h, w, 3), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            # Per-channel albedo: the magnitude of G for that channel
            for c in range(chan):
                G = Mat @ imgs[i, j, c]
                albedo[i, j, c] = np.linalg.norm(G)
            # One normal per pixel: solve on the channel-averaged intensity;
            # pixels with a tiny norm keep the zero normal
            G = Mat @ np.mean(imgs[i, j], axis=0)
            norm = np.linalg.norm(G)
            if norm > 1e-7:
                normals[i, j] = G / norm
    return albedo, normals
```

Extensions

Photometric stereo has limited applicability: the exact positions of the light sources must be known in advance. One trick is to place a mirrored sphere in the scene to recover the light positions. The method also ignores ambient light, specular reflection, subsurface scattering, shadows, and so on in the scene and object, so it remains quite limited.

An improvement: photometric stereo with colored lights. See Gabriel J. Brostow, Carlos Hernández, George Vogiatzis, Björn Stenger, Roberto Cipolla, "Video Normals from Colored Lights," IEEE TPAMI, Vol. 33, No. 10, pp. 2104-2114, October 2011.

Multi-View Stereo

Multi-Baseline Stereo

Multi-baseline stereo computes the depth of corresponding points from the disparity between images taken by cameras at several different positions. By similar triangles, disparity is approximately proportional to the baseline and the focal length, and inversely proportional to depth:
$$\text{disparity} = x - x' = \frac{\text{baseline} \cdot f}{z}$$
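
For example, with a hypothetical baseline of 0.1 m, a focal length of 500 px, and a measured disparity of 10 px, the depth is 0.1 × 500 / 10 = 5 m. As a trivial sketch over a whole disparity map:

```python
import numpy as np

def depth_from_disparity(disparity, baseline, f):
    # zero disparity means no match was found; leave the depth at zero
    return np.where(disparity > 0,
                    baseline * f / np.maximum(disparity, 1e-6), 0.0)
```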

The baseline affects the result: cameras that are too close together yield large depth-estimation errors, while cameras that are too far apart require a large search range and high time complexity.

Basic method: choose a reference view and replace the two-view consistency score SSD with SSSD: for each candidate depth, compute the SSD of every stereo pair and sum them (see the sketch below).
Limitations: it produces only a depth map, and it is unsuitable for widely distributed views.
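
A sketch of the SSSD idea under the same simplifying assumptions as before (cameras translated along x, rectified images; names are illustrative): for each candidate depth, accumulate the SSD of every (reference, other) pair and keep the depth with the smallest total.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sssd_depth(ref, others, baselines, f, depths, win=5):
    """Multi-baseline SSSD: sum the SSD over all image pairs per candidate depth."""
    h, w = ref.shape
    best = np.full((h, w), np.inf, dtype=np.float32)
    depth = np.zeros((h, w), dtype=np.float32)
    for z in depths:
        total = np.zeros((h, w), dtype=np.float32)
        for img, bl in zip(others, baselines):
            d = int(round(bl * f / z))  # per-view disparity implied by depth z
            shifted = np.zeros_like(img)
            if d < w:
                shifted[:, d:] = img[:, :w - d]
            total += uniform_filter((ref - shifted) ** 2, size=win)
        better = total < best
        best[better] = total[better]
        depth[better] = z
    return depth
```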

Plane-Sweep Stereo

Epipolar Geometry: Two-View Geometry

In the basic stereo algorithm, a point in the reference view appears somewhere along an epipolar line in the other view.
Recall that in projective geometry, homogeneous points and lines are in one-to-one correspondence.
Epipolar geometry is described here by the 3×3 fundamental matrix F, which maps a homogeneous point in image 1 to an epipolar line in image 2: the epipolar line in image 2 is Fp. For a point p in image 1 and its corresponding point q in image 2, the epipolar constraint \(q^T F p = 0\) holds.

There is a special point e on the epipolar lines: the projection of the other camera's center into this camera, called the epipole. All epipolar lines pass through the epipole.

We can derive F from calibrated cameras. Let \(K_1\) be the intrinsics of camera 1, \(K_2\) the intrinsics of camera 2, and R, t the rotation and translation of camera 2 relative to camera 1; \(t \times \tilde{p}\) is then the normal of the epipolar plane. We have
$$\tilde{p} = K_1^{-1} p, \quad \tilde{q} = K_2^{-1} q, \qquad (R^T \tilde{q})^T (t \times \tilde{p}) = 0$$
The cross product with t can be written as a matrix product, which gives
$$\tilde{q}^T R\, [t]_\times\, \tilde{p} = 0$$
Here \(E = R\,[t]_\times\) is called the essential matrix. For uncalibrated points the equation becomes
$$q^T K_2^{-T} R\, [t]_\times K_1^{-1} p = 0$$
which yields the fundamental matrix \(F = K_2^{-T} R\, [t]_\times K_1^{-1}\).
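
A short sketch of assembling E and F from known calibration, following the convention above:

```python
import numpy as np

def fundamental_from_calibration(K1, K2, R, t):
    """F = K2^{-T} R [t]_x K1^{-1} for a calibrated camera pair."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])  # cross-product matrix [t]_x
    E = R @ tx                           # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)
```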

Given the fundamental matrix F, the stereo pair can be rectified so that the epipolar lines become parallel, ready for the sweep in the next step.
If F is unknown, it can still be estimated from enough point correspondences using the eight-point algorithm, which solves for F with an SVD much like the one used to estimate a homography (see the sketch below).
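
A bare-bones sketch of the eight-point algorithm (without the coordinate normalization used in practice): stack the constraints \(q_i^T F p_i = 0\) into a homogeneous system, take the smallest right singular vector, and enforce rank 2.

```python
import numpy as np

def eight_point(p, q):
    """Estimate F from N >= 8 correspondences, given as N x 2 arrays
    (p in image 1, q in image 2)."""
    A = np.zeros((p.shape[0], 9))
    for i in range(p.shape[0]):
        x1, y1 = p[i]
        x2, y2 = q[i]
        # each row encodes q_i^T F p_i = 0, linear in the 9 entries of F
        A[i] = [x2*x1, x2*y1, x2, y2*x1, y2*y1, y2, x1, y1, 1]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)     # null vector of A
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0                     # a valid F has rank 2
    return U @ np.diag(S) @ Vt
```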

Plane Sweep

3D scanning proceeds by enumerating candidate depth planes. Each depth plane \(\Pi_m\) induces a homography H between the reference view and every other view (the rotation part of the camera mapping, corrected by a translation term scaled by the inverse plane depth, as below). Warp the k images through these homographies, then sum the similarity scores at each depth to determine the best depth for every pixel.
$$\Pi_m = \begin{bmatrix} n_m^T & -d_m \end{bmatrix}, \qquad H_{\Pi_m, P_k} = K_k \left( R_k - \frac{\vec{t}_k\, \vec{n}_m^T}{d_m} \right) K_{\mathrm{ref}}^{-1}$$
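
A sketch of evaluating this homography for a single fronto-parallel depth plane (\(n_m = (0, 0, 1)\) in the reference frame); all inputs are assumed given by calibration:

```python
import numpy as np

def plane_sweep_homography(K_k, R_k, t_k, K_ref, d_m,
                           n_m=np.array([0.0, 0.0, 1.0])):
    """Homography induced by the plane n_m^T X = d_m, mapping the
    reference view into view k."""
    H = K_k @ (R_k - np.outer(t_k, n_m) / d_m) @ np.linalg.inv(K_ref)
    return H / H[2, 2]  # fix the projective scale
```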

ZNCC Similarity

Here ZNCC (Zero-mean Normalized Cross-Correlation) replaces SSD for measuring image similarity, which reduces the influence of photometric differences. ZNCC is the inner product of two vectors after centering and normalization. It is exactly analogous to the Pearson correlation coefficient in probability theory, where covariance can be read as an inner product of two centered random variables.
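
A quick numerical check of that correspondence (illustrative only): the ZNCC of two patches equals their Pearson correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(25)                    # two flattened 5x5 patches
b = 0.5 * a + 0.1 * rng.random(25)
za = a - a.mean(); za /= np.linalg.norm(za)
zb = b - b.mean(); zb /= np.linalg.norm(zb)
print(za @ zb)                        # ZNCC
print(np.corrcoef(a, b)[0, 1])        # Pearson correlation: same value
```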

Code

The preprocess_ncc function produces, for every pixel, the normalized vector of the window around it. The mean is taken separately for each channel; the normalization (i.e., the division) is done over all channels together.

```python
import numpy as np

def project_impl(K, Rt, points):
    """
    Project 3D points into a calibrated camera.
    Input:
        K -- camera intrinsics calibration matrix
        Rt -- 3 x 4 camera extrinsics calibration matrix
        points -- height x width x 3 array of 3D points
    Output:
        projections -- height x width x 2 array of 2D projections
    """
    # Build the full projection K [R|t] as a product of two 4 x 4 matrices
    m1, m2 = np.eye(4), np.eye(4)
    m1[:3, :3] = K
    m2[:3, :] = Rt
    H = m1 @ m2
    projections = np.zeros(points.shape[:2] + (2,))
    p1 = np.ones(4)
    for i in range(points.shape[0]):
        for j in range(points.shape[1]):
            p1[:3] = points[i, j, :]
            vec = H @ p1
            # Perspective division by the depth coordinate
            projections[i, j, :] = (vec / vec[2])[:2]
    return projections


def preprocess_ncc_impl(image, ncc_size):
    """
    Prepare normalized patch vectors according to normalized cross
    correlation.

    This is a preprocessing step for the NCC pipeline.  It is expected that
    'preprocess_ncc' is called on every input image to preprocess the NCC
    vectors and then 'compute_ncc' is called to compute the dot product
    between these vectors in two images.

    NCC preprocessing has two steps.
    (1) Compute and subtract the mean.
    (2) Normalize the vector.

    The mean is per channel.  i.e. For an RGB image, over the ncc_size**2
    patch, compute the R, G, and B means separately.  The normalization
    is over all channels.  i.e. For an RGB image, after subtracting out the
    RGB mean, compute the norm over the entire (ncc_size**2 * channels)
    vector and divide.

    If the norm of the vector is < 1e-6, then set the entire vector for that
    patch to zero.

    Patches that extend past the boundary of the input image at all should be
    considered zero.  Their entire vector should be set to 0.

    Patches are to be flattened into vectors with the default numpy row
    major order.  For example, given the following
    2 (height) x 2 (width) x 2 (channels) patch, here is how the output
    vector should be arranged.

    channel1         channel2
    +------+------+  +------+------+ height
    | x111 | x121 |  | x112 | x122 |  |
    +------+------+  +------+------+  |
    | x211 | x221 |  | x212 | x222 |  |
    +------+------+  +------+------+  v
    width ------->

    v = [ x111, x121, x211, x221, x112, x122, x212, x222 ]

    see order argument in np.reshape

    Input:
        image -- height x width x channels image of type float32
        ncc_size -- integer width and height of NCC patch region.
    Output:
        normalized -- height x width x (channels * ncc_size**2) array
    """
    h, w, c = image.shape
    normalized = np.zeros((h, w, c, ncc_size**2))
    ncc_2 = ncc_size // 2
    # Enumerate each position in the window and fill all pixels in parallel;
    # patches that would cross the image boundary stay zero.
    for i in range(ncc_size):
        for j in range(ncc_size):
            normalized[ncc_2: h-ncc_2, ncc_2: w-ncc_2, :, i*ncc_size+j] = \
                image[i: i + h + 1 - ncc_size, j: j + w + 1 - ncc_size, :]

    # Subtract the per-channel mean, then flatten channel-major as specified
    mean = np.mean(normalized, axis=3).reshape((h, w, c, 1))
    normalized = (normalized - mean).reshape((h, w, c * ncc_size**2))
    # Normalize over all channels at once; vectors with tiny norms stay zero
    norm = np.linalg.norm(normalized, axis=2).reshape((h, w, 1))
    return np.divide(normalized, norm, out=np.zeros_like(normalized),
                     where=norm >= 1e-6)


def compute_ncc_impl(image1, image2):
    """
    Compute normalized cross correlation between two images that already have
    normalized vectors computed for each pixel with preprocess_ncc.

    Input:
        image1 -- height x width x (channels * ncc_size**2) array
        image2 -- height x width x (channels * ncc_size**2) array
    Output:
        ncc -- height x width normalized cross correlation between image1 and
               image2.
    """
    # The vectors are already zero-mean and unit-norm, so NCC is a dot product
    return np.sum(image1 * image2, axis=2)
```

Depth Map Reconstruction via the Poisson Equation

Photometric stereo gives us the normal at every image position. Reconstructing a 3D model then amounts to finding a suitable 2D scalar field (the depth) whose gradient agrees with those normals.

Mathematically, the closest classical problem is solving the Poisson equation \(-\Delta\varphi = f\): the Laplacian means only the divergence of the gradient is known at each point, and solving it requires Green's function integrals or variational methods.
Here, however, we already know the gradient directions themselves, so we can set up a linear system directly: take the depth at each pixel as an unknown and write equations relating it to its neighbors.

Code

In practice the system is solved with least squares. Because the system is a very large sparse matrix, the code below only fills in the matrix entries and hands the actual computation to the math library (a solving sketch follows the code).

If depth data from the plane sweep is available, it can be inserted into the same matrix, and least squares will balance both kinds of constraints.

```python
import numpy as np
from scipy.sparse import csr_matrix

def form_poisson_equation_impl(height, width, alpha, normals, depth_weight, depth):
    """
    Creates a Poisson equation given the normals and depth at every pixel in the image.
    The solution to the Poisson equation is the estimated depth.
    When the mode is 'depth' in 'combine.py', the equation should return the actual depth.
    When it is 'normals', the equation should integrate the normals to estimate depth.
    When it is 'both', the equation should weight the contributions from normals and
    actual depth, using the parameter 'depth_weight'.

    Input:
        height -- height of the input depth/normal arrays
        width -- width of the input depth/normal arrays
        alpha -- stores the alpha value at each pixel of the image.
            If alpha = 0, then the pixel's normal/depth should not be
            taken into consideration for depth estimation
        normals -- stores the normals (nx, ny, nz) at each pixel of the image
            None if mode is 'depth' in combine.py
        depth_weight -- parameter to trade off between normals and depth when the
            estimation mode is 'both'. High weight to normals means low depth_weight.
            Giving high weight to normals results in a smoother surface, but the
            surface may differ greatly from what the input depth map shows.
        depth -- stores the depth at each pixel of the image
            None if mode is 'normals' in combine.py
    Output:
        constants for an equation of type Ax = b
        A -- left-hand side coefficients of the Poisson equation;
            A can be a very large but sparse matrix, so csr_matrix is used.
        b -- right-hand side constants of the Poisson equation
    """

    assert alpha.shape == (height, width)
    assert normals is None or normals.shape == (height, width, 3)
    assert depth is None or depth.shape == (height, width)

    '''
    Since the A matrix is sparse, instead of filling the matrix, we assign values
    to the non-zero elements only. For each non-zero element A[i,j] = v, there is
    some index k such that
        row_ind[k] = i
        col_ind[k] = j
        data_arr[k] = v
    '''
    row_ind = []
    col_ind = []
    data_arr = []
    # For each row in the system of equations, b holds that row's constant.
    b = []
    if depth_weight is None:
        depth_weight = 1

    '''
    We create a system of linear equations Ax = b to estimate depth using normals
    and crude depth. x is a vector of depths at each pixel and has shape
    (height*width).

    If mode is 'depth':
        > Each row in A and b corresponds to an equation at a single pixel.
        > For each pixel k:
            if pixel k has alpha value zero, do not add any new equation;
            else, fill the row in b with depth_weight*depth[k] and fill column k
                of the corresponding row in A with depth_weight.

        Justification:
            Since all elements except k in a row are zero, this reduces to
                depth_weight*x[k] = depth_weight*depth[k]
            Solving this gives x with values exactly equal to the depths at pixels
            where alpha is non-zero. Why then do we need 'depth_weight' in A and b?
            The answer becomes clear when this is reused in 'both' mode.

    Note: the normals in the image are positive when they point along the
    +x, +y, -z axes, as seen from the camera's viewpoint.
    If mode is 'normals':
        > Each row in A and b corresponds to an equation relating adjacent pixels.
        > For each pixel k and its immediate neighbor l along the x-axis:
            if either pixel k or pixel l has alpha value zero, do not add any new
                equation;
            else, fill the row in b with nx[k] (the x-component of the normal),
                fill column k of the corresponding row in A with -nz[k], and
                column k+1 with nz[k].
        > Repeat the above along the y-axis as well, except nx[k] becomes -ny[k].

        Justification: assume the depth is smooth and almost planar within one
        pixel width. The normal projected into the xz-plane at pixel k is
        perpendicular to the tangent of the surface in the xz-plane: if
        n = (nx, ny, -nz), its projection into the xz-plane is (nx, nz), and for
        a tangent t = (tx, 0, tz), n.t = 0 gives nx/-nz = -tz/tx. Therefore the
        depth change over one pixel width along the x-axis is proportional to
        tz/tx = -nx/nz, i.e. (depth[k+1]-depth[k])*nz[k] = nx[k].
        This is exactly what the equations below represent. The negative sign on
        ny[k] is because indexing along the y-axis is opposite to the +y direction.

    If mode is 'both':
        > Do both of the above steps.

        Justification: the depth provides a crude estimate of the actual depth,
        and the normals smooth the depth map. This is why 'depth_weight' was used
        in 'depth' mode: if 'depth_weight' is very large, we give preference to
        the input depth map; if it is close to zero, we give preference to the
        normals.
    '''
    row = 0
    if normals is not None:
        for i in range(height-1):
            for j in range(width-1):
                if alpha[i, j] and alpha[i+1, j] and alpha[i, j+1]:
                    # (depth[i, j+1] - depth[i, j]) * nz = nx  -- x-direction
                    row_ind.append(row)
                    col_ind.append(i*width+j)
                    data_arr.append(-normals[i, j, 2])
                    row_ind.append(row)
                    col_ind.append(i*width+j+1)
                    data_arr.append(normals[i, j, 2])
                    b.append(normals[i, j, 0])
                    row += 1

                    # (depth[i+1, j] - depth[i, j]) * nz = -ny  -- y-direction
                    row_ind.append(row)
                    col_ind.append(i*width+j)
                    data_arr.append(-normals[i, j, 2])
                    row_ind.append(row)
                    col_ind.append((i+1)*width+j)
                    data_arr.append(normals[i, j, 2])
                    b.append(-normals[i, j, 1])
                    row += 1

    if depth is not None:
        for i in range(height):
            for j in range(width):
                if alpha[i, j]:
                    # depth_weight * x[k] = depth_weight * depth[k]
                    row_ind.append(row)
                    col_ind.append(i*width+j)
                    data_arr.append(depth_weight)
                    b.append(depth_weight*depth[i, j])
                    row += 1

    # Convert all the lists to numpy arrays
    row_ind = np.array(row_ind, dtype=np.int32)
    col_ind = np.array(col_ind, dtype=np.int32)
    data_arr = np.array(data_arr, dtype=np.float32)
    b = np.array(b, dtype=np.float32)

    # Create a compressed sparse matrix from indices and values
    A = csr_matrix((data_arr, (row_ind, col_ind)), shape=(row, width * height))

    return A, b
```
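
Downstream, the assembled sparse system can be handed to a least-squares solver. A usage sketch with scipy's lsqr, assuming height, width, alpha, normals, depth_weight, and depth are defined as above:

```python
from scipy.sparse.linalg import lsqr

A, b = form_poisson_equation_impl(height, width, alpha, normals,
                                  depth_weight, depth)
x = lsqr(A, b)[0]                      # least-squares solution of Ax = b
depth_map = x.reshape(height, width)   # one depth value per pixel
```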

Results
