How to write a simple 3D reconstruction system
To start with a definition: 3D reconstruction is the process of building a 3D model of an object from range data. Reconstruction can be performed using a wide variety of principles: stereo, photometric stereo, volume removal, or structure from motion.
This article is a guide to developing a simple application that reconstructs an object's shape using the GPU, i.e. the video card.
Among the principles mentioned above, we have chosen the volume removal algorithm suggested by Brian Curless and Marc Levoy in their article "A Volumetric Method for Building Complex Models from Range Images".
The figure below illustrates the basic principle of the algorithm. The object to be reconstructed from a set of images is shown on the left. When processing an image, the algorithm removes the 3D points located in front of the object (the authors used the structured light technique for depth mapping) and behind it. The result of processing the first photo is shown in the center. Using the data obtained from the second camera, the program deletes further 3D points. The more views are used, the more extraneous 3D points are removed; in the end, only the points belonging to the object remain.
In our application, we implement a simplified version of the algorithm that deletes only the points lying outside the object's contour in the images. Following the original article, we divide the working space into a set of cubic elements (voxels).
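For concreteness, the grid could be built along these lines; the Voxel structure, the grid resolution, and the volume bounds are illustrative assumptions rather than the article's actual code:

    #include <vector>

    struct Voxel { double x, y, z, size; };

    std::vector<Voxel> voxels;
    const int N = 64;                  // grid resolution per axis (assumption)
    const double volumeSize = 1.0;     // edge length of the working volume (assumption)
    const double s = volumeSize / N;   // edge length of a single voxel
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k) {
                Voxel v = { i * s, j * s, k * s, s };
                voxels.push_back(v);
            }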
To determine whether a voxel belongs to the 3D object, we render it on the GPU and match the resulting projection against the object's silhouette.
To get the projection, the HComparator::render function is used.
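A minimal sketch of what such a function might look like, reusing the Voxel structure above; the signature, the HFrame type, and the placement of the voxel through the modelview stack are assumptions, while pixelBuffer, setGL, and glCallList(voxelList) are explained below:

    void HComparator::render(const Voxel& voxel, const HFrame& frame)
    {
        pixelBuffer->makeCurrent();            // draw into the off-screen buffer
        glClearColor(0.0f, 0.0f, 0.0f, 1.0f);  // black background
        glClear(GL_COLOR_BUFFER_BIT);          // erase the previous projection
        setGL(frame);                          // set GL_PROJECTION and GL_MODELVIEW
        glPushMatrix();
        glTranslated(voxel.x, voxel.y, voxel.z);       // move the voxel into place
        glScaled(voxel.size, voxel.size, voxel.size);  // scale the unit cube
        glCallList(voxelList);                 // draw the pre-built cube list
        glPopMatrix();
        glFlush();
    }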
To explain in more detail: pixelBuffer->makeCurrent() switches rendering into the off-screen QGLPixelBuffer.
When initializing the output buffer, clipping, depth testing, and blending are disabled, since the only goal is to determine the voxel's spatial position relative to the object.
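A possible initialization of the buffer, assuming that clipping here refers to back-face culling and that the buffer matches the photo dimensions (imageWidth and imageHeight are assumed names):

    pixelBuffer = new QGLPixelBuffer(imageWidth, imageHeight);
    pixelBuffer->makeCurrent();
    glDisable(GL_CULL_FACE);    // no clipping of back faces
    glDisable(GL_DEPTH_TEST);   // no depth testing
    glDisable(GL_BLEND);        // no blending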
After switching the context in HComparator::render, the output buffer is cleared and the projection parameters are set.
To render a voxel, glCallList(voxelList) is called, executing a pre-built display list of drawing commands; the list is prepared once during initialization.
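A sketch of how such a list could be built, assuming a unit cube drawn as six quads (the vertex layout is illustrative):

    GLuint voxelList = glGenLists(1);   // allocate one display list
    glNewList(voxelList, GL_COMPILE);
    glColor3f(1.0f, 1.0f, 1.0f);        // flat white: only the silhouette matters

    static const GLfloat v[8][3] = {    // corners of a unit cube
        {0,0,0}, {1,0,0}, {1,1,0}, {0,1,0},
        {0,0,1}, {1,0,1}, {1,1,1}, {0,1,1}
    };
    static const int face[6][4] = {     // vertex indices of the six faces
        {0,1,2,3}, {4,5,6,7}, {0,1,5,4},
        {2,3,7,6}, {0,3,7,4}, {1,2,6,5}
    };
    glBegin(GL_QUADS);
    for (int f = 0; f < 6; ++f)
        for (int i = 0; i < 4; ++i)
            glVertex3fv(v[face[f][i]]);
    glEnd();
    glEndList();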
After drawing, the voxel's spatial position relative to the object is determined using the HComparator::compareData function.
The compareData function copies the buffer contents and compares them with the object silhouette, distinguishing three possible cases (see the figure below; a sketch of the function follows the list):
a) the voxel is entirely located within the object (code 1);
b) the voxel belongs to the border (code 2);
c) the voxel is entirely located beyond the object (code 0).
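A sketch of how compareData could classify the projection, assuming the silhouette is a binary QImage of the same size as the buffer (the signature is an assumption):

    int HComparator::compareData(const QImage& silhouette)
    {
        QImage proj = pixelBuffer->toImage();   // copy the off-screen buffer
        bool inside = false, outside = false;
        for (int y = 0; y < proj.height(); ++y)
            for (int x = 0; x < proj.width(); ++x) {
                if (qGray(proj.pixel(x, y)) == 0)
                    continue;                   // the voxel does not project here
                if (qGray(silhouette.pixel(x, y)) > 0)
                    inside = true;              // overlaps the object
                else
                    outside = true;             // falls on the background
            }
        if (inside && outside) return 2;        // b) on the border
        if (inside)            return 1;        // a) entirely within the object
        return 0;                               // c) entirely beyond the object
    }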
The set of views used to build the 3D model is processed sequentially by the HReconstruction::process function. We start from the assumption that every voxel belongs to the object. If a voxel is found to lie outside the object in any one of the views, its processing stops and it is removed from the model. Processing continues until all views have been considered; in the end, only the voxels belonging to the object model remain.
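A sketch of this carving loop, assuming the voxels vector built earlier, a comparator member holding the HComparator, and a silhouette() accessor on HFrame (all three names are illustrative):

    void HReconstruction::process(const std::vector<HFrame>& frames)
    {
        std::vector<Voxel>::iterator it = voxels.begin();
        while (it != voxels.end()) {
            bool outside = false;
            for (size_t i = 0; i < frames.size(); ++i) {
                comparator.render(*it, frames[i]);   // project the voxel
                if (comparator.compareData(frames[i].silhouette()) == 0) {
                    outside = true;                  // beyond the object:
                    break;                           // stop checking further views
                }
            }
            if (outside)
                it = voxels.erase(it);               // carve the voxel away
            else
                ++it;                                // keep it, move on
        }
    }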
To match a voxel against the object silhouette, the projection parameters must be known. They are defined by the GL_PROJECTION and GL_MODELVIEW matrices (see the setGL function).
The GL_PROJECTION matrix is determined by the camera parameters, specifically the focal length and the image size (the HFrame::loadIntrisicParameters function).
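A possible implementation, assuming a pinhole camera with the principal point at the image centre; the focal length is in pixels, and the near/far planes are arbitrary assumptions:

    void HFrame::loadIntrisicParameters(double focal, int width, int height)
    {
        const double zNear = 0.01, zFar = 100.0;    // clipping range (assumption)
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        // The image spans [-w/2, w/2] x [-h/2, h/2] pixels at a distance of
        // 'focal' pixels from the optical centre, rescaled onto the near plane.
        glFrustum(-0.5 * width  * zNear / focal, 0.5 * width  * zNear / focal,
                  -0.5 * height * zNear / focal, 0.5 * height * zNear / focal,
                  zNear, zFar);
    }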
The camera's 3D position can be determined using an augmented-reality marker; we use the aruco library for this. The marker is a special image printed on a sheet of paper (see the picture below).
When shooting the object, the marker must remain motionless and be visible in the camera's field of view in every photo of the object.
The library detects the marker's control points and then, using the camera focal length, calculates the marker's 3D position (rvec and tvec).
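In aruco 1.x the detection step looks roughly like this; the calibration file name and the marker size are assumptions, image is the input photo as a cv::Mat, and the exact API varies between aruco versions:

    #include <aruco/aruco.h>

    aruco::CameraParameters camera;
    camera.readFromXMLFile("camera.yml");    // focal length and image size (assumed file)
    aruco::MarkerDetector detector;
    std::vector<aruco::Marker> markers;
    float markerSizeMeters = 0.05f;          // printed marker side length (assumption)
    detector.detect(image, markers, camera, markerSizeMeters);
    if (!markers.empty()) {
        cv::Mat rvec = markers[0].Rvec;      // rotation as a Rodrigues vector
        cv::Mat tvec = markers[0].Tvec;      // translation
    }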
The rvec and tvec parameters determine the GL_MODELVIEW matrix (see the HFrame::loadExtrisicParameters function).
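A possible implementation, assuming rvec and tvec as returned by aruco; the scale by (1, -1, -1) converts OpenCV's camera convention (Y down, Z forward) to OpenGL's (Y up, Z toward the viewer):

    #include <opencv2/calib3d/calib3d.hpp>

    void HFrame::loadExtrisicParameters(const cv::Mat& rvec, const cv::Mat& tvec)
    {
        cv::Mat R, t;
        cv::Rodrigues(rvec, R);          // 3x1 rotation vector -> 3x3 matrix
        R.convertTo(R, CV_64F);
        tvec.convertTo(t, CV_64F);

        double m[16] = { 0.0 };          // column-major, as OpenGL expects
        for (int col = 0; col < 3; ++col)
            for (int row = 0; row < 3; ++row)
                m[4 * col + row] = R.at<double>(row, col);
        for (int row = 0; row < 3; ++row)
            m[12 + row] = t.at<double>(row);
        m[15] = 1.0;

        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        glScaled(1.0, -1.0, -1.0);       // OpenCV -> OpenGL axis convention
        glMultMatrixd(m);
    }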
Thus, we have learned how to project voxels onto the image plane, determine a voxel's position relative to the object's image, calculate the projection parameters, and decide whether a voxel belongs to the object volume by processing data from several cameras. Together this forms a simplified but complete technique for reconstructing 3D objects.
The source code can be downloaded here.