Date: Apr 25, 2022

What Does the Future Look Like? Self-supervised 3D Point Cloud Prediction
IGG-Blogpost Series | Working Group Photogrammetry


Most autonomous cars use 3D laser scanners, so-called LiDARs, to perceive the 3D world around them. A LiDAR generates local 3D point clouds of the scene around the car, and a typical sensor produces around 10 such point clouds per second. These point clouds are widely used for numerous robotics and autonomous driving tasks, such as localization, object detection, obstacle avoidance, mapping, scene interpretation, and trajectory prediction.


"For a lot of tasks, it would be great to know what the future might look like."

The ability to forecast what the sensor is likely to see in the future can enhance decision-making for an autonomous vehicle. A promising application is to use the predicted point clouds for path planning tasks like collision avoidance. In contrast to approaches that predict, for example, future 3D bounding boxes of traffic agents, point cloud prediction does not need any preceding inference steps such as localization, detection, or tracking to predict a future scene. If object-level output is needed, running an off-the-shelf detection and tracking system on the predicted point clouds yields future 3D object bounding boxes, as different researchers demonstrated for point cloud forecasting last year (Weng et al. at CoRL’20; Lu et al. via arXiv). From a machine learning perspective, point cloud prediction is an interesting problem because the next incoming LiDAR scans always provide the ground truth. This property makes it possible to train point cloud prediction in a self-supervised way, without expensive labeling, and to evaluate its performance online in unknown environments with only a small time delay.
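The self-supervision idea can be illustrated with a sliding window over a recorded scan stream: every run of past scans is paired with the scans that actually followed it, and those later scans serve as the training target. The sketch below is only an illustration under assumptions; the function name `make_training_pairs` and the parameters `n_past` and `n_future` are hypothetical and not taken from the published source code.

```python
from typing import List, Sequence, Tuple, TypeVar

Scan = TypeVar("Scan")  # placeholder for whatever type holds one LiDAR scan


def make_training_pairs(
    scans: Sequence[Scan], n_past: int, n_future: int
) -> List[Tuple[List[Scan], List[Scan]]]:
    """Pair each window of past scans with the scans that followed it.

    The future scans come straight from the sensor stream, so they act
    as ground truth without any manual labeling.
    """
    if n_past < 1 or n_future < 1:
        raise ValueError("n_past and n_future must be positive")

    window = n_past + n_future
    pairs = []
    # One sample per possible window start; empty if the stream is too short.
    for start in range(len(scans) - window + 1):
        past = list(scans[start:start + n_past])
        future = list(scans[start + n_past:start + window])
        pairs.append((past, future))
    return pairs
```

For example, a stream of 12 scans with `n_past=5` and `n_future=5` yields three overlapping training samples. The same pairing works online: once the next scans arrive, they can be compared against the earlier prediction.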



Given a sequence of past point clouds [red] at a time T, the goal is to predict the F future scans [blue] (© Photo: IGG / Photogrammetry).


In our recent work, presented at CoRL 2021 by Benedikt Mersch and with source code available, we address the problem of predicting large, unordered future point clouds from a given sequence of past scans. High-dimensional and sparse 3D point cloud data make point cloud prediction a challenging problem that is not yet fully explored. A future point cloud can be estimated either by applying a predicted future scene flow to the last received scan or by generating a new set of future points. Mersch et al. focus on the latter, generating new point clouds to predict the future scene.

In contrast to existing approaches, which exploit recurrent neural networks for modeling temporal correspondences, we use 3D convolutions to jointly encode spatial and temporal information. Our proposed approach takes a new 3D representation based on concatenated range images as input. For multiple future time steps, it jointly estimates a future range image and a per-point score indicating whether each point is valid or invalid. The method preserves structural details of the environment through skip connections, maintains horizontal consistency through circular padding, and provides more accurate predictions than other state-of-the-art approaches for point cloud prediction.
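To make the range image representation more concrete, the following sketch projects a point cloud onto a 2D grid of range values using a common spherical projection, and then pads the grid circularly along the horizontal axis, since the left and right image borders are neighbors in a 360° scan. It is a minimal illustration under assumptions: the function names, the field-of-view parameters, and the use of -1.0 to mark empty pixels are hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_range_image(
    points: Sequence[Point],
    height: int,
    width: int,
    fov_up_deg: float,
    fov_down_deg: float,
) -> List[List[float]]:
    """Spherical projection of 3D points into a height x width range image.

    Pixels that receive no point keep the value -1.0 (assumed convention).
    If several points fall into one pixel, the closest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)  # negative for below the horizon
    fov = fov_up - fov_down

    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # the sensor origin has no defined direction
        yaw = math.atan2(y, x)
        pitch = math.asin(z / r)
        u = int(0.5 * (1.0 - yaw / math.pi) * width)          # column
        v = int((1.0 - (pitch - fov_down) / fov) * height)    # row
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)
        if image[v][u] < 0.0 or r < image[v][u]:
            image[v][u] = r
    return image


def circular_pad_columns(image: List[List[float]], pad: int) -> List[List[float]]:
    """Pad each row with columns wrapped around from the opposite border."""
    if pad == 0:
        return [list(row) for row in image]
    return [list(row[-pad:]) + list(row) + list(row[:pad]) for row in image]
```

Stacking several such range images along a time axis gives a 3D tensor of past scans that 3D convolutions can process jointly in space and time. Padding only the horizontal axis circularly reflects the fact that azimuth wraps around, whereas elevation does not.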



Current point cloud at time T (top right) and the predicted next 5 future point clouds. Ground truth points at the corresponding time step are shown in red and predicted points in blue (© Photo: IGG / Photogrammetry).


This approach allows for predicting detailed future point clouds of varying sizes with a reduced number of parameters to optimize, resulting in faster training and inference times. Furthermore, the approach is fully self-supervised and does not require any manual labeling of the data. In sum, the approach predicts a sequence of detailed future 3D point clouds from a given input sequence through fast joint spatio-temporal point cloud processing with temporal 3D convolutional networks, outperforms state-of-the-art point cloud prediction approaches, generalizes well to unseen environments, and operates online faster than the frame rate of a typical rotating 3D LiDAR sensor.
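One way to see how a fixed-size network output can yield point clouds of varying sizes is to back-project only those range image pixels whose validity score is high enough. The sketch below assumes a spherical projection with pixel-center angles, a hypothetical function name `range_image_to_points`, and an assumed score threshold of 0.5; none of these details are confirmed by the text above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def range_image_to_points(
    ranges: List[List[float]],
    valid_scores: List[List[float]],
    fov_up_deg: float,
    fov_down_deg: float,
    threshold: float = 0.5,  # assumed cut-off for "valid"
) -> List[Point]:
    """Back-project a range image into 3D, keeping only valid pixels."""
    height, width = len(ranges), len(ranges[0])
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down

    points = []
    for v in range(height):
        # Elevation angle at the center of row v (top row looks upward).
        pitch = fov_up - (v + 0.5) / height * fov
        for u in range(width):
            if valid_scores[v][u] <= threshold:
                continue  # pixel predicted as invalid: no point emitted
            r = ranges[v][u]
            # Azimuth at the center of column u, from +pi down to -pi.
            yaw = math.pi * (1.0 - 2.0 * (u + 0.5) / width)
            x = r * math.cos(pitch) * math.cos(yaw)
            y = r * math.cos(pitch) * math.sin(yaw)
            z = r * math.sin(pitch)
            points.append((x, y, z))
    return points
```

Because the number of pixels that pass the threshold differs from frame to frame, the resulting point cloud size varies even though the range image resolution stays fixed.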



The CNN-based approach to point cloud forecasting (© Photo: IGG / Photogrammetry).



Further reading:

Source Code:

