LiDAR Annotation and Its Application to Autonomous Driving
Input data is critical to the decision-making process of autonomous vehicles: the more detailed the data, the better (and, more importantly, the safer) the vehicle's decisions. Although today's cameras can produce highly detailed representations of the world, their output is only 2D, which limits the information they can feed a vehicle's neural networks and forces those networks to learn assumptions about the 3D environment. This is where LiDAR annotation enters the scene for enabling automated functions in driverless cars.
For autonomous driving, LiDAR systems offer an advantage over cameras in conditions such as rain, where camera images can become useless while LiDAR still captures usable information; cameras simply may not work in every environment or under every circumstance. Because autonomous vehicles are a high-impact, high-risk application of neural networks, we need to make them as robust as possible, and that begins with the data. Since the network must make predictions about a 3D world, we would like 3D data as input. It is here that LiDAR comes into play.
Our goal in this blog post is to provide you with a thorough introduction to LiDAR technology and its networks in an easy-to-understand manner. Following are the key concepts you will understand by the end:
- Understanding the Basics of LiDAR
- Use of LiDAR Data in Deep Learning
- LiDAR Data Annotation Process
Understanding the Basics of LiDAR
LiDAR measures the distances and dimensions between a sensor and a target object using light pulses in the form of lasers. It has been in use since the 1960s, when it was first deployed to scan the terrain airplanes fly over. With the advent of GPS in the 1980s, LiDAR began being used to create 3D models of real-life locations, which made it much more popular. Today, as autonomous vehicles become more common, LiDAR sensors are used to detect and pinpoint objects such as other cars, pedestrians, and buildings.
Modern LiDAR systems are often capable of sending 500k pulses per second. The measurements resulting from these pulses are assembled into a point cloud: a set of coordinates corresponding to the objects the system has detected. Point cloud labeling turns these clouds into 3D models of the area around the LiDAR, optimizing the data for a self-driving vehicle's computer vision system.
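Each raw LiDAR return is essentially a measured range plus the angles of the outgoing beam. A minimal sketch of turning such returns into Cartesian point-cloud coordinates (the function name and angle conventions here are illustrative assumptions, not a specific sensor's API):

```python
import numpy as np

def returns_to_point_cloud(ranges, azimuths, elevations):
    """Convert raw LiDAR returns (range, azimuth, elevation in radians)
    into Cartesian (x, y, z) points -- one row per laser pulse."""
    x = ranges * np.cos(elevations) * np.cos(azimuths)
    y = ranges * np.cos(elevations) * np.sin(azimuths)
    z = ranges * np.sin(elevations)
    return np.stack([x, y, z], axis=1)

# A pulse fired straight ahead (azimuth 0, elevation 0) that measures
# a 10 m range lands at (10, 0, 0) in the sensor's frame.
cloud = returns_to_point_cloud(
    np.array([10.0]), np.array([0.0]), np.array([0.0]))
```

Stacking hundreds of thousands of such rows per second is what produces the dense point clouds described above.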
Four key elements define most LiDAR systems:
- Laser: sends light pulses (ultraviolet or infrared) toward target objects.
- Scanner: determines the speed at which the laser scans target objects and the maximum distance it can reach.
- Sensor: measures the time it takes each pulse to bounce off a target object and return (and thus the distance).
- GPS: tracks the location of the LiDAR system so its distance measurements can be mapped accurately.
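The sensor's distance measurement in the list above is simple time-of-flight arithmetic: the pulse travels to the target and back at the speed of light, so the distance is half the round-trip path. A small sketch (the function name is my own):

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def distance_from_round_trip(t_seconds):
    """Distance to a target from the round-trip time of a light pulse.
    The pulse travels out and back, so we halve the total path length."""
    return SPEED_OF_LIGHT * t_seconds / 2.0

# A round trip of about 333.6 nanoseconds corresponds to roughly 50 m.
d = distance_from_round_trip(333.6e-9)
```

The tiny time scales involved are why LiDAR sensors need very precise timing electronics.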
A LiDAR system can be either airborne or terrestrial. We will primarily focus on terrestrial LiDAR, since our use case involves autonomous vehicles. A terrestrial LiDAR is attached to an object fixed to the ground and scans its surroundings in all directions. Such systems can be static (for example, mounted on a tripod or a building) or mobile (for example, mounted on a car).
Use of LiDAR Data in Deep Learning
LiDAR systems generate the type of output that neural networks are well suited for, and indeed, neural networks are effective when operating on point clouds. In terms of autonomous vehicle applications, LiDAR point clouds can be categorized into two groups:
- Perception and analysis of the environment in real-time for the detection of objects and the understanding of scenes.
- Map generation and urban model generation for object referencing and localization.
With LiDAR data we perform the familiar tasks of semantic segmentation, object detection and localization, and object classification, but in 3D, which lets us do them with more nuance.
Four families of architectures have been proposed to deal with LiDAR data:
Point cloud-based methods
These operate on the point cloud directly, using various approaches. A typical one applies a shared MLP to learn the spatial features of each point and then accumulates them with max pooling.
Voxel-based methods
A CNN-like architecture applies 3D convolution and pooling to voxels (basically small cubes) into which the 3D data has been divided.
Graph-based methods
The point cloud is constructed into a graph using its inherent geometry, and then GNN architectures such as graph CNNs and graph attention networks (which are also permutation-invariant) are applied.
View-based methods
These rely on creating a 2D projection of the point cloud so that tried and tested architectures from 2D computer vision can be applied. A tactic that can help improve model performance here is to set up multiple projections from various angles and aggregate them into the final prediction.
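The point cloud-based family can be sketched in a few lines of NumPy: the same small MLP (weights shared across all points) produces a feature vector per point, and max pooling collapses them into one global descriptor for the whole cloud. The layer sizes and random weights below are arbitrary illustrations, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy point cloud: 128 points with (x, y, z) coordinates.
points = rng.normal(size=(128, 3))

# "Shared MLP": the same weight matrices are applied to every point
# independently (one hidden ReLU layer; sizes are purely illustrative).
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def per_point_features(pts):
    h = np.maximum(pts @ W1 + b1, 0.0)  # per-point hidden features
    return h @ W2 + b2                  # per-point output features

# Max pooling over the point axis aggregates the per-point features
# into a single global descriptor, regardless of how many points there are.
global_feature = per_point_features(points).max(axis=0)
```

A classifier or segmentation head would then operate on this global descriptor (or on its concatenation with the per-point features).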
LiDAR Data and Challenges for Neural Network Operation
Neural networks operating on LiDAR data face many challenges, because so many variables are involved: scanning times, weather conditions, sensor types, measurement distances, and many other factors. Due to the way LiDAR functions, the density and intensity of the points on target objects vary considerably.
A neural network working with LiDAR data needs to handle a great deal of variation, often caused by sensor noise and incomplete LiDAR returns (due to factors like the low surface reflectivity of certain materials and cluttered backgrounds in cities).
There is additionally a problem specific to 3D data: unlike the pixels of a 2D image, LiDAR points have no intrinsic order, so our model must be permutation- and orientation-invariant, which not all architectures achieve.
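Permutation invariance is easy to check concretely: reducing over the point axis with a symmetric function such as max gives the same result no matter how the points are shuffled, whereas flattening the points into one long vector ties each value to a position and breaks under reordering. A small demonstration:

```python
import numpy as np

rng = np.random.default_rng(42)
features = rng.normal(size=(100, 8))  # toy per-point feature vectors

# Max pooling reduces over the point axis with a symmetric function,
# so the result cannot depend on how the points are ordered.
pooled = features.max(axis=0)
shuffled = features[rng.permutation(len(features))]
assert np.allclose(pooled, shuffled.max(axis=0))

# Flattening instead (features.reshape(-1)) would tie each value to a
# position in one long vector, so reordering the points would change
# the input entirely -- the property we must avoid relying on.
```

This is exactly why point cloud-based and graph-based architectures build their aggregation steps out of symmetric operations.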
LiDAR Data Annotation Process
The tasks currently performed most frequently on LiDAR data are semantic segmentation, object detection, and classification. Annotating LiDAR data for these tasks is very similar to annotating images in general. LiDAR data is annotated by humans as much as possible, but because the process is complex and potentially confusing, companies also use pre-trained networks to automate parts of it.
Annotating 3D data may seem cumbersome simply because we are dealing with an extra dimension, but in practice it is not so different: semantic segmentation and object classification work essentially the same way in 3D as in 2D, just on points instead of pixels. 3D object detection adds only one requirement beyond the object's location: its orientation, which is simply the direction the object faces.
As one can see in the diagram, annotating LiDAR data isn't really harder because of its 3D nature. Generally, LiDAR data just takes a little longer to annotate than regular images, especially if you're not used to it.
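A 3D detection label is commonly stored as a box center, its dimensions, and a heading angle (yaw). A minimal sketch, with made-up values and an illustrative corner ordering, of expanding such a label into the eight box corners an annotation tool would draw:

```python
import numpy as np

def box_corners(center, size, yaw):
    """Eight corners of a 3D bounding box given its center (cx, cy, cz),
    size (length, width, height), and heading angle yaw in radians."""
    l, w, h = size
    # Corners of the box in its own frame, centered at the origin.
    xs = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    ys = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    zs = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * h / 2
    corners = np.stack([xs, ys, zs], axis=1)
    # Rotate around the vertical axis by the heading, then translate.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return corners @ rot.T + np.asarray(center)

# A car-sized box 10 m ahead, facing straight forward (yaw = 0).
corners = box_corners(center=(10.0, 2.0, 0.0), size=(4.5, 1.8, 1.5), yaw=0.0)
```

The annotator effectively chooses the center, size, and yaw; everything else about the box follows from this arithmetic.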
In simple terms, LiDAR constructs a 3D image of the sensor's surroundings using laser pulses and sensors, a technology that dates back to the 1960s. For autonomous vehicles, LiDAR data is typically processed with neural networks, in some cases common architectures tweaked to the needs of point cloud data. Although LiDAR point clouds differ in format from 2D images, LiDAR annotation is not so different from image annotation, and it is key to autonomous vehicles' functionality.