Paper

  • A lane detection paper submitted to CVPR 2022.

Abstract


In this paper, we propose an advanced approach to the problem of monocular 3D lane detection by leveraging the geometry structure underlying the process of 2D-to-3D lane reconstruction.

Inspired by previous methods, we first analyze the geometry heuristic between the 3D lane and its 2D representation on the ground and propose to impose explicit supervision based on the structure prior, which makes it achievable to build inter-lane and intra-lane relationships to facilitate the reconstruction of 3D lanes from local to global.

Second, to reduce the structure loss in 2D lane representation, we directly extract top view lane information from front view images, which tremendously eases the confusion of distant lane features in previous methods.

Furthermore, we propose a novel task-specific data augmentation method by synthesizing new training data for both segmentation and reconstruction tasks in our pipeline, to counter the imbalanced data distribution of camera pose and ground slope to improve generalization on unseen data.

Our work marks the first attempt to employ the geometry prior information in DNN-based 3D lane detection and makes it achievable to detect lanes at an extra-long distance, doubling the original detection range. The proposed method can be smoothly adopted by other frameworks without extra costs. Experimental results show that our work outperforms state-of-the-art approaches by 3.8% F-score on the Apollo 3D synthetic dataset at a real-time speed of 82 FPS without introducing extra parameters.

  1. Analyze and apply the geometry-based heuristic between 2D and 3D space to facilitate 3D lane reconstruction.
  2. Extract lane information directly from front-view images to improve detection accuracy for distant lanes.
  3. Apply a new data augmentation method that synthesizes data, improving generalization over camera pose and ground slope.

Introduction


In this paper, we consider 3D lane detection as a reconstruction problem from the 2D image to the 3D space.

We propose that the geometry prior of 3D lanes should be explicitly imposed during training to fully utilize the structural constraints of the inter-lane and intra-lane relationships, and that the height information of 3D lanes can be extracted from the 2D lane representation.

We first analyze the geometry relationship between 3D lane and its 2D representation, and propose an auxiliary loss function based on the geometry structure prior. We also demonstrate that the explicit geometry supervision would boost noise elimination, outlier rejection, and structure preservation for 3D lanes.

  1. Proposes a way to extract 3D height information from 2D image lane information.
  2. Analyzes the geometric relationship between real lanes and their 2D image representation, and improves accuracy with an auxiliary loss function.
  • Example: 2D lane detection without 3D height information

    Figure 1

    • An example of 2D lane detection without height information (accuracy degrades in environments such as uphill or downhill roads)

Second, in order to reduce the structural information loss in the 2D plane, we redefine the pipeline by conducting lane segmentation with top view supervision instead of front view supervision, which addresses the issue of feature confusion due to perspective distortion on the far side.

Lastly, we propose a novel task-specific data augmentation method on 3D lanes, which synthesizes new data by applying pitch, roll and yaw rotation on the original data.

This augmentation could generate new data with various 3D ground plane steepness and road structure patterns, which eases the imbalanced distribution of ground plane slope and camera pose. Figure 1 shows the proposed framework, and a more detailed pipeline is shown in Figure 2.

  1. A pipeline that performs lane segmentation in the BEV (top) view
  2. A new data augmentation method that synthesizes pitch, roll, and yaw rotations of the original data

Figure 2

Method


The proposed method takes a single RGB image from the front-view camera, and outputs a group of lane instances in the 3D world space. Following the basic assumption in the previous literature [12, 13] and the existing dataset [13], we assume that the camera is installed with zero roll and yaw with respect to the world coordinate, and only has pitch variance due to vehicle fluctuation.

We establish the world coordinate as the ego-vehicle coordinate, with its origin at the perpendicular projection of the camera center on the road. Figure 4 shows the world coordinate center at point O and the camera center at point C, with camera pitch θ.
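The camera setup above (a height above the origin O and a pitch-only rotation) can be sketched as a world-to-camera transform. The axis convention (x right, y forward, z up) and the matrix layout below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def camera_extrinsic(height: float, pitch: float) -> np.ndarray:
    """Build a 4x4 world-to-camera transform assuming zero roll and yaw.

    World frame (assumed convention): origin O at the perpendicular
    projection of the camera center C on the road, z pointing up.
    `height` is the camera height above O; `pitch` is the camera tilt.
    """
    c, s = np.cos(pitch), np.sin(pitch)
    # Rotation about the x-axis only: pitch is the sole degree of freedom.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, -s],
                  [0.0, s, c]])
    T = np.eye(4)
    T[:3, :3] = R
    # Translate so the camera center (0, 0, height) maps to the camera origin.
    T[:3, 3] = R @ np.array([0.0, 0.0, -height])
    return T
```

Applying this transform to the camera center in world coordinates yields the camera-frame origin, which is a quick sanity check of the construction.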

Figure 3

Geometry in 3D Lane Detection


In most cases, lanes in 3D space are formed by a group of smooth parallel curves on the 3D road surface [7]. Instead of simply predicting discrete points of lane markings, an accurate reconstruction of 3D lanes should include the re-establishment of the reasonable geometry structure in 3D space.

However, most of the existing DNN-based methods choose to achieve 3D lane detection in a data-driven manner with only point-wise supervision, which may not result in robust preservation of 3D lane geometry and would be vulnerable to outliers under extreme lane structures because of the absence of structural guidance.

As a result, the geometry prior should be utilized to explicitly guide the learning of 3D lanes. We will first review the view projection systems proposed in previous literature and then analyze the structure prior under the existing projection system.

  1. Lanes in 3D space are formed by groups of curves on the road surface, so prediction should be based on a reasonable structure in 3D space rather than on discrete points alone.
  2. Existing deep-learning methods predict in a point-wise, data-driven manner, which makes them vulnerable to extreme lane structures.

View Projection


In 3D-LaneNet [12], a real top view projection is utilized for creating lane anchors on the flat ground. Real top view stands for the direct vertical projection from 3D space to the ground plane $g$.

In this case, for a point $P_{3D}(x_{3D}, y_{3D}, z)$ in 3D space, the height dimension is simply discarded, and the point is projected onto the ground plane position of $P_{2DR}(x_{3D},y_{3D})$ under real top view projection with center at vehicle ego coordinate.

However, as shown in the top row of Figure 3, such 2D lane representation cannot properly reflect the change of lane height in 3D space, thus a heavy image feature encoder in [12] is necessary to estimate lane height from images.

Gen-LaneNet [13] then introduces the virtual top view projection. As shown in the bottom row of Figure 3, 3D lanes are projected onto the ground plane via the virtual top view projection, which is obtained from rays starting at the camera center $C$ and ending on the ground $g$.

This is conceptually equivalent to 1) projecting the 3D lanes onto the image plane, and then 2) projecting the image and lanes onto the ground plane by IPM. In this case, the 2D representation of lanes is no longer agnostic of height variance.

Equation 1 shows the transformation of lane point coordinates from point $P_{3D}$ in 3D to point $P_{2DV}$ on the ground plane via virtual top view projection. With the increment of lane height $z$, the $x$ and $y$ of lane points on ground plane $g$ would be projected away from the positions on real top view, causing the lane boundaries to be divergent in the uphill scenario.

Figure 4

  1. (Top) On terrain such as uphill or downhill roads, the real top view projection makes a heavy image feature encoder, as proposed in [12], necessary.
  2. (Bottom) With the virtual top view projection applied, 3D lanes are projected onto the ground plane; this is conceptually equivalent to 3D lane → image plane → IPM.

Figure 5

  1. The equation that projects a 3D lane onto the 2D virtual plane. As the lane height ($z$) increases, lane points ($x$, $y$) on the ground plane ($g$) are projected farther away than their real top view positions, so lane boundaries diverge in uphill scenarios.
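The projection itself appears only as a figure in this note; assuming $h$ denotes the camera height, a form consistent with the description above (and with Gen-LaneNet's virtual top view) would be:

```latex
x_{2DV} = \frac{h}{h - z}\, x_{3D}, \qquad
y_{2DV} = \frac{h}{h - z}\, y_{3D}
```

For $0 < z < h$ the scale factor $h/(h-z)$ exceeds 1, so uphill points are pushed outward relative to the real top view, which is exactly the boundary divergence described above.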

Geometry prior


Parallel lane boundaries and constant lane width are basic assumptions for nearly all lane-based applications such as lane centering assistant (LCA) and lane keeping assistant (LKA). Instead of making a strong assumption that multiple lanes in one frame share a global lane width [17, 25], it is common to assume that a single lane would keep a relatively fixed width as it extends to infinity.

For flat ground cases, the 2D lane representation projected via the virtual top view would be parallel and have constant lane width. In reality, non-flat ground cases are not rare, such as twisted lanes on a helicoidal surface. In such cases, parallelism of the projected lane boundaries is not satisfied. Thus, the basic assumption of parallelism and constant width [7] can only be established in 3D space.

Moreover, as shown in Equation 1, under virtual top view projection, lane boundaries in 3D space would be mapped onto the flat ground in different scales w.r.t. height information. Conceptually, the curvature of mapped lane boundaries is positively correlated to the lane slope in 3D space. That is to say, the width change in 2D projection could reflect the variance of lane height in 3D.

This provides basic theoretical intuition of how the height information in 3D can be reconstructed from a monocular 2D image, and the geometry structure in the 2D lane mask can be the guidance for estimating the 3D lanes in the real world.

  1. That lanes share a constant width is a basic assumption of common ADAS applications; that is, a single lane is assumed to keep a relatively fixed width as it extends to infinity.
  2. On flat ground, 2D lanes projected via the virtual top view are parallel with constant width. In reality, non-parallel lanes exist, such as twisted lanes on a helicoidal surface. → Therefore the basic assumption can only be established in 3D space.
  3. As shown in Equation 1, under the virtual top view projection, lane boundaries in 3D space are mapped at different scales with respect to height. Conceptually, the curvature of the mapped boundaries is positively correlated with the lane slope in 3D, so the width change in the 2D projection can reflect the change of lane height in 3D.
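The last point can be made concrete with a short derivation (a sketch under the stated assumptions: both boundary points of a pair share the same height $z$, and $h$ is the camera height). If the true lane width is $w_{3D}$ and the projected width on the virtual top view is $w_{2DV}$, the scale factor of Equation 1 applies to the width as well:

```latex
w_{2DV} = \frac{h}{h - z}\, w_{3D}
\quad\Longrightarrow\quad
z = h\left(1 - \frac{w_{3D}}{w_{2DV}}\right)
```

so the observed width change in the 2D mask directly encodes the lane height, which is the theoretical intuition behind recovering height from a monocular image.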

Geometry prior guided supervision


As illustrated above, the encoding of geometry structure information plays a vital role in accurately reconstructing lanes in 3D space. Specifically, we focus on the intra-lane and inter-lane properties between 3D lanes and 2D representation under the virtual top view projection.

We propose the geometry prior loss, an auxiliary loss function to involve explicit supervision in the local-to-global preservation of 3D lane structure.

Figure 6

Figure 7

Figure 8

  1. Under the assumption that lanes keep a constant distance, points on the left and right boundaries are paired so that their 3D distance $D_{3D}$ is minimal.
  2. Each pair of points should keep a constant distance (expressed as magnitude in the paper) as $i$ increases.
  3. Here, $\hat{v}_i$ equals 1 when the point is visible in both 3D and 2D.
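Under these assumptions, the intra-lane width-constancy term might be sketched as follows. The pairing by minimal $D_{3D}$ is assumed to be done already, and the exact loss form (absolute difference of consecutive pair distances, masked by visibility) is a hypothetical stand-in for the paper's formulation:

```python
import numpy as np

def lane_width_loss(left: np.ndarray, right: np.ndarray, vis: np.ndarray) -> float:
    """Hypothetical sketch of the intra-lane geometry prior.

    left, right: (N, 2) boundary points, already matched so that pair i
    minimizes the point-to-point distance D_3D.
    vis: (N,) binary mask, 1 where the pair is visible in both 2D and 3D.
    Penalizes changes of lane width between consecutive visible pairs.
    """
    width = np.linalg.norm(left - right, axis=1)   # per-pair lane width
    both = vis[1:] * vis[:-1]                      # both consecutive pairs visible
    diff = np.abs(width[1:] - width[:-1]) * both   # width should stay constant
    return float(diff.sum() / max(both.sum(), 1))
```

Parallel boundaries with constant width give zero loss; a widening or narrowing lane is penalized in proportion to the width change.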

Figure 9

Figure 10

  1. By the camera projection principle, the distance in 2D can be computed from the camera height and the lane height.
  2. If the lanes lie on the same 3D plane, points $i$ and $i+1$ have the same value in 2D.

Figure 11

  1. Considering that lanes are locally straight, the loss is designed so that the deviation of point $i$ from its neighbors $(i-1, i+1)$ is minimized.

3.3. Lane Augmentation in 3D Space


Figure 12

  1. Apply homogeneous transformations for flat, uphill, and downhill roads, reflecting environments similar to real roads
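The augmentation can be sketched as rotating the 3D lane points about the ego origin by pitch, roll, and yaw. The axis assignments and composition order below are illustrative assumptions; the paper applies the same idea via a homogeneous transformation:

```python
import numpy as np

def rotate_lanes(points: np.ndarray, pitch: float = 0.0,
                 roll: float = 0.0, yaw: float = 0.0) -> np.ndarray:
    """Sketch of the 3D lane augmentation: rotate lane points (N, 3).

    Assumed convention: pitch about x, roll about y, yaw about z,
    composed as Rz @ Ry @ Rx. Rotating a flat road by a positive pitch
    synthesizes an uphill scene; roll tilts the ground plane sideways.
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(roll), np.sin(roll)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return points @ (Rz @ Ry @ Rx).T
```

For example, a positive pitch lifts points ahead of the vehicle (positive y) to a positive height z, turning a flat-ground sample into an uphill one.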

Figure 13

Figure 14

  • GS: Geometry supervision → the geometric constraint terms
  • TVS: Top view supervision → top view regression applied
  • Aug: 3D lane data augmentation → homogeneous transformations applied during training

3.4. Training


Figure 15

Figure 16

  1. Optimize the segmentation loss plus the camera pose loss (from the front view).
  2. Apply the anchor loss based on 3D-LaneNet / Gen-LaneNet.

Figure 17

  1. The methods designed so far are combined in a balanced way, in the form of a feature extraction loss plus a 3D lane reconstruction loss.
  2. $\lambda$ is a variable that balances the relative weight of each loss term.
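The balanced objective described above can be sketched as a weighted sum; the grouping and the default weights are illustrative assumptions, not the paper's exact formulation:

```python
def total_loss(seg: float, pose: float, anchor: float, geo: float,
               lam_geo: float = 1.0, lam_pose: float = 1.0) -> float:
    """Sketch of the overall objective: front-view segmentation and
    camera-pose losses, the anchor-based 3D lane loss, and the geometry
    prior term, with lambda weights balancing the terms."""
    return seg + lam_pose * pose + anchor + lam_geo * geo
```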

4.2. Evaluation Result


Figure 18

  • (First) 3D-LaneNet, (Second) Gen-LaneNet, (Third) the proposed method