Inspur Information AI Team Sets Best Performance in Object Detection on nuScenes Autonomous Driving Dataset | Tech Rasta


San Jose, Calif.–(Business Wire)–Inspur Information, a leading IT infrastructure solutions provider, participated in the latest evaluation of the globally recognized autonomous driving dataset from nuScenes. The Inspur Information AI team won first place in the vision track of the 3D detection task (nuScenes Detection Task), increasing the key indicator nuScenes Detection Score (NDS) to 62.4%.

Autonomous driving will completely transform the transportation industry and is a major focus for automotive manufacturers and AI companies. Object detection is at the core of autonomous driving technology, and AI research teams are constantly improving the accuracy and stability of its algorithms. The nuScenes dataset is one of the most respected public datasets in the field of autonomous driving. Its data was collected from real autonomous driving scenarios in Boston and Singapore, and it was the first dataset to integrate multiple sensors, such as cameras, LiDAR and millimeter-wave radar, to achieve complete sensor coverage around the vehicle. The nuScenes dataset provides rich annotation information, including 2D and 3D object annotations, LiDAR point cloud segmentation and high-precision maps. It contains 1,000 scenes, 1.4 million camera images, 390,000 LiDAR sweeps, 23 object classes and 1.4 million 3D bounding boxes, with 7 times as much annotated data as the KITTI dataset.

The Inspur Information AI team participated in the vision track of the detection task. It is a highly competitive track, attracting top AI teams from around the world such as Baidu, Carnegie Mellon University, Hong Kong University of Science and Technology, MIT, Tsinghua University and University of California, Berkeley.

The vision track of the 3D detection task permits only the vehicle's 6 cameras to be used to provide full 3D object detection coverage around the vehicle; additional sensor information such as LiDAR or millimeter-wave radar is not allowed. Objects to be detected include vehicles, pedestrians, obstacles, traffic signals, traffic lights and other object types. Beyond detecting each object, the model must accurately estimate its position, size, orientation, velocity and other attributes. The most challenging aspect is accurately recovering the true depth and velocity of targets from 2D images. If the estimated depth is inaccurate, 3D perception tasks become very difficult; if the estimated velocity is inaccurate, it can lead to dangerous decision-making and planning.
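
For context, the NDS figure quoted in this release is the composite score defined by the nuScenes detection benchmark: mAP is weighted by 5, and one score is added for each of five true-positive error metrics covering the attributes listed above (translation, scale, orientation, velocity and attribute error, reported as mATE, mASE, mAOE, mAVE and mAAE). Each error is clipped at 1, and the total is divided by 10. The sketch below implements that published formula; the function name is illustrative, and the input values in the usage lines are hypothetical, not Inspur's actual sub-metrics.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """Compute NDS = (5 * mAP + sum(1 - min(1, err))) / 10.

    tp_errors must hold exactly the five nuScenes true-positive error metrics:
    mATE (metres), mASE (1 - IoU), mAOE (radians), mAVE (m/s), mAAE (1 - accuracy).
    """
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected TP metrics {sorted(expected)}, got {sorted(tp_errors)}")
    # Clip each error at 1 so a single very poor metric cannot drive its score below 0.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries half of the total weight (5 out of 10).
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs: note that an mAVE of 1.2 m/s is clipped and contributes 0.
score = nuscenes_detection_score(
    0.50,
    {"mATE": 0.50, "mASE": 0.25, "mAOE": 0.40, "mAVE": 1.20, "mAAE": 0.15},
)
print(round(score, 3))  # 0.52
```

Because mAP accounts for only half of the score, better depth (which lowers mATE) and better velocity estimates (which lower mAVE) raise NDS directly, which is why the depth and velocity challenges described above matter for this metric.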

The Inspur Information AI team has developed an innovative multi-camera spatial-temporal fusion model architecture (Inspur_DABNet4D). Built on a technical framework that converts multi-view visual input into a unified BEV (Bird’s Eye View) feature space, the architecture uses data and model enhancement, a depth-enhancement network, a spatio-temporal fusion network and other techniques to produce more robust and accurate BEV features, while object supervision greatly improves the estimation of velocity, displacement and orientation.
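
The release does not describe how Inspur_DABNet4D performs the camera-to-BEV conversion. As a generic illustration of the underlying geometry only, the sketch below follows the widely used "lift-splat" idea: back-project a pixel with an estimated depth through a pinhole camera (intrinsics fx, fy, cx, cy), move it into the ego-vehicle frame with a camera-to-ego rotation R and translation t, and sum-pool its feature vector into a BEV grid cell. All function names, grid ranges and camera parameters here are hypothetical.

```python
import math


def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres along the optical axis) into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def camera_to_ego(p_cam, R, t):
    """Rigid camera-to-ego transform p_ego = R @ p_cam + t, with R given as three row tuples."""
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))


def bev_cell(p_ego, x_min=-50.0, y_min=-50.0, resolution=0.5, size=200):
    """Return the (row, col) BEV cell containing p_ego, or None if it lies outside the grid."""
    col = math.floor((p_ego[0] - x_min) / resolution)  # ego x (forward) -> column
    row = math.floor((p_ego[1] - y_min) / resolution)  # ego y (left) -> row
    if 0 <= row < size and 0 <= col < size:
        return (row, col)
    return None


def splat(points_ego, features, **grid):
    """Sum-pool each point's feature vector into its BEV cell; out-of-range points are dropped."""
    bev = {}
    for p, feat in zip(points_ego, features):
        cell = bev_cell(p, **grid)
        if cell is None:
            continue
        if cell in bev:
            bev[cell] = [a + b for a, b in zip(bev[cell], feat)]
        else:
            bev[cell] = list(feat)
    return bev


# Hypothetical front camera: optical axis along ego +x, image right along ego -y,
# image down along ego -z, mounted 1.5 m ahead of the ego origin and 1.6 m up.
R_FRONT = ((0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0))
T_FRONT = (1.5, 0.0, 1.6)
INTRINSICS = dict(fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)

center = camera_to_ego(pixel_to_camera(800, 450, 20.0, **INTRINSICS), R_FRONT, T_FRONT)
right = camera_to_ego(pixel_to_camera(1300, 450, 20.0, **INTRINSICS), R_FRONT, T_FRONT)
bev = splat([center, center, right], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# The principal-point pixel lands in cell (100, 143); the pixel 500 px to its right lands in (80, 143).
```

Real lift-splat-style models predict a distribution over many depth bins for each pixel and splat every bin weighted by its probability. A single depth value is used here only to keep the geometry easy to follow, and it also shows why depth accuracy matters: an error in depth moves the feature to the wrong BEV cell.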

The multi-camera-based spatial-temporal fusion model architecture achieved four major technological advances.

  • First, a rich data-sample augmentation algorithm maps the ground truth to real 3D physical coordinates and extends augmentation across the time series, which significantly improves object detection accuracy: mAP (mean Average Precision) increased by an average of 2%.

  • Second, a more powerful depth-enhancement network strengthens depth information, which is difficult to learn and model. Depth prediction is greatly improved through an optimized network architecture, training supervised by point cloud data, depth completion and other techniques.

  • Third, a more refined spatio-temporal fusion network further addresses the misalignment that arises when spatio-temporal information is fused while the vehicle is moving. It also randomly samples sweep frames to fuse with the current frame and applies synchronized augmentation to data samples from different frames. This allows the model to learn more refined temporal features through end-to-end learning.

  • Fourth, Inspur Information has created a more complete form of integrated modeling. A unified architecture with end-to-end feature extraction, fusion and a detection head models driving scenes with a wide field of view and at large scale. The architecture is simple to build, efficient to train and broadly applicable across diverse scenarios. The pre-trained model can be swapped for a self-supervised model at any time, so testing and accuracy improvements can be completed quickly and conveniently.
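
The release gives no implementation details for the temporal alignment mentioned in the third point. A common way to remove the misalignment caused by vehicle motion is to use the two frames' ego poses to re-express the previous frame's BEV coordinates in the current frame's ego coordinates before fusing them. The sketch below shows only that step, in 2D (x, y, yaw); the class, function names and pose values are hypothetical, and it is not presented as Inspur's method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Ego pose in a fixed world frame: position (x, y) in metres, heading yaw in radians."""
    x: float
    y: float
    yaw: float


def ego_to_world(ex, ey, pose):
    """Rotate an ego-frame point by the ego heading, then translate by the ego position."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return (pose.x + c * ex - s * ey, pose.y + s * ex + c * ey)


def world_to_ego(wx, wy, pose):
    """Inverse of ego_to_world: translate back to the ego origin, then rotate by -yaw."""
    dx, dy = wx - pose.x, wy - pose.y
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def align_previous_to_current(points_prev, pose_prev, pose_curr):
    """Re-express (x, y) points from the previous ego frame in the current ego frame."""
    return [world_to_ego(*ego_to_world(x, y, pose_prev), pose_curr) for x, y in points_prev]


# Hypothetical poses: between frames the car drives 2 m straight ahead,
# or alternatively drives 2 m and ends up turned 90 degrees to the left.
prev_pose = Pose2D(0.0, 0.0, 0.0)
curr_pose = Pose2D(2.0, 0.0, 0.0)
turned_pose = Pose2D(2.0, 0.0, math.pi / 2)

static_obj_prev = [(10.0, 0.0)]  # a static object 10 m ahead in the previous frame
straight = align_previous_to_current(static_obj_prev, prev_pose, curr_pose)  # [(8.0, 0.0)]
turned = align_previous_to_current(static_obj_prev, prev_pose, turned_pose)  # approx. [(0.0, -8.0)]
```

Without this step, the static object would appear at two different BEV locations in consecutive frames, and the fused features would be smeared. In a full model, the same transform is typically used to resample the entire previous BEV feature map on a sampling grid rather than individual points.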

Thanks to more sophisticated algorithms and greater computing power, results on the nuScenes 3D object detection task improved greatly in 2022. At the start of the year, performance stood at 47% NDS; the Inspur Information AI team raised this key indicator to 62.4%, which is considered outstanding.

About Inspur Information

Inspur Information is a leading provider of data center infrastructure, cloud computing and AI solutions. It is the 2nd largest server manufacturer in the world. Through engineering and innovation, Inspur Information provides cutting-edge computing hardware design and extensive product offerings to address critical technology areas such as open computing, cloud data center, AI and deep learning. Performance-optimized and purpose-built, our world-class solutions empower customers to meet specific workloads and real-world challenges. To learn more, visit

