Training an autonomous vehicle to go from point A to point B without veering off into oblivion is a lot harder than it sounds (and it already sounds quite difficult). But after decades of advances in Machine Learning and Computer Vision, we’re now at a point in history where openly available datasets for training highly effective autonomous drivers are at our fingertips.

One such type of data is LiDAR. You’re about to learn why you should use it and how you can easily access publicly available LiDAR datasets for your own autonomous driving projects.

What is LiDAR?

LiDAR stands for Light Detection And Ranging, and is a step up from traditional 2D camera data, which can be rendered unreliable by poor lighting or glaring sunlight. LiDAR works like radar, but instead of radio waves it emits pulses of laser light (typically infrared) and measures how long each pulse takes to bounce back. This lets it accurately map a car’s surroundings without the usual issues in scene perception.
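
The "ranging" in LiDAR comes from time of flight: a pulse travels to the target and back, so the distance is half the round-trip time multiplied by the speed of light. A minimal sketch of that arithmetic, assuming propagation at the vacuum speed of light and ignoring the per-sensor calibration a real unit applies:

```python
# Speed of light in a vacuum, in metres per second (close enough for air here).
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_time_of_flight(round_trip_s: float) -> float:
    """Distance in metres to whatever reflected a laser pulse.

    The pulse covers the distance twice (out and back), so the one-way
    range is half of speed-of-light x round-trip time.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


# A return that arrives 400 nanoseconds after the pulse was emitted:
print(f"{range_from_time_of_flight(400e-9):.2f} m")  # 59.96 m
```

The same formula shows why timing precision matters: a one-nanosecond timing error shifts the measured range by roughly 15 cm.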

The results are then compiled into a ‘point cloud’, which essentially works like a real-time 3D map of the world. This map is so detailed that it can be used not just to spot objects, but to identify them. Once it can identify objects, the car's computer can predict how they will behave and how it should drive. Neat, isn’t it?
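
Each return arrives as a range plus the beam's direction (a horizontal azimuth and a vertical elevation angle), and converting those spherical coordinates to x, y, z is what turns a stream of returns into a point cloud. A sketch, assuming a vehicle-style frame (x forward, y left, z up) with azimuth measured counter-clockwise from x; real sensors document their own conventions:

```python
import math


def to_cartesian(range_m, azimuth_deg, elevation_deg):
    """Convert one LiDAR return from spherical to Cartesian coordinates.

    Assumed frame: x forward, y left, z up; azimuth is measured
    counter-clockwise from +x, elevation upwards from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)  # projection onto the ground plane
    return (horizontal * math.cos(az), horizontal * math.sin(az), range_m * math.sin(el))


def build_point_cloud(returns):
    """Turn (range, azimuth, elevation) tuples from one sweep into xyz points."""
    return [to_cartesian(r, az, el) for r, az, el in returns]


# Three returns: straight ahead, to the left, and slightly downward.
cloud = build_point_cloud([(10.0, 0.0, 0.0), (10.0, 90.0, 0.0), (10.0, 0.0, -2.0)])
```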

Although annotating LiDAR 3D point clouds comes with its own set of challenges, this cutting-edge method provides some of the richest 3D representations available for training accurate self-driving vehicles.

LiDAR Datasets

Now that you know why LiDAR is the way to go for autonomous vehicles, here’s a generous list of publicly available LiDAR datasets with the key details about each one, including:

  • Location of capture
  • How the data was collected
  • Dataset size
  • File formats
  • Publications on or featuring the dataset
  • Where to download the dataset

Ford Campus Vision & LiDAR Dataset

Courtesy of the Perceptual Robotics Laboratory (PeRL), this dataset was collected in late 2009 by driving a pickup truck around the Ford Research campus and downtown Dearborn, Michigan.

The vehicle’s trajectory in these datasets contains several large- and small-scale loop closures, which makes it useful for testing various computer vision and SLAM (Simultaneous Localisation and Mapping) algorithms.
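
A loop closure is a place where the vehicle returns to somewhere it has already been; recognising it lets a SLAM system correct accumulated drift. As a rough illustration, candidate closures can be found by checking which poses come back within a few metres of an earlier pose. This is a naive sketch with hypothetical parameters (`radius`, `min_gap`); a real pipeline would confirm each candidate with scan matching or appearance-based place recognition, and would use a spatial index instead of the quadratic loop:

```python
import math


def loop_closure_candidates(poses, radius=2.0, min_gap=50):
    """Find index pairs (i, j) where pose j revisits earlier pose i.

    poses:   sequence of (x, y) vehicle positions in metres, in time order.
    radius:  how close (metres) two poses must be to count as a revisit.
    min_gap: minimum number of poses between i and j, so that consecutive
             poses along the same stretch of road are not reported.
    """
    candidates = []
    for j, (xj, yj) in enumerate(poses):
        # Only compare against poses at least min_gap steps in the past.
        for i in range(0, j - min_gap + 1):
            xi, yi = poses[i]
            if math.hypot(xj - xi, yj - yi) <= radius:
                candidates.append((i, j))
    return candidates
```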

Data collected using: Ford F250 Pickup Truck

  • Applanix POS LV
  • Xsens MTi-G
  • Inertial Measurement Unit (IMU)
  • Velodyne 3D-lidar scanner
  • Two push-broom forward looking Riegl Lidars
  • Point Grey Ladybug3 omnidirectional camera system

Size: ~100GB

File Format: .tgz

Files: Timestamp.log, Pose-Applanix.log, Pose-Mtig.log, Gps.log, PARAM.mat
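
The archives are standard gzip-compressed tarballs, so Python's standard library can unpack them. A minimal sketch; the archive name in the usage comment is hypothetical. The path check rejects members that would land outside the destination directory (it does not vet symlinks; on recent Python versions, `extractall(..., filter="data")` is a stricter alternative):

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir and return the extracted file names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        # Refuse members whose paths would escape dest (e.g. "../x").
        for member in members:
            if not (dest / member.name).resolve().is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)
    return sorted(m.name for m in members if m.isfile())


# Usage (hypothetical file name):
# extract_tgz("ford_campus_dataset1.tgz", "ford_campus")
```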

Paper: Ford Campus Vision and Lidar Data Set

Download Ford Campus Vision and Lidar Dataset

KITTI Vision Benchmark Suite

Developed by the Karlsruhe Institute of Technology (KIT) and the Toyota Technological Institute at Chicago (TTI-C), this dataset provides challenging real-world computer vision benchmarks with a specific focus on stereo, optical flow, visual odometry, 3D object detection and 3D tracking.

For this purpose, a standard station wagon was driven around the mid-size city of Karlsruhe, through rural areas and on highways. The dataset was first released in 2012, with updates made as recently as 2018.

Data collected using: Volkswagen Passat & Audi Q7

  • 2 × PointGray Flea2 grayscale cameras (FL2-14S3M-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
  • 2 × PointGray Flea2 color cameras (FL2-14S3C-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
  • 4 × Edmund Optics lenses, 4mm, opening angle ∼90°, vertical opening angle of region of interest (ROI) ∼35°
  • 1 × Velodyne HDL-64E rotating 3D laser scanner, 10 Hz, 64 beams, 0.09° angular resolution, 2 cm distance accuracy, collecting ∼1.3 million points/second, field of view: 360° horizontal, 26.8° vertical, range: 120 m

Size: ~180GB

File formats: .PNG (camera images), .BIN (Velodyne scans), .TXT (OXTS GPS/IMU readings, timestamps and calibration)

Folders: All sensor readings of a sequence are zipped into a single file whose name is built from the recording date and the sequence (drive) number.
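
In the raw KITTI data, each Velodyne sweep is stored as a flat binary .BIN file of float32 values, four per point: x, y, z (metres, sensor frame) and reflectance. At 10 Hz and roughly 1.3 million points per second, that works out to on the order of 130,000 points per sweep. A minimal NumPy loader, plus a range crop based on the 120 m figure above; which sweep file you pass in is up to you:

```python
import numpy as np


def load_velodyne_scan(path):
    """Load one KITTI Velodyne sweep as an (N, 4) float32 array.

    Each point is four little-endian float32 values:
    x, y, z (metres, sensor frame) and reflectance.
    """
    flat = np.fromfile(path, dtype="<f4")
    if flat.size % 4 != 0:
        raise ValueError(f"{path}: size is not a whole number of points")
    return flat.reshape(-1, 4)


def crop_by_range(points, max_range_m=120.0):
    """Drop points farther than max_range_m from the sensor origin."""
    distances = np.linalg.norm(points[:, :3], axis=1)
    return points[distances <= max_range_m]
```

`scan[:, :3]` then holds the geometry and `scan[:, 3]` the reflectance of each point.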

Paper: Vision meets Robotics: The KITTI Dataset

Download KITTI Vision Benchmark Suite Dataset

Sydney Urban Objects

Brought to you by the University of Sydney, this dataset contains a variety of common urban road objects collected in the central business district (CBD) of Sydney, Australia.

There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees. The data was collected in 2013 with the goal of providing non-ideal sensing conditions that are representative of practical urban sensing systems, with large variability in viewpoint and occlusion.

Sensor: Velodyne HDL-64E LIDAR

Size: ~22MB

File Formats: .CSV, .BIN
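
Binary point formats like this usually pack one fixed-size record per point, which NumPy can read in one call with a structured dtype. The field names, order and types below are an assumption on our part, not confirmed by this article; check them against the README that ships with the download before relying on this sketch:

```python
import numpy as np

# ASSUMED per-point record layout (packed, little-endian); verify it against
# the dataset's own README before use.
SYDNEY_POINT = np.dtype([
    ("t", "<i8"),          # timestamp
    ("intensity", "u1"),   # return intensity
    ("id", "u1"),          # laser (beam) id
    ("x", "<f4"),          # metres
    ("y", "<f4"),
    ("z", "<f4"),
    ("azimuth", "<f4"),
    ("range", "<f4"),
    ("pid", "<i4"),        # point id
])


def load_object_scan(path):
    """Read one object scan; return the raw records and an (N, 3) xyz array."""
    records = np.fromfile(path, dtype=SYDNEY_POINT)
    xyz = np.stack([records["x"], records["y"], records["z"]], axis=1)
    return records, xyz
```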

Publication: Unsupervised Feature Learning for Classification of Outdoor 3D Scans

Download Sydney Urban Objects Dataset

Stanford Track Collection

Published by the Stanford Artificial Intelligence Laboratory, this dataset contains about 14,000 labeled tracks of objects as observed in natural street scenes from an autonomous vehicle research platform.

The data was collected during one hour of ‘360-degree, 10Hz depth information’ recorded while driving on busy campus streets and parked at busy intersections.

Sensor: Velodyne HDL-64E S2 LIDAR

Size: 2GB

Format: The data is split into 'natural' and 'background' directories.

  1. Natural: data from normal street scenes, hand-labeled.
  2. Background: data from street scenes known to contain no pedestrians, cyclists or cars.
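
Because the collection labels whole tracks rather than single frames, a classifier can pool evidence across all frames of a track (with the background tracks serving as negatives). One simple way to do that, sketched here assuming frames are independent and the class prior is uniform (not necessarily the paper's exact formulation), is to sum per-frame log-probabilities for each class; the class names in the test are illustrative:

```python
import math


def classify_track(frame_probs, classes, eps=1e-12):
    """Pick one label for a whole track from per-frame class probabilities.

    frame_probs: one list of class probabilities per frame, in `classes` order.
    Frames are treated as independent evidence under a uniform class prior,
    so per-class log-probabilities are summed across frames.
    """
    totals = [0.0] * len(classes)
    for probs in frame_probs:
        for k, p in enumerate(probs):
            totals[k] += math.log(max(p, eps))  # clamp so log(0) never occurs
    best = max(range(len(classes)), key=totals.__getitem__)
    return classes[best], totals
```

A single ambiguous frame (say, a car briefly looking like a pedestrian) is outvoted by the rest of the track.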

Paper: Towards 3D Object Recognition via Classification of Arbitrary Object Tracks

Download The Stanford Track Collection Dataset 

Oakland 3D Point Cloud Dataset

Published by the Robotics Institute at Carnegie Mellon University, this dataset contains labeled 3D point cloud laser data collected from a moving platform in an urban environment.

For the data collection, a Jeep was driven around the CMU campus in Oakland, Pittsburgh, PA, in 2009. As mentioned in the abstract of their publication: ‘We adapt a functional gradient approach for learning high-dimensional parameters of random fields in order to perform discrete, multi-label classification.’

Data collected using: 2000 Jeep Wrangler Sport


  • SICK LMS laser scanners, used in a push-broom configuration

Size: 33MB
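
A push-broom scanner measures a single 2D line at a time; the 3D cloud only appears as the vehicle's motion sweeps that line through the scene. A minimal sketch of how line scans and vehicle poses combine, assuming a planar pose (x, y, heading), a scan plane perpendicular to the direction of travel and a hypothetical 1.5 m mounting height (none of which is the actual Jeep's calibration):

```python
import math


def pushbroom_points(scans, poses, mount_height=1.5):
    """Assemble 2D line scans into one 3D point cloud in world coordinates.

    scans: one list of (angle_rad, range_m) pairs per scan, measured in a
           vertical plane perpendicular to travel (0 = vehicle's left,
           pi/2 = straight up, -pi/2 = straight down).
    poses: one (x, y, yaw_rad) vehicle pose per scan, in world coordinates.
    mount_height: assumed scanner height above the ground, in metres.
    """
    cloud = []
    for scan, (px, py, yaw) in zip(scans, poses):
        c, s = math.cos(yaw), math.sin(yaw)
        for angle, rng in scan:
            # Point in the vehicle frame (x forward, y left, z up).
            vx = 0.0
            vy = rng * math.cos(angle)
            vz = mount_height + rng * math.sin(angle)
            # Rotate by heading, then translate by the vehicle position.
            cloud.append((px + c * vx - s * vy, py + s * vx + c * vy, vz))
    return cloud
```

With these conventions, a straight-down beam whose range equals the mounting height lands at z = 0, i.e. on flat ground.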

Paper: Contextual Classification with Functional Max-Margin Markov Networks

Download Oakland 3D Point Cloud Dataset

Semantic3D NET

Released by Semantic3D NET in 2017, this dataset underpins the ‘Large-Scale Point Cloud Classification Benchmark’. The benchmark aims to close the gap left by the lack of large labelled point cloud datasets, providing a labelled 3D point cloud data set of natural scenes with over 4 billion points in total.

It covers a diverse range of outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. It also has a higher point density than both the Oakland and Sydney Urban Objects datasets mentioned earlier.
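
With over 4 billion points, clouds this dense are often thinned before training. A common preprocessing step (a general technique, not part of the benchmark itself) is voxel-grid downsampling: every point that falls into the same cube of side `voxel_size` is replaced by the centroid of that cube's points. Any extra per-point columns, such as intensity or colour, are averaged along with the coordinates:

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points in the same cubic voxel by their mean.

    points: (N, D) array whose first three columns are x, y, z; any further
            columns (e.g. intensity, colour) are averaged too.
    Returns one row per occupied voxel, ordered by voxel index.
    """
    points = np.asarray(points, dtype=np.float64)
    keys = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate each point into its voxel
    return sums / counts[:, None]
```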

Paper: Semantic3D.net: A New Large-scale Point Cloud Classification Benchmark

Download Semantic3D NET Dataset

TerraMobilita/IQmulus Urban Point Cloud Classification Benchmark

The benchmark is carried out by the MATIS Lab of the French National Mapping Agency (IGN) and the Centre for Mathematical Morphology in the framework of the TerraMobilita and IQmulus projects and is part of the IQmulus processing contest.

After that mouthful, here’s what you probably want to know: the dataset contains 3D mobile laser scanning (MLS) data from the urban environment of Paris, France. It’s composed of 300 million points and was collected in early 2013.

Format: .PLY
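
A PLY file begins with a plain-text header that declares the storage format, each element (for a point cloud, typically `vertex`), its count and its properties, followed by an ASCII or binary body. Before loading the data it helps to check which per-point properties a file actually carries. A minimal standard-library sketch that parses only the header and leaves decoding the body to a full PLY reader:

```python
def read_ply_header(stream):
    """Parse the header of a PLY file opened in binary mode.

    Returns (format, elements), where elements is a list of
    (name, count, [property names]) in file order. The stream is left
    positioned at the first byte of the body.
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt, elements = None, []
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("unexpected end of file in PLY header")
        words = line.decode("ascii").split()
        if not words or words[0] in ("comment", "obj_info"):
            continue
        if words[0] == "format":
            fmt = words[1]  # ascii, binary_little_endian or binary_big_endian
        elif words[0] == "element":
            elements.append((words[1], int(words[2]), []))
        elif words[0] == "property":
            # "property <type> <name>" or "property list <ctype> <itype> <name>"
            elements[-1][2].append(words[-1])
        elif words[0] == "end_header":
            return fmt, elements
```

Usage: `with open(path, "rb") as f: fmt, elements = read_ply_header(f)`.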

Paper: TerraMobilita/IQmulus Urban Point Cloud Classification Benchmark

Download IQmulus & TerraMobilita Dataset

Vaihingen 3D Airborne Dataset

Initiated by the Institute for Photogrammetry, a joint ISPRS/EuroSDR project called ‘Benchmark on High Density Aerial Image Matching’ was launched to ‘evaluate the potential of photogrammetric 3D data capture in view of the ongoing developments of software for automatic image matching’.

In short, the dataset is meant for evaluating 3D point clouds and digital surface models (DSMs) produced from aerial images with different software systems. You can expect to receive subsets of three aerial image blocks: two contain nadir imagery captured over areas with different land use and block geometry, while the third contains oblique aerial images.
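
A DSM records the height of the topmost surface (roofs, treetops, ground) over a regular grid. One common way to derive it from a point cloud, shown here as a generic sketch rather than the benchmark's prescribed procedure, is to keep the highest point that falls in each grid cell; cells with no points are left empty (NaN):

```python
import numpy as np


def rasterize_dsm(points, cell_size):
    """Grid a point cloud into a DSM holding the highest z per cell.

    points: (N, 3+) array of x, y, z (extra columns are ignored).
    Returns (dsm, origin): dsm[row, col] with row 0 at the minimum y,
    and origin = (x, y) of the grid's lower-left corner.
    """
    points = np.asarray(points, dtype=np.float64)
    origin = points[:, :2].min(axis=0)
    cols_rows = np.floor((points[:, :2] - origin) / cell_size).astype(np.int64)
    n_cols, n_rows = cols_rows.max(axis=0) + 1
    dsm = np.full((n_rows, n_cols), -np.inf)
    # maximum.at handles several points landing in the same cell.
    np.maximum.at(dsm, (cols_rows[:, 1], cols_rows[:, 0]), points[:, 2])
    dsm[np.isneginf(dsm)] = np.nan  # mark empty cells
    return dsm, origin
```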

Unfortunately, there’s no clean-cut way to download this promising dataset. You’ll have to fill out a registration form requesting access and sign a user agreement. Once you’ve done that, the Institute will send you the password.

Request access to Vaihingen 3D Airborne Dataset

Paris Lille 3D

MINES ParisTech produced this impressively large dataset, composed of several point clouds of outdoor scenes in Paris and Lille, France.

It boasts more than 140 million hand-labeled and classified points across more than 50 classes (e.g., ground, cars and benches). Aimed at improving the automatic classification of urban scenes, this dataset features high-quality segmentation and classification along with a wide variety of object classes, making it well suited to deep learning techniques.
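
Per-point segmentation results on datasets like this are commonly scored with intersection over union (IoU) for each class, averaged into a mean IoU. A minimal sketch, assuming integer class ids from 0 to `num_classes - 1`; the official evaluation protocol and class mapping for Paris-Lille-3D may differ:

```python
import numpy as np


def per_class_iou(y_true, y_pred, num_classes):
    """IoU for each class id in [0, num_classes) from per-point labels.

    Classes absent from both the ground truth and the prediction get NaN,
    so np.nanmean(iou) gives the mean IoU over the classes that occur.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # Confusion matrix: rows = ground truth, columns = prediction.
    conf = np.bincount(num_classes * y_true + y_pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp  # belonging to the class, but missed
    with np.errstate(invalid="ignore", divide="ignore"):
        return tp / (tp + fp + fn)
```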

Data collected using: the mobile laser scanning (MLS) prototype of the Centre for Robotics at MINES ParisTech

  • GPS Novatel FlexPak 6
  • Velodyne HDL-32E LiDAR

Paper: Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification

Download Paris-Lille-3D Dataset

Need a custom dataset tailored for your project?

While you now have a solid list of publicly available LiDAR datasets, there’s a good chance that none of them will truly fit the specifics of your project. That leaves you with a choice: annotate the data yourself, or outsource custom annotation to someone else (who actually has the time).

Granted, custom annotation services can be pricier than you bargained for, but the truth is that your algorithms will only be as good as your training data. So this is not the time to skimp.

If you want to differentiate your autonomous driver and kickstart its training with a highly customized dataset, drop us a note and we’ll discuss a plan that suits your needs.

Originally published on September 10, 2018