Is it possible for an autonomous drone to be a fair competitor to a human pilot? We entered the Alpha Pilot AI Challenge [1] as team RISE-Q to explore the answer to this question. Drones flown by human pilots easily reach speeds in excess of 30 m/s while successfully traversing various types of gates and performing extremely agile maneuvers in the process, yet another demonstration of impressive human visual perception and intelligence. In contrast, state-of-the-art autonomous racing drones barely reach speeds of 3.5 m/s without missing gates [2], and only on relatively simple tracks [2,3]. Moreover, they rely on vision pipelines that run at up to 20 fps [4] and controllers that compute commands at up to 100 Hz [2]; the main limitation is the computing power available on board [3]. In short, when it comes to racing, there is a considerable skill gap between human pilots and autonomous drones, largely a product of limited on-board computing power.

Historically, control of aerial platforms flying at high speeds (>25 m/s) and performing agile maneuvers has been demonstrated. In particular, the authors of [8] combined apprenticeship-based reinforcement learning and Kalman filtering with data captured from professional pilot flights to achieve control of high-speed, agile maneuvers, at times even outperforming human pilots. In [7], the authors demonstrated agile recovery from unknown initial conditions using reinforcement learning with ground-truth data and a policy trained in simulation. More importantly, they showed that the same simulation-trained recovery policy achieved equal, and sometimes better, performance in reality than in simulation. In [9], the authors demonstrate accurate high-speed (3 m/s) tracking of complex trajectories using a differential-flatness approach. In [6], robustness is achieved by a policy learned through reinforcement learning with the guided policy search algorithm, in which an MPC controller guides the policy search.

Recent advances in parallel processing on GPUs, together with neural networks that can ingest and quickly process (>50 Hz) large amounts of data [5] and produce control outputs [6,7] with much lower latency (<10 µs), are opening new possibilities. In particular, the authors of [10] demonstrate a neural network trained on data collected from novice and professional pilots flying a simulator with real radio controllers. Their network also runs on embedded platforms (NVIDIA Jetson TX1, TX2) at image-processing speeds of up to 60 fps. However, they test their approach only in simulation, and only on race tracks in which all gates are at the same height and are traversed horizontally.
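For reference, the differential-flatness property exploited in [9] lets thrust and attitude be computed algebraically from the flat outputs (position and yaw) and their derivatives. The sketch below is the standard drag-free form of this relation, not the rotor-drag-augmented formulation derived in [9]:

\sigma = (x, y, z, \psi)^T, \qquad \mathbf{t} = \ddot{\mathbf{p}} + g\,\mathbf{z}_W, \qquad c = m\,\lVert\mathbf{t}\rVert, \qquad \mathbf{z}_B = \mathbf{t} / \lVert\mathbf{t}\rVert

\mathbf{x}_C = (\cos\psi, \sin\psi, 0)^T, \qquad \mathbf{y}_B = \frac{\mathbf{z}_B \times \mathbf{x}_C}{\lVert\mathbf{z}_B \times \mathbf{x}_C\rVert}, \qquad \mathbf{x}_B = \mathbf{y}_B \times \mathbf{z}_B

where \mathbf{p} is the position, \mathbf{z}_W the world vertical axis, c the collective thrust, and R = [\mathbf{x}_B\ \mathbf{y}_B\ \mathbf{z}_B] the desired attitude; a reference trajectory expressed in the flat outputs therefore directly yields feedforward thrust and attitude commands.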

We propose an approach that combines these previous successes: reinforcement learning for high-speed control, differential-flatness control for accurate tracking at high speed, and MPC-guided policy search for robustness against unmodeled dynamics.
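To illustrate the MPC-guided policy search component, the sketch below shows one possible training loop in the spirit of [6]: the current policy drives the rollouts while an MPC expert labels the states it visits, and the policy is then regressed toward those labels. The dynamics model, MPC solver, and linear policy here are hypothetical placeholders for illustration only, not our implementation.

```python
# Minimal sketch of MPC-guided policy search: the policy rolls out,
# an MPC "expert" labels visited states, and the policy is fit to the labels.
import numpy as np

STATE_DIM, CTRL_DIM = 12, 4   # placeholder quadrotor state and command sizes

def simulate_step(x, u, dt=0.01):
    """Hypothetical stand-in for the quadrotor dynamics (linear placeholder)."""
    A = np.eye(STATE_DIM) + dt * np.random.default_rng(0).normal(0, 0.01, (STATE_DIM, STATE_DIM))
    B = 0.1 * dt * np.ones((STATE_DIM, CTRL_DIM))
    return A @ x + B @ u

def mpc_expert(x):
    """Stand-in for an MPC solve returning the first control of the optimal plan."""
    return -0.5 * x[:CTRL_DIM]          # placeholder feedback law

class LinearPolicy:
    """Minimal learnable policy u = W x + b, fit by ridge regression."""
    def __init__(self):
        self.W = np.zeros((CTRL_DIM, STATE_DIM))
        self.b = np.zeros(CTRL_DIM)

    def __call__(self, x):
        return self.W @ x + self.b

    def fit(self, X, U, reg=1e-3):
        Xa = np.hstack([X, np.ones((len(X), 1))])          # append bias column
        sol = np.linalg.solve(Xa.T @ Xa + reg * np.eye(Xa.shape[1]), Xa.T @ U)
        self.W, self.b = sol[:-1].T, sol[-1]

policy = LinearPolicy()
states, labels = [], []

for iteration in range(5):                    # outer guided-policy-search loop
    x = 0.1 * np.random.randn(STATE_DIM)      # random initial state
    for t in range(200):                      # roll out the *current policy*
        states.append(x)
        labels.append(mpc_expert(x))          # expert labels the visited state
        x = simulate_step(x, policy(x))
    policy.fit(np.array(states), np.array(labels))   # regress policy toward expert
```

Because the expert labels states generated by the learned policy itself, the supervised data matches the distribution the policy actually encounters, which is what gives the approach its robustness to states the MPC alone would never plan from.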

[1] https://www.herox.com/alphapilot

[2] Kaufmann, E., Gehrig, M., Foehn, P., Ranftl, R., Dosovitskiy, A., Koltun, V., & Scaramuzza, D. (2018). Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing. Retrieved from http://arxiv.org/abs/1810.06224

[3] Moon, H., et al. (2018). Challenges and Implemented Technologies Used in Autonomous Drone Racing.

[4] Li, S., Ozo, M. M. O. I., De Wagter, C., & de Croon, G. C. H. E. (2018). Autonomous drone race: A computationally efficient vision-based navigation and control strategy. Retrieved from http://arxiv.org/abs/1809.05958

[5] Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. Retrieved from http://arxiv.org/abs/1804.02767

[6] Zhang, T., Kahn, G., Levine, S., & Abbeel, P. (2015). Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search. Retrieved from http://arxiv.org/abs/1509.06791

[7] Hwangbo, J., Sa, I., Siegwart, R., & Hutter, M. (2017). Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters. https://doi.org/10.1109/LRA.2017.2720851

[8] Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An Application of Reinforcement Learning to Aerobatic Helicopter Flight. Advances in Neural Information Processing Systems 19.

[9] Faessler, M., Franchi, A., & Scaramuzza, D. (2017). Differential Flatness of Quadrotor Dynamics Subject to Rotor Drag for Accurate Tracking of High-Speed Trajectories. https://doi.org/10.1109/LRA.2017.2776353

[10] Müller, M., Casser, V., Smith, N., Michels, D. L., & Ghanem, B. (2017). Teaching UAVs to Race Using Sim4CV. Retrieved from http://arxiv.org/abs/1708.05884