NeBula: TEAM CoSTAR’s Robotic Autonomy Solution that Won Phase II of DARPA Subterranean Challenge

This paper presents and discusses algorithms, hardware, and software architecture developed by TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), competing in the DARPA Subterranean Challenge. Specifically, it presents the techniques utilized within the Tunnel (2019) and Urban (2020) competitions, where CoSTAR achieved second and first place, respectively. We also discuss CoSTAR’s demonstrations in Martian-analog surface and subsurface (lava tubes) exploration. The paper introduces our autonomy solution, referred to as NeBula (Networked Belief-aware Perceptual Autonomy). NeBula is an uncertainty-aware framework that aims at enabling resilient and modular autonomy solutions by performing reasoning and decision making in the belief space (the space of probability distributions over the robot and world states). We discuss various components of the NeBula framework, including (i) geometric and semantic environment mapping, (ii) a multi-modal positioning system, (iii) traversability analysis and local planning, (iv) global motion planning and exploration behavior, (v) risk-aware mission planning, (vi) networking and decentralized reasoning, and (vii) learning-enabled adaptation. We present the performance of NeBula on several robot types (e.g., wheeled, legged, flying) in various environments, and discuss the specific results and lessons learned from fielding this solution in the challenging courses of the DARPA Subterranean Challenge competition.


Introduction
Robotics and artificial intelligence (AI) are transforming our lives, with a growing number of robotic applications ranging from self-driving cars (Yurtsever et al., 2020), search and rescue (Balta et al., 2017), healthcare (Qin et al., 2020), and humanitarian missions (Santana et al., 2007) to robots under water (Kinsey et al., 2006) and robots beyond our home planet, on Mars (Sasaki et al., 2020; Bajracharya et al., 2008) and the Moon (Ford et al., 2020). Autonomy and AI are empowering these robots to carry out missions autonomously, increasing efficiency, reducing human risk, saving lives, and accomplishing tasks in hazardous environments too dangerous for humans.
Extreme environments: Underground environments are an important example of the type of terrain that poses substantial risk to humans, with a wide range of terrestrial and planetary applications. On Earth, autonomous underground exploration is a crucial capability in search and rescue missions after natural disasters, in the mining, oil, and gas industries, and in supporting spelunkers and cave rescue missions. One prominent example is the Tham Luang cave rescue (Figure 1), where the international community worked to rescue 13 members of a football team from 4 km inside a partially flooded cave. Drones equipped with thermal cameras were flown over Tham Luang to detect possible access points, and an underwater robot was deployed to send back information on the water depth and the condition of the cave. However, at that time, no technology existed to autonomously reach the people, map the cave, and scan for people deep underground.
Planetary applications: Beyond our home planet, the research community has identified more than 200 lunar and 2000 Martian cave-related features (Cushing, 2012). Caves and subsurface voids, in general, are of utmost importance in space exploration for several reasons. First, their stable, radiation-shielding environment and potential to act as volatile traps make them an ideal habitat candidate for future human exploration (Kesner et al., 2007; Titus et al., 2020). Second, planetary cave environments may harbor life due to their shielding from cosmic rays, and if there is life beyond Earth, deep planetary caves are one of the most likely places to find it. Third, exploring
caves provides an unprecedented opportunity for scientists to study planetary volcanic processes and the geological history of planetary bodies. These reasons, among many others, have made subsurface exploration one of the main next frontiers for space exploration (Stamenković et al., 2019;Touma et al., 2020). While autonomy and AI technologies are growing fast, challenges still remain for operations in extreme environments. Technical challenges include perceptual degradation (darkness, obscurants, self-similarity, limited textures) in wholly unknown and unstructured environments, extreme terrain that tests the limits of traversability, mission execution under constrained resources, and high-risk operations where robot failure or component degradation is a real possibility. Most challenging of all, however, is the combination of the above features. Further work is needed to push the state of the art to enable systems that can robustly and consistently address these challenges simultaneously.
Contributions: In this paper, we discuss the NeBula autonomy solution and Team CoSTAR's contributions towards addressing some of the challenges in robotic exploration of unknown extreme surface and subsurface environments. We discuss these technologies in the context of the DARPA Subterranean Challenge (DARPA, 2018a), where Team CoSTAR won the Urban competition and ranked second in the Tunnel competition. The videos in CoSTAR Team (2020a,b,c,d, 2021) depict some highlights of these runs. As we will discuss in Section 2, this competition pushes the state-of-the-art boundaries in extreme environment exploration in mobility, perception, autonomy, and networking. Specifically, we will discuss Team CoSTAR's contributions to advancing autonomy on the following fronts:

1. Vanguard operations: As the robots enter the environment, they explore the frontier with vanguard robots equipped with highly capable sensing for mapping and artifact detection.

2. Mesh network expansion: As robots start the mission, they aim at maintaining communication with the human supervisor by creating and extending a wireless mesh network inside the environment. Ground robots do so by deploying communication pucks like breadcrumbs, and aerial scouts can self-deploy to serve either as comms relays or as added sensing. Mission autonomy decides where and how to deploy these breadcrumbs.

3. Leaving the mesh network: The environment is highly communication-denied. Due to the large scale and complexity of the environment and the limited capacity of robots to carry communication nodes, the mesh network reach is typically limited to the parts of the environment in the vicinity of the base station (i.e., the environment entrance). Hence, the robots soon leave the communication network range and, for the most part, carry out a fully autonomous mission.

4. Autonomous mission: Robots perform search, mapping, and exploration. Mission autonomy guides them to rendezvous points to exchange information with each other, or back to the mesh network to exchange information with the base station.

5. Dynamic task allocation: Robots continue simultaneous frontier exploration. They autonomously monitor (a) the state of the robot team, which includes (i) health, battery, and functionality level of the assets, (ii) robot locations, and (iii) the information value (e.g., the number of detected artifacts) on each robot; (b) the state of the world, which includes what robots learn about the environment, e.g., geometric and semantic maps; (c) the state of the mission, which includes (i) the remaining mission time, (ii) margin to the desired mission output, and (iii) acceptable risk thresholds; and (d) the state of communication: (i) network connectivity, throughput, etc., (ii) how long each robot is out of comm range, and (iii) the locations of comm nodes. Given these states, the mission planner decides to deploy new robots, or to retask or reposition active robots in the environment.

6. Team behaviors: Vehicle and team formations are configured during the mission. Examples include (a) Return to Base, when a battery swap is needed, optimal, and possible at base; (b) Return to Mesh Network, to ensure the data are communicated, then continue; (c) Explore Frontier, to continue as is, aggressively prioritizing coverage; and (d) Act as a Data Mule, to retrieve data from a vehicle that cannot come back to the mesh network (due to limited battery, health, speed, etc.). Faster and healthier vehicles can act as data mules to carry information between other agents and the mesh network.

7. Heterogeneous coverage: These behaviors continue until the entire course is explored. Due to the heterogeneous capabilities of the robots from mobility, sensing, and computation perspectives, the autonomy might dispatch different robots to the same parts of the course. This increases the confidence and coverage in mapping and artifact detection by providing multimodal information (e.g., thermal, radar, etc.) about environment elements. For example, the drone might have reached and searched parts of the course but, given its limited sensing capabilities, autonomy will dispatch a ground robot to get a second vote on an artifact before submitting it to the server. All the data are submitted to the server prior to the end of the mission time.

NeBula Autonomy Architecture
Resiliency is a key requirement to enable a repeatable and consistent robotic autonomy solution in the field. To enable a resilient autonomy solution, NeBula takes uncertainty into account to cope with unmodeled and unknown elements during the mission. Motivation/Insight: An important (if not the most important) cause of the brittleness of today's autonomy solutions is the disjointed design of various subsystems. Traditionally, when designing or advancing the performance of a certain module, the typical assumption is that the rest of the system functions properly and nominally. When it comes to real-world deployment, these assumptions typically break due to the discrepancy between the computational models and real-world models. This introduces uncertainty in the perception, inferences, decision making, and execution, potentially leading to suboptimal behaviors.
Key principle for resilient autonomy architectures: Focused on fielding autonomy in challenging environments, NeBula is built on the fundamental principle that "To achieve resilient autonomy, the decision-making, inference, and perception modules must be reciprocal and tightly co-designed." This implies reasoning over joint probability distributions across various component-level states, as opposed to marginal distributions over a set of disjoint system states. Contrary to the typical sense → infer → plan → act sequence in autonomy solutions, the NeBula architecture is built on a plan-to-(sense → infer → act) loop, where the planner dynamically plans for the acquisition of sufficient sensory information and for the quality of representation required to enable resilient and uninterrupted missions within the prescribed mission risk thresholds. NeBula's joint perception planning is formulated as an uncertainty-aware belief space planning problem. The belief captures the probability distribution over the system's states, including the robot pose, environment state, measurements, team coordination state, health state, communication state, etc. By planning over joint beliefs and taking cross-component uncertainty (which describes the interaction of connected modules) into account, NeBula allows each module to be not only robust to uncertainties within its own subsystem but also resilient to uncertainties in the integration process resulting from imperfections and off-nominal performance of connected modules.
Illustrative example (perception-aware planning under uncertainty): Simultaneous localization and mapping (SLAM) is a fundamental problem in robotics that aims at simultaneously solving the localization problem ("where is the robot?") and the mapping problem ("what does the environment look like?"). This is a very well-studied problem, and it is well known that solving SLAM (i.e., incorporating joint probability distributions between localization states and environment states) typically leads to optimal and resilient inference, whereas solving localization and mapping separately and putting their solutions together is suboptimal and can lead to a brittle inference system. Analogous to the SLAM philosophy, NeBula extends this concept from pure inference to "joint inference and decision-making" (Figure 6). For example, NeBula develops solutions where mapping and planning are solved simultaneously using SMAP (simultaneous mapping and planning) to achieve resilient traversability and risk awareness. Similarly, NeBula develops SLAP (simultaneous localization and planning) solutions where localization uncertainty is taken into account in the planning phase using belief space planners. Solving these joint problems typically leads to behaviors where the autonomous system intelligently plans proactive actions to improve the "inference quality" (e.g., world model or robot model) and reduce uncertainty to the levels necessary to achieve mission goals within the prescribed risk thresholds. This is in contrast to typical solutions where the relationship is one-way: the inference module serves the decision-making modules, and the decision-making components merely react to the inference output.
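To make the planning-over-beliefs idea concrete, the following is a generic belief-space (POMDP-style) formulation consistent with the description above; it is a sketch of the standard setting, not NeBula's exact formulation.

```latex
% Joint belief over robot pose x_k and world/map state m, given the
% measurement history z_{0:k} and action history u_{0:k-1}:
b_k = p\left(x_k,\, m \mid z_{0:k},\, u_{0:k-1}\right)
% A policy is chosen by minimizing expected cost over predicted beliefs,
% subject to the prescribed risk threshold \delta:
\pi^{*} = \arg\min_{\pi}\;
  \mathbb{E}\left[\sum_{k=0}^{K} c\big(b_k,\, \pi(b_k)\big)\right]
  \quad \text{s.t.} \quad \mathrm{risk}(b_k) \le \delta .
```

In this notation, SLAM estimates p(x, m); SLAP and SMAP additionally optimize the actions u against p(x, u) and p(m, u), respectively; and SPLAM optimizes over the full joint p(x, m, u), which is what produces the proactive, uncertainty-reducing behaviors described above.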
Modularity and scalability: In addition to resiliency, NeBula focuses on a modular and scalable framework. This requirement is driven by missions carried out by a team of networked heterogeneous robots. Each robot has different mobility (e.g., wheeled, legged, aerial, hybrid), sensing, and computational capabilities (e.g., Section 12). NeBula provides appropriate abstraction to allow for reusability and agnosticism to the specific robot and hardware. Any low-level hardware-specific modules should be properly isolated to increase the reusability of software. Further, NeBula supports networked systems where agents are intermittently losing and reestablishing communications, and sharing knowledge with each other and with the base station, enabling large-scale environment exploration with a limited number of robots. Each robot has a level of local autonomy to act individually when it is disconnected from the rest of the team.

Figure 6. Illustrative example of joint inference and decision making in NeBula's low-level navigation system. Denoting the state domain of localization by pose x, mapping by map state m, and planning by policy parameters u, SLAM, SLAP, and SMAP aim at solving for (i.e., estimating or predicting) the joint distributions p(x, m), p(x, u), and p(m, u). The full joint problem, SPLAM (simultaneous planning, localization, and mapping), solves for the probability distribution p(x, m, u).
System architecture: Figure 7 illustrates NeBula's high-level functional block diagram and serves as a visual outline of the sections of this paper. The system is composed of multiple assets: mobile robots, stationary comm nodes, and a base station, each of which has different computational and sensing capabilities. The base station acts as the central component to collect data from multiple robots and distribute tasks, if and when a communication link to the base station is established. In the absence of communication links, the multi-asset system performs fully autonomously. The fundamental blocks are as follows:

• Perception (Sections 5, 6, and 7): Perception modules are responsible for processing the sensory data and creating a world model belief. The local perception modules (Section 5) provide the odometry and state estimation information needed for local navigation, such as state (pose, velocity) and traversability maps. The global SLAM module (Section 6) tracks the robot's position within a globally consistent frame while building a 3D map of the environment. The semantic understanding and artifact detection module (Section 7) adds semantic information to the map, finds objects of interest in the environment, and, in conjunction with the global localization module, reports their locations.

• Planning (Sections 8, 9, and 11): Planning modules make onboard decisions based on the current world belief. The planning modules are composed of multiple layers. The highest layer is the mission planning module (Section 11), which runs a mission according to its specifications, generates global goals for each robot, and allocates tasks to different robots. The second layer is the global motion planning layer (Section 9), responsible for exploration, search, and coverage behaviors at global scale and in large environments. It makes plans to safely move the robot to a goal assigned by the mission planner. It also enables autonomous exploration of the environment in order to increase the knowledge and confidence about the world belief. The third layer is the traversability and local navigation component (Section 8), responsible for analyzing how and at what velocities different terrain elements can be traversed. It quantifies the motion risk and optimizes/replans local trajectories at high frequency to enable aggressive traversability in obstacle-laden and challenging environments, while ensuring the risk levels stay within the prescribed mission specifications. NeBula abstracts motion models, enabling the planning stack to be robot agnostic and to support heterogeneous mobility platforms.

• World belief: This block includes a probabilistic model of the world. It is jointly constructed by the perception and planning modules, and enables a tight integration between these modules, leading to perception-aware behaviors. World belief extends the traditional state database concept to a belief database, where we maintain probability distributions over various state domains as well as joint probability distributions across multiple domains. It includes beliefs over robot pose, environment map, mission state, system health, and information roadmap, among other state domains. There are multiple variations of the world belief: (i) local to each robot, (ii) shared belief across robots, and (iii) predicted belief to assess future risk and performance to enable making perception-aware and uncertainty-aware decisions.
During exploration tasks, robots develop their own local world models based on what they perceive with their limited sensor input. They generate the world model as an abstract representation of spatial and temporal information about their surrounding environment (e.g., maps, hazards) and internal state (e.g., pose, health). This world belief is internally predicted to enable uncertainty-aware decisions and actions based on the predicted model. The shared world model is synchronized among the robots and the base station using asynchronous bidirectional messaging with the publish/subscribe paradigm. The discussion of the world belief is distributed across all sections of the paper.

• Communications (Section 10): When possible, communication modules synchronize the shared world models across the robots and the base station. To cope with the dynamic and unstable nature of the underlying mesh network, the communication manager is responsible for providing reliable message transfer with buffering, compression, and retransmission. The modules also provide capabilities to maintain a mobile ad hoc network using radio devices. Static communication nodes can be dropped from particular robots to help form a network.

• Operations (Section 11): Operations modules aid the human supervisor in effectively monitoring the system performance and interacting with it if and when communication links are established. One of the main roles is the visualization of the complex world belief in a human-recognizable form. In nominal operation scenarios, the human operator only interacts with the system by updating the world belief, when needed and when possible.
Over time, each layer adapts to the collected data as well as to improvements of models in other layers. Current implementation status: NeBula is a growing and evolving framework. Its current version (in 2020) has been deployed in several large projects for terrestrial and planetary applications. It has enabled autonomous operations on various vehicle types including (1) wheeled rovers, (2) legged robots, (3) flying multicopters, (4) hybrid aerial/ground vehicles, (5) 1/5th-scale race cars, (6) tracked vehicles, and (7) full-size passenger vehicles. We refer to Section 12 for a detailed description of our implementation on these platforms.

State Estimation
One of the fundamental components of the NeBula architecture, shown in Figure 7, is reliable state estimation under perceptually degraded conditions. This includes environments with large variations in lighting, obscurants (e.g., dust, fog, and smoke), self-similar scenes, reflective surfaces, and featureless/feature-poor surfaces. NeBula relies on a resilient odometry framework that fuses a set of heterogeneous sensors to handle these various challenges. This section briefly describes this odometry solution. (For more details, please see Santamaria-Navarro et al. (2019), Palieri et al. (2020), Kramer et al. (2020), Fakoorian et al. (2020, 2021), Tagliabue et al. (2021), and Lew et al. (2019).)
Objective: The objective of the state estimation pipeline is to utilize multi-modal sensing to determine the robot's state, producing resilient, high-rate, and smooth estimates in a probabilistic sense. A key aspect of the proposed approach is assigning each sensor output a quality measure that can be used to identify healthy or unhealthy measurements before they are fused. We start by introducing our notation: p ∈ R^3 (global position, x, y, z); R ∈ SO(3) (global orientation, which can be described with the minimal representation φ, θ, ψ); v ∈ R^3 (body linear velocity); ω ∈ R^3 (body angular velocity); a ∈ R^3 (body linear acceleration); α ∈ R^3 (body angular acceleration); and the full robot state, comprising {p, R, v, ω, a, α}.
Note that we restrict the quality of each state estimate to binary values (healthy or unhealthy), although this can easily be generalized to higher resolutions and even continuous representations.
HeRO architecture: The proposed architecture, shown in Figure 8, considers redundancy and heterogeneity in both sensing and estimation algorithms. It is designed to expect and detect failures while adapting the behavior of the system to ensure safety. To this end, we present HeRO, the Heterogeneous and Resilient Odometry Estimator (Santamaria-Navarro et al., 2019): a framework of estimation algorithms running in parallel, supervised by a resiliency logic. The resiliency logic has three main functions: (a) perform confidence tests on data quality (measurements and individual estimations) and check the health of sensors and algorithms, (b) reinitialize those algorithms that might be malfunctioning, and (c) generate a smooth state estimate by multiplexing the inputs based on their quality. The output of this resiliency logic, which includes a state quality measure, is used by the guidance and control system to determine the best mode of operation that ensures safety (see, for instance, the NeBula interconnections between the Perception and Planning modules in Figure 7). For example, guidance and control could switch to pure velocity control if the position estimates are unhealthy, or issue a stop command if both position and velocity estimates are unreliable.
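As an illustration only, the multiplexing behavior of such a resiliency logic can be sketched as below; the data structure, priority scheme, and control-mode names are placeholders rather than HeRO's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence
import numpy as np

@dataclass
class OdometryEstimate:
    source: str                      # e.g., "LIO", "VIO", "WIO" (illustrative labels)
    pose: Optional[np.ndarray]       # None if the source provides no position estimate
    velocity: Optional[np.ndarray]   # None if the source provides no velocity estimate
    healthy: bool                    # outcome of the source's confidence test
    priority: int                    # lower value = preferred when healthy

def multiplex(estimates: Sequence[OdometryEstimate]):
    """Return the preferred healthy estimate plus a coarse state-quality flag
    that guidance and control can use to pick a safe mode of operation."""
    healthy = [e for e in estimates if e.healthy]
    if not healthy:
        return None, "stop"                      # no reliable state: command a stop
    best = min(healthy, key=lambda e: e.priority)
    if best.pose is not None:
        return best, "position_control"
    if best.velocity is not None:
        return best, "velocity_control"          # degrade gracefully to velocity control
    return None, "stop"
```

Here, the returned mode mirrors the example above: position control when a healthy pose estimate exists, velocity control when only velocity is reliable, and a stop command otherwise.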
The key idea behind HeRO is that any single state estimation source can carry errors, due to failures in sensor measurements, algorithms, or both, but a complete failure becomes increasingly rare as the number of heterogeneous parallel approaches increases. HeRO is front-end agnostic, accepting various algorithmic solutions and able to incorporate either tightly or loosely coupled approaches. However, to take advantage of all possible mobility modes, there is a need for estimating position, orientation, velocity, and, ideally, acceleration. HeRO is tailored to incorporate a wide variety of estimation algorithms. Figure 8 depicts the main sensor and algorithmic solutions developed and used by Team CoSTAR in the DARPA Subterranean Challenge. Our solution considers software-synchronized sensors (common clock synchronization after initialization), with extrinsic calibrations roughly obtained from the robot model designs and fine-tuned using optimization approaches such as Kalibr (camera-IMU) (Furgale et al., 2013; Rehder et al., 2016) or LiDAR-align (LiDAR-LiDAR) (Millane and Taylor, 2019), or by aligning with the robot manufacturer frames. We rely on a variety of in-house and open-source algorithms for sensor fusion. A few examples are as follows: WIO (wheel-inertial odometry) uses an extended Kalman filter (EKF) to fuse the measurements of the wheel encoders and those from an inertial measurement unit (IMU). CIO (contact-inertial odometry) also takes advantage of an EKF, this time including the modeling of the contacts (Lew et al., 2019). The optical flow (OF) approach, VIO, and thermal imagery fusion (TIO) utilize a combination of open-source and commercial solutions, including PX4Flow (Autopilot, 2021), ORB-SLAM (Mur-Artal and Tardós, 2017), Qualcomm VI-SLAM (Qualcomm Technologies, Inc.), the MIT Kimera-VIO (Rosinol et al., 2020), and ROTIO (Khattak et al., 2019), among others. The RIO (radar-inertial odometry) algorithm is our own development, presented in Kramer et al. (2020). Finally, the fusion of LiDAR scans with IMU measurements is done by combining our LOCUS scan matching with IMU measurements, either using our Kalman filter variant (AMCCKF, Fakoorian et al., 2021) or a factor graph optimization. As an example, in the following we describe the latter LiDAR-inertial odometry estimator.

LiDAR-Inertial Odometry Estimation
LiDAR is one of the key sensors, offering long range and high accuracy. LiDARs also perform well in low-light conditions. Moreover, the combination with an IMU provides essential dynamic information and the registration of the gravity vector, which is of high importance for planning modules (e.g., computing traversable regions). There exist several methods exploiting LiDAR-IMU data fusion that achieve remarkable accuracy (Shan et al., 2020; Ye et al., 2019; Hess et al., 2016). However, they do not consider potential failures of the fused sensing modalities, which are likely to be observed in real-world field deployments and can result in catastrophic degradation of the odometry performance if not robustly handled.
To enable reliable operation in extreme settings, our proposed LiDAR-inertial odometry estimation consists of three modules: (i) a LiDAR front end, which provides ego-motion estimation by analyzing LiDAR scans; (ii) an IMU front end based on pre-integration techniques; and (iii) a data fusion module, formulated as a factor graph optimization problem, which fuses the data provided by the front ends and also analyzes the LiDAR observability. This observability analysis is then used by HeRO to make informed decisions about whether or not to use the LIO estimation. These modules are briefly described hereafter.

LiDAR Front End
LOCUS: NeBula's LiDAR-centered front end, referred to as LOCUS (Lidar Odometry for Consistent operations in Uncertain Settings) , is a multi-sensor LiDAR-centric solution for high-precision odometry and 3D mapping in real time. LOCUS provides a generalized iterative closest point (GICP) (Segal et al., 2009) based multi-stage scan matching unit equipped with a health-aware sensor integration module for robust fusion of additional sensing modalities in a loosely coupled scheme. The architecture of the proposed system, depicted in Figure 9, is made of three main components: (i) a point-cloud pre-processor, (ii) a scan matching unit, and (iii) a sensor integration module.
The point-cloud pre-processor is responsible for the management of multiple input LiDAR streams (e.g., syncing, motion correction, merging, filtering) to produce a unified 3D data product that can be efficiently processed in the scan matching unit. The scan matching unit then performs a cascaded GICP-based scan-to-scan and scan-to-submap matching operation to estimate the six-degree-of-freedom (6-DOF) motion of the robot between consecutive LiDAR acquisitions.
The sensor integration module is a key component of the system, enabling joint optimization of accuracy and robustness. In robots with multi-modal sensing, when available, LOCUS can use an initial transform estimate from a non-LiDAR source to ease the convergence of the GICP in the scan-to-scan matching stage: initializing the optimization with a near-optimal seed improves accuracy and reduces computation, enhancing real-time performance. Multiple sources of odometry (e.g., VIO, KIO, WIO) and raw IMU measurements available on board are fed into the sensor integration module, which selects the output from a priority queue of the inputs deemed healthy by a built-in health monitor; the queue order is based on the expected accuracy of the methods. If the highest-priority input is not healthy, then the next highest priority is used. If all sensors fail, the GICP is initialized with the identity pose and the system reverts to pure LiDAR odometry. Notice how the confidence tests (health monitoring) depicted in Figure 8 are here incorporated within this sensor integration module. Palieri et al. (2020) provide more details on the system functioning.

Figure 9. Architecture of the proposed system. In the block diagram, MDC stands for motion distortion correction, while FGA stands for flat ground assumption.
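A minimal sketch of the priority-queue selection described above (names and health flags are illustrative assumptions; LOCUS's actual health monitor is described in Palieri et al. (2020)):

```python
import numpy as np

def gicp_initial_guess(candidates, identity=np.eye(4)):
    """Seed for GICP scan-to-scan matching.

    `candidates` is an ordered list of (name, relative_transform_4x4, healthy)
    tuples, ranked by expected accuracy (e.g., VIO before KIO before WIO before
    raw IMU).  The first healthy candidate is used; if none is healthy, the
    optimization is initialized with the identity and the system falls back to
    pure LiDAR odometry.
    """
    for name, T_rel, healthy in candidates:
        if healthy and T_rel is not None:
            return name, T_rel
    return "identity", identity
```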

LOCUS comparative results:
We compare LOCUS with state-of-the-art LiDAR odometry methods (Shan et al., 2020; Ye et al., 2019; Hess et al., 2016; Zhang and Singh, 2014; Nelson, 2016) in extreme, perceptually degraded subterranean environments and demonstrate high localization accuracy along with substantial improvements in robustness to sensor failures. For the evaluation, we use data collected in the Tunnel and Urban Circuit rounds of the DARPA Subterranean Challenge from a wheeled ground robot carrying two Velodyne LiDARs and an IMU, and running WIO onboard. To assess accuracy, we compute the absolute position error (APE) of the estimated robot trajectory against the ground-truth reference for the different methods over the different runs, and report a summary of the results in Figure 10. Throughout all the operations, LOCUS achieves highly accurate performance. To assess robustness, we analyze the flexibility of the various methods with respect to sudden failures of an input source by testing the following failure scenarios: (i) failure of WIO/IMU, (ii) failure of WIO, and (iii) failure of LiDAR. In these scenarios, tightly coupled approaches and methods designed with synchronized callbacks stop operating when an input is missing. In contrast, LOCUS consistently achieves reliable ego-motion estimation and mapping, demonstrating efficient handling of sensor failures in a cascaded fashion, a behavior that is desirable to accommodate the unforeseen challenges posed by real-world operations, where hardware failures are likely to happen or sensor sources can become unreliable (see Palieri et al. (2020) for details).

IMU Front End
IMU pre-integration: The IMU front end is based on a pre-integration technique for the inertial measurements. This module makes use of the state-of-the-art on-manifold pre-integration theory to summarize the high-rate IMU measurements into a single motion constraint (Forster et al., 2015a,b) for the subsequent pose-graph optimization performed in the LiDAR-IMU data fusion algorithms. IMU pre-integration is also used to guarantee a pose and velocity estimate at high rate and low latency, regardless of the time taken by the optimizer used in the back end of the sensor fusion algorithm.
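For reference, the standard on-manifold pre-integrated measurements of Forster et al. (2015a,b) summarize the gyroscope samples ω̃_k and accelerometer samples ã_k between nodes i and j (with biases b^g, b^a and sampling interval Δt; noise terms omitted) as

```latex
\Delta R_{ij} = \prod_{k=i}^{j-1} \operatorname{Exp}\!\big((\tilde{\omega}_k - b^{g}_{i})\,\Delta t\big), \qquad
\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik}\,(\tilde{a}_k - b^{a}_{i})\,\Delta t, \qquad
\Delta p_{ij} = \sum_{k=i}^{j-1} \Big[\Delta v_{ik}\,\Delta t + \tfrac{1}{2}\,\Delta R_{ik}\,(\tilde{a}_k - b^{a}_{i})\,\Delta t^{2}\Big],
```

and these relative rotation, velocity, and position deltas enter the factor graph as a single motion constraint between consecutive pose nodes, independently of how often the back-end optimizer runs.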

LiDAR-IMU Data Fusion
Smoothing framework (LION): The fusion of the relative ego-motion estimations obtained from the LOCUS and IMU pre-integration front ends is performed via a fixed-lag smoother using a factor graph, as described in Tagliabue et al. (2021), where we introduce LION (LiDAR-Inertial Observability-aware Navigator for vision-denied environments). The state estimated by the proposed smoother consists of (a) the pose (position and attitude) W T B of the IMU-fixed reference frame B expressed in a slowly drifting inertial reference frame W; (b) the linear velocity W v; (c) the IMU biases (B b_a for the accelerometer and B b_g for the gyroscope); and (d) the extrinsic calibration B T L between the LiDAR-fixed frame L and the IMU frame B, introduced to reduce the effects of errors in mounting the sensors, as well as to address the challenges in offline LiDAR extrinsic calibration. A representation of the states and factors used in the factor graph, with a window of k time steps, can be found in Figure 11. Following Forster et al. (2015b), we model the smoothing problem using Georgia Tech Smoothing and Mapping (GTSAM) (Dellaert, 2012) and solve the associated optimization with iSAM2 (Kaess et al., 2012). Observability module: It is crucial for LiDAR-based estimation algorithms to determine whether the geometry of the scene can well constrain the estimation of the translational motion, since long shafts and corridor-like structures can severely impact motion observability. Following Gelfand et al. (2003) and Bonnabel et al. (2016), we propose an observability metric, computed within the LION architecture, which can inform the HeRO switching logic (Figure 8) about potential unreliability in the odometry output of LiDAR-based estimators. Such a metric is based on the condition number κ(A_tt) := |λ_max(A_tt)| / |λ_min(A_tt)| of the translational part A_tt of the Hessian of the cost minimized by the point-to-plane ICP algorithm. The eigenvector associated with the smallest eigenvalue of A_tt is the least observable direction for translation estimation. As a consequence, the larger the condition number κ(A_tt) is, the more poorly constrained the optimization problem is in its translational part. More details are provided in our related work (Tagliabue et al., 2021).
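A small numerical sketch of this check is shown below; it assumes, as in standard point-to-plane ICP analyses, that the translational Hessian block can be approximated by the sum of outer products of the matched surface normals, and the threshold is illustrative.

```python
import numpy as np

def translational_observability(normals, kappa_max=1e3):
    """Observability check for the translation estimate of point-to-plane ICP.

    For a point-to-plane cost, the translational block of the Hessian is
    approximately A_tt = sum_i n_i n_i^T over the matched surface normals n_i.
    A large condition number means the scene (e.g., a long featureless
    corridor) poorly constrains translation along the least observable axis.
    """
    A_tt = sum(np.outer(n, n) for n in normals)      # 3x3 symmetric PSD matrix
    w, V = np.linalg.eigh(A_tt)                      # eigenvalues in ascending order
    kappa = abs(w[-1]) / max(abs(w[0]), 1e-12)       # condition number kappa(A_tt)
    least_observable_dir = V[:, 0]                   # eigenvector of the smallest eigenvalue
    return kappa, least_observable_dir, kappa < kappa_max
```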
Evaluation: We report the performance of the LiDAR-IMU fusion technique during the two tracks (A and B) of the Tunnel Competition at the DARPA Subterranean Challenge. The LOCUS output (selected, for this evaluation, to be the pure scan-to-scan matching from the GICP (Segal et al., 2009)) was fused at 10 Hz with the IMU, and consequently the fused output of LION could be provided at up to 200 Hz. The sliding window of LION used here is 3 s, and the factor graph optimization was tuned to use approximately 30% of one CPU core of an i7 Intel NUC. The root-mean-squared error (RMSE) for position (t(m)) and attitude estimation (R(rad)) and the percentage drift (t(%)) are reported in Table 4, where we include a comparison with WIO and LOCUS (scan-to-scan), with the global localization algorithm LAMP (presented in the following section) as ground truth. The results highlight that fusing inertial data with the odometry from the front end significantly reduces the drift of LiDAR's pure scan-to-scan matching. Additionally, LION reliably estimates the attitude of the robot and guarantees a gravity-aligned output provided at IMU rate. Last, in Figure 12, we report the performance of the observability module in an indoor, office-like environment characterized by long corridors. The results show that when the observability module is not used (Figure 12, left), motion unobservability creates a LiDAR slip, producing a position estimation error of ≈ 9 m. When the observability module is used (Figure 12, right), the switching logic in HeRO switches to WIO instead of LION for the section of the corridor without LiDAR features. The total error is ≈ 1 m ("Before loop closure"). Improved state estimation (reduced drift in the output of HeRO) benefits the global mapping solution (Section 6), which can now correctly detect a loop closure (Figure 12, "After loop closure"), further reducing the drift.

Table 4. Estimation error of wheel-inertial odometry, scan-to-scan matching, and LION for two runs of the two tracks of the Tunnel Competition, computed for one of the robots deployed.

Other Odometry Sources
Apart from the LiDAR-inertial odometry estimator, NeBula includes other resilient estimation algorithms developed to provide robust state estimation while navigating perceptually challenging environments. Examples include a contact-inertial odometry estimator (Lew et al., 2019), where contacts are exploited to produce zero-velocity updates in a Kalman filter that integrates IMU measurements during dead-reckoning situations, and a radar-inertial odometry estimator (Kramer et al., 2020), which provides reliable ego-motion estimates even in the presence of obscurants thanks to the properties of the radar signal. The parallel combination of these heterogeneous estimation sources within the HeRO architecture provides a robust state estimate, with associated quality measures, that can be refined with a back-end algorithm providing global localization, as described in Section 6.

Large-Scale Positioning and 3D Mapping
NeBula's SLAM solution, called LAMP (Large-scale Autonomous Mapping and Positioning), achieves low-drift, multi-robot, multi-sensor SLAM over large scales in perceptually degraded conditions. LAMP produces a consistent global representation of an unknown environment, along with the associated covariances to enable uncertainty-aware solutions across the NeBula system ( Figure 7). In the context of the DARPA Subterranean Challenge, LAMP achieves the requirement Field Robotics, July, 2022 · 2:1432-1506  Figure 13. LAMP architecture. Each robot maintains its own factor graph (FG), which can then be fused with a multi-robot team on the base station, using a centralized architecture. The LiDAR and camera data that are used for loop closures and map building (white "Sensor Data" box) is labeled Keyed Sensor Data, being linked with a specific pose node. Handlers of artifact observations, IMU measurements, and ultrawideband (UWB) signals all process data to add constraints to the factor graph. Our robust optimization approach runs in parallel to optimize both the robot and multi-robot factor graphs.
for artifact localization error of less than 5 m over multiple kilometers of traverse. In this section, we will outline the architecture of LAMP, and then describe our approach to multi-sensor SLAM. Finally, we present results from representative field tests.

Subsystem Overview
As outlined in Figure 13, LAMP is a factor-graph-based SLAM solution, with the following key components: (a) an adaptable odometry input that can process individual or fused odometry sources, such as HeRO (Section 5), (b) a multi-modal loop closure module, based on LiDAR, visual, or semantic features, and (c) an outlier-resilient optimization of the factor graph, including multi-sensor inputs.
The flow of data starts with the odometry and sensor inputs, which add factors to the graph on the robot. Parallel processes then run loop closure searches and factor-graph optimization. Next, the graph is sent to the base station. The base station merges graphs from each robot into a common multi-robot graph that is further optimized with the addition of inter-robot loop closures. The main output products of LAMP are a set of poses describing the robot trajectory and the location of artifacts, as well as a point-cloud map.
Pose nodes and adaptable odometry input: To make the factor-graph optimization computationally tractable over large-scale, long-term multi-robot exploration, LAMP utilizes a sparse graph of pose nodes and edges (Figure 14). The edges are obtained from an accumulation of odometry measurements between two consecutive nodes (odometry edges) or from translation and rotation estimates between nonconsecutive nodes (loop closure edges, described below). A new pose node and its linking odometry edge are created after traveling more than a threshold translation or rotation. To address the challenge of perceptual degradation for these odometry edges, we use HeRO (Section 5) as the input odometry source.
Multi-modal loop closures: A crucial capability for reducing the accumulated error in the robot trajectory is loop closure detection: the ability to correctly assert when a robot revisits a previously visited location or known landmark. Loop detection enables the computation of rigid-body 3D transformations between nonconsecutive pose nodes, which can be added as loop closure edges in the factor graph (Figure 14). The multi-modal architecture of LAMP's loop closure module (Figure 13) enables a robust and reliable system through the use of different sensing modalities. These loop closure sensing modalities include LiDAR data, visual data (Rosinol et al., 2020), and semantic data (Ebadi et al., 2021).
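As a toy illustration of this factor-graph structure (odometry edges between consecutive pose nodes plus a loop-closure edge between nonconsecutive ones), the sketch below uses the Python bindings of GTSAM, on which LAMP's optimization builds (Dellaert, 2012); the keys, relative poses, and noise values are placeholders.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# Illustrative noise models (sigmas are placeholders, not LAMP's tuning).
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 6))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 6))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.10] * 6))

# Anchor the first pose node (e.g., the calibrated gate frame).
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))
initial.insert(0, gtsam.Pose3())

# Odometry edges: accumulated odometry between consecutive sparse pose nodes.
odom = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(2.0, 0.0, 0.0))
for i in range(3):
    graph.add(gtsam.BetweenFactorPose3(i, i + 1, odom, odom_noise))
    initial.insert(i + 1, gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(2.0 * (i + 1), 0.0, 0.0)))

# A loop-closure edge between nonconsecutive nodes (relative transform from scan alignment).
loop = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(-6.0, 0.0, 0.0))
graph.add(gtsam.BetweenFactorPose3(3, 0, loop, loop_noise))

# Levenberg-Marquardt optimization of the pose graph.
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(3))
```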

Multi-robot fusion:
NeBula addresses the problem of multi-robot exploration of unknown environments by relying on LAMP's multi-robot architecture. This architecture is centralized, to make use of agents with greater computational resources (such as a base station); for decentralized applications of NeBula, however, we utilize the techniques described in Choudhary et al. (2017). In the centralized architecture, the agent with the greatest computational capability serves as the central agent to fuse the factor graphs constructed by individual robots into a consistent multi-robot graph, along with the associated sensor data (Figure 13). The factor graphs are fused using the same multi-modal loop closure modules as on a single robot, but instead of searching for intra-robot loop closures, these modules search for inter-robot loop closures. To further improve localization accuracy, we use the computational power of the central agent to perform batch loop closure analysis across the entire graph. This analysis identifies and computes additional inter- and intra-robot loop closures to add to the multi-robot graph. The updated multi-robot global graph is then optimized and periodically sent back to the robots, so that each agent has a consistent global representation of the environment for global planning (Section 9).
Factor-graph optimization: Our factor-graph optimization (Kimera-RPGO, Rosinol et al. (2020)) uses a robust outlier rejection approach to reject the erroneous loop closures that can occur when operating in perceptually degraded conditions, such as with obscurants and self-similar environments. Kimera-RPGO rejects erroneous loop closures by finding the largest consistent set of loop closures for each set of single-robot and inter-robot loop closures, using a consistency graph and max clique search (an adaptation of Mangelson et al. (2018)). The loop closures that are not in the consistent set are discarded prior to optimization (see Ebadi et al. (2020) and Lajoie et al. (2020) for details). The updated factor graph is then optimized with a Levenberg-Marquardt solver that is implemented in GTSAM (Dellaert, 2012).

Additional Factors and Multi-Sensor Fusion
LAMP fuses multiple sensing inputs into the factor graph (Figure 14) to improve the robustness and accuracy of the SLAM solution. We present four examples here:

1. IMU Gravity Factors: When the robot is stationary, the accelerometers on the IMU can be used to obtain a robust estimate of the local gravity vector, which is added to the factor graph as a constraint on roll and pitch (yellow factors in Figure 14).

2. Landmark Factors: Measurements of distinct landmarks can either be used to detect loop closures or to directly provide constraints to the factor graph. These landmarks fall into two categories.

(a) Deployed Landmark Factors: These landmarks include visual beacons, ranging beacons, or retro-reflecting beacons, and are deployed from a robot while exploring an unknown environment. For example, we have implemented deployable UWB ranging beacons in our system (see Section 12.2 for hardware details and Funabiki et al. (2020) for algorithmic details). The signals from the beacons robustly and efficiently identify loop closures, to seed LiDAR- or vision-based alignment computations for single- and multi-robot teams (green node and factors in Figure 14).

(b) Environmental Landmark Factors: Existing features in the environment, such as signs, salient objects, and the shapes of junctions (e.g., Ebadi et al., 2021), can be used as landmarks. For example, we use observations of specific objects, such as backpacks and fire extinguishers (called artifacts in SubT), with sets of range-bearing observations (dashed black lines in Figure 14) from the artifact relative-localization module (Section 7). By fusing the object observations into the factor graph we also ensure the most up-to-date global location of those objects for situational awareness (and scoring in SubT).

3. Calibration Factors: At the start of a mission, each robot is aligned with a global reference frame from a combination of LiDAR, IMU, and survey station measurements of the robot and a calibration gate (e.g., Figure 15). These initial calibration measurements, as well as any additional measurements generated during the mission, are added as constraints to the factor graph.

Figure 15. Example calibration to the global frame from the DARPA Subterranean Challenge Urban Circuit. The coordinates of the survey prisms and reflective patches on the gate are provided and define the reference frame. A survey station measures the prisms to determine its pose in the global frame, after which it can measure the position of robots. LiDAR measurements of the reflective patches provide a yaw estimate, and an IMU computes roll and pitch assuming gravity alignment of the reference frame.

Metric and Semantic Map Generation
LAMP builds both a geometric and a semantic global map from sensor measurements attached to the nodes in the factor graph. Both maps are built by projecting sensor measurements into the global reference frame using the latest optimized state of the associated pose nodes in the factor graph. For the geometric map, these sensor measurements (keyed sensor data in Figure 13) are either point clouds (from LiDAR, depth cameras, or visual feature tracking) or local occupancy grids. In particular, structures like confidence-rich occupancy grids allow for encoding the environmental uncertainty, which can then be used for perception-aware coverage planning, enabling SMAP-like behaviors (Heiden et al., 2017). For the semantic map, the sensor measurements are descriptive observations, such as detections of distinct objects (e.g., backpacks, survivors), semantic classifications of 3D spaces (e.g., doorways, stairs), or ambient measurements (e.g., temperature, pressure, gas concentration). The resulting 3D semantic map provides rich situational awareness to the operator of the robotic team and can be the critical output data product of the overall system. The semantic map is especially important in the context of SubT, where it primarily consists of the globally localized artifact observations (both objects and ambient measurements), which is exactly the information needed for scoring (Section 7).
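A simple sketch of the map-assembly step for the geometric map (illustrative only; the actual implementation also handles occupancy grids and the confidence-rich representations mentioned above):

```python
import numpy as np

def assemble_global_map(keyed_scans, optimized_poses):
    """Re-project keyed point clouds into the global frame.

    keyed_scans:      {node_id: (N_i, 3) array of points in the body/sensor frame}
    optimized_poses:  {node_id: 4x4 world-from-body homogeneous transform, taken
                       from the latest optimized factor-graph state}
    """
    global_points = []
    for node_id, points in keyed_scans.items():
        T = optimized_poses[node_id]
        homog = np.hstack([points, np.ones((points.shape[0], 1))])
        global_points.append((homog @ T.T)[:, :3])   # rows are points in the global frame
    return np.vstack(global_points)
```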

LAMP Performance
The performance of LAMP on a single-robot dataset, from a Husky robot equipped with three LiDARs, is demonstrated in Figure 16. LAMP achieves error at less than 0.2% of the distance traveled, with the IMU gravity factors assisting in reducing the z error in the latter portion of the trajectory (Figure 16(c)). Further single-robot tests are summarized in Table 5, from the five benchmark datasets shown in Figure 17. These benchmarks show LAMP achieving accuracy better than 5 m on all datasets other than the tunnel and cave datasets, which have motion-distorted LiDAR measurements. Multi-robot LAMP performance is demonstrated in Figure 18, with two Huskies deployed in an urban environment. The results demonstrate the benefit of using UWB beacons (Figure 18).

Semantic Understanding and Artifact Detection
Semantic understanding of the environment and detecting objects of interest and artifacts are important capabilities for enabling higher levels of robotic autonomy in unknown environments. Semantic mapping and artifact detection are among the main components of the NeBula autonomy framework. This section discusses NeBula's solution for detecting, localizing, and visualizing objects of interest on heterogeneous robots with different sensor configurations. Here, we focus on both (i) static objects, with a clear visual, thermal, or depth signature, and (ii) spatially diffused phenomena such as gas propagation and WiFi signals. The pipeline is explained in detail in Terry et al. (2020), with a summary and recent extensions detailed here.

Figure 19. Multi-robot, multimodal object detection, localization, and visualization pipeline.

Requirements: The object detection system needs to (i) make detections in real time across multiple sensor modalities, (ii) permit high-accuracy localization, (iii) adjust the sensor configuration based on the detection and localization confidence, and (iv) apply filtering to present the most likely detection candidates to the mission supervisor (when a communication link is established). While the method presented in this section is general, in the context of the DARPA Subterranean Challenge we focus on a set of predefined object types including gas sources (e.g., a CO2 source) and man-made objects such as fire extinguishers, drills, ropes, helmets, survivor manikins, backpacks, vents, and cell phones (Agrawal, 2019; Orekhov and Chung, 2021). Object signatures exist in one or more modalities: visual, thermal, depth, WiFi, audio, and chemical.
Architecture: Figure 19 shows the proposed object detection, localization, and visualization pipeline. We break down the underlying object detection problem into two stages: (1) an image-based object detection pipeline to first find the object, and (2) a relative localization filter that applies projective geometry to the detection to explicitly estimate its position. By splitting the detection and localization tasks, we can utilize fast, high-quality detections from existing algorithms and apply them to generic camera types in our relative localization approach. For temporally static objects, NeBula relies on multi-modal detection and, when available on robots, makes use of visual cameras, depth measurements, and thermal cameras.
Spatially diffuse phenomena: For temporally dynamic and spatially diffused phenomena, we rely on source-seeking methods based on gas sensors and WiFi sensing. The detection confidence provides the uncertainty assessment to the perception-aware planner. The planner then adjusts the sensor configuration to make new, higher-fidelity measurements, leading to more accurate detections. We discuss the three stages of detection, localization, and base station processing in the rest of this section.
Detection: For visually observable objects, detections are made in both the color and thermal spectra using state-of-the-art convolutional neural networks (CNNs). A CNN produces a bounding box on the image to pass to the relative localization module. NeBula relies on different CNN implementations to adapt to various processing capabilities. On ground robots, a YOLO Tiny variant (Redmon and Farhadi, 2018; Bochkovskiy et al., 2020) is used, leveraging GPU hardware (e.g., an Nvidia Jetson Xavier) to run in real time on multiple cameras (see Section 12). On drones, a MobileDet variant (Xiong et al., 2020) is used, modified to run on a Google EdgeTPU. To achieve sufficient detection performance for our specific application, we fine-tuned the detection networks with an appropriate domain-specific dataset. In the context of SubT, we have produced more than 40,000 annotations of the following objects: fire extinguisher (12.26%), drill (9.98%), rope (17.90%), helmet (19.29%), survivor (19.84%), and backpack (20.72%). Our training focuses on maximizing recall, which increases true positives; this is followed by an outlier rejection step using range, color, and size filters, which reduces false positives. For more details on the training methods for this data, see Terry et al. (2020).
Localization: The detection networks produce 2D bounding boxes (within the image) that are combined with depth measurements to compute the position of the artifact relative to the robot. The depth measurements can be obtained from multiple sources: depth cameras (such as an RGB-D camera), LiDAR scans mapped into the camera frame, or a size-based projection, where the depth is computed such that the bounding box, when projected to 3D at that depth, matches the expected size of the object. These methods are detailed in Terry et al. (2020). For robustness to sensor failure, each method is run in parallel, with the highest priority method (LiDAR, then depth, then size projection) used if the corresponding sensor is available. All methods jointly filter multiple detections to produce a combined relative location, which is reported to the LAMP module (Section 6) to compute the global location before sending the report to the base station. The relative location can also be computed without depth measurements, as an additional back-up and on systems without depth cameras or LiDAR (such as drones). In this case, we use a monocular-based tracking approach over an image sequence (detailed in Ramtoula et al., 2020).
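A minimal sketch of the projective-geometry step for the depth-based case is given below; it is illustrative, and the actual filter fuses multiple detections and depth sources as described in Terry et al. (2020).

```python
import numpy as np

def artifact_camera_position(bbox, depth_m, K):
    """Back-project the bounding-box center through pinhole intrinsics K.

    bbox:    (u_min, v_min, u_max, v_max) in pixels, from the detection network
    depth_m: depth of the object, e.g., from LiDAR points projected into the
             image, an RGB-D camera, or the size-based fallback
    K:       3x3 camera intrinsic matrix
    """
    u = 0.5 * (bbox[0] + bbox[2])
    v = 0.5 * (bbox[1] + bbox[3])
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return np.array([x, y, depth_m])   # camera-frame position; LAMP maps it to the global frame
```

In the size-based fallback, depth_m would instead be chosen so that the object's known physical size, projected at that depth, matches the bounding-box extent.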
Multi-modal multi-robot artifact reconciliation: The artifact reports from each of the robots are further processed on robots with more powerful computational resources (or on the base station). These reports include (i) the detection class, (ii) the detection confidence, (iii) a reference RGB/thermal image, (iv) the bounding box, and (v) a location estimate. The base station processes the reports, rejecting outliers and matching observations of the same artifact instances from other agents or previous visits. To reduce the number of false positive reports, a larger and more performant detection network (YOLOv4, Bochkovskiy et al., 2020) is used to update the detection confidence of each report. Then, the report is compared to previous reports of the same class to identify repeat observations and observations of the same artifact instance reported by other agents. This comparison uses both location proximity and a comparison of NetVLAD visual image descriptors (Arandjelovic et al., 2016). The highest confidence report of each object is then saved to a database and visualized for review by the human supervisor using the mission executive interface (see Figure 20, visualization block). When needed, to increase the confidence in the detected semantics, the perception-aware motion planner seeks new measurements, e.g., from a closer or better angle to the target, or by sending a different robot with complementary sensors to get multiple readings of the target.
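For illustration, a simplified version of the reconciliation test might look as follows; the report fields, distance threshold, and descriptor-similarity threshold are placeholders, not the values used in our system.

```python
import numpy as np

def same_artifact(report_a, report_b, max_dist_m=5.0, min_cosine_sim=0.8):
    """Decide whether two artifact reports describe the same object instance.

    Each report is a dict with keys "class", "position" (global, 3-vector),
    and "netvlad" (image descriptor).  Thresholds here are illustrative.
    """
    if report_a["class"] != report_b["class"]:
        return False
    close = np.linalg.norm(report_a["position"] - report_b["position"]) < max_dist_m
    a, b = report_a["netvlad"], report_b["netvlad"]
    cosine = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return close and cosine > min_cosine_sim
```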

Spatially Diffuse Localization
To detect and locate spatially diffuse phenomena, such as gas leaks and WiFi sources, the robotic team is leveraged as a mobile sensor network with distributed, moving ambient sensor measurements. Signal strength (e.g., CO2 concentration or WiFi RSSI) is recorded at every robot position and is (i) used to augment the global 3D semantic map (Figure 20, detection block) and (ii) processed to produce an initial location estimate at the area of peak signal strength. The combination of the spatially informative semantic map with an initial location estimate seeds a local search for source locations, based on the 3D geometry. Automation of this local search is ongoing work. In the tests presented here, the information is sent to the base station for display to the operator, who performs the local search upon inspection of the metric-semantic map.
Detection performance: Figure 21 shows examples of a true positive and a false positive detection for each visual artifact type. We observed that spray paint markings and existing equipment in the environments, which share the same gross features as the target objects, are incorrectly picked up. For spatially diffuse detection (gas and WiFi), we extrapolate the source location by measuring the signal strength gradient and moving the robot in directions that increase the detection confidence (see Figure 20, detection block).
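A rough sketch of a gradient-following step for spatially diffuse sources is shown below; it is illustrative only, since in the tests above the final local search is performed by the operator on the metric-semantic map.

```python
import numpy as np

def next_waypoint(samples, position, step_m=1.0):
    """Step toward increasing signal strength (e.g., CO2 concentration or WiFi RSSI).

    samples:  list of (position_3vec, strength) pairs collected by the robot team
    position: current robot position (3-vector)
    """
    P = np.array([p for p, _ in samples])
    s = np.array([v for _, v in samples])
    A = np.hstack([P, np.ones((len(P), 1))])
    coeffs, *_ = np.linalg.lstsq(A, s, rcond=None)   # fit s ≈ g·p + b locally
    g = coeffs[:3]                                   # estimated spatial gradient
    norm = np.linalg.norm(g)
    if norm < 1e-6:
        return position                              # no usable gradient yet
    return position + step_m * g / norm
```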

Risk-Aware Traversability and Motion Planning
A fundamental component of NeBula is its risk-aware traversability and motion planning (see Figure 7). This component, which we call STEP (stochastic traversability evaluation and planning), allows the robots to safely traverse extreme and challenging terrains by quantifying the uncertainty and risk associated with various elements of the terrain. In this section, we briefly discuss NeBula's traversability analysis, motion planning, and controls. For more details, please see Fan et al. (2021).

Design Philosophy
Challenges in extreme terrain motion planning: Unstructured, obstacle-laden environments pose significant challenges for ground roving vehicles, with a variety of mobility-stressing elements. Common assumptions of a benign world with flat ground and clearly identifiable obstacles do not hold; these environments introduce high risks to robot operations, containing difficult geometries (e.g., rubble, slopes) and unforgiving hazards (e.g., large drops, sharp rocks) (Kalita et al., 2018; Léveillé and Datta, 2010). Additionally, subterranean environments pose unique challenges, such as overhangs, extremely narrow passages, etc. See Figure 22 for representative terrain features. Determining where the robot may safely travel has several key challenges: (i) localization error severely affects how sensor measurements are accumulated to generate dense maps of the environment; (ii) sensor noise, sparsity, and occlusion induce large biases and uncertainty in mapping and traversability analysis; and (iii) the combination of various risk regions in the environment creates highly complex constraints on the motion of the robot, which are compounded by the kinodynamic constraints of the robot itself.
System architecture: To address these issues, we develop a risk-aware traversability analysis and motion planning method, which (1) assesses the traversability of terrain at different fidelity levels based on the quality of perception, (2) encodes the confidence of the traversability assessment in its map representation, and (3) plans kinodynamically feasible paths while considering mobility risks. Figure 23 shows an overview of the local motion planning approach. The sensor input (point cloud) and odometry are sent to the risk analysis module, which evaluates the traversability risk along with its estimation confidence. The generated risk map is used by hierarchical planners consisting of a geometric path planner and a kinodynamic MPC (model predictive control) planner. The planners replan at a high rate to react to sudden changes in the risk map. The planned trajectory is executed with a tracking controller, which sends a velocity command to the platform.
Robot agnosticism: Our approach is highly extensible and general to our different ground robot types, requiring only a change in the dynamics model of the system. Moreover, using this approach, we are able to specify a wide array of constraints and costs, such as limiting pitch or roll of the vehicle on slopes, preferring one direction of motion, keeping some distance from obstacles, fitting through narrow passages, or slowing down/stopping around risky areas. This flexibility has proven important in achieving robust navigation across the extreme traversability challenges encountered in highly unstructured environments (see Table 6).

Uncertainty-Aware Traversability
The importance of considering uncertainty: Precise traversability analysis and motion planning rely heavily on sensor measurements and localization. However, the quality of state estimation can often degrade, especially in perceptually challenging environments such as tunnels, mines, and caves. Additionally, sensors are subject to noise of various types, as well as occlusion, restricted field of view, etc. Therefore, a key idea is to incorporate uncertainty awareness into our mapping for traversability and motion planning. We accomplish this with a multi-fidelity mapping approach in which we weigh more strongly high-confidence information from recent sensor measurements taken close to the robot. Older sensor measurements, and measurements taken farther from the body of the robot, are associated with higher uncertainties and decay more quickly. By aggregating these sensor measurements in an uncertainty-aware way, we create a robust and resilient belief-aware local map, which is then used for traversability analysis. To reduce uncertainty and achieve higher levels of resiliency, we rely on multi-sensor, high-FOV measurements (dense depth camera and LiDAR data) to efficiently update local maps and reduce sensitivity to localization uncertainties (similar, related work includes Ahmad et al. (2021) and Hines et al. (2021)).

Figure 24. Comparison of the mean, VaR, and CVaR for a given risk level α ∈ (0, 1]. The axes denote the values of the stochastic variable ζ, which in our work represents the traversability cost. The shaded area denotes the 1 − α fraction of the area under p(ζ). CVaR_α(ζ) is the expected value of ζ over the shaded area.

Traversability risk analysis: To assess traversability risks, we utilize the constructed multi-fidelity local map. A ground segmentation method (Himmelsbach et al., 2010) is applied to the merged point cloud to filter out obstacle and ceiling points. The ground point cloud is used to build a 2.5D elevation map for efficient queries of terrain geometry. The elevation map and segmented point clouds are used to assess the risk of traverse from various perspectives, including collision, tip-over, traction loss, and negative obstacles. The individual risk analyses are fused into a single risk value estimate with a confidence value and sent to the planning module (Figure 25). We define this risk value in terms of a CVaR (conditional value-at-risk) metric (Figure 24), which quantifies the severity of the risk of a given path, given all uncertainties ζ from the traversability risk analysis, according to a desired probability threshold α. This threshold can be changed by mission-level decision making in order to vary the level of acceptable risk during the mission. The CVaR metric is particularly useful as it captures tail events with low probability of occurrence, which may have severe consequences for the success of the mission and should be taken into account. By approximating all uncertainties with a Gaussian distribution, the CVaR can be evaluated efficiently to account for different types of terrain.
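For a Gaussian approximation of the traversability cost, VaR and CVaR have standard closed forms; the sketch below illustrates the computation with arbitrary numbers and is not the per-cell risk model used on the robots.

```python
from scipy.stats import norm

# Minimal sketch of evaluating a CVaR-style risk metric under the Gaussian
# approximation mentioned above: with cost zeta ~ N(mu, sigma^2), VaR_alpha is
# the alpha-quantile and CVaR_alpha is the mean of the worst (1 - alpha) tail.

def gaussian_var_cvar(mu, sigma, alpha=0.9):
    z = norm.ppf(alpha)                              # standard-normal alpha-quantile
    var = mu + sigma * z                             # value-at-risk
    cvar = mu + sigma * norm.pdf(z) / (1.0 - alpha)  # expected cost in the tail
    return var, cvar

# Example: mean cost 0.3, std 0.1, alpha = 0.9 gives a tail-aware cost well above
# the mean; the planner then tries to minimize this quantity along a path.
print(gaussian_var_cvar(0.3, 0.1, 0.9))
```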
Semantic traversability factors: In addition to geometric traversability analyses, we can also identify certain terrain features semantically and incorporate them into the risk map. Features such as water are sometimes identifiable in LiDAR point clouds due to differences in reflectivity. Other features such as stairs can be identified using computer vision methods, which locate stairs semantically and also identify stair slope, angle, and height from known geometric priors. This information is useful for identifying nongeometric mobility risks, as well as notifying our planners to approach certain hazards differently (e.g., walking down stairs).

Figure 26. Schematic of our risk-aware kinodynamic trajectory optimization approach, which combines a trajectory library-based search with convex optimization. The global path is determined by A* over a 2D grid. Heuristic paths (including v-turns, random input, etc.) are not necessarily feasible but are used to initialize the local trajectory optimization. Obstacles, constraints, and costs are convexified about the best initial path and used to iteratively solve convex quadratic programming (QP) optimization problems in real time.

Uncertainty-Aware Traversability Analysis and Motion Planning
Efficient risk-aware kinodynamic planning: Using the computed CVaR metric values on the map, we must search for a path which minimizes these values. This is done in a two-stage hierarchical fashion. The first stage operates on longer distances (40 m) and takes into account positional risk.
Using an A* algorithm over a 2D grid, this first stage yields a global geometric plan that minimizes the risk along the path. The second stage operates on shorter distances (8 m) and searches for a kinodynamically feasible trajectory that minimizes CVaR while satisfying various constraints, including obstacles, dynamics, orientation, and control effort. This kinodynamic planner operates in an MPC fashion and is based on a combination of stochastic trajectory optimization and gradient-based convex optimization techniques. It runs efficiently in real time at 20-50 Hz, requiring roughly 20% of one CPU core. As shown in Figure 26, the optimization is initialized with a rich set of heuristic paths, making it robust to local minima. Once a trajectory is optimized, it is sent to an underlying tracking controller for execution on the platform.
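As a minimal, illustrative sketch of the first (geometric) stage only, the code below runs an A* search over a 2D grid of per-cell risk costs with a simple 4-connected neighborhood; both the grid values and the neighborhood are simplifying assumptions, and the kinodynamic MPC stage that refines the path is not shown.

```python
import heapq
import numpy as np

def risk_astar(risk_grid, start, goal):
    """A* over a 2D grid where each step costs 1 plus the per-cell risk value."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible heuristic
    counter = 0
    open_set = [(h(start), 0.0, counter, start, None)]
    came_from, g_cost = {}, {start: 0.0}
    while open_set:
        _, g, _, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:                      # reconstruct path back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + d[0], node[1] + d[1])
            if not (0 <= nxt[0] < risk_grid.shape[0] and 0 <= nxt[1] < risk_grid.shape[1]):
                continue
            new_g = g + 1.0 + risk_grid[nxt]  # step cost plus risk penalty
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                counter += 1
                heapq.heappush(open_set, (new_g + h(nxt), new_g, counter, nxt, node))
    return None

grid = np.zeros((40, 40))
grid[10:30, 20] = 50.0                        # a high-risk wall the path should avoid
print(risk_astar(grid, (0, 0), (39, 39))[:5])
```

Adding the per-cell risk to the step cost is what biases the search away from high-risk regions while still preferring short paths.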
Recovery behaviors: In the real world, failures are unavoidable. While our risk-aware traversability and planning architecture is meant to reduce the probability of failures, they do happen from time to time, as a consequence of failure in localization, undetected edge cases in traversability, hardware failures, or unknown unknowns. As a last line of defense, we design behaviors to recover the system from nonfatal failures. Recovery behaviors are autonomously executed when we locally detect that a commanded motion is not being followed, or no valid and safe path is found to move away from the current position. These behaviors include clearing/resetting the local traversability map, increasing the allowable threshold of risk (to try to escape an untraversable area), and moving the robot in an open-loop fashion towards the direction of maximum known free space. In most cases these recovery behaviors are sufficient to recover the robot from a stuck condition, as long as the robot has not suffered a catastrophic failure.
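A toy sketch of this escalation logic is shown below; the behavior names, their order, and the final fallback are illustrative, not the exact on-robot behaviors.

```python
# Sketch of a recovery-behavior ladder: when commanded motion is not being
# followed or no safe path exists, escalating actions are tried in order.

RECOVERY_LADDER = [
    "clear_local_traversability_map",
    "raise_allowable_risk_threshold",
    "open_loop_drive_toward_max_free_space",
]

def next_recovery_action(attempts_so_far):
    if attempts_so_far < len(RECOVERY_LADDER):
        return RECOVERY_LADDER[attempts_so_far]
    return "request_operator_assistance"   # illustrative fallback beyond the ladder

print([next_recovery_action(i) for i in range(4)])
```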
Learning and adaptation: Over the course of a mission we often see changes in vehicle dynamics or environmental factors. For example, decaying battery life, mechanical wear, or mud/water can all affect the vehicle's intrinsic dynamics. Additionally, changes in surfaces in the environment can strongly affect vehicle motion. To adapt to these changes we employ learning-based methods using Gaussian processes which adapt critical vehicle parameters and dynamics models based on the past history of performance (Fan et al., 2020a,b). By accounting for both the epistemic and aleatoric uncertainties of these statistical models, these methods ensure safety and robustness even in light of changing dynamics models.
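To illustrate the flavor of this adaptation (a hedged sketch with simulated data, not the method of Fan et al. (2020a,b)), a Gaussian process can be fit to the residual between commanded and achieved velocity, with its predictive mean used to correct future commands and its predictive variance used as an epistemic-uncertainty signal.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Simulated, degraded platform: achieved speed is 80% of the commanded speed.
cmd = np.random.uniform(0.0, 1.0, size=(50, 1))             # commanded speed [m/s]
achieved = 0.8 * cmd[:, 0] + np.random.normal(0, 0.02, 50)
residual = achieved - cmd[:, 0]                              # shortfall to be learned

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-3),
                              normalize_y=True)
gp.fit(cmd, residual)

test_cmd = np.array([[0.5]])
mean, std = gp.predict(test_cmd, return_std=True)
corrected = test_cmd[0, 0] - mean[0]                          # compensate expected shortfall
print(f"predicted residual {mean[0]:.3f} +/- {std[0]:.3f}, corrected command {corrected:.3f}")
```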
Ongoing work: Our ongoing work lies in increasing the ability of NeBula to account for and handle uncertainties in both perception and planning. One major thrust of ongoing work involves perception-aware planning. By incorporating sensor models into our traversability maps, we can move towards perception-aware behaviors which maximize sensor coverage, optimally reduce uncertainty, and automatically generate active learning behaviors. This thrust is particularly important in perceptually degraded and complex environments filled with occlusion and traversability hazards. A second direction of ongoing work lies in incorporating localization uncertainty into our traversability risk mapping in a more theoretically satisfying way, while remaining computationally tractable. We call this approach "mapping using belief clouds," and the idea is to propagate the uncertainty of the robot pose while taking sensor measurements into an aggregated point cloud. The result is a belief cloud, i.e., a point cloud which encodes uncertainty information directly and efficiently. Our ongoing work involves extracting traversability metrics and risks from these uncertainty-aware mapping data (see Figure 27).

Uncertainty-Aware Global Planning
Autonomous global planning for environment exploration and coverage is a core part of the NeBula architecture (see Figure 7). NeBula formulates autonomous exploration in unknown environments under motion and sensing uncertainty as a partially observable Markov decision process (POMDP), one of the most general models for sequential decision making. This formulation allows NeBula to jointly consider sequential outcomes of perception and control at the planning phase in order to achieve higher levels of resiliency during the mission operation. In this section, we discuss our POMDP-based global planning, referred to as PLGRIM (probabilistic local and global reasoning on information roadmaps). For more details, please see Kim et al. (2021), Bouman et al. (2020), and CoSTAR Team (2020a).

Problem Formulation
POMDP formulation: A POMDP is described as a tuple ⟨S, A, Z, T, O, R⟩, where S is the set of states of the robot and world, and A and Z are the sets of robot actions and observations, respectively (Kaelbling et al., 1998; Pineau et al., 2003). At every time step, the agent performs an action a ∈ A and receives an observation z ∈ Z resulting from the robot's perceptual interaction with the environment. The motion model T(s, a, s′) = P(s′ | s, a) defines the probability of being at state s′ after taking action a at state s. The observation model O(s, a, z) = P(z | s, a) is the probability of receiving observation z after taking action a at state s. The reward function R(s, a) returns the expected utility for executing action a at state s. In addition, a belief state b_t ∈ B at time t denotes the posterior distribution over states conditioned on the initial belief b_0 and the past action-observation sequence, i.e., b_t = P(s | b_0, a_{0:t−1}, z_{1:t}). The optimal policy π* : B → A of a POMDP for a finite receding horizon is defined as

$$\pi^*_{t:t+T} = \operatorname*{argmax}_{\pi}\; \mathbb{E}\left[\sum_{k=t}^{t+T} \gamma^{\,k-t}\, r\big(b_k, \pi(b_k)\big)\right], \tag{1}$$

where γ ∈ (0, 1] is a discount factor for future rewards, and r(b, a) = ∫_S R(s, a) b(s) ds denotes the belief reward, i.e., the expected reward of taking action a at belief b. T is a finite planning horizon for a planning episode at time t. Given the policy for the last planning episode, only a part of the optimal policy, π*_{t:t+∆t} for ∆t ∈ (0, T], is executed at run time. A new planning episode starts at time t + ∆t given the updated belief b_{t+∆t}. The computational complexity of a POMDP grows exponentially with the planning horizon (Pineau et al., 2003), and we tackle this challenge with a hierarchical belief space representation and planning, as detailed in Section 9.2.
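For concreteness, a discrete belief update consistent with the models defined above (prediction with T, correction with O, then normalization) can be sketched as follows; the array-indexing convention, in which the observation model depends on the post-action state, is an assumption made for this example.

```python
import numpy as np

# b_{t+1}(s') is proportional to O(s', a, z) * sum_s T(s, a, s') * b_t(s).
# T is indexed as [state, action, next_state]; O as [next_state, action, observation].

def belief_update(belief, T, O, action, observation):
    predicted = T[:, action, :].T @ belief            # prediction over the motion model
    updated = O[:, action, observation] * predicted   # correction with the observation model
    return updated / updated.sum()                    # renormalize to a distribution

# Tiny two-state example: a noisy sensor sharpens an initially uniform belief.
T = np.array([[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]])  # [s, a, s']
O = np.array([[[0.8, 0.2], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]])  # [s', a, z]
print(belief_update(np.array([0.5, 0.5]), T, O, action=0, observation=0))
```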
Application to simultaneous mapping and planning (SMAP): To formalize our SMAP problem as a POMDP, we define the state s = (q, W) as a pair of robot state q and world state W. We further decompose the world state as W = (W_occ, W_cov), where W_occ and W_cov describe the occupancy and coverage states of the world, respectively, each associated with its uncertainty (e.g., Agha-mohammadi et al., 2019). A reward function for coverage can be defined as a function of the information gain I and the action cost C:

$$R(s, a) = I(W_{cov}; z) - C(W_{occ}, q, a), \tag{2}$$

where the information gain I(W_cov; z) is quantified as the reduction of the entropy H in W_cov after observation z, and C(W_occ, q, a) is evaluated from the actuation effort and risk of taking action a at robot state q on W_occ. Optimizing this reward function in Eq. (1) simultaneously solves the mapping and planning (SMAP) problem, maximizing the coverage for artifact detection while minimizing the action risk (e.g., collision chance). This reward function can be generalized to SLAP problems (e.g., Agha-mohammadi et al., 2018, or Kim et al., 2019b) by incorporating information gain based on localization entropy-reduction events, such as a loop closure or landmark detection.
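As a small illustration of the entropy-reduction term, the sketch below uses a per-cell Bernoulli coverage model; the model and the numbers are illustrative assumptions, not the IRM representation used in PLGRIM.

```python
import numpy as np

# Coverage information gain as entropy reduction over per-cell coverage beliefs.

def entropy(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)))

def information_gain(coverage_before, coverage_after):
    return entropy(coverage_before) - entropy(coverage_after)

before = np.full(100, 0.5)                 # 100 cells, fully uncertain coverage state
after = before.copy(); after[:30] = 0.99   # 30 cells observed by the sensor footprint
print(information_gain(before, after))     # entropy reduction, in bits
```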

Hierarchical Coverage Planning on Information Roadmaps
In this subsection, we introduce NeBula's solution for uncertainty-aware global coverage planning, PLGRIM (probabilistic local and global reasoning on information roadmaps) (Kim et al., 2021). PLGRIM proposes a hierarchical belief representation and belief space planning structure to scale up to spatially large problems while pursuing locally near-optimal performance (see Figure 28). At each hierarchical level, it maintains a belief about the world and robot states in a compact form, called an information roadmap (IRM), and solves for a POMDP policy to generate a coverage plan over a nonmyopic temporal horizon, in a receding-horizon fashion.

Hierarchical POMDP formulation: First, we formulate the receding-horizon SMAP problem in Eq. (1) as a hierarchical POMDP problem (Kaelbling and Lozano-Pérez, 2011; Kim et al., 2019a). Let us decompose a belief state b into local and global belief states, b_ℓ = P(q, W_ℓ) and b_g = P(q, W_g), respectively. W_ℓ is a local, rolling-window world representation with high-fidelity information, while W_g is a global, unbounded world representation with approximate information (see Figure 29). With π_ℓ and π_g denoting the local and global policies, respectively, we approximate Eq. (1) by the cascaded hierarchical optimization problems

$$\pi^*_\ell = \operatorname*{argmax}_{\pi_\ell(\,\cdot\,;\,\theta_\ell)}\; \mathbb{E}\left[\sum_{k=t}^{t+T_\ell} \gamma^{\,k-t}\, r_\ell\big(b_{\ell,k}, \pi_\ell(b_{\ell,k};\theta_\ell)\big)\right], \quad \theta_\ell = \pi^*_g(b_{g,t}), \tag{3}$$

$$\pi^*_g = \operatorname*{argmax}_{\pi_g}\; \mathbb{E}\left[\sum_{k=t}^{t+T_g} \gamma^{\,k-t}\, r_g\big(b_{g,k}, \pi_g(b_{g,k})\big)\right]. \tag{4}$$

Here r_ℓ(b_ℓ, π_ℓ(b_ℓ)) and r_g(b_g, π_g(b_g)) are approximate belief reward functions for the local and global belief spaces, respectively. Note that the co-domain of the global policy π_g(b_g) is the parameter space Θ_ℓ of the local policy π_ℓ(b_ℓ; θ_ℓ), θ_ℓ ∈ Θ_ℓ.

Hierarchical belief representation: For a compact and versatile representation of the world, we rely on a graph structure, G = (N, E) with nodes N and edges E, as the data structure representing the belief about the world state. We refer to this representation as an IRM (Agha-mohammadi et al., 2014). We construct and maintain IRMs at two hierarchical levels, namely, a local IRM and a global IRM (see Figure 29). The local IRM is a dense, high-resolution graph that contains high-fidelity information about occupancy, coverage, and traversal risks, but only locally around the robot. In contrast, the global IRM sparsely captures the free-space connectivity. It encodes uncovered areas by so-called frontier nodes, which allow for an effective representation of large environments, spanning up to several kilometers. In addition to the map uncertainty, the IRM can be generalized to incorporate the robot localization uncertainty (e.g., Kim et al., 2019b, or Agha-mohammadi et al., 2018) in the planning framework when traversing narrow passages and challenging environments, where location uncertainty can hinder the robot's ability to navigate.

Hierarchical coverage planning-GCP: Given the local and global IRMs as the hierarchical belief representations of W_ℓ and W_g, respectively, we solve the cascaded hierarchical POMDP problems. First, a global coverage planner (GCP) solves for the global policy in Eq. (4), providing global guidance to the local coverage planner (LCP) of Eq. (3). The global guidance enhances coverage performance and global completeness. It is especially helpful when the LCP has fully covered the local area and needs global guidance to move to another area. In order for the GCP to scale up to very large problems, we adopt the QMDP approach (Littman et al., 1995) (see Figure 30).
The main idea is to assume that the state becomes fully observable after one step of action under uncertainty, so that the value function for subsequent actions can be evaluated efficiently in a Markov decision process (MDP) setting. This assumption is acceptable for the GCP since its main role is to guide the robot to an uncovered area and let the LCP lead the local coverage behavior. In other words, the GCP's policy search can terminate at frontier nodes of the global IRM, and thus we can assume no changes in the coverage state during the GCP's planning episode and adopt QMDP for efficient large-scale planning. The complexity of the GCP is O(N^g_iter |N^g| |N^g_nn|^2), where N^g_iter is the maximum number of Bellman-update iterations, |N^g| is the number of nodes in the global IRM, and |N^g_nn| is the number of nearest-neighbor nodes connected to a node by an edge in the global IRM, which is bounded by a small finite number. Thus, the complexity of the GCP grows only linearly with |N^g|.
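A minimal sketch of a Bellman-update computation of this kind on a toy graph is given below; the graph, rewards, and costs are illustrative, and this is plain value iteration on a fully observable graph rather than the full QMDP machinery.

```python
# Value iteration over a small IRM-like graph: frontier nodes carry an
# information-gain reward, edges carry traversal costs, and the induced greedy
# policy points the robot toward the most valuable reachable frontier.

def value_iteration(neighbors, edge_cost, frontier_reward, gamma=0.95, n_iter=100):
    V = {n: 0.0 for n in neighbors}
    for _ in range(n_iter):                                   # Bellman updates
        for n in neighbors:
            candidates = [frontier_reward.get(m, 0.0) - edge_cost[(n, m)] + gamma * V[m]
                          for m in neighbors[n]]
            V[n] = max(candidates) if candidates else 0.0
    # Greedy policy: from each node, move to the neighbor achieving the max above.
    policy = {n: max(neighbors[n], key=lambda m: frontier_reward.get(m, 0.0)
                     - edge_cost[(n, m)] + gamma * V[m])
              for n in neighbors if neighbors[n]}
    return V, policy

neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
edge_cost = {("a", "b"): 1.0, ("b", "a"): 1.0, ("b", "c"): 2.0, ("c", "b"): 2.0}
print(value_iteration(neighbors, edge_cost, frontier_reward={"c": 5.0})[1])
```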
Hierarchical coverage planning-LCP: In the hierarchical optimization framework, the LCP solves Eq. (3), given a parameter input from the GCP. The LCP constructs a high-fidelity policy by considering information gathering (with visibility checks given obstacles), traversal risk (based on proximity to obstacles, terrain roughness, and slopes), and the robot's mobility constraints (such as acceleration limits and the nonholonomic constraints of wheeled robots). The LCP has two phases: (i) reach the target area based on the GCP's guidance, and (ii) construct a local coverage path after reaching the target area. In the first case, when the target frontier is outside the local IRM range, the LCP instantiates high-fidelity motion commands to reach the target frontier. In the second case, when the target frontier is within the local IRM range, the LCP performs the information-gathering coverage optimization described in Eq. (3). To solve the coverage optimization problem, we employ the partially observable Monte Carlo planning (POMCP) algorithm (Silver and Veness, 2010; Kim et al., 2019b). POMCP is a widely adopted POMDP solver that leverages Monte Carlo sampling to alleviate both the curse of dimensionality and the curse of history (see, e.g., Figure 31). Given a generative model (or a black-box simulator) for discrete action and observation spaces, POMCP can learn a near-optimal value function of the reachable belief space with an adequate exploration-exploitation trade-off. We limit the complexity of the LCP to O(N_iter N_depth) calls of the generative model, where N_iter is the number of iterations of episodic forward simulation and N_depth is the depth of the planning horizon for each iteration. Since it is a local rolling-window planner, its complexity does not grow with the total size of the environment.

Figure 31. Illustrative example of coverage path planning on the local IRM with Monte Carlo tree search. The field of view of the robot's coverage sensor is represented by a blue circle. Macro actions (six steps on the local IRM in this example) associated with two tree branches, paths A and B, are shown. Note that the final world coverage states in both branches are identical. Path A is evaluated to be more rewarding than B since fewer actions were required to cover the same area.

Experimental Evaluation
In order to evaluate our proposed framework, we perform high-fidelity simulation studies with a four-wheeled vehicle (Husky robot) and real-world experiments with a quadruped (Boston Dynamics Spot robot). Both robots are equipped with custom sensing and computing systems, enabling high levels of autonomy and communication capabilities. The entire autonomy stack runs in real time on an Intel Core i7 processor with 32 GB of RAM. The stack relies on a multi-sensor fusion framework, the core of which is 3D point-cloud data provided by LiDAR range sensors mounted on the robots. We refer to our autonomy-stack-integrated Spot as Au-Spot (Bouman et al., 2020).
Baseline algorithms: We compare our PLGRIM framework against a local coverage planner baseline (next-best-view method) and a global coverage planner baseline (frontier-based method).

Next-Best-View (NBV):
NBV is a widely adopted local coverage planner that returns a path to the best next viewpoint. It uses an information-gain-based reward function like ours, but limits the policy search space to a set of shortest paths to sampled viewpoints around the robot. While NBV is able to utilize local high-fidelity information, it suffers from a spatially limited world belief and a sparse policy space.

Hierarchical Frontier-based Exploration (HFE):
Frontier-based exploration is a prevalent global coverage planning approach that interleaves moving to a frontier node and creating new frontiers until no frontiers remain (e.g., Umari and Mukhopadhyay, 2017). It optimizes for the global completeness of environment exploration, but often suffers from local suboptimality due to the large scale of its policy space and its myopic, one-step look-ahead decision making. The performance of frontier-based methods can be enhanced by modulating the spatial scope of frontier selection, but they still suffer from downsampling artifacts and a sparse policy space composed of large action steps.

Simulation Evaluation
We demonstrate PLGRIM's performance, as well as that of the baseline algorithms, in simulated subway, maze, and cave environments. Figure 32 visualizes these environments. In our comparisons, in order to achieve reasonable performance with the baseline methods in complex simulated environments, we allow baseline methods to leverage our local and global IRM structures as the underlying search space.
Simulated subway station: The subway station consists of large, interconnected, polygonal rooms with smooth floors, devoid of obstacles. There are three subway environments of varying size, whose scales are denoted by 1×, 2×, and 3×. Figure 33(a)-(c) shows the scalable performance of PLGRIM against the baselines. In a relatively small environment without complex features (Subway 1×), NBV performance is competitive as it evaluates high-resolution paths based on information gain. However, as the environment scale grows, its myopic planning easily gets stuck and the robot's coverage rate drops significantly. HFE shows inconsistent performance in the subway environments. The accumulation of locally suboptimal decisions, due to its sparse environment representation, leads to the construction of a globally inefficient IRM structure. As a result, the robot must perform time-consuming detours in order to pick up leftover frontiers.
Simulated maze and cave: The maze and cave are both unstructured environments with complex terrain (rocks, steep slopes, etc.) and topology (narrow passages, sharp bends, dead ends, open spaces, etc.). The coverage rates for each algorithm are displayed in Figure 33(d)-(e). PLGRIM outperforms the baseline methods in these environments. By constructing long-horizon coverage paths over a high-resolution world belief representation, PLGRIM enables the robot to safely explore through hazardous terrain. Simultaneously, it maintains an understanding of the global world, which is used when deciding where to explore next after exhausting all local information. In the cave, NBV's reliance on a deterministic path, without consideration of probabilistic risk, causes the robot to drive into a pile of rocks and become inoperable. NBV exhibits similarly poor performance in the maze; in this case, NBV's myopic planning is particularly ineffectual when faced with navigating a topologically complex space, and the robot ultimately gets stuck. As was the case in the subway, HFE suffers in the topologically complex maze due to the accumulation of suboptimal local decisions. In particular, frontiers are sometimes not detected in the sharp bends of the maze, leaving the robot with an empty local policy space. As a result, the robot cannot progress and spends considerable time backtracking along the IRM to distant frontiers.

Figure 34. The local IRM (yellow, brown, and white nodes represent uncovered, covered, and unknown areas, respectively) is shown overlaid on the risk map. A yellow arrow indicates the robot's location. The LCP plans a path (red) that fully covers the local area (snapshot A). When p(W_ℓ) updates, the path is adjusted to extend towards the large uncovered swath while maintaining smoothness with the previous path. Another p(W_ℓ) update reveals that the path has entered a hazardous area, the wall of the lava tube (snapshot B). As a demonstration of the LCP's resiliency, the path shifts away from the hazardous area, and the robot is redirected towards the center of the tube (snapshot C). One minute later, the robot encounters a fork in the cave. The LCP path curves slightly toward the fork apex (for maximal information gain) before entering the wider, less risky channel (snapshot D).

Real-World Evaluation
We extensively validated PLGRIM on physical robots in real-world environments. In particular, PLGRIM was run on a quadruped robot in the Valentine lava tube, located in Lava Beds National Monument, Tulelake, CA. The cave consists of a main tube, which branches into smaller, auxiliary tubes. The floor is characterized by ropy masses of cooled lava, and large boulders from ceiling breakdown are scattered throughout the tube. Figures 34 and 35 illustrate how PLGRIM overcomes the challenges posed by large-scale environments with complex terrain and efficiently guides the robot's exploration. Figure 33(f) shows the area covered over time.

Figure 35. Portions of the global IRM constructed in the lava tube are visualized: yellow nodes represent frontiers and brown nodes represent breadcrumbs. Gray arrows associate a frontier with a snapshot of the robot exploring that frontier. The GCP plans a path (blue) along the global IRM to a target frontier after the local area is fully covered (snapshot E). The robot explores the area around the frontier (snapshot F), and then explores a neighboring frontier at the opening of a narrow channel to its right. The LCP plans a path (green) into the channel (snapshot G). Later, after all local areas have been explored, the robot is guided back towards the mouth of the cave along the breadcrumb nodes (snapshot H).

Multi-Robot Networking
Multi-robot systems offer advanced capabilities to enable complex and time-constrained missions in large-scale, complex environments. Resilient wireless mesh networking solutions are a prerequisite for reliable and efficient multi-robot missions. NeBula is inherently a "networked" solution (Figure 7). While it can be implemented on a single autonomous robot, it also allows for faster and more efficient missions with multiple, potentially heterogeneous robots (see Figure 5). NeBula's goal in the SubT challenge is to map an unknown subterranean environment, locate artifacts, and communicate that information to the base station via a wireless mesh network for submission to the DARPA Command Post. Inter-robot wireless communication in subterranean environments is particularly challenging, and the reliability, capacity, and availability of communication links are uncertain because of (i) limited line-of-sight opportunities, (ii) the complicated nature of the interaction of radio signals with the environment (e.g., reflection, scattering, multipath), and (iii) the unknown nature of the environment. In this section, we go over NeBula's Collaborative High-bandwidth Operations with Radio Droppables (CHORD) communication system for communication-degraded subterranean environments. The objective of CHORD is to maintain high-bandwidth links to multiple robots for efficient commanding, autonomous operation, and data gathering in complex unknown environments. For more details on the development of NeBula's networking solutions, see Otsu et al. (2020) and Ginting et al. (2021).

Robots without a route through the network to the base station must be fully autonomous and return to an area with communication coverage to return data. Alternatively (not pictured), multiple robots without a wireless route to the base station may share data and have one act as a data mule to carry the data back to communication range.
Intra-robot communication: Within each robot, ROS 1 is used for intra-robot communication, while a dedicated bridge node is responsible for handling inter-agent communication. This further isolates intra-robot communication from inter-agent communication and prevents inadvertent radio traffic. Intra-robot communications are monitored using ROS 1 topic statistics.
Inter-agent communication: Even with relatively high-power, high-bandwidth radios, bandwidth is still a shared, limited, and intermittently available resource. Efforts must be taken to manage bandwidth usage and to remain robust to communication loss when robots operate outside the range of the radios. CHORD uses ROS 2 over the wireless mesh network for inter-agent communication. The advanced quality-of-service (QoS) features of ROS 2 are used to guarantee delivery of important priority data while maintaining network stability over low-bandwidth links. This configuration enables traffic prioritization and resource control. We configured two categories of QoS for inter-robot topics with different mission requirements. Topics that require full message-history transfer for post-processing, or that deliver mission-critical information, have higher priority and are configured so that messages are reliably delivered even if the network is down for periods of time. Estimated link capacities, data routes, and data traffic are also monitored to ensure stability.
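As an illustration of this kind of two-tier configuration (the topic name, depths, and exact policy choices below are assumptions for the example, not CHORD's actual settings), a ROS 2 node might define the two QoS categories as follows:

```python
from rclpy.qos import (QoSProfile, ReliabilityPolicy, DurabilityPolicy,
                       HistoryPolicy)

# Mission-critical reports: delivered reliably with full history, so they
# survive temporary link outages and reach late-joining subscribers.
critical_qos = QoSProfile(
    history=HistoryPolicy.KEEP_ALL,
    reliability=ReliabilityPolicy.RELIABLE,
    durability=DurabilityPolicy.TRANSIENT_LOCAL,
)

# High-rate telemetry: best-effort and shallow, so it never backs up a
# low-bandwidth mesh link.
telemetry_qos = QoSProfile(
    depth=1,
    history=HistoryPolicy.KEEP_LAST,
    reliability=ReliabilityPolicy.BEST_EFFORT,
    durability=DurabilityPolicy.VOLATILE,
)

# e.g., node.create_publisher(ArtifactReport, "artifact_reports", critical_qos)
```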
This design had two practical benefits. First, by routing all inter-robot communication through the bridge node, we isolated the ROS 1 networks completely and avoided unintended data flows between agents. The network isolation also helped us diagnose network issues easily, as every inter-robot ROS topic passes through the bridge node. Second, we were able to keep network traffic inside our bandwidth budget, which contributed to the stability of the dynamic network. For more details on the results, see Ginting et al. (2021).

Mission Planning and Autonomy
Having the capability to autonomously plan, reconfigure, and perform tasks for a multi-robot system is a crucial component of the NeBula autonomy framework (see Figure 7), enabling exploration of large, complex, and unknown environments. Especially in the context of the SubT challenge, where communication between the robots and a single human supervisor is either unavailable or unreliable and intermittent, autonomy is crucial to achieving mission objectives under time and resource constraints. In this section, we present NeBula's mission autonomy components, which integrate a team of more than five heterogeneous robots and allow a single human supervisor to oversee and interact with all of them at the same time, under range-limited and unreliable communication in challenging environments. For technical details, please see Otsu et al. (2020), Vaquero et al. (2020), and Kaufmann et al. (2021). Figure 37 illustrates the components of the mission autonomy architecture and their interfaces to other NeBula components. In the following paragraphs, we describe key components of the system's mission planning and autonomy.

Architecture
• IRM Manager: The IRM is a key data structure to exchange information between humans and machines. Certain robot autonomy behaviors and assistive capabilities utilize the latest belief states in the IRM within autonomous decision-making and planning processes.
• Mission Watchdog: The mission watchdog monitors mission progress and communication links. It is an external mechanism that ensures that the robots can transmit their world belief back to the base station regardless of the mission executive states.
• Copilot and Assistive Behaviors: The mission autonomy assistant, Copilot for short, encompasses a series of monitoring and assistive capabilities (e.g., system health monitor, comm node drop assistance) that perform autonomous tasks and keep the human in the loop if possible and needed. The details of Copilot, including its assistive capabilities, planning and scheduling, and human interaction via user interfaces, are explained in Section 11.3.
• User Interface: The user interface consists of carefully designed web-based components and RViz displays for increased situational awareness. It provides an efficient interface to send various levels of commands with minimal control effort.

Autonomy
Exploring unknown and complex environments with a team of multiple robots comes with several challenges. The difficulty of operating even a single robot under limited communication bandwidth, let alone a team, motivates the need for full autonomy during the phases of an exploration mission when communications are not available. In situations where robots are outside of each other's communication range, robots must remain fully autonomous and independently reason about their environment to determine their next task. A subset of this problem arises when robots are within each other's communication range and must devise a coordinated plan. This section describes how we utilize various levels of mission autonomy.

Planning and scheduling: Mission planning and task scheduling constitute the highest layer of the planning modules and are integral to achieving full autonomy both outside and within communication range (Figure 4). The mission planner maintains a world belief, as described in Section 3, which comprises the state of the robot team (e.g., robot health, robot location, detected artifacts on each robot), the state of the world (e.g., geometric and semantic maps), the state of the mission (e.g., remaining mission time, margin to the desired mission output), and the state of communication (e.g., network connectivity, location of comm nodes). As the world belief grows and improves, the mission planner dynamically retasks robots to new goals or deploys new robots in order to achieve the mission goals defined in the mission specification. Ongoing work focuses on using semantic information about the world (e.g., stairwells, door frames, intersections, room volume) in the mission planner to provide an additional belief of where critical information may lie, helping improve mission success. The task scheduler works in conjunction with the mission planner and robot autonomy behaviors to actively schedule and assign agents to each task given various constraints. This is one of the mission autonomy features that allows the system to actively deal with temporal uncertainties, dependencies, dynamic resource constraints, and varying risks and mission strategies. The scheduling component is modular and can be interchanged with a variety of existing planning and scheduling frameworks. Currently, a planning problem is formulated using the Planning Domain Definition Language (Fox and Long, 2002) and solved using OPTIC (Benton et al., 2012). The solver updates the task schedule at a fixed cadence.
Executive: The executive is a task manager that ensures each scheduled task is dispatched to each respective agent at the correct time. It tracks the state of all tasks and requests the task scheduler to reschedule when tasks fail, or relaxes temporal constraints when the current schedule is infeasible.
Mission specification: A mission file is used to describe a mission; it combines several high-level robot autonomy behaviors to create the flow for more complex scenarios. For this, we use the Traceable Robotic Activity Composer and Executive (TRACE), proposed by de la Croix et al. (2017), and the Business Process Modelling Notation (BPMN). Figure 38 shows an example "Exploration" mission file that was used in one of our cave exploration scenarios. The mission starts in the upper-left corner with a prompt for the user to begin the mission, which ensures the robots start moving autonomously in a safe manner. Each rectangular box depicts a service task; these represent robot behaviors in our architecture. Once the mission is started, the mission flow moves towards the parallel gateway. This gateway allows multiple flows to branch off of a single flow. In the case of the exploration BPMN, this allows five behaviors to run in parallel.

Robot autonomy behaviors: Behaviors used in the example mission file (Figure 38) represent a subset of NeBula's robot autonomy behaviors; some are described here:
• Move to Next Frontier - Receives the current frontier as a goal from the global planning module and commands the agent to move to it.
• New Roadmap Comm Node Monitor - Monitors the IRM for new mesh network extension requests and temporarily interrupts the Move to Next Frontier behavior to initiate Comm Drop Autonomy.
• Comm Drop Autonomy - Autonomously selects the optimal target location for dropping a communication node, maximizing communication coverage while minimizing the risk of violating safety and operational constraints for the robots traversing the local environment.
• Transit to Comm Node Location - Commands the agent to move towards the dropping location through the IRM.
• Drop Comm Node - Instructs the dropper firmware to drop a comm node immediately.
• Collision Avoidance - Prevents inter-robot collision by monitoring inter-robot distances. When robots are too close, this behavior performs prioritized motion planning to resolve the situation.
• Comms Heartbeat Regain Monitor - Continuously monitors for heartbeat messages from the base station and terminates successfully when a consistent stream of messages is detected.
• RTB (Return to Base) - Locates the agent within the current IRM and finds the shortest path to the base IRM node, then sends this path to the mobility manager and terminates successfully when the agent has reached the base.
• Stairs Helper - Detects stairs and assists during a stair-climbing procedure.

Assistive Autonomy and Human-Robot Interaction
In situations where there is sufficient communication bandwidth to interface with a team of multiple robots, providing the operator with situational awareness of all robot activities and of the mission progress becomes very useful. Assistive autonomy at the base station facilitates this human-robot interaction, especially under limitations such as the single-supervisor requirement of the SubT challenge or the supervisor's limited cognitive capacity.
Autonomy Copilot MIKE: The Multi-robot Interaction assistant for unKnown cave Environments (Copilot) is introduced in Kaufmann et al. (2021) and supports the single human supervisor during the setup and mission phases of complex multi-robot operations. Copilot treats the human supervisor as one node (or a member) of the multi-robot system and it actively schedules and reschedules the tasks for all members while considering world beliefs, resources, and other constraints (e.g., human cognitive workload, available comm nodes, etc.). Some tasks can be automatically executed or resolved depending on the allowed autonomy level. Currently, Copilot comprises assistive capabilities, robot behaviors, assistive task scheduling, and user interfaces ( Figure 37) used to autonomously control the robots and guide the human supervisor's decision making while keeping them in the loop if a communication link is established.
Assistive capabilities: Assistive capabilities are distributed across the robots and the base station to reduce the supervisor's workload. These capabilities assist to (1) detect system anomalies, (2) create tasks for the Copilot core module, (3) monitor the resolution of these tasks, and (4) resolve tasks autonomously or inform the supervisor that actions are needed. The assistive capabilities run independently without knowing the complete system status, which allows a modular extension of the autonomy functionality. Assistive capabilities address human limitations and autonomously intervene during mission execution as needed and when possible. Tasks created by these capabilities can inform the supervisor of the status of communication links between the robots and the base (e.g., communication loss, communication node drop suggestions), a robot's sensor suite status or mobility status (e.g., stuck, tilted, undesirable oscillations), and the system's autonomy software status (e.g., process health).
Assistive task scheduling: The assistive task scheduler shares the implementation of the task scheduler mentioned prior and considers the operator(s) as an agent (or resource) to which tasks can be assigned. As more tasks become automated and can be assigned to the Copilot, the operator becomes more readily available to assist with mission-critical tasks.
User interface: The user interface consists of several web-based components and a configurable RViz view showing a 3D or 2.5D map representation of the explored environment. Figure 39 illustrates the single-screen interface, which includes the Copilot and robot status components. The Copilot component is designed to increase situational awareness, inform the supervisor of mission tasks, and dispatch urgent system notifications. The limited interaction between the system and the single human supervisor increases the supervisor's trust in the system's performance, as they are kept in the loop and may still intervene if and when a communication link is established and if the mission strategy requires a change (e.g., taking a riskier or more conservative posture). The user interface provides an additional robot status component for each robot to directly interact with agents or conduct troubleshooting. Figure 39 gives an example of this component, which is used when communication links exist and the human supervisor needs to shift their attention to a single robot to directly interact with it. This component comprises elements that reflect real-time housekeeping data, such as the current position, sensor health status, estimated remaining battery, and remaining comm nodes. It also integrates controls for changing a robot's role to either act as a vanguard explorer or provide support to the leading agents in the mission. Lastly, the component integrates the current BPMN mission diagram and highlights an agent's active autonomy behavior to reflect the robot's internal mission executive state.
Human trust in autonomy: Copilot has been deployed in multiple field tests and mission simulations in diverse environments, providing us with useful feedback regarding the interface design and multi-robot operations. Prior to the implementation of Copilot, we learned that our human supervisors were overwhelmed with remembering tasks instead of strategically overseeing the robots' activities and overall mission progress. Now our human supervisors let Copilot handle several decision-making processes and must trust its autonomy capabilities (Kaufmann et al., 2021), even if a supervisor could make those decisions themselves.

Mobility Systems and Hardware Integration
As described in Section 3, to simultaneously address the various challenges associated with exploring unknown, challenging terrains, we rely on a team of heterogeneous robots with complementary capabilities in mobility, sensing, computing, and endurance. These assets are deployed during the mission as the robots learn about the terrain characteristics. Figure 40 shows four classes of our robots: (i) wheeled rovers, to cover general and relatively smooth surfaces and mild obstacles; (ii) legged robots, to cover more challenging and uneven terrains where surmounting obstacles or staircases is required (Bouman et al., 2020; CoSTAR Team, 2020a; Miller et al., 2020; Jenelten et al., 2020); (iii) tracked robots, to complement legged platforms in handling different surface materials; and (iv) aerial and hybrid-locomotion platforms, to enable traversing vertical shafts and areas that are not accessible by ground robots.

Ground Robots
Our ground robots are able to carry heavy payloads. Hence, they are equipped with extensive sensing and processing capabilities, enabling complex autonomous behaviors and artifact detection. Their battery capacity allows them to have a longer operational time than flying vehicles. Below, we discuss the NeBula Autonomy Payload carried by these ground robots.
Architecture: The NeBula payload ( Figure 41) consists of the NeBula Sensor Package (NSP), NeBula Power and Computing Core (NPCC), NeBula Diagnostics Board (NDB), and NeBula Communications Deployment System (NCDS). Its electronics and software architecture is modular, to facilitate adaptation to varying mechanical and power constraints of each platform in our heterogeneous robotic fleet.
Design principles: The key design principles are as follows:
• Durability: Shock proofing of the system increases longevity and self-recovery chances when exploring challenging terrains.
• Lightweight materials: Payload reduction maximizes robot agility and battery life.
• Modularity: A critical feature when developing a heterogeneous fleet of robots with various capabilities is modularity. For example, processors and sensors are adaptable across different mobility systems with differing mass/size constraints.
• Portability: Ease of transportation minimizes damage to valuable hardware.
• HW-autonomy co-design: An adaptable design process enables reconfiguring sensors to adapt to autonomy evolution.

NeBula Sensor Package (NSP):
The NSP empowers the NeBula autonomy solution on the robots by gathering sensory information from the environment in real time. The NSP is heterogeneous; on each robot, it consists of a subset of the following sensors: LiDARs; monocular, stereo, and thermal cameras; external IMUs; encoders; contact sensors; ultra-high-lumen LEDs; radars; gas sensors; and UWB and wireless signal detectors. The NSP is protected by custom superstructures with impact protection, built from a combination of hard resin urethane, semi-rigid carbon-infused nylon, and aluminum. The NSP interfaces with the NPCC via high-bandwidth USB and Ethernet for data and custom serial messages, and via push-pull connectors for high-amperage power. One version is depicted in Figure 42, which shows NSP payloads utilizing an array of Velodyne Puck VLP-16 LiDARs, Intel RealSense D435i cameras, and FLIR Boson 640 thermal cameras, among several other sensors; the sensors are interchangeable and reconfigurable to accommodate different sensor arrangements, manufacturers, and sensing modalities.

NeBula Power and Computing Core (NPCC):
The NPCC is an auxiliary payload that provides power to all NeBula sensors and computers used for autonomy. Aluminum enclosures protect the internal electronics in the event of atypical loads and impacts such as falls and collisions. It is designed with considerations for thermal cooling and for the haptics of the connector interface panel, which undergoes extensive cycling. The modular, auxiliary-mount design was adapted to accommodate the reduced flight weight of the drones. The NPCC is powered from an external high-capacity lithium battery to provide isolation and extended battery life for the internal battery across the heterogeneous fleet. The payload uses two high-power computers for sensing, autonomy, and semantic scene understanding, and also hosts the low-level microprocessor. On some robots, the NPCC is equipped with a GPU-based system-on-module with a custom interface to the power modules, cameras, and sensors to accommodate machine learning and semantic understanding functionalities. The various configurations of the NPCC can be seen in Figure 43.

NeBula Diagnostics Board (NDB):
The NDB implements the system diagnostics, monitoring the vital power elements of the robot such as battery voltage, input current, and individual regulator voltages. When the robot first boots, all voltage regulators are powered up in a staggered sequence to limit inrush loads on the NDB. After each voltage regulator is enabled, the processor checks that its voltage is within the expected range and reports any errors found. In addition, current monitoring checks for high current draw when each of the regulators is enabled, to detect possible short circuits in the various robot sensors and mechanisms. The power module has an input protection circuit to protect against voltage transients, reverse polarity, and undervoltage. A custom ROS message combines all these measurements and constantly publishes the hardware power status to a dedicated robot diagnostics topic using the rosserial interface.

NeBula Comm Deployment System (NCDS):
As discussed in our ConOps in Section 3, we construct and expand a wireless mesh network near the environment entrance to extend the reach of the base station. In order to do so, ground robots are equipped with a comm deployment system (NCDS), which allows them to carry and deploy communication radios (comm nodes) and static assets during the mission. The radios are autonomously deployed using the NCDS, which mounts at the rear of the robots, as seen in Figure 44. The comm nodes are encased in aluminum until release, and the NCDS electronics are mounted locally on the mechanism and sealed for ingress protection. The core protection and brake-lock/release mechanism was modularized across the fleet, though the deployment capacity was reduced for the legged and tracked vehicles due to available mounting points, payload constraints, and sensor occlusion. The mechanism is driven by a geared brushless motor and activated via a custom embedded system communicating over rosserial. A ROS message provides the status of each radio (loaded, dropped). The NCDS circuit board interfaces with the NPCC via serial communication. It monitors the hard-stop switches used for calibration and the switches responsible for detecting whether radios are loaded/released. A high-level representation is depicted in Figure 45.
The NCDS software architecture relies on a finite state machine (FSM) with the following representative activities (a minimal sketch of such an FSM follows the list):
• Start up: Checks for battery power. Enables motor power and establishes links with the motors.
• Calibration: The servo is actuated until the limit switches are contacted. Encoder positional values are then stored in local memory.
• Load/Unload: Manual switches allow the user to load and swap radios.
• Deploy Radio: Performs the radio deployment and uses sensors to detect whether the deployment was successful.
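The sketch below illustrates such a state machine in simplified form; the states, events, and transitions are illustrative simplifications of the embedded firmware, not its actual implementation.

```python
from enum import Enum, auto

class State(Enum):
    START_UP = auto()
    CALIBRATION = auto()
    IDLE = auto()
    DEPLOYING = auto()

class NCDSStateMachine:
    def __init__(self):
        self.state = State.START_UP
        self.radios_loaded = 0

    def step(self, event):
        if self.state is State.START_UP and event == "power_ok":
            self.state = State.CALIBRATION          # enable motors, establish links
        elif self.state is State.CALIBRATION and event == "limit_switch_hit":
            self.state = State.IDLE                 # store encoder home position
        elif self.state is State.IDLE and event == "radio_loaded":
            self.radios_loaded += 1                 # manual load via switches
        elif self.state is State.IDLE and event == "deploy_request" and self.radios_loaded:
            self.state = State.DEPLOYING
        elif self.state is State.DEPLOYING and event == "drop_confirmed":
            self.radios_loaded -= 1                 # release switch confirms deployment
            self.state = State.IDLE
        return self.state

fsm = NCDSStateMachine()
for e in ["power_ok", "limit_switch_hit", "radio_loaded", "deploy_request", "drop_confirmed"]:
    print(e, "->", fsm.step(e))
```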

Static Assets
Static assets refer to hardware solutions that are not capable of changing their position after deployment. Our static assets are composed of comm nodes and UWB modules, as pictured in Figure 46. The comm nodes construct NeBula's mesh network to enable more efficient inter-robot information exchange, while the UWB modules provide auxiliary landmarks and ranging measurements to assist the SLAM and global localization modules, as explained in Section 6. Static nodes can be powered prior to the start of a mission or prior to deployment. They are carried by mobile robots and deployed by the NCDS.

Flying Vehicles
NeBula has been implemented on several flying robots of varying sizes. Flying robots are responsible for exploring vertical shafts and areas not accessible with ground robots, or for relaying data from the ground robots to the base station by quickly flying to regions with strong communication connectivity, e.g., for data muling (see Section 10). Their processing and sensing capabilities are much more constrained than those of our ground robots. The concept behind the drone development is to keep a balance between the payload, the size of the drone, and the endurance; thus, an iterative design process was performed in order to converge on an optimal hardware configuration (Jung et al., 2021). Figure 47 shows several NeBula-powered aerial vehicles. A specific example is shown in Figure 48: a custom drone extensively utilized in this project that carefully balances speed, weight, autonomy capability, and flight time. The vehicle weighs 1.5 kg and provides 12 minutes of flight time. A 2D rotating RPLiDAR A2 is mounted on top of the vehicle, providing range measurements at 10 Hz, and a monocular visual sensor provides images at 30 FPS. Velocity estimation is based on the PX4Flow optical flow sensor at 20 Hz, while height measurements are provided by the single-beam LiDAR-Lite v3 at 100 Hz; both are installed on the bottom of the vehicle pointing down, as indicated in Figure 48. Furthermore, the aerial platform is equipped with two 10 W LED light bars on the front arms to provide additional illumination for the forward-looking camera, and four low-power LEDs pointing down to provide additional illumination for the optical flow sensor.

Hybrid
To extend the range of the flying drones, our team has been investigating and designing new hybrid ground/aerial vehicles, referred to as "rollocopters." Rollocopters are designed to mainly roll on the ground; when rolling is not possible, they can fly, negotiate the nonrollable terrain element, land on the other side, and continue rolling. The combined rolling/flying behavior can extend the operational lifetime severalfold. Furthermore, the structure used for rolling the robot is designed to provide impact resilience while flying, which provides further robustness. Our hybrid platforms consist of an adapted version of the above-mentioned NeBula payload along with the Pixhawk firmware (Meier et al., 2015). Fan et al. (2019) describes the details of integrating the NeBula autonomy stack on the rollocopter platform.
Design evolution: Figure 50 shows different variations of the rollocopter platform: the Hytaq caged vehicle (Kalantari and Spenko, 2014), the modular hybrid vehicle Shapeshifter, and (bottom row, left to right) the autonomous rollocopter (Lew et al., 2019), DrivoCopter, BAXTER (Choi et al., 2020), and the spherical rollocopter (Sabet et al., 2019). The Hytaq version (Kalantari and Spenko, 2014) consists of a rotating cage, which encloses the drone to enable rolling. Figure 49 shows a timelapse of early hybrid-mobility tests on reinforced versions of this platform at JPL's Mars Yard (CoSTAR Team, 2018a). Its modular, multi-agent version, called Shapeshifter (Tagliabue et al., 2020), self-assembles this shell using permanent electromagnets. The passive-wheeled rollocopter (PW-rollocopter) (Lew et al., 2019) uses two independent passive wheels to enable a large sensory FOV and autonomy. To reduce propeller-generated dust, DrivoCopter and BAXTER (Choi et al., 2020) implement dedicated wheeled actuators, at the expense of a slight reduction in flight time. The latest work is focused on omnidirectional spherical rollocopters that provide maximum agility and maneuverability with minimum friction (Sabet et al., 2019).

Experiments
In this section, we present Team CoSTAR's multi-year effort to validate NeBula technologies on physical and virtual systems. We first briefly discuss our simulation-based validation approach, and then discuss the performance of NeBula on heterogeneous physical robot teams in various challenging real-world environments. The videos in CoSTAR Team (2020a,b,c,d, 2021) depict some highlights of these runs.

Simulation Results
Simulator configuration: We use simulations both for component development and for integration testing. NeBula relies on multiple simulator configurations at different fidelity levels to enable faster and more focused development. Examples include a high-fidelity Gazebo simulator for testing ground vehicle behaviors, flight stack software-in-the-loop simulations for analyzing flight performance, a Docker-based multi-robot networking simulator, and a low-fidelity dynamics simulator for Monte Carlo analysis. Our simulator setup is portable and is used in local and cloud environments depending on the computational resources required and the number of agents deployed during the test.

Robot and environment models: Figure 51 shows selected examples of our simulated robot and environment models. In addition to many existing open-source models, including those available in the SubT Tech Repo (DARPA, 2018b), we built custom robot and environment models, some of which have been submitted to and are available on the same repository. For example, Figure 52 shows a comparative study of drone navigation algorithms using the robot and environment models that are publicly available on the repository.
Self-organized simulation events: We regularly organize simulation-based events ("virtual" demos emulating the virtual DARPA SubT Challenge; see Chase and Kitchen, 2020) to track performance statistics over time. In each virtual demo, we evaluate the latest system performance based on several evaluation metrics. We developed a set of automated analysis tools to evaluate statistics on exploration, localization, mapping, artifact scoring, and human intervention. Simulation enables us to measure statistics from large-scale tests that cannot be obtained easily with hardware experiments. Figure 53 shows exploration and operation statistics of a single-robot simulation in the Gazebo simulator. The competition statistics (Figure 53(a)) provide a high-level evaluation with the number of sectors covered, the number of artifacts scored, and the robot trajectories in the course. The exploration statistics (Figure 53(b), (c)) evaluate how quickly and efficiently the autonomous exploration behavior covered the large-scale environment. Human supervisor interventions are monitored to measure the reliability of the autonomous system (Figure 53(b)). The localization and mapping performance (Figure 53(b), (d)) is evaluated against the ground-truth dataset generated from the simulator. Finally, the artifact scoring performance is quantified with detection and localization evaluation (Figure 53(b)).
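As an illustration of how such automated metrics can be computed, the following is a minimal sketch (not our actual analysis tooling) that evaluates localization drift against a simulator ground-truth trajectory, together with a coarse exploration-coverage statistic; the trajectory format, cell size, and synthetic data are assumptions made for the example.

```python
import numpy as np

def localization_error(estimated, ground_truth):
    """Per-pose position error and drift as a fraction of distance traveled.

    estimated, ground_truth: (N, 3) arrays of time-aligned robot positions.
    """
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    distance = np.sum(np.linalg.norm(np.diff(ground_truth, axis=0), axis=1))
    return {
        "mean_error_m": float(errors.mean()),
        "final_error_m": float(errors[-1]),
        "drift_percent": 100.0 * float(errors[-1]) / max(distance, 1e-6),
    }

def coverage(trajectory, cell_size=5.0):
    """Rough exploration coverage: area of unique horizontal cells visited (m^2)."""
    cells = {tuple(np.floor(p[:2] / cell_size).astype(int)) for p in trajectory}
    return len(cells) * cell_size ** 2

# Example with synthetic data: a straight 100 m drive with small estimation noise.
gt = np.column_stack([np.linspace(0, 100, 500), np.zeros(500), np.zeros(500)])
est = gt + np.random.normal(scale=0.2, size=gt.shape)
print(localization_error(est, gt), coverage(gt))
```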

Field Tests and Demonstrations
Our system has been rigorously field tested in over 100 field tests at 17 different off-site locations, from lava tubes to mines 240 m underground, as well as numerous on-site locations including JPL's Mars Yard. This testing regime is a fundamental part of field hardening our system to perform robustly in challenging real-world environments. A snapshot of some of our field-test locations is shown in Figure 54. In this section, we discuss highlights of our system's performance at these self-organized field tests and demonstrations.
Mine and tunnel field tests-ground vehicles: Field tests in tunnel environments as well as various underground mines stressed a variety of system capabilities, from dusty silver mines with narrow passages to muddy coal mines with multiple decision points and massive scale. Figure 55 presents three maps from these tests. The Arch Mine tests (Figure 55, center) were conducted in a portion of an active coal mine and included extreme traversability hazards (see Figure 22), complex topology, and severely degraded communications. The multi-robot map is the result of a 30-minute operation, with a total area of 400 × 100 m² covered by the two-robot team. The Eagle Mine and Beckley Exhibition Mine maps were each produced by a single robot.
Tunnel field tests-aerial vehicles: Aerial vehicles are deployed to perform fast navigation and exploration, providing a rough topological map and collecting data from areas inaccessible to ground robots. Figure 56 shows two of the several hundred successful flight experiments conducted in tunnels located in the United States, South Korea, and Sweden. One specific example of a drone flight is shown in Figure 57 (CoSTAR Team, 2018b). In this example the drone demonstrates successful flight autonomy with the local planning method in a confined and cluttered environment. The reliability of the developed system is best exemplified by the well over a thousand logged minutes of safe autonomous flight in complex and perceptually challenging tunnel environments.
Urban field tests-ground vehicles: Urban field testing offers a greater abundance of possible locations, but also a large variety of challenging conditions. We tested our system in narrow cubicle farms of multi-level office buildings, with regions of self-similar corridors (Figure 58), as well as the ability to navigate in multi-level environments.
Autonomous hybrid mobility testing results (Figure 59) show the benefit of hybrid mobility compared to pure ground or flying vehicles. The hybrid mode allows the robot to fly over obstacles that block the way for ground robots (Figure 59(a)). In comparison to pure aerial vehicles, the hybrid mode shows a better energy consumption profile by rolling on easy, flat terrain (Figure 59(b)).
(Figure 59 caption: (a) Time sequence of a planned hybrid transition from rolling to flying and back to rolling; colored blocks are occupied voxels colored by z height; the blue/red path indicates the planned hybrid path, where blue is rolling and red is flying; the pink sphere is the goal waypoint; motion primitives from the local planner are shown; as the vehicle moves forward, an obstacle is revealed and a small hop over it is planned and executed. (b) Power-energy consumption comparison between hybrid (green), rolling-only (blue), and flying-only (red) mobility modes; the hybrid mode flies over three obstacles, while the rolling-only mode was tested without obstacles.)
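To make the energy trade-off discussed above concrete, here is a small back-of-the-envelope sketch comparing mission energy for rolling-only, flying-only, and hybrid modes; the power draws, speeds, and obstacle counts are illustrative assumptions, not measured values from our platforms.

```python
# Illustrative hybrid-mobility energy comparison (assumed, not measured, numbers).
P_ROLL_W = 60.0     # assumed average electrical power while rolling
P_FLY_W = 350.0     # assumed average electrical power while flying
ROLL_SPEED = 1.0    # m/s
FLY_SPEED = 2.0     # m/s
HOP_TIME_S = 8.0    # assumed time to fly over one obstacle

def mission_energy_wh(distance_m, n_obstacles, mode):
    """Energy (Wh) to cover a corridor containing a few non-rollable obstacles."""
    if mode == "fly":
        return P_FLY_W * (distance_m / FLY_SPEED) / 3600.0
    if mode == "roll":            # only feasible when there are no obstacles
        return P_ROLL_W * (distance_m / ROLL_SPEED) / 3600.0
    if mode == "hybrid":          # roll everywhere, hop over each obstacle
        t_roll = distance_m / ROLL_SPEED
        t_fly = n_obstacles * HOP_TIME_S
        return (P_ROLL_W * t_roll + P_FLY_W * t_fly) / 3600.0
    raise ValueError(mode)

for mode in ("roll", "fly", "hybrid"):
    print(mode, round(mission_energy_wh(200.0, 3, mode), 2), "Wh")
```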

DARPA Organized Events
This section outlines results from testing NeBula in the series of DARPA-facilitated test events held as part of the Subterranean Challenge (Orekhov and Chung, 2021). The timeline of these events is summarized in Figure 60, with substantial developments over each of the 6-month intervals between events.
(Figure 60 caption: Overview of the timeline of DARPA-organized test activities, with the expectation of increasing capability from one event to the next.)

SubT Integration Exercise (STIX)
Environment and robots: STIX was held in April 2019 at the Edgar Experimental Mine in Idaho Springs, CO. The participating teams were offered two practice sessions and one simulated scored run in an environment representative of the Tunnel Circuit. For Team CoSTAR, one of the primary objectives was to quantify the capabilities and limitations of various mobility systems through evaluation in a real mine environment. To that end, we deployed the four different types of mobility platforms shown in Figure 61: a Clearpath Husky wheeled vehicle, a Flipper tracked vehicle, a Scout quadcopter, and a Rollocopter aerial/ground hybrid vehicle.
Performance: Each platform demonstrated autonomous exploration capabilities in the mine. Husky autonomously traversed 200 m until it reached the first fork. During the drive, Husky detected five artifacts and reported them back to the base station; two of them were correct in type and location. Soon after Husky turned into a side passage, communication to the base station was lost, which triggered the return-to-base behavior as designed. The robot successfully returned to the base by following the breadcrumb nodes in the IRM. Rollocopter performed four 3-minute runs during the practice sessions, each of which consisted of autonomous take-off, exploration, and landing operations. The robot exhibited robust navigation capabilities, traveling over 100 m in cumulative distance. Dust was a major issue for drone flight in the narrow passageways, causing vision-based state estimation failures. Our approach of using heterogeneous odometry sources made us resilient to these failures to a large degree. Dust largely obscured all cameras a few minutes into each run; with no measurement updates from odometry sources, we relied on IMU-only inertial odometry to safely land the vehicle. These results led to future work on improving state estimation resilience in the presence of dust and variable lighting, and helped us improve the camera placement design for the various scales of environments. The Flipper was able to navigate over the train tracks in the tunnel because of its tracked design, but it was much slower than the Husky. This is due to the greater contact area of the tracks, which causes slower turning behavior compared to differential-drive (wheeled) robots for the same commanded track/wheel speeds (Dixit and Burdick, 2020). This is why we decided to incorporate a hybrid vehicle with both wheels and tracks (Telemax) in the next circuit. We extensively tested Telemax in the Arch Mine and were able to navigate over different types of terrain (in tracked mode) while maintaining speed when the robot was on flat terrain (in wheel mode).

Tunnel Circuit
Environment: The Tunnel Circuit took place in August 2019 at the NIOSH mine complexes in Pittsburgh, PA. There were two courses, Experimental (EX) and Safety Research (SR), focusing on different aspects of the tunnel environment challenges. The EX course was composed of long, straight corridors with featureless, flat walls. The SR course had a grid-like structure providing many decision points and loop closures.
Robots: Team CoSTAR staged seven robots at the starting gate (see Figure 62). The robot roster included four Clearpath Husky robots (Husky 1-4), one Telerob Telemax track/wheel hybrid robot (Telemax 1), one high-speed RC car (Xmaxx 1), and one aerial drone (Scout 1). Our strategy was to adaptively deploy the heterogeneous robot team based on the complexity and challenges of the unseen course elements. The information from the vanguard robots indicated that the environment was benign and mostly accessible by ground vehicles, leading to the decision to rely more heavily on the wheeled platforms, which offer higher endurance.
Performance: In each run, we sent three or four robots into the course and achieved more than 2 km of combined traversal (Figure 63). The video in CoSTAR Team (2020c) depicts some highlights of these runs. The longest single-robot drive was 1.26 km by Husky 4 at the SR mine on Day 3, including long periods of no-communication autonomous exploration and a successful return to communication range at the end of the mission. Figure 64 shows the map created by Husky 4 in this run, with an error under 1% of the distance traveled. Four communication nodes were deployed from the robots to build a backbone network, covering the areas near the mine entrance and extending the reach of the base station for faster data retrieval. We detected 16 artifacts during our four runs of the Tunnel Circuit, leading to a circuit score of 11 and a second-place finish among 11 teams (Table 7).
Challenges: During the Tunnel Circuit event, we faced many real-world challenges that helped improve our system toward the subsequent Circuit events. The flat, featureless walls of the EX course affected our localization performance, which was purely LiDAR based as of the Tunnel Circuit. This motivated the development of multi-modal methods (Sections 5 and 6). Multi-robot operations in a communication-degraded environment also posed challenges to our networking system. Based on the analysis of the Tunnel experience, we carefully redesigned the inter-robot networking protocol and deployed it in the Urban Circuit (Section 10). The drone traveled 35 m in 43 seconds before it experienced a critical state estimation error due to poor lighting. To eliminate this single-point failure in future flights, we put more effort into multi-modal sensing and parallel estimation (Section 5).

Urban Circuit
Environment: The Urban Circuit took place in February 2020 at the Satsop Business Park in Elma, WA. An unfinished power plant in the park was chosen as the site of the second circuit event, where the robots were exposed to challenges posed by urban structures. The two courses, Alpha and Beta, cover two floors of the power plant, each with a size of roughly 90 × 90 × 15 m³. There are many small rooms and narrow corridors divided by thick walls that prevent direct wireless communications.
Robots: Team CoSTAR staged seven robots and deployed four of them during this event (Figure 65), including the newly introduced Boston Dynamics Spot quadruped robots (Spot 1 and 2). The other two robots were Clearpath Huskies (Husky 1 and 4) with major upgrades to the onboard electronics and sensor stack. All robots were deployed in each run, acting in vanguard and supporting roles based on their assignment, their location, and the time of the mission. One Spot was dedicated to climbing stairs to explore a floor at a different level.
Performance: We detected 25 artifacts during our four runs of the Urban Circuit, leading to a circuit score of 16 and the first-place finish among 10 teams (Table 7). Figure 66 shows a multi-robot map generated at the Beta course, including details on artifacts scored at multiple levels and at the furthest extent of exploration. The four-robot team traversed a combined total of 2.3 km (Figure 63), including a large closed loop by the two Husky robots around the central round structure. The video in CoSTAR Team (2020a) depicts Spot robots exploring the multi-level courses autonomously. Figure 67 shows the breakdown of scoring in the two highest-scoring runs of the Urban competition, summarizing the reasons for not scoring the remaining artifacts. In addition to the artifacts missed because a robot did not reach the artifact location (Figure 66), we missed artifacts due to the limited field of view of the sensory suite on the robots (Figure 66) and other challenges in the artifact detection pipeline.
(Figure 66 caption: Multi-robot map, trajectories, scored artifacts, and ground-truth artifacts. All robots start from the star on the bottom right, with trajectory colors cycling from red through yellow to blue and back to red based on distance traveled. The red inset shows multi-level exploration and scoring (upper-level survivor and lower-level backpack) by Spot 1 with a stair descent (green part of trajectory). The green inset shows the furthest extent of lower-level exploration, where a cell phone was scored by Spot 1. The yellow inset shows an instance of an unscored artifact placed out of view of ground robots, requiring flight. Other unscored artifacts are in the center of the map, where no robot arrived near them.)
Challenges: Overall, Team CoSTAR's system showed significant improvement over the Tunnel Circuit. Nonetheless, the circuit identified numerous ongoing challenges calling for further development. These development areas include improved depth and breadth of coverage, enhanced sensor fields of view for artifact detection, aerial robot development, autonomous fault recovery, and careful attention to comm node placement/configuration to maximize wireless link quality for fast data retrieval.

Cave Demonstration
The cave used for this demonstration is approximately 300 m long with an elevation change of 20 m. The tortuous and steep terrain in the cave limited the range of operations of wheeled platforms; hence, tests were focused on the Spot quadrupeds (drone testing was not possible due to regulatory challenges).
Circuit event organization: The test was run as close to the competition setup as possible. The global frame and artifact locations were determined based on a pre-existing, high-precision Faro scan of the cave. A fiducial gate was surveyed into the cave frame and used to calibrate the robots. All artifact classes other than CO2 and Vent were placed throughout the cave, with a prioritization of ropes and helmets (the cave-specific artifacts).
Results: The cave demonstration was remarkable in that entire runs were fully autonomous with zero human intervention: the operator only started the mission. Figure 68 shows a map from one such run. Spot traversed 400 m on average over four runs. Each mission was ended not by a time limit, but by the full accessible environment being covered, with side passages limited by traversability hazards (very low ceilings, cliffs). Videos of the cave demonstration are available in CoSTAR Team (2020b,d, 2021).

Lessons Learned
At the time of submission, NeBula's uncertainty-aware and perception-aware principles are implemented in a number of modules throughout our autonomy solution. The module interactions have been tightly co-designed, and joint probability distributions across some of the modules are incorporated to bolster the overall system's resiliency. This uncertainty-aware architecture has been verified through intensive field-testing campaigns using heterogeneous platforms with different variations of mobility, sensing, and computing capabilities. We observed that the joint perception-planning approach brought resiliency to the system's behaviors in challenging real-world environments, where uncertainty is ubiquitous. This section summarizes the lessons learned from developing and fielding the NeBula autonomy solution.

Heterogeneous System Design and Integration
Designing a system with heterogeneous robots is a nontrivial task. Accomplishing the objectives of complex real-world operations (such as the one in SubT) under constraints on time, resources/cost, robot size, weight, power, etc., can be too difficult or impossible for a single robot. This calls for multi-robot solutions. An important lesson learned, and a major future direction, is that it is critical to have principled tools that can translate design choices into overall mission metrics. These design choices range from the selection of the type and number of robots, to the selection of sensors, instruments, and algorithms, to the types of wires, connectors, and batteries. As concrete examples that our team has encountered, a slight change in the weight (and hence operation time) of a single robot, or the selection of a specific wire that slightly affects data traffic, can have a significant effect on the overall aggregated performance of the robot team and on mission metrics. There are hundreds if not thousands of such design parameters and choices (both in hardware and software), which calls for systematic tools to quickly abstract them and translate them into high-level mission metrics.
An important observation is that, for the fast-paced development of a very large autonomy architecture for a multi-robot system, the architecture should be highly modular and adaptive. Since the architecture must incorporate the robots' heterogeneous capabilities (mobility, sensing, and compute), appropriate abstractions need to be implemented to operate the robots within a unified framework. We performed many iterations of architecture redesign as our overall system capabilities grew. A key lesson learned is that integration testing at regular intervals is essential to maintain the stability of such a growing system. Team CoSTAR organized monthly virtual and physical integration demonstrations with competition-like setups. Regular testing forces each new module to comply with the system interaction rules, ensuring functionality despite constant augmentations by multiple subteams throughout the development. Regular performance tracking also allows us to understand the current technology state and highlights the next domain to be studied.

Resilient State Estimation
In challenging real-world robotic operations, perceptual degradation is common, and state estimation can suffer from deterioration of sensor measurements and estimation quality. In addition, physical systems are exposed to the risk of sensor failures, which frequent field testing makes prevalent. These failures are difficult to model in many cases, yet systems must be able to react to such uncertain events so that system stability is not lost. One key lesson from our experience is to let the system predict a failure and quickly adapt to it. We observed that adding redundancy in a principled way (e.g., HeRO in Section 5) shows great potential to improve the resiliency of an odometry system by incorporating multi-modal sensors and algorithms. HeRO's built-in health-check mechanism allows for early detection of various types of failures and adaptation of the system to mitigate their effect. This proactive approach has been effective in our field tests and circuit events, providing the foundation to support high-level autonomous behaviors.
Another critical observation concerns how and when to trust the sensor models. Following NeBula's philosophy of uncertainty-aware predictive planning, the planner needs to be able to predict the joint distributions over the system's future actions and estimated states. To predict future performance, we divide unknowns into "known unknowns" and "unknown unknowns." Known unknowns refer to uncertainties that can be reasonably modeled using probability distribution functions. Unknown unknowns refer to unmodeled uncertainties resulting from unexpected events during system operation. To be resilient and robust to these uncertainties, we have observed that a cascaded framework (like HeRO) of loosely coupled and tightly coupled layers is promising. The cascaded framework first copes with unknown unknowns by relying on a loosely coupled layer that checks the consensus across different sensory modalities to detect anomalies and reject faulty estimation channels. Once left with inliers, it handles known unknowns by tightly fusing the estimation channels using their predicted probabilistic models.
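The following is a minimal sketch of this cascaded idea (not the HeRO implementation itself): a loosely coupled consensus check across redundant odometry channels rejects channels that disagree with the majority, and the surviving channels are then fused with weights derived from their reported variances. The channel names, threshold, and the scalar velocity state are illustrative assumptions.

```python
import numpy as np

def consensus_filter(estimates, threshold=3.0):
    """Loosely coupled layer: reject channels far from the median (unknown unknowns)."""
    values = np.array([v for v, _ in estimates.values()])
    median = np.median(values)
    mad = np.median(np.abs(values - median)) + 1e-6  # robust spread estimate
    return {name: (v, var) for name, (v, var) in estimates.items()
            if abs(v - median) / mad < threshold}

def fuse(inliers):
    """Tightly coupled layer: inverse-variance fusion of inlier channels (known unknowns)."""
    weights = {name: 1.0 / var for name, (_, var) in inliers.items()}
    total = sum(weights.values())
    value = sum(weights[n] * inliers[n][0] for n in inliers) / total
    return value, 1.0 / total  # fused estimate and its variance

# Example: forward velocity (m/s) from three odometry channels; the visual
# channel has silently failed and reports a wildly wrong value.
channels = {"lidar": (1.02, 0.04), "wheel-inertial": (0.97, 0.09), "visual": (4.8, 0.02)}
inliers = consensus_filter(channels)
print("inliers:", list(inliers), "fused:", fuse(inliers))
```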

Large-Scale Positioning and Mapping
The SLAM design and deployment efforts presented in this paper highlight the importance of using complementary sensor information for localization and mapping. In complex environments, each sensor has its own advantages and disadvantages. For example, wheel-encoder-based odometry or visual odometry provides a useful source of information in long corridors where LiDAR scan matching is subject to drift; IMU data allow for accurate short-term relative rotation estimation, while history-based scan matching can resolve longer-term displacement estimation. This need to fuse heterogeneous sensors also emphasizes the importance of a flexible SLAM framework in which one can easily define and fuse sensor data when available. In this sense, the use of a factor graph framework reduced the integration effort and allowed us to design a unified SLAM back end that can be easily reconfigured. The importance of a reconfigurable framework is further heightened by the need to run our SLAM solution on a heterogeneous set of platforms (e.g., robots with different sensor suites and different computational capabilities), which required our framework to be sensor agnostic (e.g., adjusting to different sensors with minimal parameter tuning) and reconfigurable (e.g., enabling and disabling sensors to fuse via configuration files).
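As a simplified illustration of this configuration-driven, factor-graph style of fusion (not our actual back end), the sketch below uses the GTSAM Python bindings to add relative-pose factors only for the sensor channels enabled in a configuration dictionary; the channel names, noise values, and measurements are made up for the example.

```python
import numpy as np
import gtsam

# Illustrative configuration: which odometry channels to fuse on this platform.
config = {"lidar_odometry": True, "visual_odometry": False, "wheel_inertial": True}

# Assumed per-channel noise sigmas (rad, rad, rad, m, m, m) -- illustrative values only.
noise = {
    "lidar_odometry": gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.05] * 3)),
    "visual_odometry": gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.10] * 3)),
    "wheel_inertial": gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.20] * 3)),
}

graph = gtsam.NonlinearFactorGraph()
values = gtsam.Values()

# Anchor the first pose with a prior factor.
graph.add(gtsam.PriorFactorPose3(
    0, gtsam.Pose3(), gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-3] * 6))))
values.insert(0, gtsam.Pose3())

# Each enabled channel contributes a relative-pose (between) factor per step.
one_meter_forward = gtsam.Pose3(gtsam.Rot3(), np.array([1.0, 0.0, 0.0]))
for i in range(5):
    for channel, enabled in config.items():
        if enabled:
            graph.add(gtsam.BetweenFactorPose3(i, i + 1, one_meter_forward, noise[channel]))
    values.insert(i + 1, gtsam.Pose3(gtsam.Rot3(), np.array([float(i + 1), 0.0, 0.0])))

result = gtsam.LevenbergMarquardtOptimizer(graph, values).optimize()
print(result.atPose3(5).translation())
```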
While having a reliable odometry solution is critical for large-scale mapping, even the most accurate odometry systems accumulate error over long distances across extreme terrains. This underscores the importance of loop closures for keeping localization errors bounded. However, for complex, large-scale environments with perceptually degraded conditions, a key lesson learned is that loop closure needs to be incorporated in a resilient manner into the overall framework. Computational constraints limit the ability to search the map history and find correct loop closures in a reasonable time. Even after loop closures are found, it is crucial to maintain multiple hypotheses, or at least to remove incorrect loop closures resulting from perceptual aliasing. In this paper, we observed that graph-based outlier rejection (e.g., a variation of Mangelson et al. (2018)) is helpful in filtering some of the incorrect hypotheses. This family of methods relies on approximate max-clique solvers for consistency maximization, which becomes computationally expensive due to the size and density of the SLAM graphs in competition settings. Unfortunately, they can also fail to select inlier loop closures, inducing "jumps" in the robot trajectory when the solver is stuck in suboptimal maximal cliques. Future work includes improving loop closure detection (e.g., prioritizing the most informative loop closures, using different sensors for place recognition) and adopting more recent methods for outlier rejection based on graduated nonconvexity.
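To illustrate the flavor of pairwise-consistency-based outlier rejection (in the spirit of Mangelson et al. (2018), but heavily simplified and not their algorithm), the sketch below builds a consistency graph over candidate loop closures in a 1D world and keeps a greedily grown, mutually consistent subset; the measurement model, threshold, and greedy clique growth are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

# Odometry positions (1D world) with slow drift, and candidate loop closures:
# each candidate (i, j, d) claims that pose j is d meters ahead of pose i.
odom = np.array([0.0, 10.2, 20.5, 30.9, 41.3])
candidates = [(0, 4, 40.0), (1, 4, 29.9), (0, 3, 55.0)]  # last one is an outlier

def residual(c):
    i, j, d = c
    return (odom[j] - odom[i]) - d  # disagreement with odometry

def pairwise_consistent(a, b, eps=1.0):
    # Simplified check: two closures are consistent if they imply similar drift.
    return abs(residual(a) - residual(b)) < eps

# Build the consistency graph and greedily grow a mutually consistent set
# (a cheap stand-in for an exact maximum-clique search).
consistent = {(a, b) for a, b in combinations(range(len(candidates)), 2)
              if pairwise_consistent(candidates[a], candidates[b])}
selected = []
for idx in sorted(range(len(candidates)), key=lambda k: abs(residual(candidates[k]))):
    if all((min(idx, s), max(idx, s)) in consistent for s in selected):
        selected.append(idx)

print("accepted loop closures:", [candidates[k] for k in selected])
```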
Computational aspects can be also improved by sharing the workload across robots. Centralized multi-robot SLAM methods require increasing computational resources for larger teams of robots. Adopting a distributed SLAM system can improve scalability and reduce communication bandwidth. In general, distributing the overall computation across various robots (based on their capabilities) can better scale to large teams of robots.
An important lesson learned is that augmenting geometric information with semantics can increase the resiliency of the SLAM system. For instance, incorporating semantic information such as intersections, stairs, and man-made objects in the graph can increase the robustness of the loop closures. Further, identifying a set of stairs in the environment provides a readily usable prior on the geometry of the stairs; similarly, identifying doors provides actionable information for navigation. Metric-semantic mapping is an active research area and a tight integration of geometric, semantic (and physical) reasoning has the potential to improve the robustness and accuracy of the map built by the robots.
Finally, while this section is concerned with localizing the robots and building a map, the quality of the map reconstruction is highly affected by the trajectories covered by the robots. Following NeBula's uncertainty-aware planning perspective, performing active loop closures to reduce the uncertainty in the robot location has a major impact on the SLAM accuracy. Active loop closures can be enabled by guiding robots to rendezvous points to create inter-robot measurements or by visiting parts of the environment that have unambiguous detection signatures. Towards this goal, quantifying and actively reducing uncertainty is crucial and is a fundamental trait of the NeBula framework. Active localization and mapping go beyond loop closures, and tightly couple perception, action, and communication.
Uncertainty-aware traversability analysis: Navigating over perceptually degraded, challenging terrains requires a radical departure from systems designed to operate in known environments with clear landmarks and easily distinguishable obstacles. Designed to detect complex geometric hazards at various scales, traversability analysis depends heavily on the quality of local mapping and state estimation. As such, an important lesson has been that accounting for perception uncertainty in planning is key to building a reliable traversability system that operates on challenging terrains and under difficult perceptual conditions. We learned that multi-fidelity mapping approaches (similar to the one presented in Section 8) improve the balance between computational constraints and accuracy in the presence of degraded state estimation and sensor measurement uncertainty. Uncertainty-aware estimation weights and accumulates measurements in the world belief based on the measurements' accuracy and quality. Using this world belief, the traversability layer quantifies perception-aware traversability risks and costs when negotiating challenging terrains. Finally, a critical lesson in traversability algorithm design and development is constant field testing: the presented uncertainty-aware approach has been field hardened across more than 100 field tests with diverse traversability-stressing elements.
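The sketch below illustrates the general idea of folding measurement confidence into a grid-based traversability cost (a toy example, not the multi-fidelity mapping method of Section 8): each cell fuses noisy elevation measurements with per-measurement variances, and the resulting slope risk is inflated where the map is still uncertain. The cell size, thresholds, and risk formula are illustrative assumptions.

```python
import numpy as np

class BeliefGrid:
    """Toy 2.5D elevation belief: per-cell mean and variance, fused by inverse variance."""
    def __init__(self, shape, prior_var=4.0):
        self.mean = np.zeros(shape)
        self.var = np.full(shape, prior_var)

    def update(self, cell, z, meas_var):
        # Standard scalar Bayesian fusion of a new elevation measurement into the cell.
        k = self.var[cell] / (self.var[cell] + meas_var)
        self.mean[cell] += k * (z - self.mean[cell])
        self.var[cell] *= (1.0 - k)

def traversability_risk(grid, cell_size=0.5, max_slope=0.3, uncertainty_weight=2.0):
    """Risk in [0, inf): slope relative to a limit, inflated by residual map uncertainty."""
    dzdx, dzdy = np.gradient(grid.mean, cell_size)
    slope = np.hypot(dzdx, dzdy)
    return slope / max_slope + uncertainty_weight * np.sqrt(grid.var)

# Example: a 10x10 patch with one observed step hazard and mostly unobserved cells.
grid = BeliefGrid((10, 10))
for x in range(10):
    grid.update((5, x), z=0.4, meas_var=0.01)   # a 40 cm ledge across row 5
    grid.update((4, x), z=0.0, meas_var=0.01)
risk = traversability_risk(grid)
print("risk near ledge:", risk[5, 5].round(2), " risk in unobserved area:", risk[0, 0].round(2))
```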

Scalable Belief-Space Global Planning
Global planning for area coverage and exploration behaviors is one of the modules where awareness of uncertainty plays a critical role in achieving high performance in the real world. We note several key dilemmas encountered while developing and testing the global planner, related to uncertainty representation and uncertainty-aware decision making: (i) scalability vs. information fidelity in the world belief representation, (ii) planning horizon vs. planning time for each receding-horizon planning episode, and (iii) consistency of plans for smooth motion vs. resiliency to sudden changes in the world belief over time. We learned that hierarchical approaches (similar to the ones described in Section 9) have the potential to address these challenges. Our method, PLGRIM, utilizes (i) hierarchical IRMs, (ii) longer-horizon, higher-resolution POMDP solvers with a manageable computational load, and (iii) receding-horizon planning with resiliency logic. However, these are still open problems. An important future direction is quick online adaptation of parameters that balance environment scale and complexity against computing and sensor limitations. Another important open problem relates to encoding and capturing a more accurate and reliable representation of the high-dimensional world belief.
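As a schematic of the receding-horizon pattern discussed above (a toy stand-in, not PLGRIM), the sketch below alternates between a coarse global step that picks a frontier and a short-horizon local rollout scored by information gain minus risk, replanning from the updated belief after every executed action. The 1D corridor world, rewards, and horizons are illustrative assumptions.

```python
import random

class ToyBelief:
    """Toy exploration belief over a 1D corridor: which cells have been observed."""
    def __init__(self, n=30):
        self.n = n
        self.observed = {0}

    def frontiers(self):
        return [c for c in range(self.n)
                if c not in self.observed and any(abs(c - o) == 1 for o in self.observed)]

    def info_gain(self, cell):
        return 0.0 if cell in self.observed else 1.0

    def risk(self, cell):
        return 0.5 if cell % 7 == 0 else 0.0   # pretend some cells are hazardous

    def observe(self, cell):
        self.observed.add(cell)

def global_step(belief, position):
    """Coarse layer: pick the nearest frontier, weighted by expected information."""
    return min(belief.frontiers(), key=lambda f: abs(f - position) - belief.info_gain(f))

def local_rollouts(belief, position, goal, horizon=3, samples=30):
    """Fine layer: sample short action sequences and score info gain minus risk,
    with a pull toward the global goal (a crude stand-in for a POMDP solver)."""
    best, best_score = None, float("-inf")
    for _ in range(samples):
        plan, cell, score = [], position, 0.0
        for _ in range(horizon):
            cell = max(0, min(belief.n - 1, cell + random.choice([-1, 1])))
            plan.append(cell)
            score += belief.info_gain(cell) - belief.risk(cell)
        score -= abs(plan[-1] - goal)
        if score > best_score:
            best, best_score = plan, score
    return best

def explore(belief, position=0, budget_steps=200):
    """Receding horizon: replan both layers from the latest belief after every step."""
    for _ in range(budget_steps):
        if not belief.frontiers():
            break                                          # environment fully covered
        goal = global_step(belief, position)
        position = local_rollouts(belief, position, goal)[0]  # execute first action only
        belief.observe(position)
    return len(belief.observed)

print("cells observed:", explore(ToyBelief()))
```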

Semantic Understanding and Artifact Detection
Detecting objects and understanding the semantics of the environment in perceptually degraded subterranean settings is a challenging task. In the DARPA Subterranean Challenge, artifacts have multiple signatures, ranging from visual and thermal to auditory, gas-based, and radio-based. These varied signatures, together with payload constraints (size, weight, power, etc.) on the heterogeneous platforms, call for a method that assesses the value of adding particular sensors and sensing modalities to each vehicle. Further, the choice and configuration of each sensor on each robot have a significant impact on scoring performance (see Section 7). For example, for artifacts that can be detected visually, the salient camera parameters include field of view, reliability, image quality, resolution, and frame rate. Hence an important lesson has been that the artifact collection system needs a tight co-design of software and hardware. This includes the whole pipeline from the sensors, to the cables, to the processors and algorithms. For example, a change in a USB hub or cable can significantly impact the choices of other hardware and algorithms by removing certain data-choking elements in the pipeline. A related unexpected failure mode (which occurred in one of our competitions) at the system level was in sensor data transfer and communication, caused by both USB driver and networking bandwidth limitations. Correcting this problem required a thorough analysis of the hardware configuration, including the development and testing of custom driver software by our team.
The second major lesson was related to data. While existing standard object detection datasets are quite large and diverse, objects in perceptually degraded environments (e.g., with variable lighting and obscurants), such as those found in DARPA Subterranean Challenge-like settings, remain out of distribution with respect to these mainstream datasets, so learning to detect them reliably remains difficult.
Finally, even after these trade-offs, there are still many limitations on the perception side of the artifact detection architecture. So the third, and most important, lesson learned is that planning and perception for artifact detection need to be tightly co-designed. In other words, active perception is required, where the planner takes actions that compensate for perception shortcomings. Examples of such actions are (i) executing local search maneuvers where the robot sweeps larger parts of the scene with its sensors to compensate for its limited field of view, and (ii) actively getting closer to certain targets or making measurements from various angles to increase detection confidence; this may also include deciding which robot and which sensor should gather more information to increase the confidence. It also includes (iii) changing perception pipeline parameters, such as camera resolution and input rate, to handle computational constraints by focusing and limiting the attention of the system to the important parts of the input channel. Active perception for semantic understanding is a largely open area, with a lot of future work in these domains.
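As a small worked example of item (ii), the sketch below fuses per-view detector confidences for a candidate artifact in log-odds form, which is one common way to decide whether another, closer observation is worth the detour; the prior, the detector confidences, and the reporting threshold are illustrative assumptions rather than values from our pipeline.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fuse_views(view_confidences, prior=0.05):
    """Fuse independent per-view detection confidences for one artifact candidate.

    Each confidence is the detector's probability that the artifact is present in
    that view; fusion is in log-odds space, assuming conditionally independent views.
    """
    log_odds = logit(prior) + sum(logit(c) - logit(prior) for c in view_confidences)
    return 1.0 / (1.0 + math.exp(-log_odds))

# A distant, partially occluded first view is ambiguous; two closer views from
# different angles raise the fused belief above an (assumed) 0.9 reporting threshold.
print(round(fuse_views([0.55]), 3))              # still uncertain -> plan another view
print(round(fuse_views([0.55, 0.80, 0.85]), 3))  # confident enough to report
```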

Bandwidth-Aware Communication System Design
Communications between computational units (e.g., ROS nodes) should ideally take into account the predicted available bandwidth of the link, whether that link is within the same computer, over high-speed Ethernet, over a larger wired network, or over a radio/wireless link. We took a number of successful steps in this direction: (i) separating inter-agent (ROS 2) and intra-agent (ROS 1) communications, (ii) using different QoS settings for different classes of topics, (iii) monitoring inter-robot communications and estimated bandwidth, and (iv) monitoring intra-robot communications. As future multi-agent projects move away from ROS 1 and toward ROS 2 (or another middleware/communication solution), the same principles apply. We see a need for continued improvements and greater autonomy in routing and QoS systems to optimally use communication resources in these future systems.
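To illustrate item (ii), the snippet below sketches how per-topic-class QoS profiles might look with the ROS 2 Python client library (rclpy); the topic classes, names, and specific QoS choices are illustrative assumptions, not our actual configuration.

```python
# Illustrative ROS 2 (rclpy) QoS profiles for different classes of topics.
from rclpy.qos import QoSProfile, ReliabilityPolicy, DurabilityPolicy, HistoryPolicy

# Small, mission-critical messages (e.g., artifact reports): retry until delivered,
# and keep recent samples for late-joining subscribers.
critical_qos = QoSProfile(
    history=HistoryPolicy.KEEP_LAST,
    depth=50,
    reliability=ReliabilityPolicy.RELIABLE,
    durability=DurabilityPolicy.TRANSIENT_LOCAL,
)

# High-rate, high-volume streams (e.g., telemetry or map chunks over a lossy radio
# link): prefer freshness over completeness so that stale data is simply dropped.
bulk_qos = QoSProfile(
    history=HistoryPolicy.KEEP_LAST,
    depth=1,
    reliability=ReliabilityPolicy.BEST_EFFORT,
    durability=DurabilityPolicy.VOLATILE,
)

# Hypothetical usage inside a node (topic and message names are made up):
#   self.create_publisher(ArtifactReport, "artifact_reports", critical_qos)
#   self.create_publisher(OccupancyGrid, "shared_map", bulk_qos)
```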
Another important lesson learned is that, due to the communication-degraded nature of subterranean environments, the planner needs to support and improve communication and multi-robot networking performance by dropping communication nodes at optimal locations and by actively carrying (muling) data between the various nodes using mobile robots. Hence, aligned with NeBula's philosophy, (1) taking networking uncertainties into account, (2) evaluating and predicting the potential value of the communicated information, and (3) co-designing the planning and networking systems are the critical observations for increasing the robustness and performance of the overall multi-robot networking system in communication-degraded environments.

Supervised Autonomy to Full Autonomy
Achieving mission success depends heavily on the mission-level autonomy from both single-and multi-robot perspectives. This is especially the case in contexts such as the SubT Challenge where large robot teams must intelligently coordinate themselves under strict communication, time, and resource constraints to explore and map kilometer-scale environments and find objects of interest.
Closer to the course entrance, a communication link with the human supervisor can be established. Thus the system must be capable of autonomously balancing deeper exploration, operational risk, the value of the information collected onboard, and the cost of bringing the robots closer to the surface for data retrieval. A key lesson learned is that, even when robots are close to the course entrance and a communication link with the human supervisor is established, with a large heterogeneous multi-robot team a single human supervisor becomes a bottleneck in the control loop due to excessive cognitive workload. Hence, for successful operations, NeBula's mission executive and Copilot assistant are designed to support such tasks by adjusting the system's autonomy levels, gradually delegating human tasks where the human is viewed as a resource, and providing suitable instructions and feedback to reduce the cognitive load of the single supervisor. Integral to this effort is maintaining a world belief that can be easily interpreted by the system or the supervisor, and that can trigger behaviors or sequences of actions that bring the system closer to achieving mission completion.
However, there is a lot of future research in this area. Our ongoing efforts have been focusing on some of these areas, including efficient ways of specifying more complex tasks and missions, balancing human-machine task distribution, planning using semantic information, and scheduling under uncertainty in task execution and future event occurrence.

Simulator-Based Development
The development of our system was greatly accelerated by the use of different computer simulation environments, discussed herein. The challenges inherent to real-world robotics in uncertain and fully autonomous settings pose tremendous risks to naive robots and algorithms in the early stages of development. Taking development into a simulation and out of hardware has afforded us numerous opportunities to pursue and achieve high-risk-high-gain algorithmic strategies. Many safety-critical features are difficult to test with physical hardware. By carefully modeling different fleet and hardware configurations, we were able to develop, test, and eventually deploy our heterogeneous multi-robot team in the real world, with optimal networking, hardware, and software payloads. Although simulators offer an enormous benefit to robotic systems development, there are significant differences between simulations and real-world systems which must be accounted for. It is important to understand simulator limitations and always perform proper validation and testing on physical hardware platforms.

Conclusion
Team CoSTAR's approach to the DARPA Subterranean Challenge is rooted in our autonomy solution, NeBula (Networked Belief-aware Perceptual Autonomy), which emphasizes resilience and intelligent decision making through uncertainty awareness. NeBula led to second- and first-place finishes in the DARPA Subterranean Challenge's Tunnel (2019) and Urban (2020) competitions, respectively. When dealing with exploration and operation in unknown environments, uncertainty is inherent to all decisions. As its main principle, NeBula focuses on quantifying and taking advantage of uncertainty at multiple levels of the autonomy stack, including state estimation, mapping, traversability, planning, communications, and other domains. We combine these technologies in a synergistic way that explicitly considers the interactions between interrelated modules.
During this competition, we have demonstrated, and hope to continue to demonstrate, autonomous exploration in extreme environments on multiple platforms (including wheeled, legged, and aerial). Solving this problem remains of paramount interest in a wide range of applications, especially for missions to explore unknown planetary bodies beyond our home planet. We believe the uncertainty-aware and platform-agnostic nature of most NeBula components is a step towards resilient and safe robotic autonomy solutions in unknown and extreme environments, with both single- and multi-robot systems.