Abstract

With the emergence of a large number of smart devices, the radio environment in which unmanned aerial vehicles (UAVs) perform tasks is becoming increasingly complex, which places higher requirements on UAVs’ situational awareness and autonomous obstacle avoidance capabilities. To tackle this issue, we propose a three-dimensional (3D) UAV path planning method under communication connectivity constraints guided by radio environment maps (REMs), which are distributed by ground edge servers in the form of compressed global REMs and detailed local REMs. An interfered fluid dynamical system (IFDS) model is deployed on UAVs to allow them to avoid obstacles and plan paths. We propose a twin-delayed deep deterministic policy gradient (TD3)-based deep reinforcement learning (DRL) method to optimize the reaction coefficients of UAVs to avoid obstacles and improve the signal to interference plus noise ratio (SINR). The simulation results show that the proposed algorithm can effectively avoid static obstacles and dynamic interference under communication connectivity constraints, significantly improve communication stability with a higher received-signal SINR, and reduce the cost of UAV task execution with the shortest path.

1. Introduction

Due to the rapid innovation and technological disruption in the unmanned aerial vehicle (UAV) manufacturing industry, more and more UAVs are being used for aerial surveillance, air cargo, and interference monitoring. UAVs must not only avoid spatial obstacles but also avoid radio interference so that their communication functions can be maintained [1]. The radio environment faced by UAVs is obstacle- and interference-dense, dynamic, and uncertain. This complex radio environment poses great challenges to the flight safety of UAVs and also places higher requirements on their autonomous control capabilities. Enabling UAVs to recognize the complex radio environment and improve their autonomous obstacle avoidance ability has therefore become an important research issue.

The radio environment map (REM) is an important tool for awareness of complex radio environments. It combines geographic terrain coordinates, communication policies, radio environment parameters, and other related information to describe the radio environment along multiple dimensions such as time, frequency, space, and power [2]. REMs can assist UAVs in the cognition of the spatial and radio environment, efficient path planning, and real-time obstacle avoidance. The authors in [3] proposed a 3D REM-assisted UAV path design method: by combining a spatial 3D map and a radio propagation model, the UAV is assisted in designing a path that maintains a cellular connection. However, the coverage area of the base stations is defined at the same altitude as the UAV, so it is actually a 2D path planning problem. In [4], the UAV constructs the REM through the synergy between vision and communication in the edge network, which assists the UAV in online path planning and autonomous flight. The authors in [5] proposed a UAV path planning method that exploits a compressed global map of the environment combined with a cropped but uncompressed local map showing the vicinity of the UAV. This way of distributing global and local map information for UAV path planning is inspiring: in path planning, distant information drives general direction decisions, while nearby information drives immediate actions such as avoiding obstacles. Therefore, distant obstacles can be represented with less detail than the obstacles surrounding the UAV. However, the above works do not study the path planning and autonomous obstacle avoidance methods of UAVs in detail.

Traditional path planning methods mainly include model predictive control [6], optimization algorithms [7–9], stochastic programming [10], and geometric calculation [11]. However, these methods are designed for 2D path planning. When extended to 3D path planning, the amount of computation increases exponentially, and the paths generated in discrete environments have poor smoothness, so an additional smoothing algorithm is required to optimize the path, which increases the complexity of the path planning algorithm. The authors in [12] proposed the artificial potential field (APF) method based on the concept of force fields in physics: the goal point exerts a “gravitational force” on the UAV, the obstacles exert “repulsive forces” on the UAV, and the resultant force controls the movement of the UAV. This method is suitable for 3D path planning, has low computational complexity, and can plan a smooth path. However, it can fall into a local optimum at certain locations and even enter the interior of obstacles. In [13], the interfered fluid dynamical system (IFDS) model was proposed for the first time, which draws on the macroscopic characteristics of natural water flow: when there are no obstacles, the water flows in a straight line, while when encountering an obstacle, the water smoothly bypasses it. The algorithm has low computational complexity and can handle complex radio interference and obstacles of different shapes.

In a complex spatial obstacle and radio interference environment (such as the coexistence of static and dynamic obstacles of different shapes and sizes), the positions of the obstacles change dynamically, and the environmental information must be updated in real-time. It is also necessary to optimize the reaction coefficients of the IFDS model so that the UAV surrounded by obstacles obtains the best, i.e., shortest, flight path. In [14], a neural network is used to optimize the reaction coefficients of the IFDS model: the relative positions between the UAV, the destination, and the obstacles are extracted from the sample data as the input of the neural network, and the reaction coefficients of the IFDS model are used as its output. The authors in [15, 16] adopt deep reinforcement learning (DRL) algorithms to optimize the reaction coefficients, which retain the advantages of the analytical method and maintain a high calculation speed, giving the approach great application potential.

In this paper, we propose a 3D REM-guided path planning method for UAVs in order to improve their environmental awareness and autonomous obstacle avoidance capabilities. The ground edge server distributes the compressed global REM to the UAV before the UAV launches. The UAV adopts the IFDS model to preplan a path according to the global REM and starts to fly. When the edge server detects an obstacle coming within a safe distance of the UAV, it distributes a cropped but uncompressed detailed local map to the UAV. The UAV then adopts the IFDS model to avoid obstacles efficiently and optimizes the reaction coefficients of the IFDS model based on the twin-delayed deep deterministic policy gradient (TD3) algorithm to obtain the shortest path and improve the signal to interference plus noise ratio (SINR). Our principal contributions are summarized as follows:

(1) We propose a 3D REM-guided path planning method for UAVs. The compressed global REM provides the UAV with global spatial obstacle and radio interference information and preplans a path for the UAV. When an obstacle or interference is detected, the cropped but uncompressed detailed local map is distributed to the UAV to avoid the interference. The method keeps the SINR of the UAV received signal above the interference threshold so that the UAV does not lose communication with the base station.

(2) We propose an obstacle avoidance model for UAVs based on IFDS. By adjusting the repulsive reaction coefficient, the tangential reaction coefficient, and the tangential direction coefficient, the path of the UAV in 3D space is optimized.

(3) To optimize the reaction coefficients in the IFDS model, we propose a DRL algorithm based on TD3, which makes the UAV flight path the shortest while satisfying the communication connectivity constraints.

The remaining sections of this paper are organized as follows. Section 2 describes the global and local REM construction and distribution methods. Then, the IFDS model and problem formulation are introduced in Section 3. Section 4 specifies the implementation details of our TD3-based DRL algorithm. Performance evaluations are provided in Section 5, and Section 6 concludes the paper.

2. The Construction and Distribution Method of REMs

We consider a UAV flying from the starting point to the goal point in an edge network, as shown in Figure 1. The UAV needs to maintain communication with the base station equipped with an edge server while avoiding obstacles and interference to reach the goal point in the complex radio environment.

We propose a global and local REM construction and distribution method for UAV obstacle avoidance and path planning. In the complex radio environment, devices with sensing functions, such as UAVs, fixed monitoring stations, vehicle-mounted receivers, and handheld spectrum analyzers, are deployed to sense spectrum data in 3D space and upload the data to the edge server [17]. According to the collected data, the edge server adopts the Kriging interpolation algorithm [18] to interpolate the unknown points. The spectrum data of an unknown point $P_0$ can be calculated as
$$\hat{S}(P_0)=\sum_{i=1}^{N}\lambda_i S(P_i),$$
where $S(P_i)$ is the spectral data sensed by the $i$-th sensor device and $\lambda_i$ is the weight of the data sensed by this sensor for the unknown point $P_0$.
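To make the interpolation step concrete, the following sketch estimates the spectrum value at one unknown point by solving an ordinary Kriging system for the weights $\lambda_i$. It is a minimal sketch under stated assumptions: the exponential variogram model, its parameters, and all function names are illustrative and not taken from the paper.

```python
import numpy as np

def exp_variogram(h, sill=1.0, rang=200.0, nugget=0.0):
    """Exponential variogram model (assumed here purely for illustration)."""
    return nugget + sill * (1.0 - np.exp(-h / rang))

def kriging_estimate(sensor_pos, sensor_val, query, variogram=exp_variogram):
    """Ordinary Kriging: solve for the weights lambda_i and return the
    weighted sum of the sensed spectrum data at the unknown point."""
    n = len(sensor_val)
    # Pairwise sensor-to-sensor distances and sensor-to-query distances.
    d_ss = np.linalg.norm(sensor_pos[:, None, :] - sensor_pos[None, :, :], axis=-1)
    d_sq = np.linalg.norm(sensor_pos - query, axis=-1)
    # Ordinary Kriging system with a Lagrange multiplier row/column.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_ss)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d_sq)
    w = np.linalg.solve(A, b)[:n]        # weights lambda_i
    return float(w @ sensor_val)         # estimated spectrum value at the query point

# Toy usage: five sensed points in 3D, one unknown point.
rng = np.random.default_rng(0)
pos = rng.uniform(0, 500, size=(5, 3))
val = rng.uniform(-90, -60, size=5)      # e.g., sensed received power in dBm
print(kriging_estimate(pos, val, np.array([250.0, 250.0, 100.0])))
```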

In the complex radio environment, spectrum resources and available bandwidth are limited. If the edge server directly distributes a high-resolution global REM, it will cause a great burden on the communication bandwidth and high delay, which may cause the UAV to lose communication or collide with obstacles due to sudden radio interference. To tackle the issue, we propose a method of sending a low-resolution global REM and a high-resolution local REM for UAV path planning.

The edge server performs low-resolution interpolation based on the collected spectrum data before the UAV launches, which analyzes the location of radio interference and space obstacles. Then the edge server distributes a compressed global REM to the UAV. When a sudden interference or dynamic obstacle occurs, the edge server distributes a high-resolution local REM to the UAV periodically, so that the UAV can avoid the obstacle in real-time according to the position and threat level of the interference or obstacle. The 3D REM-guided path planning method implemented in the UAV is shown in Figure 2.

The edge server analyzes and extracts features from the 3D REM and identifies spatial obstacles. Then, we set an SINR threshold based on the UAV’s received signal from the base station and the interfering signals. The edge server abstracts radio interference as spheres and spatial obstacles as spheres, cones, and cylinders. Spatial obstacles and radio interferences in the environment can be equivalent to the standard convex envelope equation
$$\Gamma_k(P)=\left(\frac{x-x_k}{a_k}\right)^{2p_k}+\left(\frac{y-y_k}{b_k}\right)^{2q_k}+\left(\frac{z-z_k}{c_k}\right)^{2r_k},$$
where $a_k$, $b_k$, and $c_k$ determine the size of the obstacle, and $p_k$, $q_k$, and $r_k$ control the shape of the obstacle. When $a_k=b_k=c_k$ and $p_k=q_k=r_k=1$, the obstacle is a sphere. When $a_k=b_k$, $p_k=q_k=1$, and $r_k\gg 1$, the obstacle is a cylinder. With other combinations of the shape parameters, the envelope approximates a cone. $(x_k,y_k,z_k)$ represents the center coordinate of the obstacle $k$, and $d_{\mathrm{safe}}$ denotes the safe distance of the UAV. $\Gamma_k(P)>1$, $\Gamma_k(P)=1$, and $\Gamma_k(P)<1$ express that the UAV position $P=(x,y,z)$ is located outside, on the surface, and inside the equivalent envelope of the obstacle, respectively.
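A minimal sketch of this equivalent-envelope test is given below, assuming the standard IFDS-style form reconstructed above; the function and parameter names are illustrative.

```python
import numpy as np

def gamma(p, center, size, shape):
    """Equivalent convex envelope Gamma_k(P) of an obstacle.

    size  = (a, b, c): axis lengths that determine the obstacle size.
    shape = (p_, q, r): exponents that control the obstacle shape
            (p_ = q = r = 1 with a = b = c gives a sphere; a large r with
            p_ = q = 1 approximates a cylinder)."""
    (x, y, z), (x0, y0, z0) = p, center
    (a, b, c), (p_, q, r) = size, shape
    return ((x - x0) / a) ** (2 * p_) + ((y - y0) / b) ** (2 * q) + ((z - z0) / c) ** (2 * r)

def position_relative_to_obstacle(p, center, size, shape, tol=1e-6):
    """Classify the UAV position: outside (>1), inside (<1), or on the surface (=1)."""
    g = gamma(np.asarray(p, float), np.asarray(center, float), size, shape)
    if g > 1.0 + tol:
        return "outside"
    if g < 1.0 - tol:
        return "inside"
    return "on surface"

# Example: a spherical interference region of radius 1 centered at (5, 5, 3).
print(position_relative_to_obstacle((5.0, 5.0, 4.5), (5, 5, 3), (1, 1, 1), (1, 1, 1)))
```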

3. IFDS Model and Problem Formulation

3.1. IFDS Model

A UAV with velocity $V$ flies from the current position $P=(x,y,z)$ to the goal point $P_g=(x_g,y_g,z_g)$, and the Euclidean distance between the two points can be calculated as $d(P,P_g)=\|P_g-P\|$. When there are no obstacles on the path of the UAV flying from $P$ to the goal point $P_g$, the initial flow field is a straight line, and the initial flow velocity of the UAV can be denoted as
$$u(P)=\frac{V}{d(P,P_g)}\left[x_g-x,\ y_g-y,\ z_g-z\right]^{T}.$$
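A short sketch of this undisturbed flow: the velocity simply points from the current position toward the goal with magnitude $V$. The names are illustrative.

```python
import numpy as np

def initial_flow_velocity(p, p_goal, speed):
    """Initial flow field: a straight line from the current position P to the
    goal point P_g, with magnitude equal to the UAV speed V."""
    p, p_goal = np.asarray(p, float), np.asarray(p_goal, float)
    d = np.linalg.norm(p_goal - p)        # Euclidean distance d(P, P_g)
    return speed * (p_goal - p) / d       # vector of magnitude `speed` toward the goal

print(initial_flow_velocity((0, 2, 5), (10, 10, 6), speed=30.0))
```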

When there are obstacles in the environment, the weighted sum of the interference matrices of all $K$ obstacles acting on the UAV is indicated as
$$\bar{M}(P)=\sum_{k=1}^{K}w_k M_k(P),$$
where $w_k$ represents the weight of the $k$-th obstacle. It is determined by the distance from the UAV to the equivalent envelope of the obstacle: the larger the distance, the smaller the weight, and the smaller the interference effect on the UAV.

The interference matrix of obstacle $k$ can be calculated as
$$M_k(P)=I-\frac{n_k n_k^{T}}{\Gamma_k^{1/\rho_k}\,n_k^{T}n_k}+\frac{t_k n_k^{T}}{\Gamma_k^{1/\sigma_k}\,\|t_k\|\,\|n_k\|},$$
where $I$ is the unit attraction matrix. The second and third terms of Equation (6) are the repulsion matrix and the tangential matrix, respectively. $\rho_k$ and $\sigma_k$ correspondingly denote the repulsive reaction coefficient and the tangential reaction coefficient of the UAV to the obstacle $k$, which determine the timing and safe distance for the UAV to avoid obstacles. $n_k$ is the vertical vector from the UAV to the obstacle surface, which can be expressed as
$$n_k=\left[\frac{\partial\Gamma_k}{\partial x},\ \frac{\partial\Gamma_k}{\partial y},\ \frac{\partial\Gamma_k}{\partial z}\right]^{T}.$$
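The sketch below computes the vertical vector $n_k$ by differentiating $\Gamma_k$ numerically and assembles a repulsion-plus-tangential modulation matrix in the spirit of Equation (6). The exponents and normalization follow the common IFDS formulation and should be read as assumptions rather than a verbatim reproduction of the paper's equation; all names are illustrative.

```python
import numpy as np

def gamma_sphere(p, center, radius):
    """Equivalent envelope of a spherical obstacle (Gamma = 1 on the surface)."""
    return float(np.sum(((np.asarray(p, float) - center) / radius) ** 2))

def radial_vector(gamma_fn, p, eps=1e-5):
    """Vertical vector n_k = gradient of Gamma_k at P (central finite differences)."""
    p = np.asarray(p, float)
    n = np.zeros(3)
    for i in range(3):
        dp = np.zeros(3); dp[i] = eps
        n[i] = (gamma_fn(p + dp) - gamma_fn(p - dp)) / (2 * eps)
    return n

def modulation_matrix(gamma_fn, p, t_k, rho, sigma):
    """Interference matrix M_k(P): identity (attraction) minus a repulsion term
    along n_k plus a tangential term along t_k, both weakened as Gamma_k grows."""
    g = gamma_fn(p)
    n = radial_vector(gamma_fn, p).reshape(3, 1)
    t = np.asarray(t_k, float).reshape(3, 1)
    repulsion  = (n @ n.T) / (g ** (1.0 / rho) * float(n.T @ n))
    tangential = (t @ n.T) / (g ** (1.0 / sigma) * np.linalg.norm(t) * np.linalg.norm(n))
    return np.eye(3) - repulsion + tangential

# Toy usage: a unit sphere at the origin, UAV at (2, 0, 0), tangential direction +y.
g_fn = lambda p: gamma_sphere(p, center=np.zeros(3), radius=1.0)
print(modulation_matrix(g_fn, p=(2.0, 0.0, 0.0), t_k=(0.0, 1.0, 0.0), rho=1.0, sigma=1.0))
```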

$t_k$ represents the tangential vector, perpendicular to the vertical vector $n_k$ and tangent to the equivalent envelope surface of the obstacle $k$, which is derived from two orthogonal base tangential vectors
$$t_{k,1}=\left[\frac{\partial\Gamma_k}{\partial y},\ -\frac{\partial\Gamma_k}{\partial x},\ 0\right]^{T},\qquad t_{k,2}=n_k\times t_{k,1}.$$

A local coordinate system is established with $t_{k,1}$, $t_{k,2}$, and $n_k$ as the $x'$, $y'$, and $z'$ axes, respectively. Any unit tangent vector in the tangent plane can be denoted as
$$t_k'=\left[\cos\theta_k,\ \sin\theta_k,\ 0\right]^{T},$$
where $\theta_k$, the tangential direction coefficient, is the angle from the tangent vector to the $x'$-axis. It determines the direction in which the UAV goes around the obstacle. The tangent vector $t_k'$ in the local coordinate system can be transformed into $t_k$ in the original coordinate system through the coordinate transformation matrix $T_k$, which can be calculated as
$$t_k=T_k\,t_k'.$$

The entries of the coordinate transformation matrix $T_k$ are trigonometric functions of the orientation of the local axes $(t_{k,1}, t_{k,2}, n_k)$ with respect to the original coordinate system. Then the initial flow velocity of the UAV is corrected by the total interference matrix as
$$\bar{u}(P)=\bar{M}(P)\left(u(P)-v_{\mathrm{obs}}(P)\right)+v_{\mathrm{obs}}(P),$$
where $v_{\mathrm{obs}}(P)$ is the weighted sum of the velocity vectors of all obstacles. It can be indicated as
$$v_{\mathrm{obs}}(P)=\sum_{k=1}^{K}w_k v_k,$$
where $v_k$ is the velocity vector of the obstacle $k$.
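Putting the pieces together, the sketch below applies the weighted total modulation matrix to the initial flow relative to the weighted obstacle motion, which is the usual way dynamic obstacles are handled in IFDS-style models. The distance-based weight rule and all names are illustrative assumptions.

```python
import numpy as np

def corrected_flow_velocity(u_init, M_list, gammas, v_obstacles):
    """Combine per-obstacle modulation matrices M_k and obstacle velocities v_k.

    Weights w_k shrink as the UAV gets farther from obstacle k (larger Gamma_k),
    a simple choice consistent with the qualitative rule in the text."""
    gammas = np.asarray(gammas, float)
    w = 1.0 / np.maximum(gammas - 1.0, 1e-6)      # illustrative distance-based weights
    w = w / w.sum()
    M_bar = sum(wk * Mk for wk, Mk in zip(w, M_list))            # total interference matrix
    v_obs = sum(wk * np.asarray(vk, float) for wk, vk in zip(w, v_obstacles))
    return M_bar @ (np.asarray(u_init, float) - v_obs) + v_obs   # corrected flow velocity

# Toy usage: two obstacles, one static and one moving along +x.
u0 = np.array([30.0, 0.0, 0.0])
M1, M2 = np.eye(3) * 0.8, np.eye(3)
print(corrected_flow_velocity(u0, [M1, M2], gammas=[1.5, 4.0],
                              v_obstacles=[(0, 0, 0), (5, 0, 0)]))
```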

3.2. UAV Kinematic Constraints

Since the UAV is affected by inertia and by the delay of REM construction and distribution when moving at high speed, the UAV will move forward for a period of time at its original velocity before changing the flight state; this period is the minimum step size $\Delta t$. In addition, due to the limited energy carried by the UAV, the maximum path length that the UAV can fly is $L_{\max}$.

The total time of the UAV from the start point to the goal point is $T$, the position at time $t$ is $P_t$, and the flight path length of the UAV up to time $t$ can be expressed as
$$L_t=\sum_{\tau=1}^{t}\left\|\bar{u}(P_\tau)\right\|\Delta t,$$
where $\bar{u}(P_t)$ denotes the corrected flow velocity at time $t$. The climb angle $\gamma_t$ and the yaw angle $\psi_t$ of the UAV can be calculated as
$$\gamma_t=\arcsin\frac{\bar{u}_z(P_t)}{\left\|\bar{u}(P_t)\right\|},\qquad \psi_t=\arctan\frac{\bar{u}_y(P_t)}{\bar{u}_x(P_t)}.$$
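A short sketch of how the per-step path length and the climb and yaw angles can be read off the corrected flow velocity. The arcsin/arctan2 convention follows the reconstruction above and is an assumption; names are illustrative.

```python
import numpy as np

def flight_angles(velocity):
    """Climb angle (elevation of the velocity vector) and yaw angle (heading)."""
    vx, vy, vz = np.asarray(velocity, float)
    speed = np.linalg.norm(velocity)
    climb = np.arcsin(vz / speed)       # angle above the horizontal plane
    yaw = np.arctan2(vy, vx)            # heading in the horizontal plane
    return climb, yaw

def path_length(velocities, dt):
    """Accumulated flight path length: sum of |u_bar(P_t)| * dt over the steps."""
    return sum(np.linalg.norm(v) * dt for v in velocities)

v = np.array([20.0, 20.0, 5.0])
print(np.degrees(flight_angles(v)), path_length([v, v, v], dt=0.1))
```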

If the UAV turns too fast, that is, the change of the yaw angle is too large or the turn is too sharp, the UAV may lose its balance and deviate from the original flight path, or even crash. The climb angle of the UAV is limited by its thrust-to-weight ratio and lift-to-drag ratio. Therefore, the climb angle and the yaw angle need to satisfy the kinematic constraints
$$\gamma_t^{*}=\mathrm{clip}\!\left(\gamma_t,\ \gamma_{t-1}-\gamma_{\max},\ \gamma_{t-1}+\gamma_{\max}\right),\qquad \psi_t^{*}=\mathrm{clip}\!\left(\psi_t,\ \psi_{t-1}-\psi_{\max},\ \psi_{t-1}+\psi_{\max}\right),$$
where $\gamma_t^{*}$ and $\psi_t^{*}$ are the climb angle and the yaw angle after satisfying the kinematic constraints, and $\gamma_{\max}$ and $\psi_{\max}$ are the maximum constraint angles for the climb and yaw angles, respectively.
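A sketch of enforcing the kinematic constraints by saturating the per-step change of an angle, which is one straightforward reading of the constraint above; the clipping form and names are assumptions.

```python
import numpy as np

def constrain_angle(current, previous, max_change):
    """Limit how much an angle (climb or yaw) may change in one minimum step."""
    low, high = previous - max_change, previous + max_change
    return float(np.clip(current, low, high))

# Example: the commanded climb angle jumps by 40 deg but only 15 deg/step is allowed.
prev_climb, cmd_climb = np.radians(5.0), np.radians(45.0)
print(np.degrees(constrain_angle(cmd_climb, prev_climb, np.radians(15.0))))  # -> about 20 deg
```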

3.3. Problem Formulation

The goal of UAV path planning is to adjust the repulsive reaction coefficient, the tangential reaction coefficient, and the tangential direction coefficient in the IFDS model so as to make the UAV flight path the shortest under communication connectivity constraints. Therefore, the objective function of UAV path planning is expressed as
$$\begin{aligned}
\min_{\{\rho_k,\sigma_k,\theta_k\}}\quad & L_T\\
\mathrm{s.t.}\quad & (\rho_k,\sigma_k,\theta_k)\in\mathcal{C},\\
& (\gamma_t,\psi_t)\in\mathcal{A},\\
& \left|\gamma_t-\gamma_{t-1}\right|\le\gamma_{\max},\ \left|\psi_t-\psi_{t-1}\right|\le\psi_{\max},\\
& L_T\le L_{\max},\\
& \mathrm{SINR}_t\ge\mathrm{SINR}_{\mathrm{th}},
\end{aligned}$$
where $\mathcal{C}$ represents the value range of the reaction coefficients, $\mathcal{A}$ indicates the value range of the climb angle and the yaw angle, the third constraint expresses the kinematic constraints, $L_T\le L_{\max}$ means that the UAV cannot fly more than the longest distance it can fly, and $\mathrm{SINR}_t\ge\mathrm{SINR}_{\mathrm{th}}$ shows that the received-signal SINR must be higher than the SINR threshold.
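As a sanity check on this formulation, the small helper below evaluates whether a logged candidate path satisfies the constraints listed above (angle-change limits, path-length budget, and SINR threshold). The thresholds and names are illustrative, not values from the paper.

```python
import numpy as np

def path_is_feasible(climb, yaw, lengths, sinr_db,
                     max_dclimb, max_dyaw, l_max, sinr_th_db):
    """Check the constraints of the path-planning problem for a logged path.

    climb, yaw : per-step angles (rad); lengths : per-step path increments;
    sinr_db    : per-step received-signal SINR in dB."""
    climb, yaw = np.asarray(climb), np.asarray(yaw)
    ok_climb = np.all(np.abs(np.diff(climb)) <= max_dclimb)
    ok_yaw   = np.all(np.abs(np.diff(yaw))   <= max_dyaw)
    ok_len   = np.sum(lengths) <= l_max
    ok_sinr  = np.all(np.asarray(sinr_db) >= sinr_th_db)
    return bool(ok_climb and ok_yaw and ok_len and ok_sinr)

print(path_is_feasible(climb=[0.0, 0.1, 0.15], yaw=[0.0, 0.2, 0.3],
                       lengths=[1.0, 1.0, 1.0], sinr_db=[12.0, 11.0, 10.5],
                       max_dclimb=0.2, max_dyaw=0.3, l_max=20.0, sinr_th_db=10.0))
```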

As shown in Figure 3, the combination of different coefficients determines the shape and direction of the path. In previous research [14–16], the receding horizon control (RHC) strategy was mostly used to optimize these coefficients online. However, the serial solution mechanism of RHC cannot meet the real-time requirements of complex radio environments well. Therefore, in this paper, a DRL algorithm is adopted to optimize the coefficients in the IFDS model, so that the obstacle-avoiding path planned by the UAV is the shortest.

4. TD3-Based Path Planning Algorithm

According to the above objective function, we propose a DRL algorithm based on TD3 to optimize the repulsive reaction coefficient, the tangential reaction coefficient, and the tangential direction coefficient in the IFDS model. This section first defines the state space, action space, and reward function of the DRL algorithm. Then we introduce the proposed TD3-based path planning algorithm in detail.

4.1. State Space, Action Space, and Reward Function

According to the IFDS model, the state space, action space, and reward function are defined as follows.

(a) State space

The state space can be presented by
$$s_t=\left[\Delta P_{k,t},\ \Delta v_{k,t},\ d_{k,t},\ \gamma_t,\ \psi_t\right],$$
where $\Delta P_{k,t}$ denotes the relative position of the UAV and the obstacle $k$ at time $t$, $\Delta v_{k,t}$ expresses the relative velocity of the UAV and the obstacle $k$, $d_{k,t}$ indicates the distance from the UAV to the surface of the obstacle $k$, and $\gamma_t$ and $\psi_t$ represent the climb angle and the yaw angle of the UAV, respectively.

(b) Action space

The action space can be denoted as
$$a_t=\left[\rho_{k,t},\ \sigma_{k,t},\ \theta_{k,t}\right],$$
where $\rho_{k,t}$, $\sigma_{k,t}$, and $\theta_{k,t}$ correspondingly indicate the repulsive reaction coefficient, the tangential reaction coefficient, and the tangential direction coefficient for the obstacle $k$ at time $t$. The flying velocity and path of the UAV are affected by adjusting these reaction coefficients in the IFDS model.

(c) Reward function

Generally, the goal of DRL is to maximize the reward, while our goal is to minimize the flight path of the UAV. So the immediate reward is defined as
$$r_t=\frac{P_{\mathrm{BS}}}{\sum_{j}P_{j}+\sigma^{2}}-\frac{d(P_t,P_g)}{d(P_0,P_g)},$$
where $P_{\mathrm{BS}}$ is the received signal power from the base station, $\sum_{j}P_{j}$ denotes the sum of the received signal power of all radio interference, $\sigma^{2}$ indicates the power of the Gaussian noise, $d(P_t,P_g)$ expresses the distance from the current position to the goal point, and $d(P_0,P_g)$ represents the distance from the starting point to the goal point.
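A compact sketch of how the state, action, and reward in (a)–(c) might be assembled at each step. The exact reward combination follows the reconstruction above and should be read as an assumption; all names and numbers are illustrative.

```python
import numpy as np

def build_state(p_uav, v_uav, p_obs, v_obs, d_surface, climb, yaw):
    """State s_t: relative position, relative velocity, surface distance, climb, yaw."""
    rel_p = np.asarray(p_uav, float) - np.asarray(p_obs, float)
    rel_v = np.asarray(v_uav, float) - np.asarray(v_obs, float)
    return np.concatenate([rel_p, rel_v, [d_surface, climb, yaw]])

def build_action(rho, sigma, theta):
    """Action a_t: repulsive, tangential, and tangential-direction coefficients."""
    return np.array([rho, sigma, theta], dtype=float)

def reward(p_bs, p_interf_sum, noise, d_to_goal, d_start_to_goal):
    """Immediate reward: received-signal SINR term minus normalized remaining distance."""
    sinr = p_bs / (p_interf_sum + noise)
    return float(sinr - d_to_goal / d_start_to_goal)

s = build_state((1, 2, 5), (30, 0, 0), (4, 2, 5), (0, 0, 0), d_surface=2.0, climb=0.05, yaw=0.1)
a = build_action(rho=1.2, sigma=0.8, theta=np.pi / 4)
print(s.shape, a, reward(p_bs=1e-6, p_interf_sum=5e-8, noise=1e-9,
                         d_to_goal=6.0, d_start_to_goal=12.0))
```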

4.2. TD3-Based Path Planning Algorithm

Considering that there are three continuous variables in the proposed action space, we focus on policy gradient methods, such as the deep deterministic policy gradient (DDPG) algorithm, which is commonly used to handle continuous action spaces. In [19], the DDPG algorithm is used to optimize the reaction coefficients of the IFDS model.

There are four neural networks in the DDPG algorithm: the actor reality network, the actor target network, the critic reality network, and the critic target network. The parameters of the reality networks are randomly initialized, and the parameters of the target networks are obtained by copying the reality networks. However, the Q-function of the critic network in the DDPG algorithm tends to overestimate the action values, which can invalidate the policy due to the error in the Q-function. The TD3 (twin-delayed DDPG) algorithm uses two Q-functions (twin critics), and the smaller of the two Q-values is used as the target in the Bellman error loss function.

In addition, the TD3 algorithm adds clipped noise to the output of the actor target network (target policy smoothing), which makes it harder for the policy to exploit errors in the Q-function.
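A minimal numpy sketch of these two TD3 ingredients: clipped Gaussian noise on the target action, and the minimum of the two target critics in the Bellman target. The actor/critic functions here are toy stand-ins, not the paper's networks, and the hyperparameter values are illustrative.

```python
import numpy as np

def td3_target(r, s_next, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    """Compute the TD3 Bellman target y = r + gamma * min(Q1', Q2')(s', a')."""
    rng = np.random.default_rng()
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=3), -noise_clip, noise_clip)
    a_next = np.clip(actor_target(s_next) + noise, a_low, a_high)
    # Clipped double-Q learning: take the smaller of the two target critic values.
    q_next = min(critic1_target(s_next, a_next), critic2_target(s_next, a_next))
    return r + gamma * q_next

# Stand-in target networks (linear/quadratic toys) just to make the sketch runnable.
actor = lambda s: np.tanh(s[:3])
q1 = lambda s, a: float(-np.sum((a - 0.1) ** 2))
q2 = lambda s, a: float(-np.sum((a + 0.1) ** 2))
print(td3_target(r=0.5, s_next=np.array([0.2, -0.1, 0.3, 0.0]),
                 actor_target=actor, critic1_target=q1, critic2_target=q2))
```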

Based on the above definitions, the proposed TD3-based path planning algorithm is presented in Algorithm 1. First, we randomly initialize the critic reality network parameters $\theta_1$ and $\theta_2$ and the actor reality network parameters $\phi$. Then the target network parameters are set equal to the reality network parameters. In each episode, the UAV position is initialized and the initial state $s_1$ is obtained. At each time slot $t$, the UAV observes the state $s_t$ and selects the action $a_t$ with exploration noise $\epsilon\sim\mathcal{N}(0,\sigma_e)$. The UAV then takes the action to obtain the reward $r_t$ and the next state $s_{t+1}$. The transition $(s_t,a_t,r_t,s_{t+1})$ is stored in the replay buffer $\mathcal{B}$. Finally, a minibatch of $Z$ transitions is sampled from $\mathcal{B}$ to update the parameters of the reality networks and the target networks.

1: Initialize: critic reality networks $Q_{\theta_1}$, $Q_{\theta_2}$ with parameters $\theta_1$, $\theta_2$, actor reality network $\pi_\phi$ with parameters $\phi$, and replay buffer $\mathcal{B}$
2: Set target network parameters $\theta_1'\leftarrow\theta_1$, $\theta_2'\leftarrow\theta_2$, $\phi'\leftarrow\phi$
3: for episode $=1$ to $M$ do
4:  Initialize UAV position, get state $s_1$
5:  for $t=1$ to $T$ do
6:   Observe state $s_t$ and select action $a_t=\pi_\phi(s_t)+\epsilon$ with exploration noise $\epsilon\sim\mathcal{N}(0,\sigma_e)$
7:   Take action $a_t$, observe reward $r_t$ and next state $s_{t+1}$
8:   Store $(s_t,a_t,r_t,s_{t+1})$ in replay buffer $\mathcal{B}$
9:   Sample a minibatch of $Z$ transitions from $\mathcal{B}$
10:   Compute target actions by Equation (24)
11:   Compute targets by Equation (23)
12:   Update critic reality network parameters
13:   if $t$ mod $d=0$ then
14:    Update $\phi$ by the deterministic policy gradient $\nabla_\phi J(\phi)=Z^{-1}\sum\nabla_a Q_{\theta_1}(s,a)\big|_{a=\pi_\phi(s)}\nabla_\phi\pi_\phi(s)$
15:    Update target networks: $\theta_i'\leftarrow\tau\theta_i+(1-\tau)\theta_i'$, $\phi'\leftarrow\tau\phi+(1-\tau)\phi'$
16:   end if
17:  end for
18: end for

5. Simulation Results

5.1. Parameter Setting

In the 3D space, the UAV flies from the start point (0, 2, 5) to the goal point (10, 10, 6) at a velocity of 30 m/s, passing multiple obstacles abstracted as spheres, cylinders, and cones. The base station with an edge server is located at (5, 5, 0). The transmit power of the base station is 10 W, and the transmit power of the interference source is 100 mW.

We simulate our method in two environments, static and dynamic. The TD3 algorithm uses a discount factor of 0.99. The maximum climb angle, the maximum yaw angle, the minimum time step, the replay buffer size, the sampling size, and the other detailed simulation parameters are listed in Table 1.

5.2. Result Analysis

We compare the proposed 3D REM-guided UAV path planning scheme (REM-guided scheme) with the one without REM (Without REM scheme). The simulation result in Figure 4 shows that the proposed REM-guided scheme assists the UAV in effectively avoiding interference, and the average SINR exceeds 10 dB. In contrast, the average SINR of the Without REM scheme is only 5.51 dB, and at some locations the SINR is even lower than −20 dB, which means the UAV loses communication with the base station.

To evaluate the performance of the proposed algorithm, we compare it with the IFDS model with fixed coefficients (Fixed coefficients scheme) and the IFDS model optimized based on the DDPG algorithm (DDPG-based scheme) [19]. We test the three schemes in two environments: static path planning with the global REM and real-time dynamic path planning with local REMs.

In the static environment, there are static obstacles in the form of spheres, cylinders, and cones on the ground, as well as static radio interference abstracted as spheres. The edge server distributes a compressed global REM to the UAV for global path planning.

As shown in Figure 5, all three schemes can plan a collision-free path for the UAV based on the global REM. However, since the Fixed coefficients scheme does not optimize the reaction coefficients of the IFDS model, the planned path is conservative, and the path length is 15.9 km. The planned paths of the DDPG-based scheme and the proposed TD3-based scheme partially overlap, but the DDPG-based scheme cannot find the optimal combination of coefficients when the UAV avoids obstacles. The path length of the DDPG-based scheme is 14.8 km, and the path length of the proposed scheme is 14.3 km.

We compare the climb and yaw angles of the three schemes in Figure 6. Since the UAV is subject to kinematic constraints, the climb change angle and the yaw change angle of the UAVs do not exceed the maximum constraint angles during flight. The average climb change angle and the average yaw change angle of the proposed TD3-based scheme are smaller than those of the Fixed coefficients scheme and the DDPG-based scheme. The proposed algorithm has the smallest changes in the climb angle and the yaw angle during the UAV flight, which means that the energy consumed is relatively less.

In a dynamic environment, we set the radio interference as dynamic interference and distribute local REMs in real-time for the UAV to avoid obstacles and interferences.

As shown in Figure 7(a), when the UAVs of the three schemes encounter the first dynamic interference sphere, the three UAVs choose different directions to avoid the interference. When they encounter the second interference sphere in Figure 7(b), the DDPG-based and TD3-based schemes choose to fly below the interference sphere, while the Fixed coefficients scheme flies around it on the left. From the overall view in Figure 7(c), all three schemes can effectively avoid obstacles and interference. However, the Fixed coefficients scheme avoids obstacles and interference conservatively, keeping a large safe distance. The two DRL schemes optimize the path length while avoiding obstacles and interference through learning. The path length of the Fixed coefficients scheme is 17.3 km, the path length of the DDPG-based scheme is 16.7 km, and the path length of the proposed TD3-based scheme is 15.5 km, which is the shortest of the three schemes.

We also compare the climb and yaw angles of the three schemes in the dynamic environment in Figure 8. The average climb change angle and the average yaw change angle of the proposed TD3-based scheme are again smaller than those of the Fixed coefficients scheme and the DDPG-based scheme. The proposed algorithm has the smallest changes in the climb angle and the yaw angle in both the static and the dynamic environment.

The path planning and obstacle avoidance capabilities of the proposed algorithm are tested by distributing global and local REMs to UAVs in static and dynamic environments, respectively. Compared with the DDPG-based scheme and the Fixed coefficients scheme, the proposed scheme yields the shortest path while the climb angle and the yaw angle change the least. The simulation results show that the proposed REM-guided path planning scheme can effectively deal with the complex radio environment under communication connectivity constraints.

6. Conclusions

In the complex radio environment, the sensors carried by UAVs limit their ability to perceive the environment, making it difficult for UAVs to effectively avoid complex geographical obstacles and radio interference. In view of this, we proposed a 3D REM-guided path planning method for UAVs, which distributes compressed global REMs and detailed local REMs to UAVs to improve their awareness of the radio environment. An IFDS model is deployed on the UAVs to allow them to avoid obstacles and plan paths. We proposed a TD3-based algorithm to optimize the reaction coefficients of the IFDS model. The simulation results show that the proposed algorithm can effectively avoid static obstacles and dynamic interference under communication connectivity constraints, significantly improve communication stability with a higher received-signal SINR, and reduce the cost of UAV task execution with the shortest path.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 62171449, 62001483, and U19B2024.