Chaoxu Mu · Hao Luo · Ke Wang · Changyin Sun
Abstract In this paper, neural network-based adaptive decentralized learning control is investigated for nonlinear interconnected systems with input constraints. Because the decentralized control of interconnected systems is related to the optimal control of each isolated subsystem, the decentralized control strategy can be established from a series of optimal control policies. A novel policy iteration algorithm is presented to solve the Hamilton–Jacobi–Bellman equation associated with the optimal control problem. The algorithm is implemented under an actor-critic structure in which both neural networks are updated simultaneously to approximate the optimal control policy and the optimal cost function, respectively. An additional stabilizing term is introduced and an improved weight updating law is derived, which relaxes the requirement of an initial admissible control policy. In addition, the input constraints of the interconnected systems are taken into account, and the Hamilton–Jacobi–Bellman equation is solved in the presence of these constraints. Using Lyapunov theory, the interconnected system states and the weight approximation errors of the two neural networks are proven to be uniformly ultimately bounded. Finally, the effectiveness of the proposed decentralized learning control method is verified by simulation results.
Keywords Decentralized control · Actor-critic learning · Neural network · Input constraints
Nonlinear interconnected systems are a class of complex systems composed of several nonlinear subsystems. Interconnected systems play an important role in control because many control systems contain interconnection terms [1], such as power systems [2], robotic systems [3] and inverted pendulum systems [4]. Most interconnected systems are high-dimensional, strongly coupled and highly uncertain. As a result, it is difficult to design effective controllers for them directly. To solve this problem, the decentralized control method was proposed in [5]. Its core idea is to convert the control problem of the interconnected system into control problems of the isolated subsystems. When a decentralized control strategy is used to design the controllers, only the local information of each subsystem is required. Hence, decentralized control has become a promising method in recent years [6,7] and provides an effective approach to the control problem of interconnected systems.
It was shown in [8] that the decentralized control of interconnected systems is related to the optimal control of the isolated subsystems. For the optimal control of nonlinear systems, the Hamilton–Jacobi–Bellman (HJB) equation must first be solved [9]. However, this equation is difficult to solve analytically because it contains partial derivative terms [10]. The emergence of adaptive dynamic programming (ADP) provides a new way to solve the HJB equation [11,12], overcoming this difficulty through neural network (NN) approximation. As a typical method for implementing ADP, policy iteration (PI) is widely used in the optimal control of nonlinear systems, such as discrete-time nonlinear systems [13], time-delay nonlinear systems [14] and uncertain nonlinear systems [15].
The PI algorithm consists of two alternating steps: policy evaluation and policy improvement [16]. Both steps are implemented under an actor-critic structure containing two NNs. The critic network approximates the solution of the HJB equation, and the actor network improves the current control policy. A simultaneous PI algorithm, in which the weights of both NNs are tuned at the same time, was first proposed in [17] to solve the optimal control problem for nonlinear systems. Simultaneous PI algorithms have since been studied extensively [18–20]. However, solving the decentralized control problem of interconnected systems via simultaneous PI has rarely been studied. Moreover, the control approaches mentioned above require an initial admissible control policy during learning. Unfortunately, such a policy is often difficult to obtain, which limits the implementation of these learning algorithms.
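To make the two PI steps and the role of an admissible initial policy concrete, the following minimal sketch applies policy iteration (Kleinman's algorithm) to a scalar linear-quadratic problem, $\dot x=ax+bu$ with cost $\int_0^\infty(qx^2+ru^2)\,dt$. This linear case is an illustrative assumption, not the nonlinear constrained setting of this paper: here policy evaluation reduces to a scalar Lyapunov equation and policy improvement to a gain update. The function name and parameters are hypothetical.

```python
import math


def policy_iteration_scalar_lqr(a: float, b: float, q: float, r: float,
                                k0: float, iters: int = 20) -> tuple[float, float]:
    """Policy iteration for x' = a x + b u with cost integral of q x^2 + r u^2.

    The policy is linear state feedback u = -k x. Returns (p, k): p is the
    cost coefficient (V(x) = p x^2) of the last evaluated policy, and k is
    the gain produced by the last improvement step.
    """
    # PI needs an admissible (stabilizing) initial policy: a - b k0 < 0.
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 is not admissible (closed loop unstable)")
    k = k0
    p = float("nan")
    for _ in range(iters):
        # Policy evaluation: V(x) = p x^2 solves 2 (a - b k) p + q + r k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: minimizing the Hamiltonian gives u = -(b p / r) x.
        k = b * p / r
    return p, k


if __name__ == "__main__":
    p, k = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    # Riccati solution for comparison: p* = r (a + sqrt(a^2 + b^2 q / r)) / b^2
    print(p, 1.0 + math.sqrt(2.0))
```

Starting from $k_0=0.5$ instead would raise the error, because $u=-0.5x$ does not stabilize $\dot x=x+u$; this is the admissibility requirement that the proposed method aims to relax.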
In practical control systems, actuator saturation is inevitable [21–23]. Hence, it is necessary to take control constraints into account. Control constraints bring new challenges to the stabilization of the system. The constrained-input problem for nonlinear fuzzy systems was investigated in [24]. In [25], an adaptive constrained optimal control method was proposed to solve the stabilization problem of discrete-time systems. References [21–25] mainly focus on general nonlinear systems with input constraints, whereas interconnected systems with control constraints still deserve further investigation.
Motivated by these investigations, in this paper we develop a neural network-based learning approach to solve the decentralized control problem of interconnected systems with input constraints. The main contributions of this paper are as follows. First, the input constraints of interconnected systems are taken into account, which is more practical than considering unconstrained systems, and the HJB equation is solved in the presence of these constraints. Second, the actor and critic NNs are updated simultaneously to approximate the optimal control policy and the optimal cost function, respectively; by adding an additional stabilizing term, an initial admissible control policy is no longer necessary. Third, the states of the interconnected systems and the weight approximation errors of the two NNs are proven to be uniformly ultimately bounded.
The rest of this paper is organized as follows. In Sect. 2, the problem formulation and the decentralized control strategy are stated. In Sect. 3, the neural network-based learning control method is introduced together with the stability analysis. Simulation results are presented in Sect. 4. Finally, conclusions and future research directions are given in Sect. 5.
This section consists of two parts. First, the decentralized control of interconnected systems is transformed into the optimal control of each isolated subsystem. Second, the corresponding HJB equation is derived.
Consider a class of continuous-time nonlinear interconnected systems composed of $N$ subsystems with input constraints
Assumption 2 (cf. [6]) For the $i$th subsystem, the matched interconnection term $d_i(x(t))$ is bounded as
A common choice for $\psi_i(\cdot)$ is the hyperbolic tangent function $\tanh(\cdot)$, and then (7) can be further rewritten as
However, the HJB equation is difficult to solve analytically because of its partial derivative terms, especially when they are coupled with the constrained-input terms. To address this problem, in the next section an NN-based algorithm is used to obtain an approximate solution of the HJB equation.
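In much of the constrained-input ADP literature, the constrained-input term in the cost is a nonquadratic functional of the form $W(u)=2\int_0^u\lambda\tanh^{-1}(v/\lambda)R\,dv$, which for a scalar input with bound $\lambda$ has a closed form. The sketch below assumes this standard form with scalar $R=r$ and a hypothetical bound `lam`; since equation (7) is not reproduced here, it is not necessarily the exact expression used in this paper. The Simpson-rule integration is included only as a cross-check of the closed form.

```python
import math


def constrained_utility(u: float, lam: float, r: float) -> float:
    """Closed form of W(u) = 2 * integral_0^u lam * atanh(v / lam) * r dv for |u| < lam."""
    if abs(u) >= lam:
        raise ValueError("control must satisfy |u| < lam")
    s = u / lam
    return 2.0 * lam * r * u * math.atanh(s) + lam ** 2 * r * math.log(1.0 - s * s)


def constrained_utility_numeric(u: float, lam: float, r: float, n: int = 2000) -> float:
    """Composite Simpson's rule on the same integrand, as a cross-check."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = u / n

    def integrand(v: float) -> float:
        return 2.0 * lam * r * math.atanh(v / lam)

    total = integrand(0.0) + integrand(u)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0


if __name__ == "__main__":
    for u in (0.1, 0.5, -0.9):
        print(u, constrained_utility(u, 1.0, 1.0), constrained_utility_numeric(u, 1.0, 1.0))
```

For small $|u|$ the functional behaves like the quadratic $ru^2$ (its series is $ru^2+ru^4/(6\lambda^2)+\dots$), while its gradient $2\lambda r\tanh^{-1}(u/\lambda)$ blows up as $|u|\to\lambda$. This is what makes the minimizing control a saturated $\tanh$ function.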
In this section, we develop an NN-based adaptive learning algorithm to solve the HJB equation, from which the approximate optimal control policy for the $i$th isolated subsystem is obtained. The related stability analysis is also included.
Suppose that the cost function $V_i(x_i)$ is continuously differentiable. Then the critic NN is used to approximate $V_i(x_i)$ as follows:
Given that the cost function is estimated by (17), the Hamiltonian (15) can be rewritten as
where the residual error $e_{Hi}$ is expressed as
This residual error is bounded on the compact set $\Omega$, i.e., $|e_{Hi}|$ is bounded above by a positive constant. Because the ideal weight $\omega_i$ is unknown, we use $N$ estimated weight vectors to construct $N$ critic NNs. For the $i$th isolated subsystem, the approximate optimal cost function is
Using the chain rule, the weight updating law of the critic NN can be expressed as
However, note that the initial control policy must be admissible when the weight tuning rule (26) is implemented. An initial stabilizing control policy is usually difficult to obtain, which hinders the application of this algorithm.
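Since the critic structure and the tuning rule (26) are not reproduced here, the following sketch shows the normalized gradient-descent critic update that is widely used in continuous-time actor-critic ADP. With a critic $\hat V_i=\hat\omega_i^T\phi_i(x_i)$, the Hamiltonian residual $e_i=\hat\omega_i^T\nabla\phi_i\dot x_i+U_i$ is driven toward zero by $\dot{\hat\omega}_i=-\alpha_i\sigma_i e_i/(1+\sigma_i^T\sigma_i)^2$ with $\sigma_i=\nabla\phi_i\dot x_i$. The quadratic basis, utility value, learning rate and function names are hypothetical, and this is an assumed illustration rather than necessarily the exact law (26).

```python
import numpy as np


def phi_jacobian(x: np.ndarray) -> np.ndarray:
    """Jacobian of a hypothetical quadratic basis phi(x) = [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])


def critic_step(w_hat: np.ndarray, x: np.ndarray, xdot: np.ndarray,
                utility: float, alpha: float, dt: float):
    """One Euler step of the normalized gradient-descent critic update.

    Returns the updated weights and the Hamiltonian residual evaluated
    with the weights before the update.
    """
    sigma = phi_jacobian(x) @ xdot          # sigma = grad(phi)(x) * xdot
    e = float(w_hat @ sigma) + utility       # Hamiltonian residual
    m = 1.0 + float(sigma @ sigma)           # normalization term
    w_new = w_hat - dt * alpha * sigma * e / m ** 2
    return w_new, e


if __name__ == "__main__":
    w = np.zeros(3)
    x, xdot = np.array([1.0, -1.0]), np.array([-1.0, 0.5])
    for _ in range(3):
        w, e = critic_step(w, x, xdot, utility=2.0, alpha=1.0, dt=0.1)
        print(e)  # residual magnitude shrinks at this fixed sample
```

Nothing in this update keeps the closed loop stable while the weights are still poor, which is why a rule of this type needs an admissible initial policy.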
To address this issue, an improved weight updating law is proposed. Before proceeding, we introduce the following assumption.
Assumption 3 (cf. [27,28]) Let $\pi_i(x_i)$ be a continuously differentiable Lyapunov function for the $i$th isolated subsystem (5), whose time derivative satisfies
Remark 2 The first term in (29) has the same form as (26) and is intended to minimize the objective function (25). The second term in (29) is designed to guarantee the stability of the closed-loop subsystem during the learning process of the critic network. To be more specific, let the derivative of the Lyapunov function candidate for subsystem (5) be
If the closed-loop subsystem is unstable, then $\Phi_i(x_i)>0$. Next, taking the negative gradient of $\gamma_{di}\Phi_i(x_i)$ with respect to the critic weight, we have
Under this circumstance, $\Gamma_i(x_i)=1$, so the second term in (29) is active. By contrast, if $\Phi_i(x_i)<0$, the closed-loop subsystem is stable, $\Gamma_i(x_i)=0$, and the second term in (29) vanishes. Therefore, owing to the second term in (29), no initial stabilizing control policy is required.
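The switching logic of Remark 2 can be illustrated on a deliberately simple toy problem. The sketch assumes a scalar subsystem $\dot x=x+u$ with linear policy $u=-wx$ (unconstrained, for brevity) and Lyapunov function $\pi(x)=x^2/2$, so that $\Phi(x)=\nabla\pi\,\dot x=x^2(1-w)$. Starting from the non-stabilizing policy $w=0$ (that is, $u=0$), the indicator $\Gamma=1$ activates a negative-gradient step on $\Phi$ with respect to $w$, and the term switches off once $\Phi\le 0$. The toy system, gain, step size and function names are hypothetical and do not reproduce the update law (29).

```python
def gamma_indicator(phi_value: float) -> int:
    """Gamma = 1 if Phi > 0 (Lyapunov function not decreasing), else 0."""
    return 1 if phi_value > 0.0 else 0


def phi_toy(x: float, w: float) -> float:
    """Phi = d(pi)/dx * xdot for pi = x^2 / 2, xdot = x + u, u = -w x."""
    return x * (x - w * x)


def dphi_dw_toy(x: float) -> float:
    """Partial derivative of phi_toy with respect to the weight w."""
    return -x * x


def stabilizing_term(x: float, w: float, gain: float) -> float:
    """Hypothetical extra term: -gain * Gamma * dPhi/dw (negative gradient on Phi)."""
    return -gain * gamma_indicator(phi_toy(x, w)) * dphi_dw_toy(x)


def train_until_stable(x: float = 1.0, w: float = 0.0, gain: float = 0.5,
                       dt: float = 0.1, max_steps: int = 1000) -> float:
    """Apply only the stabilizing term until Phi <= 0 (the term then switches off)."""
    for _ in range(max_steps):
        if gamma_indicator(phi_toy(x, w)) == 0:
            break
        w += dt * stabilizing_term(x, w, gain)
    return w


if __name__ == "__main__":
    w = train_until_stable()
    print(w, phi_toy(1.0, w))  # Phi <= 0 now, so the stabilizing term is inactive
```

In the full algorithm this term is added to the critic update rather than used alone, so the Hamiltonian-minimizing part keeps learning while the switching part only intervenes when $\Phi_i>0$.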
Define the weight estimation error of the critic NN as
Hence, the time derivative of the critic NN weight estimation error is derived as
Remark 3 To guarantee that the estimated weight converges precisely to the ideal weight $\omega_i$, the persistent excitation (PE) condition must be satisfied.
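One common way to monitor the PE condition numerically, assumed here for illustration rather than taken from this paper, is to approximate the Gram matrix $\int_t^{t+T}\sigma(\tau)\sigma^T(\tau)\,d\tau$ of the regressor over a window and check that its smallest eigenvalue stays bounded away from zero. The regressor samples, window and function name below are hypothetical.

```python
import numpy as np


def pe_level(sigma_samples: np.ndarray, dt: float) -> float:
    """Smallest eigenvalue of the Riemann-sum approximation of int sigma sigma^T dt.

    sigma_samples has shape (num_samples, p): one regressor vector per row.
    A value near zero means the window is not exciting in some direction.
    """
    s = np.asarray(sigma_samples, dtype=float)
    gram = dt * s.T @ s                        # p x p, symmetric positive semidefinite
    return float(np.linalg.eigvalsh(gram)[0])  # eigvalsh returns ascending eigenvalues


if __name__ == "__main__":
    dt = 1e-3
    t = np.arange(0.0, 2.0 * np.pi, dt)
    rich = np.column_stack([np.sin(t), np.cos(t)])               # excites both directions
    poor = np.column_stack([np.ones_like(t), np.zeros_like(t)])  # never excites entry 2
    print(pe_level(rich, dt), pe_level(poor, dt))  # roughly pi, and 0
```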
In this section, we present the simultaneous PI algorithm, which includes an actor NN and a critic NN. The weights of the two NNs are updated at the same time to obtain the optimal control policy and the optimal cost function of each subsystem, respectively.
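Because the actor structure (35) is not reproduced here, the following sketch assumes the saturated form common in constrained-input ADP, $\hat u_i=-\lambda_i\tanh\big(\tfrac{1}{2\lambda_i}R_i^{-1}g_i^T\nabla\phi_i^T\hat\omega_{ai}\big)$, where $\hat\omega_{ai}$ denotes the actor weights. It illustrates two points: the $\tanh$ output keeps every input component within $[-\lambda_i,\lambda_i]$ regardless of the weight values, and each decentralized controller uses only its local state $x_i$. All names, dimensions and numerical values are hypothetical.

```python
import numpy as np


def actor_policy(w_a: np.ndarray, grad_phi: np.ndarray, g: np.ndarray,
                 r_inv: np.ndarray, lam: float) -> np.ndarray:
    """Saturated actor output u = -lam * tanh(R^{-1} g^T grad_phi^T w_a / (2 lam)).

    Shapes: w_a (p,), grad_phi (p, n), g (n, m), r_inv (m, m); returns u of shape (m,).
    """
    v = r_inv @ g.T @ grad_phi.T @ w_a / (2.0 * lam)
    return -lam * np.tanh(v)  # each entry lies in [-lam, lam]


def local_grad_phi(x: np.ndarray) -> np.ndarray:
    """Jacobian of a hypothetical local basis [x1^2, x1*x2, x2^2] (local state only)."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]])


if __name__ == "__main__":
    g = np.array([[0.0], [1.0]])  # hypothetical input gain, m = 1
    r_inv = np.eye(1)
    x_local = np.array([1.0, -1.0])
    for w_a in (np.zeros(3), np.array([1.0, -0.5, 2.0]), 1e3 * np.ones(3)):
        print(actor_policy(w_a, local_grad_phi(x_local), g, r_inv, lam=1.0))
```

Even with very large actor weights the output only approaches the bound $\lambda_i$ and never exceeds it, so the input constraint holds throughout simultaneous learning.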
The structure of the actor NN is developed first. Based on (16), the approximate optimal control policy is given as follows:
Here, we give a simultaneous weight tuning law that guarantees the stability of the closed-loop system. Similar to (13), we define
Before the stability analysis, the following assumption is introduced.
Assumption 4 (cf. [28])
The convergence of the actor and critic NN weights and the uniform ultimate boundedness of the $i$th isolated subsystem are established in the following theorem. Note that the same analysis applies to every other subsystem.
Theorem 1 Let Assumptions 1–4 hold. For the $i$th isolated subsystem described by (5), let the optimal cost function be approximated via (21) and the optimal control law be obtained via (35). Let the critic NN weights be tuned using (38) and the actor NN weights be tuned using (39). Then the state $x_i(t)$ of the $i$th isolated subsystem, the critic NN weight estimation error, and the actor NN weight estimation error are uniformly ultimately bounded.
Proof The Lyapunov function for the $i$th subsystem is defined as
Combining the derivations of the two cases, if the derived conditions hold, then the time derivative of the Lyapunov function is always negative. Therefore, the states and the weight estimation errors are guaranteed to be uniformly ultimately bounded. This completes the proof.
In this section, two simulation examples are given to verify the effectiveness of the proposed algorithm.
We first study the nonlinear interconnected system described by the following equations:
To satisfy Assumption 2, we set $\alpha_1(x_1)=\|x_1\|$ and $\alpha_2(x_2)=\|x_2\|$ and choose $\rho_{11}=\rho_{12}=0.1$ and $\rho_{21}=\rho_{22}=0.1$, so that the interconnection terms $d_1$ and $d_2$ are upper bounded. The initial state of (58) is set as $x_0=[1,-1,1,-1]^T$. Let $Q_1=Q_2=I_2$, where $I_2$ is the $2\times 2$ identity matrix, and $S_1=S_2=1$. The state trajectories of subsystem 2 under $u_2(x_2)=0$ are presented in Fig. 1, which demonstrates that the initial control policy cannot stabilize this system.
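Assuming Assumption 2 takes the form common in the decentralized ADP literature, $\|d_i(x)\|\le\sum_j\rho_{ij}\alpha_j(x_j)$, the choice $\alpha_j(x_j)=\|x_j\|$ with $\rho_{ij}=0.1$ can be sanity-checked by random sampling. The interconnection term `d1_hypothetical` below is invented for illustration, because the actual terms of (58) are not reproduced here. Note that sampling can only falsify such a bound, not prove it.

```python
import numpy as np


def bound_rhs(x_parts, rho_row) -> float:
    """Right-hand side sum_j rho_ij * ||x_j|| with alpha_j(x_j) = ||x_j||."""
    return float(sum(rho * np.linalg.norm(xj) for rho, xj in zip(rho_row, x_parts)))


def d1_hypothetical(x1: np.ndarray, x2: np.ndarray) -> float:
    """Invented interconnection term acting on subsystem 1 (not the one in (58))."""
    return 0.1 * x2[0] * np.sin(x1[1])


def check_bound(num_samples: int = 10_000, seed: int = 0) -> bool:
    rng = np.random.default_rng(seed)
    rho_row = (0.1, 0.1)  # rho_11 = rho_12 = 0.1, as in the first example
    for _ in range(num_samples):
        x1, x2 = rng.uniform(-5.0, 5.0, size=2), rng.uniform(-5.0, 5.0, size=2)
        if abs(d1_hypothetical(x1, x2)) > bound_rhs((x1, x2), rho_row):
            return False
    return True


if __name__ == "__main__":
    print(check_bound())  # True: |0.1 x21 sin(x12)| <= 0.1 ||x2|| always holds
```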
Fig.1 The state trajectories of subsystem 2 under u2(x2)=0
Fig.2 The critic weight convergence for subsystem 1
Fig.3 The actor weight convergence for subsystem 1
Fig.4 The critic weight convergence for subsystem 2
The PE condition can be satisfied by adding probing noise to the control inputs during training. The probing noise is selected as $\nu(t)=\cos(0.1t)+\sin^2(-1.2t)\cos(0.5t)+\sin^2(1.2t)+\sin^3(2.4t)\cos(2.4t)$. The weight convergence for subsystem 1 is presented in Figs. 2 and 3, which show that the weight vectors of both NNs converge to $[0.128,-0.449,0.268,-0.744,-0.282]^T$. The weight convergence for subsystem 2 is presented in Figs. 4 and 5, where both the critic and actor NN weights converge to $[1.787,1.642,0.800,2.246,0.395]^T$.
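The probing signal can be generated as in the following sketch, which evaluates $\nu(t)$ exactly as written (unit amplitude is assumed) and adds it to a nominal control only while the weights are being learned. The names `probing_noise` and `training_input` are hypothetical.

```python
import math


def probing_noise(t: float) -> float:
    """nu(t) = cos(0.1t) + sin^2(-1.2t) cos(0.5t) + sin^2(1.2t) + sin^3(2.4t) cos(2.4t)."""
    return (math.cos(0.1 * t)
            + math.sin(-1.2 * t) ** 2 * math.cos(0.5 * t)
            + math.sin(1.2 * t) ** 2
            + math.sin(2.4 * t) ** 3 * math.cos(2.4 * t))


def training_input(u_nominal: float, t: float, training: bool) -> float:
    """Add probing noise only while the NN weights are being learned."""
    return u_nominal + probing_noise(t) if training else u_nominal


if __name__ == "__main__":
    samples = [probing_noise(0.01 * k) for k in range(10_000)]
    print(probing_noise(0.0), max(abs(s) for s in samples))
```

Because each of the four terms has magnitude at most 1, $|\nu(t)|\le 4$ for all $t$. Removing the noise after training leaves the learned policy unchanged.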
Fig.5 The actor weight convergence for subsystem 2
Next, based on (35), the approximate optimal control policies for the two subsystems are obtained, as shown in Fig. 6. The resulting approximate optimal control pair is applied to stabilize (58). Figure 7 shows the overall system state trajectories under this decentralized control policy. All states are regulated to zero, which demonstrates the validity of the proposed control method.
Fig.6 The control policies for two subsystems
Fig.7 The overall system state trajectories under the approximate optimal control pair
To further verify the proposed algorithm on a practical system, we consider the two inverted pendulums connected by springs that were investigated in [4]. The model of the inverted pendulums is shown in Fig. 8 and can be described as
Fig.8 Schematic diagram of inverted pendulum system
The parameters used in this example are set as follows: $\zeta_1=\zeta_2=1$, $m_1=1$ kg, $m_2=1.5$ kg, $\ell_1=\ell_2=0.5$ m, $\ell_0=1$ m, $g=9.8$ m/s², $\kappa_1=\kappa_2=0.009$, $k=30$, $A=0.1$, and the spring positions $a_1=a_2=0.1$.
The simulation results are presented in Figs. 9–14. Figs. 9 and 10 show that the states of both subsystems are regulated to 0. For pendulum 1, the weights of the critic and actor NNs converge to $[0.0316,0.184,0.124]^T$, according to Figs. 11 and 12. Figures 13 and 14 show the weight convergence for pendulum 2, where the weights converge to $[0.0694,0.394,0.231]^T$. These results show that the proposed algorithm can be applied effectively to the inverted pendulum system.
Fig.9 The trained state trajectories of inverted pendulum 1
Fig.10 The trained state trajectories of inverted pendulum 2
Fig.11 The critic weight convergence for inverted pendulum 1
Fig.12 The actor weight convergence for inverted pendulum 1
Fig.13 The critic weight convergence for inverted pendulum 2
Fig.14 The actor weight convergence for inverted pendulum 2
In this paper, we have investigated NN-based decentralized learning control for a class of constrained-input interconnected systems. The decentralized control law can be represented by a group of optimal control policies of the isolated subsystems. The HJB equation with control constraints is solved approximately, from which the optimal control policy is obtained. The proposed algorithm relaxes the requirement of an initial admissible control policy. Finally, two representative simulations demonstrate the effectiveness of the proposed learning control strategy.
Note that the proposed learning control uses a time-triggered strategy, i.e., the control is updated periodically. In future work, an event-triggered scheme could be implemented to reduce the communication and computation burden. In addition, the experience replay technique could be applied to improve the iteration efficiency.
Acknowledgements This work was supported by the National Key R&D Program of China (No. 2018AAA0101400) and the National Natural Science Foundation of China (Nos. 62022061, 61921004).
Control Theory and Technology, 2021, Issue 3