Neural network-based adaptive decentralized learning control for interconnected systems with input constraints

2021-10-13 07:16:50

Control Theory and Technology, 2021, Issue 3

Chaoxu Mu·Hao Luo·Ke Wang·Changyin Sun

Abstract In this paper, neural network-based adaptive decentralized learning control is investigated for nonlinear interconnected systems with input constraints. Because the decentralized control of interconnected systems is related to the optimal control of each isolated subsystem, the decentralized control strategy can be established by a series of optimal control policies. A novel policy iteration algorithm is presented to solve the Hamilton–Jacobi–Bellman equation related to the optimal control problem. This algorithm is implemented under the actor-critic structure, where both neural networks are simultaneously updated to approximate the optimal control policy and the optimal cost function, respectively. An additional stabilizing term is introduced and an improved weight updating law is derived, which relaxes the requirement of an initial admissible control policy. Besides, the input constraints of interconnected systems are taken into account and the Hamilton–Jacobi–Bellman equation is solved in the presence of input constraints. The interconnected system states and the weight approximation errors of the two neural networks are proven to be uniformly ultimately bounded by utilizing Lyapunov theory. Finally, the effectiveness of the proposed decentralized learning control method is verified by simulation results.

Keywords Decentralized control·Actor-critic learning·Neural network·Input constraints

1 Introduction

Nonlinear interconnected systems are a class of complex systems composed of several nonlinear subsystems. Interconnected systems play an important role in the field of control because many control systems contain interconnection terms [1], such as power systems [2], robotic systems [3] and inverted pendulum systems [4]. Most interconnected systems are characterized by high dimension, strong coupling and strong uncertainty. As a result, it is difficult to design effective controllers directly for interconnected systems. To solve this problem, the decentralized control method was proposed in [5]. The core of this method is to convert the control problem of the interconnected system into control problems for each isolated subsystem. When the decentralized control strategy is applied to design the controllers, only the local information of each subsystem is required. Hence, decentralized control has become a promising method in recent years [6,7], which gives us an effective approach to deal with the control problem of interconnected systems.

It was shown in [8] that the decentralized control of interconnected systems is related to the optimal control of isolated subsystems. When it comes to the optimal control of nonlinear systems, the Hamilton–Jacobi–Bellman (HJB) equation must be solved first [9]. However, this equation is difficult to solve analytically due to the existence of partial differential terms [10]. The emergence of adaptive dynamic programming (ADP) provides a new idea for solving the HJB equation [11,12], which can overcome this difficulty using neural network (NN) approximation. As a typical method of implementing the ADP algorithm, policy iteration (PI) is widely used in the optimal control of nonlinear systems, such as discrete-time nonlinear systems [13], time-delay nonlinear systems [14] and uncertain nonlinear systems [15].

The PI algorithm alternates between two steps: policy evaluation and policy improvement [16]. Both steps are implemented under the actor-critic structure, which contains two NNs. One is the critic network, which is employed to approximate the solution of the HJB equation. The other is the actor network, which attempts to improve the current control policy. The simultaneous PI algorithm was first proposed to solve the optimal control problem for nonlinear systems in [17], where the weights of both neural networks were tuned at the same time. The simultaneous PI algorithm has since been studied extensively [18–20]. However, solving the decentralized control problem of interconnected systems via the simultaneous PI algorithm has rarely been studied. It is worth emphasizing that the above-mentioned control approaches need an initial admissible control policy during the learning process. Unfortunately, an initial admissible control policy is difficult to obtain, which limits the implementation of this learning algorithm.
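As an illustration of the two PI steps, the following minimal sketch applies policy iteration to a scalar linear-quadratic problem, which is a deliberate simplification of the nonlinear constrained setting studied in this paper. The system x' = a*x + b*u, the cost weights q and r, and the function names are assumptions introduced only for this example. The sketch also shows why an initial admissible (stabilizing) gain is needed by the standard algorithm.

```python
import math


def policy_iteration_scalar_lq(a, b, q, r, k0, iters=20):
    """Policy iteration for x' = a*x + b*u with cost integral of q*x^2 + r*u^2.

    The policy is u = -k*x. This scalar, unconstrained setting is an
    illustrative assumption, not the system treated in the paper.
    """
    if b * k0 <= a:
        raise ValueError("initial gain k0 is not admissible (closed loop unstable)")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: V(x) = p*x^2 solves 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: u = -(b*p/r)*x minimizes the Hamiltonian.
        k = b * p / r
        history.append((p, k))
    return history


def riccati_solution(a, b, q, r):
    """Positive root of q + 2*a*p - (b^2/r)*p^2 = 0 (the scalar HJB solution)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


history = policy_iteration_scalar_lq(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
p_final, k_final = history[-1]
p_star = riccati_solution(1.0, 1.0, 1.0, 1.0)
print(p_final, p_star)
```

Starting from a gain that does not stabilize the closed loop (for instance k0 = 0.5 here) makes the evaluation step meaningless, which is the limitation that motivates relaxing the initial admissible control requirement.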

In practical control systems, the saturation nonlinearity of actuators is inevitable [21–23]. Hence, it is necessary to take control constraints into account. The existence of control constraints brings new challenges to stabilizing control. The constrained-input problem for nonlinear fuzzy systems was investigated in [24]. In [25], an adaptive constrained optimal control method was proposed to settle the stabilization problem of discrete-time systems. The references [21–25] mainly focus on general nonlinear systems with input constraints. However, interconnected systems with control constraints are still worthy of further investigation.

Motivated by these investigations, in this paper we develop a neural network-based learning approach to solve the decentralized control problem of interconnected systems with input constraints. The main contributions of this paper include the following aspects. First, the input constraints of interconnected systems are taken into account, which is more practical than the unconstrained case, and the HJB equation is solved in the presence of input constraints. Second, the actor and critic neural networks are simultaneously updated to approximate the optimal control policy and the optimal cost function, respectively. By adding an additional stabilizing term, the initial admissible control policy is no longer necessary. Third, the states of interconnected systems and the weight approximation errors of the two NNs are proven to be uniformly ultimately bounded.

The rest of this paper is organized as follows. In Sect. 2, the problem formulation and decentralized control strategy are stated. In Sect. 3, the neural network-based learning control method is introduced and the stability analysis is also included. The simulation results are presented in Sect. 4. Finally, the conclusion and some future research are given in Sect. 5.

2 Problem formulation and decentralized control strategy

This section consists of two parts. First, the decentralized control of interconnected systems is transformed into the optimal control of each isolated subsystem. Second, the corresponding HJB equation is derived.

2.1 Problem formulation

Consider a class of continuous-time nonlinear interconnected systems composed of $N$ subsystems with input constraints

Assumption 2 (cf. [6]) For the $i$th subsystem, the matched interconnection term $d_i(x(t))$ is bounded as

2.2 The optimal control for the ith isolated subsystem

A common choice for $\psi_i(\cdot)$ is the hyperbolic tangent function $\tanh(\cdot)$, and then (7) can be further rewritten as
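Equation (7) is not reproduced here, so the following sketch assumes the nonquadratic integrand that is commonly used with a tanh saturation, namely W(u) = 2*lam*s * integral from 0 to u of atanh(v/lam) dv, where lam (the input bound) and s (a positive weight, scalar here) are notations introduced for this example. Under that assumption the integral has the closed form 2*lam*s*u*atanh(u/lam) + lam^2*s*ln(1 - u^2/lam^2), which the code checks against a midpoint-rule quadrature.

```python
import math


def cost_closed_form(u, lam, s):
    """Closed form of 2*lam*s * integral_0^u atanh(v/lam) dv, valid for |u| < lam."""
    return (2.0 * lam * s * u * math.atanh(u / lam)
            + lam * lam * s * math.log(1.0 - (u / lam) ** 2))


def cost_numeric(u, lam, s, n=20000):
    """Midpoint-rule approximation of the same integral."""
    h = u / n
    total = sum(math.atanh((j + 0.5) * h / lam) for j in range(n))
    return 2.0 * lam * s * total * h


lam, s = 1.0, 1.0  # assumed bound and weight, chosen only for illustration
for u in (0.1, 0.5, 0.9):
    print(u, cost_closed_form(u, lam, s), cost_numeric(u, lam, s))
```

The closed form is even in u and grows without bound as |u| approaches lam, which is how this type of cost penalizes control effort near the saturation limit.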

However, the HJB equation is difficult to solve analytically due to the existence of partial differential terms, especially when it is coupled with constrained-input terms. To address this problem, in the next section an algorithm based on NNs is used to obtain an approximate solution of the HJB equation.

3 Adaptive learning algorithm based on neural networks for interconnected systems

In this section, we investigate an adaptive learning algorithm based on NNs to solve the HJB equation. Then the approximate optimal control policy for the $i$th isolated subsystem can be obtained. The related stability analysis is also included.

3.1 Policy iteration algorithm

3.2 The design of critic NN and its weight tuning law

Suppose that the cost function $V_i(x_i)$ is continuous and differentiable. Then the critic NN is used to approximate the cost function $V_i(x_i)$, which follows

Given that the related cost function is estimated by (17), the Hamilton equation (15) can be rewritten as

where the residual error $e_{Hi}$ is expressed as

This residual error is bounded on the compact set Ω by a positive constant. Because the ideal weight $\omega_i$ is unknown, we utilize $N$ estimated weight vectors to construct $N$ critic NNs. For the $i$th isolated subsystem, the approximate optimal cost function is
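The critic structure described above is linear in its weights: the estimated cost is the inner product of an estimated weight vector with a vector of activation functions, and its gradient with respect to the state is needed by the control law. A minimal sketch follows, assuming a hypothetical quadratic basis for a two-state subsystem. The basis, the weight values and the function names are illustrative assumptions and are not taken from the paper.

```python
def sigma(x):
    """Hypothetical quadratic activation vector for a two-state subsystem."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x2]


def grad_sigma(x):
    """Jacobian of sigma with respect to x (3 rows, 2 columns)."""
    x1, x2 = x
    return [[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]]


def critic_value(w, x):
    """Estimated cost: inner product of the weights and the activations."""
    return sum(wk * sk for wk, sk in zip(w, sigma(x)))


def critic_gradient(w, x):
    """Gradient of the estimated cost with respect to the state."""
    rows = grad_sigma(x)
    return [sum(w[k] * rows[k][j] for k in range(len(w))) for j in range(2)]


w_hat = [0.5, 0.1, 1.0]  # illustrative weights, not values from the paper
x = [1.0, -1.0]
print(critic_value(w_hat, x), critic_gradient(w_hat, x))
```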

Using the chain rule, the weight updating law of the critic NN can be expressed as

However, the initial control policy must be admissible when implementing the weight tuning rule (26). An initial stabilizing control policy is usually difficult to obtain, which complicates the application of this algorithm.

To address this issue, an improved weight updating law is proposed. Prior to proceeding, the following assumption is introduced.
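The tuning rule (26) is not reproduced here. The sketch below therefore assumes a normalized gradient-descent law on the squared Hamiltonian residual, a common choice in the ADP literature, and applies it to a scalar linear system under a fixed policy u = -k*x. The system, the normalization, the sample points and all names are assumptions made for illustration.

```python
def critic_gradient_descent(a, b, q, r, k, w0=0.0, alpha=1.0, sweeps=300):
    """Tune w in V(x) = w*x^2 for x' = a*x + b*u under the fixed policy u = -k*x.

    Assumed law: w <- w - alpha * phi * e / (1 + phi^2)^2, where e is the
    Hamiltonian residual and phi = d e / d w.
    """
    samples = [0.5, 1.0, 1.5]  # fixed sample states keep the regressor excited
    w = w0
    for _ in range(sweeps):
        for x in samples:
            u = -k * x
            xdot = a * x + b * u
            phi = 2.0 * x * xdot                 # derivative of the residual w.r.t. w
            e = q * x * x + r * u * u + w * phi  # Hamiltonian residual
            w -= alpha * phi * e / (1.0 + phi * phi) ** 2
    return w


# Admissible policy (a - b*k < 0): w approaches the true cost coefficient.
w = critic_gradient_descent(a=1.0, b=1.0, q=1.0, r=1.0, k=2.0)
w_exact = (1.0 + 1.0 * 2.0 ** 2) / (2.0 * (1.0 * 2.0 - 1.0))
print(w, w_exact)

# Non-admissible policy (k = 0): the residual is still driven to zero,
# but the resulting w is negative and is not a valid cost.
w_bad = critic_gradient_descent(a=1.0, b=1.0, q=1.0, r=1.0, k=0.0)
print(w_bad)
```

In this toy setting the second run shows the difficulty discussed above: without an admissible policy the residual minimization alone does not yield a meaningful cost function.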

Assumption 3 (cf. [27,28]) Let $\pi_i(x_i)$ be a continuously differentiable Lyapunov function for the $i$th isolated subsystem (5), and its time derivative satisfies

Remark 2 The first term in (29) shares the same feature with (26), which is intended to minimize the objective function (25). The second term in (29) attempts to guarantee the stability of the closed-loop subsystem during the learning process of the critic network. To be more specific, let the derivative of the Lyapunov function candidate for subsystem (5) be

If the closed-loop subsystem is unstable, then $\Phi_i(x_i)>0$. Next, differentiating $\gamma_{di}\Phi_i(x_i)$ along the negative gradient direction, we have

Under this circumstance, $\Gamma_i(x_i)=1$, and the second term in (29) is active. By contrast, if $\Phi_i(x_i)<0$, we can conclude that the closed-loop subsystem is stable; then $\Gamma_i(x_i)=0$ and the second term in (29) disappears. Therefore, no initial stabilizing control policy is required, owing to the second term in (29).
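The switching logic of Remark 2 can be sketched as follows for a scalar subsystem. The dynamics x' = x + u, the Lyapunov candidate 0.5*x^2 and the function names are assumptions chosen for illustration. The remark does not state the value of the switch when the derivative is exactly zero, so the sketch assumes 0 in that case.

```python
def lyapunov_derivative(x, u, f, g, grad_pi):
    """Derivative of the Lyapunov candidate along x' = f(x) + g(x)*u (scalar case)."""
    return grad_pi(x) * (f(x) + g(x) * u)


def gamma(phi_value):
    """Switch for the stabilizing term: 1 if the candidate is increasing, else 0.

    Returning 0 when phi_value == 0 is an assumption of this sketch.
    """
    return 1 if phi_value > 0.0 else 0


# Hypothetical scalar subsystem: open-loop unstable, x' = x + u.
f = lambda x: x
g = lambda x: 1.0
grad_pi = lambda x: x  # gradient of the candidate 0.5*x^2

x = 1.0
phi_open = lyapunov_derivative(x, 0.0, f, g, grad_pi)         # zero control
phi_closed = lyapunov_derivative(x, -2.0 * x, f, g, grad_pi)  # stabilizing control
print(phi_open, gamma(phi_open))
print(phi_closed, gamma(phi_closed))
```

With zero control the candidate increases and the switch is on, so the stabilizing term contributes to the weight update; once the policy is stabilizing the switch turns off and only the residual-minimizing term remains.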

Defining the weight estimation error of critic NN

Hence, the derivative of the critic NN estimation error is derived as

Remark 3 To guarantee that the estimated weight converges to the ideal weight $\omega_i$ precisely, the persistent excitation (PE) condition should be satisfied.

3.3 The design of actor NN and simultaneous policy iteration algorithm

In this section, we present the simultaneous PI algorithm including the actor NN and the critic NN. The weights of the two NNs are updated at the same time, with the aim of obtaining the optimal control policy and the optimal cost function of each subsystem, respectively.

The structure of the actor NN should be developed first. Based on (16), the approximate optimal control policy is given as follows:

Here, we give the simultaneous weight tuning law that guarantees the closed-loop system is stable. Similar to (13), we define
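Equation (35) is not reproduced here. The sketch below assumes the saturated actor form that usually accompanies a tanh-type constraint, u = -lam * tanh(z / (2*lam*s)), where z is the input gain vector multiplied by the state gradient of the actor's cost estimate. The quadratic basis, the input gain g(x), the bound lam, the weight s and the numerical values are hypothetical.

```python
import math


def grad_sigma(x):
    """Jacobian of a hypothetical quadratic basis [x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return [[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]]


def actor_control(w_a, x, g, lam, s):
    """Assumed constrained policy u = -lam * tanh(g(x)^T grad_V(x) / (2*lam*s))."""
    rows = grad_sigma(x)
    # Gradient of the actor's cost estimate with respect to the state.
    grad_v = [sum(w_a[k] * rows[k][j] for k in range(len(w_a))) for j in range(2)]
    z = sum(gj * vj for gj, vj in zip(g(x), grad_v))
    return -lam * math.tanh(z / (2.0 * lam * s))


g = lambda x: [0.0, 1.0]  # hypothetical input gain of a single-input subsystem
w_a = [0.5, 0.1, 1.0]     # illustrative actor weights
lam, s = 1.0, 1.0

for x in ([0.0, 0.0], [1.0, -1.0], [100.0, -100.0]):
    print(x, actor_control(w_a, x, g, lam, s))
```

Whatever the weights and the state, the output of this form never exceeds lam in magnitude, so the input constraint is satisfied by construction rather than by clipping.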

3.4 Stability analysis

Before the stability analysis, the following assumption is introduced.

Assumption 4 (cf. [28])

The convergence of the actor and critic NN weights and the uniformly ultimately bounded stability of the $i$th isolated subsystem are shown in the following theorem. It should be noted that all other subsystems are treated in the same way as the $i$th subsystem.

Theorem 1 Let Assumptions 1–4 hold. For the $i$th isolated subsystem described by (5), let the optimal cost function be approximated via (21) and the optimal control law be obtained via (35). Let the critic NN weights be tuned using (38) and the actor NN weights be tuned using (39). Then the states $x_i(t)$ of the $i$th isolated subsystem, the critic NN weight estimation error and the actor NN weight estimation error are uniformly ultimately bounded.

Proof The Lyapunov function for the $i$th subsystem is defined as

Combining the derivations of the two cases, if the conditions obtained in them hold, then the time derivative of the Lyapunov function is always negative. Therefore, the states and the weight estimation errors are guaranteed to be uniformly ultimately bounded. This completes the proof.

4 Simulation results

In this section, two simulation examples are given to verify the effectiveness of the proposed algorithm.

4.1 An interconnected nonlinear system composed of two subsystems

We first study the nonlinear interconnected system described by the following equations:

To satisfy Assumption 2, we set $\alpha_1(x_1)=\|x_1\|$ and $\alpha_2(x_2)=\|x_2\|$ and choose $\rho_{11}=\rho_{12}=0.1$ and $\rho_{21}=\rho_{22}=0.1$, so that the interconnection terms $d_1$ and $d_2$ are upper bounded. We set the initial state of (58) as $x_0=[1,-1,1,-1]^{\mathsf T}$. Let $Q_1=Q_2=I_2$, where $I_2$ is the $2\times 2$ identity matrix, and $S_1=S_2=1$. The state trajectories of subsystem 2 under $u_2(x_2)=0$ are presented in Fig. 1, which demonstrates that the initial control policy cannot stabilize this system.

Fig. 1 The state trajectories of subsystem 2 under $u_2(x_2)=0$

Fig.2 The critic weight convergence for subsystem 1

Fig.3 The actor weight convergence for subsystem 1

Fig.4 The critic weight convergence for subsystem 2

The PE condition can be satisfied by adding probing noise to the control inputs during the training process. The probing noise $\nu(t)$ is built from the sinusoidal terms $\cos(0.1t)+\sin^2(-1.2t)\cos(0.5t)+\sin^2(1.2t)+\sin^3(2.4t)\cos(2.4t)$. The weight convergence for subsystem 1 is presented in Figs. 2 and 3, which show that the weight vectors of both NNs converge to $[0.128,-0.449,0.268,-0.744,-0.282]^{\mathsf T}$. The weight convergence for subsystem 2 is presented in Figs. 4 and 5. It is clear that both the critic and actor NN weights converge to $[1.787,1.642,0.800,2.246,0.395]^{\mathsf T}$.

Fig.5 The actor weight convergence for subsystem 2

Next, based on (35), the approximate optimal control policies for the two subsystems can be obtained, as shown in Fig. 6. The approximate optimal control pair is used to stabilize (58). Figure 7 shows the overall system state trajectories under these decentralized control policies. All states are regulated to zero, which demonstrates the validity of this control method.

Fig.6 The control policies for two subsystems

Fig. 7 The overall system state trajectories under the approximate optimal control pair

4.2 The parallel inverted pendulum system

To further verify the applicability of the proposed algorithm to a practical system, we consider two inverted pendulums connected by a spring, which were investigated in [4]. The model of the inverted pendulums is shown in Fig. 8, which can be described as

Fig.8 Schematic diagram of inverted pendulum system

The parameters used in this example are set as follows: $\zeta_1=\zeta_2=1$, $m_1=1$ kg, $m_2=1.5$ kg, $\ell_1=\ell_2=0.5$ m, $\ell_0=1$ m, $g=9.8$ m/s$^2$, $\kappa_1=\kappa_2=0.009$, $k=30$, $A=0.1$, and the spring positions $a_1=a_2=0.1$.

The simulation results are presented in Figs. 9–14. From Figs. 9 and 10, it can be seen that the states of the two subsystems are regulated to 0. For pendulum 1, the weights of the critic and actor NNs converge to $[0.0316,0.184,0.124]^{\mathsf T}$ according to Figs. 11 and 12. Figures 13 and 14 show the weight convergence for pendulum 2, where the weights converge to $[0.0694,0.394,0.231]^{\mathsf T}$. All the simulation results show that the proposed algorithm can be applied effectively to the inverted pendulum system.

Fig.9 The trained state trajectories of inverted pendulum 1

Fig.10 The trained state trajectories of inverted pendulum 2

Fig.11 The critic weight convergence for inverted pendulum 1

Fig.12 The actor weight convergence for inverted pendulum 1

Fig.13 The critic weight convergence for inverted pendulum 2

Fig.14 The actor weight convergence for inverted pendulum 2

5 Conclusions

In this paper, we have investigated decentralized learning control for a class of constrained-input interconnected systems based on neural networks. The decentralized control law can be represented by a group of optimal control policies of isolated subsystems. The HJB equation with control constraints is approximately solved and the optimal control policy can be obtained. The proposed algorithm relaxes the requirement of an initial admissible control policy. Finally, two representative simulations are presented to demonstrate the effectiveness of the proposed learning control strategy.

It is noted that a time-triggered control strategy is used, so the proposed learning control is a periodic control. In future work, an event-triggered scheme can be implemented to reduce the communication and computation burden. Besides, the experience replay technique could be applied to improve the iteration efficiency.

Acknowledgements This work was supported by the National Key R&D Program of China (No. 2018AAA0101400) and the National Natural Science Foundation of China (Nos. 62022061, 61921004).
