System identification and real-time pattern recognition by neural networks for an activated sludge...


Pergamon Environment International, Vol. 21, No. 1, pp. 57-69, 1995

Copyright ©1995 Elsevier Science Ltd. Printed in the USA. All rights reserved

0160-4120/95 $9.50+.00

0160-4120(94)00024-7

SYSTEM IDENTIFICATION AND REAL-TIME PATTERN RECOGNITION BY NEURAL NETWORKS FOR AN ACTIVATED SLUDGE PROCESS

Chunsheng Fu Department of Chemical Engineering, University of Cincinnati, OH 45221-0171, USA

Manel Poch Unitat d’Enginyeria Química, Universitat Autònoma de Barcelona 08193, Bellaterra, Barcelona, Spain

EI 9403-135 M (Received 15 March 1994; accepted 11 August 1994)

This study introduces the application of a neural network method to estimate chemical oxygen demand (COD) in a wastewater treatment process. It presents the back propagation algorithm with a generalized sigmoid function in detail. Several simulations were investigated in order to select suitable learning rates and relative coefficients in the network for accelerating the speed of convergence in learning. The results of simulation to estimate real sampling data of COD for a specific real world wastewater treatment process are explained.

INTRODUCTION

Effective control of the dynamic behavior of a unit process, or of the entire treatment plant, depends on three factors (Lessard and Beck 1991): 1) the ability to observe the state of the process and its response to various perturbations (i.e. monitoring); 2) the ability to relate causes (inputs and controls) to effects (outputs, responses); and 3) the capacity to act by manipulating the causes (control inputs) to correct undesirable effects or bring about more desirable effects.

Like other biotechnological processes, real-time control for an activated sludge process is a difficult problem due to a lack of reliable on-line measurement instruments, a lack of a truly compelling output performance specification for the exercise of operational control, and a limited technological capacity to implement control actions. In addition, there are some specific problems such as the great variability of the input (both in quantity and quality) and complex interactions between the different microorganism populations present in such a process (Serra and Manel 1993).

Pattern recognition is among several reasonable methodologies for evaluating a process’ (physiological) state and provides a sound basis for decision making. Patterns are derived, i.e. related, from a series of several concomitant direct measures or estimates of state variables. The link between the time courses of several variables provides more information than the sum of the interrelations of the individual state variables exploited. Patterns do not need to describe the entire process duration (full-length trajectories); they can code individual fractions, such as typical phases of a process (Bernhard and Georg 1993). There are many different pattern recognition methods, such as statistical pattern recognition (Fukunaga 1972), subspace pattern recognition (Schurmann 1978), and fuzzy pattern recognition (Chen and We 1986).

NOTATIONS

a  Proportional coefficient
CODi  Input COD
DO  Dissolved oxygen
E  Mean square error
f(·)  Nonlinear function vector
Fin  Input flow
Fr  Recirculation flow
Hj  Number of neurons in the j-th layer
m  Number of output neurons
O  Output
OUR  Oxygen uptake rate in the reactors
tji  Bias providing a threshold for the activation of the i-th neuron in the j-th layer
V  Total volume of the bioreactors
x1  On-line or off-line measurable state and output vector
x2  On-line unmeasurable or off-line measurable sampling data
Xin  Concentration of biomass in the reactors
W  Weighing vector
yi  Desired value of the i-th output unmeasurable or off-line measurable element

GREEK LETTERS

θ  Saturation value for the neuron’s output
ρ  Modification factor
φi  Learning rates (i = 1, ..., 4)
φ5  Coefficient for the momentum terms
τ  Time delay

With the recent progress in artificial intelligence, neural computation, involving computation with neural networks, has been attracting much attention in the field of pattern recognition (Kaoru 1992). The idea of using artificial neural networks (ANNs) as a potential solution strategy for problems which require complex data analysis is not new (Massimo 1991). Over the last 40 to 50 years, scientists have been attempting to emulate the real neural structure of the brain and to develop an algorithmic equivalent of the learning process. The principal motivation behind this research is the desire to achieve the sophisticated level of information processing that the brain is capable of. The structure of the human brain, however, is extremely complex. Indeed, whilst the function of a single neuron is relatively well understood, the collective role of neurons within the conglomeration of cerebrum elements is less clear and a subject of avid postulations. Consequently, the architecture of an ANN is based upon a primitive understanding of the functions of the biological neural system. Even if neural physiology could untangle the complexities of the brain, the limitations of current hardware technology would make it extremely difficult, if not impossible, to emulate exactly its immensely distributed structure. Thus, rather than attempting to model accurately the intricacies of the human cerebral functions, ANNs attempt to capture and utilize the connectionism philosophy on a more modest and manageable scale. Indeed, applications of ANNs to solve process engineering problems have already been reported (Massimo 1991; Hoskins and Himmelblau 1988; Ungar 1990; Bhat and McAvoy 1990).

The aim of this article is to introduce an application of artificial neural networks in system identification and pattern recognition for an activated sludge process.

PROPOSED ANN METHOD FOR SYSTEM IDENTIFICATION

The ANN system can be considered as a large-scale parallel network system consisting of a great number of simple processing elements. It works in a complex environment and continuously carries out learning and relearning from a new environment or from new characteristics of an existing environment. A state transformation takes place when an ANN system obtains strong enough input information. Competition and coordination are carried out in the course of the transformation. As soon as a state overwhelms its opposite state and becomes the winner, it becomes a steady state during the process of learning. Based upon different judgement rules, ANN systems have different steady states, which demonstrates a basic feature of a biological neural network system (Guangcheng 1991).

Two classes of ANNs have received considerable attention in recent years: 1) multilayered neural networks and 2) recurrent networks. Multilayered networks have proved extremely successful in pattern recognition problems (Burr 1988; Gorman and Sejnowski 1988; Widrow 1988), while recurrent networks have been used in associative memories as well as for the solution of optimization problems (Hopfield 1982; Hopfield and Tank 1985; Tank and Hopfield 1986; Rauch and Winarske 1988). However, several ANN topologies have been proposed. They differ in the number and character of the processing neurons, the connections, the training procedure, and whether the input-output values are continuous or discrete. Back propagation (BP) neural networks are the most prevalent ANN architectures for practical problems and have proven successful in their ability to model nonlinear relationships. The network topology chosen for this work is a multilayered BP neural network with generalized sigmoid functions for all neurons, described as follows:

ANNs include a broad class of computing architectures and algorithms. They are all inspired by biological networks in which a large number of relatively simple independent processors (neurons) are connected by links (dendrites and synapses). From an engineering standpoint, they can be most elegantly described by a simple function providing a mapping from input space to output space.

Consider the following general form of a discrete and nonlinear dynamic system:

$$x_2(k+1) = f\big(x_1(k),\, x_2(k)\big) \tag{1}$$

where $x_1(k)$ ($\in R^n$) is an on-line or off-line measurable state and output vector, and $x_2(k)$ ($\in R^m$) is a state vector that is unmeasurable at time k, or difficult to measure (because of the high price of its measurement instrument or a considerable measurement delay), although off-line sampling data may be obtainable at some moments. $f(\cdot)$ ($\in R^m$) is a nonlinear function vector which maps the input states at time k into the states at time k+1.

Assuming that the internal structure of Eq. 1 is unknown, but a series of measurement values of $x_1(k)$, $x_2(k)$, $x_1(k+1)$, $x_2(k+1)$ can be obtained, the dynamic characteristics of the nonlinear system represented by Eq. 1 can be reflected by the nonlinear mapping of a neural network from input samples to output samples, in the sense of system identification. At the same time, the network serves as a state estimator, carrying out the mapping from the current states to the next states of the investigated object.
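As a minimal sketch of how measured series could be arranged into training samples for this mapping, the following Python function pairs each input [x1(k), x2(k)] with the target x2(k+1). The function name `make_identification_pairs` and the array layout (one row per sampling time) are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def make_identification_pairs(x1, x2):
    """Arrange samples for learning x2(k+1) = f(x1(k), x2(k)).

    x1: array of shape (K, n), measurable states, one row per time k.
    x2: array of shape (K, m), states to be estimated, one row per time k.
    Returns inputs of shape (K-1, n+m) and targets of shape (K-1, m).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.ndim != 2 or x2.ndim != 2 or x1.shape[0] != x2.shape[0]:
        raise ValueError("x1 and x2 must be 2-D with the same number of rows")
    inputs = np.hstack([x1[:-1], x2[:-1]])  # [x1(k), x2(k)]
    targets = x2[1:]                        # x2(k+1)
    return inputs, targets
```

The last sample has no successor, so K measurements yield K-1 training pairs.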

Consider a general multilayered neural network architecture (as shown in Fig. 1), in which the circles represent neurons arranged in one input layer with M input variables, L hidden layers, and an output layer with m output variables. Input variables consist of n measurable variables and m unmeasurable variables, which may be off-line measurable at some time or estimable variables (M=m+n). Weighing vectors are expressed by:

Furthermore, $w_{iq}(j)$ ($j = 1, \ldots, L$; $q = 1, \ldots, H_j$; $i = 1, \ldots, H_{j+1}$) is the weight on the input from the q-th neuron in the j-th hidden layer to the i-th neuron in the (j+1)-th hidden layer, or in the output layer when j=L. $H_j$ is the number of neurons in the j-th layer.

Fig. 1. Multilayered artificial neural network.

Each neuron must have a rule for combining the input from other neurons to provide a new state of activation. This activity is a reflection of the intensity with which input impinges on the neuron. For this study, the activity of the i-th neuron in the j-th hidden layer, as shown in Fig. 2, is given by:

$$z_{ji} = \sum_{q=1}^{H_{j-1}} \left[ w_{jiq}\, O_{j-1,q} \right] + t_{ji} \tag{2}$$

where $O_{j-1,q}$ is the output from the q-th neuron in the (j-1)-th layer, $w_{jiq}$ ($q = 1, \ldots, H_{j-1}$) are the elements of the weight gain vector, and $t_{ji}$ is a bias providing a threshold for the activation of the i-th neuron in the j-th layer. The bias is essential in order to classify network input patterns into various subspaces. The hidden neurons in the network are functions of the observed variables, but typically do not have clear physical interpretations. The states $z_{ji}(t)$ ($j = 0, \ldots, L+1$) over the sets of neurons capture what the system is representing at any time t. Thus, processing in a neural network can be viewed as the evolution of a pattern of activity through time.

The neuron must have an output function that maps the current state of activation to an output signal.

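Fig. 2. Features of an artificial neuron.

Fig. 3. Generalized sigmoid function $O_{ji}(z_{ji}) = \theta_{ji} / (1 + \exp(-\rho_{ji} z_{ji}))$, plotted as $O_{ji}$ against $z_{ji}$ (curve 1: $\theta_{ji}=1$, $\rho_{ji}=1$; curve 2: $\theta_{ji}=1$, $\rho_{ji}=0.5$; curve 3: $\theta_{ji}=1.5$, $\rho_{ji}=1$; curve 4: $\theta_{ji}=1.5$, $\rho_{ji}=0.5$).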

Typically, the output function used is a threshold function for which no output is produced unless the neuron’s activation exceeds a particular threshold value. This threshold quality forms the basis for discriminant functions, which result in cutting the feature space $R^m$ into regions corresponding to different classifications. The output function for the i-th neuron in the j-th layer in this work is a generalized sigmoid function given by:

$$O_{ji}(z_{ji}) = \frac{\theta_{ji}}{1 + \exp(-\rho_{ji} z_{ji})} \tag{4}$$

where $\theta_{ji}$ is the saturation value for the neuron’s output and $\rho_{ji}$ modifies the slope of the S-shaped function, as shown in Fig. 2 and Fig. 3.
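The generalized sigmoid and its partial derivatives can be written compactly in NumPy. This is a sketch under the assumption that the function has the form O = θ / (1 + exp(−ρz)) given in the Fig. 3 caption; the names `generalized_sigmoid` and `sigmoid_partials` are hypothetical. The derivatives are obtained by differentiating that expression directly.

```python
import numpy as np

def generalized_sigmoid(z, theta=1.0, rho=1.0):
    """O(z) = theta / (1 + exp(-rho * z)); theta is the saturation value."""
    return theta / (1.0 + np.exp(-rho * z))

def sigmoid_partials(z, theta=1.0, rho=1.0):
    """Partial derivatives of O with respect to z, rho and theta.

    Assumes theta != 0, because the slope term is written as O * (1 - O/theta).
    """
    o = generalized_sigmoid(z, theta, rho)
    slope = o * (1.0 - o / theta)
    return {
        "dz": slope * rho,     # dO/dz
        "drho": slope * z,     # dO/drho
        "dtheta": o / theta,   # dO/dtheta
    }
```

For example, with θ = 1.5 and ρ = 0.5 (curve 4 in Fig. 3) the output at z = 0 is half the saturation value, 0.75, and the curve approaches 1.5 for large z.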

Equation 4, approximating a threshold logic unit, embodies a number of concepts from the pattern recognition literature, but the combination of these concepts is novel. The output from any neuron may be input to other neurons. Since the output of each neuron depends only on its input and its threshold, each neuron can be considered a separate processor operating in parallel with the others. Because the function given by Eq. 4 is differentiable and nondecreasing, it can be used in conjunction with the back propagation learning procedure to train multilayered nets. The network of Fig. 1 learns by making changes in its weights and in the other parameters which appear in Eq. 4. Indeed, learning can be used to determine the proper values of the connection strengths that allow all neurons to achieve the correct state of activation for a given pattern of inputs. Once the pattern of activation is established, the resulting outputs let the network classify an input pattern. The adaptive nature of the ANN allows the weights to be learned by experience, thus producing a self-organizing system.

The most investigated supervised learning algorithm, the so-called Generalized Delta Rule (GDR), is used for this work. GDR follows an iterative gradient descent algorithm designed to minimize the mean square error E between the actual outputs from the output layer and the desired or target outputs for all the input patterns, i.e.:

$$E_p = \frac{1}{2} \sum_{i=1}^{m} \left( y_i - O_{L+1,i} \right)^2 \tag{5}$$

$$E = \sum_{p=1}^{P} E_p \tag{6}$$

where P and m denote the number of training patterns presented to the input layer and the number of units in the output layer, respectively, and $y_i$ (i.e., x2i(k) presented in Fig. 1 for this work) represents the desired value of the i-th output unmeasurable or off-line measurable element, while $O_{L+1,i}$, i.e., x2i(k) for this work, is the actual output of the same element from the network.

Learning for this work begins by initializing all the parameters w, t, θ and ρ presented in Eq. 2 and Eq. 4 as small real values randomly selected between -0.5 and 0.5. If the error obtained by Eq. 5 for all the input-output pairs (training patterns) is below a specified learning tolerance (0.015 in this work), none of the parameters is changed. Otherwise, they are adjusted according to:

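$$w_{jiq}(k+1) = w_{jiq}(k) - \phi_1 \frac{\partial E_p}{\partial w_{jiq}} + \phi_5 \left[ w_{jiq}(k) - w_{jiq}(k-1) \right] \tag{7}$$

$$t_{ji}(k+1) = t_{ji}(k) - \phi_2 \frac{\partial E_p}{\partial t_{ji}} + \phi_5 \left[ t_{ji}(k) - t_{ji}(k-1) \right] \tag{8}$$

$$\rho_{ji}(k+1) = \rho_{ji}(k) - \phi_3 \frac{\partial E_p}{\partial \rho_{ji}} + \phi_5 \left[ \rho_{ji}(k) - \rho_{ji}(k-1) \right] \tag{9}$$

$$\theta_{ji}(k+1) = \theta_{ji}(k) - \phi_4 \frac{\partial E_p}{\partial \theta_{ji}} + \phi_5 \left[ \theta_{ji}(k) - \theta_{ji}(k-1) \right] \tag{10}$$

where k corresponds to the number of the iteration, $\phi_1$, $\phi_2$, $\phi_3$ and $\phi_4$ are usually called the learning rates, and $\phi_5$ denotes the coefficient of the momentum terms. According to Eq. 2 and Eq. 4, the following formulas can be obtained: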

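$$\frac{\partial z_{ji}}{\partial w_{jiq}} = O_{j-1,q} \tag{11}$$

$$\frac{\partial z_{ji}}{\partial t_{ji}} = 1 \tag{12}$$

$$\frac{\partial O_{ji}}{\partial \theta_{ji}} = \frac{O_{ji}}{\theta_{ji}} \tag{13}$$

$$\frac{\partial O_{ji}}{\partial \rho_{ji}} = O_{ji} \left( 1 - \frac{O_{ji}}{\theta_{ji}} \right) z_{ji} \tag{14}$$

$$\frac{\partial O_{ji}}{\partial z_{ji}} = O_{ji} \left( 1 - \frac{O_{ji}}{\theta_{ji}} \right) \rho_{ji} \tag{15}$$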

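Assume:

$$\frac{\partial E_p}{\partial z_{ji}} = -\delta_{ji} \tag{16}$$

$$\frac{\partial E_p}{\partial O_{ji}} = -\sigma_{ji} \tag{17}$$

Then:

$$\frac{\partial E_p}{\partial w_{jiq}} = \frac{\partial E_p}{\partial z_{ji}} \frac{\partial z_{ji}}{\partial w_{jiq}} = -\delta_{ji}\, O_{j-1,q} \tag{18}$$

$$\frac{\partial E_p}{\partial t_{ji}} = \frac{\partial E_p}{\partial z_{ji}} \frac{\partial z_{ji}}{\partial t_{ji}} = -\delta_{ji} \tag{19}$$

$$\frac{\partial E_p}{\partial \rho_{ji}} = \frac{\partial E_p}{\partial O_{ji}} \frac{\partial O_{ji}}{\partial \rho_{ji}} = -\sigma_{ji}\, O_{ji} \left( 1 - \frac{O_{ji}}{\theta_{ji}} \right) z_{ji} \tag{20}$$

$$\frac{\partial E_p}{\partial \theta_{ji}} = \frac{\partial E_p}{\partial O_{ji}} \frac{\partial O_{ji}}{\partial \theta_{ji}} = -\sigma_{ji}\, \frac{O_{ji}}{\theta_{ji}} \tag{21}$$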

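All the variables except $\delta_{ji}$ and $\sigma_{ji}$ in Equations (18) to (21) can be calculated in the feedforward way. Both $\delta_{ji}$ and $\sigma_{ji}$ are proportional to the difference between the actual output $O_{L+1,i}$ and the target output $y_i$, given as follows:

For the output layer:

$$\delta_{L+1,i} = -\frac{\partial E_p}{\partial z_{L+1,i}} = -\frac{\partial E_p}{\partial O_{L+1,i}} \frac{\partial O_{L+1,i}}{\partial z_{L+1,i}} \tag{22}$$

according to Eq. (5) and Eq. (15), then:

$$\delta_{L+1,i} = \left( y_i - O_{L+1,i} \right) O_{L+1,i} \left( 1 - \frac{O_{L+1,i}}{\theta_{L+1,i}} \right) \rho_{L+1,i} \tag{23}$$

$$\sigma_{L+1,i} = y_i - O_{L+1,i} \tag{24}$$

For the hidden layers:

$$\frac{\partial E_p}{\partial O_{ji}} = \sum_{q=1}^{H_{j+1}} \frac{\partial E_p}{\partial z_{j+1,q}} \frac{\partial z_{j+1,q}}{\partial O_{ji}} = -\sigma_{ji} \tag{25}$$

according to Eq. (2) and Eq. (16), then:

$$\sigma_{ji} = \sum_{q=1}^{H_{j+1}} \delta_{j+1,q}\, w_{j+1,qi} \tag{26}$$

$\delta_{ji}$ is determined by:

$$\delta_{ji} = \sigma_{ji}\, O_{ji} \left( 1 - \frac{O_{ji}}{\theta_{ji}} \right) \rho_{ji} \tag{27}$$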

$\sigma_{ji}$ for a hidden layer depends only on $\delta$ for the (j+1)-th layer in the network. Therefore, the $\sigma_{ji}$ for the same j-th layer can be calculated in parallel. The learning process involves two phases. First, the inputs are propagated in a feedforward fashion through the network to produce output values. Those output values are then compared with the desired or target outputs, resulting in an error signal for each of the output neurons. Second, the error is propagated backward through the network using the following procedure. $\delta_{L+1,i}$ and $\sigma_{L+1,i}$ for the output layer are calculated first, according to Eq. 23 and Eq. 24. These $\delta_{L+1,i}$ and $\sigma_{L+1,i}$ are used to calculate $\sigma_{Li}$ and $\delta_{Li}$ for the L-th hidden layer using Eq. 26 and Eq. 27, respectively. Then, $\delta_{ji}$ and $\sigma_{ji}$ are obtained recursively from $\delta_{j+1,i}$ and $\sigma_{j+1,i}$, layer by layer, down to $\delta_{1i}$ and $\sigma_{1i}$. These $\delta_{ji}$ and $\sigma_{ji}$ can then be used to update all the parameters $w_{jiq}$, $t_{ji}$, $\theta_{ji}$, and $\rho_{ji}$ for all the layers according to Eqs. 7-10 and 18-21. After presentation of the first input-output pair, one proceeds with the second pair, and so on.
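The two-phase procedure can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' program: the function names (`init_layer`, `forward`, `backward`, `gdr_step`) are hypothetical, the input layer is assumed to pass its values through unchanged, the per-pattern error is taken as E_p = ½ Σ (y_i − O_i)², and the learning rates are assigned as φ1 to w, φ2 to t, φ3 to ρ and φ4 to θ with the values selected later in the paper (0.6, 0.2, 0.5, 0.5). The sigmoid is computed as O = θ·s with s = 1/(1 + exp(−ρz)), so the term O(1 − O/θ) becomes θ·s·(1 − s) and no division by θ is needed.

```python
import numpy as np

RATE = {"w": 0.6, "t": 0.2, "rho": 0.5, "theta": 0.5}  # phi1..phi4 (assumed mapping)

def init_layer(n_in, n_out, rng):
    """All parameters start as random values between -0.5 and 0.5."""
    u = lambda *shape: rng.uniform(-0.5, 0.5, size=shape)
    return {"w": u(n_out, n_in), "t": u(n_out), "rho": u(n_out), "theta": u(n_out)}

def forward(layers, x):
    """Feedforward phase; returns the network output and a per-layer cache."""
    cache, o = [], np.asarray(x, dtype=float)
    for p in layers:
        z = p["w"] @ o + p["t"]                   # activation, Eq. 2
        s = 1.0 / (1.0 + np.exp(-p["rho"] * z))   # unit-height sigmoid
        cache.append((o, z, s))
        o = p["theta"] * s                        # generalized sigmoid, Eq. 4
    return o, cache

def backward(layers, cache, output, y):
    """Backward phase; returns dE_p/d(parameter) for every layer."""
    grads = [None] * len(layers)
    sigma = y - output                            # output layer, Eq. 24
    for j in reversed(range(len(layers))):
        p = layers[j]
        o_prev, z, s = cache[j]
        slope = p["theta"] * s * (1.0 - s)        # equals O * (1 - O/theta)
        delta = sigma * slope * p["rho"]          # Eq. 23 / Eq. 27
        grads[j] = {
            "w": -np.outer(delta, o_prev),        # Eq. 18
            "t": -delta,                          # Eq. 19
            "rho": -sigma * slope * z,            # Eq. 20
            "theta": -sigma * s,                  # Eq. 21
        }
        sigma = p["w"].T @ delta                  # previous layer, Eq. 26
    return grads

def gdr_step(layers, prev_steps, x, y, phi5=0.0):
    """One update per Eqs. 7-10; returns E_p measured before the update."""
    out, cache = forward(layers, x)
    grads = backward(layers, cache, out, y)
    for p, g, prev in zip(layers, grads, prev_steps):
        for name, rate in RATE.items():
            step = -rate * g[name] + phi5 * prev[name]  # gradient + momentum
            p[name] += step
            prev[name] = step
    return 0.5 * float(np.sum((y - out) ** 2))
```

Here `prev_steps` holds the previous parameter change for each layer (zeros at the start), which is what the momentum term φ5[w(k) − w(k−1)] multiplies. A 5-15-8-1 network would be built as `[init_layer(5, 15, rng), init_layer(15, 8, rng), init_layer(8, 1, rng)]` with `rng = np.random.default_rng()`.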

PROCESS AND PROBLEMS

The activated sludge process is the most commonly used biological system for wastewater treatment. It mainly consists of several biological reactors and solid-liquid separators. The reactors’ function is to bring the biomass into as intimate a contact as possible with the substrate (sewage) and to transform the biodegradable constituents (substrate) into new biomass, carbon dioxide, water, and residual organic matter, with dissolved oxygen supplied by aerators (Serra and Manuel 1993). The clarifier’s functions are to separate the sludge from the treated sewage and to thicken the sludge before it is recycled to the reactor. The presence of the recycle loop requires a good understanding of the interaction between the bioreactor and the clarifier if process control is to be successful, especially for the control of COD in the output.

Figure 4 presents the scheme of the investigated plant. There are two primary settling tanks, two bioreactors with mechanical aerators, and two secondary clarifiers as the last units of the plant. Each bioreactor consists of three cells. The volume of each cell is 1800 m³. The process can be briefly described as follows:

After pretreatment in the primary settling tanks, the input water is treated in the bioreactors, where the action of the microorganisms reduces the concentration of substrate. Then, the water flows to the secondary clarifiers. The clarified water is collected at the top of the clarifiers and leaves the plant. A fraction of the sludge is returned to the input of the bioreactor in order to maintain an appropriate concentration of biomass, allowing the oxidation of the organic matter. The rest of the sludge is wasted.

There are no automatic controls in the plant. The two main control loops, recirculation flow to control COD in the output, and the working time and state of the aerators to control dissolved oxygen (DO) in the reactors, are presently based entirely on manual operation. However, automatic computer control for the two loops is being developed and will be realized in the near future. Some key parameters, such as COD in both the input and the output and the concentration of biomass in the bioreactors, have to be measured by off-line analysis, so the time delay in obtaining the measurement results is considerable. However, substantial historical data (several years) are available in the plant.

IDENTIFICATION AND PATTERN RECOGNITION

Over the last decade, biosensor technology has been evolving rapidly; however, the benefits of its application are still to be realized on an industrial scale. Lack of biosensor reliability and, more importantly, the financial consequences of sensor failure in its widest sense have served to maintain the prevalence of off-line sample analysis for bioprocess monitoring and supervision. However, the frequency of off-line analysis of the controlled process output is often insufficient to maintain tight process regulation. A potential solution to this problem is to use an on-line estimator or recognizer to provide fast inferences of variables during the off-line analysis intervals.

Analysis of the data from the investigated real world process showed that the process exhibits quite different characteristics for the same group of fixed values of some obtainable input variables. An important reason could be the difference in the activity of the biomass at different times. In order to detect changed characteristics of such a running process as soon as possible, pattern recognition software was developed for the investigated process. For simplicity, two different kinds of process patterns are considered. One, defined as the “strong active” process pattern, assumes that the value of COD in the output is comparatively higher under a group of fixed control actions (a fixed amount of recirculation flow and a group of fixed working states of the stirring motors for all the bioreactors). The other is a “weak active” process pattern, with a comparatively lower COD value in the output under a group of fixed control actions.

The work for identification and pattern recognition was divided into two steps. First, benefitting from the data of the process, system identification was carried out by using the ANN method in order to obtain two primary recognition models for the two patterns, respectively. Secondly, pattern recognition software with a learning function was developed, based on the two primary recognition models.

Fig. 4. Wastewater treatment process.

System identification

To design the structure of the ANN for the process, five input variables were selected as the five neurons of the input layer: input flow (Fin), COD in the input (CODi), the concentration of biomass (Xin) and the oxygen uptake rate (OUR) in the reactors, and recirculation flow (Fr). Among the five variables, Fin and Fr can be measured on line, CODi and Xin can be obtained by off-line analysis, and OUR can be calculated by an estimation model developed by the authors, which is based upon the on-line measurable variable DO (dissolved oxygen). Although COD in the output (CODo) can sometimes be obtained by off-line analysis, the time delay is considerable. The time delay τ for obtaining CODo mainly depends on the corresponding Fin and Fr: the smaller the sum of Fin and Fr, the bigger τ is. τ can be determined by:

$$\tau = \frac{V}{F_{in} + F_r} \tag{28}$$

where V is the total volume of the bioreactors, which is constant. It is obvious that, for a group of input-output data sampled at the same time k, the input [Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)] does not influence the output [CODo(k)]. However, there must be a cause and effect relationship for the data ([Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)], [CODo(k+τ)]). From an engineering standpoint, prediction of CODo(k+τ) based on the data [Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)] is significant. Therefore, CODo is selected as the only neuron in the output layer, in order to obtain the steady relationship between the input variables and CODo by means of identification. Two hidden layers are used, since the characteristics of the object are complex and highly nonlinear. The numbers of neurons in the two hidden layers are variables which can be modified manually in this study. After comparing several configurations, the configuration with 15 neurons in the first hidden layer and 8 neurons in the second hidden layer was chosen, considering the speed of iteration during learning and the accuracy of the model. Four key parameters of the BP algorithm proposed above, φ1, φ3, φ4 and φ5, are studied with different values, while φ2 is kept constant (0.2).

Since the momentum terms in Eqs. 7-10 are intended to speed up convergence and prevent divergent oscillations in learning, the coefficient of the momentum terms is determined heuristically by:

$$\phi_5 = aE \tag{29}$$

where:

$$a = \begin{cases} \cdots, & E > 0.5 \\ \cdots, & 0.5 \ge E > 0.1 \\ \cdots, & E \le 0.1 \end{cases} \tag{30}$$
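The banded rule for the momentum coefficient can be expressed as a small Python function. This is only a sketch: the function name `momentum_coefficient` is hypothetical, and the three values of the proportional coefficient a are not legible in this copy of the paper, so they are left as caller-supplied arguments rather than guessed. Only the error thresholds 0.5 and 0.1 are taken from the text.

```python
def momentum_coefficient(error, a_high, a_mid, a_low):
    """Return phi5 = a * E, with a chosen by the error band.

    a_high applies when E > 0.5, a_mid when 0.5 >= E > 0.1,
    a_low when E <= 0.1. The a values must be supplied by the caller.
    """
    if error > 0.5:
        a = a_high
    elif error > 0.1:
        a = a_mid
    else:
        a = a_low
    return a * error
```

Because φ5 is proportional to E, the momentum contribution shrinks automatically as training converges, whatever values of a are chosen.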

The problem in the data is that historical data of CODo with the exact time delay τ are absent, because τ is variable, depending on Fin and Fr. To solve this problem, it is assumed that a historical data set ([Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)], [CODo(k)]) (k=1,2,...) can be obtained. A new data set ([Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)], [CODo(k')]) (k, k'=1,2,...) for identification is determined by the following data selection procedure.

Assuming that an index is given by:

$$p(k,i) = \left| \left[ k + \tau(k) \right] - (k + i) \right| \tag{31}$$

where τ(k) is calculated by:

$$\tau(k) = \frac{V}{F_{in}(k) + F_r(k)} \tag{32}$$

then:

$$COD_o(k') = \left\{ COD_o(k+i) : p(k,i) = \min_j p(k,j) \right\}, \quad k, k' = 1, 2, \ldots \tag{33}$$
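The data selection above amounts to pairing each input sample with the output sample closest in time to k + τ(k). A minimal NumPy sketch follows, assuming the flows are expressed as volume per sampling interval so that τ(k) is measured in sampling steps. The function name `realign_codo` and the choice to drop samples whose delayed time falls beyond the end of the record are assumptions for illustration.

```python
import numpy as np

def realign_codo(fin, fr, codo, volume):
    """Pair each time k with the CODo sample nearest to k + tau(k).

    fin, fr, codo: 1-D sequences sampled at the same times k = 0, 1, ...
    volume: total bioreactor volume V, in units consistent with the flows.
    Returns (rows, targets): indices k that were kept and the matching CODo.
    """
    fin, fr, codo = (np.asarray(a, dtype=float) for a in (fin, fr, codo))
    tau = volume / (fin + fr)                 # delay in sampling steps
    n = len(codo)
    rows, targets = [], []
    for k in range(n):
        if k + tau[k] > n - 1:                # delayed sample not in the record
            continue
        offsets = np.arange(n - k)            # candidate i = 0, 1, ...
        best = int(np.argmin(np.abs(tau[k] - offsets)))  # minimise p(k, i)
        rows.append(k)
        targets.append(codo[k + best])
    return np.array(rows, dtype=int), np.array(targets)
```

The returned `rows` can be used to select the matching input vectors [Fin(k), Fr(k), CODi(k), Xin(k), OUR(k)] for training.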

The results of training on the rearranged historical data set are shown in Fig. 5a, b, and c. Based on these results, several conclusions can be drawn:

First, the result shown in Fig. 5a reveals that the higher the learning rate φ1 is, the bigger the final total error will be. If too high a learning rate φ1 is selected, divergence and oscillation in training will occur, especially for highly distributed training data such as those of the process mentioned above.

Second, Fig. 5b shows that φ3 and φ4 should be moderate. Values that are too big or too small are not good for convergence in training.

Third, the result shown in Fig. 5c suggests that training with a large φ5 (the momentum coefficient) converges faster in the initial part of training, but finally diverges. Conversely, training with a φ5 that is too small or even zero results in slower convergence. These phenomena can be explained as follows. In the initial period of training, the error between the training data and the ANN output at each iterative step is usually large, so a comparatively large φ5 is good for fast convergence. However, the error becomes very small after a long time of training. The second terms on the right sides of Eqs. 7-10 are then small, while the third terms would be comparatively large and may result in divergence if φ5 is still large. Therefore, the heuristic value of φ5 given by Eqs. 29 and 30 in this work helps to speed up convergence without divergence.

Fig. 5. Learning behaviors for various learning factors and coefficient with different size (horizontal axis: times of iteration). A: φ2=0.2, φ3=0.5, φ4=0.5, φ5=heuristic; B: φ1=0.6, φ2=0.2, φ5=heuristic (curves for φ3=φ4=0.4, 0.5 and 0.6); C: φ1=0.6, φ2=0.2, φ3=0.5, φ4=0.5.

After comparing various sizes of the learning factors and the coefficient φi (i=1,3,4,5), the structure of the ANN was specified as 5 neurons for the input layer, 15 neurons for the first hidden layer, 8 neurons for the second hidden layer, and 1 neuron for the output layer, with φ1=0.6, φ2=0.2, φ3=0.5, φ4=0.5, and φ5 given by the heuristic values. Based on the specified structure of the ANN, two ANN prediction models, for the strong active and the weak active processes, have been established after training on two different historical data bases of the plant.

From an engineering standpoint, it is easy to understand that each CODo value estimated by the model identified on the new rearranged historical data set is, in fact, a predictive value of CODo, which is helpful for understanding and controlling the process.

The quality of prediction of the identified model was characterized using the criterion of the Residual Standard Deviation (RSD), which provides an indication of the accuracy of the prediction:

(34)

where $y_i$ and $O_i$ are the real data (CODo) and the model output, respectively, and m is the number of output neurons (m=1 for this work).

Four different numbers of predictive steps were carried out with one of the two developed models, based on the new rearranged historical data set. The results, shown in Fig. 6 a-d, indicate that the performance of the one-step prediction (2 h) is the best. As the number of predictive steps increases, the performance of the prediction deteriorates. These results indicate that prediction performs better for a steady-state process, since the ANN model developed by the BP method is, in essence, a steady-state model. Therefore, if the model developed on a given training data set is used to predict a process similar to the training data set, its result will generally be good enough. Otherwise, its prediction performance will drop. However, the amount of historical data for the initial model training is not sufficient to describe all possible characteristics of the process. In addition, with the passage of time, changes in the amount of wastewater and in the concentration of COD in the input flow to be treated are inevitable, so that the initially identified input patterns will become out of date. For this reason, a learning function has been developed to modify the parameters of the identified prediction model. A data base for the long-term training of the models’ parameters is going to be established, in order to make the performance of the prediction and recognition increasingly better.

Fig. 6. Model predictive errors of COD in the output for four kinds of prediction steps (horizontal axis: time (h)). A: 1-step; B: 2-step; C: 3-step; D: 4-step.

Fig. 7. One-step prediction by neural network model compared with real data (horizontal axis: points of sample). A: for strong process; B: for weak process.

Pattern recognition

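Pattern recognition for the investigated wastewater treatment process is carried out in two steps. The first step is to predict the output COD with the two identified prediction models, one for the strong and one for the weak process, and so to obtain two predictive values. The second step is to compare the absolute errors of the two predictions with respect to the real data:

$$|COD_s(k) - COD_r(k)| \le |COD_w(k) - COD_r(k)| \qquad (35)$$

where $COD_s(k)$ and $COD_w(k)$ are the values predicted by the strong-process and weak-process models at sampling instant k, and $COD_r(k)$ is the corresponding real value.

If the above relationship holds, the pattern recognition assigns the currently running process to the strong active process. Otherwise, the currently running process belongs to the weak active process.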
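The decision rule can be sketched in a few lines: the process is assigned to whichever model predicts the measured output COD more closely. The function and argument names are hypothetical, and assigning ties to the strong process is an assumption.

```python
def classify_process(cod_strong_pred, cod_weak_pred, cod_measured):
    """Return "strong" or "weak" by comparing absolute prediction errors.

    cod_strong_pred / cod_weak_pred: one-step predictions of the two models.
    cod_measured: real output COD for the same sampling instant.
    Ties go to "strong" (an assumption made for this sketch).
    """
    err_strong = abs(cod_strong_pred - cod_measured)
    err_weak = abs(cod_weak_pred - cod_measured)
    return "strong" if err_strong <= err_weak else "weak"


# Hypothetical values, for illustration only.
label_a = classify_process(50.0, 60.0, 52.0)
label_b = classify_process(50.0, 60.0, 58.0)
```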

Since the one-step prediction performs best compared with the multi-step predictions, a one-step prediction strategy is adopted for real-time prediction of the output COD, provided that sampling and analytical data of COD can be obtained every two hours. As soon as a new group of sampling and analytical data is obtained, a short-term (single) iteration of training on the new group of data is carried out to obtain a new group of parameters for the prediction model, so that the predictive error for the next step is reduced.
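The predict-then-update cycle described above can be sketched as follows. This is only an illustration of the loop structure: a linear model trained by one gradient step per sample stands in for the ANN, and the class name, learning rate, and synthetic data are all assumptions rather than the configuration used in this work.

```python
import numpy as np


class OnlineLinearPredictor:
    """Linear stand-in for the ANN predictor (hypothetical, for illustration)."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.w = np.zeros(n_inputs)
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return float(self.w @ x + self.b)

    def update_once(self, x, y):
        """One gradient step on the squared error of a single new sample."""
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def run_online(model, inputs, targets):
    """Predict each sample first, then train once on it when its value arrives."""
    errors = []
    for x, y in zip(inputs, targets):
        y_hat = model.predict(x)   # one-step prediction before the measurement
        errors.append(y - y_hat)   # error known once the analysis is available
        model.update_once(x, y)    # single training iteration on the new data
    return errors


# Synthetic data, not plant data: 5 hypothetical input variables.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 5))
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.5
errors = run_online(OnlineLinearPredictor(5), X, y)
```

With a single update per new sample, the cost of each cycle stays constant, which is what makes the scheme suitable for a two-hour sampling interval.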

The results of the one-step prediction test for both the strong and weak active processes are shown in Fig. 7 a and b, based on a new set of historical data (54 groups of input and output data, rearranged and coupled as far as possible). These simulation results are satisfactory.

CONCLUSIONS

ANNs provide pattern recognition facilities which can be used and interpreted in several ways. They perform multiple nonlinear regression on input-output pairs. Although adaptive networks have many similarities with well-established statistical techniques for system identification, they still offer the promise of a major benefit, primarily in suggesting new equations, architectures, and algorithms. The ANN formalism proposes several powerful nonlinear functional forms to use. Use of highly interconnected nonlinear systems allows unexpected interactions to be captured.

A special learning algorithm for ANNs with generalized sigmoid functions is presented here for system identification. After several topologies were examined, a multi-layer ANN with 5, 15, 8, and 1 neurons for the input, the first hidden, the second hidden, and the output layers, respectively, was selected for the development of the estimation and prediction models for a real-world activated sludge process. Several possible sizes for some of the learning rates and the coefficient proposed in this algorithm have been studied. In particular, the proposed heuristic values for the coefficients of the momentum terms in Eqs. 7-10 provide a clear advantage in the convergence speed of training. It should be noted, however, that all the results in this work, i.e., the sizes and steps of the various operations during identification and learning, apply specifically to the plant and the reconstituted data set mentioned above. Results from identification by ANNs would be different for other processes and data sets. However, the principle for selecting suitable sizes of the parameters and learning rates for ANNs should not be different.

Real-time parameter identification and pattern recognition play key roles in activated sludge process control. In fact, given the inherent difficulty of obtaining direct process information, computer processing of simple measurements seems the only way to obtain an updated picture of the process development. Hence, of particular importance in this work is the ability of the ANN model to provide fast inference and prediction of process outputs that are important but difficult to measure, or measurable only with considerable time delay (such as the output COD of an activated sludge process), from other measured and/or calculated variables. Several different numbers of prediction steps have been studied with the model. The result of one-step prediction of COD in the output is satisfactory. Therefore, as a concluding remark, applications to data obtained from industrial processes reveal that, given an appropriate topology and parameters, an ANN can be trained to characterize the behavior of the systems considered.

APPENDIX

Recursive estimation of OUR is based on the following differential equation:

$$\frac{dC}{dt} = \frac{F_{tot}}{V}\,(C_i - C) + K_{La}\,A\,(C_{sat} - C) - OUR \qquad (36)$$

where:

$F_{tot}$: total input flow including input flow and recirculation flow (m3/h);
$V$: volume of the reactor (m3);
$OUR$: OUR in the reactor (mg/(L h));
$C_i$: dissolved oxygen (DO) in the bioreactor input (mg/L);
$C$: DO in the bioreactor output (mg/L);
$C_{sat}$: saturated DO (mg/L);
$K_{La}$: global mass transfer coefficient of oxygen;
$A$: aeration factor relating to the number of aerators switched on at each moment.

From Eq. 36, if $K_{La}$ is known, OUR can be determined directly. However, in practice, $K_{La}$ is unknown and varies gradually in time. Thus, it is better determined simultaneously with OUR.
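For the case of a known mass transfer coefficient, the balance can be rearranged for OUR. The sketch below assumes the usual dissolved oxygen balance, OUR = (F_tot/V)(C_i - C) + K_La A (C_sat - C) - dC/dt, with dC/dt approximated by a backward difference. The function name, the units of K_La (1/h), and the numerical values are assumptions for illustration only.

```python
def our_from_do_balance(c_prev, c_now, dt, f_tot, volume, c_in, k_la, aeration, c_sat):
    """Estimate OUR (mg/(L h)) from a DO balance with a known K_La.

    c_prev, c_now: DO in the reactor at two consecutive samples (mg/L).
    dt: sampling interval (h); f_tot: total flow (m3/h); volume: reactor volume (m3).
    c_in: DO in the reactor input (mg/L); c_sat: saturated DO (mg/L).
    k_la: mass transfer coefficient (assumed 1/h); aeration: aeration factor A.
    """
    dc_dt = (c_now - c_prev) / dt                  # backward difference for dC/dt
    dilution = (f_tot / volume) * (c_in - c_now)   # hydraulic term
    transfer = k_la * aeration * (c_sat - c_now)   # oxygen transfer term
    return dilution + transfer - dc_dt


# Hypothetical steady-state example (DO unchanged between samples).
our_value = our_from_do_balance(
    c_prev=2.0, c_now=2.0, dt=0.1, f_tot=100.0, volume=1000.0,
    c_in=0.5, k_la=5.0, aeration=1.0, c_sat=9.0,
)
```

Because the result depends directly on the assumed K_La, any drift in the true coefficient appears as an error in OUR, which motivates estimating both quantities together.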

The following presentation briefly introduces how to obtain estimated values of OUR and $K_{La}$ from measurements of DO, with $A(t)$ and $F_{tot}$ as disturbances:

Considering a stationary working point of DO, $C_e$, the state differences $a(t)$, $ca(t)$, $our$, and $c_i(t)$ for $A$, $CA$, $OUR$, and $C_i$, respectively, together with the coefficients $f_1$ and $f_2$, can be defined by:

$$a(t) = A(t) - A_e \qquad (37)$$

$$ca(t) = C(t)A(t) - (C(t)A(t))_e \qquad (38)$$

$$our = OUR - OUR_e \qquad (39)$$

$$c_i(t) = C_i(t) - C_{i,e} \qquad (40)$$

$$f_1 = e^{-F_{tot} T / V} \qquad (41)$$

$$f_2 = (1 - f_1)/F_{tot} \qquad (42)$$

where T is the sampling interval. Then, from Eq. 36, the discrete model for the identification can be written as:

$$c(t) = f_1\,c(t-1) + f_2\,[K_{La} C_{sat}\,a(t-1) + F_{tot}\,c_i(t-1) - K_{La}\,ca(t-1) - our] \qquad (43)$$

and functions defined as:

$$y(t) = c(t) - f_1\,c(t-1) - f_2 F_{tot}\,c_i(t-1) \qquad (44)$$

$$\phi(t) = [\,f_2\,(C_{sat}\,a(t-1) - ca(t-1)),\; -f_2\,] \qquad (45)$$

and a vector:

$$\theta^T(t) = [\,K_{La},\; our\,] \qquad (46)$$

Thus, the recursive equation can be written as:

$$\theta(t) = \theta(t-1) + K(t)\,[y(t) - \phi(t)\,\theta(t-1)] \qquad (47)$$

where K(t) is a recursive coefficient. Using the method of Recursive Least Squares, both OUR and $K_{La}$ can be estimated.
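A minimal sketch of one recursive least squares step for a two-parameter model y(t) = φ(t)θ is given below. The parameter update has the form of Eq. 47; the gain and covariance recursions are the standard RLS ones, which are not spelled out in the text, so the covariance matrix P, its initial value, and the forgetting factor are assumptions. The data are synthetic.

```python
import numpy as np


def rls_step(theta, P, phi, y, forgetting=1.0):
    """One RLS update for the scalar model y = phi @ theta.

    theta: current estimate, shape (n,); P: covariance matrix, shape (n, n).
    phi: regressor, shape (n,); y: new scalar measurement.
    forgetting: forgetting factor (1.0 means no forgetting; an assumption here).
    """
    P_phi = P @ phi
    gain = P_phi / (forgetting + phi @ P_phi)       # K(t)
    theta = theta + gain * (y - phi @ theta)        # update of the form of Eq. 47
    P = (P - np.outer(gain, P_phi)) / forgetting    # covariance update (P symmetric)
    return theta, P


# Synthetic test: recover hypothetical parameters [K_La, our] from noise-free data.
rng = np.random.default_rng(1)
theta_true = np.array([4.0, 30.0])
theta = np.zeros(2)
P = 1e4 * np.eye(2)                                 # large initial uncertainty
for _ in range(200):
    phi = np.array([rng.uniform(-1.0, 1.0), -0.5])  # second entry plays the role of -f2
    y = phi @ theta_true
    theta, P = rls_step(theta, P, phi, y)
```

A forgetting factor slightly below 1 would let the estimate follow a gradually varying mass transfer coefficient, at the cost of noisier estimates.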

Acknowledgment - This work has been partially supported by the CICYT Spanish project Rob 91/1139. C.S. Fu acknowledges financial support from a Spanish grant (SB-90/437).

REFERENCES

Bernhard, S.; Georg, L. Pattern recognition for bioprocess control. In: Verbruggen, H.B.; Rodd, M.G., eds. Proc. 1992 IFAC/IFIP/IMACS international symposium on Artificial Intelligence in Real-Time Control, IFAC symposia series, 6: 359-361. New York, NY: Pergamon Press; 1993.

Bhat, N.; McAvoy, T.J. Use of neural nets for dynamic modelling and control of chemical process systems. Comput. Chem. Eng. 14: 573-583; 1990.

Burr, D.J. Experiments on neural net recognition of spoken and written text. IEEE Trans. Acoust. Speech Signal Process. 36: 1162-1168; 1988.

Chen, S.Q.; We, H.J. Pattern recognition. Theory and its application. Chengdu, China: Chengdu Telecommunication Engineering Institute Press; 1986: 75-93.

Fukunaga, K. Introduction to statistical pattern recognition. New York, NY: Academic Press; 1972: 84-125.

Gorman, R.P.; Sejnowski, T.J. Learned classification of sonar targets using a massively parallel network. IEEE Trans. Acoust. Speech Signal Process. 36: 1135-1140; 1988.

Guangcheng, Xi. A tentative investigation of the learning process of a neural network system. Acta Autom. Sin. 17: 311-316; 1991.

Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci., Washington, DC; 79: 2554-2558; 1982.

Hopfield, J.J.; Tank, D.W. Neural computation of decisions in optimization problems. Biol. Cybernetics 52: 141-152; 1985.

Hoskins, J.C.; Himmelblau, D.M. Artificial neural network models of knowledge representation in chemical engineering. Comput. Chem. Eng. 12: 881-890; 1988.

Kaoru, O. Analysis of the state characteristics of sake brewing with a neural network. J. Ferment. Bioeng. 73: 153-158; 1992.

Lessard, Paul; Beck, M.B. Dynamic modeling of wastewater treatment processes: its current status. Environ. Sci. Technol. 25: 30-39; 1991.

Massimo, C.D. Bioprocess model building using artificial neural networks. Bioprocess Eng. 7: 77-82; 1991.

Rauch, H.; Winarske, T. Neural networks for routing communication traffic. IEEE Control Syst. Mag. 8: 26-30; 1988.

Schurmann, J. A multifont word recognition system for postal address reading. IEEE Trans. Acoust. Speech Signal Process. 28: 722-732; 1978.

Serra, P.; Manel, P. Development of a real-time expert system for wastewater treatment plants control. In: Verbruggen, H.B.; Rodd, M.G., eds. Proc. 1992 IFAC/IFIP/IMACS international symposium on Artificial Intelligence in Real-Time Control, IFAC symposia series, 6: 347-350. New York, NY: Pergamon Press; 1993.

Tank, D.W.; Hopfield, J.J. Simple ‘neural’ optimization networks: An A/D converter, signal decision circuit, and linear programming circuit. IEEE Trans. Syst. Man. Cybernetics CAS 33: 533-541; 1986.

Ungar, L.H. Adaptive networks for fault diagnosis and process control. Comput. Chem. Eng. 14: 561-572; 1990.

Widrow, B.; Winter, R.G.; Baxter, R.A. Layered neural nets for pattern recognition. IEEE Trans. Acoust. Speech Signal Process. 36: 1109-1118; 1988.