The start and early history of chemometrics: Selected interviews. Part 1

JOURNAL OF CHEMOMETRICS, VOL. 4, 337-354 (1990)

THE START AND EARLY HISTORY OF CHEMOMETRICS: SELECTED INTERVIEWS. PART 1

PAUL GELADI Research Group for Chemometrics, University of Umeå, S-90187 Umeå, Sweden

AND KIM ESBENSEN

Norwegian Computer Center, PB 114 Blindern, N-0314 Oslo 3, Norway

Part 1. Interviews (J. Chemometrics, 4, 337 (1990))
Part 2. Discussion (J. Chemometrics, 4, 389 (1990))

SUMMARY

This is a first foray into the historical start and early years of chemometrics from about 1972 onwards. We have gathered interviews with three originators (Kowalski, Wold and Massart) as well as with a selected group of six other well-known chemometricians who gradually became active in the 1970s (Christie, Clementi, Hopke, Martens, Brown and Deming). The interviews include amongst a host of subjective recollections a succinct record of the key historical literature as highlighted by the interviewees’ own rankings of ‘earliest’ and ‘best’.

A discussion of the most general commonalities in these interviews together with other historical material is presented in the second part of the paper.

KEY WORDS Chemometrics Chemometrics Society History of chemometrics Pioneers of modern chemometrics

INTRODUCTION

Studying the start and early years of a specific branch or discipline of science is almost always very difficult. The older the subject, the more difficult it is to find facts and to check them. Even though chemometrics as a specific part of chemistry is not very old - less than some 20 years - it already deserves a first review into the past activities. This is especially so because today’s chemometrics is based on a large number of application successes and has had a significant impact on academic, industrial and educational environments in chemistry. Indeed, chemometrics is so young that almost everybody involved in its start and early development is still very much active. It was considered natural to start this endeavour by collecting historical documentations from some of the instrumental persons, by interviewing them on their subjective views on, and early years in, chemometrics. All were - needless to say - very willing to contribute.

The first part of the paper contains interviews with three personalities who can be considered as true originators of modern chemometrics: Bruce Kowalski, Luc Massart and Svante Wold.

0886-9383/90/050337-18$09.00 © 1990 by John Wiley & Sons, Ltd.

Received 3 May 1990 Accepted 11 May 1990

338 P. GELADI AND K. ESBENSEN

The questions in the interviews below are centred on three main themes:

(1) the early history and nature of chemometrics
(2) the interviewee’s personal contributions to chemometrics
(3) biography (e.g. non-chemometric activities).

It is of course never possible to ask a completely relevant set of questions giving ultimate and sufficient answers - especially not for amateurs in the journalistic trade. The results will show how close we managed to get. The interview subjects have all had a second chance to edit their own first interview, so they agree on the contents and are responsible for their own views. In this editing round the interviewees were also asked to highlight all specific literature references that were deemed of representative, historical interest. Thus the accompanying reference list contains a subjective selection of literature pertaining to the start and early development history of chemometrics. This literature compilation contains a number of real gems, not just of historical interest, many of which may not have been properly presented for some newcomers to chemometrics.

The three principal interviews are complemented by six ‘second-generation’ interviews which appear in an appendix, though there is nothing in these interviews that is of ‘appendix nature’ whatsoever. These interviews include: Olav Christie, Sergio Clementi, Philip Hopke, Harald Martens, Steven Brown and Stan Deming.

INTERVIEWS

These three interviews were recorded at the EUCHEM Conference on ‘Chemometrics in Organic and Bio-organic Chemistry’ in Trieste, Italy, 19-26 June 1988. They are cited almost 100% verbatim from transcripts of the tape-recorded originals.

Interview with Bruce Kowalski

Q: How did chemometrics start?

BK: Should we start from the first question and go down? Well I think your questions are excellent. I’ve been thinking a long time about this. Obviously the question occurred to me: when did chemometrics start? I think it’s almost impossible to figure that out. It is almost a choice as if a group of people decided when it started. I can honestly say I can’t see any one event at one particular time. A very important thing was the Chemometrics Society, so in a sense I would say that when chemometrics as a field really got started would be 10 June 1974. We decided to form the Chemometrics Society and it was going to be informational. At that time we (Svante and I) had put enough thought into it independently that we knew we needed something and we knew what it should be doing. But before that, from my perspective in analytical chemistry I know my work in using multivariate methods and analytical data went back to the mid 1960s, the early learning machines and things like that.1-5 It was not only us at the University of Washington - Tom Isenhour, Peter Jurs and myself - there was other work in Australia,6 but it wasn’t really a field. Before that you can trace back to people using least squares and MLR. If we picked any of those things as beginnings they would be weak beginnings, they would not be strong beginnings.

What prompted the inception of chemometrics was I think the need to get more information from vast amounts of data generated by computerized instruments. And this business about what I always say: data leads to information leads to knowledge. Only in my case measurements are the data. People like myself, Svante and others were just sort of tired of seeing all those numbers floating around and started scratching our heads and saying: there must be some way to pull information out of all these numbers. When I was at graduate school we were already interfacing PDP8s to instruments and collecting data - thousands and thousands of numbers - and we were all fully realizing that it was impossible to get the information from this data so we said: well, what’s going to give us information and we said mathematics and there is a lot out there that we don’t know about. That really was a driving force.

Q: If anything else was important besides instruments, what would that be?

BK: Well it would probably be a mathematical inclination that some people like Svante Wold and I have. We are both mathematically inclined, he more from statistics and me more from mathematics. I had a B.Sc. in mathematics and therefore I was looking for something in that and I think other people were the same way - a reason or a driving force to use some of that.

My first paper, chemometrics paper, again I can trace back. I said the learning machine papers were there but I think it was a paper on archaeological artefacts7 or the two papers that we published in 1972-1973 in JACS (the Journal of the American Chemical Society).8,9 They turn out to have a lot of references and were quoted often, so when people say pattern recognition in chemistry they often reference those papers. While I’ve published a lot of others, those are the ones that people have recognized. I think I published pattern recognition in NMR in the Journal of Physical Chemistry long before that.10,11 I am not answering your question like: this is it. I can give you all those papers if you want.

Q: Now we also would like you to evaluate, what do you consider your best paper to date?

BK: It’s a great question. In preparation - how’s that for an answer? Frankly I think, honest to God, the paper that I am starting to write now is the best thing I’ve ever worked on. Backing off that, when you say best, I would say which one made the biggest difference? Which one do I think has the potential of making the biggest difference? The two papers in JACS.8,9 The first one really. You might say that was the best paper because it had a big effect. My favorite more recently would be the GRAM paper with Sanchez in 1985-1986.12 Related to this work are References 13-15. For me it opened up the second-order world and was a direct solution to a problem rather than an iteration and I think it leads to a lot of things. The GRAM paper is my biggest contribution, but it’s a potential contribution. No-one is running around using it. There were a lot of other ones in the past but I think for anybody, when they think about their career, it boils down to just a couple of publications that were pivotal.

Q: You mentioned: the one that I’m writing right now.

BK: To me that’s it for sure. I think it’s going to change the field of experimental design, but we are still working out some problems.

Q: Let’s go over to Arthur now. The history of Arthur.

BK: Oh! Arthur, yeah. I am not even sure. We started Arthur in 1971-1972. No. Tracing back. The first pattern recognition system that I know about was at Shell Development Company and I can’t say anything about that. It was a collection of a couple of methods that were out at the time and when I went to Lawrence Livermore Laboratory, which is in 1971, I worked with Charlie Bender who was a theoretical chemist there and a really fantastic programmer, a really good mathematician. Charlie and I recognized that we had to have some software so Charlie began writing a program called RECOG and it had a lot of good stuff in it. He could write a method overnight, I mean in an hour. I mean this guy was the most brilliant programmer I have ever run into. So we developed that system. Then when I left Lawrence Livermore I couldn’t take it with me. I went to Colorado State University and started to write, right away, and we had the old cards in. I forgot what computer they had there. When I left there less than two years later I had two boxes of cards, 4000 statements and that was the beginning of Arthur. It had non-linear mapping, hierarchical clustering, it could write the hierarchical dendrogram and it had a linear learning machine, MLR essentially, things like that, scaling, the basic things you see. It had a clustering method. Oh, and it had a minimal spanning tree which was interesting. When I went to Washington - this is December 1973, the first part of 1974 - obviously my students started to use that one. I had a couple of students in Colorado State University and I had more students and Dave Duewer at Washington and they started to use it and these guys were real programmers, much better than I was. It didn’t take them too long to throw the whole two boxes of cards away and start all over again. Dave Duewer was the guy who really wrote Arthur and Jim Koskinen who was a post-doc was the real architect of the program itself. He was an organic chemist Ph.D., post doc, good programmer; he is a professional programmer today; he really knew about how to write programs correctly and Dave Duewer was a real good programmer. Jim built the structure and Dave started working on it. Other people started contributing modules to it and Dave rewrote some of the ones I had; the hierarchical clustering is pretty much unchanged. And it grew and grew and Dave is an unusual character in a positive sense. At one point he had something like a 50-hour day and a six-hour night. He was just experimenting at the same time and he would write code like crazy. As you know it’s gone through a number of modifications.

That’s it. It was just an expression of our needing a program to work with the group, something that we could contribute out there. I remember there were two other big software projects at that time. They were not in pattern recognition but in other fields of chemistry. We went to a Gordon Conference one time and they were saying: how can we get hold of these programs? And these architects and authors of these programs said: no, no, you can’t use these programs they are too difficult and everything else. At that time we were giving Arthur away on a tape for $35. Later on I was informed by the university that $35 didn’t cover all the expenses: mailing, tape, the manual, etc. We were losing 50 bucks a shot. Our philosophy was: give it away. These other philosophies were: keep it. This last one prevails nowadays.

Q: Does it still have a function now?

BK: Actually, yes it does. Infometrix really did a lot of work on it. It was never to be meant like: here is the box, the black box, plug it in; it was always the expert’s tool. The expert would know if he wanted to use LLM or if he didn’t want to, or to test his new method against it. It was there for them. When we put methods in there, we were not advising people to use the methods. It was a bag of all tricks. It was a research tool. In that sense, people use it today. It is still viable. There are other packages that you could use obviously. Better or worse, I don’t know. You’ve got to be a huge programmer, that’s for sure.

Q: Can you give us some thoughts on our field chemometrics and its relation to other fields?

BK: All right, formally I don’t know of any connections. Otherwise, conceptually it’s the same. I feel that biology has worse data than we have in chemistry, but it’s better than in psychology. So psychology has to work very hard, therefore they have psychometrics, biology obviously also, they have biometrics and we have chemometrics. Why we developed so late is interesting. We have been a science to rely on our theory and the foundation of chemistry is in its theory. In biology and psychometrics the theories are more qualitative, so they have to work harder. We are sort of following them on the heels, we’re a little late. Physics-ometry, physiometry or whatever you want to call it will probably never exist because physicists can’t leave theory. So that’s it. It’s a natural extension of the metrics idea in chemistry. Svante Wold who coined the term chemometrics, coming from statistics, was obviously reading biometrics literature. That is an obvious connection. Psychometrics, he read Biometrika and the Journal of Biometrics a long time ago. So one can say that chemometrics follows naturally from other metrics. Remember that what we call PLS was described by P. Horst in 1961 in Psychometrika,16 where Horst himself claims that these kinds of problems and their solutions were already known to Hotelling.17

Q: What are your professional activities besides chemometrics?

BK: Analytical instrumentation, remote sensing, process modeling, chemical sensors in process analysis and control. I have served as co-director for the Center for Process Analytical Chemistry (an NSF university/industry co-operative research center) since its inception in 1984.

Q: What are your spare-time activities?

BK: Horse-breeding and riding, sailing (cruising and racing), hiking and backpacking. Harley-Davidson motorcycle, skiing.

Q: What are your favorite sports activities?

BK: Dressage (horse-riding) and sailing.

Q: OK, here’s a quick one. Besides the two journals, where do you look for chemometrics papers?

BK: Analytica Chimica Acta, Analytical Chemistry, Applied Spectroscopy - there’s some very nice stuff there by spectroscopists - mathematicians are starting to do some real nice stuff, and Technometrics, and beyond Technometrics just a whole bunch of other journals. Wherever the literature takes us. It is almost like an infinite resource out there. More recently though, mathematics journals, statistics journals. I think we are pushing mathematics. That’s part of my talk today. They want to know about trends. You will get that in my talk,18 but one of the trends is that in the past we’ve been acceptors of statistics and mathematics and now I know in my own research group we’re pushing math and statistics. There’s math out there that we need to have that isn’t in existence so either we have to invent it ourselves (equations and mathematical tools) or attract mathematicians to those areas.

Q: Earlier today we had a discussion about the theoretical strength of chemometrics ... and the reasons behind it. You may just want to summarize here.

BK: This may be really off the wall so I wouldn’t want to buy stock in all these ideas, but if you looked at let’s say: there are some strongholds in the U.S. universities or so where chemometrics isn’t even thought of, and Germany, e.g. West Germany. There is not a lot of activity, but they are both big chemistry countries. In chemistry the foundation is theory. Ab initio quantum mechanics is at the high end of the pecking order, molecular dynamics, etc. That is why there has been a resistance to, ignorance of chemometrics in those countries. The other countries I think are a little more open to things. Where the foundation and the power structure exists of theoretical physical chemistry, for example, it is going to be difficult to talk about soft modeling. That’s why analytical chemistry came up with this. Analytical chemistry has the data and has the need.

Q: Good, that’s what we want to hear. Could you list five methodological topics that are the most important for you in your work in chemometrics?

BK: OK. Linear algebra certainly is the most important tool along with multivariate statistics. They are so connected, I don’t know how to separate them. Computational statistics, also numerical analysis, tensor analysis is also becoming more important for me now. What else? Some optimization and control theory because of other interests that I have: CPAC.19-22 Those are the main ones. Linear algebra is indispensable, leading toward tensor analysis and also other areas of mathematics, but I don’t know what they are.

Q: Happy hunting. Name three applications that are the most important for you.

BK: Analytical calibration of any kind of analytical instrument is important to me, but more particular multivariate analysis. That’s important, from spectroscopy to arrays of sensors. Another one that is becoming important is in chemical process monitoring and control. And there we are reaching out, holding hands with chemical engineers. What I mean by that is industrial manufacturing, biotechnology (reactions, not simple nitration of ethylbenzoate), although that is not always simple, but things beyond that, where the reaction or the process that we are working on is not that simple and there may be some unknown aspects to it. That is the second application. The third one that I have done a little bit with and would like to do more with is environmental chemistry. I am interested in every kind of application.

Q: Which topics were left behind in the first 15 years of chemometrics?

BK: If there were I’d be working on them. I still think structure-activity correlations is somewhat left behind, I think it is still pretty much ignored. It is not that the modeling methods are not powerful, it’s the data that’s going into it. I don’t have too much enlightening to say about that. There have been so many shotgun application papers in the literature about, take classification methods in food science, geochemistry, so pretty much everything has been covered.

Q: A little bit about chemometrics education. You have teaching experience in universities and outside.

BK: Actually, I have teaching experience in chemometrics, but I wouldn’t say anything about a lot of educational experience. I’ve been doing more research, including graduate education. In the future I’m getting more teaching assignments. I’ve always been interested in undergraduate education. A sincere and authentic interest rather than just saying that, but honestly, I have not been doing much of it. I have a real interest in getting back into it and I don’t know when that is going to occur. Next year, maybe five years from now.

Q: What would you consider the best place for a chemometrics course in a curriculum?

BK: I would say, before the students do any experimentation they need courses in experimental design. I can just tell you a story about this. Our university, the Chemistry Department, came up with a course called computers in chemistry, a lot of schools do this. We also have a full graduate course, in the study catalog, called chemometrics. So the University of Washington has got the premier course I would say. But they came up with this course for the seniors and undergraduates. It was a good one about A/D conversion, etc. They asked me to give some lectures on chemometrics and I did for a couple of years, six lectures or something like that. I picked in the grab-bag for some methods and talked about them. Introduction to experimental design, something with multivariate, PCA, etc. and then simplex because that is a cute method. They didn’t know that this was just the tip of the iceberg. A senior-level student was taking the course, it’s a fourth-year course, and he came up to me, two weeks before the school year is over - this guy was going to graduate in two weeks after I gave my course of full and fractional factorial designs - he came up to me and he was upset. He said: I’m upset, because I’m a senior graduating with a B.Sc. in chemistry, and two weeks before graduating I am told that there are optimal ways to do experiments to get the maximum amount of information with the least amount of work. He said: what horrifies me is that I almost didn’t get this. This is a new course, and you are sort of an add-on. So my conclusion is that we should have experimental design and ways to handle uncertainty at an early stage and across the science and engineering curriculum. I intend to have my own contribution towards this, plus graduate-level courses. They are very popular at the University of Washington.

Q: The three last questions can be answered very quickly. Do both of the journals have a future?

BK: Yes. I think they both have a future.

Q: Your topic of today: the future of chemometrics.

BK: How far?

Q: Year 2000.

BK: The future to the year 2000 or so. I feel fairly certain that chemometrics will be a strong and valued part of the field of chemistry. Both strong and valued. I do not think that it will be enormous in size. It will not be the size of analytical chemistry. It will be much more modest in number of people and activities. It will find its rightful place as chemists doing chemistry. Chemometricians are chemists. Therefore chemometrics does chemistry, we’ll play around with methods, we’ll be pushing mathematicians to come up with new mathematics, pushing statisticians to come up with better cross-validations, etc. and more of a foundation to all the crazy methods that we are using today. Just two words: strong and valued.

Q: If you had to choose a special property, characteristic that is typical for a chemometrician, what would you choose?

BK: An interest in both chemistry and mathematics.

Interview with Luc Massart

Q: So I will ask you the question: how did you get involved in chemometrics?

LM: More or less by accident. The person who recruited me is Auke Dijkstra. I went to a meeting in Heidelberg. I think it was a EUROANALYSIS meeting but I’m not too sure about that; anyway, it was a meeting of analytical chemistry in, I think, 1972 and I went there with the graph method,23 the one where you try to separate ions using the shortest path through the graph. And Dijkstra came to me and he said he was generally interested in formal methods applied in analytical chemistry, specifically analytical chemistry, and the collaboration started. One of the first products of that I think was about information theory and about clustering related to finding amount of information. There are a few papers on that in analytical chemistry - about information content when you have one, two, three, four, etc. gas chromatographic columns.

I think this happened before I heard the word chemometrics. The word chemometrics was introduced to me by a letter if I remember well from either Svante Wold or Bruce Kowalski or both of them, asking me I think to become a member of the Chemometrics Society. And they did that after they saw a paper by Kaufmann and me in Analytical Chemistry - a paper about I think operations research in analytical chemistry,24 something like that. That’s my own history. I entered chemometrics not by design, but because I had a problem which I wanted to solve and I was interested in methods.

Q: On a slightly more general level - when do you consider that chemometrics started?

LM: I think that it depends on how you define chemometrics. But I think that an article which Bruce Kowalski published about using principal components on archeometric data7 is the oldest one I remember and which I would associate with modern chemometrical movement, and there have been papers here and there in the literature much before that. For instance there are very good papers about univariate regression and confidence limits and all kinds of different regression published by Box, Hunter, etc. which in retrospect are chemometrics papers. But my starting point I would put in Bruce’s article I think in the Journal of the American Chemical Society.7

Q: Now this is a deep one, next question. What do you consider prompted the inception of chemometrics?

LM: I think that many of the ideas which are present in chemometrics have existed in analytical chemistry since the start of analytical chemistry, but that two factors have gone together which made possible and made necessary the more systematic views of mathematical methods. Those factors are computers, which made it possible, and multivariate analysis, which made it necessary. More or less that. I am always talking about analytical chemistry, I leave out the rest.

Q: I think we have asked you the next two questions: what do you consider to be the first chemometrics paper and what is your first chemometrics paper?

LM: I have answered that too. It’s the paper about the network and it appeared in Analytical Chemistry. I think it’s 1972.23

Q: So what do you consider your best chemometrics paper to date, if in any way you can grade it?

LM: That’s very difficult. I can easily say which is the paper which I like which has been used less, which has been most underused. I know with certainty. It’s a paper which appeared in the Journal of Chromatography25 and which was about how to use clustering to determine what I hear now are called markers, which were then called probes, to try and predict gas chromatographic data for other substances. And that I thought was a good paper but has been cited very little so I can answer that easily. For the rest I think that the best thing I have written is really my book.26

Q: We can go on about this book. This is the first monograph on chemometrics that ever appeared. Can you tell us something on how it came about? How you thought and how you organized it?

LM: How it came about is mostly a question of personal ambitions and attitudes. First thing I wanted to do when I became a scientist was sometime, someday to write a book. So I took the earliest opportunity and in fact I went to Elsevier (who have played an important role in chemometrics, I think they should be mentioned somewhere). I went to them by accident because my father (my father has been rector of a university, and quite an important personality) was a member of the administration council or advisory council or something like that of Elsevier. He knew somebody and I contacted that somebody who is called Mr. Atkins and who has long been responsible for Analytica Chimica Acta. So that was how it happened practically and I think that I did it with Dijkstra mainly, there was a third author, Kaufmann, but really the main work was done by Dijkstra and myself. Kaufmann who is a statistician made sure that we didn’t make too many errors in the equations and notation. But really it was Dijkstra and me and I think more or less that it must have originated because we had already given schools about chemometrics, not called chemometrics at that time but something with the word automation in it. Working together and also putting a syllabus together gave us the idea of writing a book about it. That must more or less have been how it happened.

Q: Now here is an easy one: in which journals do you normally look for or find chemometrics papers?

LM: Not counting the two journals. Analytical Chemistry, Analytica Chimica Acta and then also and more specifically in my field there have been a few good papers in the Journal of Chromatography, but also in a journal which is called the Journal of Chromatographic Science.26 This is a more theoretical journal about chromatography and which even now often carries good theoretical chemometrical papers. I would say that those were the ones that I looked at most.

Q: To carry on more on that subject but in a broader sense. What do you consider the relationship between chemometrics and the other metrics disciplines psychometrics, biometries, technometrics? Similarities, dissimilarities or other relationships.

LM: It is clear that the ones you mention are older than chemometrics and some of us have known a few things about these fields and have extracted chemometrics from those fields. Others have reinvented methods which have existed in these fields. I for instance started with clustering after hearing a lecture by a bacteriologist who was talking about numerical taxonomy and this introduced me to the book by Sneath and Sokal. 28 I have done a lot about clustering and that was really the origin. For many of us there must have been unknown origins in those three. Other metrics are now starting: Danny Coomans has tried to put up a project about dentometrics, I am writing a project (for financial reasons) which is called toxicometrics, other people will use these words too. So chemometrics will have been intermediary between those and metrics in more special fields.

Q: Now something else that is rather characteristic for the way that chemometrics is practised. There is a distinct geographical spread. What is your opinion on this geographical spread? Why did this develop the way it did and what do you think it represents?

LM: It had to do with people. I am not too sure about the American situation but it is clear that two strong poles have existed (I am saying poles have existed, it is equalizing now) in Europe and they have been Scandinavia, and that surely has been an effect of Wold, and a heavy nucleus in the Dutch-speaking world which is Flanders plus The Netherlands. That has been the effect of first Dijkstra and I think myself, but also afterwards of people who have become extremely good chemometricians, e.g. the Vandeginste-Kateman team. Now you see a new nucleus springing up in Italy. That clearly is again the effect of a few people, probably Forina and Clementi. So this geographical location has to do with people according to me but eventually all this will merge. I would like to make one remark. There are also some blank spots on the geographic map, e.g. Germany, they do not have a chemometrical school. If the Germans want to have a chemometrics school, they turn to the nucleus in Austria which was mainly started by myself and Wegscheider and Reich. But for the rest they have a blank spot there.

Q: Is there a historical or philosophical reason for that, that blank spot? Is it something in their education or in their philosophy about science or is it just by accident?

LM: I think it is by accident because there was no particular person who was at the same time a good chemometrician and able to explain it well.

Q: Now out of the top of your head, can you give us five methodological topics that are the most important for you in chemometrics.

LM: For me and my work I would say: not in order of priority but the order we touched upon them. Operations research and decision theory as a first topic. Then information theory. Then clustering and derived from that supervised pattern recognition and then, now, artificial intelligence. But generally, if I have to teach chemometrics, if a company asks me and tells me to come one day and talk about chemometrics then the accents I put are on experimental design and on principal components. These are really the two absolute priorities when you teach chemometrics and also the most important methods. Of course there is a lot derived from each of them and hanging onto them but these are the two main methods.

Q: Which topics were left behind by chemometrics in the first 15 years. Are there any blank spots in the methodological apparatus?

LM: That’s a bit difficult. One field where chemometrics could have been important but where it is not used as much as it should be is method validation. There has been a lot published about how to use the best model for regression or calibration, for example, but the impact of chemometrics on this field, which is very suited for it, has been too low.

Q: How can this topic be entered in chemometrics?

LM: By having chemometricians look at it and find it amusing to look at it and having them give lectures about it to people who want to use it. To have chemometricians talk to people who apply method validation and explain that there are a few things they know which could be useful.

Q: I would like to ask a question about computing, since computing is very important in chemometrics. Could you tell us something about your own computing history, how you came in contact with computers and how it evolved over time?

LM: My very first contact was in 1962 but I was not allowed to touch the computer myself. There is a technique called gradient elution in chromatography and I needed a model. This model is not difficult. There are one or two exponents in it, but I needed to compute it and I went to the computer center of my university. That was the very first time I have been in contact with it. And then the first application I did myself was a regression application. When I started working we still had mechanical calculators. So I had a regression problem and I had to regress: one of the things was a cross-section in activation analysis but I forget what the other was. When you apply least squares in a regression you can get into trouble with the number of significant digits. Even with a good calculator I didn’t have enough digits so I went again to the computer center and I wrote the first program, a regression analysis using double precision: 16 digits instead of eight.
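The loss of significant digits described above can be illustrated with a minimal sketch. It is not Massart's Fortran program: the data, the function name `slope` and the textbook sums-of-squares formula are assumptions chosen for illustration, and Python's `decimal` module is used only to imitate arithmetic carried to 8 or 16 significant decimal digits.

```python
from decimal import Decimal, localcontext

# Hypothetical data lying exactly on the line y = 2x + 1.
X = ["1000.1", "1000.2", "1000.3", "1000.4", "1000.5"]
Y = ["2001.2", "2001.4", "2001.6", "2001.8", "2002.0"]


def slope(digits):
    """Least-squares slope with every operation rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        xs = [Decimal(v) for v in X]
        ys = [Decimal(v) for v in Y]
        n = len(xs)
        sx, sy = sum(xs), sum(ys)
        # Textbook formulas: each one subtracts two large, nearly equal numbers,
        # so the leading digits cancel and only the rounding noise may remain.
        sxx = sum(x * x for x in xs) - sx * sx / n
        sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
        return sxy / sxx


print("slope with  8 digits:", slope(8))
print("slope with 16 digits:", slope(16))
```

With 16 digits every intermediate value in this small example is represented exactly and the true slope of 2 is recovered; with 8 digits the subtraction cancels most of the stored digits and the estimate is badly off.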

Q: So I assume it was a Fortran program?

LM: It was a Fortran program.

Q: Have you been in contact with many languages over the years or did you stick to Fortran all the time?

LM: No, for teaching purposes I went over to Basic when I started teaching in Brussels because it is easy to teach. At the moment I started teaching, Pascal was not yet known as a good language. Perhaps if I had to start over again I would have done it in Pascal. I have also learned a little bit of Prolog of course. Different applications such as dBase each have a language. You can program in dBase. I have learned several of those languages.

Q: How about the relation between the big computer and the small micromachine, single user? Can you say something about that?

LM: We tend to try to do something with microcomputers, for simple organizational reasons within my university. They have a computer center that is manned by people who think that the computer center is the center of the universe. Also, when we want to make our methods profitable to other people in the analytical world it means you have to do it on a micro. Otherwise it will not be interesting.

Q: Backing up a little. You have a tremendous experience teaching. What is your opinion on the optimal way of bringing chemometrics to the curricula at the universities?

LM: That’s a very difficult question because I have only been able to introduce chemometrics in the curriculum of my own university this year. I have not been allowed to teach chemometrics before that except as a small part of analytical chemistry. So I do not have experience on how to do that in the university. This has also to do with the fact that I teach pharmacists and medical people and I do not teach chemists. Only from this year on will I be allowed to teach chemometrics to chemists. So I have no experience on how to do it in a university. My experience is on how to do it in the world at large.

Q: Please tell us a little bit more about that.

LM: Analytical chemists are very practical people. What you must do is show that chemometrics may solve certain problems for them which otherwise may not be solved. You must not try to give them philosophical stories. You must not give them a lot of mathematics. Just start with applications. Show that it can work. One thing that also appeals to analytical chemists is that chemometrics can be seen as a kind of fundamental frame for analytical chemistry. It is really about the fundamentals: how do you extract information? In many countries analytical chemistry is viewed as a very applied, very untheoretical field and it pleases analytical professors or at least some of them to be able to have a theoretical background. Sometimes that is the way to introduce it to some people.

Q: In the end we would like to ask you a few questions about the future. Let’s start out with one that must be dear to you. Do both chemometrics journals have a future?

LM: Well, our journal is doing well. In fact we are now debating whether there should be three volumes instead of two. So we have enough papers coming in. I think the quality is generally acceptable. The number of issues sold is quite sufficient for Elsevier apparently. The only thing that worries me a bit is that there are not enough new authors coming in. It is turning a little bit into the same circle of people. Very clearly, that is true for the other journal too. The question that worries me most is: how are we going to find people from outside who will also publish in our journals? But at least the immediate future is no problem.

Q: So your worries would be getting the younger chemometricians to publish?

LM: Yes.

Q: And then the largest question of all: what is the future of chemometrics?

LM: That is difficult. Chemometrics became important due to two factors: the multivariate nature of things and computers. It is clear that we will always have better computer techniques and the multivariate possibilities are increasing not only in analytical chemistry but also outside. For example, the QSAR people who were ten years ago looking at log(P) and a few others now have broader ranges and are beginning to understand what multivariate means. So there certainly is a future and need for it, but where it is going to have its exit in ten years’ time, I am not going to offer a prognosis for that.

Interview with Svante Wold

Q: If you would like to answer the written questions and elaborate a little bit? When did chemometrics start?

SW: When did chemometrics start? As I see it, it must have been about 1920 with Gosset,29 who invented the t-test. He was a chemist who worked at Guinness and started doing something with the statistical aspects of chemical processes. Then came Youden30 and Box31 and many others, but about 1920 I would say.

The first chemometrics paper is Gosset’s ‘Student’s t-test’.32 But chemometrics as it is now ... statistics handles a lot of chemical problems, Box and Youden, etc., but this didn’t enter chemistry until the 1970s. If we call that chemometrics then it would have been Malinowski, Kowalski, Eisenhour and Jurs at the end of the 1960s and that was in analytical chemistry.

Q: And your own first chemometrics paper?

SW: My first chemometrics paper, I think it was one of Michael (Sjöström) and me about statistical analysis of the Hammett relation. 33

Q: What was your best chemometrics paper?

SW: My best paper, I don’t know, but what I consider most relevant right now is multivariate design. The Lerici paper. 34

Q: What is the relation of chemometrics to other ‘metrics’?

SW: That is also interesting. Chemometrics has learned a lot from psychometrics. I think the strongest relation is to psychometrics. And then there is some amount of technometrics and qualimetrics, but psychometrics strongest. Biometrics almost nothing.

Q: What journals do you read?

SW: Well, the two chemometrics journals, Technometrics, Analytical Chemistry, QSAR and a little of everything.

Q: Do you have anything to say about the geographic spread of chemometrics ‘centers’?

SW: The geographic spread is extremely uneven but that is typical at the start of something. How it came about and what started it are interesting questions. There was originally an American school with Eisenhour, Jurs and Kowalski, but that one is almost dead now. Only Kowalski is still active in Seattle, but directed towards analytical chemistry. That is very specific. And then Scandinavia is well represented, also Belgium, Holland, and Italy and France a little bit, resulting from the early co-operation between different groups.

Q: Persons you mean?

SW: Persons, exactly.

Q: What are your other professional activities besides chemometrics?

SW: I am a professor and do a lot of administration. I am also a member of the NFR (Swedish National Natural Sciences Research Council). This generates a lot of administration, but it also gives me a good insight into what goes on in the chemical world in Sweden. Then I am a private consultant and part owner in three companies: Umetri AB (Umetrics, in Umeå), MDS (Multivariate Design Systems, in Boston) and ‘Gullsjö Sheep and Computing’ (Gullsjö, Vännäs, Sweden). From the name of the last-mentioned company you can deduce that I am also a farmer.

Q: What are your spare-time activities?

SW: Family life, I have a wife and two children. Reading: novels and books about politics and economics. Eating and drinking (more than is good for me). Sports: skiing, jogging, walking. And on a lesser scale: music and hunting. And then of course farming, but I don’t know whether this is work or fun, maybe both.

Q: Are you interested in arts? Which forms?

SW: Classical music and paintings (impressionists and the like, some modern painters too).

Q: What are your favourite sports activities?

SW: Skiing, down-hill and cross-country, jogging, walking and on occasions tennis and golf.

Q: Let’s stop here for a while and talk about something that is not on the list. It is a question about something that is very important for chemometrics in many countries and that is the SIMCA program. Could you tell us something about the history of the SIMCA program, how you imagined it at first and how it has evolved in time?

SW: It started with that in 1966 - no, earlier in 1964 - I helped my father Herman to program principal components in Fortran. Then I was at Battelle one summer, 1965, and there we had a program that allowed us to calculate principal components, do simulation and other things. After a while we put in chemical data and out came interesting and to us meaningless results, and I thought a lot about that. But before 1970 or 1971 I didn’t understand what these principal components were. Then I realized that one could do pattern recognition which by then was interesting and good. I wrote a first version of SIMCA that I took to America in 1973 in the autumn, and I finished it when I was there, about October 1973. I took it to Bruce and showed him it. Dave Duewer took out parts and included them in Arthur. They got a very primitive variant of SIMCA. Then when I returned home, I rewrote my own stuff into a primitive version of SIMCA. That was SIMCA 2T, which existed in the period 1975-1980.

Then microcomputers came. The first version was SIMCA 1 which we didn’t sell to anybody. Of SIMCA 2T we may have sold 10-20. Then a microcomputer came along: the ABC80. I considered it very interesting to put in SIMCA there. It worked against everybody’s expectations. That made us give up the big computers completely. Then came ordinary 8-bit computers with CP/M and Basic. We had a version ready in 1981 or 1982 that we called SIMCA 3B, and that one has been developed since then. We still have SIMCA 3B, even though it is now much more developed. Afterwards it got a PLS module, in 1983. The last thing we put in was Bert’s (Skagerberg) program with color graphics. And now we have decided to start with a completely new version that we will call something different, but it will work on big and small computers. It will be written in C and be user-friendly. It will take at least a year.

Q: Response surfaces, etc.?

SW: We have that now in our ordinary SIMCA. One can add quadratic and cross-terms of variables in FVAR and run response surfaces. The one thing that we don’t have is 3D response surface plots. Bert (Skagerberg) and Jerker (Öhman) have all kinds of those, but they are not sold with SIMCA. CARSO is the name of an old invention. It was Gnanadesikan35 who showed in 1977 that projection onto quadratic surfaces is possible and that it equals appending the data matrix with squared variables and cross-terms. There is also a big-computer version of SIMCA called SIMCA 3R, written by Håkan Fridén and working on VAX, IBM and Prime. That version will also be abandoned. That was a short history of SIMCA.
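The idea of appending squared variables and cross-terms to the data matrix can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `append_quadratic_terms` and the example data are hypothetical, and the code is not taken from SIMCA or CARSO.

```python
from itertools import combinations


def append_quadratic_terms(rows):
    """Return a copy of the data matrix with squares and pairwise cross-terms appended.

    For variables x1..xp each row becomes
    [x1..xp, x1^2..xp^2, x1*x2, x1*x3, ..., x(p-1)*xp].
    """
    expanded = []
    for row in rows:
        squares = [v * v for v in row]
        crosses = [a * b for a, b in combinations(row, 2)]
        expanded.append(list(row) + squares + crosses)
    return expanded


# Two hypothetical observations of three variables.
data = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
for row in append_quadratic_terms(data):
    print(row)
```

A linear method such as principal components applied to the expanded matrix then describes a quadratic surface in the original variables, which is the equivalence referred to above.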

A funny story about the beginning, when I was visiting Bruce (I had only run a few examples, among them ketones) and I held a seminar, it was in the spring of 1974. Bruce had invited me to come there for a month. So the other day I held a seminar and said that SIMCA was much better than anything else, as you do when you have a seminar. Bruce found this slightly rude that I came to his institution and said so. He was a little bit upset. But the graduate students were very interested, especially Dave Duewer, Jim Koskinen and Alice Harper. We worked two days around the clock and were able to enter and test SIMCA in Arthur. Then we took two data sets, sets that they had worked on earlier, every day and tested them with K-nearest neighbours, linear learning machine, etc., and for all these data sets SIMCA showed itself as good as or better than the other methods. After six or seven data sets we all got very excited. And then when we came to data set 10 Bruce came in and said: ‘OK, this is fine’.

Q: That was classification?

SW: That was classification. I think that classification is important, because we partition the world in classes, but more interesting are quantitative models as in partial least squares (PLS). One should have homogeneous data sets to make things function. Therefore it is necessary that certain data sets are partitioned in classes. However, one should not stop there. One should continue, and the advantage with SIMCA is that the step between a SIMCA model and a PLS model is short. Scores, etc. look very much alike and one could even use the PLS model for classification. I am not very fond of attempts to optimize classification, that is not our goal. We want something easy that we can apply here and there. I am very pleased with SIMCA. The idea and the principle. 36

Q: Could you tell us which methodological topics you find important?

SW: What I consider most important is design and sampling; they come together and they come before anything else. They are the most neglected parts in chemistry and everywhere else. Chemometricians should spend 50% of their time on design and sampling. Then multivariate analysis; all chemical data is multivariate. Here we can partition in classification and modeling. Something that I don’t consider important is clustering. That is in some way only misleading for most persons. Of course there are sometimes clustering problems, but these can be handled very well with graphics and principal components. I believe very strongly in interactive data-analytical situations. I believe very little in automated activities. Expert systems is the wrong track as far as I am concerned. Experimental design, sampling, multivariate data analysis: that is three. Time series is essential, which can be seen as a variant of multivariate analysis. Relations between observations in one time point and those in another time point. If we take multivariate analysis, classification, quantitative modeling similar to PLS, response surface, and time series. That was the methodological part. Then there are the applications. That is something else.

What I am interested in is QSAR. That is complicated and interesting. Generally speaking, organic chemistry, QSAR and such things. It is very nice in a way that one can really test one’s predictions.

Q: Validation?

SW: Yes, validation too.

Q: What topics are the most neglected in chemometrics?

SW: Those are sampling and relations between chemometrical models and basic models; what I mean is fundamental models like the Guldberg-Waage equilibrium and exponential rate models. That is a psychological problem, to try to make the rest of the world use such things. Therefore we have to prove that it works there too. Show that they go over into each other and can be projected onto each other. There I consider things like non-linear PLS models important. There, traditional models can be entered in a multivariate way.

Q: In the X-Y relation?

SW: Exactly, in the X-Y relation. And one can slide between soft and hard. Sampling has been very much overlooked. It is sampling that gives the largest sources of variability in analytical chemistry and there the least amount of money and energy is spent. Typical for academic activity.

Q: Is problem specification also among the neglected topics? You do emphasize design very much.

SW: I agree, but that is more philosophy than chemometrics. If we look at the development of science, we see that the first relevant thing we learned was to measure, somewhere at the beginning of the 1600s. It was Bacon in the 1400s but it took until the 1600s before it really took off ... with telescopes and thermometers, etc. After that we learned to analyze measured data, with Gauss and Laplace and others. In this century, the most important thing we have learned is design. How to set up an experiment. And in the next century, we may learn how to formulate a problem in a meaningful way. If you consider that as a part of chemometrics, then it is the most important. But I think it resides outside chemometrics. Problem formulation is important in all branches of science.

Q: Assuming that chemometrics started 15 years ago. What do you then see as the first paper representing the development?

SW: The first was Kowalski’s 1972 pattern recognition in the Journal of the American Chemical Society.’ That set the world on fire. That was before the Chemometrics Society started and even before the word chemometrics was invented. But then 1975 and later, if we look at design, it was Deming and Morgan and their articles on simplex optimization and design. Unluckily these articles have had too large an influence, so everything became simplex optimization. The important thing is that they used design instead of one variable at a time.

Q: And your own work, if we forget the latest developments?

SW: Probably the SIMCA article 36 and the Oslo article 37 which contains an overview including the development towards PLS, etc. Those two.

Q: Chemometrics education. Do you have opinions about how to organize it at the universities, but also outside?

SW: I have got a lot of opinions about that one. It depends on how you see the role of chemistry in the world. Chemistry is part pure methodology but also part understanding of a given problem area, how to use it. Methodology can be taught in courses, common for all chemists. One can learn to carry out experiments, simple multivariate methods, regression and correlation. That should according to me be included in the start of all chemical education, so that students know how to plan an experiment, how to measure, etc. It will take 100 years before we are there.

Then there is a very important problem-specific part, e.g. QSAR, calibration in analytical chemistry, models in organic chemistry, how to model syntheses, how to model spectra. This should be taught together with organic or analytical chemistry as part of the curriculum. Then we have the question of what computer software is used in education. We are facing a new phase in software development. The programs that we have now are the end-products of a ten-year period that started when we got the first microcomputers. The graphical possibilities were low during that period. Now we have advanced graphics capabilities and we can make real interactive systems with powerful machines. I hope that this will lead to programs that are not closed products but open-ended ones, such that the user can generate e.g. experimental designs. Press control-s and a vector with pluses and minuses shows up on the screen ... the user can combine these bits and pieces. According to Hunter (Stu) there should be a core that organizes the simplest designs from the beginning. From this one should be able to construct factorial designs, reduced factorials, composite designs, etc. And on the data analysis side one should be able to set up a PLS model and quickly calculate results. There we should have a core with principal components, two-block PLS, consensus PLS, regression, ... and all of that.
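The ‘pluses and minuses’ mentioned above are the coded levels of a two-level factorial design. As a minimal sketch of such a core building block, assuming hypothetical helper names `full_factorial` and `as_signs` that do not come from any program discussed in the interview:

```python
from itertools import product


def full_factorial(n_factors):
    """All 2**n_factors runs of a two-level design, coded as -1 (low) and +1 (high)."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def as_signs(design):
    """Render each run as a row of plus and minus signs."""
    return [" ".join("+" if level > 0 else "-" for level in run) for run in design]


# A full factorial design for three factors: 8 runs.
for line in as_signs(full_factorial(3)):
    print(line)
```

Reduced (fractional) factorials and composite designs can be built from such a table by selecting a subset of runs or by adding centre and axial points; those steps are left out of this sketch.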

All of this should be supported by graphical representations as in image analysis, where one can look at many types of graphical information at the same time. Look at a lot of things at once, make a change, look at what happens by feedback, etc. There I think that what you (Esbensen and Geladi) have done is very important - looking at data in two ways at the same time. 38,39 This is so powerful that it has never been shown before; it is a whole new dimension. One should be able to formulate also non-image problems in that way: see context and similarity in the multivariate space at the same time and contrast them against each other. These things don’t exist today, but they will be important ten years from now.

Q: Two questions about the future. Do both journals have a future?

SW: Yes, if they don’t get too academic. Today it seems that Chemolab (Chemometrics and Intelligent Laboratory Systems) has had a better policy, a more casual style, more applications, more proceedings, announcements, course reports, ... Wiley (Journal of Chemometrics) is less casual, more academic, has stricter referees. If they go on like that I predict a hard future for them. On the other hand, if they become more open and require more simplicity and didactical qualities from their authors it will go well for them. But the potential literature volume is large. There is an infinite amount of stuff to write and report about.

Q: And a short finishing question. The future of chemometrics. How does it look up till the year 2000?

SW: That is problematic. In all areas similar to chemometrics there are two futures, a bright and a dark one. And as always they exist in parallel and we will experience both of them. One can already see the signs. One of them is what I call real chemometrics, with people in chemical institutions working on chemical problems, like Nils Vogt or Olav Kvalheim, you two (Geladi and Esbensen), ... . They develop methods that have to work and they find out what exists in an intelligent way. That is the right way for chemometrics. This way will continue. The incorrect way: academization, standardized methods, expert systems. Nowadays there is this dark trend; many weak souls are attached to it. That will also go on existing. It is the same in statistics. There we have presently a catastrophic emphasis on theory, but there are guys like Stu Hunter and George Box and some more who do the right things. This kind of dualism will always exist.

Q: You mean chemical practice is decisive?

SW: It is a healthier development. This does not mean that methodology should be forgotten. One has to understand what one does but it shouldn’t take over. Mathematics is a tool not a goal in itself.

Q: The whole thing centers around chemical practice?

SW: Right, otherwise it is no longer chemometrics. It will become statistics and then it is better to put it in a statistics institution.

Q: If you had to choose a special property, characteristic that is typical for chemometricians, what would you choose?

SW: They are pragmatic and they are beer drinkers.

The reference list will appear in Part 2.