Data mining in design of products and production systems


www.elsevier.com/locate/arcontrol

Annual Reviews in Control 31 (2007) 147–156


Andrew Kusiak *, Matthew Smith

Intelligent Systems Laboratory, Industrial Engineering, 3131 Seamans Center,

The University of Iowa, Iowa City, IA 52242-1527, USA

Received 9 December 2006; accepted 5 March 2007

Available online 9 April 2007

Abstract

Data mining is acquiring its own identity by refining concepts from other disciplines, developing generic algorithms, and entering new

application areas. Engineering design and manufacturing have been affected by the data mining pursuit. This paper outlines areas of product and

manufacturing system design that are particularly suitable for data-mining applications. One of the emerging areas is innovation. The key

challenges of data mining in the domains discussed in the paper are outlined.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Data mining; Data analysis; Product design; Manufacturing; Innovation; Production systems

1. Introduction

Corporations are interested in innovative ways of conducting

their business. Some innovation can be attributed to the

growing use of data in design and manufacturing.

Traditionally, the flow of data and information in design and

manufacturing systems has been essentially unidirectional as

illustrated in Fig. 1.

Any local bidirectional flow (loops) of information has often

been attributed to imperfections of the process, e.g., design

negotiation, manufacturing errors. The developments in

networking, data warehousing, and data mining have contributed

to the emergence of the closed-loop system illustrated

in Fig. 2.

Products and components generate a data trail across life-

cycle phases such as market analysis, design engineering,

manufacturing, and service. Data-mining algorithms extract

knowledge from this large volume of data leading to significant

improvements in the next generation of products and services.

In fact, the knowledge discovery activity could become the key

factor to innovation and business success.

The basic capabilities of data analysis tools are outlined

next.

* Corresponding author. Tel.: +1 319 335 5934; fax: +1 319 335 5669.

E-mail address: [email protected] (A. Kusiak).

URL: http://www.icaen.uiowa.edu/~ankusiak

1367-5788/$ – see front matter © 2007 Elsevier Ltd. All rights reserved.

doi:10.1016/j.arcontrol.2007.03.003

1.1. Pattern discovery

Important patterns might be hidden in the industrial data. For

example, data mining applied to the customer domain may

reveal answers to questions such as

• What characterizes frequent buyers?
• What characterizes customers who react to promotions?
• What characterizes customers making quick purchase decisions?
• What characterizes customers who do not purchase?

Most database systems, such as MS-Access and Oracle,

provide query capabilities that answer some

of these higher-level questions. However, for in-depth

analysis, data-mining algorithms are needed (Witten & Frank,

2005).

1.2. Trends detection

Industrial companies are increasingly developing data

warehouses to collect business data. Data-mining algorithms

can not only extract the static patterns in data, but can also

discover dynamic trends. Mining time series is an active

research area (Kusiak & Song, 2006). The trends reflect

customer interest shifts, technology development, and the

response to marketing strategies.

Fig. 1. Data flow in traditional design and manufacturing systems.


1.3. Data dimensionality reduction

Modern databases may contain a large number of rows

(transactions) and columns (features). An important research

area is the concept of dimensionality reduction. Unrelated data

items and features can be eliminated from the dataset to reduce

the data-mining effort.
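As one minimal illustration of eliminating uninformative features, the sketch below drops columns that take a single value in every row, since such columns cannot help distinguish one transaction from another. The function name, the toy records, and the constant-column criterion are assumptions made for illustration; the paper does not prescribe a specific reduction method.

```python
def drop_constant_features(rows, names):
    """Return (reduced_rows, kept_names) without columns that never vary.

    rows  -- list of equal-length lists (one list per transaction)
    names -- list of column (feature) names, same length as each row
    """
    # Keep a column only if it shows at least two distinct values.
    keep = [j for j in range(len(names))
            if len({row[j] for row in rows}) > 1]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, [names[j] for j in keep]


# Hypothetical records: "plant" is identical in every row, so it is dropped.
names = ["plant", "engine", "deck"]
rows = [["IA", "A", 48],
        ["IA", "B", 48],
        ["IA", "A", 54]]
reduced, kept = drop_constant_features(rows, names)
```

In practice, other criteria (for example, relevance to the outcome being predicted) would likely be applied as well; this filter only removes features that carry no information at all.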

1.4. Visualization

Visualization tools enhance human understanding of data.

For example, graphs, charts, and tables make information easier

to understand than the original data. The relationships between

different data items become obvious when they are displayed.

To make full use of the data visualization tools, data and

knowledge are needed.

2. Knowledge discovery

There are two general classes of data mining, descriptive and
predictive. The goal of descriptive data mining is to discover
patterns, e.g., product configurations formed in mass customization
applications. Predictive data mining aims at building
models to determine (predict) an outcome, e.g., a stock level.
Since the width of data analyzed by the data-mining algorithms
is essentially unlimited, the patterns discovered are usually not
anticipated and are of interest to different users. The value
delivered by these patterns is related to the quality of data and
textual databases. Besides the comprehensiveness of data
processing, data mining brings yet another advantage: it
supports the needs of an individual object, e.g., a part or a
customer.

Fig. 2. Data and knowledge flow in a modern design and manufacturing system.

Data-mining algorithms have been successfully deployed in

engineering, medical, and business applications (Da Cunha,

Agard, & Kusiak, 2006; Harding, Shahbaz, Srinivas, & Kusiak,

2006; Kusiak, 2006). The design and manufacturing domain is

a natural candidate for data-mining applications because it

contains extensive data. Besides enhancing innovation, data-mining

methods can reduce the risks associated with

conducting business and improve decision-making.

Some of the most widely used data-mining algorithms are

(Witten & Frank, 2005):

• Decision-tree algorithms.
• Decision-rule algorithms.
• Bayesian algorithms.
• Neural networks.
• Clustering.
• Regression.

The goal of data mining may range from obtaining a general

understanding of the nature of data to very accurate modeling

and prediction, e.g.:

• Data description and summarization. Description of data characteristics, typically in elementary and aggregated form.
• Segmentation. Separation of data into interesting and meaningful subgroups or classes.
• Concept description. Description of concepts or classes in an understandable form.
• Dependency analysis. Finding a model that describes significant dependencies between objects or events.
• Classification. Building classification models that assign a correct class (label) to previously unseen and unlabeled objects.

For data mining to be effective, several technologies have to

work together. Data-mining algorithms extract patterns from

data to create a meaning that otherwise would be non-existent.

Visualization techniques provide visual understanding of data,

rules, patterns and trends. Data warehousing is critical for

organizing, cleaning and preparing data for mining. The

computer network infrastructure is important, especially for

distributed data mining. These technologies need to be

integrated for effective data mining.

Some of the applications of data mining in design and

manufacturing are discussed in the next section of the paper.

3. Data-driven design

Engineering design has been lagging in the development of

data mining; however, the potential for benefits is significant.

Product complexity reduction and modularity are two of many

potential examples discussed next.


Increasing the modularity among products is a common goal

for many companies. Some of the benefits of modularity

include the potential for (Chandrasekaran, Stone, & McAdams,

2004; Kusiak & Huang, 1996):

• economy of scale,
• increased feasibility of product-component change,
• increased product variety,
• reduced order lead-time,
• decoupled risks,
• easier product diagnosis, maintenance, repair, and disposal,
• part reduction,
• simplified design.

Fig. 3. Example of a product family.

Fig. 4. Example of modules constructed from lower-level modules.

Modularity may be implemented at different levels such as

products, assemblies, or even components. It is accomplished by

combining functions into distinct building blocks or modules

(Pahl & Beitz, 1988). The modules themselves can be defined as

physical structures that have a one-to-one correspondence with

functional structures (Ulrich & Tung, 1991).

Three main types of modularity are discussed in the

literature: component swapping, bus modularity, and component

sharing (Ulrich & Tung, 1991). “Component swapping

can be considered as a means to improve the versatility of a

product by enabling different levels of performance and/or

different applications” (Chandrasekaran et al., 2004). Examples

of this are installing winter tires in a car to improve

handling in icy conditions or switching the faceplate on a

cellular phone to change its appearance. Bus modularity is used

when a module with two or more interfaces can be matched

with any number of the components selected from a set of basic

components. The module interfaces accept any combination of

the basic components (Huang & Kusiak, 1998). Component-sharing

modularity is used when one component is designed for

usage across multiple products. An example of component

sharing can be seen in the Black and Decker® VersaPak™

portfolio of products. Each product in the portfolio shares the

same battery (Dahmus, Gonzalez-Zugasti, & Otto, 2001). A

consumer can then purchase one battery and share the battery

across the different products s/he owns.

The current literature focuses on designing modules for

finished products. Examples of this can be found with the

VersaPak™ portfolio of products (Dahmus et al., 2001),

general consumer products (Stone, Wood, & Crawford, 2000),

and automobiles (Dahmus et al., 2001). An area explored in this

paper identifies groups of components that can be used to build

modules. Using sub-modules offers the same benefits as

implementing modularity at the product level.

Identifying potential modules requires data related to

product functionality, part functionality, energy flow through

the product, and component interactions (Huang & Kusiak,

1998). In industrial applications, generating and gathering such

data is difficult. Much of the data needed may not be previously

available, and domain experts may be needed to generate this

data.

This paper presents modularity identification methods that

require limited information. The results produced are often in

the form of logical modules rather than physical modules. A

logical module is a collection of parts that are used across

numerous products. These parts may or may not have the ability

to form physical modules; however, there are many benefits of

identifying logical modules. For example, logical modules are

useful in supply chain management. By identifying parts that

are commonly used, demand uncertainty across different

products can be pooled together. This, in turn, offers the ability

to plan orders for parts and/or raw materials (Fig. 3).

This paper discusses component sharing modularity,

specifically in cases internal to the product (unlike the

VersaPak™ example).

3.1. Industrial case study

In this case study, industrial data obtained for a product

family was used. The product family consisted of ten products.

Each product contains a base which is unique to the specific

product. In addition each product includes a deck, frame,

operator station, and engine which are collectively referred to

as feature groups. Each feature group (deck, frame, operator

station, and engine) consists of n different subassemblies. Each

of these subassemblies is used by a subset of the ten products.

The example in Fig. 4 illustrates feature groups forming a

product. This product is of a modular nature at the feature-

group level. The case study is concerned with the modularity of

the subassemblies implemented at the feature-group level

illustrated in Fig. 4, where engine subassemblies 1, 2, and 3 are

used to construct engines A, B, and C.

In the case study, one data set was provided for each feature
group in the product family. Each data set contained a list of the
different products that were present in that feature. All data was
extracted from an industrial ERP (Enterprise Resource
Planning) system.

Table 1
Example of a part–subassembly incidence matrix
Part SA1 SA2 SA3 SA4 SA5
1 1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 1 1
7 1 1 1
8 1 1 1

3.1.1. Modularity methodology

The modularity methodology discussed in this paper

identifies similar subassemblies within and across different

products based on the data from an existing industrial database.

The similarity is measured by the number of common parts

used by these subassemblies.

Similarity between subassemblies may not be obvious;

however, it can be visualized using the matrix representation

illustrated in Example 1.

Example 1. Table 1 is a part–subassembly incidence matrix

with each row representing a part and each column representing

a subassembly. The information represented in Table 1 is a

scaled-down version of the data extracted from the industrial

database. An entry 1 in cell (i, j) indicates that part i is included

in subassembly j. It is difficult to tell if any of the subassemblies

in Table 1 share similar part lists.

The data of Table 1 can be used to compute the similarity

between subassemblies (see Table 2). The similarity metric

between any two subassemblies a and b is defined in (1).

$$ s_{ab} = \sum_{i=1}^{n} d_i \qquad (1) $$

where $d_i = 1$ if, for part $i$, the entries corresponding to the
subassemblies $a$ and $b$ are equal; otherwise $d_i = 0$; $n$ is the total
number of parts.
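The metric in (1) can be sketched in a few lines. The code below counts, for every pair of subassemblies (columns), the parts (rows) whose entries agree; note that a part absent from both subassemblies also counts as agreement, which is why each diagonal entry equals n. The function name and the small matrix are hypothetical and are not data from the paper.

```python
def similarity_matrix(incidence):
    """Pairwise subassembly similarity following metric (1).

    incidence -- list of rows, one per part; each row holds one 0/1 entry
                 per subassembly (1 = the part is used in that subassembly)
    """
    n_sub = len(incidence[0])
    # s[a][b] = number of parts whose entries for a and b are equal
    return [[sum(1 for row in incidence if row[a] == row[b])
             for b in range(n_sub)]
            for a in range(n_sub)]


# Hypothetical 4-part, 3-subassembly matrix (not taken from the paper).
incidence = [[1, 1, 0],
             [1, 1, 0],
             [0, 1, 1],
             [1, 0, 0]]
sim = similarity_matrix(incidence)
```

For this toy matrix the first two subassemblies agree on two parts, the first and third on none, and the diagonal equals the number of parts (4).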

The data in Table 2 provides useful insights into modularity.

For example, the similarity between subassemblies SA1 and

SA4 as well as the subassemblies SA3 and SA5 is 7.

Table 2
Similarities between the subassemblies of Table 1
      SA1  SA2  SA3  SA4  SA5
SA1    8    2    2    7    3
SA2    2    8    4    3    3
SA3    2    4    8    1    7
SA4    7    3    1    8    2
SA5    3    3    7    2    8

For the logical modules formed, approaches aimed at
increased customer satisfaction, such as delayed product
differentiation and assemble-to-order strategies, can be
followed (Kusiak, 1999). For example, if two subassemblies

share 57 out of 60 parts, the similar parts (or at least a subset of

them) can be either positioned for assembly or assembled early

so that the remaining three parts can be added to the assembly

when an order for a specific assembly has arrived. Exploring the

similarities and dissimilarities among subassemblies may lead

to the redesign of some parts.
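The 57-out-of-60 example can be made concrete with set operations. The sketch below separates the parts two subassemblies have in common (candidates for early assembly) from the parts specific to each (added once an order arrives). The part numbers and the function name are hypothetical.

```python
def split_for_postponement(parts_a, parts_b):
    """Return (common, only_a, only_b) for two sets of part identifiers."""
    common = parts_a & parts_b          # parts that can be pre-assembled
    return common, parts_a - common, parts_b - common


# Hypothetical part lists: 60 parts each, 57 of them shared.
sa_x = set(range(1, 61))                    # parts 1..60
sa_y = set(range(1, 58)) | {61, 62, 63}     # parts 1..57 plus three others
common, only_x, only_y = split_for_postponement(sa_x, sa_y)
```

Whether the common parts can actually be assembled ahead of the order-specific ones depends on assembly precedence, which this sketch does not model.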

3.2. Identifying common parts

Product data stored in a part–subassembly incidence matrix

indicates commonly used parts. To reveal the information

contained in this incidence matrix, a greedy modularity

algorithm is proposed.

The modularity algorithm is partially based on the concept

used in design of facilities. Rather than using the traffic

intensity between facilities, the modularity algorithm is based

on similarity between parts. The similarity metric used here is

the same as in (1). The difference is that the similarity measured

here is between parts rather than subassemblies. The

modularity algorithm based on the part–subassembly incidence

matrix is presented next.

The greedy modularity algorithm:
1. Randomly select a part for the solution set.
2. Determine the similarity between the selected part and every other part.
3. Find the maximum similarity value.
4. Place the part corresponding to the maximum similarity next to the selected part in the solution set.
5. Label the part on the right of the solution set i and the part on the left of the solution set j.
6. Determine the similarity between part i and every part not included in the solution set. Determine the similarity between part j and every part not in the solution set.
7. Place the part with the maximum similarity from Step 6 next to its corresponding part (to the right if part i, to the left if part j).
8. Determine if all parts are in the solution set. If not, go back to Step 5.
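The eight steps above can be sketched as follows. This is one possible reading of the algorithm, not the authors' implementation: the starting part is passed in rather than drawn at random (so runs are repeatable), ties among candidates go to the first one encountered, and ties between the two ends go to the right end. Those tie-breaking rules, the function names, and the sample data are assumptions.

```python
def part_similarity(u, v):
    """Number of subassemblies in which two parts have equal entries."""
    return sum(1 for x, y in zip(u, v) if x == y)


def greedy_sequence(incidence, start):
    """Order parts so that similar parts end up next to each other.

    incidence -- dict mapping part name -> tuple of 0/1 entries
    start     -- part placed first (Step 1 selects it at random)
    """
    def best_match(anchor, candidates):
        return max(candidates,
                   key=lambda p: part_similarity(incidence[anchor], incidence[p]))

    remaining = [p for p in incidence if p != start]
    sequence = [start]
    if not remaining:
        return sequence

    # Steps 2-4: attach the part most similar to the starting part.
    first = best_match(start, remaining)
    sequence.append(first)
    remaining.remove(first)

    # Steps 5-8: grow the sequence at whichever end offers the best match.
    while remaining:
        left, right = sequence[0], sequence[-1]
        cand_left = best_match(left, remaining)
        cand_right = best_match(right, remaining)
        sim_left = part_similarity(incidence[left], incidence[cand_left])
        sim_right = part_similarity(incidence[right], incidence[cand_right])
        if sim_right >= sim_left:       # assumed tie-break: prefer the right end
            sequence.append(cand_right)
            remaining.remove(cand_right)
        else:
            sequence.insert(0, cand_left)
            remaining.remove(cand_left)
    return sequence


# Hypothetical parts (rows) over four subassemblies (columns).
parts = {
    "P1": (1, 1, 0, 0),
    "P2": (0, 0, 1, 1),
    "P3": (1, 1, 0, 0),
    "P4": (0, 0, 1, 1),
    "P5": (1, 1, 1, 0),
}
order = greedy_sequence(parts, start="P1")
```

With this data the parts used in the first two subassemblies (P1, P3, P5) end up adjacent, followed by the parts used in the last two (P2, P4). Because the procedure is greedy and depends on the starting part, a different start may yield a different sequence.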

The greedy modularity algorithm is illustrated in Examples

2 and 3.

Example 2. The relationships between parts and subassemblies
are represented as the incidence matrix in Table 3.

Table 3
Part–subassembly incidence matrix
Part Subassembly
A B C D
1 1 1 1
2 1 1 1
3 1 1
4 1 1 1
5 1 1
6 1 1 1

Part 1 is selected for the solution set at random. The
similarity values between Part 1 and all other parts are shown in
Table 4. Part 1 is isolated from the other parts in the solution set.
The bold similarity value (for Part 4) indicates the similarity of
the part which is to be added next to the solution set (the highest
similarity part).

Table 4
Step 2 of the greedy modularity algorithm

Table 5
Matrix illustrating a partial solution

Table 6
Iteration 2

Table 7
Iteration 3

Table 8
Iteration 4

Part 4 is incorporated into the solution set next to Part 1, as it

has the maximum similarity value. Table 5 shows the next step

of the algorithm. The column labeled Sim 1 contains the

similarities between the part sequenced first and the remaining

parts. The column labeled Sim 2 includes the similarities

between the last part in the solution set and the remaining parts.

Tables 6–8 show the results produced in the subsequent

iterations of the algorithm.

Table 9 provides the sequence of the parts generated by the

greedy modularity algorithm. It can be seen from this example

that parts which share similar features can be grouped.

Example 3. Consider a group of seven parts (Group A,
Parts 1–7) that extends across five different subassemblies and a
group of three parts (Group B, Parts 8–10) that extends across
four subassemblies (Table 10). Group C (Parts 11–13)
contains parts that perform the function of the Group B parts for
SA 5.

Groups A, B, and C form logical modules. As the

functionality of Group B and Group C is identical, Parts 8

through 10 could be redesigned to accommodate the

functionality of both groups. This way the part count could

be lowered, and one large module could be considered instead

of three smaller ones. The feasibility of such changes is product

dependent; however, the modularity algorithm identifies

candidates for such design modifications.

It is also important to note here that while the main goal is

finding candidates for physical modules, the algorithm

identifies logical modules. Some of the logical modules can

then become physical modules.

3.3. Combining results from similar subassemblies and

common parts

Due to the complementary nature of identifying similar

subassemblies and common parts, the results can be combined

to benefit modularity. The two scenarios presented next

highlight some of the potential benefits.

Table 9

The final solution

Table 10

Part–subassembly matrix for Example 3

Table 13

Scenario 1 organized incidence matrix

SA1 SA2 SA3

P1 1 1 1

P7 1 1 1

P3 1 1

P2 1 1

P5 1

P6 1

P4 1

Table 14

Scenario 2 incidence matrix

SA1 SA2 SA3

P1 1

P2 1

P3 1

P4 1

P5 1

P6 1

P7 1


Scenario 1. Consider the part–subassembly incidence matrix

in Table 11 with parts P1 through P7 and subassemblies SA1,

SA2, and SA3.

The similarity matrix derived from this incidence matrix is

presented in Table 12, and the part–subassembly incidence

matrix organized with the greedy modularity algorithm is

presented in Table 13.

Table 11

Scenario 1 incidence matrix

SA1 SA2 SA3

P1 1 1 1

P2 1 1

P3 1 1

P4 1

P5 1

P6 1

P7 1 1 1

Table 12

Scenario 1 similarity matrix

SA1 SA2 SA3

SA1 7 2 5

SA2 2 7 4

SA3 5 4 7

Based solely on the similarity matrix (Table 12), SA3 is

similar to both SA1 (s13 = 5) and SA2 (s23 = 4). Based on the

data from the similarity matrix, it is difficult to determine the

best candidates for modularity. However, the organized

incidence matrix (Table 13) shows that while the similarities

are almost identical, the underlying structures are different. The

subassemblies SA1 and SA3 share two parts, while three

instances of similar parts are not present. The subassemblies

SA2 and SA3 involve four common parts. Even though the

similarity between SA1 and SA3 is higher, SA2 and SA3 could

be better candidates for modularity as they share more common

parts.

Using other similarity metrics could simplify the analysis

and provide additional insights into the part commonality and

modularity issues. Analysis of the relationship between

subassemblies with another similarity metric is illustrated in

Scenario 2.
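To illustrate how a different metric changes the picture, the sketch below scores subassemblies by the number of parts they actually share, and by the Jaccard coefficient (shared parts divided by all parts used in either subassembly). The Jaccard coefficient is a choice made here for illustration and is not named in the paper. The part sets are also an assumption: the column positions of the Scenario 1 incidence matrix are reconstructed so that they reproduce the reported values (similarities of 2, 5, and 4; two parts shared by SA1 and SA3; four shared by SA2 and SA3).

```python
def common_parts(sub_a, sub_b):
    """Number of parts used in both subassemblies."""
    return len(sub_a & sub_b)


def jaccard(sub_a, sub_b):
    """Shared parts divided by all parts used in either subassembly."""
    union = sub_a | sub_b
    return len(sub_a & sub_b) / len(union) if union else 0.0


# Assumed reconstruction of the Scenario 1 part lists.
subassemblies = {
    "SA1": {"P1", "P7"},
    "SA2": {"P1", "P2", "P3", "P4", "P5", "P6", "P7"},
    "SA3": {"P1", "P2", "P3", "P7"},
}
shared_13 = common_parts(subassemblies["SA1"], subassemblies["SA3"])
shared_23 = common_parts(subassemblies["SA2"], subassemblies["SA3"])
jac_13 = jaccard(subassemblies["SA1"], subassemblies["SA3"])
jac_23 = jaccard(subassemblies["SA2"], subassemblies["SA3"])
```

Under these assumptions both alternative metrics rank the pair SA2 and SA3 above the pair SA1 and SA3, in line with the reasoning in Scenario 1, because parts missing from both subassemblies no longer add to the score.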

Scenario 2. Consider the part–subassembly incidence matrix

in Table 14 with parts P1 through P7 and subassemblies SA1,

SA2, and SA3.

The similarity between subassemblies derived from this

incidence matrix is presented in Table 15, and the part–

subassembly incidence matrix organized with the modularity

algorithm is presented in Table 16.

Table 15
Scenario 2 similarity matrix
      SA1  SA2  SA3
SA1    7    0    0
SA2    0    7    0
SA3    0    0    7

Table 16
Scenario 2 organized incidence matrix
SA1 SA2 SA3
P1 1
P2 1
P3 1
P4 1
P5 1
P6 1
P7 1

Fig. 6. Clustered rows and columns of Fig. 5.

Fig. 7. Clustered rows of Fig. 5.

In this scenario, no two of the three subassemblies share any
parts. This is obvious from the similarity matrix in Table 15, as
every non-diagonal entry is 0. The same conclusion can be

reached from the organized incidence matrix of Table 16. This

scenario shows that the same result can be validated from

different perspectives.

The presented analysis has focused predominantly on the

similarity of components or functions. This represents an

important aspect of design; however, other aspects, e.g.,

maintainability, should also be considered. One way of handling

these additional design aspects is through constraints. The fact

that a number of alternative structures (matrices) is generated

may lead to satisfaction of multiple evaluation criteria.

4. Mass customization

Mass customization is defined as permitting “customized

manufacture on a mass basis” (Davis, 1989). According to Da

Silveira, Borenstein, and Fogliatto (2001), there have been

three drivers of mass customization. The first was due to the

advent of flexible manufacturing and information technologies

that enabled production systems to deliver a higher variety of

products at lower costs. The second was due to the fact that

consumers are constantly increasing their expectations for

product variety and customization. Finally, the shortening of

product life cycles and expanding industrial competition have led

to the shift from mass production, increasing the need for

production strategies focused on individual customers.

To realize the benefits of mass customization companies are

seeking new production strategies (Agard & Kusiak, 2004). The

two traditional production strategies are make-to-stock and

make-to-order. While the former strategy results in excessive

inventory and applies minimal pressure on the process set-up

reduction, the latter leads to low inventory levels and calls for

the process set-up reduction.

The assemble-to-order strategy offers a compromise
between the two traditional production strategies and supports
the mass customization concept. To illustrate this strategy,
consider five sales records of a simple tractor (see Fig. 5).

Fig. 5. Sales records of a simple tractor.

Clustering the data in Fig. 5 produces the matrix in Fig. 6.

The four subassemblies formed by S1 through S4 are used to

realize the assemble-to-order strategy.

An approach followed by some companies, short of the

assemble-to-order strategy, aims at developing preassembled

configurations at attractive prices. Grouping the rows (customers)

of the matrix in Fig. 5 has resulted in the matrix in Fig. 7.

The first two rows in Fig. 7 are labelled P1, the next two P2,

and the last row remains unlabeled. The configurations P1 and

P2 could be further transformed by offering two engine

upgrades, one cabin downgrade, and two backhoe upgrades to

the configurations shown in Fig. 8.

The transformation of the matrix in Fig. 5 into the matrix in

Fig. 6 was accomplished by changing the sequence of rows and

columns with a similarity-based algorithm (Kusiak, 2000). The

matrix in Fig. 7 was obtained from the matrix in Fig. 5 by

sequencing of rows using the same algorithm.
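The idea of grouping customers (rows) into candidate preassembled configurations can be sketched with a much simpler rule than the similarity-based algorithm cited above, which is not reproduced here: customers who ordered exactly the same set of options are placed in one group. The sales records, option names, and function name below are hypothetical and do not reproduce Fig. 5.

```python
from collections import defaultdict


def group_by_configuration(records):
    """Group customers whose orders contain exactly the same options.

    records -- dict mapping customer id -> set of ordered options
    Returns a list of customer groups, largest group first.
    """
    groups = defaultdict(list)
    for customer, options in records.items():
        groups[frozenset(options)].append(customer)
    # sorted() is stable, so equally large groups keep first-seen order
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical sales records of a simple tractor.
records = {
    "C1": {"engine_E1", "cabin"},
    "C2": {"engine_E2", "backhoe"},
    "C3": {"engine_E1", "cabin"},
    "C4": {"engine_E2", "backhoe"},
    "C5": {"engine_E1", "backhoe", "cabin"},
}
groups = group_by_configuration(records)
```

Here two configurations are each ordered twice and one order remains on its own, mirroring the situation in which two preassembled configurations are offered and the remaining customer is served individually. A similarity-based method would additionally place near-identical orders next to each other.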

5. Supply chain management

A supply chain is a contractual linkage among various

parties, ideally to achieve a “just-in-time” flow of goods. The

purpose of the supply chain designer is to quickly generate the

electronic trade scenarios. Supply chain management involves

the adoption of electronic linkages between two businesses that

are related as supplier/customer within a single industry

channel or supply chain (Westland & Clark, 1999).

Fig. 8. Transformed data of Fig. 7.


Data mining is a powerful tool for supply chain management,
especially in the e-commerce environment, as it can be
used to:

• Control the inventory (Chen, Huang, Chen, & Wu, 2005). For example, in retail business, inventory is very expensive and represents a large liability. Knowledge extracted by data mining can be used to analyze past business, monitor present transactions, and predict future sales. With better control of inventory, the retailer can achieve higher profitability. The same concept applies to distributors and manufacturers.
• Predict the customer’s behavior (Kim, Kim, & Lee, 2003). When customers buy products online, they typically want to receive the goods as soon as possible. In most cases, the manufacturing process takes time to produce goods. To meet the customer’s requirements, certain inventory levels have to be kept. Better predictions of customers’ behavior patterns make it possible to achieve a better balance between inventory levels and customers’ needs, and thus increase profitability.
• Support Customer Relationship Management (CRM), another active topic in recent supply chain research (Li, Shue, & Lee, 2006; Lin, Su, & Chien, 2006; Tseng & Huang, 2007). The CRM should integrate customer data with different resources. It should also provide a deeper understanding of customer behavior and needs.
• Reduce the level of risk to the business (Enke & Thawornwong, 2005; Huang, Hung, & Jiau, 2006). Most of the payments in e-commerce are through credit cards. Checking the customer’s credit history is a very important measure to reduce business risk. In the traditional supply chain, this check is time-consuming and error-prone. Imagine companies that receive thousands of orders per day. Each processing clerk can only spend a limited amount of time on each case to make decisions. The quality of the decision depends on his/her previous experience and intuition because there is not enough time to analyze all relevant data. Data-mining techniques can find useful patterns to support decisions. The rules are much easier for humans to understand than the raw data since the rules are extracted from large, otherwise incomprehensible data sets. Decisions based upon the extracted rules are likely to be more reliable.

Fig. 9. Major phases of the product life-cycle and the corresponding data.

Fig. 10. Integrated requirements tree.

6. Data-driven innovation

6.1. Concept introduction

Recent years have brought about a renewed interest in

innovation, especially after the Innovate America Report

(NIIR, 2004) was published. Though innovation has been a

subject of intensive studies by diverse research communities,

many will agree that the results produced have not translated

into meaningful innovation gains in the industry (Carlson &

Wilmot, 2006). Rather, industry is awaiting methodologies,

processes, and tools leading to innovation breakthroughs.

It appears that product life-cycle data is of importance to

innovation.

The approach presented here emphasizes innovation by

using the data collected throughout the product life cycle. This

data-driven innovation can be implemented by:

• Incorporating new functions, e.g., a copy machine plus

digitizer plus fax plus email.

• Incorporating inventions into existing artifacts.

• Integrating existing inventions.

• Extending existing inventions.

• Impacting the sales environment, e.g., marketing.

A product creates a data trail at every phase of its life cycle,

as illustrated in Fig. 9. Some of the data serves the existing

product while other data is stored for future use.

Various analyses could be performed at different product life-cycle

phases, including extraction of innovation-fostering

requirements. The locally extracted requirements could be integrated

into an innovation-inspiring requirements tree (see Fig. 10).

Fig. 11. Discovery of innovation principles through data and text mining.

Fig. 12. Matrix structuring: (a) decomposed matrix; (b) organized dependency

structure matrix.

A. Kusiak, M. Smith / Annual Reviews in Control 31 (2007) 147–156 155

The vector (genome) X of requirements would be grouped

into classes (chromosomes), such as (Kusiak, 2007):

• Function;

• Form;

• Culture;

• Surprise.

The ultimate goal is to define the genome X maximizing

function F(X) representing a certain goal, e.g., market share,

social needs. It needs to be stressed that the innovation goal

does not have to be necessarily business oriented, though it

often may lead to economic benefits.
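A minimal sketch of this formulation follows. It assumes that each gene is a binary decision to include one requirement, that F(X) is the total benefit of the included requirements, and that a cost budget constrains the choice. The requirement names, benefit and cost values, and the budget are all hypothetical, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import product

# Hypothetical genes: (chromosome, requirement, benefit, cost).
GENES = [
    ("function", "copy", 5, 2),
    ("function", "scan", 4, 2),
    ("function", "fax", 1, 2),
    ("form", "compact", 3, 3),
    ("culture", "local_language_ui", 2, 1),
    ("surprise", "voice_control", 4, 4),
]
BUDGET = 8  # hypothetical limit on the total cost


def goal(genome):
    """F(X): total benefit of the selected genes, or -inf if over budget."""
    cost = sum(gene[3] for gene, bit in zip(GENES, genome) if bit)
    if cost > BUDGET:
        return float("-inf")
    return sum(gene[2] for gene, bit in zip(GENES, genome) if bit)


def best_genome():
    """Enumerate all 0/1 genomes and return one maximizing F(X)."""
    return max(product((0, 1), repeat=len(GENES)), key=goal)


def by_chromosome(genome):
    """Group the selected requirements by their chromosome (class)."""
    grouped = {}
    for (chromosome, name, _, _), bit in zip(GENES, genome):
        if bit:
            grouped.setdefault(chromosome, []).append(name)
    return grouped


best = best_genome()
print(goal(best), by_chromosome(best))
```

For realistic genome sizes the enumeration would be replaced by an integer programming or evolutionary search method; the sketch only fixes the vocabulary of genome, chromosome, and goal function.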

In addition, the genome could contain stochastic information

that could be represented in various ways, e.g., in the form of a

probability gene.

The list of requirements in Fig. 10 is by no means

exhaustive. As the innovation process becomes better

understood, the list of requirements will be modified. Among

the four types of requirements shown in Fig. 10, function

and form have received the most attention. There is a need

to study ‘‘soft’’ requirements, e.g., culture, surprise, and

others that will emerge, which could prove to be most

beneficial.

People have been fascinated with inventions and innovations

from the beginning of civilization. In fact, progress

and development across civilizations have been fueled by

human inventiveness. Analysis of the large volumes of

historical information, e.g., studying inventors and the

discovery of commonality among invention processes, could

lead to the creation of a body of innovation knowledge (see

Fig. 11).

6.2. Innovation modeling: an expanded dependency model

The requirements-driven approach to innovation advocated

in this paper has numerous merits. Many will agree that the

approach presented is valid, as it draws on a broad range of

requirements categorized as functional, form, culture, surprise,

and categories to be defined. Furthermore, a multi-source

approach is proposed for the generation of requirements.

It should be stressed that the innovation-fostering requirements

constitute a subset of all requirements, and most of them

will be expressed through the functional and form requirements.

For the expanded set of requirements, a relationship matrix

with the product or process functions and forms can be built. A

similar concept has been previously used with success in the

quality function deployment (Hauser & Clausing, 1988),

system decomposition (Kusiak, 1999), and dependency

structure matrix (Kusiak, Wang, He, & Feng, 1995). The latter

two concepts are illustrated in Fig. 12.

The decomposition concept is illustrated with the requirement-product

function matrix in Fig. 12(a) representing two

disjoint requirement-product function clusters. The rows of this

matrix represent the requirements collected from the sources

advocated in this paper, i.e., provided by customers and experts

or derived from the data stream collected over the product

life-cycle. The columns could include product functions (or product

parameters), process functions (or parameters), and parameters

representing other pertinent product life-cycle phases.
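One way to obtain such disjoint clusters is to treat the matrix as a bipartite graph of rows and columns and to collect its connected components. The sketch below makes that assumption; the matrix entries are illustrative and are not taken from Fig. 12(a), and the function names are hypothetical.

```python
# Hypothetical 0/1 matrix: rows are requirements, columns are functions.
MATRIX = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
]


def decompose(matrix):
    """Group rows and columns into disjoint clusters (connected components)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    seen_rows, seen_cols, clusters = set(), set(), []
    for start in range(n_rows):
        if start in seen_rows:
            continue
        rows, cols, frontier = [], [], [("r", start)]
        seen_rows.add(start)
        while frontier:
            kind, i = frontier.pop()
            if kind == "r":
                rows.append(i)
                # Visit every column linked to this row.
                for j in range(n_cols):
                    if matrix[i][j] and j not in seen_cols:
                        seen_cols.add(j)
                        frontier.append(("c", j))
            else:
                cols.append(i)
                # Visit every row linked to this column.
                for k in range(n_rows):
                    if matrix[k][i] and k not in seen_rows:
                        seen_rows.add(k)
                        frontier.append(("r", k))
        clusters.append((sorted(rows), sorted(cols)))
    return clusters


def reorder(matrix, clusters):
    """Permute rows and columns so that each cluster forms a diagonal block."""
    row_order = [i for rows, _ in clusters for i in rows]
    col_order = [j for _, cols in clusters for j in cols]
    return [[matrix[i][j] for j in col_order] for i in row_order]


clusters = decompose(MATRIX)
for line in reorder(MATRIX, clusters):
    print(line)
```

Columns without any nonzero entry belong to no cluster and are dropped by `reorder`. When the matrix does not separate perfectly, a clustering heuristic that tolerates a few off-block entries would be needed instead of exact components.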

In addition to the row–column matrix (e.g., Fig. 12(a)), one

can certainly consider row–row (e.g., Fig. 12(b)) and

column–column matrices to gain additional insights into the relationships

between the requirements themselves and the parameters

characterizing the product, process, and product life-cycle phases.
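As a hedged illustration, row–row and column–column matrices can be derived from a row–column matrix by counting shared entries. This assumes that two requirements are related when they touch a common parameter, which is only one possible definition of dependency and is not stated in the paper. The matrix values are hypothetical.

```python
# Hypothetical 0/1 matrix: 3 requirements (rows) by 4 parameters (columns).
MATRIX = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]


def row_row(matrix):
    """Entry (i, k) counts the columns shared by rows i and k."""
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) for row_k in matrix]
        for row_i in matrix
    ]


def col_col(matrix):
    """Entry (j, l) counts the rows shared by columns j and l."""
    transposed = [list(column) for column in zip(*matrix)]
    return row_row(transposed)


print(row_row(MATRIX))
print(col_col(MATRIX))
```

A nonzero off-diagonal entry marks a pair of requirements (or parameters) that should be analyzed together; the diagonal simply counts how many links each one has.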

Depending on the goal of innovation analysis, the innovation

matrix will take different forms and become the basis of the

innovation model. The model built based on the matrix will allow

the introduction of various constraints and objective functions,

thus covering diverse innovation optimization scenarios.

7. Conclusion

Although numerous successful applications of data mining

in design and manufacturing have been reported, many

challenges are ahead. Some challenges come from the data

mining itself and others come from the application domains.

The main challenges are as follows:


• Make greater use of unstructured information. Most data-mining

algorithms have been designed to process numeric or

textual data. Design and manufacturing systems could

provide other forms of data, e.g., geometry, audio, and

video. The top priority appears to be processing geometry.

• Integration of data-mining algorithms with the existing

applications. Most value-adding industrial software systems

interact with internal and external systems. Data-mining

applications should subscribe to time and spatial integration.

• Make data-mining models comprehensible to users. Users are

not data-mining experts. A good presentation format is

crucial for full utilization of data-mining results. Due to the

variety of formats, data-mining results can be expressed in

different ways. For example, the decision tree is a good

method of presenting classification results.

• Scalability of data-mining applications. Most data-mining

algorithms have been developed under the assumption that

the data is stored in operating memory. The size of databases

is constantly growing, and therefore data-mining software

needs to accommodate the growing number of parameters

and size of data sets.

• Standardization and legal aspects of data mining. To facilitate

exchange of data-derived knowledge, standards are needed.

Some standardization efforts have been initiated. Legal aspects

of knowledge exchange and use have been gaining momentum.
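The comprehensibility challenge listed above can be illustrated with a small sketch that renders a decision tree as if-then rules a non-expert can read. The tree, its attribute names, and its class labels are hypothetical, and the nested-tuple representation is an assumption made for brevity.

```python
# Hypothetical tree: an internal node is (attribute, {value: subtree});
# a leaf is a class label given as a string.
TREE = ("surface_finish", {
    "rough": "rework",
    "smooth": ("tolerance", {
        "in_spec": "accept",
        "out_of_spec": "rework",
    }),
})


def to_rules(tree, conditions=()):
    """Flatten a decision tree into a list of readable if-then rules."""
    if isinstance(tree, str):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {tree}"]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, value),)))
    return rules


for rule in to_rules(TREE):
    print(rule)
```

Each path from the root to a leaf becomes one rule, so the number of rules equals the number of leaves.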

The most lasting value provided by data mining might be

that of innovation. As data plays an increasingly important role

in innovation, data-mining algorithms are likely to make

valuable contributions to the innovation challenge.

References

Agard, B., & Kusiak, A. (2004). A data-mining based methodology for the

design of product families. International Journal of Production Research,

42(15), 2955–2969.

Boly, V. (2004). Ingenierie de l’innovation: Organisation et methodologies des

entreprises innovantes. Paris, France: Lavoisier.

Carlson, C. R., & Wilmot, W. W. (2006). Innovation: The five disciplines for

creating what customers want. New York: Random House.

Chandrasekaran, B., Stone, R., & McAdams, D. (2004). Developing design

templates for product platform focused design. Journal of Engineering

Design, 15(3), 209–228.

Chen, M.-C., Huang, C.-L., Chen, K. Y., & Wu, H.-P. (2005). Aggregation of

orders in distribution centers using data mining. Expert Systems with

Applications, 28(3), 453–460.

Da Cunha, C., Agard, B., & Kusiak, A. (2006). Data mining for improvement of

product quality. International Journal of Production Research, 44(18/19),

4027–4041.

Dahmus, J., Gonzalez-Zugasti, J., & Otto, K. (2001). Modular product architecture.

Design Studies, 22(5), 409–424.

Davis, S. (1989). From ‘‘future perfect’’: Mass customizing. Planning Review,

17(2), 16–21.

Da Silveira, G., Borenstein, D., & Fogliatto, F. (2001). Mass customization:

Literature review and research directions. International Journal of Production

Economics, 72(1), 1–13.

Harding, J. A., Shahbaz, M., Srinivas, S., & Kusiak, A. (2006). Data mining in

manufacturing: A review. ASME Transactions: Journal of Manufacturing

Science and Engineering, 128(4), 969–976.

Hauser, J., & Clausing, D. (1988). The house of quality. Harvard Business

Review, 66(3), 63–73.

Huang, C., & Kusiak, A. (1998). Modularity in design of products and systems.

IEEE Transactions on Systems, Man, and Cybernetics: Part A, 28(1), 66–77.

Huang, Y.-M., Hung, C.-M., & Jiau, H. C. (2006). Evaluation of neural networks

and data mining methods on a credit assessment task for class imbalance

problem. Nonlinear Analysis: Real World Applications, 7(4), 720–747.

Enke, D., & Thawornwong, S. (2005). The use of data mining and neural

networks for forecasting stock market returns. Expert Systems with Applications,

29(4), 927–940.

Kim, E., Kim, W., & Lee, Y. (2003). Combination of multiple classifiers for the

customer’s purchase behavior prediction. Decision Support Systems, 34(2),

167–175.

Kusiak, A. (1999). Engineering design: Products, processes, and systems. San

Diego, CA: Academic Press.

Kusiak, A. (2006). Data mining: Manufacturing and service applications.

International Journal of Production Research, 44(18/19), 4175–4191.

Kusiak, A. (2007). Innovation science: A primer. International Journal of

Computer Applications in Technology, 28(2–3), 140–149.

Kusiak, A. (2000). Computational intelligence in design and manufacturing.

New York: John Wiley.

Kusiak, A., & Huang, C. (1996). Development of modular products. IEEE

Transactions on Components, Packaging, and Manufacturing Technology,

Part A, 19(4), 523–538.

Kusiak, A., & Song, Z. (2006). Combustion efficiency optimization and virtual

testing: A data-mining approach. IEEE Transactions on Industrial Informatics,

2(3), 176–184.

Kusiak, A., Wang, J., He, D. W., & Feng, C. X. (1995). A structured approach

for analysis of design processes. IEEE Transactions on Components,

Packaging and Manufacturing Technology: Part A, 18(3), 664–673.

Li, S.-T., Shue, L.-Y., & Lee, S.-F. (2006). Enabling customer relationship

management in ISP services through mining usage patterns. Expert Systems

with Applications, 30(4), 621–632.

Lin, Y., Su, H. Y., & Chien, S. (2006). A knowledge-enabled procedure for

customer relationship management. Industrial Marketing Management,

35(4), 446–456.

NIIR. (2004). Innovate America. Washington, DC: Council on Competitiveness,

National Innovation Initiative Report.

Pahl, G., & Beitz, W. (1988). Engineering design: A systematic approach.

London, UK: Springer.

Stone, R., Wood, K., & Crawford, R. (2000). A heuristic method for identifying

modules for product architectures. Design Studies, 21(1), 5–31.

Tseng, T. Z., & Huang, C.-C. (2007). Rough set-based approach to feature

selection in customer relationship management. Omega, 35(4), 365–383.

Ulrich, K., & Tung, K. (1991). Fundamentals of product modularity. In A.

Sharon (Ed.), Issues in design/manufacture integration, DE 39. New York:

ASME.

Westland, J. C., & Clark, T. H. K. (1999). Global electronic commerce: Theory

and case studies. Cambridge, MA: MIT Press.

Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning

tools and techniques. New York: Elsevier.

Dr. Andrew Kusiak is a Professor in the Department of Mechanical and

Industrial Engineering at the University of Iowa in Iowa City, Iowa. He is

interested in applications of computational intelligence in automation, energy,

manufacturing, product development, and healthcare. Dr. Kusiak has published

numerous books and technical papers in journals sponsored by professional

societies, such as AAAI, ASME, IEEE, IIE, ESOR, IFIP, IFAC, INFORMS,

ISPE, and SME. He speaks frequently at international meetings, conducts

professional seminars, and consults for industrial corporations. Dr. Kusiak has

served on editorial boards of over 35 journals. He is an IIE Fellow and the

Editor-in-Chief of the Journal of Intelligent Manufacturing.

Matthew R. Smith is a graduate student in Industrial Engineering in the

Department of Mechanical and Industrial Engineering at the University of

Iowa, Iowa City, IA. He has obtained a BS degree in Industrial Engineering from

the same department and is interested in applications of operations research in

engineering design and manufacturing. He is a member of the Intelligent

Systems Laboratory.