Data mining in design of products and production systems
www.elsevier.com/locate/arcontrol
Annual Reviews in Control 31 (2007) 147–156
Andrew Kusiak *, Matthew Smith
Intelligent Systems Laboratory, Industrial Engineering, 3131 Seamans Center,
The University of Iowa, Iowa City, IA 52242-1527, USA
Received 9 December 2006; accepted 5 March 2007
Available online 9 April 2007
Abstract
Data mining is acquiring its own identity by refining concepts from other disciplines, developing generic algorithms, and entering new
application areas. Engineering design and manufacturing have been affected by the data mining pursuit. This paper outlines areas of product and
manufacturing system design that are particularly suitable for data-mining applications. One of the emerging areas is innovation. The key
challenges of data mining in the domains discussed in the paper are outlined.
© 2007 Elsevier Ltd. All rights reserved.
Keywords: Data mining; Data analysis; Product design; Manufacturing; Innovation; Production systems
1. Introduction
Corporations are interested in innovative ways of conducting
their business. Some innovation can be attributed to the
growing use of data in design and manufacturing.
Traditionally, the flow of data and information in design and
manufacturing systems has been essentially unidirectional as
illustrated in Fig. 1.
Any local bidirectional flow (loops) of information has often
been attributed to imperfections of the process, e.g., design
negotiation, manufacturing errors. The developments in
networking, data warehousing, and data mining have con-
tributed to the emergence of the closed loop system illustrated
in Fig. 2.
Products and components generate a data trail across life-
cycle phases such as market analysis, design engineering,
manufacturing, and service. Data-mining algorithms extract
knowledge from this large volume of data leading to significant
improvements in the next generation of products and services.
In fact, the knowledge discovery activity could become the key
factor to innovation and business success.
The basic capabilities of data analysis tools are outlined
next.
* Corresponding author. Tel.: +1 319 335 5934; fax: +1 319 335 5669.
E-mail address: [email protected] (A. Kusiak).
URL: http://www.icaen.uiowa.edu/~ankusiak
1367-5788/$ – see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.arcontrol.2007.03.003
1.1. Pattern discovery
Important patterns might be hidden in the industrial data. For
example, data mining applied to the customer domain may
reveal answers to questions such as
• What characterizes frequent buyers?
• What characterizes customers who react to promotions?
• What characterizes customers making quick purchase decisions?
• What characterizes customers who do not purchase?
Most database systems, such as MS-Access and Oracle, offer
query capabilities that answer some of these higher-level
questions. However, for in-depth analysis, data-mining
algorithms are needed (Witten & Frank, 2005).
1.2. Trends detection
Industrial companies are increasingly developing data
warehouses to collect business data. Data-mining algorithms
can not only extract the static patterns in data, but can also
discover dynamic trends. Mining time series is an active
research area (Kusiak & Song, 2006). The trends reflect
customer interest shifts, technology development, and the
response to marketing strategies.
Fig. 1. Data flow in traditional design and manufacturing systems.
A. Kusiak, M. Smith / Annual Reviews in Control 31 (2007) 147–156
1.3. Data dimensionality reduction
Modern databases may contain a large number of rows
(transactions) and columns (features). An important research
area is the concept of dimensionality reduction. Unrelated data
items and features can be eliminated from the dataset to reduce
the data-mining effort.
1.4. Visualization
Visualization tools enhance human understanding of data.
For example, graphs, charts, and tables make information easier
to understand than the original data. The relationships between
different data items become obvious when they are displayed.
To make full use of the data visualization tools, data and
knowledge are needed.
2. Knowledge discovery
There are two general classes of data mining, descriptive and
predictive. The goal of descriptive data mining is to discover
patterns, e.g., product configurations formed in mass customization
applications. Predictive data mining aims at building
models to determine (predict) an outcome, e.g., a stock level.
Since the width of data analyzed by the data-mining algorithms
is essentially unlimited, the patterns discovered are usually not
anticipated and are of interest to different users. The value
delivered by these patterns is related to the quality of data and
textual databases. Besides the comprehensiveness of data
processing, data mining brings yet another advantage: it
supports the needs of an individual object, e.g., a part or a
customer.
Fig. 2. Data and knowledge flow in a modern design and manufacturing system.
Data-mining algorithms have been successfully deployed in
engineering, medical, and business applications (Da Cunha,
Agard, & Kusiak, 2006; Harding, Shahbaz, Srinivas, & Kusiak,
2006; Kusiak, 2006). The design and manufacturing domain is
a natural candidate for data-mining applications because it
contains extensive data. Besides enhancing innovation, data-
mining methods can reduce the risks associated with
conducting business and improve decision-making.
Some of the most widely used data-mining algorithms are
(Witten & Frank, 2005):
• Decision-tree algorithms.
• Decision-rule algorithms.
• Bayesian algorithms.
• Neural networks.
• Clustering.
• Regression.
The goal of data mining may range from obtaining a general
understanding of the nature of data to very accurate modeling
and prediction, e.g.:
• Data description and summarization. Description of data
characteristics, typically in elementary and aggregated form.
• Segmentation. Separation of data into interesting and
meaningful subgroups or classes.
• Concept description. Description of concepts or classes in an
understandable form.
• Dependency analysis. Finding a model that describes
significant dependencies between objects or events.
• Classification. Building classification models that assign a
correct class (label) to previously unseen and unlabeled
objects.
For data mining to be effective, several technologies have to
work together. Data-mining algorithms extract patterns from
data to create a meaning that otherwise would be non-existent.
Visualization techniques provide visual understanding of data,
rules, patterns and trends. Data warehousing is critical for
organizing, cleaning and preparing data for mining. The
computer network infrastructure is important, especially for
distributed data mining. These technologies need to be
integrated for effective data mining.
Some of the applications of data mining in design and
manufacturing are discussed in the next section of the paper.
3. Data-driven design
Engineering design has been lagging in the development of
data mining; however, the potential for benefits is significant.
Product complexity reduction and modularity are two of many
potential examples discussed next.
Increasing the modularity among products is a common goal
for many companies. Some of the benefits of modularity
include the potential for (Chandrasekaran, Stone, & McAdams,
2004; Kusiak & Huang, 1996):
• economy of scale,
• increased feasibility of product-component change,
• increased product variety,
• reduced order lead-time,
• decoupled risks,
• easier product diagnosis, maintenance, repair, and disposal,
• part reduction,
• simplified design.
Fig. 3. Example of a product family.
Fig. 4. Example of modules constructed from lower-level modules.
Modularity may be implemented at different levels such as
products, assemblies, or even components. It is accomplished by
combining functions into distinct building blocks or modules
(Pahl & Beitz, 1988). The modules themselves can be defined as
physical structures that have a one-to-one correspondence with
functional structures (Ulrich & Tung, 1991).
Three main types of modularity are discussed in the
literature: component swapping, bus modularity, and compo-
nent sharing (Ulrich & Tung, 1991). ‘‘Component swapping
can be considered as a means to improve the versatility of a
product by enabling different levels of performance and/or
different applications’’ (Chandrasekaran et al., 2004). Exam-
ples of this are installing winter tires in a car to improve
handling in icy conditions or switching the faceplate on a
cellular phone to change its appearance. Bus modularity is used
when a module with two or more interfaces can be matched
with any number of the components selected from a set of basic
components. The module interfaces accept any combination of
the basic components (Huang & Kusiak, 1998). Component-
sharing modularity is used when one component is designed for
usage across multiple products. An example of component
sharing can be seen in the Black and Decker® VersaPak™
portfolio of products. Each product in the portfolio shares the
same battery (Dahmus, Gonzalez-Zugasti, & Otto, 2001). A
consumer can then purchase one battery and share the battery
across the different products s/he owns.
The current literature focuses on designing modules for
finished products. Examples of this can be found with the
VersaPak™ portfolio of products (Dahmus et al., 2001),
general consumer products (Stone, Wood, & Crawford, 2000),
and automobiles (Dahmus et al., 2001). An area explored in this
paper identifies groups of components that can be used to build
modules. Using sub-modules offers the same benefits as
implementing modularity at the product level.
Identifying potential modules requires data related to
product functionality, part functionality, energy flow through
the product, and component interactions (Huang & Kusiak,
1998). In industrial applications, generating and gathering such
data is difficult. Much of the data needed may not be previously
available, and domain experts may be needed to generate this
data.
This paper presents modularity identification methods that
require limited information. The results produced are often in
the form of logical modules rather than physical modules. A
logical module is a collection of parts that are used across
numerous products. These parts may or may not have the ability
to form physical modules; however, there are many benefits of
identifying logical modules. For example, logical modules are
useful in supply chain management. By identifying parts that
are commonly used, demand uncertainty across different
products can be pooled together. This, in turn, offers the ability
to plan orders for parts and/or raw materials (Fig. 3).
This paper discusses component sharing modularity,
specifically in cases internal to the product (unlike the
VersaPak™ example).
3.1. Industrial case study
In this case study, industrial data obtained for a product
family was used. The product family consisted of ten products.
Each product contains a base which is unique to the specific
product. In addition each product includes a deck, frame,
operator station, and engine which are collectively referred to
as feature groups. Each feature group (deck, frame, operator
station, and engine) consists of n different subassemblies. Each
of these subassemblies is used by a subset of the ten products.
The example in Fig. 4 illustrates feature groups forming a
product. This product is of a modular nature at the feature-
group level. The case study is concerned with the modularity of
the subassemblies implemented at the feature-group level
illustrated in Fig. 4, where engine subassemblies 1, 2, and 3 are
used to construct engines A, B, and C.
In the case study, one data set was provided for each feature
group in the product family. Each data set contained a list of the
different products that were present in that feature. All data was
extracted from an industrial ERP (Enterprise Resource
Planning) system.
Table 1
Example of a part–subassembly incidence matrix
Part SA1 SA2 SA3 SA4 SA5
1 1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 1 1
7 1 1 1
8 1 1 1
3.1.1. Modularity methodology
The modularity methodology discussed in this paper
identifies similar subassemblies within and across different
products based on the data from an existing industrial database.
The similarity is measured by the number of common parts
used by these subassemblies.
Similarity between subassemblies may not be obvious;
however, it can be visualized using the matrix representation
illustrated in Example 1.
Example 1. Table 1 is a part–subassembly incidence matrix
with each row representing a part and each column representing
a subassembly. The information represented in Table 1 is a
scaled-down version of the data extracted from the industrial
database. An entry 1 in cell (i, j) indicates that part i is included
in subassembly j. It is difficult to tell if any of the subassemblies
in Table 1 share similar part lists.
The data of Table 1 can be used to compute the similarity
between subassemblies (see Table 2). The similarity metric
between any two subassemblies a and b is defined in (1).
$$ s_{ab} = \sum_{i=1}^{n} d_i \qquad (1) $$
where $d_i = 1$ if for part $i$ the entries corresponding to the
subassemblies $a$ and $b$ are equal; otherwise $d_i = 0$; $n$ = the total
number of parts.
The data in Table 2 provides useful insights into modularity.
For example, the similarity between subassemblies SA1 and
SA4 as well as the subassemblies SA3 and SA5 is 7.
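The metric in (1) can be sketched in a few lines of Python. The incidence matrix used here is a small hypothetical example (it is not the data of Table 1), and the function name `matching_similarity` is an illustrative choice rather than a name used in the paper.

```python
from typing import List


def matching_similarity(incidence: List[List[int]]) -> List[List[int]]:
    """Similarity of Eq. (1): count the parts on which two subassemblies agree.

    incidence[i][j] is 1 if part i is included in subassembly j, else 0.
    Two subassemblies agree on part i when both contain it or both omit it.
    """
    n_parts = len(incidence)
    n_subs = len(incidence[0])
    sim = [[0] * n_subs for _ in range(n_subs)]
    for a in range(n_subs):
        for b in range(n_subs):
            sim[a][b] = sum(
                1 for i in range(n_parts) if incidence[i][a] == incidence[i][b]
            )
    return sim


# Hypothetical example: rows are parts, columns are subassemblies.
example = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
similarity = matching_similarity(example)
for row in similarity:
    print(row)
```

As in Table 2, each diagonal entry equals the total number of parts, and a part that is absent from both subassemblies counts as agreement.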
Table 2
Similarities between the subassemblies of Table 1
      SA1  SA2  SA3  SA4  SA5
SA1    8    2    2    7    3
SA2    2    8    4    3    3
SA3    2    4    8    1    7
SA4    7    3    1    8    2
SA5    3    3    7    2    8
For the logical modules formed, approaches aimed at
increased customer satisfaction, such as delayed product
differentiation and assembly-to-order strategies, can be
followed (Kusiak, 1999). For example, if two subassemblies
share 57 out of 60 parts, the similar parts (or at least a subset of
them) can be either positioned for assembly or assembled early
so that the remaining three parts can be added to the assembly
when an order for a specific assembly has arrived. Exploring the
similarities and dissimilarities among subassemblies may lead
to the redesign of some parts.
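The split between parts that can be staged early and parts that are added once an order arrives can be sketched with set operations. The part identifiers and the helper name `split_for_delayed_differentiation` are hypothetical; the sketch shows only the bookkeeping, not a production planning method.

```python
from typing import Dict, Set, Tuple


def split_for_delayed_differentiation(
    parts_a: Set[str], parts_b: Set[str]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the shared core and the order-specific parts of two subassemblies."""
    common_core = parts_a & parts_b
    late_parts = {"a": parts_a - parts_b, "b": parts_b - parts_a}
    return common_core, late_parts


# Hypothetical part lists: 57 shared parts plus 3 unique parts in each
# subassembly, giving 60 parts per subassembly as in the example above.
shared = {f"P{k}" for k in range(1, 58)}
sub_a = shared | {"A1", "A2", "A3"}
sub_b = shared | {"B1", "B2", "B3"}

core, late = split_for_delayed_differentiation(sub_a, sub_b)
print(len(core), sorted(late["a"]), sorted(late["b"]))
```

The shared core is a candidate for early assembly, while the parts in `late` are added only after the order identifies the specific subassembly.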
3.2. Identifying common parts
Product data stored in a part–subassembly incidence matrix
indicates commonly used parts. To reveal the information
contained in this incidence matrix, a greedy modularity
algorithm is proposed.
The modularity algorithm is partially based on the concept
used in design of facilities. Rather than using the traffic
intensity between facilities, the modularity algorithm is based
on similarity between parts. The similarity metric used here is
the same as in (1). The difference is that the similarity measured
here is between parts rather than subassemblies. The
modularity algorithm based on the part–subassembly incidence
matrix is presented next.
The greedy modularity algorithm:
1. Randomly select a part for the solution set
2. Determine the similarity between the selected part and every other part
3. Find the maximum similarity value
4. Place the part corresponding to the maximum similarity next to the
selected part in the solution set
5. Label the part on the right of the solution set i and the part on the left
of the solution set j
6. Determine the similarity between part i and every part not included in
the solution set. Determine the similarity between part j and every part
not in the solution set
7. Place the part with the maximum similarity from Step 6 next to
its corresponding part (to the right if part i, to the left if part j)
8. Determine if all parts are in the solution set. If not, go back to Step 5
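The eight steps can be sketched in Python as follows. This is one possible reading of the steps, not the authors' implementation: the paper does not state how ties are broken, so the sketch assumes the lowest part index wins and the right end is tried first. The optional `start` argument and the example matrix are hypothetical additions that make the run repeatable.

```python
import random
from typing import List, Optional


def part_similarity(row_a: List[int], row_b: List[int]) -> int:
    """Number of subassemblies in which two parts have equal entries."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y)


def greedy_modularity(
    incidence: List[List[int]], start: Optional[int] = None
) -> List[int]:
    """Return a sequence of part indices built by the greedy steps.

    Assumption: ties go to the lowest part index, right end first.
    """
    n = len(incidence)
    if n == 0:
        return []
    # Step 1: select the first part (at random unless a start index is given).
    first = random.randrange(n) if start is None else start
    solution = [first]
    remaining = [p for p in range(n) if p != first]
    if not remaining:
        return solution
    # Steps 2-4: place the most similar part next to the selected part.
    best = max(
        remaining, key=lambda p: part_similarity(incidence[first], incidence[p])
    )
    solution.append(best)
    remaining.remove(best)
    # Steps 5-8: extend the sequence at the end that offers the highest similarity.
    while remaining:
        right, left = solution[-1], solution[0]
        best_part, best_sim, at_right = remaining[0], -1, True
        for p in remaining:
            sim_right = part_similarity(incidence[right], incidence[p])
            sim_left = part_similarity(incidence[left], incidence[p])
            if sim_right > best_sim:
                best_part, best_sim, at_right = p, sim_right, True
            if sim_left > best_sim:
                best_part, best_sim, at_right = p, sim_left, False
        if at_right:
            solution.append(best_part)
        else:
            solution.insert(0, best_part)
        remaining.remove(best_part)
    return solution


# Hypothetical matrix: rows are parts 0..5, columns are four subassemblies.
parts = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
order = greedy_modularity(parts, start=0)
print(order)
```

In this hypothetical run, parts with identical usage patterns (0 and 2, 1 and 3) end up adjacent in the sequence, which is the grouping effect the algorithm is meant to expose.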
The greedy modularity algorithm is illustrated in Examples
2 and 3.
Example 2. The relationships between parts and subassem-
blies are represented as the incidence matrix in Table 3.
Part 1 is selected for the solution set at random. The
similarity values between Part 1 and all other parts are shown in
Table 4. Part 1 is isolated from the other parts in the solution set.
The bold similarity value (for Part 4) indicates the similarity of
the part which is to be added next to the solution set (the highest
similarity part).
Table 3
Part–subassembly incidence matrix
Part Subassembly
A B C D
1 1 1 1
2 1 1 1
3 1 1
4 1 1 1
5 1 1
6 1 1 1
Table 4
Step 2 of the greedy modularity algorithm
Table 5
Matrix illustrating a partial solution
Table 6
Iteration 2
Table 7
Iteration 3
Table 8
Iteration 4
Part 4 is incorporated into the solution set next to Part 1, as it
has the maximum similarity value. Table 5 shows the next step
of the algorithm. The column labeled Sim 1 contains the
similarities between the part sequenced first and the remaining
parts. The column labeled Sim 2 includes the similarities
between the last part in the solution set and the remaining parts.
Tables 6–8 show the results produced in the subsequent
iterations of the algorithm.
Table 9 provides the sequence of the parts generated by the
greedy modularity algorithm. It can be seen from this example
that parts which share similar features can be grouped.
Example 3. Consider a group of seven parts (Group A,
Parts 1–7) that extends across five different subassemblies and a
group of three parts (Group B, Parts 8–10) that extends across
four subassemblies (Table 10). Group C (Parts 11–13)
contains parts that perform the function of the Group B parts in
SA 5.
Groups A, B, and C form logical modules. As the
functionality of Group B and Group C is identical, Parts 8
through 10 could be redesigned to accommodate the
functionality of both groups. This way the part count could
be lowered, and one large module could be considered instead
of three smaller ones. The feasibility of such changes is product
dependent; however, the modularity algorithm identifies
candidates for such design modifications.
It is also important to note here that while the main goal is
finding candidates for physical modules, the algorithm
identifies logical modules. Some of the logical modules can
then become physical modules.
3.3. Combining results from similar subassemblies and
common parts
Due to the complementary nature of identifying similar
subassemblies and common parts, the results can be combined
to benefit modularity. The two scenarios presented next
highlight some of the potential benefits.
Table 9
The final solution
Table 10
Part–subassembly matrix for Example 3
Table 13
Scenario 1 organized incidence matrix
      SA1  SA2  SA3
P1     1    1    1
P7     1    1    1
P3     0    1    1
P2     0    1    1
P5     0    1    0
P6     0    1    0
P4     0    1    0
Table 14
Scenario 2 incidence matrix
SA1 SA2 SA3
P1 1
P2 1
P3 1
P4 1
P5 1
P6 1
P7 1
Scenario 1. Consider the part–subassembly incidence matrix
in Table 11 with parts P1 through P7 and subassemblies SA1,
SA2, and SA3.
The similarity matrix derived from this incidence matrix is
presented in Table 12, and the part–subassembly incidence
matrix organized with the greedy modularity algorithm is
presented in Table 13.
Table 11
Scenario 1 incidence matrix
      SA1  SA2  SA3
P1     1    1    1
P2     0    1    1
P3     0    1    1
P4     0    1    0
P5     0    1    0
P6     0    1    0
P7     1    1    1
Table 12
Scenario 1 similarity matrix
SA1 SA2 SA3
SA1 7 2 5
SA2 2 7 4
SA3 5 4 7
Based solely on the similarity matrix (Table 12), SA3 is
similar to both SA1 (s13 = 5) and SA2 (s23 = 4). Based on the
data from the similarity matrix, it is difficult to determine the
best candidates for modularity. However, the organized
incidence matrix (Table 13) shows that while the similarities
are almost identical, the underlying structures are different. The
subassemblies SA1 and SA3 share two parts, while the remaining three
matches are parts absent from both subassemblies. The subassemblies
SA2 and SA3 involve four common parts. Even though the
similarity between SA1 and SA3 is higher, SA2 and SA3 could
be better candidates for modularity as they share more common
parts.
Using other similarity metrics could simplify the analysis
and provide additional insights into the part commonality and
modularity issues. Analysis of the relationship between
subassemblies with another similarity metric is illustrated in
Scenario 2.
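One candidate for such an alternative metric is the number of parts that two subassemblies both contain, so that shared absences no longer count. This choice is an assumption made for illustration, since the paper does not name the metric it uses. The sketch applies it to the Scenario 1 data, with cell positions chosen to agree with the similarity values and shared-part counts quoted above; the function name `common_part_count` is hypothetical.

```python
from typing import List


def common_part_count(incidence: List[List[int]]) -> List[List[int]]:
    """Count the parts that two subassemblies both contain (shared 1 entries only)."""
    n_subs = len(incidence[0])
    return [
        [sum(row[a] * row[b] for row in incidence) for b in range(n_subs)]
        for a in range(n_subs)
    ]


# Rows P1..P7, columns SA1..SA3. The cell positions are an assumption chosen
# to match the values quoted in the text for Scenario 1.
scenario_1 = [
    [1, 1, 1],  # P1
    [0, 1, 1],  # P2
    [0, 1, 1],  # P3
    [0, 1, 0],  # P4
    [0, 1, 0],  # P5
    [0, 1, 0],  # P6
    [1, 1, 1],  # P7
]
shared_counts = common_part_count(scenario_1)
print(shared_counts)
```

Under this metric SA2 and SA3 score 4 while SA1 and SA3 score 2, so the ranking matches the conclusion drawn from the organized incidence matrix. For subassemblies that share no parts, every off-diagonal count is 0.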
Scenario 2. Consider the part–subassembly incidence matrix
in Table 14 with parts P1 through P7 and subassemblies SA1,
SA2, and SA3.
The similarity between subassemblies derived from this
incidence matrix is presented in Table 15, and the part–
subassembly incidence matrix organized with the modularity
algorithm is presented in Table 16.
In this scenario, none of the three subassemblies share any
parts. This is obvious from the similarity matrix in Table 15, as
every non-diagonal entry is 0. The same conclusion can be
reached from the organized incidence matrix of Table 16. This
scenario shows that the same result can be validated from
different perspectives.
Table 15
Scenario 2 similarity matrix
      SA1  SA2  SA3
SA1    7    0    0
SA2    0    7    0
SA3    0    0    7
Table 16
Scenario 2 organized incidence matrix
SA1 SA2 SA3
P1 1
P2 1
P3 1
P4 1
P5 1
P6 1
P7 1
Fig. 6. Clustered rows and columns of Fig. 5.
Fig. 7. Clustered rows of Fig. 5.
The presented analysis has focused predominantly on the
similarity of components or functions. This represents an
important aspect of design; however, other aspects, e.g.,
maintainability, should be considered. One way of handling
these additional design aspects is through constraints. The fact
that a number of alternative structures (matrices) is generated
may lead to satisfaction of multiple evaluation criteria.
4. Mass customization
Mass customization is defined as permitting ‘‘customized
manufacture on a mass basis’’ (Davis, 1989). According to Da
Silveira, Borenstein, and Fogliatto (2001), there have been
three drivers of mass customization. The first was due to the
advent of flexible manufacturing and information technologies
that enabled production systems to deliver a higher variety of
products at lower costs. The second was due to the fact that
consumers are constantly increasing their expectations for
product variety and customization. Finally, the shortening of
product life cycles and expanding industrial competition have led
to the shift from mass production, increasing the need for
production strategies focused on individual customers.
To realize the benefits of mass customization companies are
seeking new production strategies (Agard & Kusiak, 2004). The
two traditional production strategies are make-to-stock and
make-to-order. While the former strategy results in excessive
inventory and applies minimal pressure on the process set-up
reduction, the latter leads to low inventory levels and calls for
the process set-up reduction.
The assemble-to-order strategy offers a compromise
between the two traditional production strategies and supports
the mass customization concept. To illustrate this strategy,
consider five sales records of a simple tractor (see Fig. 5).
Fig. 5. Sales records of a simple tractor.
Clustering the data in Fig. 5 produces the matrix in Fig. 6.
The four subassemblies formed by S1 through S4 are used to
realize the assembly-to-order strategy.
An approach followed by some companies, short of the
assembly-to-order strategy, aims at developing preassembled
configurations at attractive prices. Grouping the rows (custo-
mers) of the matrix in Fig. 5 has resulted in the matrix in Fig. 7.
The first two rows in Fig. 7 are labelled P1, the next two P2,
and the last row remains unlabeled. The configurations P1 and
P2 could be further transformed by offering two engine
upgrades, one cabin downgrade, and two backhoe upgrades to
the configurations shown in Fig. 8.
The transformation from the matrix in Fig. 5 to the matrix in
Fig. 6 was accomplished by changing the sequence of rows and
columns with a similarity-based algorithm (Kusiak, 2000). The
matrix in Fig. 7 was obtained from the matrix in Fig. 5 by
sequencing of rows using the same algorithm.
5. Supply chain management
A supply chain is a contractual linkage among various
parties ideally to achieve a ‘‘just-in-time’’ flow of goods. The
purpose of the supply chain designer is to quickly generate the
electronic trade scenarios. Supply chain management involves
the adoption of electronic linkages between two businesses that
are related as supplier/customer within a single industry
channel or supply chain (Westland & Clark, 1999).
Fig. 8. Transformed data of Fig. 7.
Data mining is a powerful tool for supply chain manage-
ment, especially in the e-commerce environment, as it can be
used to:
• Control the inventory (Chen, Huang, Chen, & Wu, 2005). For
example, in retail business, inventory is very expensive and
represents a large liability. Knowledge produced by data mining
can be used to analyze past business, monitor present
transactions, and predict future sales. With better control of
inventory, the retailer can achieve higher profitability. The same
concept applies to distributors and manufacturers.
• Predict the customer’s behavior (Kim, Kim, & Lee, 2003).
When customers buy products online, they always want to
receive the goods as soon as possible. In most cases, the
manufacturing process takes time to produce goods. To meet
the customer’s requirements, certain inventory levels have to
be kept. If we have better predictions of customers’ behavior
patterns, we can achieve a better balance between inventory
levels and customers’ needs, and thus increase profitability.
• Support Customer Relationship Management (CRM), another
active issue in recent supply chain research (Li, Shue, &
Lee, 2006; Lin, Su, & Chien, 2006; Tseng & Huang, 2007).
The CRM should integrate customer data with different
resources. It should also provide a deeper understanding of
customer behavior and needs.
• Reduce the level of risk to the business (Enke &
Thawornwong, 2005; Huang, Hung, & Jiau, 2006). Most
of the payments in e-commerce are through credit cards.
Checking the customer’s credit history is a very important
measure to reduce business risk. In the traditional supply
chain, this is time-consuming and error-prone work. Imagine
companies that receive thousands of orders per day. Each
processing clerk can only spend a limited amount of time on
each case to make decisions. The quality of the decision
depends on his/her previous experience and intuition
because there is not enough time to analyze all relevant
data. Using data-mining techniques, we can find very useful
patterns to support decisions. The rules are much easier for
humans to understand than the raw data since the rules are
extracted from large, otherwise incomprehensible data sets.
Decisions based upon the extracted rules will be more
reliable.
Fig. 9. Major phases of the product life-cycle and the corresponding data.
Fig. 10. Integrated requirements tree.
6. Data-driven innovation
6.1. Concept introduction
Recent years have brought about a renewed interest in
innovation, especially after the Innovate America Report
(NIIR, 2004) was published. Though innovation has been a
subject of intensive studies by diverse research communities,
many will agree that the results produced have not translated
into meaningful innovation gains in the industry (Carlson &
Wilmot, 2006). Rather, industry is awaiting methodologies,
processes, and tools leading to innovation breakthroughs.
It appears that the product-life cycle data is of importance to
innovation.
The approach presented here emphasizes innovation by
using the data collected throughout the product life cycle. This
data-driven innovation can be implemented by:
• Incorporating new functions, e.g., a copy machine plus
digitizer plus fax plus email.
• Incorporating inventions into existing artifacts.
• Integrating existing inventions.
• Extending existing inventions.
• Impacting the sales environment, e.g., marketing.
A product creates a data trail at every phase of its life cycle,
as illustrated in Fig. 9. Some of the data serves the existing
product while other data is stored for future use.
Various analyses could be performed at various product life-
cycle phases, including extraction of innovation fostering
requirements. The locally extracted requirements could be inte-
grated into an innovation-inspiring innovation tree (see Fig. 10).
Fig. 11. Discovery of innovation principles through data and text mining.
Fig. 12. Matrix structuring: (a) decomposed matrix; (b) organized dependency
structure matrix.
The vector (genome) X of requirements would be grouped
into classes (chromosomes), such as (Kusiak, 2007):
• Function;
• Form;
• Culture;
• Surprise.
The ultimate goal is to define the genome X maximizing
function F(X) representing a certain goal, e.g., market share,
social needs. It needs to be stressed that the innovation goal
does not have to be necessarily business oriented, though it
often may lead to economic benefits.
In addition, the genome could contain stochastic information
that could be represented in various ways, e.g., in the form of a
probability gene.
The list of requirements in Fig. 10 is by no means
exhaustive. As the innovation process becomes better
understood, the list of requirements will be modified. Among
the four types of requirements shown in Fig. 10, function
and form have received most attention. There is a need
to study ‘‘soft’’ requirements, e.g., culture, surprise, and
others that will emerge, which could prove to be most
beneficial.
People have been fascinated with inventions and innova-
tions from the beginning of civilization. In fact progress
and development across civilizations have been fueled by
human inventiveness. Analysis of the large volumes of
historical information, e.g., studying inventors and the
discovery of commonality among invention processes, could
lead to the creation of a body of innovation knowledge (see
Fig. 11).
6.2. Innovation modeling: an expanded dependency model
The requirements-driven approach to innovation advocated
in this paper has numerous merits. Many will agree that the
approach presented is valid, as it draws on a broad range of
requirements categorized as functional, form, culture, surprise,
and categories to be defined. Furthermore, a multi-source
approach is proposed for the generation of requirements.
It should be stressed that the innovation-fostering require-
ments constitute a subset of all requirements, and most of them
will be expressed through the functional and form require-
ments.
For the expanded set of requirements, a relationship matrix
with the product or process functions and forms can be built. A
similar concept has been previously used with success in the
quality function deployment (Hauser & Clausing, 1988),
system decomposition (Kusiak, 1999), and dependency
structure matrix (Kusiak, Wang, He, & Feng, 1995). The latter
two concepts are illustrated in Fig. 12.
The decomposition concept is illustrated with the require-
ment-product function matrix in Fig. 12(a) representing two
disjoint requirement-product function clusters. The rows of this
matrix represent the requirements collected from the sources
advocated in this paper, i.e., provided by the customers, experts,
and derived from the data stream collected over the product life-
cycle. The columns could include product functions (or product
parameters), process functions (or parameters), and parameters
representing other pertinent product life-cycle phases.
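The decomposition into disjoint clusters can be sketched as a search for connected components, treating the rows and columns of a 0/1 matrix as the two sides of a bipartite graph. This is a generic technique offered as an illustration under that assumption; it is not claimed to be the decomposition method of the cited works. The example matrix and the function name `disjoint_clusters` are hypothetical.

```python
from typing import List, Set, Tuple


def disjoint_clusters(matrix: List[List[int]]) -> List[Tuple[Set[int], Set[int]]]:
    """Split a 0/1 requirement-function matrix into disjoint (rows, columns) clusters.

    Rows and columns act as nodes of a bipartite graph; each connected
    component of that graph is returned as one cluster.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    seen_rows: Set[int] = set()
    clusters: List[Tuple[Set[int], Set[int]]] = []
    for start in range(n_rows):
        if start in seen_rows:
            continue
        rows, cols = {start}, set()
        frontier = [start]
        while frontier:
            r = frontier.pop()
            for c in range(n_cols):
                if matrix[r][c] and c not in cols:
                    cols.add(c)
                    # Every other row that uses column c joins the same cluster.
                    for r2 in range(n_rows):
                        if matrix[r2][c] and r2 not in rows:
                            rows.add(r2)
                            frontier.append(r2)
        seen_rows |= rows
        clusters.append((rows, cols))
    return clusters


# Hypothetical matrix: rows are requirements R0..R3, columns are functions F0..F3.
requirement_function = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
clusters = disjoint_clusters(requirement_function)
print(clusters)
```

Reordering the rows and columns so that each cluster is contiguous yields a block-diagonal matrix of the kind described for Fig. 12(a).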
In addition to the row–column matrix (e.g., Fig. 12(a)), one
can certainly consider row–row (e.g., Fig. 12(b)), and column–
column matrices to gain additional insights into the relation-
ships between the requirements themselves and the parameters
characterizing the product, process, and product life-cycle phases.
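One possible way to obtain row–row and column–column matrices from a row–column matrix is to count shared entries, i.e., to compute the product of the matrix with its transpose. The Python sketch below assumes a hypothetical 0/1 requirement-function matrix; the helper names transpose and cooccurrence are illustrative, and the dependency matrix of Fig. 12(b) need not have been derived this way.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def cooccurrence(matrix):
    """Return M * M^T: entry [a][b] counts the columns shared by rows a and b."""
    return [
        [sum(x * y for x, y in zip(row_a, row_b)) for row_b in matrix]
        for row_a in matrix
    ]


# Hypothetical requirement-function matrix (rows: requirements, columns: functions).
incidence = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]

req_req = cooccurrence(incidence)             # requirement-requirement matrix
fun_fun = cooccurrence(transpose(incidence))  # function-function matrix

for row in req_req:
    print(row)
```

An off-diagonal entry greater than zero indicates that two requirements (or two functions) are coupled through at least one shared column (or row).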
Depending on the goal of innovation analysis, the innovation
matrix will take different forms and become the basis of the
innovation model. The model built based on the matrix will allow
the introduction of various constraints and objective functions,
thus covering diverse innovation optimization scenarios.
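To illustrate how a constraint and an objective function might be attached to such a matrix, the Python sketch below selects the cheapest set of functions that covers every requirement. This covering formulation, the cost values, and the function name cheapest_cover are assumptions made for illustration only; the paper does not specify a particular optimization model, and the exhaustive search shown here is practical only for small matrices.

```python
from itertools import combinations


def cheapest_cover(matrix, costs):
    """Return (cost, columns) of the cheapest column set covering every row."""
    n_rows, n_cols = len(matrix), len(costs)
    best_cost, best_cols = None, None
    for size in range(n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Constraint: each requirement is met by at least one chosen function.
            covered = all(any(matrix[i][j] for j in cols) for i in range(n_rows))
            if not covered:
                continue
            # Objective: minimize the total cost of the chosen functions.
            cost = sum(costs[j] for j in cols)
            if best_cost is None or cost < best_cost:
                best_cost, best_cols = cost, cols
    return best_cost, best_cols


# Hypothetical requirement-function matrix and per-function costs.
incidence = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
function_costs = [4, 3, 2, 5]

total_cost, chosen = cheapest_cover(incidence, function_costs)
print(total_cost, chosen)
```

Other scenarios would replace the covering constraint or the cost objective, while the matrix remains the common basis of the model.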
7. Conclusion
Although numerous successful applications of data mining
in design and manufacturing have been reported, many
challenges are ahead. Some challenges come from the data
mining itself and others come from the application domains.
The main challenges are as follows:
• Make greater use of unstructured information. Most data-mining
algorithms have been designed to process numeric or
textual data. Design and manufacturing systems could
provide other forms of data, e.g., geometry, audio, and
video. The top priority appears to be processing geometry.
• Integration of data-mining algorithms with the existing
applications. Most value-adding industrial software systems
interact with internal and external systems. Data-mining
applications should subscribe to time and spatial integration.
• Make data-mining models comprehensible to users. Users are
not data-mining experts. A good presentation format is
crucial for full utilization of data-mining results. Due to the
variety of formats, data-mining results can be expressed in
different ways. For example, the decision tree is a good
method of presenting classification results.
• Scalability of data-mining applications. Most data-mining
algorithms have been developed under the assumption that
the data is stored in operating memory. The size of databases
is constantly growing, and therefore data-mining software
needs to accommodate the growing number of parameters
and size of data sets.
• Standardization and legal aspects of data mining. To facilitate
exchange of data-derived knowledge, standards are needed.
Some standardization efforts have been initiated. Legal aspects
of knowledge exchange and use have been gaining momentum.
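The scalability challenge can be illustrated with a one-pass computation that never holds the full data set in operating memory. The Python sketch below uses Welford's online algorithm for the mean and sample variance; the generator sensor_readings is a hypothetical stand-in for a large database cursor or data stream and is not part of any system described in this paper.

```python
def running_stats(values):
    """Compute count, mean, and sample variance in a single pass (Welford)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


def sensor_readings():
    # Hypothetical data source: yields one reading at a time,
    # so memory use does not grow with the number of records.
    for i in range(1, 6):
        yield float(i)


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

The same streaming pattern applies to other statistics that can be updated incrementally; algorithms that need repeated passes over the data require different techniques.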
The most lasting value provided by data mining might be
that of innovation. As data plays an increasingly important role
in innovation, data-mining algorithms are likely to make
valuable contributions to the innovation challenge.
References
Agard, B., & Kusiak, A. (2004). A data-mining based methodology for the
design of product families. International Journal of Production Research,
42(15), 2955–2969.
Boly, V. (2004). Ingénierie de l’innovation: Organisation et méthodologies des
entreprises innovantes. Paris, France: Lavoisier.
Carlson, C. R., & Wilmot, W. W. (2006). Innovation: The five disciplines for
creating what customers want. New York: Random House.
Chandrasekaran, B., Stone, R., & McAdams, D. (2004). Developing design
templates for product platform focused design. Journal of Engineering
Design, 15(3), 209–228.
Chen, M.-C., Huang, C.-L., Chen, K. Y., & Wu, H.-P. (2005). Aggregation of
orders in distribution centers using data mining. Expert Systems with
Applications, 28(3), 453–460.
Da Cunha, C., Agard, B., & Kusiak, A. (2006). Data mining for improvement of
product quality. International Journal of Production Research, 44(18/19),
4027–4041.
Dahmus, J., Gonzalez-Zugasti, J., & Otto, K. (2001). Modular product archi-
tecture. Design Studies, 22(5), 409–424.
Davis, S. (1989). From “future perfect”: Mass customizing. Planning Review,
17(2), 16–21.
Da Silveira, G., Borenstein, D., & Fogliatto, F. (2001). Mass customization:
Literature review and research directions. International Journal of Produc-
tion Economics, 72(1), 1–13.
Enke, D., & Thawornwong, S. (2005). The use of data mining and neural
networks for forecasting stock market returns. Expert Systems with
Applications, 29(4), 927–940.
Harding, J. A., Shahbaz, M., Srinivas, S., & Kusiak, A. (2006). Data mining in
manufacturing: A review. ASME Transactions: Journal of Manufacturing
Science and Engineering, 128(4), 969–976.
Hauser, J., & Clausing, D. (1988). The house of quality. Harvard Business
Review, 66(3), 63–73.
Huang, C., & Kusiak, A. (1998). Modularity in design of products and systems.
IEEE Transactions on Systems, Man, and Cybernetics: Part A, 28(1), 66–77.
Huang, Y.-M., Hung, C.-M., & Jiau, H. C. (2006). Evaluation of neural networks
and data mining methods on a credit assessment task for class imbalance
problem. Nonlinear Analysis: Real World Applications, 7(4), 720–747.
Kim, E., Kim, W., & Lee, Y. (2003). Combination of multiple classifiers for the
customer’s purchase behavior prediction. Decision Support Systems, 34(2),
167–175.
Kusiak, A. (1999). Engineering design: Products, processes, and systems. San
Diego, CA: Academic Press.
Kusiak, A. (2000). Computational intelligence in design and manufacturing.
New York: John Wiley.
Kusiak, A. (2006). Data mining: Manufacturing and service applications.
International Journal of Production Research, 44(18/19), 4175–4191.
Kusiak, A. (2007). Innovation science: A primer. International Journal of
Computer Applications in Technology, 28(2–3), 140–149.
Kusiak, A., & Huang, C. (1996). Development of modular products. IEEE
Transactions on Components, Packaging, and Manufacturing Technology,
Part A, 19(4), 523–538.
Kusiak, A., & Song, Z. (2006). Combustion efficiency optimization and virtual
testing: A data-mining approach. IEEE Transactions on Industrial Infor-
matics, 2(3), 176–184.
Kusiak, A., Wang, J., He, D. W., & Feng, C. X. (1995). A structured approach
for analysis of design processes. IEEE Transactions on Components,
Packaging and Manufacturing Technology: Part A, 18(3), 664–673.
Li, S.-T., Shue, L.-Y., & Lee, S.-F. (2006). Enabling customer relationship
management in ISP services through mining usage patterns. Expert Systems
with Applications, 30(4), 621–632.
Lin, Y., Su, H. Y., & Chien, S. (2006). A knowledge-enabled procedure for
customer relationship management. Industrial Marketing Management,
35(4), 446–456.
NIIR. (2004). Innovate America. Washington, DC: Council for Competitive-
ness, National Innovation Initiative Report.
Pahl, G., & Beitz, W. (1988). Engineering design: A systematic approach.
London, UK: Springer.
Stone, R., Wood, K., & Crawford, R. (2000). A heuristic method for identifying
modules for product architectures. Design Studies, 21(1), 5–31.
Tseng, T. Z., & Huang, C.-C. (2007). Rough set-based approach to feature
selection in customer relationship management. Omega, 35(4), 365–383.
Ulrich, K., & Tung, K. (1991). Fundamentals of product modularity. In A.
Sharon (Ed.), Issues in design/manufacture integration, DE 39. New York:
ASME.
Westland, J. C., & Clark, T. H. K. (1999). Global electronic commerce: Theory
and case studies. Cambridge, MA: MIT Press.
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning
tools and techniques. New York: Elsevier.
Dr. Andrew Kusiak is a Professor in the Department of Mechanical and
Industrial Engineering at the University of Iowa in Iowa City, Iowa. He is
interested in applications of computational intelligence in automation, energy,
manufacturing, product development, and healthcare. Dr. Kusiak has published
numerous books and technical papers in journals sponsored by professional
societies, such as AAAI, ASME, IEEE, IIE, ESOR, IFIP, IFAC, INFORMS,
ISPE, and SME. He speaks frequently at international meetings, conducts
professional seminars, and consults for industrial corporations. Dr. Kusiak has
served on editorial boards of over 35 journals. He is an IIE Fellow and the
Editor-in-Chief of the Journal of Intelligent Manufacturing.
Matthew R. Smith is a graduate student in Industrial Engineering in the
Department of Mechanical and Industrial Engineering at the University of
Iowa, Iowa City, IA. He obtained a BS degree in Industrial Engineering from
the same department and is interested in applications of operations research in
engineering design and manufacturing. He is a member of the Intelligent
Systems Laboratory.