Gellish: an information representation language, knowledge base and ontology
Information Representation 1 02/06/2014
Gellish Formal English
An information representation language, knowledge base and ontology
by Ir. Andries van Renssen
Shell Global Solutions International [email protected]
2003
Abstract
Data storage, data exchange and interoperability lack a common, widely applicable standard
data model, as well as a common data language with a dictionary-taxonomy of
concepts and a grammar for data exchange messages. This article presents a solution to this
problem in the form of the new “open” industry standard Gellish Formal English language,
as a further development of the standard data model and ontology of two new ISO standards.
The article states that Gellish is a suitable language for neutral data exchange between
systems. The definition of Gellish includes an extensive Formal English Dictionary-
Taxonomy with definitions of a large number of concepts and relation types. This also
provides an ontology with standard reference data for customization of systems in a
standardized way to be prepared for data harmonization, data integration and data exchange.
The article illustrates that a collection of database tables or data files with a common
identical structure or format, the Gellish Expression Format, is suitable to express a wide
range of kinds of facts about kinds of things as well as facts about individual things, and that
it can replace conventional data models.
Keywords: knowledge representation, formal language, data model, taxonomy, ontology, semantic web, knowledge base, data warehouse, interoperability, data exchange standard, information model
Table of Contents
1 Introduction
1.1 Standard data models, ontologies and reference data
2 Issues in data modeling
3 The Gellish language and ontology
4 Storage and exchange of data as well as semantics in Gellish
5 Interpretation of expressions
6 The Gellish Knowledge Base
7 Experiences and applications
8 Conclusions
9 References
1 Introduction
Conventionally, each software system stores its data using its own data model and
communicates with other systems usually using a dedicated interface data structure, which
means that it applies a dedicated interface data model. The large variety of data models
makes data exchange between systems costly, because the data must be converted
from the semantics of one data model to those of the other. This demonstrates the urgent
need for widely applicable common standard data models or a data representation
language.
Often systems can be ‘customized’ by adding ‘reference data’ as instances, such as the
definition of equipment types, document types, activity types, property types, etc. However,
reference data are usually different per implementation, even when database structures of
different systems are equal, such as is the case with several implementations of the same
system. This also holds for different implementations of the same system, such as a CAD,
CAE, PDM, PLM, ERP or CRM system. The consequence is that data in those
implementations can still not be compared, integrated or exchanged without costly data
conversion processes. This illustrates the urgent need for a common dictionary,
classification system, taxonomy or ontology of reference data and shared knowledge.
Unfortunately, no such standard user data language currently exists.
In the current systems there is a separation between the world of data models and the world
of instances. Data models are developed by IT specialists (data modelers) who document
them using either proprietary tools or using a standard data modeling language, such as
EXPRESS (ISO 10303-11) or UML, which languages are especially designed to define data
models. Once a data model is defined in such a language, the data model acts as another
language in which the reference data as well as the user data has to be expressed. The use of
two different languages, one for the model, one for the user data, illustrates the barrier
between the two worlds. It is as if the English language definition is expressed in Chinese.
On top of this, each programmer and each reference data producer is free to use
those data definition languages to define his own terminology!
The current situation is sketched by Smith and Welty (2001) as follows: “Out of the
apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are
beginning to recognize that the provision, once for all, of a common, robust reference
ontology – a shared taxonomy of entities – might provide significant advantages over the
ad-hoc, case-by-case methods previously used”.
From a business perspective, a “common language” would provide the opportunity for huge
cost savings, because it would make it possible to combine or integrate data from
different sources, to automate the verification of their consistency, and to exchange data
between parties (as in e-commerce), even if those parties apply different systems and
catalogues.
Several attempts have been made to develop an ‘upper ontology’, such as SUMO by Niles and
Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology, Lenat
(1995) and GOL, Degen et al (2001). However, none of them integrates a generic data model
with reference data and a language for the description of knowledge and of individual
objects and processes.
This article presents a solution to the above-mentioned issues in the form of the Gellish
language. Gellish Formal English is defined in a smart dictionary, which has the form of a
taxonomy and knowledge base or ontology, and includes the concepts from standard generic
data models (it is a further development of ISO 15926-2 and ISO 10303-221) and concepts
from various sources including other ISO standards, IEC standards, VDI standards, and
knowledge stemming from proprietary sources. As such it is an integration of an upper
ontology with a lower ontology. The total ontology also includes the definitions of standard
fact types (or relation types) that define the grammar of the Gellish language. Gellish
satisfies the criteria for proper ontologies as expressed by Degen et al (2001 par 6.1), but is
not limited to an upper ontology. Gellish is extendable, just like any natural language. Its
taxonomy and knowledge base use unique identifiers for concepts, thus allowing for
synonyms and multiple names in various languages. The latter enables the expression of
propositions about facts in one natural language and their automatic translation and
presentation in any other natural language.
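The role of unique identifiers in translation can be illustrated with a minimal sketch. It assumes a simple in-memory lookup from UID to names per language; the UIDs 130206 (pump) and 1225 (is classified as a) are taken from the tables later in this article, whereas the Dutch names and the function name `render` are illustrative assumptions and are not quoted from the Gellish dictionary.

```python
# UID -> {language code -> name}.
# The UIDs appear in this article; the Dutch names are illustrative
# assumptions, not entries copied from the Gellish dictionary.
NAMES = {
    130206: {"en": "pump", "nl": "pomp"},
    1225: {"en": "is classified as a", "nl": "is geclassificeerd als een"},
}


def render(left_name, relation_uid, right_uid, lang):
    """Render one fact in the requested language.

    Names of individual objects (such as 'P-101') are not translated;
    only the relation type and the class are looked up by UID.
    """
    return f"{left_name} {NAMES[relation_uid][lang]} {NAMES[right_uid][lang]}"


print(render("P-101", 1225, 130206, "en"))
print(render("P-101", 1225, 130206, "nl"))
```

Because the fact itself only refers to UIDs, adding a further language amounts to adding names to the lookup; the stored fact does not change.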
Gellish eliminates the traditional barrier between the data model definitions of classes and
the data instances. The Gellish language demonstrates that this barrier is not necessary and
that there are clear advantages when class definitions, reference data and user data are
expressed in one and the same language.
An extended version of the Gellish Formal English language is described in the book:
Semantic Modeling in Formal English [Ref. 15].
1.1 Standard data models, ontologies and reference data
There are several developments of standard lower level ontologies and reference data
libraries, stimulated among others by requirements of the e-commerce ‘market places’ and
the developments around The Semantic Web promoted by Lee et al (2000). For example, the
UNSPSC code, Ecl@ss, Trade Ranger, etc. These standards have their value mainly in the
standardization of terminology, but do not provide a standard language or a standard data
model for general use, because of their limited semantic expression power due to the fact
that they apply only a few relation types and lack of integration with a rich upper ontology.
There have also been several attempts to develop standard data models for data exchange or
for data storage. Some of them are proprietary, but others are in the public domain. Those
standard data models are defined independently of a particular system, and are therefore called
‘neutral’. Those standard data models are usually developed for a particular application
domain instead of being limited to a particular system.
Examples of standard data models are the STEP family of standards in ISO 10303, such as a
graphics data model AP203, a data model for the automotive industry (AP214), one for
piping systems (AP227), one under development for the defense industry (AP239, PLCS),
etc. The integration of all those data models into one overall data model is not yet fully
achieved. Although the scopes of these valuable standard data models are wide, they are still
limited to particular application areas and do not provide a general ‘common language’ yet.
A standard data model with a generic scope is ISO 15926-2, which has a counterpart within
the STEP family (AP221). Although these two data models are stemming from the process
industries, their nature of being an upper ontology makes them applicable in other
application domains as well. To become practically applicable in a particular application
domain, these generic data models need a ‘reference data library’ or ‘ontology’ to add
definitions of application domain specific concepts to specialize the generic data model. The
Gellish Formal English Dictionary (earlier called STEPlib) provides such a standard reference
data library or ontology. A part of that has been standardized as ISO 15926-4.
The Gellish language definition can be regarded as a very large generic data model that is
further extendable by adding subtypes to the existing concepts in the ontology hierarchy. It
can also be seen as a base knowledge library of product models and business processes that
can be extended with your own knowledge by expressing and exchanging them in the
Gellish language.
2 Issues in data modeling
There is a language barrier between data model terminology and user data terminology.
This barrier is reinforced by the strict separation between, on the one hand, the concepts defined in a
data model and, on the other hand, the reference data (instances) used to customize a system and
the operational data. This separation also implies that the semantic concepts of the data
model are not accessible by users of that data. Elimination of that barrier by integration of
the entity type and attribute type definitions with the customization data could reduce this
language barrier and would give users easier access to the exact interpretation rules for their
data.
Data models are fixed once databases are created and thus the semantics for the
interpretation of the user data is fixed. Any extension of this semantics requires a
redefinition of the structure of the database and a conversion of the data from the old to the
new database structure. A fixed data model is often seen as an advantage as it fixes the
‘rules of the game’. But the disadvantage is that it prevents an increase of knowledge and
semantics in the data model. Flexible data models would reduce the cost of time-consuming
modifications or extensions of a data model.
The scope of data models is limited. This puts constraints on the data storage capabilities of
systems, which is a problem in case of business changes. The normal solutions to this are
either to create very big data models or to create very generic data models. A very big data
model is difficult to understand, to manage and to apply. Generalization of data models
leads to smaller data models, but also to abstract entity types with semantics that is difficult
to grasp as is illustrated by the ISO 15926-2 standard data model. A common language with
a very wide scope as described in this paper might provide a third and better solution to this
problem.
Finally, each data model is different, so that exchange of data between different systems
means that the data must be converted from one data structure to the other and vice versa.
This is caused by the fact that currently there is no systematic and standardized approach to
the reuse of earlier defined concepts. The promises of ‘object libraries’ have not yet provided a
general solution. A widely applicable data language is still needed to solve this problem.
The result of the current state of the art is that data storage is done in a Babylonian mix of
“languages”/data models with the consequence that exchange of data between systems is
impossible, except where dedicated bilateral translators are created between each pair of
languages/data models.
3 The Gellish language and ontology
Gellish is a public domain standard data and knowledge representation language and
ontology that does not have the above mentioned constraints and does not have the barrier
between the user data and the IT data model data. On the contrary, the ontology defines a
rich and extensible semantics in natural language terminology, expressed in Gellish itself.
This ontology is equivalent to a data model of over 20,000 entity types, attribute types and
relationship types.
Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact.
Each (atomic) fact is expressed as a relation between (two) objects.
For example, fact 1 is expressed by a particular relation between objects with unique
identifiers (UID’s) 100 and 101. This expression (1, 100, 101) illustrates the structure of
each basic Gellish expression. Gellish requires that both the objects and the fact must be
classified explicitly by standard classes, including standard relation types. The standard
classes are predefined in the Gellish ontology. In addition to that, objects may have a name.
This enables software to interpret the expression correctly.
If a certain fact cannot be expressed in the current Gellish language, then new classes that
define the missing concepts can be added to the Gellish dictionary (the definition can always
be expressed using the existing Gellish language) and if necessary new fact types can also be
added to the Gellish grammar. This makes it possible to express new kinds of facts about new kinds of
objects in Gellish. Note that those new definitions have to be exchanged with the party that
receives the message, to enable him to interpret the message correctly.
Gellish and the above-mentioned ISO standards are both based on the understanding that
there appears to exist a limited set of application-independent standard relation types that are
sufficient to model all kinds of products and processes, whereas each of those relation types
requires well-defined role types that can be played by particular object types.
A large part of that set is defined in the ISO standards and an extended set is defined in the
TOP part of the Gellish language definition.
A standard implementation of Gellish is defined in the Gellish Expression Format (GTF, see
below) in which names of the objects and the classification of the fact are combined in one
record (the classification of the objects is done via separate classification facts in additional
records). In this GTF format the basic Gellish expression becomes:
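Left hand  | Left hand   | Fact | UID of        | Name of       | Right hand | Right hand
object UID | object name | UID  | relation type | relation type | object UID | object name
-----------+-------------+------+---------------+---------------+------------+------------
100        | thing-1     | 1    | 4658          | is related to | 101        | thing-2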
Note that the relation type also has a UID which is not shown in the above record nor in the records
below.
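As a minimal sketch of how such a record might be held in software, the following example maps the seven columns of the record above onto a small Python class. The class name `Fact` and its field and method names are assumptions made for illustration; they are not part of the Gellish definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One record with the seven columns shown in the table above.

    The field names are illustrative; only the column meanings come
    from the article.
    """
    left_uid: int
    left_name: str
    fact_uid: int
    relation_type_uid: int
    relation_type_name: str
    right_uid: int
    right_name: str

    def as_triple(self):
        """The bare structure (fact UID, left object UID, right object UID)."""
        return (self.fact_uid, self.left_uid, self.right_uid)

    def as_text(self):
        """A readable rendering built from the three name columns."""
        return f"{self.left_name} {self.relation_type_name} {self.right_name}"


fact = Fact(100, "thing-1", 1, 4658, "is related to", 101, "thing-2")
print(fact.as_triple())  # (1, 100, 101)
print(fact.as_text())    # thing-1 is related to thing-2
```

The triple corresponds to the expression (1, 100, 101) introduced earlier; the names and the relation type are what make the record readable for people.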
The semantic expression capabilities of Gellish are defined by the allowed relation types.
They define the kinds of relations that are possible to express facts. They also define the
roles that the related objects play towards each other.
Some examples of facts and standard Gellish relation types are:
Left hand  | Left hand     | Fact | UID of        | Name of                       | Right hand | Right hand      | Scale
object UID | object name   | UID  | relation type | relation type                 | object UID | object name     |
-----------+---------------+------+---------------+-------------------------------+------------+-----------------+------
130091     | diesel engine | 2    | 1146          | is a specialization of        | 130108     | engine          |
104        | M-1           | 3    | 1225          | is classified as a            | 130091     | diesel engine   |
130802     | cylinder      | 4    | 1146          | is a specialization of        | 730063     | artifact        |
107        | C-1           | 5    | 1225          | is classified as a            | 130802     | cylinder        |
107        | C-1           | 6    | 1260          | is a part of                  | 104        | M-1             |
107        | C-1           | 7    | 1727          | has as aspect                 | 108        | volume of C-1   |
108        | volume of C-1 | 8    | 1225          | is classified as a            | 550140     | internal volume |
108        | volume of C-1 | 9    | 5025          | has on scale a value equal to | 922235     | 1800            | cm3
104        | M-1           | 10   | 4760          | is a subject of               | 110        | order-1         |
The above table illustrates:
- The standard Gellish relation types, which classify the facts. The variety of standard
relation types determines the expression capabilities and semantics of Gellish.
- Examples of the large number of standard object types predefined in Gellish. For
example: engine, diesel engine, cylinder, artifact, internal volume, 1800 and cm3.
- New concepts (object types) can be added, as in facts 2 and 4. In this case they
already exist in the Gellish Formal English Dictionary, but if diesel engine and
cylinder had not existed, they could have been added in this way.
- It is possible in Gellish to express facts, such as the volume of C-1, without the need
for such a fact to be pre-modeled in a data model. Such a fact type could nevertheless be
defined in Gellish, after which this particular instance can be verified against that
definition.
- One table is suitable to express many kinds of facts.
Note: The table above presents just an example of some of the capabilities of Gellish. For
example, Gellish also allows expressing in which language the facts are expressed,
whether the objects are real or imaginary, what the communicative intent is, who the
author and the addressee of a proposition are, etc.
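That one table can carry both knowledge about classes and data about individual objects can be made concrete with a small sketch. It stores facts 2 to 8 of the engine example as plain tuples and answers two questions: which classes does M-1 belong to (following specialization relations upwards), and which objects are parts of M-1. The tuple layout and the function names are assumptions made for this illustration.

```python
# (left UID, left name, fact UID, relation type UID, relation type name,
#  right UID, right name) -- facts 2 to 8 of the engine example.
FACTS = [
    (130091, "diesel engine", 2, 1146, "is a specialization of", 130108, "engine"),
    (104, "M-1", 3, 1225, "is classified as a", 130091, "diesel engine"),
    (130802, "cylinder", 4, 1146, "is a specialization of", 730063, "artifact"),
    (107, "C-1", 5, 1225, "is classified as a", 130802, "cylinder"),
    (107, "C-1", 6, 1260, "is a part of", 104, "M-1"),
    (107, "C-1", 7, 1727, "has as aspect", 108, "volume of C-1"),
    (108, "volume of C-1", 8, 1225, "is classified as a", 550140, "internal volume"),
]

CLASSIFIED_AS = 1225
SPECIALIZATION_OF = 1146
PART_OF = 1260


def related(left_uid, relation_uid):
    """UIDs of the right hand objects related to left_uid by relation_uid."""
    return [f[5] for f in FACTS if f[0] == left_uid and f[3] == relation_uid]


def kinds_of(uid):
    """Direct classifiers of an individual plus all their supertypes."""
    result = []
    todo = related(uid, CLASSIFIED_AS)
    while todo:
        cls = todo.pop()
        if cls not in result:
            result.append(cls)
            todo.extend(related(cls, SPECIALIZATION_OF))
    return result


def parts_of(whole_uid):
    """UIDs of the objects that are stated to be a part of whole_uid."""
    return [f[0] for f in FACTS if f[3] == PART_OF and f[5] == whole_uid]


print(kinds_of(104))  # [130091, 130108]: M-1 is a diesel engine, hence an engine
print(parts_of(104))  # [107]: C-1 is a part of M-1
```

Both queries run against the same list of records; no separate schema for engines or cylinders is needed.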
Gellish is not limited to specific application domains, although the current ontology (the
dictionary) does not yet cover the scope of a natural language. This wide applicability is
illustrated by the following example from a completely different domain:
Left hand  | Left hand        | Fact | Relation type name | Right hand | Right hand       | Scale
object UID | object name      | UID  |                    | object UID | object name      |
-----------+------------------+------+--------------------+------------+------------------+------
111        | Andries          | 11   | is classified as a | 990007     | man              |
112        | Rose-Mary        | 12   | is classified as a | 990013     | woman            |
111        | Andries          | 13   | is married with    | 112        | Rose-Mary        |
111        | Andries          | 14   | is born at         | 19460309   | March 9, 1946    |
111        | Andries          | 15   | is author of       | 113        | Gellish Handbook |
113        | Gellish Handbook | 16   | is classified as a | 490193     | manual           |
The flexibility of the semantics of Gellish is achieved by:
1. Defining an extendable specialization hierarchy of fact types (or relation types),
whereas each relation type is defined by two required role types, while for each role
type it is defined which kind of object can play such a role.
There are three hierarchies of fact types:
- Fact types which express that members of a class can be related to members
of another class in a particular way. Facts of this type express knowledge
about classes.
For example: a pipe can have as aspect a diameter
- Fact types which express that individual objects are related to other
individual objects. Facts of this type express information about individual
objects.
For example: John is performer of action#1
- Fact types which express that individual objects are related to classes or can
be related to members of classes. This includes facts that express
classifications and facts that express that individual objects can have relations
with members of certain classes.
For example: action#1 is classified as maintenance
2. Expressing knowledge about classes (types of things) in addition to knowledge about
individual objects in the same data structure.
3. Eliminating the difference in treatment between attribute types and instances, by
expressing the definition of the attribute types in the same way as the instances and
by expressing instantiations as explicit classification relations between individual
objects (instances) and the applicable classes that classify them (instances of class).
4. Eliminating the difference between entity types and attribute types by expressing the
relation between entities and their attributes as explicit individual facts, expressed as
individual relations between instances that are classified as ‘possession of aspect’
relations.
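The three hierarchies of fact types mentioned under point 1 differ in whether the related things are classes or individual objects. The following sketch shows that distinction for the three examples given there. The two sets and the function name `fact_category` are assumptions for illustration; in Gellish itself the distinction follows from the relation type that classifies the fact.

```python
# Toy vocabulary taken from the three examples under point 1.
CLASSES = {"pipe", "diameter", "maintenance"}
INDIVIDUALS = {"John", "action#1"}


def fact_category(left, right):
    """Tell which of the three hierarchies a fact between left and right falls in."""
    for name in (left, right):
        if name not in CLASSES and name not in INDIVIDUALS:
            raise KeyError(f"unknown object: {name}")
    left_is_class = left in CLASSES
    right_is_class = right in CLASSES
    if left_is_class and right_is_class:
        return "knowledge about classes"
    if not left_is_class and not right_is_class:
        return "information about individual objects"
    return "relation between an individual object and a class"


print(fact_category("pipe", "diameter"))         # a pipe can have as aspect a diameter
print(fact_category("John", "action#1"))         # John is performer of action#1
print(fact_category("action#1", "maintenance"))  # action#1 is classified as maintenance
```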
Figure 1 compares the essential concepts in the current methodologies with the concepts in
the Gellish language.
Figure 1, Comparison of Data modeling concepts with Gellish concepts
These principles provide the flexibility of the Gellish language and make it an extensible
data model that can integrate the knowledge of many application domains, which knowledge
is not hidden in the data model, but is visible for the user as (meta) data that defines his
application data.
4 Storage and exchange of data as well as semantics in Gellish
In this paragraph I will describe how knowledge, data and semantics are represented in
Gellish.
I will use the example of the fact:
- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).
In a conventional database it is required to declare some entity types and attribute types that
define the semantics in the form of a data model. In case of the example, the data model
could for example consist of the entity types ‘pump’, ‘process’ and ‘stream’, whereas each
entity type possesses some attributes.
Current Data Modeling Concepts
1. Instantiation
- Implicit classification relations with limited number of classes (entity types).
2. Entities have Attributes
- Implicit relation types between entity and attributes.
3. Subtyping of entities (not of attributes)
- Methodology does not require a consistent subtyping strategy.
- Usually a limited use of inheritance.
4. Entity and attribute types are not instances (fixed data model)
- Fixed knowledge model outside database.
Gellish Language Concepts
1. Explicit relations
- Explicit classification of individuals with unlimited number of classes (subtypes).
- Explicit specialization hierarchy of classes.
2. Explicit Relation types between objects
- Explicit classification of relations to standard relation types.
3. Specialization relations between classes
- Methodology requires that every class has at least one supertype, which results in
a consistent specialization hierarchy.
- Full use of inheritance, applicable for all objects, including also
properties, relations, occurrences, etc.
4. Entity types and attribute types (classes) are instances
- Flexible knowledge model integrated with reference data and user data.
In Gellish, the concepts ‘pump’, ‘process’ and ‘stream’ are not defined as such entity types,
because concepts in Gellish do not imply ‘attributes’. Instead they are defined as concepts
that can be related to a flexible number of other concepts of any kind. The collection of
relations forms a knowledge base. The definitions only contain the minimum number of
expressions of what is by definition the case, without specifying what can or shall be the
case. Such definitions have the general structure of a ‘basic semantic pattern’, which
comprises the fundamental ontological concepts of Gellish. That pattern is also applicable
for the definition of additional semantic concepts.
For the definition of a new concept it is required to define a coherent set of elementary facts,
expressed as relations between the new concept and the existing concepts. In other words,
each new concept requires the creation of a structure of expressions as presented in figure 2.
Figure 2, Basic semantic pattern
Figure 2 presents the minimum structure of ‘basic semantic concepts’ that are the axioms of
Gellish and whose meaning should be understood.
The basic elementary facts in the figure are expressed in the Gellish Format Table below.
These facts form a template for other facts that are expressed in Gellish (the fact UID’s in
the table correspond with the numbers in the figure):
Left hand  | Left hand   | Fact | UID of        | Name of            | Right hand | Right hand
object UID | object name | UID  | relation type | relation type      | object UID | object name
-----------+-------------+------+---------------+--------------------+------------+----------------
201        | object-1    | 1A   | 5234          | is player of       | 202        | role-1
202        | role-1      | 1B   | 1991          | is played in       | 205        | relation-1
203        | object-2    | 2A   | 5234          | is player of       | 204        | role-2
204        | role-2      | 2B   | 1991          | is played in       | 205        | relation-1
205        | relation-1  | 3    | 1225          | is classified as a | 206        | relation type-1
201        | object-1    | 4    | 1225          | is classified as a | 207        | object type-1
203        | object-2    | 5    | 1225          | is classified as a | 208        | object type-2
202        | role-1      | 6    | 1225          | is classified as a | 209        | role type-1
204        | role-2      | 7    | 1225          | is classified as a | 210        | role type-2
[Figure 2 (diagram): object-1 and object-2 play role-1 and role-2, which are required by relation-1. Each of them "is a" kind of thing: anything playing a role, role (of something in a relation), or relation. Line labels: plays / played by, requires / in, is a, is a specialization of. The numbers 1A, 1B, 2A, 2B and 3 to 7 in the diagram are the fact UIDs of the table above.]
The basic concepts are:
- anything
- role
- relation / relations
- plays role
- requires role
- is / is a (is classified as a)
- individual thing / individual things
- kind of thing / kind of things
- single thing / plural thing
- specialization of class (is a specialization of)
The structure of figure 2 holds for facts about classes as well as facts about individual
objects (instances) or relations, but also for single objects as well as for plural objects. In
other words, object-1 and object-2 in figure 2 can be either a single or plural individual
object, relation or class. The lines in the top left corners of the boxes indicate that the
structure is a typical instance, because it defines instances of the concept ‘class’.
Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is
expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of
the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a
structure of nine (9) relations, formed by the blue boxes in figure 2 (note that 4 of the 5
boxes appear twice in an atomic fact).
For example the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish
by the following 4 elementary relations:
- O1 plays role R1
- R1 is required by C1
- C1 requires role R2 (the inverse of ‘R2 is required by C1’)
- R2 is played by O2
These 4 relations relate 5 objects. To interpret them correctly the following 5 additional
classification relations are required:
- O1 is classified as an impeller
- R1 is classified as a part
- C1 is classified as a composition relation (“is part of”)
- R2 is classified as a whole
- O2 is classified as a centrifugal pump
In practical implementations it appears that the explicit identification of the roles and their
classification can be neglected, because they follow from the classification of the relation
and the definition of the relation type.
Therefore the above relations are usually summarized in 3 Gellish atomic expressions as
follows:
- O1 is classified as an impeller
- O1 is part of O2
- O2 is classified as a centrifugal pump
From this example it can be seen that the 5 kinds of things with which the 5 objects are
classified need to be present in or added to the semantics of the Gellish knowledge base in
order to ensure that the fact can be interpreted correctly.
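The step from the summarized form back to the full structure of nine relations can be sketched as follows. The sketch assumes that the definition of a relation type supplies the class of the relation and its two role types, as stated above for the impeller example. The lookup table, the function name `expand` and the default identifiers C1, R1 and R2 are illustrative assumptions.

```python
# Relation type name -> (class of the relation, first role type, second role type).
# The single entry follows the impeller example; the layout is an assumption.
RELATION_TYPES = {
    "is part of": ("composition relation", "part", "whole"),
}


def expand(left, left_class, relation_name, right, right_class,
           relation_id="C1", role_ids=("R1", "R2")):
    """Expand a summarized fact into 4 elementary and 5 classification relations."""
    relation_class, role1_class, role2_class = RELATION_TYPES[relation_name]
    role1, role2 = role_ids
    elementary = [
        (left, "plays role", role1),
        (role1, "is required by", relation_id),
        (relation_id, "requires role", role2),
        (role2, "is played by", right),
    ]
    classifications = [
        (left, "is classified as a", left_class),
        (role1, "is classified as a", role1_class),
        (relation_id, "is classified as a", relation_class),
        (role2, "is classified as a", role2_class),
        (right, "is classified as a", right_class),
    ]
    return elementary + classifications


relations = expand("O1", "impeller", "is part of", "O2", "centrifugal pump")
for left, name, right in relations:
    print(left, name, right)
print(len(relations))  # 9
```

The roles and their classification are derived entirely from the relation type, which is why they can be left out of the summarized expressions.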
The awareness that a dictionary of predefined concepts is required for a correct
interpretation of Gellish expressions resulted in the development of the top-down
hierarchical definition of the Gellish Formal English Dictionary of concepts, including also
relation types.
Knowledge representation: relations between classes
Any fact type that extends the semantics is expressed as a relation between kinds of things.
For example, assume that the concept ‘centrifugal pump’ needs to be added. Then the
following two atomic relations define that concept:
1. A specialization relation that defines that:
centrifugal pump is a specialization of pump
2. A relation that defines that a centrifugal pump by definition uses the centrifugal
principle:
centrifugal pump has by definition as aspect centrifugal.
These relations build respectively on the definition of the concept ‘pump’ and ‘centrifugal’.
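A minimal sketch of such an extension is given below. It assumes a strongly simplified knowledge base in which ‘pump’ is a direct subtype of ‘anything’ (the real hierarchy has intermediate levels) and in which aspects are stored as plain names. The names `SUPERTYPES`, `ASPECTS_BY_DEFINITION`, `add_concept` and `is_kind_of` are hypothetical.

```python
# subtype -> set of direct supertypes. Simplified: the real Gellish
# hierarchy has intermediate levels between 'pump' and 'anything'.
SUPERTYPES = {"anything": set(), "pump": {"anything"}}
# concept -> set of aspects that it has by definition
ASPECTS_BY_DEFINITION = {}


def add_concept(name, supertype, aspect=None):
    """Add a concept via a specialization relation and an optional defining aspect."""
    if supertype not in SUPERTYPES:
        raise KeyError(f"unknown supertype: {supertype}")
    SUPERTYPES.setdefault(name, set()).add(supertype)
    if aspect is not None:
        ASPECTS_BY_DEFINITION.setdefault(name, set()).add(aspect)


def is_kind_of(name, ancestor):
    """True if name equals ancestor or is a direct or indirect subtype of it."""
    if name == ancestor:
        return True
    return any(is_kind_of(s, ancestor) for s in SUPERTYPES.get(name, ()))


# 1. centrifugal pump is a specialization of pump
# 2. centrifugal pump has by definition as aspect centrifugal
add_concept("centrifugal pump", "pump", aspect="centrifugal")

print(is_kind_of("centrifugal pump", "pump"))      # True
print(is_kind_of("centrifugal pump", "anything"))  # True
print(is_kind_of("pump", "centrifugal pump"))      # False
```

The check on the supertype reflects the requirement that a new concept builds on concepts that are already defined.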
5 Interpretation of expressions
In current database technology the semantic interpretation of an expression is done via the
fact that any object is implicitly classified by being an ‘instance’ of an entity of which the
semantics are defined.
For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called
‘pump’. This probably means that P1 is the name of a thing that is classified as a pump,
although this meaning comprises two facts that are usually not defined in a computer
interpretable way. It should be noted that if there are no other attributes, this data structure
does not allow the classification of P1 as a centrifugal pump.
In Gellish all semantics is made explicit by the creation of explicit classification relations
between the elements in the expression and the Gellish concepts (classes of objects,
including relations). This replaces the instantiation relations and eliminates the need to
define a data model with entities and attributes, such as the entity ‘pump’ and the attribute
‘name’. This is illustrated in figure 3.
Figure 3, Linking a Gellish expression to Gellish concepts through classification
Figure 3 illustrates the expression “P-101 is pumping S-1” (in dark yellow). The ‘pumping S-1’
process is an interaction between the fluid S-1 and the pump P-101. The pump has the
role of performer and the liquid has the role of subject in the pumping process. The blue
boxes in the green shaded area (the Gellish ontology, STEPlib) represent the Gellish concepts,
being instances in the Gellish Dictionary. The explicit classification relations with the
concepts in those blue boxes provide the semantics for the interpretation of the expression.
In Gellish Expression Format this becomes:
Left hand  | Left hand   | Fact UID | Relation type name | Right hand | Right hand
object UID | object name |          |                    | object UID | object name
-----------+-------------+----------+--------------------+------------+--------------
111        | P-101       | 11       | is performer of    | 112        | pumping S-1
113        | S-1         | 12       | is subject in      | 112        | pumping S-1
111        | P-101       | 13       | is classified as a | 130206     | pump
112        | pumping S-1 | 14       | is classified as a | 192512     | pumping
113        | S-1         | 15       | is classified as a | 730083     | liquid stream
Such a set of rows in Gellish Expression Format can be exchanged between Gellish-enabled
software packages in any kind of table, such as an MS-Access database table, an
Oracle or DB2 table, an XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or a
STEP physical file (ISO 10303-21). Further details are described in ref. 1.
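As a minimal sketch of such an exchange, the five rows above can be written to and read back from a plain CSV table. The column headings are taken from the table; the choice of CSV and the helper names dump_facts and load_facts are assumptions made for this illustration and are not part of the Gellish definition.

```python
import csv
import io

# Column headings taken from the table above; the CSV layout is an assumption.
COLUMNS = [
    "Left hand object UID",
    "Left hand object name",
    "Fact UID",
    "Relation type name",
    "Right hand object UID",
    "Right hand object name",
]

# The five facts of the example expression.
FACTS = [
    (111, "P-101", 11, "is performer of", 112, "pumping S-1"),
    (113, "S-1", 12, "is subject in", 112, "pumping S-1"),
    (111, "P-101", 13, "is classified as a", 130206, "pump"),
    (112, "pumping S-1", 14, "is classified as a", 192512, "pumping"),
    (113, "S-1", 15, "is classified as a", 730083, "liquid stream"),
]


def dump_facts(facts):
    """Serialize fact rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(facts)
    return buffer.getvalue()


def load_facts(text):
    """Parse CSV text back into fact tuples, restoring integer UIDs."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (
            int(row["Left hand object UID"]),
            row["Left hand object name"],
            int(row["Fact UID"]),
            row["Relation type name"],
            int(row["Right hand object UID"]),
            row["Right hand object name"],
        )
        for row in reader
    ]


# Round trip: what the receiving package reads equals what the sender wrote.
exchanged = load_facts(dump_facts(FACTS))
```

Because every row has the same six columns, the receiving side needs no application-specific schema; it only needs to recognize the UIDs and relation type names.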
Note that the shaded light yellow boxes all have the same name: “is classified as a”.
However, they are different individual classification relations. Each of those relations has a
unique identifier (13, 14 and 15). The name in the shaded box indicates that each is
(implicitly) “conceptualized” to be a classification relation. In other words, each of them is an
“is classified as a” relation.
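The distinction between individual relations and the relation type they share can be sketched by grouping the fact UIDs of the example by relation type name. The tuple layout and the function name facts_by_relation_type are assumptions of this sketch.

```python
from collections import defaultdict

# (fact UID, left-hand object, relation type name, right-hand object)
facts = [
    (11, "P-101", "is performer of", "pumping S-1"),
    (12, "S-1", "is subject in", "pumping S-1"),
    (13, "P-101", "is classified as a", "pump"),
    (14, "pumping S-1", "is classified as a", "pumping"),
    (15, "S-1", "is classified as a", "liquid stream"),
]


def facts_by_relation_type(rows):
    """Group individual fact UIDs under the relation type they belong to."""
    groups = defaultdict(list)
    for fact_uid, _left, relation_type, _right in rows:
        groups[relation_type].append(fact_uid)
    return dict(groups)


grouped = facts_by_relation_type(facts)
# Three distinct facts share the single relation type "is classified as a".
classification_facts = grouped["is classified as a"]
```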
For a correct interpretation of the Gellish concepts they need to be defined in a computer
interpretable way. This is done via specialization relations as is illustrated in figure 4.
Figure 4, Definition of Gellish concepts in a specialization hierarchy
In practice there are several intermediate levels of specialization between e.g. ‘pump’ and
‘physical object’, etc.
The knowledge about the meaning of the concepts pump, ‘is performer of’, liquid stream, ‘is
subject in’ and pumping is defined in the Gellish Dictionary. Some of that is illustrated in
the following facts, which include some intermediate facts not shown in figure 4 (the UIDs
and names are taken from the dictionary):
| Left hand object UID | Left hand object name | Fact UID | Relation type name | Right hand object UID | Right hand object name |
|---|---|---|---|---|---|
| 130206 | pump | 16 | is a specialization of | 730044 | physical object |
| 4761 | is performer of | 17 | is a specialization of | 4767 | is involved in |
| 4761 | is performer of | 18 | requires as role-1 a | 640020 | performer |
| 730044 | physical object | 19 | can have as role as a | 640020 | performer |
| 4761 | is performer of | 20 | requires as role-2 a | 4773 | involver |
| 730083 | liquid stream | 21 | is a specialization of | 730045 | stream |
| 4760 | is subject in | 22 | is a specialization of | 4767 | is involved in |
| 192512 | pumping | 23 | is a specialization of | 190168 | process |
This knowledge is inherited from concepts that are higher in the hierarchy to lower level
concepts. If an individual thing is classified to be of such a class, then the knowledge is
applicable to the individual object as a constraint for the specific aspects of the individual
object. This is illustrated in figure 5.
Figure 5, Concepts and knowledge in a specialization hierarchy
Figure 5 illustrates the use of inheritance in the knowledge base of Gellish. It also enables
for example search engines to perform intelligent searches on subtypes of keywords using
the specialization hierarchy.
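The inheritance and subtype search described above can be sketched with the specialization facts from the dictionary excerpt. This is an illustration only: it assumes one supertype per concept for brevity, and the names SUPERTYPE_OF, KNOWLEDGE, supertypes, inherited_knowledge and subtypes are hypothetical helpers, not part of Gellish.

```python
# Specialization facts from the dictionary excerpt: subtype -> supertype.
# Assumption of this sketch: each concept has a single supertype.
SUPERTYPE_OF = {
    "pump": "physical object",
    "liquid stream": "stream",
    "pumping": "process",
    "is performer of": "is involved in",
    "is subject in": "is involved in",
}

# Knowledge stated directly on a concept: (relation type name, right-hand object).
KNOWLEDGE = {
    "physical object": [("can have as role as a", "performer")],
}


def supertypes(concept):
    """Return the chain of supertypes of a concept, nearest first."""
    chain = []
    while concept in SUPERTYPE_OF:
        concept = SUPERTYPE_OF[concept]
        chain.append(concept)
    return chain


def inherited_knowledge(concept):
    """Collect knowledge stated on the concept itself and on all its supertypes."""
    collected = []
    for candidate in [concept] + supertypes(concept):
        collected.extend(KNOWLEDGE.get(candidate, []))
    return collected


def subtypes(concept):
    """Return all direct and indirect subtypes, e.g. to widen a keyword search."""
    result = []
    for sub, sup in SUPERTYPE_OF.items():
        if sup == concept:
            result.append(sub)
            result.extend(subtypes(sub))
    return result
```

With these definitions, inherited_knowledge("pump") yields the fact stated on ‘physical object’, and subtypes("is involved in") yields both ‘is performer of’ and ‘is subject in’, which is the kind of expansion a search engine could apply to a keyword.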
The difference between Gellish and the ISO standard data models (ISO 10303-221 and ISO
15926-2) is that in the methodology of these ISO standards some of the higher level
concepts are selected to form entities in the standard data model. This means that there will
be instantiation relations between the concept in the library and for example the AP221
entities as is illustrated in figure 6.
Figure 6, Relation of Gellish concepts to ISO 15926-2 data model entities.
However, in fact there is no need to use a data model such as AP221 or ISO 15926-2 at all,
except for the little data model of figure 2 with the ‘basic semantic axioms’ mentioned
above.
A common use of the little data model of figure 2, together with the common use of the
Gellish ontology makes it possible to express and interpret a very wide scope of types of
facts. This is possible because the explicit classification relations provide interpretation rules
for the expressions for which the relation types as well as the object types are defined in
Gellish. It is only required to have the concepts defined in the Gellish knowledge base and to
refer to them as in the basic structure using the ‘basic semantic axioms’ mentioned above.
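How classification relations supply the interpretation of an expression can be sketched as follows. The rows are those of the example expression given earlier; the functions classifiers and interpret are hypothetical names introduced for this illustration.

```python
# Rows of the example expression: (left UID, left name, fact UID,
# relation type name, right UID, right name).
ROWS = [
    (111, "P-101", 11, "is performer of", 112, "pumping S-1"),
    (113, "S-1", 12, "is subject in", 112, "pumping S-1"),
    (111, "P-101", 13, "is classified as a", 130206, "pump"),
    (112, "pumping S-1", 14, "is classified as a", 192512, "pumping"),
    (113, "S-1", 15, "is classified as a", 730083, "liquid stream"),
]


def classifiers(rows):
    """Map each classified object UID to the (UID, name) of its Gellish concept."""
    return {
        left_uid: (right_uid, right_name)
        for left_uid, _ln, _fact, relation, right_uid, right_name in rows
        if relation == "is classified as a"
    }


def interpret(rows, fact_uid):
    """Restate one fact in terms of the concepts that classify its two objects."""
    kinds = classifiers(rows)
    for left_uid, _ln, uid, relation, right_uid, _rn in rows:
        if uid == fact_uid:
            return (kinds[left_uid], relation, kinds[right_uid])
    raise KeyError(fact_uid)


# Fact 11 read at the level of concepts: a pump is performer of a pumping.
meaning = interpret(ROWS, 11)
```

The sketch only handles facts whose two objects are both classified; no application-specific entity or attribute definitions are needed to arrive at the interpretation.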
Figure 6 illustrates the further definition of concepts up to the top concept called ‘anything’.
Because of this generic top, any concept can be added to Gellish as a subtype of an existing
concept.
An implementation of Gellish could for example declare all classes in the hierarchy
(subtypes of ‘individual thing’) as instances of the basic semantic concept ‘kinds of things’
as is illustrated in figure 7.
Figure 7, Instantiation in the ‘basic semantic axioms’.
Figure 7 contains eight facts expressed as eight “is a specialization of” relations, each of
which is a separate relation between classes. Similarly to what is described above about the
“is classified as a” relation, this illustrates that the term ‘is a specialization of’ is not the
name of each of those relations, but it is a name of the Gellish concept (the class) that is the
conceptualization of those relations.
So, we distinguish between the individual specialization relations and the ‘specialization of
class’ concept that is called ‘is a specialization of’. Similarly, we distinguish between the
individual classification relations for the classification of individual objects and the
‘classification concept’ that is called ‘is classified as a’. This more detailed definition of the
semantics of the relations is illustrated in figure 8.
Figure 8, Individual classification and generalization relations.
Figure 8 also illustrates that in Gellish a relation is defined as an expression of a fact. In the
Gellish Expression Format this is translated into the statement that every relation is an
instance of the entity Gellish_fact.
6 The Gellish Knowledge Base
The Gellish knowledge base contains relations between concepts of many kinds. For
example, many relations represent knowledge about the decomposition structure of members
of the classes of objects, others define properties of the members of the classes or roles that
can be played by the members, etc. All together the knowledge library is an integrated
network of relations in which we can distinguish product models, activity models, process
models, and various other models. Parts of those models can be seen as views or templates
which represent limited sets of data about types of objects.
The knowledge in the Gellish knowledge base belongs to the definition of the Gellish
language and is in the public domain. Thus Gellish provides knowledge to users and enables
them to add knowledge either in the public domain or as private extensions.
Software that implements Gellish shall ensure that all knowledge is inherited via the
specialization hierarchy and can be used via the classification relations in Gellish
expressions. An example of such knowledge is illustrated in figure 9. The knowledge that a
pump can be performer of a pumping activity is represented in the Gellish knowledge base
by a relation between the concepts ‘pump’ and ‘pumping’ (‘to pump’). That relation is
defined as a specialization relation of the ‘can be a performer of a’ relation.
Figure 9, Modeling knowledge in Gellish
The knowledge as presented in the upper part of figure 9 is documented in the Gellish
Dictionary as follows:
| Left hand object UID | Left hand object name | Fact UID | Relation type name | Right hand object UID | Right hand object name |
|---|---|---|---|---|---|
| 130206 | pump | 24 | can be performer of a | 192512 | pumping |
| 24 | …. | 25 | is a specialization of | 4650 | can be a performer of a |
| 730083 | liquid stream | 26 | can be subject in a | 192512 | pumping |
| 26 | …. | 27 | is a specialization of | 4649 | can be a subject in a |
A lot of knowledge of this type is already included in this way in the Gellish Dictionary or
Knowledge Base. That knowledge base will be further extended with public domain
knowledge and can be extended with proprietary knowledge, thus extending the semantics
of the Gellish language for your own applications. Besides making proprietary
extensions, we recommend proposing additions to the public domain Gellish definitions, as
this will extend the common language shared by all users of Gellish.
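How software might use such knowledge via the classification relations can be sketched as a simple check: an individual fact is supported when a class-level fact exists between the classes of its two objects. The pairing of ‘is performer of’ with ‘can be performer of a’ in PERMITTED_BY, and the function name is_supported, are assumptions of this sketch rather than definitions taken from the Gellish Dictionary.

```python
# Classification facts 13-15: individual object -> class.
CLASS_OF = {
    "P-101": "pump",
    "pumping S-1": "pumping",
    "S-1": "liquid stream",
}

# Knowledge facts 24 and 26: (class, relation type name, class).
KNOWLEDGE = {
    ("pump", "can be performer of a", "pumping"),
    ("liquid stream", "can be subject in a", "pumping"),
}

# Hypothetical pairing of each individual-level relation type with the
# class-level relation type that permits it (an assumption of this sketch).
PERMITTED_BY = {
    "is performer of": "can be performer of a",
    "is subject in": "can be subject in a",
}


def is_supported(left, relation_type, right):
    """True if a class-level knowledge fact allows this individual fact."""
    kind_relation = PERMITTED_BY.get(relation_type)
    if kind_relation is None:
        return False
    try:
        key = (CLASS_OF[left], kind_relation, CLASS_OF[right])
    except KeyError:
        # One of the objects has not been classified.
        return False
    return key in KNOWLEDGE
```

For brevity the check looks only at the direct class of each object; an implementation that honours inheritance would also have to consider the supertypes of those classes.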
7 Experiences and applications
A commercial application of Gellish is a Gellish Search Engine. That software can read,
write and verify information that is expressed in the Gellish language and can present
any knowledge about classes of objects and any data about individual objects. It might be
expected that an implementation of Gellish would have serious performance issues. However,
the Search Engine has excellent performance even when loaded with hundreds of
thousands of facts. We also customized an implementation of a product lifecycle
management (PLM) system and loaded the same data into that system. That system also had
excellent performance.
8 Conclusions
The above illustrates that:
- It is possible that a knowledge base of concepts and relations between concepts can
replace data models.
- The Gellish knowledge base of concepts solution is more flexible than fixed data
models and it is easier to add semantics to the database.
- The Gellish knowledge base of concepts provides an application independent
language with a semantic basis that is equivalent to a very large data model. If
sufficient concepts of an application domain are present or added, then data models
for such an application domain become superfluous.
- The Gellish knowledge base, using the inheritance capabilities of the specialization
hierarchy, provides extendable product models for many types of objects.
- The implementations have proven that a Gellish knowledge base can be implemented
with good performance.
- The implementations have proven that neutral format data exchange using Gellish
and the Gellish Expression Format is a feasible solution.
Further work will explore the use of Gellish for the exchange of messages by intelligent
Agent software, acting as nodes in the Semantic Web.
9 References
1. Andries van Renssen, The Gellish Formal English Syntax - Definition of
Universal Semantic Databases and Data Exchange Messages, available via the
download area of http://www.gellish.net/.
2. Andries van Renssen, Creation and Use of Dictionaries and Taxonomies, a guide
to develop or extend a Gellish domain dictionary, available via
http://www.gellish.net/index.php/shop.html.
3. Andries van Renssen, Development of Facility and Product Models, available via
http://www.gellish.net/index.php/shop.html.
4. The Gellish Formal English Dictionary. This is a set of tabular files in Gellish
Expression Format (in Excel). The upper level ontology part is documented in
the TOPini part. Available via http://www.gellish.net/index.php/shop.html.
5. Tim Berners-Lee, James Hendler and Ora Lassila, 'The Semantic Web', Scientific
American, May 2001;
http://www.sciam.com/2001/0501issue/0501berners-lee.html.
6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in:
Formal Ontology in Information Systems, ISBN 1-58113-377-4.
7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.
8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”,
Communications of the ACM, 38, no 11 (November 1995).
9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001),
“GOL: A General Ontological Language”, in: Formal Ontology in Information
Systems, ISBN 1-58113-377-4.
10. The Epistle Core Data Model (2001),
http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html
11. ISO 10303-221 and ISO 15926-2, http://www.tc184-sc4.org/.
12. UNSPSC, http://www.unspsc.org/.
13. Ecl@ss, http://www.eclass.de/.
14. Trade Ranger, http://www.trade-ranger.com/EN/Pages/ContentStandards.asp.
15. Andries van Renssen, Semantic Modeling in Formal English, Lulu 2014,
ISBN 9781304513595,
http://www.lulu.com/shop/dr-ir-andries-van-renssen/semantic-modeling-in-formal-english/paperback/product-21538016.html.