Gellish: an information representation language, knowledge base and ontology

18
Information Representation 1 02/06/2014 Gellish Formal English An information representation language, knowledge base and ontology by Ir. Andries van Renssen Shell Global Solutions International [email protected] 2003 Abstract Data storage and data exchange and interoperability lack a common standard widely applicable data model as well as a common data language with a dictionary-taxonomy of concepts and a grammar for data exchange messages. This article presents a solution to this problem in the form of the new “open” industry standard Gellish Formal English language, as a further development of the standard data model and ontology of two new ISO standards. The article states that Gellish is a suitable language for neutral data exchange between systems. The definition of Gellish includes an extensive Formal English Dictionary- Taxonomy with definitions of a large number of concepts and relation types. This also provides an ontology with standard reference data for customization of systems in a standardized way to be prepared for data harmonization, data integration and data exchange. The article illustrates that a collection of databases tables or data files with a common identical structure or format, the Gellish Expression Format, is suitable to express a wide range of kinds of facts about kinds of things as well as facts about individual things, and that it can replace conventional data models. Keywords: knowledge representation, formal language, data model, taxonomy, ontology, semantic web, knowledge base, data warehouse, interoperability, data exchange standard, information model Table of Content 1 Introduction ...................................................................................................................... 2 1.1 Standard data models, ontologies and reference data .............................................. 3 2 Issues in data modeling .................................................................................................... 4 3 The Gellish language and ontology ................................................................................. 4 4 Storage and exchange of data as well as semantics in Gellish ........................................ 7 5 Interpretation of expressions .......................................................................................... 10 6 The Gellish Knowledge Base ......................................................................................... 15 7 Experiences and applications ......................................................................................... 16 8 Conclusions .................................................................................................................... 17 9 References ...................................................................................................................... 17

Transcript of Gellish: an information representation language, knowledge base and ontology

Information Representation 1 02/06/2014

Gellish Formal English

An information representation language, knowledge base and ontology

by Ir. Andries van Renssen

Shell Global Solutions International [email protected]

2003

Abstract

Data storage and data exchange and interoperability lack a common standard widely

applicable data model as well as a common data language with a dictionary-taxonomy of

concepts and a grammar for data exchange messages. This article presents a solution to this

problem in the form of the new “open” industry standard Gellish Formal English language,

as a further development of the standard data model and ontology of two new ISO standards.

The article states that Gellish is a suitable language for neutral data exchange between

systems. The definition of Gellish includes an extensive Formal English Dictionary-

Taxonomy with definitions of a large number of concepts and relation types. This also

provides an ontology with standard reference data for customization of systems in a

standardized way to be prepared for data harmonization, data integration and data exchange.

The article illustrates that a collection of databases tables or data files with a common

identical structure or format, the Gellish Expression Format, is suitable to express a wide

range of kinds of facts about kinds of things as well as facts about individual things, and that

it can replace conventional data models.

Keywords: knowledge representation, formal language, data model, taxonomy, ontology, semantic web, knowledge base, data warehouse, interoperability, data exchange standard, information model

Table of Content

1 Introduction ...................................................................................................................... 2

1.1 Standard data models, ontologies and reference data .............................................. 3

2 Issues in data modeling .................................................................................................... 4

3 The Gellish language and ontology ................................................................................. 4

4 Storage and exchange of data as well as semantics in Gellish ........................................ 7

5 Interpretation of expressions .......................................................................................... 10

6 The Gellish Knowledge Base ......................................................................................... 15

7 Experiences and applications ......................................................................................... 16

8 Conclusions .................................................................................................................... 17

9 References ...................................................................................................................... 17

Information Representation 2 02/06/2014

1 Introduction

Conventionally, each software system stores its data using its own data model and

communicates with other systems usually using a dedicated interface data structure, which

means that it applies a dedicated interface data model. The large variety of data models

cause that data exchange between systems is costly because of the required conversion of

the data from the semantics of one data model to the other. This demonstrates the urgent

need for widely applicable common standard data models or a data representation

language.

Often systems can be ‘customized’ by adding ‘reference data’ as instances, such as the

definition of equipment types, document types, activity types, property types, etc. However,

reference data are usually different per implementation, even when database structures of

different systems are equal, such as is the case with several implementations of the same

system. This also holds for different implementations of the same system, such as a CAD,

CAE, PDM, PLM, ERP or CRM system. The consequence is that data in those

implementations can still not be compared, integrated or exchanged without costly data

conversion processes. This illustrates the urgent need for a common dictionary,

classification system, taxonomy or ontology of reference data and shared knowledge.

Unfortunately there is currently not such a standard user data language.

In the current systems there is a separation between the world of data models and the world

of instances. Data models are developed by IT specialists (data modelers) who document

them using either proprietary tools or using a standard data modeling language, such as

EXPRESS (ISO 10303-11) or UML, which languages are especially designed to define data

models. Once a data model is defined in such a language, the data model acts as another

language in which the reference data as well as the user data has to be expressed. The use of

two different languages, one for the model, one for the user data, illustrates the barrier

between the two worlds. It is as if the English language definition is expressed in Chinese.

On top of this comes that each programmer and each reference data producer is free to use

those data definition languages to define his own terminology!

The current situation is sketched by Smith and Welty (2001) as follows: “Out of the

apparent chaos, some coherence is beginning to emerge. Gradually, computer scientists are

beginning to recognize that the provision, once for all, of a common, robust reference

ontology – a shared taxonomy of entities – might provide significant advantages over the ad-

hoc, case-by-case methods previously used”.

From a business perspective, a “common language” would provide the opportunity for huge

cost savings, because of the advantages of the possibility to combine or integrate data from

different sources, to automate the verification of their consistency, and to exchange data

between parties (as in e-commerce), even if those parties apply different systems and

catalogues.

Several attempts are made to develop an ‘upper ontology’, such as SUMO by Niles and

Pease (2001), the IEEE Standard Upper Ontology, SUO (2001), the Cyc ontology, Lenat

(1995) and GOL, Degen et al (2001). However none of them integrates a generic data model

with reference data and a language for the description of knowledge and of individual

objects and processes.

This article presents a solution to the above-mentioned issues in the form of the Gellish

language. Gellish Formal English is defined in a smart dictionary, which has the form of a

taxonomy and knowledge base or ontology, and includes the concepts from standard generic

data models (it is a further development of ISO 15926-2 and ISO 10303-221) and concepts

Information Representation 3 02/06/2014

from various sources including other ISO standards, IEC standards, VDI standards, and

knowledge stemming from proprietary sources. As such it is an integration of an upper

ontology with a lower ontology. The total ontology includes also the definition of standard

fact types (or relation types) that defines the grammar of the Gellish language. Gellish

satisfies the criteria for proper ontologies as expressed by Degen et al (2001 par 6.1), but is

not limited to an upper ontology. Gellish is extendable just as any natural language. Its

taxonomy and knowledge base uses unique identifiers for concepts, thus allowing for

synonyms and multiple names in various languages. The latter enables the expression of

propositions about facts in one natural languages and their automatic translation and

presentation in any other natural language.

Gellish eliminates the traditional barrier between the data model definitions of classes and

the data instances. The Gellish language demonstrates that this barrier is not necessary and

that there are clear advantages when class definitions, reference data and user data are

expressed in one and the same language.

An extended version of the Gellish Formal English language is described in the book:

Semantic Modeling in Formal English [Ref. 15].

1.1 Standard data models, ontologies and reference data

There are several developments of standard lower level ontologies and reference data

libraries, stimulated among others by requirements of the e-commerce ‘market places’ and

the developments around The Semantic Web promoted by Lee et al (2000). For example, the

UNSPSC code, Ecl@ss, Trade Ranger, etc. These standards have their value mainly in the

standardization of terminology, but do not provide a standard language or a standard data

model for general use, because of their limited semantic expression power due to the fact

that they apply only a few relation types and lack of integration with a rich upper ontology.

There have also been several attempts to develop standard data models for data exchange or

for data storage. Some of them are proprietary, but others are in the public domain. Those

standard data models are defined independent of a particular system, and are therefore called

‘neutral’. Those standard data models are usually developed for a particular application

domain instead of being limited to a particular system.

Examples of standard data models are the STEP family of standards in ISO 10303, such as a

graphics data model AP203, a data model for the automotive industry (AP214), one for

piping systems (AP227), one under development for the defense industry (AP239, PLCS),

etc. The integration of all those data models into one overall data model is not yet fully

achieved. Although the scopes of these valuable standard data models are wide, they are still

limited to particular application area’s and do not provide a general ‘common language’ yet.

A standard data model with a generic scope is ISO 15926-2, which has a counterpart within

the STEP family (AP221). Although these two data models are stemming from the process

industries, their nature of being an upper ontology makes them applicable in other

application domains as well. To become practically applicable in a particular application

domain, these generic data models need a ‘reference data library’ or ‘ontology’ to add

definitions of application domain specific concepts to specialize the generic data model. The

Gellish Formal English Dictionary (earlier called STEPlib) provides such standard reference

data library or ontology. A part of that has been standardized as ISO 15926-4.

The Gellish language definition can be regarded as a very large generic data model that is

further extendable by adding subtypes to the existing concepts in the ontology hierarchy. It

can also be seen as a base knowledge library of product models and business processes that

can be extended with your own knowledge by expressing and exchanging them in the

Gellish language.

Information Representation 4 02/06/2014

2 Issues in data modeling

There is a language barrier between data models terminology and user data terminology.

This is strengthened by the strict separation between on one hand the concepts defined in a

data model and on the other hand the reference data (instances) to customize a system and

the operational data. This separation also implies that the semantic concepts of the data

model are not accessible by users of that data. Elimination of that barrier by integration of

the entity type and attribute type definitions with the customization data could reduce this

language barrier and would give users easier access to the exact interpretation rules for their

data.

Data models are fixed once databases are created and thus the semantics for the

interpretation of the user data is fixed. Any extension of this semantics requires a

redefinition of the structure of the database and a conversion of the data from the old to the

new database structure. A fixed data model is often seen as an advantage as it fixes the

‘rules of the game’. But the disadvantage is that it prevents an increase of knowledge and

semantics in the data model. Flexible data models would reduce costs of time consuming

modifications or extensions of a data model.

The scope of data models is limited. This puts constraints on the data storage capabilities of

systems, which is a problem in case of business changes. The normal solutions to this are

either to create very big data models or to create very generic data models. A very big data

model is difficult to understand, to manage and to apply. Generalization of data models

leads to smaller data models, but also to abstract entity types with semantics that is difficult

to grasp as is illustrated by the ISO 15926-2 standard data model. A common language with

a very wide scope as described in this paper might provide a third and better solution to this

problem.

Finally, each data model is different, so that exchange of data between different systems

means that the data shall be converted from one data structure to the other and vice versa.

This is caused by the fact that currently there is no systematic and standardized approach to

the reuse of earlier defined concepts. The promises of ‘object libraries’ did not yet provide a

general solution. A widely applicable data language is still needed to solve this problem.

The result of the current state of the art is that data storage is done in a Babylonian mix of

“languages”/data models with the consequence that exchange of data between systems is

impossible, except where dedicated bilateral translators are created between each pair of

languages/data models.

3 The Gellish language and ontology

Gellish is a public domain standard data and knowledge representation language and

ontology that does not have the above mentioned constraints and does not have the barrier

between the user data and the IT data model data. On the contrary, the ontology defines a

rich and extensible semantics in natural language terminology, expressed in Gellish itself.

This ontology is equivalent to a data model of over 20.000 entity types, attribute types and

relationship types.

Gellish is not object oriented, but fact oriented. The basic Gellish object is therefore a fact.

Each (atomic) fact is expressed as a relation between (two) objects.

For example, fact 1 is expressed by a particular relation between objects with unique

identifiers (UID’s) 100 and 101. This expression (1, 100, 101) illustrates the structure of

each basic Gellish expression. Gellish requires that both the objects and the fact must be

classified explicitly by standard classes, including standard relation types. The standard

Information Representation 5 02/06/2014

classes are predefined in the Gellish ontology. In addition to that, objects may have a name.

This enables that the expression can be interpreted correctly by software.

If a certain fact cannot be expressed in the current Gellish language, then new classes that

define the missing concepts can be added to the Gellish dictionary (the definition can always

be expressed using the existing Gellish language) and if necessary new fact types can also be

added to the Gellish grammar. This enables to express new kinds of facts about new kinds of

objects in Gellish. Note that those new definitions have to be exchanged with the party that

receives the message, to enable him to interpret the message correctly.

Gellish and the above mentioned ISO standards are both based on the understanding that

there appears to exist a limited set of application independent standard relation types that are

sufficient to model all kinds of products and processes, whereas each of those relation types

require well defined role types that can be played by particular object types.

A large part of that set is defined in the ISO standards and an extended set is defined in the

TOP part of the Gellish language definition.

A standard implementation of Gellish is defined in the Gellish Expression Format (GTF, see

below) in which names of the objects and the classification of the fact are combined in one

record (the classification of the objects is done via separate classification facts in additional

records). In this GTF format the basic Gellish expression becomes:

Left hand

object

UID

Left hand

object

name

Fact

UID

UID of

relation type

Name of relation

type

Right hand

object UID

Right hand

object name

100 thing-1 1 4658 is related to 101 thing-2

Note that the relation type also has a UID which is not shown in the above record nor in the records

below.

The semantic expression capabilities of Gellish are defined by the allowed relation types.

They define the kinds of relations that are possible to express facts. They also define the

roles that the related objects play towards each other.

Some examples of facts and standard Gellish relation types are:

Left hand

object UID

Left hand

object name

Fact

UID

UID of

relation

type

Name of relation

type

Right hand

object UID

Right hand

object name Scale

130091 diesel engine 2 1146 is a specialization of 130108 engine

104 M-1 3 1225 is classified as a 130091 diesel engine

130802 cylinder 4 1146 is a specialization of 730063 artifact

107 C-1 5 1225 is classified as a 130802 cylinder

107 C-1 6 1260 is a part of 104 M-1

107 C-1 7 1727 has as aspect 108 volume of C-1

108 volume of C-1 8 1225 is classified as a 550140 internal volume

108 volume of C-1 9 5025 has on scale a value

equal to 922235 1800 cm3

104 M-1 10 4760 is a subject of 110 order-1

The above table illustrates:

- The standard Gellish relation types, that classify the facts. The variation of standard

relation types determine the expression capabilities and semantics of Gellish.

- Examples of the large number of standard object types predefined in Gellish. For

example: engine, diesel engine, cylinder, artifact, internal volume, 1800 and cm3.

Information Representation 6 02/06/2014

- New concepts (objects types) can be added: such as fact 2 and 4. In this case they

already exist in the Gellish Formal English Dictionary, but if diesel engine and

cylinder would not have existed, they could have been added in this way.

- It is possible in Gellish to express facts, such as the volume of C-1, without the need

that such a fact is pre-modeled in a data model. Although such a fact type could be

defined in Gellish, after which this particular instance can be verified against such a

definition.

- One table is suitable to express many kinds of facts.

Note: The table above presents just an example of some of the capabilities of Gellish. For

example, Gellish also allows to express in which language the facts are expressed,

whether the objects are real or imaginary, what the communicative intent is, who the

author of a proposition is and the addressee, etc.

Gellish is not limited to specific application domains, although the current ontology (the

dictionary) does not yet cover the scope of a natural language. This wide applicability is

illustrated by the following example from a complete different domain:

Left

hand

object

UID

Left hand object

name

Fact

UID

Relation type

name

Right

hand

object

UID

Right hand

object name

Scale

111 Andries 11 is classified as a 990007 man

112 Rose-Mary 12 is classified as a 990013 woman

111 Andries 13 is married with 112 Rose-Mary

111 Andries 14 is born at 19460309 March 9, 1946

111 Andries 15 is author of 113 Gellish Handbook

113 Gellish Handbook 16 is classified as a 490193 manual

The flexibility of the semantics of Gellish is achieved by:

1. Defining an extendable specialization hierarchy of fact types (or relation types),

whereas each relation type is defined by two required role types, while for each role

type it is defined which kind of object can play such a role.

There are three hierarchies of fact types:

Fact types which express that members of a class can be related to members

of another class in a particular way. Facts of this type express knowledge

about classes.

For example: a pipe can have as aspect a diameter

Fact types which express that individual objects are related to other

individual objects. Facts of this type express information about individual

objects.

For example: John is performer of action#1

Fact types which express that individual objects are related to classes or can

be related to members of classes. This includes facts that express

classifications and facts that express that individual objects can have relations

with members of certain classes.

For example: action#1 is classified as maintenance

2. Expressing knowledge about classes (types of things) in addition to knowledge about

individual objects in the same data structure.

Information Representation 7 02/06/2014

3. Eliminating the difference in treatment between attribute types and instances, by

expressing the definition of the attribute types in the same way as the instances and

by expressing instantiations as explicit classification relations between individual

objects (instances) and the applicable classes that classify them (instances of class).

4. Eliminating the difference between entity types and attribute types by expressing the

relation between entities and their attributes as explicit individual facts, expressed as

individual relations between instances that are classified as ‘possession of aspect’

relations.

Figure 1 compares the essential concepts in the current methodologies with the concepts in

the Gellish language.

Figure 1, Comparison of Data modeling concepts with Gellish concepts

These principles provide the flexibility of the Gellish language and makes it an extensible

data model that can integrate the knowledge of many application domains, which knowledge

is not hidden in the data model, but is visible for the user as (meta) data that defines his

application data.

4 Storage and exchange of data as well as semantics in Gellish

In this paragraph I will describe how knowledge, data and semantics are represented in

Gellish.

I will use the example of the fact:

- a particular pump (‘P-1’) is pumping a particular stream (‘S-1’).

In a conventional database it is required to declare some entity types and attribute types that

define the semantics in the form of a data model. In case of the example, the data model

could for example consist of the entity types ‘pump’, ‘process’ and ‘stream’, whereas each

entity type possesses some attributes.

Current Data Modeling Concepts

1. Instantiation

- Implicit classification relations

with limited number of classes (entity types).

2. Entities have Attributes- Implicit relation types between

entity and attributes.

3. Subtyping of entities (not of attributes)

- Methodology does not require a consistent

subtyping strategy.

- Usually a limited use of inheritance.

4. Entity and attribute types

are not instances (fixed data model)

- Fixed knowledge model outside database.

Gellish Language Concepts

1. Explicit relations

- Explicit classification of individuals

with unlimited number of classes (subtypes).

- Explicit specialization hierarchy of classes.

2. Explicit Relation types between objects

- Explicit classification of relations

to standard relation types.

3. Specialization relations between classes

- Methodology requires that every class

has at least one supertype, which results in

a consistent specialization hierarchy.

- Full use of inheritance,

applicable for all objects, including also

properties, relations, occurrences, etc.

4. Entity types and attribute types

(classes) are instances- Flexible knowledge model

integrated with reference data and user data.

Information Representation 8 02/06/2014

In Gellish, the concepts ‘pump’, process’ and ‘stream’ are not defined as such entity types,

because concepts in Gellish do not imply ‘attributes’. Instead they are defined as concepts

that can be related to a flexible number of other concepts of any kind. The collection of

relations form a knowledge base. The definitions only contain the minimum number of

expressions of what is by definition the case, without specifying what can or shall be the

case. Such definitions have the general structure of a ‘basic semantic pattern’, which

comprises the fundamental ontological concepts of Gellish. That pattern is also applicable

for the definition of additional semantic concepts.

For the definition of a new concept it is required to define a coherent set of elementary facts,

expressed as relations between the new concept and the existing concepts. In other words,

each new concept requires the creation of a structure of expressions as presented in figure 2.

Figure 2, Basic semantic pattern

Figure 2 presents the minimum structure of ‘basic semantic concepts’ that are the axioms of

Gellish and which meaning should be understood.

The basic elementary facts in the figure are expressed in the Gellish Format Table below.

These facts form a template for other facts that are expressed in Gellish (the fact UID’s in

the table correspond with the numbers in the figure):

Left hand

object UID

Left hand

object name

Fact

UID

UID of

relation

type

Name of relation

type

Right hand

object UID

Right hand

object name

201 object-1 1A 5234 is player of 202 role-1

202 role-1 1B 1991 is played in 205 relation-1

203 object-2 2A 5234 is player of 204 role-2

204 role-2 2B 1991 is played in 205 relation-1

205 relation-1 3 1225 is classified as a 206 relation type-1

201 object-1 4 1225 is classified as a 207 object type-1

203 object-2 5 1225 is classified as a 208 object type-2

202 role-1 6 1225 is classified as a 209 role type-1

204 role-2 7 1225 is classified as a 210 role type-2

is a

kind of thing

is (a)

relationrole

(of something

in a relation)

anything playing

a rolerequirement

of role

is a

- object-1 - role-1- relation-1

- object-2 - role-2

plays

played by requires

in

1A 2A

34 5

2B1B

6 7

is a

specialization

of

is a

specialization

of

Information Representation 9 02/06/2014

The basic concepts are:

- anything - role - relation / relations - plays role - requires role - is / is a (is classified as a) - individual thing / individual things - kind of thing / kind of things - single thing / plural thing - specialization of class (is a specialization of)

The structure of figure 2 holds for facts about classes as well as facts about individual

objects (instances) or relations, but also for single objects as well as for plural objects. In

other words, object-1 and object-2 in figure 2 can be either a single or plural individual

object, relation or class. The lines in the top left corners of the boxes indicate that the

structure is a typical instance, because it defines instances of the concept ‘class’.

Any other ‘atomic fact’ is expressed as such a structure. In other words, any atomic fact is

expressed as an ‘atomic relation’ between two or more ‘objects’ and by the classification of

the ‘objects’, the ‘roles’ and the ‘relation’. This implies that an atomic fact is expressed by a

structure of nine (9) relations, formed by the blue boxes in figure 2 (note that 4 of the 5

boxes appear twice in an atomic fact).

For example the fact that impeller O1 is part of centrifugal pump O2 is expressed in Gellish

by the following 4 elementary relations:

- O1 plays role R1

- R1 is required by C1

- C1 requires role R2 (the inverse of ‘R2 is required by C1’)

- R2 is played by O2

These 4 relations relate 5 objects. To interpret them correctly the following 5 additional

classification relations are required:

- O1 is classified as an impeller

- R1 is classified as a part

- C1 is classified as a composition relation (“is part of”)

- R2 is classified as a whole

- O2 is classified as a centrifugal pump

In practical implementations it appears that the explicit identification of the roles and their

classification can be neglected, because they follow from the classification of the relation

and the definition of the relation type.

Therefore the above relations are usually summarized in 3 Gellish atomic expressions as

follows:

- O1 is classified as an impeller

- O1 is part of O2

- O2 is classified as a centrifugal pump

From this example it can be seen that the 5 kinds of things with which the 5 objects are

classified need to be present in or added to the semantics of the Gellish knowledge base in

order to ensure that the fact can be interpreted correctly.

The awareness that a dictionary of predefined concepts is required for a correct

interpretation of Gellish expressions resulted in the development of the top-down

Information Representation 10 02/06/2014

hierarchical definition of the Gellish Formal English Dictionary of concepts, including also

relation types.

Knowledge representation: relations between classes

Any fact type that extends the semantics is expressed as a relation between kinds of things.

For example, assume that the concept ‘centrifugal pump’ needs to be added. Then the

following two atomic relations define that concept:

1. A specialization relation that defines that:

centrifugal pump is a specialization of pump

2. A relation that defines that a centrifugal pump by definition uses the centrifugal

principle:

centrifugal pump has by definition as aspect centrifugal.

These relations build respectively on the definition of the concept ‘pump’ and ‘centrifugal’.

5 Interpretation of expressions

In current database technology the semantic interpretation of an expression is done via the

fact that any object is implicitly classified by being an ‘instance’ of an entity of which the

semantics are defined.

For example, assume that P1 is an instance of an attribute called ‘name’ of an entity called

‘pump’. This probably means that P1 is the name of a thing that is classified as a pump,

although this meaning comprises two facts that are usually not defined in a computer

interpretable way. It should be noted that if there are no other attributes, this data structure

does not allow the classification of P1 as a centrifugal pump.

In Gellish all semantics is made explicit by the creation of explicit classification relations

between the elements in the expression and the Gellish concepts (classes of objects,

including relations). This replaces the instantiation relations and eliminates the need to

define a data model with entities and attributes, such as the entity ‘pump’ and the attribute

‘name’. This is illustrated in figure 3.

Figure 3, Linking a Gellish expression to Gellish concepts through classification

Figure 3 illustrates the expression P-101 is pumping S-1” (in dark yellow). The ‘pumping S-

1’ process is an interaction between the fluid S-1 and the pump P-101. The pump has the

classifier

classified

classifier

classified

classifier

classified

Green shaded area = Gellish ontology (STEPlib)Green shaded area = Gellish ontology (STEPlib)

classifier

classified

‘S-1’‘P-101’

is classified as ais classified as a is classified as ais classified as ais classified as ais classified as a

classifier

classified

is classified as ais classified as a

‘is performer of pumping S-1’

‘pumping S-1’

is classified as ais classified as a

player requirer

requirerplayer

‘is subject in pumping S-1’

pumpingpump liquid streamis performer of is subject in

111

11

11312

112

13 15 14

730083 192512130206

Information Representation 11 02/06/2014

role as performer and the liquid has the role as subject in the pumping process. The blue

boxes in the green shaded area represent the Gellish concepts, being instances in the Gellish

Dictionary. The explicit classification relations with the concepts in those blue boxes

provide the semantics for the interpretation of the expression.

In Gellish Expression Format this becomes:

Left hand

object UID

Left hand

object name

Fact UID Relation type

name

Right hand

object UID

Right hand

object name

111 P-101 11 is performer of 112 pumping S-1

113 S-1 12 is subject in 112 pumping S-1

111 P-101 13 is classified as a 130206 pump

112 pumping S-1 14 is classified as a 192512 pumping

113 S-1 15 is classified as a 730083 liquid stream

Such a set of rows in a Gellish Expression Format can be exchanged between Gellish

enabled software packages in any kind of table, such as an MS-Access database table, an

Oracle or DB2 table, XLS spreadsheet, an XML file (e.g. according to ISO 10303-28) or in

STEP physical file format (ISO 10303-21). Further details are described in ref. 1.

Note that the shaded light yellow boxes all have the same name: “is classified as a”.

However, they are different individual classification relations. Each of those relations has a

unique identifier (13, 14 and 15). The name in the shaded box indicates that each is

(implicitly) “conceptualized” to be a classification relation. In other words, each of them is a

“is classified as a” relation.

For a correct interpretation of the Gellish concepts they need to be defined in a computer

interpretable way. This is done via specialization relations as is illustrated in figure 4.

Figure 4, Definition of Gellish concepts in a specialization hierarchy

In practice there are several intermediate levels of specialization between e.g. ‘pump’ and

‘physical object’, etc.

classifier

classified

subtype

supertypesupertype

subtype

classifier

classified

supertype

subtype

classifier

classified

subtype

supertype

classifier

classified

‘P-101’

physical object

is a specialization ofis a specialization of

is classified as ais classified as a is classified as ais classified as a

is a specialization ofis a specialization of

is classified as ais classified as a

classifier

classified

is classified as ais classified as a

‘is performer of pumping S-1’

‘pumping S-1’

is a specialization ofis a specialization of

is classified as ais classified as a

subtype

is a specialization ofis a specialization of

requirer

requirerplayer

player

is a specialization ofis a specialization of

‘is subject in pumping S-1’

activityrelation

‘S-1’

pumpingliquid stream is subject inis performerpump

Green area = Gellish ontologyGreen area = Gellish ontology

Information Representation 12 02/06/2014

The knowledge about the meaning of the concepts pump, ‘is performer of’, liquid stream, ‘is

subject in’ and pumping is defined in the Gellish Dictionary. Some of that is illustrated in

the following facts, which includes some intermediate facts not shown in figure 4 (the UID’s

and names are taken from the dictionary):

Left hand

object UID

Left hand

object name Fact UID Relation type name

Right hand

object UID

Right hand

object name

130206 pump 16 is a specialization of 730044 physical object

4761 is performer of 17 is a specialization of 4767 is involved in

4761 is performer of 18 requires as role-1 a 640020 performer

730044 physical object 19 can have as role as a 640020 performer

4761 is performer of 20 requires as role-2 a 4773 involver

730083 liquid stream 21 is a specialization of 730045 stream

4760 is subject in 22 is a specialization of 4767 is involved in

192512 pumping 23 is a specialization of 190168 process

This knowledge is inherited from concepts that are higher in the hierarchy to lower level

concepts. If an individual thing is classified to be of such a class, then the knowledge is

applicable to the individual object as a constraint for the specific aspects of the individual

object. This is illustrated in figure 5.

Figure 5, Concepts and knowledge in a specialization hierarchy

Figure 5 illustrates the use of inheritance in the knowledge base of Gellish. It also enables

for example search engines to perform intelligent searches on subtypes of keywords using

the specialization hierarchy.

The difference between Gellish and the ISO standard data models (ISO 10303-221 and ISO

15926-2) is that in the methodology of these ISO standards some of the higher level

concepts are selected to form entities in the standard data model. This means that there will

Information Representation 13 02/06/2014

be instantiation relations between the concept in the library and for example the AP221

entities as is illustrated in figure 6.

Figure 6, Relation of Gellish concepts to ISO 15926-2 data model entities.

However, in fact there is no need to use a data model such as AP221 or ISO 15926-2 at all,

except for the little data model of figure 2 with the ‘basic semantic axioms’ mentioned

above.

A common use of the little data model of figure 2, together with the common use of the

Gellish ontology makes it possible to express and interpret a very wide scope of types of

facts. This is possible because the explicit classification relations provide interpretation rules

for the expressions for which the relation types as well as the object types are defined in

Gellish. It is only required to have the concepts defined in the Gellish knowledge base and to

refer to them as in the basic structure using the ‘basic semantic axioms’ mentioned above.

Figure 6 illustrates the further definition of concepts up to the top concept called ‘anything’.

Because of this generic top any concept can be added to Gellish as a subtype of an existing

concept.

An implementation of Gellish could for example declare all classes in the hierarchy

(subtypes of ‘individual thing’) as instances of the basic semantic concept ‘kinds of things’

as is illustrated in figure 7.

classifier

classified

subtype

supertypesupertype

subtype

classifier

classified

subtype

classifier

classified

subtype

supertype

Green area = Gellish ontologyGreen area = Gellish ontology

classifier

classified

‘P-101’

is classified as ais classified as a is classified as ais classified as ais classified as ais classified as a

classifier

classified

is classified as ais classified as a

‘is performer of pumping S-1’

‘pumping S-1’

is classified as ais classified as a

subtype

instance

entiry

class of activity

is an instance ofis an instance of

instance

entity

class of relation

is an instance ofis an instance of

instance

entity

class of product

is an instance ofis an instance of

requirer

requirerplayer

player

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

‘is subject in pumping S-1’

activitysupertypephysical object relation

‘S-1’

pumpingpump liquid stream is subject inis performer of

Information Representation 14 02/06/2014

Figure 7, Instantiation in the ‘basic semantic axioms’.

Figure 7 contains eight facts expressed as eight “is a specialization of” relations, each of

which is a separate relation between classes. Similarly to what is described above about the

“is classified as a” relation, this illustrates that the term ‘is a specialization of’ is not the

name of each of those relations, but it is a name of the Gellish concept (the class) that is the

conceptualization of those relations.

So, we distinguish between the individual specialization relations and the specialization of

class concept that is called ‘is a specialization of’. Similarly we distinguish between the

individual classification relations for the classification of individual objects and the

‘classification concept’ that is called ‘is classified as a’. This more detailed definition of the

semantics of the relations is illustrated in figure 8.

classifier

classified

subtype

supertypesupertype

subtype

classifier

classified

subtype

classifier

classified

subtype

supertype

pumpingpump liquid stream

classifier

classified

‘S-1’‘P-101’

is classified as ais classified as a is classified as ais classified as ais classified as ais classified as a

classifier

classified

is classified as ais classified as a

‘performer of pumping S-1’

‘pumping S-1’

is classified as ais classified as a

subtype

instance

entity

kinds of things

subtype

is subject inis performer of

requirer

requirerplayer

player

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

is a specialization ofis a specialization of

supertypephysical object relation activity

‘subject in pumping S-1’

is a specialization ofis a specialization of

supertype

is a specialization ofis a specialization of

is a specialization ofis a specialization of

individual thingis an instance ofis an instance ofinstance

individual thingsGreen area = Gellish ontologyGreen area = Gellish ontology

anything

is a specialization ofis a specialization of

Information Representation 15 02/06/2014

Figure 8, Individual classification and generalization relations.

Figure 8 also illustrates that in Gellish a relation is defined as an expression of a fact. In the

Gellish Expression Format this is translated into the statement that every relation is in

instance of the entity Gellish_fact.

6 The Gellish Knowledge Base

The Gellish knowledge base contains relations between concepts of many kinds. For

example, many relations represent knowledge about the decomposition structure of members

of the classes of objects, others define properties of the members of the classes or roles that

can be played by the members, etc. All together the knowledge library is an integrated

network of relations in which we can distinguish product models, activity models, process

models, and various other models. Parts of those models can be seen as views or templates

which represent limited sets of data about types of objects.

The knowledge in the Gellish knowledge base belongs to the definition of the Gellish

language and is in the public domain. This Gellish provides knowledge to users and enables

them to add knowledge either in the public domain or as private extensions.

Software that implements Gellish shall ensure that all knowledge is inherited via the

specialization hierarchy and can be used via the classification relations in Gellish

expressions. An example of such knowledge is illustrated in figure 9. The knowledge that a

pump can be performer of a pumping activity is represented in the Gellish knowledge base

by a relation between the concepts ‘pump and ‘pumping’ (‘to pump’). That relation is

defined as a specialization relation of the ‘can be a performer of a’ relation.

Information Representation 16 02/06/2014

Figure 9, Modeling knowledge in Gellish

The knowledge as presented in the upper part of figure 8 is documented in the Gellish

Dictionary as follows:

Left hand

object UID

Left hand

object name

Fact

UID Relation type name

Right hand

object UID

Right hand

object name

130206 pump 24 can be performer of a 192512 pumping

24 …. 25 is a specialization of 4650 can be a

performer of a

730083 liquid stream 26 can be subject in a 192512 pumping

26 …. 27 is a specialization of 4649 can be a

subject in a

A lot of knowledge of this type is already included in this way in the Gellish Dictionary or

Knowledge Base. That knowledge base will be further extended with public domain

knowledge and can be extended with proprietary knowledge, thus extending the semantics

of the Gellish language for your own applications. In addition to the addition of proprietary

extensions we recommend to propose additions to the public domain Gellish definitions as it

will extent the common language between all users of Gellish.

7 Experiences and applications

A commercial application of Gellish is a Gellish Search Engine. That software can read (and

write) and verify information that is expressed in the Gellish language and is able to present

any knowledge about classes of objects and any data about individual objects. It might be

expected that implementation of Gellish would have serious performance issues. However

the Search Engine has an excellent performance even when loaded with hundreds of

thousands of facts. We also customized an implementation of a product lifecycle

management (PLM) system and loaded the same data in that system. That system also had

an excellent performance.

classifier

classified

classifier

classified

classifier

classified

classifier

classified

‘S-1’‘P-101’

is classified as ais classified as a is classified as ais classified as ais classified as ais classified as a

classifier

classified

is classified as ais classified as a

‘is performer of pumping S-1’

‘pumping S-1’

is classified as ais classified as a

player requirer

requirerplayer

‘is subject in pumping S-1’

pumpingpump liquid streamis performer of is subject in

can be a performer of a

is a specialization ofis a specialization of

can be a subject in a

is a specialization ofis a specialization of

‘can be performer of a pumping’

‘can be subject in a pumping’

Green area = Gellish ontologyGreen area = Gellish ontology

subtype

‘generalization of

involvement as performer‘generalization of

involvement as performer

supertype‘generalization of

involvement as subject’‘generalization of

involvement as subject’

relation instance

24

4650

26

4649

Information Representation 17 02/06/2014

8 Conclusions

The above illustrates that:

- It is possible that a knowledge base of concepts and relations between concepts can

replace data models.

- The Gellish knowledge base of concepts solution is more flexible than fixed data

models and it is easier to add semantics to the database.

- The Gellish knowledge base of concepts provides an application independent

language with a semantic basis that is equivalent to a very large data model. If

sufficient concepts of an application domain are present or added, then data models

for such an application domain becomes superfluous.

- The Gellish knowledge base, using the inheritance capabilities of the specialization

hierarchy, provides extendable product models for many types of objects.

- The implementations have proven that a Gellish knowledge base can be implemented

with good performance.

- The implementations have proven that neutral format data exchange using Gellish

and the Gellish Expression Format is a feasible solution.

Further work will explore the use of Gellish for the exchange of messages by intelligent

Agent software, acting as nodes in the Semantic Web.

9 References

1. Andries van Renssen, The Gellish Formal English Syntax - Definition of

Universal Semantic Databases and Data Exchanghe Messages, available via the

download area of http://www.gellish.net/.

2. Andries van Renssen, Creation and Use of Dictionaries and Taxonomies, a guide

to develop or extent a Gellish domain dictionary, available via

http://www.gellish.net/index.php/shop.html.

3. Andries van Renssen, Development of Facility and Product Models, available via

http://www.gellish.net/index.php/shop.html.

4. The Gellish Formal English Dictionary. This is a set of tabular files in Gellish

Expression Format (in Excel). The upper level ontology part is documented in

the TOPini part. Available via http://www.gellish.net/index.php/shop.html.

5. Tim Berners-Lee, James Hendler and Ora Lassila, 'The Semantic Web', Scientific

American, May 2001;

http://www.sciam.com/2001/0501issue/0501berners-lee.html.

6. Ian Niles and Adam Pease (2001), “Towards a Standard Upper Ontology”, in:

Formal Ontology in Information Systems, ISBN 1-58113-377-4.

7. SUO (2001), The IEEE Standard Upper Ontology website, http://suo.ieee.org.

8. Lenat, D. (1995), “Cyc: A Large-Scale Investment in Knowledge Infrastructure”,

Communications of the ACM, 38, no 11 (November 1995).

9. Wolfgang Degen, Barbara Heller, Heinrich Herre and Barry Smith (2001),

“GOL: A General Ontological Language”, in: Formal Ontology in Information

Systems, ISBN 1-58113-377-4.

Information Representation 18 02/06/2014

10. The Epistle Core Data Model (2001),

http://www.btinternet.com/~chris.angus/epistle/specifications/ecm/ecm_400.html

11. ISO 10303-221 and ISO 15926-2, http://www.tc184-sc4.org/.

12. UNSPSC, http://www.unspsc.org/.

13. Ecl@ss, http://www.eclass.de/.

14. Trade Ranger, http://www.trade-ranger.com/EN/Pages/ContentStandards.asp.

15. Andries van Renssen, Semantic Modeling in Formal English, Lulu 2014,

ISBN 9781304513595,

http://www.lulu.com/shop/dr-ir-andries-van-renssen/semantic-modeling-in-

formal-english/paperback/product-21538016.html.