On the regularity of circular splicing languages: a survey and new developments

24
On the regularity of circular splicing languages: a survey and new developments * Paola Bonizzoni , Clelia De Felice ** , Gabriele Fici ** , Rosalba Zizza ** Abstract Circular splicing has been introduced to model a specific recombinant behaviour of circu- lar DNA, continuing the investigation initiated with linear splicing. In this paper we focus on the relationship between regular circular languages and languages generated by finite circular splicing systems. We survey the known results towards a characterization of the intersection between these two classes and provide new contributions on the open problem of finding this characterization. First, we exhibit a non-regular circular language generated by a circular simple system thus disproving a known result in this area. Then we give new results related to a restrictive class of circular splicing systems, the marked systems. Precisely, we review in a graph theoretical setting the recently obtained characterization of marked systems gener- ating regular circular languages. In particular, we define a slight variant of marked systems, that is the g-marked systems, and we introduce the graph associated with a g-marked sys- tem. We show that a g-marked system generates a regular circular language if and only if its associated graph is a cograph. Furthermore, we prove that the class of g-marked systems generating regular circular languages is closed under a complement operation applied to sys- tems. We also prove that marked systems with self-splicing generate only regular circular languages. 1 Introduction The idea of using biological processes as models for computing is the basis of Adleman’s ex- periment [1]. Adleman showed how to solve the (NP-complete) Hamiltonian Path Problem by manipulating DNA sequences in a lab. This research represents part of the general trend towards the proposal of new (biological and quantum) models of computation. In response to the rapid streams of experimental research on these new computation paradigms, a series of papers on theoretical research on DNA computing has been attracting attention lately in computer science, originating from Head’s work on splicing systems [19]. Nowadays, developments in both directions have been achieved. On the one hand, other systems such as Watson-Crick Automata, Insertion and Deletion Systems, Sticker systems, P-systems have been presented [28, 29]. On the other hand, Adleman’s experiment has inspired the implementation * Partially supported by the MIUR Project “Mathematical aspects and emerging applications of automata and formal languages” (2007), by the ESF Project “Automata: from Mathematics to Applications (AutoMathA)” (2005-2010), by the 60% Project “Propriet` a strutturali e nuovi modelli di rappresentazione nella teoria dei lin- guaggi formali” (University of Salerno, 2007) and by the 60% Project “Estensioni della teoria dei linguaggi formali e loro propriet` a strutturali” (University of Salerno, 2008). Dipartimento di Informatica Sistemistica e Comunicazione, Universit` a degli Studi di Milano – Bicocca, Viale Sarca 336, 20126 Milano – ITALY. E-mail: [email protected] ** Dipartimento di Informatica ed Applicazioni, Universit` a di Salerno, 84084 Fisciano (SA) – ITALY. E-mail: {defelice,fici,zizza}@dia.unisa.it 1

Transcript of On the regularity of circular splicing languages: a survey and new developments

On the regularity of circular splicing languages:a survey and new developments ∗

Paola Bonizzoni †, Clelia De Felice∗∗, Gabriele Fici∗∗, Rosalba Zizza∗∗

Abstract

Circular splicing has been introduced to model a specific recombinant behaviour of circu-lar DNA, continuing the investigation initiated with linear splicing. In this paper we focus onthe relationship between regular circular languages and languages generated by finite circularsplicing systems. We survey the known results towards a characterization of the intersectionbetween these two classes and provide new contributions on the open problem of finding thischaracterization. First, we exhibit a non-regular circular language generated by a circularsimple system thus disproving a known result in this area. Then we give new results relatedto a restrictive class of circular splicing systems, the marked systems. Precisely, we review ina graph theoretical setting the recently obtained characterization of marked systems gener-ating regular circular languages. In particular, we define a slight variant of marked systems,that is the g-marked systems, and we introduce the graph associated with a g-marked sys-tem. We show that a g-marked system generates a regular circular language if and only ifits associated graph is a cograph. Furthermore, we prove that the class of g-marked systemsgenerating regular circular languages is closed under a complement operation applied to sys-tems. We also prove that marked systems with self-splicing generate only regular circularlanguages.

1 Introduction

The idea of using biological processes as models for computing is the basis of Adleman’s ex-periment [1]. Adleman showed how to solve the (NP-complete) Hamiltonian Path Problem bymanipulating DNA sequences in a lab. This research represents part of the general trend towardsthe proposal of new (biological and quantum) models of computation.

In response to the rapid streams of experimental research on these new computation paradigms,a series of papers on theoretical research on DNA computing has been attracting attentionlately in computer science, originating from Head’s work on splicing systems [19]. Nowadays,developments in both directions have been achieved. On the one hand, other systems such asWatson-Crick Automata, Insertion and Deletion Systems, Sticker systems, P-systems have beenpresented [28, 29]. On the other hand, Adleman’s experiment has inspired the implementation

∗Partially supported by the MIUR Project “Mathematical aspects and emerging applications of automata andformal languages” (2007), by the ESF Project “Automata: from Mathematics to Applications (AutoMathA)”(2005-2010), by the 60% Project “Proprieta strutturali e nuovi modelli di rappresentazione nella teoria dei lin-guaggi formali” (University of Salerno, 2007) and by the 60% Project “Estensioni della teoria dei linguaggi formalie loro proprieta strutturali” (University of Salerno, 2008).

†Dipartimento di Informatica Sistemistica e Comunicazione, Universita degli Studi di Milano – Bicocca,Viale Sarca 336, 20126 Milano – ITALY. E-mail: [email protected]

∗∗ Dipartimento di Informatica ed Applicazioni, Universita di Salerno, 84084 Fisciano (SA) – ITALY. E-mail:{defelice,fici,zizza}@dia.unisa.it

1

of new experiments in the lab, thus solving other NP-complete problems [21, 23, 29]. All theseexperiments are based on the use of many specially constructed linear double-stranded DNAmolecules, whereas in [21] the author developed a systematic approach applicable to a widevariety of algorithmic problems based on the use of one single, circular double-stranded DNAmolecule.

In this paper, we focus on splicing systems, generative devices of formal languages whichare inspired by cut and paste phenomena on DNA molecules under the action of restriction andligase enzymes. Indeed, a DNA strand can be viewed as a string over a four letter alphabet(the four deoxyribonucleotides Adenine, Guanine, Cytosine and Thymine), so the natural wayto model DNA computation is within the framework of formal language theory. In [20] Headreviewed the original definition of splicing systems and introduced circular splicing systems alongwith various open problems related to their computational power.

In the circular context, the splicing operation acts on two circular DNA molecules by meansof a pair of restriction enzymes. Each of these two enzymes is able to recognize a pattern insideone of these two circular DNA molecules and to cut the molecule in the middle of such a pattern.Two linear molecules are produced and then they are pasted together by the action of ligaseenzymes. Thus a new circular DNA sequence is generated [19, 21, 22, 29]. For instance, circularsplicing models the integration of a plasmid into the DNA of a host bacteria. As in the caseof the linear splicing operation, a language-theoretic operation can be defined and, dependingon whether or not these ligase enzymes substitute the recognized pattern, we have Pixton’sdefinition or Head’s and Paun’s definition in the circular context.

Obviously a string of circular DNA can be represented by a circular word, i.e., by an equiva-lence class with respect to the conjugacy relation ∼, defined by xy ∼ yx, for x, y ∈ A∗ [26]. It isworthy of note that this notion plays an important role in combinatorics on words as well as insome language factorization problems (e.g. the problem of finding factorizations of free monoids,see [2, 26]). A circular language is a set of circular words and we can also give a definition ofregular circular languages.

In short, a circular splicing system is a triple (A, I, R) where A is a finite alphabet, I isthe initial circular language and R is the set of the splicing rules. Splicing rules are iterativelyapplied starting from I. Here we assume that each circular word in I, or obtained by iteratedsplicing, is present in an unbounded number of copies. The circular language generated by acircular splicing system (A, I,R) is the smallest language which contains I and is invariant underiterated splicing by rules in R. While there have been many articles on linear splicing, relativelyfew works on circular splicing systems have been published.

Some still unanswered questions are related to the computational power of circular splicingsystems. It should be noted that in this context, at least three aspects need to be considered.Indeed, the computational power of circular splicing systems depends on (a) whether additionalconditions are taken into account (i.e., reflexive and symmetric sets of rules, self-splicing, seeSection 3), (b) which of the three definitions (Head’s, Paun’s or Pixton’s definition) is considered,(c) the level in the (circular) Chomsky hierarchy the initial set I and the set R of rules belong to.In particular, the problem of characterizing the circular languages generated by circular splicingsystems remains open. This problem has been solved for Paun (resp. Pixton) circular splicingsystem S = (A, I, R) with a finite (resp. regular) reflexive and symmetric set R of rules, initialset I in special families of languages (the class of regular circular languages among them, seeSection 3.2) and where self-splicing is used [30, 31].

In this paper we focus our interest on the following restricted version of the above problem.We recall that a circular splicing system S = (A, I,R) is said to be finite if I and R are bothfinite sets.

2

Problem 1.1 [22] Find a characterization of the regular circular languages generated by finitecircular splicing systems.

We mainly deal with finite Paun systems, with no additional assumptions, and with thecorresponding class of generated languages, denoted by C(Fin, F in). It is known that in contrastwith the case of the linear splicing, C(Fin, F in) is incomparable with the class of regular circularlanguages. More generally, the relationship between C(Fin, F in) and the classes of languagesin the (circular) Chomsky hierarchy is still not completely understood. Indeed, C(Fin, F in)contains context-free circular languages which are not regular (see [35]), context-sensitive circularlanguages which are not context-free (see [17]) and there exist regular circular languages whichare not in C(Fin, F in) (see [6]). Furthermore, it has been claimed that C(Fin, F in) is containedin the class of context-sensitive circular languages (see [17]). However, the structure of regularcircular languages in C(Fin, F in) has not been completely described.

We survey the known results towards a solution of Problem 1.1 which was approached directlyin [4, 5, 6] for the first time. We also prove new results regarding this problem in Sections 4.5,4.6, 4.7. In detail, results concerning some special families of regular circular languages inC(Fin, F in) have been given in [4, 5, 6] and they are recalled in Section 4.1. Furthermore,Problem 1.1 has been solved for languages over a one-letter alphabet in [6, 7]. Section 4.2 dealswith this case. Subsequently, we take into account the special class of Paun circular semi-simple splicing systems (CSSH systems, for short) and its subclass of circular simple systems(Section 4.3). Both the classes have already been considered in [10, 11, 17, 35]. Concerningthe computational power of CSSH systems, in [17] the author claimed that the class of circularlanguages generated by CSSH systems is contained in the class of context-free circular languages.As a new result, we will show that there are non-regular languages generated by circular simplesystems thus disproving a known result in this area. Once again, a characterization of theregular circular languages generated by CSSH systems is still lacking. This characterization wasobtained very recently for a subclass of CSSH systems, the marked systems (Section 4.4). Indeed,in [14, 15], the authors proved that a marked system generates a regular circular language ifand only if its set of rules satisfies a special (decidable) property. In the same papers [14, 15],the authors proved that we can decide whether a regular circular language is generated by amarked system and they also characterize the structure of these regular circular languages. Wereview these results in a graph theoretical setting and new results are derived. Precisely, wedefine a slight variant of marked systems, that is the g-marked systems, and we introduce thegraph associated with a g-marked system. We show that a g-marked system generates a regularcircular language if and only if its associated graph is a cograph. Furthermore, we prove that theclass of g-marked systems generating regular circular languages is closed under a complementoperation applied to systems (Section 4.5). In Section 4.7 we also prove that marked systemswith self-splicing generate only regular circular languages, a result that was reported withoutproof in [14]. Finally, we underline the difference between marked and CSSH systems in Section4.6.

The paper is organized as follows. Section 2 contains some basics on words, circular wordsand circular languages. We give a brief exposition of circular splicing in Section 3. Section 4 isdevoted to finite circular splicing systems. Finally, in Section 5 we discuss some open problemsthat follow on from these results.

3

2 Basics

2.1 Words

We denote by A∗ the free monoid over a finite alphabet A and we set A+ = A∗ \1, where 1 is theempty word. For a word w ∈ A∗, |w| is the length of w and for every a ∈ A, w ∈ A∗, we denoteby |w|a the number of occurrences of a in w. We also set alph(w) = {a ∈ A | |w|a > 0} and|w|A′ =

∑a∈A′ |w|a, where A′ ⊆ A. The reversal wRev of w is defined by the relations 1Rev = 1

and, for all x ∈ A∗, a ∈ A, (xa)Rev = axRev. For a subset X of A∗, XRev = {wRev | w ∈ X} isthe reversal of X and |X| is the cardinality of X. We denote by Fin (resp. Reg) the class offinite (resp. regular) languages over A, at times represented by means of regular expressions.

2.2 Circular words and languages

Circular splicing deals with circular strings, a notion which has been intensively examined informal language theory [2, 3, 12, 26, 33]. For a given word w ∈ A∗, a circular word ∼w is theequivalence class of w with respect to the conjugacy relation ∼ defined by xy ∼ yx, for x, y ∈ A∗

[26]. The notations |∼w|, |∼w|a, alph(∼w), |∼w|A′ will be defined as |w|, |w|a, alph(w), |w|A′

for any representative w of ∼w. Analogously, we define the reversal ∼wRev of the circular word∼w by ∼wRev = ∼(wRev). Notice that ∼wRev does not depend on which representative in ∼w wechoose to define it by. When the context does not make it ambiguous, we will use the notationw for a circular word ∼w.

Let ∼A∗ denote the set of all circular words over A, i.e., the quotient of A∗ with respect to ∼.Given L ⊆ A∗, ∼L = {∼w | w ∈ L} is the circularization of L, i.e., the set of all circular wordscorresponding to elements of L. A subset C of ∼A∗ is named a circular language and everysubset L of A∗ such that ∼L = C is called a linearization of C. In particular, a linearization of acircular word ∼w is a linearization of {∼w}, whereas the full linearization Lin(C) of C is the setof all the words in A∗ corresponding to the elements of C, i.e., Lin(C) = {w ∈ A∗ | ∼w ∈ C}.Notice that, given L ⊆ A∗, the notation ∼L∗ is unambiguous (and means ∼(L∗)). The sameholds for ∼L+. For simplicity of notation, we will use the same letter to designate a set of words(resp. circular words) of length one and its circularization (resp. full linearization). We willalso often write ∼w instead of {∼w} when no confusion arises. Given a family of languages FAin the Chomsky hierarchy, FA∼ is the set of all those circular languages C which have somelinearization in FA. In this paper we deal mainly with circular languages having a regularlinearization, i.e., with Reg∼ = {C ⊆ ∼A∗ | ∃L ∈ Reg : ∼L = C}. If C ∈ Reg∼ then C is aregular circular language. In particular, we notice that the reversal CRev = {∼wRev | ∼w ∈ C} ofa regular circular language C is still a regular circular language. It is classically known that givena regular language L ⊆ A∗, Lin(∼L) is regular (see Exercise 4.2.11 in [25]). As a result, givena circular language C, we have C ∈ Reg∼ if and only if its full linearization Lin(C) is regular[22]. More generally, we recall that a family FA of languages is closed under cyclic closure ifLin(∼L) ∈ FA, for each L ∈ FA. For instance, context-free languages are closed under cyclicclosure [25]. If FA is closed under cyclic closure, then C ∈ FA∼ if and only if its full linearizationLin(C) is in FA [22]. As a consequence, we can define context-free (resp. context-sensitive,recursive, recursively enumerable) circular languages and, once again, a circular language C iscontext-free if and only if Lin(C) is context-free.

4

3 Circular splicing

3.1 Models

As in the case of the linear splicing operation, Head, Paun and Pixton have each defined thecircular splicing operation differently. In this paper, we will mainly take into account a restrictedversion of Paun’s definition reported below.

Paun’s definition [22, 32]. A Paun circular splicing system is a triple S = (A, I,R),where A is a finite alphabet, I is the initial circular language, with I ⊆ ∼A∗ and R is the set ofthe rules, with R ⊆ A∗#A∗$A∗#A∗ and #, $ 6∈ A. Then, given a rule r = u1#u2$u3#u4 andcircular words ∼w′, ∼w′′, ∼w, we set (∼w′,∼w′′)`r

∼w if there are linearizations w′ of ∼w′, w′′ of∼w′′, w of ∼w such that w′ = u2xu1, w′′ = u4yu3 and w = u2xu1u4yu3. If (∼w′,∼w′′)`r

∼w wesay that ∼w is generated (or spliced) starting with ∼w′, ∼w′′ and by using a rule r. We also saythat u1u2, u3u4 are sites of splicing and we will use SITES(R) to denote the set of sites of therules in R.

From now on, “splicing system” will be synonymous with “circular Paun splicing system”.Furthermore, a finite splicing system S = (A, I,R) is a circular splicing system with both I andR finite sets. We will now give the definition of circular splicing languages. Given a splicingsystem S and a circular language C ⊆ ∼A∗, we set σ′(C) = {w ∈ ∼A∗ | ∃w′, w′′ ∈ C,∃r ∈R : (w′, w′′)`r w}. Then, we define σ0(C) = C, σi+1(C) = σi(C) ∪ σ′(σi(C)), i ≥ 0, andσ∗(C) =

⋃i≥0 σi(C).

Definition 3.1 (Circular splicing language) Given a splicing system S, with initial lan-guage I ⊆ ∼A∗, the circular language L(S) = σ∗(I) is the language generated by S. A circularlanguage C is Paun generated (or C is a circular splicing language) if a splicing system S existssuch that C = L(S).

As observed in [22], we may assume that the set R of the rules in a splicing systemS = (A, I, R) satisfies additional conditions, having also a biological counterpart. Namely,we may assume that R is reflexive (i.e., for each u1#u2$u3#u4 ∈ R, we have u1#u2$u1#u2,u3#u4$u3#u4 ∈ R) or R is symmetric (i.e., for each u1#u2$u3#u4 ∈ R, we have u3#u4$u1#u2 ∈R). We do not assume that R is reflexive. On the contrary, we notice that, in view of thedefinition of circular splicing, if (w′, w′′)`r w, with r = u1#u2$u3#u4, then (w′′, w′)`r′ w,with r′ = u3#u4$u1#u2. Consequently, L(S) = L(S′), where S′ = (A, I, R′) and R′ =R ∪ {u3#u4$u1#u2 | u1#u2$u3#u4 ∈ R}. Hence, in order to find a characterization of thecircular splicing languages, there is no loss of generality in assuming that R is symmetric. Thus,in what follows, we assume that R is symmetric (and we do not consider this assumption asan additional condition). However, for simplicity, in the examples of Paun systems, only one ofeither u1#u2$u3#u4 or u3#u4$u1#u2 will be reported in the set of rules.

In the sequel C(Fin, F in) denotes the class of the circular languages generated by finitePaun splicing systems. The example below follows from a collection of results proved in [6].

Example 3.1 [6] Let C = ∼(aa)∗b. Then, C ∈ Reg∼ and we can prove that C cannot begenerated by a finite Paun circular splicing system. On the contrary, the circularization of theregular language L1 = {w ∈ {a, b}∗ | ∃h, k ∈ N : |w|a = 2k, |w|b = 2h} is generated by a finitePaun circular splicing system.

We recall that the original definition of circular splicing was proposed by Head [20]. Hedefined circular splicing as an operation on two circular words ∼ypxq, ∼zuxv ∈ ∼A∗ performed

5

by two triples (p, x, q),(u, x, v) and producing ∼ypxvzuxq. The word x is called a crossing of thetriple. A Head circular splicing system S = (A, I, T, P ) is defined by giving a finite alphabet A,the initial set I ⊆ ∼A∗, the set T of triples, T ⊆ A∗×A∗×A∗, and where P is a binary relationon T such that, for each (p, x, q), (u, y, v) ∈ T , (p, x, q)P (u, y, v) if and only if x = y. Anotherdefinition of circular splicing has been given by Pixton [31]. In his scheme circular splicing isperformed on two circular words ∼w′ = ∼αε, ∼w′′ = ∼α′ε′, by using r = (α, α′;β), r = (α′, α;β′)and producing ∼w = ∼εβε′β′. A Pixton circular splicing system S = (A, I,R) is defined by givinga finite alphabet A, the initial set I ⊆ ∼A∗ and the set R of rules, R ⊆ A∗×A∗×A∗. Obviously,the counterpart of the notion of a reflexive (resp. symmetric) set of rules can be defined for(and added to) Head and Pixton systems as the notions of the corresponding generated circularlanguages.

We also recall that in the original definition of circular splicing given by Paun, rules in Rcould be used in two different ways [22]. One way has been described above while we recall theother, known as self-splicing, below along with the notion of circular languages generated byPaun systems (with self-splicing). It goes without saying that there are analogous definitionsfor Head or Pixton systems. Furthermore, self-splicing has also a biological counterpart [22].Self-splicing introduces a different semantics of how rules are used. While in the case of thecircular splicing operation, two words are pasted together to form a new circular word, in thecase of the self-splicing operation, a single circular word gives rise to two circular words. Theprecise definition is given below.

Self-splicing. Let S = (A, I, R) be a splicing system. Then, given a rule u1#u2$u3#u4

and a circular word ∼w, we set ∼w `r (∼w′,∼w′′) if there are linearizations w of ∼w, w′ of ∼w′,w′′ of ∼w′′ such that w = xu1u2yu3u4, w′ = u4xu1 and w′′ = u2yu3. If ∼w `r (∼w′,∼w′′) we saythat (∼w′,∼w′′) is generated starting with ∼w and by using self-splicing with a rule r.

Definition 3.2 Let S = (A, I,R) be a splicing system, let C ⊆ ∼A∗. We set τ ′(C) = {w ∈∼A∗ | ∃w′, w′′ ∈ C,∃r ∈ R : (w′, w′′)`r w} ∪{w′, w′′ ∈ ∼A∗ | ∃w ∈ C,∃r ∈ R : w `r(w′, w′′)}.Thus, we define τ0(C) = C, τ i+1(C) = τ i(C)∪τ ′(τ i(C)), for i ≥ 0 and then τ∗(C) =

⋃i≥0 τ i(C).

We say that τ∗(I) is the language generated by S (with self-splicing).

3.2 The computational power of circular splicing

The computational power of circular splicing systems depends on (a) whether R is reflexive, (b)self-splicing is taken into account, (c) which of the three definitions (Head’s, Paun’s or Pixton’sdefinition) is considered, (d) the level in the (circular) Chomsky hierarchy the initial set I andthe set R of the rules belong to.

The problem of comparing the computational power of the three definitions of circular splic-ing was tackled in [6], where the authors proved that computational power increases when wesubstitute Head systems with Paun systems. Indeed, they proved that there is a canonicaltransformation of a Head system S′ = (A′, I ′, T, P ) into a Paun system S = (A, I, R) suchthat S and S′ generate the same language. This transformation is defined by A′ = A, I ′ = I,R = {px#q$ux#v | (p, x, q), (u, x, v) ∈ T} (Proposition 3.1 in [6]). A still open question iswhether, for each Paun system S, there exists a transformation of S into a Pixton system S′

such that S and S′ generate the same language. However, this is the case for Paun systemssatisfying a special property (Remark 3.1 in [6]), CSSH systems among them (see Section 4.3for the definition of CSSH systems).

The main result concerning the computational power of splicing systems is recalled below.

6

Theorem 3.1 [29, 30, 31, 32] Let FA be a full AFL which is closed under cyclic closure. LetS = (A, I, R) be a Paun (resp. Pixton) circular splicing system such that I ∈ FA∼, R is a finite(resp. regular) reflexive and symmetric set of rules. Assume that both splicing and self-splicingoperations are allowed. Then the language generated by S is in FA∼.

In [31], the author shows that the above result is not true without the reflexivity assumption.Furthermore, without the self-splicing operation we can have non-regularity even if we assumereflexivity. In the same paper, the author also gives a definition of a circular language acceptedby an automaton and the construction of a special automaton accepting the language generatedby S, where S is a Pixton system as in Theorem 3.1. A simpler proof of Theorem 3.1 for Pixtonsystems is provided in [30]. Furthermore, Theorem 3.1 for Paun systems has been extended tothe so-called mixed splicing, another splicing variation which arises when both linear and circularwords are taken into account [20, 22, 30, 31, 35]. Theorem 3.1 for Pixton systems has also beenextended to mixed splicing when FA = Reg. Other models are described in [22, 34, 36], someof them have been proved to have the same computational power as Turing machines.

In this paper we deal with finite Paun systems without the reflexivity assumption and wherethe self-splicing operation is not allowed, and with the corresponding class C(Fin, F in) of gen-erated circular languages. We will be concerned with the self-splicing operation in Section 4.7only.

We have already said that in contrast with the case of the linear splicing, C(Fin, F in)is incomparable with the class of regular circular languages. More generally, the relationshipbetween C(Fin, F in) and the classes of languages in the (circular) Chomsky hierarchy is stillnot completely understood. For example, in [35] the authors show that ∼{(ca)n(cb)n | n ≥ 1}is generated by a finite Head circular splicing system. A slight modification of their argu-ments shows the same result for the context-free circular language ∼{anbn | n ≥ 1}. Therefore∼{anbn | n ≥ 1} ∈ C(Fin, F in). Notice that Lin(∼{anbn | n ≥ 1}) ∩ a∗b∗ is a non-regularlanguage, so Lin(∼{anbn | n ≥ 1}) is also non-regular and, consequently, ∼{anbn | n ≥ 1} isnot a regular circular language. On the other hand, Example 3.1 shows that there exist regularcircular languages that are generated by finite Paun splicing systems and also regular circularlanguages for which this is no longer true. Finally, in [17], the author claimed that C(Fin, F in)is contained in the class of context-sensitive circular languages. She also gave an example of acontext-sensitive circular language which is not context-free and which belongs to C(Fin, F in).

4 Finite circular splicing systems

In this section we will focus on Problem 1.1 and we will look more closely at the partial resultson the search for a characterization of the languages generated by finite circular splicing systems.The aim of this overview is not only to describe how state of the art this question is but also togive new contributions on this problem. In detail, in Proposition 4.2 we exhibit a non-regularcircular language generated by a circular simple system thus disproving a known result in thisarea. We also give a different proof of a result given in [10] (Proposition 4.4). In Section 4.5we give a new presentation of the marked systems, a known class of circular splicing systems.We define a slight variant of marked systems, that is the g-marked systems, and we introducethe graph associated with a g-marked system. We show that a g-marked system generatesa regular circular language if and only if its associated graph is a cograph. Furthermore, weprove that the class of g-marked systems generating regular circular languages is closed undera complement operation applied to systems. Other observations and new results are proved inSections 4.6, 4.7. In particular, we prove that marked systems with self-splicing generate only

7

regular circular languages, a result that was reported without proof in [14]. Let us outline theorganization of this section. We begin by taking into account finite circular splicing systems withno additional assumptions in Section 4.1. Section 4.2 is devoted to the simplest case, namelyfinite splicing systems generating languages on a one-letter alphabet. Subsequently, we consideralphabets A of larger size but we take into account CSSH systems, which are systems suchthat SITES(R) ⊆ A (Section 4.3). Then we discuss marked systems, which are CSSH systemssatisfying additional conditions (Sections 4.4, 4.5). We also deal with two different extensionsof marked systems which yield opposite situations. The first is discussed in Section 4.6 andunderlines the difference between marked and CSSH systems. The second extension is handledin Section 4.7, where marked systems with self-splicing will be discussed.

4.1 Conjugacy relation and regular circular splicing languages

The idea of considering regular languages L closed under the conjugacy relation and the actionof circular splicing over their circularization ∼L goes back to [4, 6]. This is a natural researchdirection since a circular language is regular if and only if its full linearization is regular. Un-fortunately, the structure of regular languages which are closed under the conjugacy relationis unknown, even for the simple case of languages which are the Kleene closure of a regularlanguage. Nevertheless, this structure has been completely described in [3, 33] when L is a freemonoid. Precisely, a free monoid L = X∗ generated by a (regular) code X is closed under theconjugacy relation if and only if X is a group code. We recall that X is a group code if a groupG and a surjective morphism φ : A∗ → G exist such that X∗ = φ−1(H), where H is a subgroupof G [2]. For instance, uniform codes Ad, d ≥ 1 are regular group codes. If X is a regular groupcode then ∼L = ∼X∗ is in C(Fin, F in) and this a consequence of a result proved in [6] andrecalled below.

Another natural research direction is to investigate whether the property of being a splicinglanguage can be translated into a property of a finite automaton recognizing its full linearization.Let A be a finite state automaton recognizing L. Loosely speaking, the crux of the matter instating whether ∼L is Paun generated is to exhibit a finite set of rules which are able to producewords having as factors a non-bounded number of occurrences of labels c of closed paths in A.In [4, 6], the authors overcome this difficulty for a special class of regular languages. In detail, astar language is a language L ⊆ A∗ which is closed under the conjugacy relation and such thatL = X∗, where X is a regular language. Moreover, a language L = L(A) is cycle closed withrespect to A if for each simple cycle c in A, we have c ∈ L. We recall that a word c = a1 · · · an,with a1, . . . , an ∈ A, n ≥ 1, is a simple cycle in A (with respect to the transition δ(q, c) = q) ifδ(q, c) = q and the states δ(q, a1 · · · ai), i ∈ {1, . . . , n}, are different from one another.

Theorem 4.1 [4, 6] Let X∗ be a star language. If X is finite or X∗ is cycle closed then∼X∗ ∈ Reg∼ ∩ C(Fin, F in).

For instance, if X is a group code then X∗ is a cycle closed star language [6].However, there exist regular languages which are not the Kleene closure of a regular language

and such that the corresponding circular languages are Paun generated. Examples of theselanguages are cyclic and weak cyclic languages [6]. In the same paper [6], the authors alsoconsidered Problem 1.1 in connection with a classical question in formal language theory, namelythe investigation of closure properties of splicing languages ∼L with respect to (the regular)operations.

8

4.2 The case of a one-letter alphabet

The aim of this section is to discuss the simplest case that we can investigate when consideringProblem 1.1, i.e., one-letter alphabets. Notice that each language L ⊆ a∗ is closed under theconjugacy relation and we can identify each word (resp. language) with its circularization.However, even in this simple case, linear and circular splicing operations are not comparable.Indeed, as noticed in [5], the regular language (aa)∗ is not generated by a finite linear splicingsystem, but the corresponding regular circular language ∼(aa)∗ is in C(Fin, F in), in view ofProposition 4.1. The same observation was reported in [35] for Head systems.

Problem 1.1 has been solved for one-letter alphabets and the structure of the unary languagesin C(Fin, F in) has been thoroughly described (Proposition 4.1). In particular, in contrastwith the case of alphabets of larger size, we have C(Fin, F in) ∩ Reg∼ = C(Fin, F in). In thefollowing proposition, Zn denotes the cyclic group of order n ∈ N and, for D ⊆ N , we setaD = {ad | d ∈ D}.

Proposition 4.1 [6] A subset L = ∼L of a∗ is Paun generated if and only if either L is a finiteset or there exist a finite subset L1 of a∗, positive integers p, r, n, with n = pr ≥ 2 and a subgroupG′ = {pk | k ∈ N , 0 ≤ k ≤ r − 1} of Zn such that L = L1 ∪ (aG)+, where G = G′(mod n) andmax{l | al ∈ L1} < n = min{m | m ∈ G}.

For example, for the regular language L = {a3, a4} ∪ {a6, a14, a16}+, we have L = L(S),where S = ({a}, I, R), I = {a3, a4, a6, a14, a16} and R = {a6#1$1#a6}. Here L1 = {a3, a4},n = 6 and G′ = {0, 2, 4}. We recall that for each Paun generated language L ⊆ a∗, there existsa (minimal) splicing system ({a}, I, R) generating L with either R = ∅ or R = {r} [6]. We alsorecall that Proposition 4.1 has been extended to the larger class of the uniform languages, i.e.,the circularizations of languages with the form AJ =

⋃j∈J Aj , with J ⊆ N [7].

The following decision problem is suggested by Proposition 4.1. Given L, with L ⊆ a∗,is it decidable whether L is a Paun generated language? If no assumption is made over L,the answer to this question is no, in view of the Rice theorem [25]. On the contrary, in [7],the authors gave a positive answer to this question when L is supposed to be a regular lan-guage. This result is obtained by a characterization of the minimal finite state automatonA= ({a}, Q, δ, q0, F ) recognizing L. Classical results show that there exists ` ≥ 0 and p′ > 0such that Q = {q0, q1, . . . , q`−1}∪{q`, . . . , q`+p′−1}, δ(qi, a) = qi+1, for 0 ≤ i ≤ max{0, `+p′−2},and δ(q`+p′−1, a) = q`. The integers ` ≥ 0 and p′ > 0 are called the stem and the period of A[16]. Then L = L(A) is Paun generated if and only if either qh 6∈ F , for all h ≥ l, or there existsa unique final state qm with m ≥ ` and qm is idempotent (i.e., δ(qm, am) = qm).

4.3 Circular semi-simple splicing systems

In this section, we consider a larger class of splicing systems. Precisely, we focus on the specialclass of circular semi-simple splicing systems, already considered in [10, 11, 17, 35] as the circularcounterpart of linear semi-simple splicing systems defined in [18]. The introduction of this classof systems may be justified as follows. When we search for a solution to Problem 1.1, anotherdifficulty is that if rules rj ’s exist such that (∼w′,∼w′′)`rj

∼wj , these rj ’s do not necessarilyapply to ∼wj while new rules could do that. This difficulty vanishes in the case of the circularsemi-simple splicing systems.

We recall that a Paun circular semi-simple splicing system (CSSH system for short) is a finitesplicing system S = (A, I,R) such that, for each u1#u2$u3#u4 ∈ R, we have |u1u2| = |u3u4| = 1[10]. Thus, there are four types of rules, namely ai#1$aj#1, ai#1$1#aj , 1#ai$aj#1 and1#ai$1#aj , with ai, aj ∈ A. Furthermore, since R is assumed to be symmetric, if ai#1$1#aj ∈

9

R then we also have 1#aj$ai#1 ∈ R. So, using the terminology of [10], an (i, j)-CSSH system,with (i, j) ∈ {(1, 3), (2, 4)}, (resp. a (2, 3)-CSSH system) is a CSSH system where for eachu1#u2$u3#u4 ∈ R we have ui, uj ∈ A (resp. u2, u3 ∈ A or u1, u4 ∈ A). For instance,S = (A, I, R), with A = {a, b, c}, I = ∼{aac, b}, R = {c#1$b#1} is a (1, 3)-CSSH system,whereas S′ = (A, I,R′), with R′ = {1#c$1#b} is a (2, 4)-CSSH system [10]. Notice that in a(1, 3)-CSSH system, circular splicing can be rephrased as follows: given a rule ai#1$aj#1 andtwo circular words ∼xai, ∼yaj , the circular splicing yields as a result ∼xaiyaj .

The special case u1u2 = u3u4 ∈ A (simple systems) was considered in [11], once again as thecircular counterpart of the linear case investigated in [27]. Given (i, j) ∈ {(1, 3), (2, 4), (2, 3)},an (i, j)-circular simple system is an (i, j)-CSSH system which is simple [11]. In Proposition4.2, we prove that there are circular simple systems generating non-regular circular languages.Subsequently, we show how this result disproves a statement reported in [35].

Proposition 4.2 Let S = (A, I, R) be the (1, 3)-circular simple system defined by A = {a, b, c},I = ∼{baca} and R = {a#1$a#1}. Then, Lin(L(S)) ∩ (ba)∗(ca)∗ = {(ba)n(ca)n | n ≥ 1} andconsequently L(S) is not a regular circular language.

Proof :We firstly prove by induction on n that ∼(ba)n(ca)n ∈ L(S). Indeed, ∼baca ∈ I ⊆ L(S). Thus,consider ∼(ba)n(ca)n, with n > 1. By induction hypothesis, ∼(ba)n−1(ca)n−1 ∈ L(S), hence,by using the definition, we get (∼(ba)n−1(ca)n−1,∼caba) `r

∼(ba)n−1(ca)n−1caba = ∼(ba)n(ca)n,where r = a#1$a#1 and ∼caba ∈ L(S), so ∼(ba)n(ca)n ∈ L(S). Consequently, {(ba)n(ca)n | n ≥1} ⊆ Lin(L(S)) ∩ (ba)∗(ca)∗.

Conversely, we prove that if w ∈ Lin(L(S)) ∩ (ba)∗(ca)∗ then w ∈ {(ba)n(ca)n | n ≥ 1}by induction on |w|. By hypothesis, w ∈ Lin(L(S)), hence there exists i such that ∼w ∈σi(I). If i = 0 then ∼w ∈ I and Lin(I) ∩ (ba)∗(ca)∗ = baca. Let us suppose i > 1. Thus,there exist n, m ≥ 1 such that w = (ba)n(ca)m and, by definition, ∼w′ = ∼xa, ∼w′′ = ∼ya ∈σi−1(I) also exist such that ∼w = ∼w′w′′ = ∼xaya. Hence xaya ∼ w = (ba)n(ca)m, so xaya ∈(ba)∗(ca)∗(ba)∗ ∪ (ca)∗(ba)∗(ca)∗ ∪ (ab)∗(ac)∗(ab)∗ ∪ (ac)∗(ab)∗(ac)∗. Thus, since xaya ∈ A∗a,we have xaya ∈ (ba)∗(ca)∗(ba)∗ ∪ (ca)∗(ba)∗(ca)∗. Assume that xaya ∈ (ba)∗(ca)∗(ba)∗. Sincexa, ya ∈ A∗a, one of the following three cases holds:

(1) xa ∈ (ba)∗, ya ∈ (ba)∗(ca)∗(ba)∗;

(2) xa ∈ (ba)∗(ca)∗, ya ∈ (ca)∗(ba)∗;

(3) xa ∈ (ba)∗(ca)∗(ba)∗, ya ∈ (ba)∗.

We cannot have xa ∈ (ba)∗ ⊆ (ba)∗(ca)∗ since otherwise, by ∼xa ∈ L(S) and by using theinduction hypothesis, we should have xa ∈ {(ba)n(ca)n | n ≥ 1}. A similar argument applies toya if we assume that xa ∈ (ba)∗(ca)∗(ba)∗, ya ∈ (ba)∗. In conclusion, we have xa ∈ (ba)+(ca)+

and ya ∈ (ca)+(ba)+. In the same manner we can prove that if xaya ∈ (ca)∗(ba)∗(ca)∗ thenxa ∈ (ca)+(ba)+ and ya ∈ (ba)+(ca)+. Now, we know that ∼xa,∼ya ∈ L(S). Hence, onceagain by induction hypothesis, xa = (ba)t(ca)t and ya = (ca)q(ba)q (or xa = (ca)q(ba)q andya = (ba)t(ca)t). So we have ∼w = ∼w′w′′ = ∼(ba)t(ca)t(ca)q(ba)q and w ∈ {(ba)n(ca)n | n ≥ 1}.

Therefore, Lin(L(S)) is not regular and consequently L(S) is not a regular circular language.

In [35], the authors discussed Head systems S′ = (A, I, T, P ) and the corresponding splicingoperation was named action SA1 of S′. Furthermore, a triple in T of the form (1, a, 1), a ∈ A,

10

was said to have null context and crossing of length one. The first part of Theorem 3.4 in [35]claims the regularity of the circular splicing language under action SA1 of a splicing systemS′ = (A, I, T, P ) with I finite and triples in T having null context and crossings of lengthone. On the contrary, consider the Head system S′ = (A, I, T, P ) defined by A = {a, b, c},I = ∼{baca}, T = {(1, a, 1)}, (1, a, 1)P (1, a, 1). As mentioned in Section 3.2, Proposition 3.1 in[6] shows the existence of a canonical transformation of a Head system S′ into a Paun systemS such that S and S′ generate the same language. This canonical transformation applied toS′ yields the (1, 3)-circular simple system S defined in Proposition 4.2, i.e., S = (A, I,R) withA = {a, b, c}, I = ∼{baca} and R = {a#1$a#1}. Consequently, S′ and S generate the samenon-regular language L(S) mentioned in Proposition 4.2. This fact leaves still open the problemof characterizing the class of regular circular splicing languages generated by Head systemshaving null context and crossing of length one, also known as simple Head systems.

Remark 4.1 Given w, x ∈ A∗, let us denote by |w|x the number of occurrences of x in w.Let S be the (1, 3)-circular simple system defined in Proposition 4.2, i.e., S = (A, I,R) withA = {a, b, c}, I = ∼{baca} and R = {a#1$a#1}. It is worth pointing out that, as a consequenceof a result proved in [8], we have Lin(L(S)) = {w ∈ {a, b, c}+ | |w|ba = |w|ca}. Therefore, L(S)is a context-free circular language.

In [11], the authors compared the classes of circular languages generated by (i, j)-circularsimple systems, for different values of the pair (i, j). A precise description of the relationshipamong these classes of languages along with some of their closure properties was given. Inparticular, in [11], the authors proved that (1, 3)- and (2, 4)-circular simple systems generatethe same class of languages. An analogous viewpoint was adopted for Paun circular semi-simplesplicing systems in [10] where the authors highlighted further differences between circular simpleand CSSH systems. In particular, the class of languages generated by (1, 3)-CSSH systems isnot comparable with the class of languages generated by (2, 4)-CSSH systems. Indeed, in [11],the authors proved that there is no (2, 4)-CSSH system S1 such that L(S) = L(S1), where S =(A, I,R) is the (1, 3)-CSSH system defined by A = {a, b, c}, I = ∼{aac, b} and R = {c#1$b#1}.In Proposition 4.4 we give an alternative proof of this statement. In [15], the authors show thatthe map µ defined by µ(C) = CRev is a bijection between the class of circular languages generatedby (2, 4)-CSSH and the class of circular languages generated by (1, 3)-CSSH systems. Finally,in [17], the author claimed that the class of circular languages generated by CSSH systems iscontained in the class of context-free circular languages. However, a still open problem is to finda characterization of the class of regular circular languages generated by CSSH systems. Thestructure of these languages is unknown even if we restrict ourselves to languages generated bycircular simple splicing systems.

4.4 Marked systems

The task of solving Problem 1.1 may be further simplified by considering (1, 3)-CSSH systemsS = (A, I,R) such that, for each ∼w ∈ I, there is at most one letter a in SITES(R)∩ alph(∼w)and a occurs in ∼w no more than once. Loosely speaking, in this case, for each ∼w1, ∼w2 ∈ I,the splicing operation applies to ∼w1, ∼w2 by using at most one rule and no more than one pairof linearizations of ∼w1, ∼w2.

In this section we give a brief exposition of results proved in [14, 15] which allow us to solveProblem 1.1 for this class of systems. In Section 4.5 we review these results in a graph theoreticalsetting. The advantage of using this new approach lies in the fact that interesting connectionswith known notions in graph theory are shown and from which new results are derived.

11

In detail, in [14, 15] the authors consider the class of weakened marked systems and itssubclass of marked systems. A weakened marked system S = (A, I, R) is a (1, 3)-CSSH systemsuch that for each w ∈ I we have |w|SITES(R) ≤ 1. A marked system S = (A, I, R) is a (1, 3)-CSSH system S = (A, I,R) such that I = SITES(R) = A. For instance, S = (A, I, R), withA = {a, b, c}, I = ∼{aac, b}, R = {c#1$b#1}, is a weakened marked system whereas S′ =(A′, I ′, R), with A′ = I ′ = {c, b}, R = {c#1$b#1}, is a marked system. In [15] the authors provethat a characterization of the marked systems that generate regular circular languages yieldsa characterization of the weakened marked systems satisfying the same property. In the samepaper, the authors also prove that similar results hold for the structure of the generated regularcircular languages: the regular circular languages generated by weakened marked systems can beconstructed with ease by means of regular circular languages generated by marked systems. Letus illustrate this construction over an example. Consider the above-mentioned marked systemS′ = (A′, I ′, R), with A′ = I ′ = {c, b}, R = {c#1$b#1}. Results in this section will statethat S′ generates the regular circular language L(S′) = ∼{b, c}+ \ ∼(b+b ∪ c+c). Therefore wecan prove that the weakened marked system S = (A, I,R), with A = {a, b, c}, I = ∼{aac, b},R = {c#1$b#1}, also generates a regular circular language. Precisely, L(S) is the circularizationof the language obtained by inserting the word aa on the left of each occurrence of c in the wordsof L(S′), i.e., L(S) = ∼{b, aac}+ \ ∼(b+b ∪ (aac)+aac).

In what follows, S = (I, R) will denote a marked system and L(I, R) will be the correspondinggenerated language. Note that S is uniquely identified by L(I,R) [15]. For abbreviation, wewill write (ai, aj) to denote a rule ai#1$aj#1 in R and we will use ≈ to denote the transitiveclosure of R. The relation ≈ is described in detail in the definition below.

Definition 4.1 Let ai, aj ∈ I. Then ai ≈ aj if and only if b1, . . . , bk ∈ I exist, with k ≥ 2, suchthat:

1. b1 = ai, bk = aj,

2. ∀h ∈ {1, . . . , k − 1}, (bh, bh+1) ∈ R.

Hence ≈ is a transitive (by definition), symmetric (since R is symmetric) relation and for eacha ∈ I there exists b ∈ I such that (a, b) ∈ R. As a consequence, the relation ≈ is an equivalencerelation on I.

The canonical decomposition of S is defined in [14] as the family {(Ih, Rh) | 1 ≤ h ≤ g} ofmarked systems Sh = (Ih, Rh), 1 ≤ h ≤ g, with I≈ = {I1, . . . , Ig} (resp. R≈ = {R1, . . . , Rg})being the partition of I (resp. R) induced by ≈.

Example 4.1 [15] Let S = (I, R) be the marked system defined by I = {c, b} and R ={(c, b)}. Then, the canonical decomposition of S is {(I, R)}, whereas the canonical decompo-sition of S′ = (I ′, R′), with I ′ = {a1, a2, a3, a4, a5, a6}, R′ = {(a1, a2), (a2, a3), (a2, a4), (a5, a6)}is {(Ih, Rh) | 1 ≤ h ≤ 2}, where I1 = {a1, a2, a3, a4}, R1 = {(a1, a2), (a2, a3), (a2, a4)}, I2 ={a5, a6}, R2 = {(a5, a6)}.

Proposition 4.3 [14, 15] Let S = (I,R) be a marked system, let {(Ih, Rh) | 1 ≤ h ≤ g} be thecanonical decomposition of S and let Sh = (Ih, Rh), 1 ≤ h ≤ g. Then L = L(I, R) =

⋃gh=1 Lh,

where Lh = L(Ih, Rh). Moreover Lh ∩ Lh′ = ∅, for h, h′ ∈ {1, . . . , g}, h 6= h′.

Hence, by taking into account classical results on formal languages, in order to characterizethe structure of the regular (resp. context-free) circular languages generated by marked systems,we can assume that {(I, R)} is the canonical decomposition of S. In this case S is defined to bea transitive marked system.

12

Example 4.2 The marked system S′ = (I ′, R′), with I ′ = {a1, a2, a3, a4, a5, a6}, R′ = {(a1, a2),(a2, a3), (a2, a4), (a5, a6)} is not transitive. On the contrary, S1 = (I1, R1), with I1 = {a1,a2, a3, a4}, R1 = {(a1, a2), (a2, a3), (a2, a4)}, is a transitive marked system.

In turn, Theorem 4.2 states a characterization of transitive marked systems generating reg-ular circular languages. Some definitions are needed. Precisely, let S = (I,R) be a transitivemarked system and let J be a subset of I. We set RJ = R∩ (J ×J) = {(ai, aj) ∈ R | ai, aj ∈ J}and we say that J is transitive if SJ = (J,RJ) is a transitive marked system. Furthermore,the distance d(ai, aj) between ai, aj ∈ I is defined by d(ai, aj) = min {k | ∃b1, . . . , bk ∈I : (bh, bh+1) ∈ R, 1 ≤ h ≤ k − 1, b1 = ai, bk = aj}. Finally, the diameter d(S) of S is2 if |I| = 1, d(S) = max{d(ai, aj) | ai, aj ∈ I, ai 6= aj} otherwise. Notice that if I = {a} then(a, a) ∈ R and so d(a, a) = 2.

Theorem 4.2 [14, 15] The following conditions are equivalent:

(1) L(I,R) is a regular circular language.

(2) For every transitive subset J of I in S = (I,R), we have d(SJ) ≤ 3.

(3) For every transitive subset J = {a1, a2, a3, a4} of I in S = (I,R), we have RJ 6= R′, for anyR′ such that {(a1, a2), (a2, a3), (a3, a4)} ⊆ R′ ⊆ {(a1, a2), (a2, a3), (a3, a4)} ∪ {(ai, ai) | 1 ≤i ≤ 4}.

(4) L(I,R) = I ∪⋃

J⊆I, J transitive∼(∩ai∈JJ∗aiJ

∗).

(5) L(I,R) = I ∪ {w ∈ ∼I+ | alph(w) is transitive }.

Example 4.3 [15] Let S = (I, R), where I = {a1, a2, a3, a4} and R = {(a1, a2), (a2, a3), (a3, a4),(a1, a3)}. Thus, S = (I, R) is a transitive marked system and, in view of Theorem 4.2, L(I,R)= ∼{a1, a2, a3, a4}+ \ (∪a∈I

∼a+a ∪ ∼({a1, a2, a4}+a4)) is a regular circular language.

From the above theorem, it follows that if d(S) < 3 then L(I,R) is a regular circular languageand if d(S) > 3 then L(I,R) is not a regular circular language. The class of the circular languagesgenerated by transitive marked systems S such that d(S) = 3 intersects both the family of regularcircular languages and its complement. Indeed, for the transitive marked system S reported inExample 4.3, we have that L(I,R) is a regular circular language and d(S) = 3. On the contrary,in [14, 15] the authors exhibit a non-regular circular language generated by a transitive markedsystem S with d(S) = 3. Results on marked systems allow us to give a different proof below ofa result given in [10].

Proposition 4.4 [10] The regular circular language L = ∼{b, aac}+ \ ∼(b+b ∪ (aac)+aac) isgenerated by the (1, 3)-CSSH system S = (A, I, R) defined by A = {a, b, c}, I = ∼{aac, b} andR = {c#1$b#1}. On the contrary, there is no (2, 4)-CSSH system S1 such that L = L(S1).Consequently, the class of languages generated by (1, 3)-CSSH systems is not comparable withthe class of languages generated by (2, 4)-CSSH systems.

Proof :We have already observed that Theorem 4.2 along with other results in [14, 15] allow us to provethat L = ∼{b, aac}+ \∼(b+b∪ (aac)+aac) = L(S), where S = (A, I,R) is the (1, 3)-CSSH systemdefined by A = {a, b, c}, I = ∼{aac, b} and R = {c#1$b#1}. By contradiction, assume thatthere exists a (2, 4)-CSSH system S′ = (A, I ′, R′) such that we have L = L(S′). Notice that foreach w ∈ L(S′) \ ∼{b, aac}, we have alph(w) = A.

13

Since b,∼aac ∈ L and there is no other circular word of length less than three in L, we haveb,∼aac ∈ I ′. Furthermore, if there existed x ∈ A such that 1#x$1#x ∈ R′, there would bew ∈ σ′(I ′) ⊆ (L(S′) \ ∼{b, aac}) such that alph(w) 6= A, a contradiction. In the same mannerwe can see that 1#a$1#c 6∈ R′. Since R′ 6= ∅ (otherwise L(S′) = I ′ would be a finite set), wehave 1#b$1#a ∈ R′ or 1#b$1#c ∈ R′. Assume that r = 1#b$1#a ∈ R′. Then (b,∼aca)`r

∼bacaand ∼baca ∈ L(S′) = L. Consequently baca ∼ w, with w ∈ {b, aac}+, a contradiction. A similarargument applies if we assume r′ = 1#b$1#c ∈ R′. Indeed, if r′ ∈ R′ then (b,∼caa)`r′

∼bcaa.Consequently, ∼bcaa ∈ L(S′) = L which yields bcaa ∼ w, with w ∈ {b, aac}+, once again acontradiction.

As another consequence of Theorem 4.2, given a marked system S = (I,R), we can decidewhether L(I,R) ∈ Reg∼ and we can also decide whether a regular circular language C isgenerated by a marked system. Finally, all the above-mentioned results for marked systems canbe established with (1, 3)-CSSH systems replaced by (2, 4)-CSSH systems [15].

4.5 Marked systems and graphs

We now consider all the material from the previous section in a graph theoretical framework,with which we assume the reader is familiar, and we refer to [9, 13] for more details. In particular,let G = (V,E) be a graph, where V is the vertex set of G and E is the edge set of G. We recallthat a path in G is a sequence (a1, . . . , ak) of vertices in V such that (ai, ai+1) ∈ E. The lengthof a path is the number of edges in the path and a path is simple if all vertices in the path aredistinct. In an undirected graph, self-loops - edges from a vertex to itself - are forbidden buthere we do not make this assumption. An undirected graph is connected if every two verticesare reachable from each other. The connected components of a graph are the equivalence classesof vertices under the “is reachable from” relation. Finally, a graph G is simple if there areno self-loops in G and the simple graph underlying G is the graph obtained by dropping theself-loops in G.

Then, since the set R of rules specifies a symmetric binary relation, in a natural way a markedsystem S = (I,R) is represented by an undirected graph G = (I, R), where I is the vertex setand R is the edge set. Notice that G may have self-loops. From now on, G will be referred to asthe graph associated with the marked system S = (I, R). Of course, the canonical decomposition{(Ih, Rh) | 1 ≤ h ≤ g} of S has as a counterpart the decomposition {Gh = (Ih, Rh) | 1 ≤ h ≤ g}of G into its connected components. A marked system S = (I,R) is transitive if G is connectedand either |I| > 1 or G is a single vertex with a self-loop. A subset J of I is transitive if thesubgraph GJ induced by J inherits the above-mentioned property of G (i.e., GJ is connectedand either |J | > 1 or GJ is a single vertex with a self-loop). Analogously, the notions of distanceand diameter can be defined for the directed version G′ of a connected graph G associated witha marked system S = (I, R). More precisely, given an undirected graph G = (V,E), the directedversion of G is the directed graph G′ = (V,E′), where (u, v) ∈ E′ if and only if (u, v) ∈ E. Thatis, each undirected edge (u, v) in G is replaced in the directed version by the two directed edges(u, v) and (v, u). Then, the distance d(ai, aj) between ai and aj is defined as the number ofvertices of the shortest non-empty path connecting ai and aj in G′. Furthermore, the diameterd(S) of the system S is 2 if G′ is a single vertex with a self-loop, d(S) is the maximum of thedistances d(ai, aj), where (ai, aj) ranges over all pairs of different vertices in G′, otherwise.

In particular, let us consider the subgraph GJ induced by the subset J = {a1, a2, a3, a4}mentioned in item (3) in Theorem 4.2. When RJ = {(a1, a2), (a2, a3), (a3, a4)}, GJ is a wellknown graph denoted by P4 [9]. We also recall that a P4-free graph G is a graph such thatevery connected subgraph of the simple graph underlying G which is induced by a subset of I

14

a1 -�

a2

6

?a3a4

-�

@@

@@

@I@@

@@@R

Figure 1:

with 4 vertices, is not P4. Furthermore, it is well known that simple P4-free graphs are cographs.Cographs have been deeply investigated in graph theory, and linear-time recognition algorithmshave been provided for this class of graphs [9]. The following corollary states a characterizationof the transitive marked systems S generating regular circular languages by means of a propertyof the graph G associated with S.

Corollary 4.1 Let S = (I,R) be a transitive marked system, let G = (I, R) be the graphassociated with S. The following conditions are equivalent:

(1) L(I, R) is a regular circular language.

(2) The simple graph underlying graph G is P4-free and so, G is a cograph.

Proof :Let S = (I, R) be a transitive marked system, let G = (I, R) be the graph associated withS. In view of item (3) in Theorem 4.2, L(I, R) is a regular circular language if and only iffor every transitive subset J = {a1, a2, a3, a4} of I in S, we have RJ \ {(ai, ai) | 1 ≤ i ≤4} 6= {(a1, a2), (a2, a3), (a3, a4)}. Of course, an equivalent formulation of the above statement is:L(I, R) is a regular circular language if and only if every connected subgraph GJ of the simplegraph underlying G which is induced by a subset J of I with 4 vertices, is not P4. This finishesthe proof.

For instance, in Figure 1 we have depicted the directed version of the graph associated withthe transitive marked system defined in Example 4.3.

Since a graph is uniquely defined by its components, it turns out that the marked systemgenerating L(I, R) is unique, as stated in [15]. Indeed, given a language L(I,R) generated by amarked system S, the vertex set of the graph associated with S is equal to the set of the circularwords of length one in L(I,R), whereas its edge set is defined by the circular words of lengthtwo in L(I, R). Furthermore, given a graph G = (I, R), we may use recognition algorithms forcographs in order to decide whether L(I,R) ∈ Reg∼.

Obviously, a graph G is P4-free if and only if the same holds for all the connected componentsof G. This observation allows us to state below a characterization of the marked systems Sgenerating regular circular languages by means of a property of the graph G associated with S.

Corollary 4.2 Let S = (I, R) be a marked system, let G = (I, R) be the graph associated withS. The following conditions are equivalent:

(1) L(I, R) is a regular circular language.

15

(2) The simple graph underlying graph G is P4-free and so, G is a cograph.

Proof :Let S = (I,R) be a marked system, let {(Ih, Rh) | 1 ≤ h ≤ g} be the canonical decompositionof S and let Sh = (Ih, Rh), 1 ≤ h ≤ g. Let G = (I, R) be the graph associated with S. We knowthat {Gh = (Ih, Rh) | 1 ≤ h ≤ g} is the decomposition of G into its connected components,where Gh is the graph associated with the transitive marked system Sh.

In view of Proposition 4.3 and by using classical results on formal languages, L(I,R) isa regular circular language if and only if Lh = L(Ih, Rh) is a regular circular language forh ∈ {1, . . . , g}. In turn, in view of Corollary 4.1, Lh is a regular circular language if and onlyif the simple graph underlying graph Gh is P4-free. Consequently, L(I, R) is a regular circularlanguage if and only if the simple graph underlying graph G is P4-free.

Remark 4.2 Notice that item (2) in Corollary 4.1 is the same as item (2) in Corollary 4.2. How-ever, in the former statement, G is a connected graph whereas in the latter G is not necessarilyso.

Finally, let us mention an important consequence of the previous results. We know that thecomplement graph G of G = (I,R) is defined by G = (I, (I × I) \ R). It is also known that ifG is a cograph then the same holds for G. This property of cographs suggests a definition of acorresponding complement operation for splicing systems. Given a marked system S = (I,R),this operation should map S in a (1, 3)-CSSH system S′ = (A′, I ′, R′) such that A′ = I ′ = I,R′ = (I × I) \ R. The class of marked systems is not closed with respect to this operation asthe example below shows.

Example 4.4 Let S = (I, R) be the marked system defined by I = {a, b, c}, R = {(a, b), (b, c),(b, b)}. Thus S′ = (A′, I ′, R′) with A′ = I ′ = I, R′ = (I × I) \ R = {(a, c), (a, a), (c, c)} is not amarked system.

So, for our aims, we need a slightly different definition of a marked system. More precisely,in the next part of this section a g-marked system S = (I, R) will be a (1, 3)-CSSH systemsuch that SITES(R) ⊆ I and I = A. Furthermore, G = (I, R) will be once again the graphassociated with S. Finally, as pointed out in [15], L(S) = L(SITES(R), R) ∪ (I \ SITES(R)).Notice that S′ = (SITES(R), R) is a marked system.

Definition 4.2 Let S = (I, R) be a g-marked system. We call S = (I, (I × I) \R) the comple-ment system of S.

Obviously the complement operation applied to S yields S. We have gathered below somedirect consequences of the definitions and of the above observations. In particular, Corollary4.3 states that the class of the g-marked systems generating regular circular languages is closedunder the complement operation.

Proposition 4.5 Let S = (I, R) be a g-marked system, let G be the graph associated with S.Then L(S) is a regular circular language if and only if the simple graph underlying graph G isP4-free.

Proof :Let S = (I, R) be a g-marked system, let G be the graph associated with S. As we have observed,L(S) = L(SITES(R), R)∪ (I \SITES(R)), with S′ = (SITES(R), R) being a marked system.

16

Consequently, L(S) is a regular circular language if and only if L(SITES(R), R) is a regularcircular language. In turn, in view of Corollary 4.2, L(SITES(R), R) is a regular circularlanguage if and only if the simple graph underlying graph G′ = (SITES(R), R) is P4-free.Finally, the simple graph underlying graph G′ is P4-free if and only if the same property holdsfor the simple graph underlying graph G.

Proposition 4.6 Let S = (I, R) be a g-marked system, let G be the graph associated with S.Then S is a g-marked system and G is the graph associated with S.

Corollary 4.3 If S = (I,R) is a g-marked system such that L(I, R) is a regular circular lan-guage then its complement system S = (I, (I × I) \R) generates a regular circular language.

Proof :Let S = (I,R) be a g-marked system, let G be the graph associated with S. In view ofProposition 4.5, if L(S) is a regular circular language then the simple graph underlying graph Gis a cograph. Thus, G is also a cograph and, by Propositions 4.5, 4.6, L(S) is a regular circularlanguage.

4.6 Non-marked splicing systems

Attempting to extend the results concerning marked systems to the more general case of CSSHsystems is a natural research direction. The results in this section do in fact go in this directionand actually underline the difference between the two above-mentioned classes of systems. Webegin with Proposition 4.7 and Example 4.5, where we show that a slight difference betweentwo non-marked CSSH systems changes their computational power. Precisely, let S = (I, R)be a marked system, where R = {(a, b)}. Then, we necessarily have I = {a, b} and, in viewof Theorem 4.2, L(I, R) is a regular circular language. On the contrary, below we prove thatL(S) is not a regular circular language, where S = (A, I,R) is the (1, 3)-CSSH system definedby A = {a, b}, I = ∼{ab}, R = {a#1$b#1}. It is worth noting that the (2, 3)-CSSH systemS = (A, I, R′) defined by A = {a, b}, I = ∼{ab}, R′ = {1#a$b#1} generates a regular circularlanguage (Example 4.5). Subsequently we consider the case of a circular splicing system obtainedby adding one rule r to the set of rules in a marked system S = (I,R), where r = 1#a$1#1or r = a#1$1#1 and a ∈ I. We give an example where this transformation changes a markedsystem generating a non-regular circular language into a circular splicing system generating aregular circular language (Proposition 4.9).

Proposition 4.7 Let S = (A, I,R) be the (1, 3)-CSSH system defined by A = {a, b}, I = ∼{ab}and R = {a#1$b#1}. Then, L(S) ∩ ∼(a∗b∗) = ∼{anbn | n ≥ 1} and consequently L(S) is not aregular circular language.

Proof :It is easy to prove that for each w ∈ L(S) we have |w|a = |w|b, by induction on the length of w,with w ∈ σi(I). Indeed, since I = ∼{ab}, we may assume that i > 0. Hence, w = ∼w′w′′, with∼w′,∼w′′ ∈ L(S). Therefore |w′|a = |w′|b, |w′′|a = |w′′|b (induction hypothesis) and consequently|w|a = |w|b. So, L(S) ⊆ ∼{w ∈ {a, b}∗ | |w|a = |w|b} and consequently L(S) ∩ ∼(a∗b∗) ⊆∼{anbn | n ≥ 1}. Conversely, let r = a#1$b#1. Since (∼ba,∼an−1bn−1) `r

∼baan−1bn−1 =∼anbn, we can prove by induction on n that ∼anbn ∈ L(S), for each n. Consequently ∼{anbn | n ≥1} ⊆ L(S)∩∼(a∗b∗) and therefore, L(S)∩∼(a∗b∗) = ∼{anbn | n ≥ 1}. Hence Lin(L(S)∩∼(a∗b∗))∩

17

a∗b∗ = Lin(∼{anbn | n ≥ 1}) ∩ a∗b∗ = {anbn | n ≥ 1}. Thus, Lin(L(S)) is not regular andconsequently L(S) is not a regular circular language.

Remark 4.3 Let S be the (1, 3)-CSSH system considered in Proposition 4.7, i.e., S = (A, I,R),with A = {a, b}, I = ∼{ab} and R = {a#1$b#1}. We have already proved that L(S) ⊆ ∼{w ∈{a, b}∗ | |w|a = |w|b}. However L(S) 6= ∼{w ∈ {a, b}+ | |w|a = |w|b} since ∼abab 6∈ L(S).Indeed, otherwise ∼abab = ∼w′w′′, with ∼w′,∼w′′ ∈ L(S). Since no circular word of length oneis in L(S), we should have ∼abab ∈ σ′(∼{ab}), a contradiction since ∼abab 6= ∼abba = σ′(∼{ab}).

Example 4.5 In Proposition 4.7 we stated that L(S) is not a regular circular language, whereS = ({a, b},∼{ab}, {a#1$b#1}). On the contrary, we now prove that if S = ({a, b},∼{ab}, {r}),where r = 1#a$b#1, then L(S) is a regular circular language. Indeed, we show that L(S) =∼(ab)+. Of course ∼(ab)+ is a regular circular language since Lin(∼(ab)+) = (ab)+ ∪ (ba)+.First, let us prove that for each ∼w ∈ L(S) = ∪i≥0σ

i(I) we have ∼w ∈ ∼(ab)+, by induction onthe length of ∼w. Let i be such that ∼w ∈ σi(I). If i = 0, then obviously ∼w = ∼ab ∈ ∼(ab)+.Otherwise, by definition, there exist ∼w′,∼w′′ ∈ σi−1(I) such that (∼w′,∼w′′) `r

∼w. Sincer = 1#a$b#1 we have ∼w1 = ∼ax, ∼w2 = ∼xb and ∼w = ∼axyb. Furthermore, by inductionhypothesis we have that ∼w′,∼w′′ ∈ ∼(ab)+, hence x ∈ b(ab)∗, y ∈ a(ba)∗. As a consequence, wehave ∼w = ∼axyb ∈ ∼(ab)+. Conversely, let us prove that for each i ≥ 1 we have ∼(ab)i ∈ L(S),by induction on i. Obviously, we have ∼ab ∈ I ⊆ L(S). In addition, if we assume that∼(ab)i ∈ L(S), then (∼w′,∼w′′) `r

∼(ab)i+1, where ∼w′ = ∼ab, ∼w′′ = ∼(ab)i ∈ L(S) (byinduction hypothesis) and consequently ∼(ab)i+1 ∈ L(S).

When dealing with CSSH systems we assume that 1 6∈ SITES(R). One can ask whathappens in the case in which this is no longer true. Proposition 4.8 answers this question in theparticular case that we add one rule r to the set of rules in a marked system, where r = 1#a$1#1or r = a#1$1#1 and a ∈ I. Obviously, by doing so, we obtain a new system which is neither amarked system nor a CSSH system. Lemma 4.1 will be used in the proof of Proposition 4.8.

Lemma 4.1 [15] Let S = (A1, I1, R1), S′ = (A2, I2, R2) be two Paun circular splicing systems.If A1 ⊆ A2, I1 ⊆ I2 and R1 ⊆ R2 then L(S) ⊆ L(S′).

Proposition 4.8 Let S = (I, R) be a marked system, let a ∈ I. Let Sa = (A, I,Ra), whereA = I and Ra = R∪{1#a$1#1} (resp. Ra = R∪{a#1$1#1}). Then L(Sa) = L(I, R)∪ ∼aI+.

Proof :Let us prove, by using induction on the length of w, that if ∼w ∈ L(Sa) = ∪i≥0σ

i(I) then∼w ∈ L(I, R)∪∼aI+. Of course, if ∼w ∈ I then ∼w ∈ L(I, R)∪∼aI+. Otherwise, by definition,there are ∼w′,∼w′′ ∈ L(Sa) and r ∈ Ra such that (∼w′,∼w′′) `r

∼w. If |w|a 6= 0 then ∼w ∈ ∼aI+,otherwise, since ∼w = ∼w′w′′, we have |w|a = |w′|a = |w′′|a = 0 and consequently r 6= 1#a$1#1(resp. r 6= a#1$1#1). Therefore, r ∈ R and by using the induction hypothesis, ∼w′,∼w′′ ∈L(I,R). Hence ∼w ∈ L(I,R).

Conversely, let us prove that L(I, R) ∪ ∼aI+ ⊆ L(Sa). In view of Lemma 4.1, we haveL(I,R) ⊆ L(Sa). Furthermore, let ∼w = ∼aa1 · · · an ∈ ∼aI+, with a1, . . . , an ∈ I, n ≥ 1. Then(∼w′,∼w′′) `r

∼w with ∼w′ = ∼aa1 · · · an−1, ∼w′′ = an, r = 1#a$1#1 (resp. ∼w′ = ∼a2 · · · ana,∼w′′ = a1, r = a#1$1#1). Thus, by using induction on n, it is clear that ∼w = ∼aa1 · · · an ∈L(Sa).

Given a marked system S = (I, R) and J ⊆ I, the circular splicing system SJ = (J,RJ) isnot necessarily a transitive marked system but we will still use the notation L(J,RJ) for the

18

corresponding generated language. Notice that in view of the definitions, SJ = (J,RJ) is ag-marked system.

Proposition 4.9 Let S = (I,R), where I = {a1, a2, a3, a4}, RMin ⊆ R ⊆ RMax, RMin ={(a1, a2), (a2, a3), (a3, a4)} and RMax = RMin ∪ {(a, a) | a ∈ SITES(RMin)}. Then L(S) is anon-regular circular language. Let Sax = (A, I,Rax), where A = I and Rax = R ∪ {1#ax$1#1}(resp. Rax = R ∪ {ax#1$1#1}), x ∈ {1, 2, 3, 4}. Then L(Sax) is a regular circular language.

Proof :Of course, by Theorem 4.2, L(S) is a non-regular circular language. Let Jx = I \ ax. It iseasy to state that L(I,R) \ ∼axI∗ = L(I,R) ∩ ∼J+

x = L(Jx, RJx) is a regular circular language,for each x ∈ {1, 2, 3, 4}. Indeed, by definition, Sx = (Jx, RJx) is a g-marked system with|Jx| = 3. Therefore, the graph associated with Sx is a cograph and so, in view of Proposition4.5, L(Jx, RJx) is a regular circular language. Having disposed of this preliminary step, wenow prove that L(Sax) is a regular circular language. Indeed, in view of Proposition 4.8 wehave L(Sax) = L(I,R) ∪ ∼axI+ = (L(I, R) \ ∼axI∗) ∪ {ax} ∪ ∼axI+ = (L(I,R) ∩ ∼J+

x ) ∪ ∼axI∗

= L(Jx, RJx) ∪ ∼axI∗. Hence, Lin(L(Sax)) = Lin(L(Jx, RJx) ∪ ∼axI∗) = Lin(L(Jx, RJx)) ∪Lin(∼axI∗). Consequently Lin(L(Sax)) is regular since Lin(∼axI∗) = I∗axI∗ is regular andwe have proved that Lin(L(Jx, RJx)) is also regular. Hence, by definition, L(Sax) is a regularcircular language.

4.7 Marked systems with self-splicing

In this section we consider the computational power of the marked systems in which both splicingand self-splicing are allowed. For better readability, we continue to use the same terminologyeven if it is understood that we are referring to this different setting. In particular, there willbe two circular languages associated with a marked system S = (I,R), depending on whetherself-splicing is used or not, namely L(I, R) = τ∗(I) and L(I, R) = σ∗(I). As a main result, inProposition 4.12 we will prove that L(I, R) is always a regular circular language, a result thatwas reported without proof in [14]. We recall that in the special case of CSSH systems, eachfinite Paun circular splicing system S may be transformed into a finite Pixton circular splicingsystem S′ such that S and S′ generate the same language (see Remark 3.1 in [6], Section 3.2).However, since in a marked system the set of rules is not necessarily assumed to be reflexive,Proposition 4.12 is not a consequence of Theorem 3.1.

In view of Definition 3.2, we have w ∈ L(I, R) if and only if there exists i, with i ≥ 0, suchthat w ∈ τ i(I). In turn, for i > 0, we have τ i(I) = τ i−1(I) ∪ τ ′(τ i−1(I)) and w ∈ τ ′(τ i−1(I))if and only if one of the following two cases is satisfied: (1) there exists a rule (a, b) ∈ R andthere are two circular words ∼xa, ∼yb ∈ τ i−1(I) such that w = ∼xayb, (2) there exists a rule(a, b) ∈ R and ∼xayb ∈ τ i−1(I) also exists such that w = ∼xa or w = ∼yb.

Remark 4.4 Let S = (I, R) be a transitive marked system. It is straightforward to show thatL(I, R) =

⋃i≥0 σi(I) ⊆ L(I, R) =

⋃i≥0 τ i(I) ⊆ ∼I+. In order to prove this statement, it suffices

to show, by induction on i that σi(I) ⊆ τ i(I), for all i ≥ 0. Of course σ0(I) = I = τ0(I).Let i > 0 and w ∈ σi(I) = σi−1(I) ∪ σ′(σi−1(I)). By using the induction hypothesis and thedefinition, we have σi−1(I) ⊆ τ i−1(I) ⊆ τ i(I). Thus, if w ∈ σi−1(I) then w ∈ τ i(I). Otherwise,w ∈ σ′(σi−1(I)). Therefore, by the definition of σ′, there exists (a, b) ∈ R and there are ∼xa,∼yb ∈ σi−1(I) such that w = ∼xayb. Hence, by using induction hypothesis, ∼xa, ∼yb ∈ τ i−1(I)and in view of the definition of τ ′, we have w ∈ τ i(I).

19

The result that follows shows that the regularity of L(I,R) will be proved for a markedsystem once we prove the same statement for transitive marked systems.

Proposition 4.10 Let S = (I, R) be a marked system, let {(Ih, Rh) | 1 ≤ h ≤ g} be thecanonical decomposition of S and let Sh = (Ih, Rh), 1 ≤ h ≤ g. Then L = L(I, R) =

⋃gh=1 Lh,

where Lh = L(Ih, Rh). Moreover, Lh ∩Lh′ = ∅, for h, h′ ∈ {1, . . . , g}, h 6= h′. Consequently, foreach w ∈ L = L(I,R), there exists a unique v ∈ {1, . . . , g}, such that w ∈ ∼I+

v .

Proof :Let S = (I,R), {(Ih, Rh) | 1 ≤ h ≤ g}, Sh = (Ih, Rh), L and Lh be as in the statement.Denote by τ ′h, τ i

h the functions τ ′, τ i classically defined when we refer to Sh = (Ih, Rh). Ourstatement, i.e., L = L(I,R) =

⋃i≥0 τ i(I) =

⋃gh=1

⋃i≥0 τ i

h(Ih) =⋃g

h=1 Lh will be proved oncewe prove that τ i(I) =

⋃gh=1 τ i

h(Ih) for all i ≥ 0. We now proceed to the proof of the latterrelation by induction on i. It clearly holds for i = 0. Let i > 0. So, by definition, we haveτ i(I) = τ i−1(I) ∪ τ ′(τ i−1(I)). Therefore, by induction hypothesis, τ i−1(I) =

⋃gh=1 τ i−1

h (Ih).Furthermore, if w ∈ τ ′(τ i−1(I)), one of the following two cases is satisfied: (1) there exists a rule(a, b) ∈ R and there are two circular words ∼xa, ∼yb ∈ τ i−1(I) such that w = ∼xayb, (2) thereexists a rule (a, b) ∈ R and ∼xayb ∈ τ i−1(I) also exists such that w = ∼xa or w = ∼yb. In thefirst case, by induction hypothesis and by definition of canonical decomposition, we have ∼xa,∼yb ∈ τ i−1

t (It) and (a, b) ∈ Rt, that yield w = ∼xayb ∈ τ it (It) ⊆

⋃gh=1 τ i

h(Ih). In the second case,by induction hypothesis and by definition of canonical decomposition, we have ∼xayb ∈ τ i−1

s (Is)and (a, b) ∈ Rs, that yield ∼xa, ∼yb ∈ τ i

s(Is) ⊆⋃g

h=1 τ ih(Ih). Conversely, if w ∈

⋃gh=1 τ i

h(Ih)then there exists t ∈ {1, . . . , g} such that w ∈ τ i

t (It) = τ i−1t (It) ∪ τ ′t(τ

i−1t (It)). We have already

observed that τ i−1t (It) ⊆ τ i−1(I) ⊆ τ i(I), so assume that w ∈ τ ′t(τ

i−1t (It)). Therefore, one of the

following two cases is satisfied: (1) there exists a rule (a, b) ∈ Rt and there are two circular words∼xa, ∼yb ∈ τ i−1

t (It) such that w = ∼xayb, (2) there exists a rule (a, b) ∈ Rt and ∼xayb ∈ τ i−1t (It)

also exists such that w = ∼xa or w = ∼yb. Correspondingly, by using the induction hypothesis,and by definition of circular splicing, w ∈ τ i(I). The proof of the second part of the statementeasily follows.

We recall that a word x ∈ A∗ is a prefix of w ∈ A∗ if there exists y ∈ A∗ such that w = xy.Then, we set Pref(w) = {x ∈ A+ | ∃y ∈ A∗ : xy = w} and Prefc(w) = {x ∈ A+ | ∃w′ ∼ w :x ∈ Pref(w′)}.

Lemma 4.2 Let L = L(I,R) be the circular language generated by a transitive marked system.Then Prefc(Lin(L)) = ∪w∈Lin(L)Prefc(w) = I+.

Proof :Let L = L(I,R) and Prefc(Lin(L)) = ∪w∈Lin(L)Prefc(w). Since L ⊆ L ⊆ ∼I+ (Remark4.4), we have Prefc(Lin(L)) ⊆ Prefc(Lin(L)) ⊆ I+. Conversely, let us prove that I+ ⊆Prefc(Lin(L)), i.e., that every word w ∈ I+ is in Prefc(Lin(L)) by using induction on |w|. If|w| = 1 then w ∈ I, hence w ∈ Lin(L) ⊆ Prefc(Lin(L)).

Thus, let w ∈ I+ such that w = a1 · · · anan+1, with aj ∈ I, 1 ≤ j ≤ n + 1. By inductionhypothesis, there exists v ∈ I∗ such that a1 · · · anv ∈ Lin(L) and so also va1 · · · an ∈ Lin(L).Since I = SITES(R), there exists a′n ∈ I such that (an, a′n) ∈ R and since (I, R) is transitive,we have an+1 ≈ a′n, i.e., b1 = an+1, . . . , bk = a′n ∈ I exist such that (b1, b2), . . . , (bk−1, bk) ∈R. By using induction on k, it is easy to prove that ∼an+1ua′n = ∼b1b2 · · · bk ∈ L whichyields an+1ua′n ∈ Lin(L). Thus, since (an, a′n) ∈ R and ∼va1 · · · an, ∼an+1ua′n ∈ L, we have∼va1 · · · anan+1ua′n ∈ L, i.e. va1 · · · anan+1ua′n ∈ Lin(L) and so a1 · · · anan+1ua′nv ∈ Lin(L).

20

The statement below describes the structure of the circular languages generated by transitivemarked systems.

Proposition 4.11 Let S = (I, R) be a transitive marked system. Then we have L = L(I,R) =∼I+ and L is a regular circular language.

Proof :We know that L ⊆ ∼I+. Conversely, let us prove that for every ∼w ∈ ∼I+, we have ∼w ∈ L.Let w = w′a with a ∈ I, let a′ ∈ I such that (a, a′) ∈ R (a′ exists since I = SITES(R)).Since a′w′a ∈ I+, in view of Lemma 4.2, a′w′a ∈ Prefc(Lin(L)). Therefore, there exists vsuch that a′w′av ∈ Lin(L). Hence ∼w′ava′ ∈ L, and thus by definition of self-splicing, we get∼w = ∼w′a ∈ L.

Proposition 4.12, which characterizes the structure of circular languages generated by markedsystems, follows easily by applying Propositions 4.10 and 4.11.

Proposition 4.12 Let S = (I, R) be a marked system and let {(Ih, Rh) | 1 ≤ h ≤ g} be thecanonical decomposition of S. Then L = L(I, R) =

⋃gh=1

∼I+h and L is a regular circular

language.

In the following proposition, we assume that a regular circular language C is defined bymeans of a finite automaton recognizing Lin(C) or by a regular expression representing Lin(C).

Proposition 4.13 Given a regular circular language C (over a finite alphabet I) we can decidewhether a marked system S = (I,R) exists such that C = L(I, R).

Proof :Let C be a regular circular language over a finite alphabet I. Since the set R of the rules ina marked system S = (I, R) is a subset of the Cartesian product I × I, only a finite num-ber of marked systems S = (I,R) on the alphabet I exist (and this number is less than2|I|

2). Furthermore, Proposition 4.12 characterizes the structure of the corresponding gener-

ated languages L(I, R). Of course, for each of these marked systems S = (I,R), we candecide whether Lin(C) = Lin(L(I, R)), since testing whether two deterministic finite au-tomata recognize the same language is decidable [25]. Obviously, if Lin(C) = Lin(L(I,R))then C = ∼Lin(C) = ∼Lin(L(I,R)) = L(I, R).

5 Open problems

In this paper we have presented a survey on known results and new developments towards asolution of a major open problem on finite circular splicing systems and that is, finding a char-acterization of circular splicing languages that are regular circular languages. This longstandingquestion is still far from being solved. However, the material in this paper gives new insightsinto the structure of regular circular languages generated by systems with very simple types ofrules. Indeed, the results reported in this article suggest new combinatorial approaches to thestudy of the computational power of circular splicing systems.

For instance, the canonical decomposition of a marked system leads to an investigation ofmore general decomposition properties of the system based on suitable relations over the symbolsinduced by the sites of rules. As another example, Corollary 4.3 states that the class of g-markedsystems generating regular circular languages is closed under the complement operation (appliedto systems). A natural question which arises is whether this result can be extended to a larger

21

class of systems (and to a larger set of closure properties). We can also ask whether the newviewpoint of bringing together splicing and graph theory can lead to new results for the largerclass of CSSH systems.

Finally, as we have already stated, there exists a transformation between marked systemsand weakened marked systems. We have pointed out that this transformation, with Proposition4.3 and Theorem 4.2, allows us to solve Problem 1.1 for the latter class of systems. We shouldask whether this technique can be extended to other classes of systems (for instance, circularsimple systems).

ACKNOWLEDGEMENTS

The authors wish to thank the anonymous referees for useful suggestions.

References

[1] L. M. Adleman, Molecular computation of solutions to combinatorial problems, Sci-ence 226 (1994) 1021-1024.

[2] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.

[3] J. Berstel, A. Restivo, Codes et sousmonoides fermes par conjugaison, Sem. LITP81-45 (1981) 10 pages.

[4] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, DNA and circular splicing, in: (A.Condon, G. Rozenberg, Eds.) Proc. DNA 2000, Lecture Notes in Computer Science2054 (2001) 117-129.

[5] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Decision problems on linear andcircular splicing, in: (M. Ito, M. Toyama, Eds.) Proc. DLT 2002, Lecture Notes inComputer Science 2450 (2003) 78-92.

[6] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, Circular splicing and regularity, The-oretical Informatics and Applications 38 (2004) 189-228.

[7] P. Bonizzoni, C. De Felice, G. Mauri, R. Zizza, On the power of circular splicing,Discrete Applied Mathematics 150 (2005) 51-66.

[8] P. Bonizzoni, C. De Felice, R. Zizza, On circular semi-simple splicing systems, sub-mitted, 2009.

[9] A. Brandstadt, V. B. Le, J. Spinrad, Graph Classes: a Survey, SIAM Monographs onDiscrete Matematics and Applications, 1999.

[10] R. Ceterchi, C. Martin-Vide, K.G. Subramanian, On some classes of splicing lan-guages, in: (N. Jonoska, G. Paun, G. Rozenberg Eds.) Aspects of Molecular Comput-ing: Essays in Honor of the 70th Birthday of Tom Head, Lecture Notes in ComputerScience 2950 (2004) 83-104.

[11] R. Ceterchi, K. G. Subramanian, Simple circular splicing systems, Romanian Journalof Information Science and Technology 6 (2003) 121-134.

[12] C. Choffrut, J. Karhumaki, Combinatorics on words, in: (G. Rozenberg, A. Salomaa,Eds.) Handbook of Formal Languages, Vol. 1, 329-438, Springer-Verlag, 1996.

22

[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms,Second Edition, The MIT Press, McGraw-Hill Book Company, 2001.

[14] C. De Felice, G. Fici, R. Zizza, Marked systems and circular splicing, in: (E. Csuhaj-Varju, Z. Esik, Eds.) Proc. FCT 2007, Lecture Notes in Computer Science 4639(2007) 238-249.

[15] C. De Felice, G. Fici, R. Zizza, A characterization of regular circular languages gen-erated by marked splicing systems, submitted (2008).

[16] S. Eilenberg, Automata, Languages and Machines, Volume A, Academic Press, NewYork, 1974.

[17] I. Fagnot, Simple circular splicing systems, in: Preproc. of Dixieme Journees Mon-toises d’Informatique Theorique, Liege (2004).

[18] E. Goode, D. Pixton, Semi-simple splicing systems in: (C. Martine-Vide, V. Mitrana,Eds.) Where Mathematics, Computer Science, Linguistics and Biology Meet, 343-357,Kluwer Academic Publ., Dordrecht (2001).

[19] T. Head, Formal language theory and DNA: an analysis of the generative capacity ofspecific recombinant behaviours, Bulletin of Mathematical Biology 49 (1987) 737-759.

[20] T. Head, Splicing schemes and DNA, in: Lindenmayer Systems: Impacts on Theo-retical Computer Science and Developmental Biology, Springer-Verlag, Berlin (1992)371-383.

[21] T. Head, Circular suggestions for DNA computing, in: (A. Carbone, M. Gromov,P. Pruzinkiewicz, Eds.) Pattern Formation in Biology, Vision and Dynamics, WorldScientific, Singapore and London (1999).

[22] T. Head, G. Paun, D. Pixton, Language theory and molecular genetics: generativemechanisms suggested by DNA recombination, in: (G. Rozenberg, A. Salomaa, Eds.)Handbook of Formal Languages, Vol. 2, 295-360, Springer Verlag, 1996.

[23] T. Head, G. Rozenberg, R. Bladergroen, C. Breek, P. Lommerse, H. Spaink, Com-puting with DNA by operating on plasmids, BioSystems 57 (2000) 87-93.

[24] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Com-putation, Addison-Wesley, Reading, Mass., 1979.

[25] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Lan-guages, and Computation, Addison-Wesley, Reading, Mass., 2001.

[26] M. Lothaire, Combinatorics on Words, Encyclopedia of Mathematics and its Appli-cations, Addison Wesley Publishing Company, 1983.

[27] A. Mateescu, G. Paun, G. Rozenberg, A. Salomaa, Simple splicing systems, DiscreteApplied Mathematics 84 (1998) 145-163.

[28] G. Paun, Computing with membranes, Journal of Computer and System Sciences 61(2000) 108-143.

23

[29] G. Paun, G. Rozenberg, A. Salomaa, DNA Computing, New Computing Paradigms,Springer-Verlag, Berlin, 1998.

[30] D. Pixton, Linear and circular splicing systems, in: Proc. of the 1st InternationalSymposium on Intelligence in Neural and Biological Systems, IEEE Computer Society(1995) 181-188.

[31] D. Pixton, Regularity of splicing languages, Discrete Applied Mathematics 69 (1996)101-124.

[32] D. Pixton, Splicing in abstract families of languages, Theoretical Computer Science234 (2000) 135-166.

[33] C. Reis, G. Thierren, Reflective star languages and codes, Information and Control42 (1979) 1-9.

[34] R. Siromoney, Distributed circular systems, presented at: Grammar Systems 2000,Austria (2000).

[35] R. Siromoney, K.G. Subramanian, A. Dare, Circular DNA and splicing systems, in:Proc. of ICPIA, Lecture Notes in Computer Science 654 (1992) 260-273.

[36] T. Yokomori, S. Kobayaski, C. Ferretti, On the power of circular splicing systems andDNA computability, in: Proc. of IEEE ICEC99 (1999).

24