A Funny Proverb Generation System Based on Sukashi



Hiroaki Yamane and Masafumi Hagiwara

The Department of Information and Computer Science, Keio University, Hiyoshi 3-14-1, Kohoku-ku, Yokohama, 223-8522 Japan

{yamane,hagiwara}@soft.ics.keio.ac.jp

Abstract. In this paper, we propose a system which produces funny proverbs. The system uses a punch line framework named Sukashi: by changing the end of a line, it produces a funny sentence. We employ Google N-grams to generate a large number of Sukashi candidates. The system then extracts parameters from each word in each candidate. The parameters are the words’ sound, length, imageability, similarity and concrete level. The system selects candidates by using fuzzy rules. The performance of the proposed system has been evaluated by subjective experiments, and satisfactory results were obtained.

Key words: Laugh, Text generation, Fuzzy rules, Sukashi

1 Introduction

Laughter is an essential element of human life. Humor improves people’s relationships [1]. Moreover, human beings are a species that looks for funny things. Producing humor which raises a good laugh is one of the most intelligent human activities. Advancing this field not only contributes to a deeper understanding of humankind, but also helps to construct more human-friendly interfaces. People have been studying laughter in various fields such as philosophy, literature, psychology and entertainment. Among engineering approaches, “pun generation” has been a major topic [2][3]. However, as far as we know, few studies have actually focused on the funny level of the generated items. To deal with this problem, we focus on a kind of Japanese punch line named Sukashi [4].

We show the structure of Sukashi in Fig. 1. Owing to Sukashi, we are able to make funny proverbs in simpler and more flexible ways. In this paper, we propose a system which produces funny proverbs by using Sukashi. The remainder of the paper is organized as follows. We first describe the Sukashi generation system in Section 2. Then we show experimental results in Section 3. Finally, we conclude with a discussion and directions for future work in Section 4.

2 An Automatic Funny Proverb Generation System Based on Sukashi

We summarize the flow of the system in Fig. 2.

[Figure: the proverb is split into X = 藪から (Yabukara, “from a bush”) and Y = 棒 (Bou, “a stick”); Y is replaced by Y′ = ボーナス (Bonus).]

Fig. 1. Structure of Sukashi. The meaning of the original proverb is “out of the blue”, and it is directly translated as “a stick from a bush.” By changing the end word “Bou” to “Bonus”, it becomes funnier.

[Figure: Proverb Input → Sukashi Candidates Generation → Acquisition of Feature Parameters of Sukashi → Selection of Sukashi Candidates → Sukashi Output.]

Fig. 2. Overall view of the proposed system

The proposed system consists of the following five steps: 1. Proverb Input, 2. Sukashi Candidates Generation, 3. Acquisition of Feature Parameters of the Candidates, 4. Selection of Sukashi Candidates, and 5. Sukashi Output. We explain each step in more detail below.

2.1 Sukashi Candidates Generation

In order to generate Sukashi candidates, each input proverb is divided into two parts, X and Y. We divide it at a change in part of speech. For example, in Fig. 1, X is the part before the noun “bou” and the rest is Y. Using the Japanese morphological analyzer MeCab [5], the proposed system checks whether each part of the input proverb contains a noun or a verb, and then checks whether the last part of speech contains Kanji; it then extracts X from the proverb. After that, for each X the system uses the Japanese Google N-grams corpus [6] to find word sequences that are suitable to concatenate to it.
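The concatenation step can be illustrated with a minimal sketch. It assumes a tiny in-memory bigram table (`TOY_NGRAMS`, hypothetical) in place of the Japanese Google N-grams corpus, romanized tokens in place of MeCab output, and a simple "match on the last token of X" lookup; the text does not specify the actual matching criteria, so these are illustrative assumptions only.

```python
from typing import Dict, List, Tuple

# Hypothetical toy bigram table: (first token, continuation) -> frequency.
# The real system queries the Japanese Google N-grams corpus instead.
TOY_NGRAMS: Dict[Tuple[str, str], int] = {
    ("kara", "bou"): 120,
    ("kara", "bonus"): 45,
    ("kara", "boushi"): 30,
    ("ni", "mizu"): 80,
}


def generate_candidates(x_tokens: List[str], original_y: str) -> List[str]:
    """Return X + Y' strings for each continuation Y' of the last token of X.

    Assumption: the original ending Y is skipped, and candidates are
    ordered by descending n-gram frequency.
    """
    last = x_tokens[-1]
    ranked = sorted(TOY_NGRAMS.items(), key=lambda item: -item[1])
    candidates = []
    for (head, continuation), _count in ranked:
        if head == last and continuation != original_y:
            candidates.append(" ".join(x_tokens + [continuation]))
    return candidates


print(generate_candidates(["yabu", "kara"], "bou"))
```

With the toy table above, the proverb part X = "yabu kara" yields the candidates ending in "bonus" and "boushi", in that order.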

2.2 Acquisition of Feature Parameters of the Candidates

The following five parameters are employed in the proposed system.

a) The number of characters and accordance of vowels in punch lines
b) Difference of sounds
c) Imageability
d) Similarity of words
e) Concrete level

[Figure: part of the Goi-Taikei tree with depth levels numbered from 1 to 10. Node labels shown include Noun, Concrete, Abstract, Agent, Person (Expert), Person (Especially Technic), Person (Medical Services), Journalist, Lawyer, Others, Doctor, Pharmacist and Nurse.]

Fig. 3. Tree structure of Goi-Taikei. We treat the depth of each word in the tree as its concrete level.

In a), we employ the number of characters and the accordance of vowels in the punch line as a parameter. According to our preliminary study, people tend to evaluate a Sukashi as funnier under the following conditions:

– The head and the end vowels are the same as the original ones.
– When the Sukashi is romanized, the number of characters in Y′ is within a range of 1 of that of Y.

We set up the rule that if a Sukashi candidate matches these conditions, the system assumes that it is funny.
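The conditions in a) can be checked on romanized strings roughly as follows. This is a sketch under assumptions: the function and key names are hypothetical, "head" and "end" vowels are taken to be the first and last vowels of the romanized punch line, and "within a range of 1" is read as an absolute length difference of at most 1.

```python
VOWELS = set("aiueo")


def vowels_of(romaji: str) -> str:
    """Return only the vowels of a romanized string, in order."""
    return "".join(ch for ch in romaji.lower() if ch in VOWELS)


def surface_features(y: str, y_prime: str) -> dict:
    """Compare the original ending Y with the replacement Y' (both romanized)."""
    vy, vp = vowels_of(y), vowels_of(y_prime)
    both = bool(vy) and bool(vp)
    return {
        "head_vowel_match": both and vy[0] == vp[0],
        "end_vowel_match": both and vy[-1] == vp[-1],
        # Assumption: "in the range of 1" means the lengths differ by at most 1.
        "length_within_one": abs(len(y) - len(y_prime)) <= 1,
    }


print(surface_features("kanabou", "kanebou"))
print(surface_features("bou", "bonus"))
```

For "kanabou" and "kanebou" all three checks hold; for "bou" and "bonus" the vowels agree but the length check fails.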

In b), we use the difference of sounds as a parameter. As Oda [7] points out, funny things may share points of similarity. We calculate the difference of sounds by romanizing the Sukashi and applying Dynamic Programming [8].
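As an illustration of a dynamic-programming sound difference, the sketch below computes a unit-cost edit distance (Levenshtein distance) between two romanized strings. This is a stand-in: the cited method [8] is a global sequence alignment, and the scoring actually used by the system is not given in the text, so the unit costs and the function name are assumptions.

```python
def sound_difference(a: str, b: str) -> int:
    """Unit-cost edit distance between two romanized strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(sound_difference("kanabou", "kanebou"))    # 1
print(sound_difference("hiyamizu", "hanamizu"))  # 2
```

A small value means that Y′ sounds close to the original Y, which is the property the fuzzy rules reward with “Difference of sounds is small”.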

In c), imageability is selected. In the preliminary study, a number of subjects commented that things are funny because they are easy to imagine. In addition, findings in brain science show that high-imageability nouns activate a large part of the human brain [9]. We extracted this parameter using the NTT Database Series imageability dictionary [10].

In d), we employ a parameter which deals with the similarity of words. Oda points out that “sudden change of idea or behavior” and “unjustified expansion of idea or behavior” are essential elements of laughter. Therefore, a punch line needs to be different from the original one. In the proposed system, the values are calculated by using the Computational System for the Similarity between Concepts (Words) [11]-[13].

In e), the concrete level is employed as a parameter. Based on the preliminary study, we assumed that the more concrete things become, the funnier they are. Fig. 3 displays the tree structure of the Japanese word-relation dictionary Goi-Taikei [14]. Concrete levels are assigned by measuring the depth of a word in the Goi-Taikei tree structure.
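Measuring depth in a tree can be sketched as below. The parent map is a hypothetical, heavily abridged toy hierarchy, not the actual Goi-Taikei data, and the root is assumed to have level 1.

```python
from typing import Dict, Optional

# Hypothetical, abridged toy hierarchy (child -> parent); not real Goi-Taikei data.
PARENT: Dict[str, Optional[str]] = {
    "noun": None,
    "concrete": "noun",
    "agent": "concrete",
    "person": "agent",
    "doctor": "person",
}


def concrete_level(node: str, parent: Dict[str, Optional[str]] = PARENT) -> int:
    """Return the depth of a node, counting the root as level 1."""
    level = 0
    current: Optional[str] = node
    while current is not None:
        level += 1
        current = parent[current]
    return level


print(concrete_level("noun"))    # 1
print(concrete_level("doctor"))  # 5
```

Deeper nodes receive larger values, matching the assumption that more specific words are more concrete.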

2.3 Selection of Sukashi Candidates

Fig. 4 summarizes the flow of the selection of Sukashi candidates.

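[Figure: Sukashi candidates’ parameter input → Fuzzy Rules A → check whether a candidate with funny level > 0 exists. If YES, sort by funny level and output the Sukashi; if NO, apply Fuzzy Rules B, then sort by funny level and output the Sukashi.]

Fig. 4. Flow of fuzzy parts of the proposed system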

[Figure: block diagram with inputs Length of Sound, Difference of Sound, Imageability, Similarity of Word and Concrete Level; blocks Fuzzy Inference 1, Fuzzy Inference 2 and Fuzzy Inference 3; output Funny Level.]

Fig. 5. Fuzzy Rule A of the proposed system

First, the system acquires each parameter from each word in the punch line of the Sukashi candidates. If there are multiple values, the maximum one is selected. As Fig. 4 shows, fuzzy rules are applied after the values are acquired. There are two kinds of fuzzy rules, Fuzzy Rules A and Fuzzy Rules B. After applying Fuzzy Rules A, the system outputs the sorted Sukashi candidates if there is at least one Sukashi whose funny level is greater than zero. If not, Fuzzy Rules B is applied, and the output is the candidates sorted by that result instead. We employ if-then rules [15] with the direct method for calculation. Fuzzy Rules A has 2 antecedent parts and B has 3. With centroid computation over the grades, final output values are calculated to estimate a funny level such as “not so funny” or “funny”.
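The two-stage selection described above can be sketched as follows. The scorer arguments stand for Fuzzy Rules A and Fuzzy Rules B; the toy scorers and candidate names are hypothetical placeholders, and returning all candidates in sorted order (rather than only those with a positive funny level) is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Scorer = Callable[[Params], float]


def select_candidates(
    candidates: List[Tuple[str, Params]], rules_a: Scorer, rules_b: Scorer
) -> List[Tuple[str, float]]:
    """Score with rules A; fall back to rules B if no funny level exceeds zero."""
    scored = [(text, rules_a(params)) for text, params in candidates]
    if not any(level > 0 for _, level in scored):
        scored = [(text, rules_b(params)) for text, params in candidates]
    # Sort by funny level, funniest first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in scorers that just read precomputed values.
def toy_rules_a(params: Params) -> float:
    return params["a"]


def toy_rules_b(params: Params) -> float:
    return params["b"]


toy = [("candidate 1", {"a": 0.0, "b": 0.5}), ("candidate 2", {"a": 0.0, "b": 1.5})]
print(select_candidates(toy, toy_rules_a, toy_rules_b))
```

Here no candidate scores above zero under the first scorer, so the ranking comes from the second one and "candidate 2" is listed first.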

Fig. 5 represents the flow of the Fuzzy Rules A.

The rules are as follows.

– Rule A1: If “Y is long” and “Difference of sounds is small” and “Imageability is high” then “Sukashi is funny”

[Figure: block diagram with inputs Num. of Characters’ Accordance, Head Character Accordance, End Character Accordance, Imageability, Similarity of Word and Concrete Level; blocks Fuzzy Inference 4, Fuzzy Inference 5 and Fuzzy Inference 6; output Funny Level.]

Fig. 6. Fuzzy Rule B of the proposed system

– Rule A2: If “Y is long” and “Difference of sounds is small” and “Similarity of words is related term degree” then “Sukashi is funny”

– Rule A3: If “Y is long” and “Difference of sounds is small” and “Concrete level is high” then “Sukashi is funny”

– Rule A4: If “Y is middle long” and “Difference of sounds is small” and “Imageability is high” then “Sukashi is somewhat funny”

– Rule A5: If “Y is middle long” and “Difference of sounds is small” and “Similarity of words is related term degree” then “Sukashi is somewhat funny”

– Rule A6: If “Y is middle long” and “Difference of sounds is small” and “Concrete level is high” then “Sukashi is somewhat funny”
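Rules A1 to A6 can be evaluated along the following lines. This is only a sketch: every membership breakpoint is a hypothetical value (the real shapes are those of the paper's membership-function figures), the "and" is taken as a minimum, and the centroid is approximated by a weighted average of two assumed singleton consequents, 2.0 for “funny” and 1.0 for “somewhat funny”.

```python
from typing import Dict


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Grade rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x: float, lo: float, hi: float) -> float:
    """Grade falling linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def triangle(x: float, lo: float, peak: float, hi: float) -> float:
    """Triangular grade that is 1 at peak and 0 outside (lo, hi)."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)


def funny_level_a(p: Dict[str, float]) -> float:
    """Evaluate Rules A1-A6; all breakpoints below are hypothetical."""
    long_ = ramp_up(p["length"], 4.0, 7.0)
    middle = triangle(p["length"], 2.0, 4.5, 7.0)
    small = ramp_down(p["sound_diff"], 2.0, 8.0)
    third = [
        ramp_up(p["imageability"], 4.0, 6.0),   # "Imageability is high"
        ramp_down(p["similarity"], 0.1, 0.4),   # "related term degree"
        ramp_up(p["concrete"], 6.0, 9.0),       # "Concrete level is high"
    ]
    fired = []  # (firing strength, consequent value)
    for grade in third:
        fired.append((min(long_, small, grade), 2.0))   # A1-A3: funny
        fired.append((min(middle, small, grade), 1.0))  # A4-A6: somewhat funny
    total = sum(w for w, _ in fired)
    if total == 0.0:
        return 0.0
    return sum(w * c for w, c in fired) / total


example = {"length": 7, "sound_diff": 1, "imageability": 6,
           "similarity": 0.9, "concrete": 3}
print(funny_level_a(example))
```

With these assumed breakpoints only Rule A1 fires for the example, so the output equals the “funny” consequent; a candidate for which no rule fires receives 0 and would be handled by Fuzzy Rules B.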

We show the flow of Fuzzy Rules B in Fig. 6. Here, the rules are given priority in the order of Rules B1-B3, B4-B6, B7-B9.

– Rule B1: If “Head of Y and that of Y′ are the same” and “Imageability is high” then “Sukashi is funny”

– Rule B2: If “Head of Y and that of Y′ are the same” and “Similarity of words is related term degree” then “Sukashi is funny”

– Rule B3: If “Head of Y and that of Y′ are the same” and “Concrete level is high” then “Sukashi is funny”

– Rule B4: If “End of Y and that of Y′ are the same” and “Imageability is high” then “Sukashi is funny”

– Rule B5: If “End of Y and that of Y′ are the same” and “Similarity of words is related term degree” then “Sukashi is funny”

[Figure: two membership-function plots with grade up to 1. Number of characters of Y′ (axis 0 to 9): Short, Middle Long, Long. Difference of sound (axis 0 to 25): Small, Middle, Large.]

Fig. 7. Membership functions of Y’s length and difference of sounds

[Figure: two membership-function plots with grade up to 1. Imageability (axis 0 to 7): Low, Average, High. Similarity (axis 0 to 1.0): Related term degree, Somewhat high, High.]

Fig. 8. Membership functions of imageability and similarity

– Rule B6: If “End of Y and that of Y′ are the same” and “Concrete level is high” then “Sukashi is funny”

– Rule B7: If “Number of characters in Y and that of Y′ are in the range of 1” and “Imageability is high” then “Sukashi is funny”

– Rule B8: If “Number of characters in Y and that of Y′ are in the range of 1” and “Similarity of words is related term degree” then “Sukashi is funny”

– Rule B9: If “Number of characters in Y and that of Y′ are in the range of 1” and “Concrete level is high” then “Sukashi is funny”

We display the membership functions of the length of Y and the difference of sounds in Fig. 7, those of imageability and similarity in Fig. 8, and those of the concrete level and the consequent part in Fig. 9.

[Figure: two membership-function plots with grade up to 1. Concrete level (axis 0 to 11): Low, Average, High. Consequent part, funny level (axis values 1 and 2): Somewhat high, High.]

Fig. 9. Membership functions of concrete level and consequent part

3 Experiments

We conducted verification experiments to examine the funny level of each Sukashi generated by our system. The Sukashis were rated in terms of funny level and unpredictable quality.

Funny Level
5 points: Sukashi is funny
4 points: Sukashi is somewhat funny
3 points: Sukashi is average
2 points: Sukashi is somewhat boring
1 point: Sukashi is boring or not understandable

Unpredictable Quality
3 points: Punch line of Sukashi is beyond expectation
2 points: Punch line of Sukashi is predictable if you assume it is Sukashi
1 point: Punch line of Sukashi is predictable because it is a normal sentence

3.1 Experiment Condition

We used 50 proverbs [16] as input for the system. In order to evaluate the quality of the generated Sukashis, we carried out Turing-test-like experiments. The system generated 291 Sukashis. We randomly shuffled the system-generated Sukashis together with human-made ones, on the condition that the number of Sukashis for each proverb is the same. Subjects evaluated 240 Sukashis in terms of funny level and unpredictable quality without knowing which ones were system-generated. The number of subjects was 11. Table 1 shows some examples of Sukashis produced by the proposed system.

3.2 Experimental Result

Fig. 10 compares the funny level of the system-generated Sukashis with that of the human-made ones. As the figure demonstrates, the system-generated Sukashis are comparable to the human-made ones. Table 2 shows the funny level features of both.

Table 1. Examples of system-generated Sukashis

Proverb: 鬼に金棒 (Oni ni Kanabou): An ogre with an iron club (Unbeatable advantage)
Sukashi: 鬼にカネボウ (Oni ni Kanebou): An ogre with “Kanebo” (“Kanebo” is a famous cosmetic company’s product)

Proverb: 漁夫の利 (Gyofu no Ri): The fisherman’s profit (Profit gained by a third party)
Sukashi: 漁夫のリハビリ (Gyofu no Rihabiri): The fisherman’s rehabilitation

Proverb: 縁の下の力持ち (Ennoshita no Chikaramochi): The strong person under the floor (The strong person working in the background)
Sukashi: 縁の下の父 (Ennoshita no Chichi): The father working in the background

Proverb: 年寄りの冷や水 (Toshiyori no Hiyamizu): The cold water of an old person (An old man hustling to keep up with the young)
Sukashi: 年寄りの鼻水 (Toshiyori no Hanamizu): The snot of an old man

Proverb: 焼け石に水 (Yakeishi ni Mizu): Sprinkling water on a burnt stone (Only a drop in the ocean)
Sukashi: 焼け石に傷 (Yakeishi ni Kizu): Hurting a burnt stone

Proverb: 身から出た錆 (Mi kara Deta Sabi): The rust which comes out of a body (An ill life, an ill end)
Sukashi: 身から出たわさび (Mi kara Deta Wasabi): Wasabi which comes out of a body (“Wasabi” is a Japanese spice)

Proverb: 塞翁が馬 (Saiou ga Uma): The horse of an old man living in the fort (Good can come out of a misfortune)
Sukashi: 塞翁がネタ (Saiou ga Neta): An old man living in the fort is newsworthy

Proverb: 知らぬが仏 (Shiranu ga Hotoke): The Buddha who doesn’t know the truth (Ignorance is bliss)
Sukashi: 知らぬがほっとけょぅ (Shiranu ga Hottokeyou): I don’t know, but leave me alone

Table 2. Difference of funny level features

                           Human-made Sukashi   System-generated Sukashi
Funny level on average     2.93                 2.58
Variance of funny level    1.19                 1.08

Fig. 11 compares the unpredictable quality of the system-generated Sukashis with that of the human-made ones. The unpredictable quality features of both are shown in Table 3.

Table 3. Difference of unpredictable quality features

                                     Human-made Sukashi   System-generated Sukashi
Unpredictable quality on average     2.14                 2.01
Variance of unpredictable quality    0.563                0.613

[Figure: histogram of total evaluation numbers (axis 0 to 600) against rating (1 to 5) for human-made and system-generated Sukashis.]

Fig. 10. Comparison of funny level

[Figure: histogram of total evaluation numbers (axis 0 to 600) against rating (1 to 3) for human-made and system-generated Sukashis.]

Fig. 11. Comparison of unpredictable quality

4 Conclusion

In this paper, we proposed a new system which produces funny proverbs. The proposed system uses the punch line framework named Sukashi. That is, by changing the end of the line, it produces a funny sentence. In the proposed system, we have employed Japanese Google N-grams to generate a large number of Sukashi candidates. After that, the system extracts parameters from each word in each candidate. The parameters are the words’ sound, length, imageability, similarity and concrete level. Then the system selects candidates by using fuzzy rules with centroid computations.

The performance has been evaluated by subjective experiments, and they show that the system-generated Sukashis are comparable to human-made ones.

Finding a framework for laughter, namely Sukashi, is the first step. We believe this line of research will contribute to constructing more human-friendly interfaces such as communication and entertainer robots.

Acknowledgments. We express our deepest appreciation for Prof. Tsutomu Ishikawa’s provision of the Computational System for the Similarity between Concepts (Words). We are also grateful to Google and Gengo-Shigen-Kyokai (GSK) for the Japanese Google N-grams.

References

1. M. De Boni, A. Richardson and R. Hurling, “Humour, Relationship Maintenance and Personality Matching in Automated Dialogue: A Controlled Study,” Interact. Comput., Vol. 20, No. 2, pp. 342-353, Nov. 2007

2. K. Binsted and O. Takizawa, “‘BOKE’ – A Japanese Punning Riddle Generator,” Journal of the Japanese Society for Artificial Intelligence, Vol. 13, No. 6, pp. 920-927, Nov. 1997

3. A. Waller, R. Black, D. A. O’Mara, H. Pain, G. Ritchie and R. Manurung, “Evaluating the STANDUP Pun Generating Software with Children with Cerebral Palsy,” ACM Trans. Access. Comput., Vol. 1, No. 3, pp. 1-27, 2009

4. N. Fukui, “The Techniques for Making Laughter – Laugh Makes Us Discover the World,” Sekai Shisousya, 2002

5. “Yet Another Part-of-Speech and Morphological Analyzer,” http://mecab.sourceforge.net/.

6. T. Kudo and H. Kazawa, “Japanese Google N-grams Vol.1,” Gengo-Shigen-Kyokai (GSK)

7. S. Oda, “Laughing and Humor,” Chikuma Shobo, 1986

8. S. B. Needleman and C. D. Wunsch, “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins,” Journal of Molecular Biology, Vol. 48, No. 3, pp. 443-453, 1970

9. D. Sabsevitz, D. Medler, M. Seidenberg and J. Binder, “Modulation of the Semantic System by Word Imageability,” NeuroImage, Vol. 27, No. 1, pp. 188-200, 2005

10. N. Sakuma, M. Ijuin, T. Fushimi, I. Tatsumi, M. Tanaka, S. Amano and K. Kondo, “NTT Database Series ‘Japanese Vocabulary Attribution’ Vol.8 Word Imageability (1),” Sanseido, 2005

11. Ishikawa Laboratory, Takushoku University, “Computational System for the Similarity between Concepts (Words),” http://www.cs.takushoku-u.ac.jp/ai/ruiji/Similarity System.cgi.

12. Y. Noguchi, R. Shimizu, K. Sugimoto and T. Ishikawa, “An Improved Tool for Measuring Semantic Similarity between Words,” The 69th National Convention of Information Processing Society of Japan, Vol. 2, pp. 2545-2546, Mar. 2007

13. T. Kawashima and T. Ishikawa, “An Evaluation of Knowledge Base of Words and Thesaurus on Measuring the Semantic Similarity between Words,” The 18th Annual Conference of the Japanese Society for Artificial Intelligence, Vol. 18, pp. 2D2-10, 2004

14. Iwanami Shoten, “Nihongo Goi-Taikei CD-ROM,” http://www.kecl.ntt.co.jp/mtg/resources/GoiTaikei/.

15. M. Sugeno, “Fuzzy Control,” Nikkan Kogyo, 1988

16. T. Kurogo, “Kurogo’s Proverb Dictionary,” http://www.geocities.jp/tomomi965/.