A Funny Proverb Generation System Based on Sukashi
Hiroaki Yamane and Masafumi Hagiwara
The Department of Information and Computer Science, Keio University, Hiyoshi 3-14-1, Kohoku-ku, Yokohama, 223-8522 Japan
{yamane,hagiwara}@soft.ics.keio.ac.jp
Abstract. In this paper, we propose a system that produces funny proverbs. The system uses a punch line framework named Sukashi: by changing the end of a proverb, it produces a funny sentence. The system employs Google N-grams to generate many Sukashi candidates and then extracts parameters from each word in each candidate. The parameters are the sound, length, imageability, similarity, and concrete level of words. The system selects candidates using fuzzy rules. The performance of the proposed system was evaluated by subjective experiments, and satisfactory results were obtained.
Key words: Laugh, Text generation, Fuzzy rules, Sukashi
1 Introduction
Laughter is an essential element of human life, and humor makes people’s relationships better [1]. Moreover, human beings are a species that looks for funny things, and producing humor that raises a good laugh is one of the most intelligent human activities. Advancing this field not only contributes to a deeper understanding of humankind but also helps build more human-friendly interfaces. Laughter has been studied in various fields such as philosophy, literature, psychology, and entertainment. Among engineering approaches, “pun generation” has been a major topic [2][3]. However, as far as we know, few studies have actually focused on how funny the generated items are. To deal with this problem, we focus on a kind of Japanese punch line named Sukashi [4].
Fig. 1 shows the structure of Sukashi. Sukashi allows funny proverbs to be made in simpler and more flexible ways. In this paper, we propose a system that produces funny proverbs using Sukashi. The remainder of the paper is organized as follows. Section 2 describes the Sukashi generation system. Section 3 presents experimental results. Finally, Section 4 concludes with a discussion and directions for future work.
2 An Automatic Funny Proverb Generation System Based on Sukashi
We summarize the flow of the system in Fig. 2.
[Fig. 1: 藪から (Yabukara, X, “From a bush”) + 棒 (Bou, Y, “A stick”) → ボーナス (Bonus, Y′)]
Fig. 1. Structure of Sukashi. The original proverb means “out of the blue” and translates literally as “a stick from a bush.” Changing the end word “Bou” to “Bonus” makes it funnier.
[Fig. 2: Proverb Input → Sukashi Candidates Generation → Acquisition of Feature Parameters of Sukashi → Selection of Sukashi Candidates → Sukashi Output]
Fig. 2. Overall view of the proposed system
The proposed system consists of the following five steps:
1. Proverb input
2. Sukashi candidate generation
3. Acquisition of feature parameters of the candidates
4. Selection of Sukashi candidates
5. Sukashi output
We explain each step in more detail below.
2.1 Sukashi Candidates Generation
To generate Sukashi candidates, each input proverb is divided into two parts, X and Y, at a boundary between parts of speech. For example, in Fig. 1, X is the part before the noun “bou,” and the rest is Y. Using the Japanese morphological analyzer MeCab [5], the system checks whether each part of the input proverb contains a noun or a verb and then whether the last such part of speech contains Kanji; from these, it extracts X. Then, for each X, the system searches the Japanese Google N-grams Corpus [6] for word sequences that can suitably follow it.
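A minimal Python sketch of this step, under stated assumptions: the proverb arrives already tokenized as (surface, part-of-speech) pairs with hypothetical English POS labels (real MeCab output would need mapping), Y is taken to start at the last noun or verb containing Kanji (one reading of the rule above), and the N-gram corpus is replaced by a small dictionary of invented counts. The names `split_proverb` and `generate_candidates` are illustrative, not the authors’.

```python
import re
from typing import Dict, List, Tuple

# (surface form, part of speech); the English POS labels are hypothetical
# stand-ins for a morphological analyzer's output.
Token = Tuple[str, str]

# CJK Unified Ideographs block, used to test whether a token contains Kanji.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def split_proverb(tokens: List[Token]) -> Tuple[str, str]:
    """Split a proverb into (X, Y): Y starts at the last noun or verb whose
    surface contains Kanji, and X is everything before it."""
    for i in range(len(tokens) - 1, -1, -1):
        surface, pos = tokens[i]
        if pos in ("noun", "verb") and KANJI.search(surface):
            x = "".join(s for s, _ in tokens[:i])
            y = "".join(s for s, _ in tokens[i:])
            return x, y
    raise ValueError("no noun or verb containing Kanji")


def generate_candidates(x: str, y: str, ngram_counts: Dict[str, int],
                        top_k: int = 5) -> List[str]:
    """Collect continuations Y' that follow X in an n-gram table,
    excluding the original Y, most frequent first."""
    scored = []
    for ngram, count in ngram_counts.items():
        if ngram.startswith(x) and len(ngram) > len(x):
            y_prime = ngram[len(x):]
            if y_prime != y:
                scored.append((count, y_prime))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [y_prime for _, y_prime in scored[:top_k]]


tokens = [("藪", "noun"), ("から", "particle"), ("棒", "noun")]
x, y = split_proverb(tokens)
# Invented counts, for illustration only.
counts = {"藪から棒": 900, "藪から蛇": 40, "藪からボーナス": 12, "藪の中": 300}
print(x, y, generate_candidates(x, y, counts))
```

Ranking by raw frequency is only a placeholder; the paper does not say how N-gram matches are filtered before feature extraction.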
2.2 Acquisition of Feature Parameters of the Candidates
The following five parameters are employed in the proposed system.
a) The number of characters and accordance of vowels in punch lines
b) Difference of sounds
c) Imageability
d) Similarity of words
e) Concrete level
[Fig. 3: excerpt of the Goi-Taikei tree, levels 1 to 10; nodes shown include Noun, Concrete, Abstract, Agent, Person (Expert), Person (Especially Technic), Person (Medical Services), Doctor, Pharmacist, Nurse, Journalist, Lawyer, and Others]
Fig. 3. Tree structure of Goi-Taikei. The depth of each word in the tree is taken as its concrete level.
For a), we use the number of characters and the accordance of vowels in the punch line as a parameter. According to our preliminary study, people tend to rate a Sukashi as funnier when the following conditions hold:
– The head and end vowels are the same as those of the original.
– When the Sukashi is romanized, the number of characters in Y′ is less than that of Y within a range of 1.
We set up the rule that if a Sukashi candidate matches these conditions, the system assumes that it is funny.
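A minimal sketch of the check in a), assuming Y and Y′ are already romanized. We read “within a range of 1” as an absolute length difference of at most one, matching how Rules B7-B9 phrase it; a one-sided reading would change the `abs(...)` test. The helper names are hypothetical.

```python
VOWELS = "aeiou"


def head_vowel(romaji: str) -> str:
    """First vowel of a romanized string ('' if none)."""
    return next((c for c in romaji if c in VOWELS), "")


def end_vowel(romaji: str) -> str:
    """Last vowel of a romanized string ('' if none)."""
    return next((c for c in reversed(romaji) if c in VOWELS), "")


def matches_rule_a(y: str, y_prime: str) -> bool:
    """True if the romanized punch line Y' keeps the head and end vowels of Y
    and its length is within one character of Y (assumed reading)."""
    y, y_prime = y.lower(), y_prime.lower()
    same_vowels = (head_vowel(y) == head_vowel(y_prime)
                   and end_vowel(y) == end_vowel(y_prime))
    similar_length = abs(len(y) - len(y_prime)) <= 1
    return same_vowels and similar_length


for original, sukashi in [("kanabou", "kanebou"), ("mizu", "kizu"), ("bou", "bonasu")]:
    print(original, sukashi, matches_rule_a(original, sukashi))
```

The first two pairs come from Table 1 (Kanabou to Kanebou, Mizu to Kizu). Under this reading, “Bou” to “Bonasu” keeps both vowels but fails the length condition.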
For b), we use the difference of sounds as a parameter. As Oda [7] points out, funny things may share points of similarity. We calculate the difference of sounds by romanizing the Sukashi and applying dynamic programming [8].
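The paper does not give its scoring scheme, so the sketch below assumes unit costs: a global alignment in the style of Needleman and Wunsch [8], computed as a minimum cost rather than a maximum score. With unit costs this coincides with the Levenshtein edit distance. `sound_difference` and its cost parameters are hypothetical names.

```python
def sound_difference(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Minimum global-alignment cost between two romanized strings.

    Costs are assumed: 0 for a matching character, `mismatch` for a
    substitution, and `gap` for an insertion or deletion."""
    n, m = len(a), len(b)
    # dp[i][j] = cheapest alignment of a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match or substitute
                           dp[i - 1][j] + gap,      # delete from a
                           dp[i][j - 1] + gap)      # insert into a
    return dp[n][m]


print(sound_difference("bou", "bonasu"))
print(sound_difference("hiyamizu", "hanamizu"))
```

Romanizing first means that kana of different scripts with the same sound (e.g. ボウ and ぼう) compare as equal.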
For c), we select imageability. In the preliminary study, a number of subjects commented that things are funny because they are easy to imagine. In addition, brain science research shows that high-imageability nouns activate a large part of the human brain [9]. We extract this parameter using the NTT Database Series imageability dictionary [10].
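Section 2.3 states that when a parameter has multiple values, the maximum is selected. A minimal sketch of that lookup for imageability, assuming the dictionary [10] has been loaded into a Python dict; the entries and scores below are invented placeholders, not values from the NTT database, and `max_feature` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def max_feature(words: Iterable[str], table: Dict[str, float]) -> Optional[float]:
    """Look up each punch-line word in a feature dictionary and keep the
    largest value; words missing from the dictionary are skipped."""
    values = [table[w] for w in words if w in table]
    return max(values) if values else None


# Invented imageability scores, for illustration only.
imageability = {"鼻": 6.1, "水": 6.3}
print(max_feature(["鼻", "水"], imageability))
print(max_feature(["よう"], imageability))
```

The same helper would serve for the similarity and concrete-level values; returning `None` leaves the policy for words missing from the dictionary to the caller.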
For d), we employ a parameter that deals with the similarity of words. Oda points out that “sudden change of idea or behavior” and “unjustified expansion of idea or behavior” are essential elements of laughter. Therefore, a punch line needs to differ from the original one. In the proposed system, the values are calculated using the Computational System for the Similarity between Concepts (Words) [11]-[13].
For e), the concrete level is employed as a parameter. Based on the preliminary study, we assumed that things become funnier as they become more concrete. Fig. 3 shows the tree structure of the Japanese word-relation dictionary Goi-Taikei [14]. The concrete level of a word is given by its depth in this tree.
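A minimal sketch of the depth computation, assuming the hierarchy is available as child-to-parent links with the root at level 1 (as in Fig. 3). The toy tree below omits the intermediate levels that Fig. 3 elides, so its depth numbers do not match Goi-Taikei’s. The `senses` mapping, the entry for 医者, and taking the deepest node when a word has several are assumptions, the last one chosen to mirror the maximum rule in Section 2.3.

```python
from typing import Dict, List, Optional


def depth(node: str, parent: Dict[str, Optional[str]]) -> int:
    """Depth of a node in a tree stored as child -> parent links;
    the root (whose parent is None) has depth 1."""
    level = 1
    current = parent[node]
    while current is not None:
        level += 1
        current = parent[current]
    return level


def concrete_level(word: str, senses: Dict[str, List[str]],
                   parent: Dict[str, Optional[str]]) -> int:
    """Concrete level of a word: the depth of its deepest tree node."""
    return max(depth(node, parent) for node in senses[word])


# Toy hierarchy loosely following Fig. 3, with intermediate levels omitted.
parent = {"Noun": None, "Concrete": "Noun", "Abstract": "Noun",
          "Agent": "Concrete", "Person": "Agent", "Doctor": "Person"}
senses = {"医者": ["Doctor"]}  # hypothetical word-to-node mapping
print(concrete_level("医者", senses, parent))
```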
2.3 Selection of Sukashi Candidates
Fig. 4 summarizes the flow of the selection of Sukashi candidates.
[Fig. 4: Sukashi candidates’ parameter input → Fuzzy Rules A → if a candidate with “funny level > 0” exists (YES), sort by funny level and output the Sukashi; otherwise (NO), apply Fuzzy Rules B, then sort by funny level and output]
Fig. 4. Flow of the fuzzy part of the proposed system
[Fig. 5: Fuzzy Inferences 1-3 with inputs length of sound, difference of sound, imageability, similarity of word, and concrete level; their outputs are combined (+) into the funny level]
Fig. 5. Fuzzy Rule A of the proposed system
First, the system acquires each parameter from each word in the punch line of each Sukashi candidate; if there are multiple values, the maximum is selected. As Fig. 4 shows, fuzzy rules are then applied to the acquired values. There are two sets of fuzzy rules, Fuzzy Rules A and Fuzzy Rules B. After applying Fuzzy Rules A, the system outputs the sorted Sukashi candidates if at least one Sukashi has a funny level greater than zero. If not, Fuzzy Rules B is applied, and the candidates sorted according to its results are output instead. We employ if-then rules [15] with the direct method for calculation. Fuzzy Rules A has 2 antecedent parts and B has 3. Final output values, which estimate funny levels such as “not so funny” and “funny,” are calculated by centroid computation over the grades.
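A minimal sketch of the inference step. We assume here that the direct method means min-max (Mamdani-style) inference: AND as minimum, each consequent set clipped at its rule’s firing strength, aggregation by maximum, then the centroid. The consequent shapes in `CONSEQUENTS` are hypothetical, not read from Fig. 9, and the centroid is approximated on a sampled grid.

```python
from typing import Callable, Sequence, Tuple

Membership = Callable[[float], float]


def tri(a: float, b: float, c: float) -> Membership:
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical consequent sets on a funny-level axis from 0 to 2.
CONSEQUENTS = {
    "somewhat funny": tri(0.0, 1.0, 2.0),
    "funny": tri(1.0, 2.0, 3.0),
}


def infer(rules: Sequence[Tuple[Sequence[float], str]],
          lo: float = 0.0, hi: float = 2.0, steps: int = 201) -> float:
    """Min-max inference with centroid defuzzification.

    Each rule is (antecedent grades, consequent label). The firing strength
    is the minimum of the grades; each consequent set is clipped at that
    strength, the clipped sets are combined with max, and the centroid of
    the result is returned. If no rule fires, the funny level is 0."""
    xs = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    aggregated = [0.0] * steps
    for grades, label in rules:
        strength = min(grades)
        mu = CONSEQUENTS[label]
        for k, x in enumerate(xs):
            aggregated[k] = max(aggregated[k], min(strength, mu(x)))
    total = sum(aggregated)
    if total == 0.0:
        return 0.0
    return sum(x * m for x, m in zip(xs, aggregated)) / total


print(infer([([0.8, 0.6, 0.9], "funny"), ([0.2, 0.6, 0.9], "somewhat funny")]))
```

Returning 0 when nothing fires gives the “funny level > 0” test in Fig. 4 a natural meaning: a candidate scores above zero only if at least one rule fired.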
Fig. 5 shows the flow of Fuzzy Rules A. The rules are as follows:
– Rule A1: If “Y is long” and “Difference of sounds is small” and “Imageability is high” then “Sukashi is funny”
[Fig. 6: Fuzzy Inferences 4-6 with inputs head character accordance, end character accordance, number of characters’ accordance, imageability, similarity of word, and concrete level; their outputs are combined (+) into the funny level]
Fig. 6. Fuzzy Rule B of the proposed system
– Rule A2: If “Y is long” and “Difference of sounds is small” and “Similarity of words is related term degree” then “Sukashi is funny”
– Rule A3: If “Y is long” and “Difference of sounds is small” and “Concrete level is high” then “Sukashi is funny”
– Rule A4: If “Y is middle long” and “Difference of sounds is small” and “Imageability is high” then “Sukashi is somewhat funny”
– Rule A5: If “Y is middle long” and “Difference of sounds is small” and “Similarity of words is related term degree” then “Sukashi is somewhat funny”
– Rule A6: If “Y is middle long” and “Difference of sounds is small” and “Concrete level is high” then “Sukashi is somewhat funny”
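Rules A1-A6 can be written as data, which makes the shared antecedents explicit. A minimal sketch, assuming membership grades for each linguistic label have already been computed for a candidate; the label strings and the example grades are hypothetical. Each rule’s firing strength is the minimum of its antecedent grades. Per the text, these strengths would then clip the consequent sets, be combined by max, and be defuzzified by centroid.

```python
from typing import Dict, List, Tuple

# Rules A1-A6 as (antecedent labels, consequent label). The label strings
# are hypothetical keys into a table of membership grades.
RULES_A: List[Tuple[Tuple[str, ...], str]] = [
    (("Y long", "sound diff small", "imageability high"), "funny"),                  # A1
    (("Y long", "sound diff small", "similarity related"), "funny"),                 # A2
    (("Y long", "sound diff small", "concrete high"), "funny"),                      # A3
    (("Y middle long", "sound diff small", "imageability high"), "somewhat funny"),   # A4
    (("Y middle long", "sound diff small", "similarity related"), "somewhat funny"),  # A5
    (("Y middle long", "sound diff small", "concrete high"), "somewhat funny"),       # A6
]


def firing_strengths(rules: List[Tuple[Tuple[str, ...], str]],
                     grades: Dict[str, float]) -> List[Tuple[float, str]]:
    """Firing strength of each rule: the minimum of its antecedent grades."""
    return [(min(grades[a] for a in antecedents), consequent)
            for antecedents, consequent in rules]


# Invented membership grades for one candidate.
grades = {"Y long": 0.8, "Y middle long": 0.2, "sound diff small": 0.6,
          "imageability high": 0.9, "similarity related": 0.1, "concrete high": 0.5}
for strength, consequent in firing_strengths(RULES_A, grades):
    print(strength, consequent)
```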
Fig. 6 shows the flow of Fuzzy Rules B. Here, the rules are given priority in the order B1-B3, B4-B6, B7-B9.
– Rule B1: If “Head of Y and that of Y′ are the same” and “Imageability is high” then “Sukashi is funny”
– Rule B2: If “Head of Y and that of Y′ are the same” and “Similarity of words is related term degree” then “Sukashi is funny”
– Rule B3: If “Head of Y and that of Y′ are the same” and “Concrete level is high” then “Sukashi is funny”
– Rule B4: If “End of Y and that of Y′ are the same” and “Imageability is high” then “Sukashi is funny”
– Rule B5: If “End of Y and that of Y′ are the same” and “Similarity of words is related term degree” then “Sukashi is funny”
[Fig. 7: membership grade (0 to 1) versus the number of characters of Y′ (0 to 9; Short, Middle Long, Long) and versus the difference of sound (0 to 25; Small, Middle, Large)]
Fig. 7. Membership functions of Y’s length and difference of sounds
[Fig. 8: membership grade (0 to 1) versus imageability (1 to 7; Low, Average, High) and versus similarity (0 to 1.0; Related term degree, Somewhat high, High)]
Fig. 8. Membership functions of imageability and similarity
– Rule B6: If “End of Y and that of Y′ are the same” and “Concrete level is high” then “Sukashi is funny”
– Rule B7: If “Number of characters in Y and that of Y′ are in the range of 1” and “Imageability is high” then “Sukashi is funny”
– Rule B8: If “Number of characters in Y and that of Y′ are in the range of 1” and “Similarity of words is related term degree” then “Sukashi is funny”
– Rule B9: If “Number of characters in Y and that of Y′ are in the range of 1” and “Concrete level is high” then “Sukashi is funny”
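A minimal sketch of the selection flow in Fig. 4 together with one possible reading of the B priority order. The paper does not spell out how priority is enforced. Here we assume the first group (B1-B3, then B4-B6, then B7-B9) in which any rule fires is used and later groups are ignored. `prioritized`, `select`, and the candidate dictionaries with precomputed `"a"`/`"b"` scores are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

Candidate = Dict[str, Any]


def prioritized(groups: Sequence[Sequence[float]]) -> List[float]:
    """Given firing strengths for the rule groups B1-B3, B4-B6 and B7-B9 in
    priority order, keep only the first group in which any rule fires
    (assumed interpretation); zeros if no rule fires at all."""
    for group in groups:
        if any(s > 0 for s in group):
            return list(group)
    return [0.0] * (len(groups[0]) if groups else 0)


def select(candidates: Sequence[Candidate],
           score_a: Callable[[Candidate], float],
           score_b: Callable[[Candidate], float]) -> List[Candidate]:
    """Sort candidates by the funny level from Fuzzy Rules A; if no
    candidate scores above zero, sort by Fuzzy Rules B instead."""
    scored = [(score_a(c), c) for c in candidates]
    if not any(s > 0 for s, _ in scored):
        scored = [(score_b(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


cands = [{"text": "candidate 1", "a": 0.0, "b": 0.4},
         {"text": "candidate 2", "a": 0.0, "b": 0.7}]
print([c["text"] for c in select(cands, lambda c: c["a"], lambda c: c["b"])])
```

Sorting on the score alone (not on the whole tuple) avoids comparing dictionaries when two candidates tie.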
Fig. 7 shows the membership functions for the length of Y and the difference of sounds, Fig. 8 those for imageability and similarity, and Fig. 9 those for the concrete level and the consequent part.
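A minimal sketch of how the input variables could be fuzzified with trapezoidal membership functions. Only the label names and axis ranges come from Fig. 7. All breakpoints below are hypothetical, since the exact shapes cannot be read from the text. The sets of Figs. 8 and 9 would be defined the same way.

```python
import math
from typing import Callable, Dict

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between.
    Use a == b for a left shoulder and c == d == inf for a right shoulder."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


# Hypothetical breakpoints; only the labels and axis ranges follow Fig. 7.
Y_LENGTH = {
    "short": trapezoid(0, 0, 2, 4),
    "middle long": trapezoid(2, 4, 5, 7),
    "long": trapezoid(5, 7, math.inf, math.inf),
}
SOUND_DIFFERENCE = {
    "small": trapezoid(0, 0, 3, 8),
    "middle": trapezoid(3, 8, 12, 17),
    "large": trapezoid(12, 17, math.inf, math.inf),
}


def fuzzify(x: float, sets: Dict[str, Membership]) -> Dict[str, float]:
    """Membership grade of x in every fuzzy set of one variable."""
    return {label: mu(x) for label, mu in sets.items()}


print(fuzzify(6, Y_LENGTH))
print(fuzzify(3, SOUND_DIFFERENCE))
```

The resulting grades (e.g. “long” and “small”) are exactly what the antecedents of Rules A1-A6 consume.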
[Fig. 9: membership grade (0 to 1) versus concrete level (0 to 11; Low, Average, High) and versus the funny level of the consequent part (Somewhat high, High)]
Fig. 9. Membership functions of concrete level and consequent part
3 Experiments
We conducted verification experiments to examine the funny level of each Sukashi generated by our system. The Sukashis were evaluated in terms of funny level and unpredictable quality on the following scales.
Funny level
5 points: Sukashi is funny
4 points: Sukashi is somewhat funny
3 points: Sukashi is average
2 points: Sukashi is somewhat boring
1 point: Sukashi is boring or not understandable
Unpredictable quality
3 points: Punch line of Sukashi is beyond expectation
2 points: Punch line of Sukashi is predictable if you assume it is Sukashi
1 point: Punch line of Sukashi is predictable because it is a normal sentence
3.1 Experimental Conditions
We used 50 proverbs [16] as input to the system. To evaluate the quality of the generated Sukashis, we carried out Turing-test-like experiments. The system generated 291 Sukashis. We shuffled the system-generated and human-made Sukashis randomly, on the condition that the number of Sukashis for each proverb was the same. Eleven subjects evaluated 240 Sukashis in terms of funny level and unpredictable quality without knowing which ones were system-generated. Table 1 shows some examples of Sukashis produced by the proposed system.
3.2 Experimental Result
Fig. 10 compares the funny levels of system-generated and human-made Sukashis. As the figure demonstrates, the system-generated Sukashis are comparable to the human-made ones. Table 2 shows the funny-level features of both.
Table 1. Examples of system-generated Sukashis (proverb → Sukashi)
鬼に金棒 (Oni ni Kanabou): An ogre with an iron club (unbeatable advantage) → 鬼にカネボウ (Oni ni Kanebou): An ogre with “Kanebo” (“Kanebo” is a famous cosmetics company)
漁夫の利 (Gyofu no Ri): The fisherman’s profit (a third party reaping the profit) → 漁夫のリハビリ (Gyofu no Rihabiri): The fisherman’s rehabilitation
縁の下の力持ち (Ennoshita no Chikaramochi): The strong person under the floor (the strong person working in the background) → 縁の下の父 (Ennoshita no Chichi): The father working in the background
年寄りの冷や水 (Toshiyori no Hiyamizu): The cold water of an old person (an old man overexerting himself to keep up with the young) → 年寄りの鼻水 (Toshiyori no Hanamizu): The snot of an old man
焼け石に水 (Yakeishi ni Mizu): Sprinkling water on a burnt stone (only a drop in the ocean) → 焼け石に傷 (Yakeishi ni Kizu): Wounding a burnt stone
身から出た錆 (Mi kara Deta Sabi): The rust which comes out of a body (an ill life, an ill end) → 身から出たわさび (Mi kara Deta Wasabi): Wasabi which comes out of a body (“Wasabi” is a Japanese spice)
塞翁が馬 (Saiou ga Uma): The horse of an old man living in the fort (good can come out of misfortune) → 塞翁がネタ (Saiou ga Neta): An old man living in the fort is newsworthy
知らぬが仏 (Shiranu ga Hotoke): The Buddha who doesn’t know the truth (ignorance is bliss) → 知らぬがほっとけよう (Shiranu ga Hottokeyou): I don’t know, but leave me alone
Table 2. Difference of funny-level features
                          Human-made Sukashi   System-generated Sukashi
Funny level on average    2.93                 2.58
Variance of funny level   1.19                 1.08
Fig. 11 compares the unpredictable quality of system-generated and human-made Sukashis, and Table 3 shows the unpredictable-quality features of both.
Table 3. Difference of unpredictable quality features
                                    Human-made Sukashi   System-generated Sukashi
Unpredictable quality on average    2.14                 2.01
Variance of unpredictable quality   0.563                0.613
[Fig. 10: total number of evaluations (0 to 600) at each funny-level rating (1 to 5), human-made vs. system-generated Sukashis]
Fig. 10. Comparison of funny level
[Fig. 11: total number of evaluations (0 to 600) at each unpredictable-quality rating (1 to 3), human-made vs. system-generated Sukashis]
Fig. 11. Comparison of unpredictable quality
4 Conclusion
In this paper, we proposed a new system that produces funny proverbs. The system uses a punch line framework named Sukashi: by changing the end of a proverb, it produces a funny sentence. It employs Japanese Google N-grams to generate many Sukashi candidates, extracts parameters from each word in each candidate (the sound, length, imageability, similarity, and concrete level of words), and then selects candidates using fuzzy rules with centroid computation.
The performance was evaluated by subjective experiments, which showed that the system-generated Sukashis are comparable to human-made ones.
Finding a framework for laughter itself, Sukashi, is the first step. We believe this direction of research will contribute to building more human-friendly interfaces, such as communication and entertainment robots.
Acknowledgments. We express our deepest appreciation to Prof. Tsutomu Ishikawa for providing the Computational System for the Similarity between Concepts (Words). We are also grateful to Google and Gengo-Shigen-Kyokai (GSK) for the Japanese Google N-grams.
References
1. M. De Boni, A. Richardson and R. Hurling, “Humour, Relationship Maintenance and Personality Matching in Automated Dialogue: A Controlled Study,” Interact. Comput., Vol. 20, No. 2, pp. 342-353, Nov. 2007
2. K. Binsted and O. Takizawa, “‘BOKE’ – A Japanese Punning Riddle Generator,” Journal of the Japanese Society for Artificial Intelligence, Vol. 13, No. 6, pp. 920-927, Nov. 1997
3. A. Waller, R. Black, D. A. O’Mara, H. Pain, G. Ritchie and R. Manurung, “Evaluating the STANDUP Pun Generating Software with Children with Cerebral Palsy,” ACM Trans. Access. Comput., Vol. 1, No. 3, pp. 1-27, 2009
4. N. Fukui, “The Techniques for Making Laughter – Laugh Makes Us Discover the World,” Sekai Shisousya, 2002
5. “Yet Another Part-of-Speech and Morphological Analyzer,” http://mecab.sourceforge.net/
6. T. Kudo and H. Kazawa, “Japanese Google N-grams Vol. 1,” Gengo-Shigen-Kyokai (GSK)
7. S. Oda, “Laughing and Humor,” Chikuma Shobo, 1986
8. S. B. Needleman and C. D. Wunsch, “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins,” Journal of Molecular Biology, Vol. 48, No. 3, pp. 443-453, 1970
9. D. Sabsevitz, D. Medler, M. Seidenberg and J. Binder, “Modulation of the Semantic System by Word Imageability,” NeuroImage, Vol. 27, No. 1, pp. 188-200, 2005
10. N. Sakuma, M. Ijuin, T. Fushimi, I. Tatsumi, M. Tanaka, S. Amano and K. Kondo, “NTT Database Series ‘Japanese Vocabulary Attribution’ Vol. 8 Word Imageability (1),” Sanseido, 2005
11. Ishikawa Laboratory, Takushoku University, “Computational System for the Similarity between Concepts (Words),” http://www.cs.takushoku-u.ac.jp/ai/ruiji/Similarity System.cgi
12. Y. Noguchi, R. Shimizu, K. Sugimoto and T. Ishikawa, “An Improved Tool for Measuring Semantic Similarity between Words,” The 69th National Convention of Information Processing Society of Japan, Vol. 2, pp. 2545-2546, Mar. 2007
13. T. Kawashima and T. Ishikawa, “An Evaluation of Knowledge Base of Words and Thesaurus on Measuring the Semantic Similarity between Words,” The 18th Annual Conference of the Japanese Society for Artificial Intelligence, Vol. 18, pp. 2D2-10, 2004
14. Iwanami Shoten, “Nihongo Goi-Taikei CD-ROM,” http://www.kecl.ntt.co.jp/mtg/resources/GoiTaikei/
15. M. Sugeno, “Fuzzy Control,” Nikkan Kogyo, 1988
16. T. Kurogo, “Kurogo’s Proverb Dictionary,” http://www.geocities.jp/tomomi965/