FilCon: Filipino Sentiment Lexicon Generation Using Word Level-Annotated Dictionary-Based and Corpus-Based Cross Lingual Approach
A Thesis
Presented to the
Department of Computer Science
Institute of Information and Computing Sciences
University of Santo Tomas
In Partial Fulfillment
of the Requirement for the Degree
Bachelor of Science in Computer Science
By
Domingo, Darlan Keen S.
King, Nyker Matthew C.
Lopez, Jerome Lorenzo L.
Mondares, Alexandra C.
Adviser:
Ponay, Charmaine S.
November 2014
Acknowledgements
First of all, we would like to thank the Almighty God for giving us the strength,
wisdom, knowledge and perseverance to make this great endeavor possible. We lift up all
praises and glory to Him.
We would like to express our deepest gratitude to our dear adviser, Ms. Charmaine
Ponay, for guiding us throughout this endeavor. We also thank our coordinator, Assoc. Prof.
Perla Cosme, for contributing her knowledge and ideas for the improvement of this study.
Lastly, we would like to share the success and thank our families and friends for all
the support and understanding they have extended to us all.
Abstract
FilCon is a generated subjective lexicon for the Filipino language. It contains
22,380 words drawn primarily from a Filipino-English bilingual dictionary and aligned with
SentiWordNet 3.0’s polarity values. Identical entries from the first lexicon
generation iteration are filtered according to their translation accuracy, which is produced
using Moses training sets. The final iteration sorts the words and generates the final
output, a .txt file that follows SentiWordNet’s format with the fields POS, ID, POS_SCORE,
NEG_SCORE, ENG_WORD and FIL_WORD. The results were analyzed and
interpreted by applying FilCon to the subjectivity classification of opinions across various
test cases and calculating the lexicon’s overall accuracy. This accuracy was measured
through sentence-level sentiment analysis; based on the results, the overall accuracy was
38.18%.
Table of Contents
Approval Sheet
Certificate of Authenticity and Originality
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter I: The Problem and Its Background
A. Introduction
B. Background of the Study
C. Theoretical Framework
D. Conceptual Framework
E. Statement of the Problem
F. Objectives of the Study
G. Scope and Limitations
H. Significance of the Study
I. Definition of Terms
Chapter II: Review of Related Literature and Studies
A. Related Literature
1. Natural Language Processing Overview
2. Subjectivity
3. Opinion Lexicon Generator
B. Related Studies
1. A Bengali (Bangla) SentiWordNet
2. A Hindi SentiWordNet
3. An Urdu Text Sentiment Analyzer
C. Generalization
Chapter III: Research Design and Methodology
A. Hypothesis
1. Assumptions
B. Research Methods
C. Research Design
1. Data Mining
2. Words Translation
3. Cross-lingual Projection
4. Corpus Training
5. Word Filtering
6. FilCon Generation
Chapter IV: Analysis and Interpretation
A. Presentation of the Solution
1. System Architecture
2. Description of Modules and Interfaces
B. Results of Conducted Research
C. Analysis and Interpretation of Results
D. Summary of Findings
Chapter V: Summary, Conclusions, and Recommendations
A. Summary
B. Conclusions
C. Recommendations
Bibliography
Appendix A: Source Code
Appendix B: FilCon Entries
Appendix C: Test Cases
Appendix D: Summary Tabulation of Test Cases
Appendix E: Questionnaire
Curriculum Vitae
List of Figures
Figure I-1: Theoretical Framework of the Study
Figure I-2: Hamouda and Rohaim’s (2011) Review
Figure I-3: Conceptual Framework of the Study
Figure II-1: The Natural Language Processing System Flow
Figure II-2: The Four Distinct Stages of NLP
Figure II-3: Classification Process of Urdu Lexical Construction
Figure IV-1: System Architecture of the Study
Figure IV-2: Translation Accuracy of Filipino Words
Figure IV-3: Formula for calculating FilCon’s accuracy
Figure IV-4: T-test Formula for Paired Two Sample for Means
List of Tables
Table II-1: Polarity-wise Performance using Bengali SentiWordNet
Table II-2: Results of experimentation on both corpora (MR and PR)
Table IV-1: Sentiments Confusion Matrix
Table IV-2: Summary of the T-Test Computation
Table D-1: Test Cases Evaluated (Sentiment Value) by Expert and FilCon
Table D-2: Test Cases Evaluation (Positive, Negative, Neutral) by Expert and FilCon
Chapter I
The Problem and Its Background
A. Introduction
Researchers have developed many approaches and algorithms to give computers
human-like conversational capabilities. These approaches have led to the science of
enabling such capabilities, known as Natural Language
Processing (NLP). NLP, according to Liddy (2001), “is a range of computational
techniques for analyzing and representing naturally occurring texts for the purpose of
achieving human-like language processing for a range of tasks or applications”.
A language conveys messages through verbal or non-verbal
communication. Non-verbal communication includes textual information, which can be
categorized into two types: facts, or objective information, and opinions, or subjective
information (Regaldo & Cheng, 2012). The subjectivity of a word can be determined by
looking up its subjectivity scores in an existing lexicon. Subjectivity scores consist of
positivity, negativity, and neutrality values expressed in decimal form.
An example of an English lexicon is WordNet, a lexical-conceptual database developed
at Princeton University (version 3.0 was released in 2006). This database contains
lexical units and their interrelationships (Bondoc, Garcia, Lacaden, Ping, & Borra, 2010).
One resource built on WordNet is SentiWordNet, an enhanced lexical
resource for sentiment analysis and opinion mining. This lexicon contains subjectivity
scores: positivity, negativity, and neutrality polarity values.
This study was conducted to aid research in sentiment analysis. Several
subjectivity lexicons already exist; however, a polarity lexicon for the Filipino language
did not yet exist. FilCon is a tool similar to SentiWordNet, except that FilCon is in
Filipino whereas SentiWordNet is in English. FilCon was constructed using a bilingual
dictionary aligned with the polarity scores provided by SentiWordNet. Since neither the
lexicon nor the bilingual dictionary provides information about the semantic meaning
of identical entries, FilCon relies on Moses to select the most probable sense in the
target language.
B. Background of the Study
Finding out people’s thoughts is an essential part of information gathering (Pang &
Lee, 2008). Opinion mining and sentiment analysis are tools used to find out what people
think. Sentiment analysis has been used by diverse disciplines and industries.
The commercial sector, the academe, and the information technology industry use
sentiment analysis to analyze customer and survey feedback.
Part of sentiment analysis is subjectivity classification, which determines whether a
sentence is objective or subjective. One way to perform it is a bootstrapping approach
that uses a high-precision classifier to automatically identify subjective and objective
sentences. “It is based on manually collected lexical items and single words which are good
subjective clues.” (Moghaddam & Ester, 2012)
With the current advancements in natural language processing, most sentiment
analysis applications utilize a lexical resource that provides the scores needed to determine
the positivity or negativity of a sentiment.
Since most sentiment analysis applications did not have a Filipino lexicon as a basis,
this study focused on generating a lexical resource in the Filipino language.
C. Theoretical Framework
Opinion lexicons, also called subjective lexicons, are resources consisting of
words annotated with sentiment polarity scores or subjectivity.
Two of the most frequently used opinion lexicons are OpinionFinder and
SentiWordNet. OpinionFinder is a subjectivity lexicon, whereas SentiWordNet is
a polarity lexicon. OpinionFinder was compiled from manually developed resources
augmented with entries learned from corpora. Its entries are labeled with part of
speech and with subjectivity strength: words that are subjective in most contexts
are labeled strong, while those that are subjective only in some contexts are labeled weak.
The other lexicon, SentiWordNet, was created by Esuli and Sebastiani (2006) on top
of WordNet. SentiWordNet assigns to each WordNet synset three sentiment
scores: positive, negative, and objective.
Opinion lexicons have been built using several approaches. The simplest approach
attempted for building an opinion lexicon in a new language is the dictionary-based
approach, in which an existing lexical resource is translated using a bilingual
dictionary.
In a study conducted by Mihalcea, Banea and Wiebe (2008), a dictionary-based
approach was used to generate a subjectivity lexicon for the Romanian language. Starting with
the English opinion lexicon from OpinionFinder, words were lemmatized for
translation using an English-Romanian dictionary. However, in some cases, translations
lose their subjectivity because of word ambiguity and lemmatization. To address these
problems, the researchers sought the most probable translation for each English
entry with direct translation.
Figure I-1. Theoretical Framework of the Study
To evaluate the lexicon, two native speakers of Romanian annotated the subjectivity
of 150 randomly selected entries. The subjectivity of every entry was judged in the context
where it frequently appears on websites, accounting for its common meanings. The results
show that 123 of the 150 entries were correctly translated.
(Figure I-1 block labels: English Opinion Lexicon from OpinionFinder; Romanian-English Bilingual Dictionary; Lemmatization; Translation of Entries; Romanian Subjectivity Lexicon; News Sources from Web; Annotation of Subjectivity; Frequency comparison and evaluation of entries)
Figure I-1 is a block diagram of Mihalcea, Banea and Wiebe’s (2011) study in
building a Romanian subjectivity lexicon.
Furthermore, a study by Hamouda and Rohaim (2011) states that
SentiWordNet can also be used as an important resource for sentiment classification tasks.
These researchers conducted a study to classify sentiments and determine the subjectivity
of English reviews. In preparation for sentiment classification, the following
linguistic analysis processes were applied to pre-defined input reviews:
Tokenization process: splits the text into very simple tokens such as numbers,
punctuation and words of different types.
Sentence Splitting process: segments the text into sentences. This module is
required for the Part-of-Speech Tagging process. The splitter uses a list of abbreviations to
help distinguish sentence-marking full stops from other kinds.
Part-of-Speech Tagging process: produces a part-of-speech tag as an annotation on each
word or symbol.
SentiWordNet Interpretation: matches the words with the SentiWordNet semantic
entries.
SentiWordNet Orientation: estimates the polarity score of a word with regard to a
sentiment class (negativity, positivity and neutrality) and its relative frequency of
appearance in that class.
Figure I-2. Hamouda and Rohaim’s (2011) Review Sentiment Classification using
SentiWordNet
After the linguistic processes, two techniques for calculating the positive and negative
scores of each review were proposed: ‘Sum on Review’ and ‘Average on
Sentence and Average on Review’. The results obtained with these techniques
show a significant improvement in overall accuracy, reaching 67% and 68.63%.
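The two aggregation techniques can be illustrated with a minimal sketch. It assumes one plausible reading of their names, since the exact formulas of Hamouda and Rohaim are not reproduced in this study: ‘Sum on Review’ adds the word scores over the whole review, while ‘Average on Sentence and Average on Review’ averages the scores within each sentence and then averages the sentence results. The toy lexicon `TOY_SCORES`, its values, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> (positive score, negative score).
# Real scores would come from SentiWordNet; these values are invented.
TOY_SCORES = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "slow": (0.0, 0.25),
}


def word_scores(word):
    """Return (pos, neg) for a word, or (0.0, 0.0) if it is not in the lexicon."""
    return TOY_SCORES.get(word.lower(), (0.0, 0.0))


def sum_on_review(sentences):
    """Add up positive and negative scores over every word in the review."""
    pos = sum(word_scores(w)[0] for s in sentences for w in s)
    neg = sum(word_scores(w)[1] for s in sentences for w in s)
    return pos, neg


def average_on_sentence_and_review(sentences):
    """Average scores within each sentence, then average the sentence means."""
    pos_means = [mean([word_scores(w)[0] for w in s]) for s in sentences]
    neg_means = [mean([word_scores(w)[1] for w in s]) for s in sentences]
    return mean(pos_means), mean(neg_means)


def label(pos, neg):
    """Pick the polarity with the larger score; ties are treated as neutral."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# A review that has already been sentence-split and tokenized.
review = [["good", "screen"], ["slow", "and", "bad", "battery"]]
print(label(*sum_on_review(review)))
print(label(*average_on_sentence_and_review(review)))
```

Under this reading, the two techniques can disagree on the same review, because averaging per sentence reduces the weight of long sentences with many unscored words.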
D. Conceptual Framework
Data mining is defined as the computer-assisted process of searching and
modifying large sets of data followed by the extraction of its meaning (Alexander, n.d.).
Data mining in FilCon aims to retrieve Filipino words, which serve as FilCon's primary
input for further processing.
Figure I – 3. Conceptual Framework of the Study
(Figure I-3 block labels: Retrieval of Filipino Words; Filipino Words; Filipino-English Bilingual Dictionary; Translation of Words; SentiWordNet; Annotation of Polarity Scores; Retrieval of Filipino Corpus; Filipino Sentences; Manual Translation of Corpus; Train corpus in Moses; Trained Sets; Removal of Identical Entries using Moses training; Evaluation of the FilCon using native speakers)
In the first text file generation containing the retrieved words, each Filipino word
was translated to English using a Filipino-English bilingual dictionary. The translated
words were cross-referenced with the matching English entries in SentiWordNet 3.0,
which carry annotated polarity scores. These polarity scores were retrieved and aligned
with the corresponding Filipino entries.
SentiWordNet 3.0, the basis of the polarity scores describing the positivity and/or
negativity of a word, also provided the basis for the fields of FilCon, namely POS,
ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD.
These processes can be compared with Figure I-1, where the dictionary-based
approach was used. However, in the Romanian lexicon, the subjectivity of the words was
annotated according to the English entries retrieved from OpinionFinder, while this study
used Filipino words annotated with polarity scores according to the English
entries from SentiWordNet.
Furthermore, the retrieved Filipino sentences served as the training corpora
for the language tool Moses, which was used to find the most probable translation for each
entry, thus eliminating identical entries. This task was the counterpart of the direct
translation step that addressed the accuracy of the Romanian lexicon.
FilCon generation applied text analysis processes similar to the
approaches used in Figure I-2. For instance, tokenization, SentiWordNet interpretation, and
orientation were also adopted by FilCon. The translated Filipino words were tokenized
and cross-referenced with the SentiWordNet English entries, aligning the proper polarity
scores from SentiWordNet entries to FilCon entries. The reconstructed lexicon was then
the output of the system, which can be used in opinion classification.
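The cross-referencing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's source code: the dictionaries `BILINGUAL` and `SENTIWORDNET`, the synset IDs, and the scores are invented toy data, and the function name `project` is hypothetical.

```python
# Hypothetical bilingual dictionary: Filipino word -> list of (POS, English word).
BILINGUAL = {
    "maganda": [("a", "beautiful")],
    "galit": [("n", "anger"), ("a", "angry")],
}

# Hypothetical SentiWordNet-style rows keyed by (POS, English word):
# (synset ID, positive score, negative score). All values are invented.
SENTIWORDNET = {
    ("a", "beautiful"): ("00000001", 0.75, 0.0),
    ("n", "anger"): ("00000002", 0.0, 0.5),
    ("a", "angry"): ("00000003", 0.0, 0.625),
}


def project(filipino_word):
    """Carry English polarity scores over to a Filipino word via its translations."""
    entries = []
    for pos, english in BILINGUAL.get(filipino_word, []):
        row = SENTIWORDNET.get((pos, english))
        if row is None:
            continue  # the translation has no scored English entry; skip it
        synset_id, pos_score, neg_score = row
        # Field order follows FilCon: POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD, FIL_WORD.
        entries.append((pos, synset_id, pos_score, neg_score, english, filipino_word))
    return entries


print(project("galit"))
```

A word such as "galit" yields more than one entry here, which is the situation that the later filtering with Moses is meant to resolve.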
E. Statement of the Problem
In the field of Sentiment Analysis, the need for a lexical resource in different
languages as a basis for determining subjectivity remains an open problem. In this regard,
this study aimed to generate a Filipino Subjective Lexicon by applying the word
level-annotated dictionary-based and corpus-based cross lingual approach with the use of
a Filipino-English bilingual dictionary, Moses, and SentiWordNet. Specifically, it
intended to answer the following questions:
1. Were the Filipino-English bilingual dictionary, Moses and SentiWordNet a
possible combination in generating a Filipino Subjective Lexicon?
2. What was the accuracy level of the translated words against the entries of
SentiWordNet?
3. What was the accuracy level of the generated Filipino Subjective Lexicon using
the Filipino-English bilingual dictionary, Moses and SentiWordNet?
4. Was there a significant difference between the sentiment scores of the generated
Filipino Subjective Lexicon through the use of the Filipino-English bilingual
dictionary, Moses and SentiWordNet and expert’s sentiment scores?
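The fourth question calls for a paired comparison of two sets of scores on the same test cases. A minimal sketch of the paired two-sample t statistic, t = mean(d) / (s_d / sqrt(n)), where d is the list of per-case differences, is given below. The function name and the sample values in the usage line are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(lexicon_scores, expert_scores):
    """Return the paired-sample t statistic and its degrees of freedom."""
    if len(lexicon_scores) != len(expert_scores):
        raise ValueError("both samples must have the same length")
    # Differences between the two scores given to the same test case.
    diffs = [a - b for a, b in zip(lexicon_scores, expert_scores)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Invented scores for three test cases, for illustration only.
t_value, degrees_of_freedom = paired_t_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(t_value, degrees_of_freedom)
```

The resulting statistic would then be compared against the critical value of the t distribution for the chosen significance level and n - 1 degrees of freedom.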
F. Objectives of the Study
The main objective of this study was to develop a Filipino subjective lexical
resource with acceptable polarity scores by applying the word level-annotated
dictionary-based and corpus-based cross lingual approach through the use of a
Filipino-English bilingual dictionary, Moses and SentiWordNet. Such a resource could
contribute to the development of Natural Language Processing, specifically Sentiment
Analysis, which is an area of NLP.
FilCon generation went through the following phases and modules:
a. Data Mining. In this phase, the researchers sought to retrieve Filipino words from a
Filipino-English bilingual dictionary to use as FilCon's primary input, and to
retrieve Filipino-English sentences to serve as the training corpora for further
processing of lexicon entries.
b. Word Translation. The second phase translated the Filipino words
to English so that the entries could be cross-referenced with SentiWordNet 3.0.
c. Cross-Lingual Projection. The translations of the Filipino words were matched
with the same English words found in the SentiWordNet entries. The
polarity scores of each matching SentiWordNet entry were retrieved and aligned with
the Filipino word having the same semantic meaning.
d. Corpus Training. The retrieved Filipino-English sentences were used to train Moses,
which provided each Filipino entry with its corresponding translation accuracy. The
translation accuracy indicated the most probable translation for each word.
e. Word Filtering. This phase determined the acceptable polarity scores aligned with
the retrieved Filipino words when identical entries occurred. It also involved the
application of the translation accuracy to filter multiple entries with different
translations and parts of speech.
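The word filtering phase can be illustrated with a minimal sketch, assuming that each candidate entry already carries a translation accuracy value obtained from Moses. The function name, the dictionary keys, and the accuracy values below are hypothetical; how Moses derives the value is not shown.

```python
def filter_by_translation_accuracy(candidates):
    """Keep, for each Filipino word, the candidate with the highest accuracy.

    Each candidate is a dict with at least 'fil_word' and 'trans_acc' keys.
    """
    best = {}
    for cand in candidates:
        key = cand["fil_word"]
        # Replace the stored candidate only when the new one is strictly better.
        if key not in best or cand["trans_acc"] > best[key]["trans_acc"]:
            best[key] = cand
    return list(best.values())


# Invented candidates: "galit" has two competing translations.
candidates = [
    {"fil_word": "galit", "eng_word": "anger", "trans_acc": 0.62},
    {"fil_word": "galit", "eng_word": "angry", "trans_acc": 0.31},
    {"fil_word": "maganda", "eng_word": "beautiful", "trans_acc": 0.88},
]
filtered = filter_by_translation_accuracy(candidates)
print([c["eng_word"] for c in filtered])
```

Ties are resolved in favor of the candidate seen first; a different tie-breaking rule could be substituted if the study's procedure requires one.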
G. Scope and Limitations
The study aimed to provide a lexical resource annotated with polarity scores of the
Filipino language using the word level-annotated dictionary-based and corpus-based cross
lingual approach through the use of a Filipino-English bilingual dictionary, Moses, and
SentiWordNet 3.0.
The study focused on the retrieved Filipino words as the primary input of FilCon.
The output of this study, FilCon, would only provide the following fields: (a) Part of Speech
(POS), (b) ID, (c) Positivity Score (POS_SCORE), (d) Negativity Score (NEG_SCORE),
(e) the corresponding English translation of the Filipino word (ENG_WORD), and (f) the
Filipino word (FIL_WORD).
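A minimal sketch of how one FilCon record could be read is given below. It assumes that the .txt file is tab-separated, following SentiWordNet's layout; the delimiter, the class name `FilConEntry`, and the sample line are assumptions made for illustration. The objectivity value follows SentiWordNet's convention that the positive, negative, and objective scores of a synset sum to 1.

```python
from dataclasses import dataclass


@dataclass
class FilConEntry:
    pos: str          # POS: part of speech
    synset_id: str    # ID: synset identifier taken from SentiWordNet
    pos_score: float  # POS_SCORE: positivity score
    neg_score: float  # NEG_SCORE: negativity score
    eng_word: str     # ENG_WORD: English translation
    fil_word: str     # FIL_WORD: Filipino word

    @property
    def obj_score(self):
        """Objectivity as in SentiWordNet: 1 - (positive + negative)."""
        return 1.0 - (self.pos_score + self.neg_score)


def parse_line(line):
    """Parse one tab-separated FilCon line (the delimiter is an assumption)."""
    pos, synset_id, pos_score, neg_score, eng_word, fil_word = (
        line.rstrip("\n").split("\t")
    )
    return FilConEntry(
        pos, synset_id, float(pos_score), float(neg_score), eng_word, fil_word
    )


# Invented sample line, for illustration only.
entry = parse_line("a\t00000001\t0.75\t0.0\tbeautiful\tmaganda")
print(entry.fil_word, entry.pos_score, entry.obj_score)
```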
This study was limited to providing its users with the lexical resource and the
polarity scores, which were determined by pairing each SentiWordNet English word’s
polarity score with its corresponding translated Filipino word. Sentiment analysis of
scenarios and instances was not covered by the study; however, it may be used for testing
accuracy.
The researchers emphasize that this study is intended to help writers and
linguists construct their literary texts with good diction by
referring to the sentiment values. The researchers also conducted the study to help
programmers develop systems with an acceptable basis for Filipino sentiment analysis.
The sentiment values of the Filipino words generated by this study were based
on SentiWordNet 3.0 and, hence, on its criteria.
H. Significance of the Study
According to a study conducted by Martha Perry, couples prefer Face to Face (FtF)
communication over Computer Mediated Communication (CMC) because CMC was
said to contribute to misunderstandings and
frustration, which can lead to escalated conflict (Perry, 2010). If
technology brings people together (Chiala, 2013), then this study could help clear up the
ambiguities in CMC and help readers better understand a person’s message.
The study aimed to provide a Filipino Subjective Lexicon (FilCon) to be used in
future Natural Language Processing applications. This study is of significant
benefit especially to:
1. Sentiment Analysis Systems Programmers
FilCon can provide programmers with an automatically generated Filipino
lexicon so that they can efficiently integrate subjectivity analysis of Filipino
text into their sentiment analysis systems.
2. Artificial Intelligence Systems Programmers
Artificial Intelligence (AI) systems mostly rely on subjectivity lexicons (pre-learned)
and machine-learned experience. With subjectivity lexicons mostly
built for the English language, it would be a great milestone to be able to
develop an AI system in the Filipino language.
I. Definition of Terms
ENG_WORD. The translated word of a Filipino word using a Filipino-English bilingual
dictionary.
FIL_WORD. The corresponding Filipino word retrieved.
ID. The corresponding word identification number assigned by SentiWordNet to the synset.
Natural Language Processing. “Range of computational techniques for analyzing and
representing naturally occurring texts for the purpose of achieving human-like
language processing for a range of tasks or applications.” (Liddy, 2001)
NEG_SCORE. It is the negativity score assigned by SentiWordNet to the synset.
POS. Also known as Part of Speech. Together with ID, they uniquely determine a WordNet
synset as a composite key.
POS_SCORE. It is the positivity score assigned by SentiWordNet to the synset.
Sentiment analysis. The analysis of sentiments expressed in various
subjective materials.
SentiWordNet. “It is a lexical resource explicitly devised for supporting sentiment
classification and opinion mining applications.” (Baccianella, Esuli, & Sebastiani, 2010)
Synset. A set of terms grouped by semantic and lexical relations. In SentiWordNet, it is
also the field that lists the terms, with their sense numbers, belonging to
the synset.
TRANS_ACC. It is the produced translation accuracy for each Filipino word obtained
from the Moses corpora training of the Filipino sentences.
WordNet. It is a lexical database of English words. Words are grouped into sets of
cognitive synonyms (synsets), each expressing a distinct concept.
Chapter II
Review of Related Literature and Studies
A. Related Literature
1. Natural Language Processing Overview
Computational linguistics, often referred to as Natural Language Processing (NLP), is
a subfield of both linguistics and computer science which enables language processing by a
computer.
NLP provides a range of computational techniques for automated text analysis and
representation of human language. It has two focuses, namely language understanding
(NLU) and language generation (NLG). Language understanding takes language as
input and analyzes it into structured representations that capture its meaning, while
language generation produces language from such representations.
Figure II – 1. The Natural Language Processing System Flow
According to Indurkhya and Damerau (2010), NLP is concerned with “the
design and implementation of effective natural language input and output components for
computational systems”. This shows that the important problems lie in natural
language input and output.
One example is the first attempt at NLP in the 1950s to automate translation between
Russian and English (Locke & Booth). These systems were unsuccessful since human
translators were needed to pre-edit the Russian and post-edit the English.
In the early and middle 1970s, serious developments in NLP took place as systems
started to use more general approaches and attempted to formally describe the rules of
the languages they worked with. Effective mechanisms for parsing languages and
representing meanings were produced.
However, two problems in particular make NLP difficult and call for techniques
different from those used in processing artificial languages. These problems are (a) the
level of ambiguity in natural languages and (b) the complexity of semantic information
existing in simple sentences. To manage these problems, NLP is carried out in four
distinct stages applied in a sequential manner. The four distinct stages of NLP are:
Morphological Processing: breaking strings of language input into sets of
tokens corresponding to discrete words, sub-words and punctuation forms
Syntax Analysis: checking if a string of words is well-formed and breaking it
up into a structure showing the syntactic relationships between the different words
Semantic Analysis: expanding the lexicon to include semantic definitions for
each word it contains and extending the grammar to specify the semantics of any phrase
Pragmatics: interpreting the results of semantic analysis from the perspective
of a specific context (the context of the dialogue or state of the world)
Figure II – 2. The Four distinct stages of NLP in sequential manner
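The token-splitting part of the first stage can be illustrated with a minimal sketch. It covers only the separation of words and punctuation, not sub-word (morpheme) analysis, and the regular expression and function name are assumptions chosen for illustration.

```python
import re

# A token is either a word (optionally with internal hyphens or apostrophes,
# as in the Filipino "mag-aral") or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Split a string into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Mabuti naman, salamat!"))
```

Hyphenated forms are kept as single tokens in this sketch; a later morphological step would be needed to separate affixes from roots.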
As the development of NLP continues, it provides both theories and
implementations for a wide range of end applications such as:
Information Extraction: focuses on recognition, tagging, and extraction into
structured representation
Summarization: retains the meaning of the text while reducing a large corpus of
multiple documents into a short set of words or a paragraph
Machine Translation: preserves the meaning of the original text in the translated
language
Paraphrasing: an alternative way of conveying the same information which
allows the creation of more varied and fluent text
Part of Speech (POS) Tagging: indicates syntactic role by labeling each word
with a unique tag
Furthermore, the goal of most NLP systems is to extract meaning from pre-defined
language input. NLP must create interpretations to capture and refine meanings from the
input. This meaning is the semantic representation of the system. Semantic representations
are built from semantic fragments attached to words in the lexicon. Research effort in NLP
includes formally specifying grammars for languages which have not previously been
analyzed.
2. Subjectivity
Cognitive Semantics (CS) divides semantics into meaning-construction and
knowledge-representation. It is concerned with the representation of conceptual structure
in language.
One topic in CS is the confrontation of Elizabeth Traugott’s (1988) conception of
“subjectivity” and the process of grammaticalization which leads to the establishment of
subjective meanings.
According to Edward Finegan (1995), subjectivity contrasts with objectivity.
Objectivity, as defined by dictionary.com, relates to actual and external phenomena
undistorted by emotions or personal bias, whereas subjectivity is defined as an
expression of self and of the speaker’s point of view, having an array of meanings in
discourse.
33
Lyons (1982) says that subjectivity can be analyzed by representing subjective meaning
with “weak” (objective) vs. “strong” (subjective) labels.
One process for uncovering subjective meaning is opinion mining. This sub-discipline
focuses on opinionated ideas conveyed through words known as opinion words.
There are two kinds of opinion words: positive and negative. Positive opinion words
describe desired states, whereas negative opinion words describe undesired states in the
context of a word in the text.
Another way of classifying opinion words is by type: base or comparative. Base-type
words render a meaning to a text without comparison. Comparative-type opinion words, on
the other hand, use the comparative or superlative degree of the opinion word.
Opinion mining tackles the issue of determining subjective terms in a corpus and
deciding whether a term carries a positive, negative or no subjective connotation.
Opinion mining involves pairing a known corpus with an expressed opinion. It is
further divided into several tagging subtasks identified by Esuli and Sebastiani (2006):
1. Determining document subjectivity by deciding whether the corpus has a
factual nature
2. Determining document orientation (polarity) by deciding whether terms are
positive, negative or not subjective.
3. Determining the strength of document orientation by classifying its subject
matter as Weakly Positive, Mildly Positive or Strongly Positive.
The simplest approach conceptualized by Turney (2003) for accomplishing the
second subtask is by considering the algebraic sum of the orientations of terms. However,
there are no available lexical resources where terms are tagged with negativity and
positivity scores.
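The algebraic-sum idea can be illustrated with a minimal sketch. The orientation table and the function name below are hypothetical and exist only for illustration; they are not taken from Turney's work or from any real lexicon.

```python
# Hypothetical term orientations: +1 positive, -1 negative.
# Terms missing from the table are treated as not subjective (0).
ORIENTATION = {"good": 1, "excellent": 1, "bad": -1, "boring": -1}


def document_orientation(text):
    """Classify a document by the algebraic sum of its term orientations."""
    total = sum(ORIENTATION.get(token, 0) for token in text.lower().split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "not subjective"


print(document_orientation("good plot but boring and bad acting"))
```

The sketch makes the dependency explicit: without a resource that tags each term with an orientation, the sum cannot be computed.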
3. Opinion Lexicon Generator
According to the Handbook of Natural Language Processing, Second Edition, the
main objective of the lexicon generator is to produce a lexical resource of opinion words
with appropriate polarity scores called opinion lexicon. An opinion lexicon is a collection
of generated opinion words. "Opinion words are also known as polar words, opinion-
bearing words, and sentiment words".
According to Bikel and Zitouni (2012), there are four main methods for creating an
opinion lexicon for a new language, ranked from best to fourth best. However, the best
method also largely depends on the monolingual resources and tools available for the new
language, such as bilingual dictionaries, large corpora, natural language processing tools
and/or the cross-lingual connections that can be made to a major language such as English.
The best scenario is the manual creation of an annotated corpus in the target
language. However, manually annotated corpora exist only for some languages. An
alternative is to mine online data such as reviews. Learning algorithms such as naïve
Bayes, decision trees and SVMs can then be used for the subjectivity annotation of the
new text.
The second best method, introduced by Mihalcea et al. (2008), is to construct a
lexicon using corpus-based cross-lingual projections. If a cross-lingual projection exists
between the major language (such as English) and the target language, the corpus
annotations can be transferred into the target language. The translation may be performed
in either direction: from the target language to the major language, or from the major
language to the target language. Regardless of the direction of translation, once the
manual labels are projected, the result is a data set in the target language with
subjectivity annotation.
The third best method is known as bootstrapping. Using a few manually selected
seeds, the lexicon can be expanded by finding the related synonyms, antonyms and
definitions found in an electronic dictionary. Running the process for several iterations may
produce a large lexicon with thousands of entries.
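A minimal sketch of this bootstrapping loop is shown below. The synonym and antonym tables are hypothetical toy data standing in for an electronic dictionary, and the rule that synonyms inherit a seed's polarity while antonyms receive the opposite polarity is an assumption made for illustration.

```python
# Hypothetical dictionary relations; a real run would query an electronic dictionary.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "bad": ["good"]}


def bootstrap(seeds, iterations):
    """Expand a seed lexicon {word: +1 or -1} for a fixed number of iterations."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    for _ in range(iterations):
        next_frontier = []
        for word in frontier:
            polarity = lexicon[word]
            # Synonyms inherit the polarity of the word they were reached from.
            for synonym in SYNONYMS.get(word, []):
                if synonym not in lexicon:
                    lexicon[synonym] = polarity
                    next_frontier.append(synonym)
            # Antonyms receive the opposite polarity.
            for antonym in ANTONYMS.get(word, []):
                if antonym not in lexicon:
                    lexicon[antonym] = -polarity
                    next_frontier.append(antonym)
        frontier = next_frontier
    return lexicon


print(bootstrap({"good": 1}, iterations=2))
```

Each iteration only expands the words added in the previous one, so the lexicon grows outward from the seeds until no new related words are found or the iteration limit is reached.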
The fourth best method is translating a lexicon. If the previous methods are not
applicable, translating a lexicon using a bilingual dictionary may be possible. However, the
accuracy of the method is low because of the challenges encountered in the translation
process. Words may lose their meanings due to lemmatization. To resolve the problem,
human intervention or direct translation may help.
With the construction of a lexicon consisting of words and phrases annotated for
sentiment and subjectivity, these word- and phrase-level annotations introduce three
different approaches to generating the opinion lexicon: the manual approach, the
dictionary-based approach, and the corpus-based approach (Liu, 2010).
The manual approach is used as a checking mechanism for the automated process
of lexicon generation. It is time-consuming and hence considered inefficient.
The dictionary-based approach uses a small set of seed opinion words and an online
dictionary. One example is WordNet, a lexical database developed by Princeton University
(2006).
The corpus-based approach, an approach proposed by Hatzivassiloglou and
McKeown (1997), "relies on the syntactic or co-occurrence patterns and also a seed list of
opinion words to find other opinion words in a large corpus". The following tasks briefly
discuss the corpus-based approach: (1) A list of seed opinion adjective words is used. (2)
A list of linguistic constraints or conventions on connectives to identify additional adjective
opinion words is also used.
One such constraint is the conjunction constraint, which is classified as either
natural or unnatural. The natural conjunction constraint pertains to a sentence structure in
which two words, separated by a conjunction, both describe the subject of the
sentence. For example, in the sentence "The knight is brave and handsome", the words
brave and handsome are both positive; thus, words of the same polarity
joined by the conjunction 'and' form a natural conjunction constraint. "The knight
is brave but handsome" is an example of an unnatural conjunction constraint: the sentence
has the same two positive words, brave and handsome, but joins them with the conjunction
'but', which normally signals a contrast in polarity.
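The conjunction constraint can be sketched as a small propagation rule. The code below assumes the usual reading of the constraint, that 'and' links words of the same polarity and 'but' links words of opposite polarity; the regular expression and the function name are hypothetical simplifications, not the procedure of Hatzivassiloglou and McKeown.

```python
import re


def propagate(sentence, lexicon):
    """Infer the polarity of an unknown word joined to a known seed word."""
    inferred = {}
    pattern = r"(\w+) (and|but) (\w+)"
    for left, conjunction, right in re.findall(pattern, sentence.lower()):
        # 'and' keeps the polarity; 'but' flips it (assumed convention).
        sign = 1 if conjunction == "and" else -1
        if left in lexicon and right not in lexicon:
            inferred[right] = sign * lexicon[left]
        elif right in lexicon and left not in lexicon:
            inferred[left] = sign * lexicon[right]
    return inferred


seeds = {"brave": 1, "cheap": 1}
print(propagate("The knight is brave and handsome", seeds))
print(propagate("The food is cheap but bland", seeds))
```

Under this convention, a sentence such as "brave but handsome" would assign a negative polarity to "handsome", which is why the text treats it as an unnatural conjunction.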
In this regard, Esuli and Sebastiani (2006) developed a lexical resource,
SentiWordNet, for determining term subjectivity by expanding the initial seeds through K
iterations by means of WordNet lexical relations (synset) with ternary classifiers: Positive,
Negative and Objective. If these classifiers agree in assigning the same label to a synset,
the label will have the maximum score for that synset, otherwise each label will have a
score proportional to the number of classifiers that have assigned it.
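The scoring rule described above can be sketched as follows. The committee votes are hypothetical inputs; the sketch only shows how label counts turn into scores and does not reproduce how SentiWordNet trains its classifiers.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Objective")


def synset_scores(votes):
    """Score each label by the share of classifiers that assigned it."""
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


# All classifiers agree: the agreed label gets the maximum score.
print(synset_scores(["Positive"] * 4))
# Classifiers disagree: scores are proportional to the votes.
print(synset_scores(["Positive", "Positive", "Objective", "Objective"]))
```

Because every classifier assigns exactly one label, the three scores of a synset always sum to 1.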
B. Related Studies
There are several studies conducted to generate a lexical resource of different
languages aside from English to determine the subjectivity of a corpus. These are Bengali
(Bangla) SentiWordNet, Hindi SentiWordNet and Urdu Text Sentiment Analyzer.
1. A Bengali (Bangla) SentiWordNet
Researchers Das and Bandyopadhyay (2009) developed SentiWordNet for the Bengali
language by implementing a word level lexical-transfer technique. This technique involves
stemming cluster and SentiWordNet validation to translate ambiguous subjective words
that may lose their subjective meaning once lemmatized (converted to base form of words).
Each entry in the English SentiWordNet is translated to Bengali using an English-Bengali
dictionary. This process resulted in 35,805 Bengali entries.
However, several filtering techniques were applied to the produced entries to
remove duplicates. A total of 8,427 opinionated words were extracted from
SentiWordNet. Words whose orientation strength was below 0.4 were considered
ambiguous and deemed to have lost their subjectivity after translation. As a result, a total
of 2,652 words were discarded (Wiebe and Riloff, 2005).
For subjectivity detection, two corpora, news and blog, were used as inputs to
evaluate the developed Bengali SentiWordNet.
Table II-1 shows the results, which led to the conclusion that the polarity scores
of the Bengali SentiWordNet are reliable (approximately 50% accurate).
Table II – 1. Polarity-wise Performance using Bengali SentiWordNet
2. A Hindi SentiWordNet
The Hindi SentiWordNet (H-SWN) was created by Joshi et al. (2010) using two
lexical resources, namely English SentiWordNet and English WordNet Linking. The
English entries in SentiWordNet were replaced with equivalent Hindi words to create
H-SWN.
Two manually compiled English-Hindi electronic dictionaries were used to replace
the entries: the first is SHABDKOSH and the second is Shabdanjali. These
two dictionaries were merged automatically, with duplicates replaced, resulting in
approximately 90,872 entries. The positive and negative sentiment scores for the Hindi
words were copied from the English SentiWordNet, resulting in 22,708 Hindi entries.
POLARITY    ACCURACY
Negative    56.59%
Positive    75.57%
Hindi WordNet is a well-structured, manually compiled resource that has been
continuously updated over the last nine years. An API is available for accessing the
Hindi WordNet. Almost 60% of the final synsets in the Hindi SentiWordNet were generated
by this method.
An arbitrary 100 words were chosen from the Hindi SentiWordNet for human
evaluation. Two persons were asked to manually check them.
Moreover, another approach, WordNet Graph Traversal, was used to generate a Hindi
Subjective Lexicon. This method is language independent and uses only one
resource (WordNet) for lexicon generation. As proposed by Arora, Bakliwal and Varma
(2012), it exploits synonym and antonym relations. As a result, 8,048 words
were generated, with 74% accuracy on the classification of reviews and 69% on the
evaluation by human annotators.
3. An Urdu Text Sentiment Analyzer
Urdu is a South Asian language and is described as a resource-poor
language (Mukund et al., 2010). Many researchers have attempted to construct an Urdu
lexicon, although constructing a sentiment-annotated Urdu lexicon still poses
many challenges.
Afraz, Muhammad, and Martinez-Enriquez (2011)
developed a lexicon-based Urdu Text Sentiment Analyzer incorporating two components:
(a) the classifier, which analyzed and categorized the given text, and (b) the lexicons,
containing information about the orientation (positive or negative) and subjectivity of
the entries (words and phrases).
The Urdu lexical construction tasks are summarized as follows:
Identifying the sentiment-oriented words/phrases in Urdu language
Identifying morphological rules, (e.g. inflection or derivation)
Identifying grammatical rules (e.g. use of modifiers)
Identifying semantics between different entries (e.g. synonyms, antonyms and
cross-references)
Identifying and annotating polarities of the entries
Identifying modifiers and annotating intensities
Differentiating between multiple Part of Speech (POS) tags for the same entries
Constructing the lexicon
Figure II – 3. Classification Process of Urdu Lexical Construction (Afraz,
Muhammad, & Martinez-Enriquez, 2011)
The subjectivity of the entries was classified by orientation and intensity.
Orientation was predicted from the marked polarity, and intensity was calculated by
analyzing the modifiers. For example, the intensity of “much more” is greater than that of
“more”.
After the Urdu lexicon development, Afraz, Muhammad, and Martinez-Enriquez (2011)
used a classifier whose classification algorithm compared all words and phrases present
in a corpus of movie and product reviews in the Urdu language against the lexicon entries.
With the polarity scores of the individual entries as the basis, the total polarity was
calculated.
The movie review corpus (MR) comprised 450 reviews, 226 positive and 224
negative. The product review corpus (PR), covering electronic appliances, comprised 328
reviews, 177 positive and 151 negative. The overall evaluation of the system measured how
accurately the document classification suggested by the system matched the actual
sentiments present in the reviews. A series of experiments was performed
on both corpora, one after another.
Table II-2 shows the results, with an accuracy of 66-74% for MR and 77-79% for
PR. It also gives the accuracy for positive and negative reviews separately.
Table II - 2. Results of experimentation on both corpora (MR and PR)
POLARITY    CORPORA    ACCURACY
Negative    MR         66%
Negative    PR         77%
Positive    MR         74%
Positive    PR         79%
C. Generalization
Among other areas of NLP, an emerging field is Sentiment Analysis (SA). The
need for sentiment analysis is the outcome of a sudden increase in opinionated or
sentimental text. Researchers have made great attempts at determining subjectivity in texts
using lexical resources. Since there were no lexical resources available, studies were
conducted to generate lexicons annotated with polarity scores to be used in SA.
SentiWordNet became successful in providing an English lexical resource for
SA. With this as the basis of the later studies, Bengali, Hindi and Urdu SentiWordNet were
produced, providing lexical resources for different languages.
Bengali SentiWordNet was created by translating English SentiWordNet entries to
Bengali using a bilingual dictionary. Hindi SentiWordNet used the same method but with
two bilingual dictionaries and WordNet Graph Traversal. Urdu SentiWordNet used a
classification algorithm which predicted the polarity and the intensity by comparing words.
Despite the developments in SA of English text, it is a fact that for other languages,
this domain is still an open challenge. Among a number of possible future works, the most
important is the extension of the lexicon and its accuracy.
This is where the researchers of this study embarked: applying these techniques
to generate a Filipino Subjective lexicon using the second best method of
building a lexicon, corpus-based cross-lingual projection, with word- and
phrase-level annotations using a bilingual dictionary, Moses and SentiWordNet for the
field of Sentiment Analysis.
Chapter III
Research Design and Methodology
A. Hypothesis
FilCon is an accurate polarity lexicon generated by the combination of several
approaches, namely word- and phrase-level annotation, dictionary-based, and corpus-based
cross-lingual projections of Filipino words to the English translation in SentiWordNet. This
combinatory approach is believed to be achievable using a Filipino-English bilingual
dictionary, Moses and SentiWordNet.
1. Assumptions
The use of Filipino-English bilingual dictionary, Moses, and SentiWordNet
was an effective approach in applying the corpus-based cross lingual
projections, constructing the FilCon generator.
FilCon rendered an accurate translation of each Filipino word to English as
well as the pairing of polarity values from SentiWordNet to FilCon by
applying cross lingual projections.
FilCon has an acceptable level of accuracy that can be used for Natural
Language Processing applications.
B. Research Methods
The main objective of this study was to provide a Filipino lexicon annotated with
the polarity scores of SentiWordNet.
First, Filipino words were retrieved and translated to English using a Filipino-
English bilingual dictionary. The translated words were paired with the same English
entries from SentiWordNet using cross-lingual projection in order to retrieve the
corresponding polarity score for each word.
Identical translated entries generated in the lexicon were filtered using Moses.
Filipino sentences were retrieved and translated manually to English to serve as a training
corpus for Moses. These training sets provided the translation accuracy that determined the
most appropriate translation for a specific word. In most cases, the translated word with
the highest translation accuracy was automatically chosen and considered the nearest
possible translation.
In another case, if different translated words had the same translation accuracy
for a specific Filipino word, the nearest translation was selected manually.
However, if there were no identical translated entries for a specific word, its translation
was automatically accepted as the translation for the Filipino word.
The output of the study, FilCon, was an alphabetically arranged polarity lexicon
exported as a .txt file.
Experimental design was used in this research since there was a causal relationship
between the variables involved. This enabled the researchers to gain control over all factors
which may affect the outcome of the generation of lexicon. The factor of producing an
accurate subjectivity analysis is the manipulation of input data, the polarity lexicon.
C. Research Design
The study on FilCon generation had the following development phases:
1. Data Mining
The Filipino words were retrieved from a Filipino-English bilingual
dictionary to use as FilCon's primary input and the Filipino-English
sentences were retrieved to serve as the training corpora for further
processing of lexicon entries.
2. Words Translation
Using a Filipino-English bilingual dictionary retrieved from an available
application, the Filipino words were provided and translated to English for
the next phase. Automatic part-of-speech tagging of the words was also included
since the dictionary provides the part of speech appropriate for each word.
3. Cross-lingual Projection
The major language, English, established a bridge to the target language,
Filipino; this bridge is known as cross-lingual projection. With this, the translations
of the Filipino words were matched with the same English words found in
the SentiWordNet entries. The polarity score of a SentiWordNet
entry was retrieved and aligned with the Filipino word of the same semantic meaning.
4. Corpus Training
A separate corpus of 40,000 Filipino-English sentences was retrieved from
the Internet. These sentences were used as the training corpus for Moses.
Moses found the most probable translation for each entry, represented by a
translation accuracy (TRANS_ACC) according to the frequency of
appearance in the sentences. Words with low translation accuracy were not
included in FilCon.
5. Word Filtering
This phase conveyed the acceptable polarity scores aligned with the Filipino
words retrieved in terms of identical entries. It involved the application of
the translation accuracy to filter multiple entries with different translations
and part-of-speech.
6. FilCon Generation
The final output of this study was a Filipino lexicon in .txt file
format. FilCon based its structure on SentiWordNet, adapting the
following fields: POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD and
FIL_WORD. The lexicon was sorted and grouped by part of speech and,
within each POS, sorted again by ID.
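The cross-lingual projection and FilCon generation phases above can be sketched as follows. The SentiWordNet rows and dictionary entries are hypothetical toy data, the tab-separated layout is an assumption about the .txt output, and dropping Filipino words that have no SentiWordNet match is also an assumption of this sketch rather than a documented behavior of the study.

```python
# Hypothetical SentiWordNet-style rows: (POS, ID, PosScore, NegScore, English word).
SWN = [
    ("a", "00002", 0.75, 0.0, "beautiful"),
    ("n", "00010", 0.0, 0.625, "anger"),
    ("a", "00001", 0.0, 0.5, "sad"),
]
# Hypothetical dictionary entries: (Filipino word, POS, English translation).
DICTIONARY = [
    ("maganda", "a", "beautiful"),
    ("galit", "n", "anger"),
    ("malungkot", "a", "sad"),
    ("bahay", "n", "house"),
]


def project(dictionary, swn):
    """Attach polarity scores to Filipino words via their English translation."""
    index = {(pos, eng): (sid, p, n) for pos, sid, p, n, eng in swn}
    rows = []
    for fil, pos, eng in dictionary:
        if (pos, eng) in index:  # unmatched words are skipped (assumption)
            sid, p, n = index[(pos, eng)]
            rows.append((pos, sid, p, n, eng, fil))
    # Group by part of speech, then order by ID within each POS.
    return sorted(rows, key=lambda row: (row[0], row[1]))


def to_txt(rows):
    """Render rows with the six FilCon fields, one entry per line."""
    header = "POS\tID\tPOS_SCORE\tNEG_SCORE\tENG_WORD\tFIL_WORD"
    return "\n".join([header] + ["\t".join(str(v) for v in row) for row in rows])


print(to_txt(project(DICTIONARY, SWN)))
```

A real SentiWordNet word can belong to several synsets with different scores, which is the situation the corpus training and word filtering phases are meant to resolve; the toy index above sidesteps that by keeping one row per word.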
Chapter IV
Analysis and Interpretation
A. Presentation of the Solution
This section presents the results of the analysis of the generated FilCon entries.
1. System Architecture
The architecture represents the system flow of the generation of Filipino words
aligned with polarity scores in FilCon.
Figure IV – 1. System Architecture of the Study
[Flowchart: Filipino Words -> Data mining -> Translation & POS Tagging (Filipino-English
Bilingual Dictionary) -> Cross-lingual Projection (SentiWordNet 3.0) -> Moses Training
(Filipino-English Sentences Corpus) -> Lexicon with Translation Accuracy -> Elimination of
Identical entries -> Sorting -> FilCon. Elimination branches: CASE 1, multiple translations
with same TRANS_ACC (manual selection of translation); CASE 2, multiple translations with
different TRANS_ACC (selection based on highest TRANS_ACC); CASE 3, no identical entry.]
This study focused on generating a Filipino Subjective Lexicon using a
Filipino-English bilingual dictionary, Moses, and SentiWordNet 3.0. The researchers designed
modules and interfaces patterned after the previous study mentioned in the conceptual
framework of Chapter 1. This is evident in Figure IV – 1.
2. Description of Modules and Interfaces
a. Data mining: The Filipino words were retrieved to serve as the initial entries
for FilCon. There was no separation of subjective words from the objective
words.
b. Translation and POS Tagging: The Filipino words, as input strings,
underwent translation and POS tagging. With a Filipino-English bilingual
dictionary as the basis, the English translation and the part of
speech of each Filipino word were provided.
c. Cross-Lingual Projection: Once the Filipino words were given English
translations, the polarity scores were annotated. The English entries and their
corresponding polarity scores from SentiWordNet 3.0 were matched with the
same English translation of a specific Filipino word. This module is also
referred to as word matching and polarity score annotation.
d. Moses Training: This module trained sets of corpora. With the Filipino
sentences having their English translations, the translation accuracy for each
word can be obtained by determining the frequency of appearance of the
Filipino word in the sentence and then retrieving its English translation
counterpart in the English sentence. In this way, the translation accuracy
becomes the basis for the most probable translation of a word, creating a new
database entry for each Filipino word with its translation accuracy.
e. Elimination of Identical Entries: This module sought to filter the words by
finding the most probable translation for a specific word, reducing the various
translations made. Using the translation accuracy, the following cases were
addressed:
a. Case 1: If there were different translated entries but with the same
translation accuracy for a specific Filipino word, the word filtering was
done by manual selection of the most probable meaning of each word.
b. Case 2: If there were different translations with different translation
accuracies for a specific Filipino word, the word filtering was done by
automatic selection of the word with the highest translation accuracy.
c. Case 3: If there were no identical translated entries for a specific
Filipino word, it was automatically included in the final output to be
sorted.
f. Sorting: Through PHP scripting, the filtered lexicon database obtained with
translation accuracy was joined to form a sorted lexicon, FilCon, with POS,
ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD.
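The three cases of the Elimination of Identical Entries module can be sketched as a single selection function. The function name, the tuple layout, and the treatment of a tie for the highest TRANS_ACC as Case 1 are assumptions made for illustration; the study's own implementation used PHP and is not reproduced here.

```python
def select_translation(candidates):
    """Pick a translation for one Filipino word.

    candidates: list of (english_translation, trans_acc) pairs.
    Returns (choice, mode). When mode is "manual", choice is the list of
    tied translations that a human annotator still has to choose from.
    """
    if len(candidates) == 1:
        # Case 3: no competing entry, so the translation is kept as is.
        return candidates[0][0], "automatic"
    best = max(acc for _, acc in candidates)
    tied = [eng for eng, acc in candidates if acc == best]
    if len(tied) == 1:
        # Case 2: a unique highest TRANS_ACC is selected automatically.
        return tied[0], "automatic"
    # Case 1: equal TRANS_ACC values, so selection is deferred to a person.
    return tied, "manual"


print(select_translation([("love", 0.8)]))
print(select_translation([("love", 0.8), ("dear", 0.3)]))
print(select_translation([("love", 0.5), ("dear", 0.5)]))
```

Returning the tied candidates rather than an arbitrary pick keeps the manual step explicit, which matches the role of manual selection in Case 1.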
B. Results of Conducted Research
The Filipino-English bilingual dictionary, Moses and SentiWordNet 3.0 were
combined to generate a Filipino Subjective Lexicon. The Filipino words were translated to
the most probable translation using the translation accuracy generated by Moses. FilCon
was generated, containing 22,380 words annotated with polarity scores. However,
around 30-40% of FilCon consisted of words with lemma, and the remainder were in
the base form (root word). These results were used for analysis and interpretation.
C. Analysis and Interpretation of Results
In order to evaluate the accuracy of each word in the generated Filipino Subjective
Lexicon using a Filipino-English bilingual dictionary, Moses, and SentiWordNet 3.0, the
translation accuracy from the training sets in Moses was used. If there was more than one
translation for a specific word, the translation with the highest translation accuracy was
interpreted to be the nearest possible translation for the corresponding Filipino word.
However, if there was only one translation for that specific word, it was considered
the probable translation available.
\[ \text{Overall\_accuracy}_{\text{translation}} = \frac{\sum_{n=0}^{\text{max\_num\_entries}-1} \text{word}_{n}}{\text{max\_num\_entries}} \times 100\% \]
Figure IV-2. Translation Accuracy of Filipino Words
Moreover, the translation accuracy of these Filipino words was averaged and
multiplied by 100% to evaluate the accuracy level of the translation of entries. Applying
the averaging formula in Figure IV-2, the overall translation accuracy was
63.622%. This result was obtained because some of the alignments during the MOSES
training were not properly matched with the right translation, which may result in a
mistranslation in the FilCon entry.
For the overall evaluation of the accuracy level of FilCon, the researchers tested the
output of the system using sentences that were handpicked and randomly selected from
various sources. The results of the system (see Appendix D, Table D-1) were then compared
to the evaluation of an expert using a two-tailed t-test with α = 0.05 and the hypotheses
Ho: μ1 = μ2 and H1: μ1 ≠ μ2.
Table IV- 1. Sentiments Confusion Matrix
                        EXPERTS
FILCON      Positive    Negative    Neutral
Positive    11          3           1
Negative    8           10          1
Neutral     12          9           0
Table IV-1 shows a confusion matrix of the sentences as evaluated by the system
versus the evaluation of the experts, with the experts’ evaluation as
the basis. The main diagonal of the matrix shows how many sentences were evaluated the
same by the system and the experts. The other cells show the number of sentences that
were evaluated differently by the system and the experts. There were 21 correctly evaluated
sentences and 34 incorrectly evaluated sentences, for a total of 55 sentences evaluated.
\[ \text{Accuracy} = \frac{\text{Number of Correctly Evaluated Sentences}}{\text{Total Number of Sentences}} \times 100\% \]
Figure IV-3. Formula for calculating FilCon’s accuracy
FilCon’s accuracy was then calculated using the formula given in Figure IV-3.
Based on the calculation, FilCon’s accuracy is 38.18%.
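The reported figure can be reproduced from the confusion matrix in Table IV-1. In the sketch below the rows are taken to be FilCon's labels and the columns the experts' labels; the variable names are illustrative only.

```python
# Confusion matrix from Table IV-1.
# Rows: FilCon label; columns: expert label (Positive, Negative, Neutral).
MATRIX = [
    [11, 3, 1],
    [8, 10, 1],
    [12, 9, 0],
]

# Sentences on the main diagonal were labeled the same by both.
correct = sum(MATRIX[i][i] for i in range(len(MATRIX)))
total = sum(sum(row) for row in MATRIX)
accuracy = correct / total * 100

print(correct, total, round(accuracy, 2))  # 21 55 38.18
```

The diagonal also shows where agreement comes from: 11 positive and 10 negative sentences match, while none of the sentences FilCon labeled neutral were neutral according to the experts.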
Figure IV-4. T-test Formula for Paired Two Sample for Means
Table IV- 2. Summary of the T-Test Computation
                    System          Expert
Mean                -0.058906422    0.140606061
Observations (n)    55              55
t computed          -2.63523816
t critical          ±2.004879288
Table IV-2 shows the means of the system’s and the expert’s results, the t
computed using the formula in Figure IV-4, and the t critical from the table of t-values. The
t computed, -2.63523816, is outside the range of the t critical values ±2.004879288, hence
we reject Ho. It denotes that there was a significant difference between the system’s
sentiment scores and the expert’s sentiment scores.
D. Summary of Findings
The generation of the FilCon yielded the following results:
1. The Filipino-English bilingual dictionary, Moses, and SentiWordNet were a
workable combination for generating a Filipino Subjective Lexicon.
2. The translation (SentiWordNet from English to Filipino) accuracy of the FilCon
entries was 63.622%.
3. The overall accuracy of FilCon was 38.18%.
4. The sentiment scores generated with FilCon were significantly different from the
expert’s scores.
Chapter V
Summary, Conclusions, and Recommendations
A. Summary
Scaling means assigning a metric system as a standard unit of measurement for
a certain object. To scale a subjective statement, sentiment analysis is required. Such
analyses are conducted in order to draw acceptable conclusions regarding a certain matter
based on the standard scale. This standard scale is FilCon.
FilCon is an opinion polarity lexicon which determines the polarity
(degree of positivity and negativity) of a word, which in turn contributes to the analysis
of the subjectivity and sentiment of a given document. With FilCon, annotations of
Filipino words for subjectivity analysis are automatically available to its users
without human intervention.
FilCon was basically constructed through the use of dictionary-based approach.
This approach gave a developing language an advantage for a seamless generation of words
in the lexicon through direct translation at a word/phrase-level. This was how FilCon came
into existence. FilCon made use of an English-Filipino bilingual dictionary in translating
synsets in the SentiWordNet lexicon.
Sampling of words from FilCon was made through the MOSES training
tool. MOSES did the training by aligning English sentences with human-provided
Filipino counterpart translations. These sentences were fed into MOSES for alignment.
This process then yielded a translation probability accuracy for a group of words in FilCon.
The accuracy of subjectivity was tested through the corpus-based
approach.
B. Conclusions
The researchers were able to generate a Filipino Polarity Lexicon, FilCon, by
combining the word-level annotation approach, the dictionary-based approach and the
corpus-based cross-lingual approach through language tools, namely the
Filipino-English bilingual dictionary, Moses and SentiWordNet.
The translation of FilCon entries and their cross-lingual projection onto the
SentiWordNet entries were made possible. Out of 6,124 sample data from the training sets
of Moses used for computing the translation accuracy, the result was 63.622%. This
showed that the translation of the Filipino words was relatively accurate.
The overall accuracy of FilCon depended on the sentence sentiment analysis. Based
on the results, the accuracy was 38.18%.
In the t-test computation, there was a significant difference between the sentiment
scores of FilCon and sentiment scores of experts. It denoted that scores from the generated
lexicon were significantly different from expert’s scores.
C. Recommendations
The researchers encourage anyone who wishes to further study the generation of a
Filipino Subjective Lexicon. The following recommendations may be used for its
improvement:
1. Use the word mappings of WordNet after gathering Filipino words. This helps in
understanding the relationship of one word to another. Bootstrapping,
another method for building lexicons, can also be done using word mappings,
provided that a few manually selected Filipino seed words are used.
2. Gather word entries from two or more Filipino-English bilingual dictionaries for a
different perspective. Since vocabulary is continuing to expand, different words are
formed. Some dictionaries may list new words such as UP Diksyunaryong Pilipino.
3. Document-level annotation may be applied for subjectivity classification. It may be
performed by determining the frequency of subjective or sentiment words in a
document. Since review classification or web opinion mining is one of natural
language applications, a document-level annotation may help FilCon for faster
development.
4. Use a different method in finding the most probable translation. Since
lemmatization affects the word in a way that a word loses its meaning and context
when lemmatized, manual translation (native speakers) may be applied or different
language tools may be used.
5. Gather different sentences with different structures as a training corpus in the
Filipino language. Some sentences in Filipino tend to have different meanings depending
on their sentence structures.
Bibliography
Afraz, Z. S., Muhammad, A., & Martinez-Enriquez, A. (2011, December 4). Sentiment-
annotated lexicon construction for an Urdu text based. Pakistan Journal of Science,
63(4). Retrieved March 8, 2014, from
http://www.paas.com.pk/images/volume/pdf/271647097-(7).%2047-11%20modified%2010.11.2011.pdf
Alexander, D. (n.d.). Data Mining. Retrieved February 13, 2014, from Instructional
Technology Services:
http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/
Arora, P., Bakliwal, A., & Varma, V. (2012, January). Hindi subjective lexicon generation
using graph traversal. Retrieved March 8, 2013, from Dr. Alexander Gelbukh:
http://www.gelbukh.com/ijcla/2012-1/025-039-paper.pdf
Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May 19-21). SentiWordNet: An enhanced lexical
resource for sentiment analysis and opinion mining. (N. Calzolari, K. Choukri, B.
Maegaard, & J. Maria, Eds.) Retrieved February 13, 2014, from LREC Conferences:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf
Bikel, D. M., & Zitouni, I. (2012). Multilingual natural language processing applications:
From theory to practice. Westford, Massachusetts, United States of America: US:
International Business Machines Corporation.
Bondoc, R., Garcia, A., Lacaden, J., Ping, Y., & Borra, A. (2010, December). The filipino
wordnet construction. (H. Adorna, Ed.) Philippine Computing Journal, V(2), 44-
57
49. Retrieved December 13, 2013, from
https://docs.google.com/file/d/0BxI8feCZhWsoNExBWDlKeVMwSE0/edit
Borra, A., Pease, A., Roxas, R. O., & Dita, S. (2010). Global wordnet conference: Research
paper details. Retrieved December 13, 2013, from Center For Indian Language
Technology:
http://www.cfilt.iitb.ac.in/gwc2010/pdfs/33_Filipino_Wordnet__Borra.pdf
Chiala, K. (2013, November 14). Tragedy and technology bring people together. Retrieved
December 13, 2013, from The Network: Cisco's Technology News Site:
http://newsroom.cisco.com/feature-content?type=webcontent&articleId=1289398
Conrad, C. (1974, January 1). Context effects in sentence comprehension: A study of the
subjective lexicon. Memory & Cognition, II(1), 130-138. doi:10.3758/BF03197504
Das, A., & Bandyopadhyay , S. (2009). SentiWordNet for Bangla. Retrieved March 8, 2014,
from Amitava Das:
http://www.amitavadas.com/Pub/SentiwordNet%20(Bengali).pdf
Das, A., & Bandyopadhyay, S. (n.d.). SentiWordNet for Indian languages. Retrieved
March 8, 2014, from Association for Computational Linguistics:
http://aclweb.org/anthology//W/W10/W10-3208.pdf
Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource.
Retrieved March 8, 2014, from Uni Digital:
http://gandalf.aksis.uib.no/lrec2006/pdf/384_pdf.pdf
58
Hamouda, A., & Rohaim, M. (2011, November 1). Reviews classification using
SentiWordNet lexicon. The Online Journal on Computer Science and Information
Technology, 2(1), 120-123. Retrieved March 8, 2014, from
https://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordN
et_Lexicon
Hamouda, A., Mahmoud, M., & Mahamed, R. (2011, November). Building machine
learning based senti-word. Journal of Advances in Information Technology, 2(4).
Retrieved March 8, 2014, from
http://www.academia.edu/1336653/Building_Machine_Learning_Based_Senti-
word_Lexicon_for_Sentiment_Analysis
Hatzivassiloglou, V., & McKeown , K. R. (1997). Predicting the semantic orientation of
adjectives . Retrieved March 8, 2014, from ACL Anthology:
http://acl.ldc.upenn.edu/P/P97/P97-1023.pdf
Indurkhya, N., & Damerau, F. J. (2010). Handbook of natural language processing (2nd
ed.). (R. Hebrich, & T. Graepel, Eds.) New York, Unites States of America:
Chapman & Hall/CRC. Retrieved March 8, 2014, from
http://f3.tiera.ru/2/Cs_Computer%20science/CsNl_Natural%20language/Indurkhy
a%20N.,%20Damerau%20F.J.%20(eds.)%20Handbook%20of%20natural%20lan
guage%20processing%20(2ed.,%20CRC,%202010)(ISBN%209781420085921)(
O)(692s)_CsNl_.pdf
Liddy, E. D. (2001). Natural Language Processing. In Encyclopedia of Library and
Information Science (2nd ed.). New York: Marcel Decker, Inc. Retrieved
59
December 13, 2013, from
http://surface.syr.edu/cgi/viewcontent.cgi?article=1043&context=istpub
Moghaddam, S., & Ester, M. (2012, August 12). Aspect-based opinion mining from online
reviews. Retrieved January 17, 2014, from Simon Fraser University:
http://www.cs.sfu.ca/~ester/papers/SIGIR2012.Tutorial.Final.pdf
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Retrieved February
13, 2014, from Cornell University:
http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf
Perry, M. (2010). Face to face versus computer-mediated communication: Couples
satisfaction and experience across conditions. University of Kentucky Master's
Theses. Retrieved December 13, 2013, from UKnowledge / University of Kentucky
Libraries: http://uknowledge.uky.edu/gradschool_theses/66
Regaldo, R. J., & Cheng, C. K. (2012, November). Feature-based subjectivity classification
of filipino text. Asian Language Processing (IALP), 2012 International Conference,
57-60. doi:10.1109/IALP.2012.39
Titan Soft (2014). Filipino Dictionary (Athena Version) [Mobile application software].
Retrieved April 2014 from http://play.google.com/store/apps
Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of
semantic orientation from association. ACM Transactions of Information System,
21(4), 315-346. Retrieved March 8, 2014, from
http://acl.ldc.upenn.edu/eacl2006/main/papers/13_1_esulisebastiani_192.pdf
60
Appendix A
Source Code
<html>
<head>
    <title>FilCon</title>
    <link rel='shortcut icon' href='rsc/images/logo_filcon.ico'/>
    <link rel="stylesheet" type="text/css" href="main.css">
</head>
<body>
<img src='rsc/images/logo_filcon.png' style='float:left; position:absolute; bottom:0px;'/>
<div class='container'>
<?php
// Set default timezone
date_default_timezone_set('Asia/Manila');

function func_test_sqlite_support()
{
    // Test SQLite 3 support
    if (!class_exists('SQLite3')) {
        die("SQLite 3 NOT supported.");
    }
}

function func_go_bilingual()
{
    func_test_sqlite_support();

    // Set database filepath
    $dbfilepath = 'rsc/bilingual.db';

    // Connect to database
    try {
        $dbhandle = new PDO("sqlite:" . $dbfilepath);
    } catch (PDOException $errormsg) {
        echo "Connection failed: " . $errormsg->getMessage();
        // Without a connection there is nothing to query, so stop here
        return;
    }

    echo "<center><h2> Bilingual Table </h2></center><hr/><br/>";

    // Pair each English word with the Filipino word that shares its serial number
    $query = "SELECT ENG.SERIAL AS ENG_ID,
                     ENG.WORD AS ENGLISH_WORD,
                     FIL.WORD AS FILIPINO_WORD
              FROM ENG AS ENG
              INNER JOIN OTHER AS FIL ON ENG.SERIAL = FIL.SERIAL
              ORDER BY ENG.WORD ASC;";
    $result = $dbhandle->query($query);

    echo "
    <center>
    <table>
    <thead>
    <tr>
    <th>ID</th>
    <th>English Word</th>
    <th>Filipino Word</th>
    </tr>
    </thead>
    <tbody>
    ";
    if ($result) {
        // Iterate over the result already fetched instead of running the query again
        foreach ($result as $row) {
            echo "
            <tr>
            <td>$row[0]</td>
            <td>$row[1]</td>
            <td>$row[2]</td>
            </tr>
            ";
        }
    } else {
        echo "<tr><td colspan='3'>No results found.</td></tr>";
    }
    echo "</tbody></table></center>";

    // Close the database connection
    $dbhandle = null;
}

function func_go_strip_chars()
{
    // Set text file path
    $file_to_read = "rsc/to_split.txt";
    $file_to_write = "rsc/splitted.txt";

    // File I/O
    if (file_exists($file_to_read)) {
        // File found
        // Initialize variables
        $write_text = "";

        // Initialize file handlers
        $file_handle_r = fopen($file_to_read, "r");

        // Iteration of text file contents
        while (!feof($file_handle_r)) {
            $write_text .= fgets($file_handle_r);
        }

        // Initialization of array of unwanted characters: "#", space, and the digits 0 to 9
        $arr_unwanted_chars = array("#", " ");
        for ($index = 2; $index <= 11; $index++) {
            $arr_unwanted_chars[$index] = (string) ($index - 2);
        }

        // Removal of unnecessary characters
        foreach ($arr_unwanted_chars as $elements) {
            // Strict comparison, so that the digit 0 is never treated as a space
            if ($elements === " ") {
                $write_text = str_replace($elements, "\t", $write_text);
            } else {
                $write_text = str_replace($elements, "", $write_text);
            }
        }

        // Copies the read text file to new text file
        file_put_contents($file_to_write, $write_text);

        // Closes the text files
        fclose($file_handle_r);

        print $file_to_read . " successfully split/removed delimiting characters: <br/>\n";
        print "<ol>\n";
        foreach ($arr_unwanted_chars as $elements) {
            print "\t<li>\t" . $elements . "\t</li>\n";
        }
        print "</ol><br/>\n";
        print "Normalized form of the text file is now on: " . $file_to_write . "\n";
    } else {
        // File not found
        print "ERROR: File not found.";
    }
}

function func_go_elim_dup()
{
    // Set text file path
    $file_to_read = "rsc/read_trans_acc.txt";
    $file_to_write = "rsc/col1_trans_acc.txt";

    // File I/O
    if (file_exists($file_to_read)) {
        // File found
        // Initialize variables
        $write_text = "";
        $prev = "$";
        $cur = "";
        $to_append = "";

        // Initialize file handlers
        $file_handle_r = fopen($file_to_read, "r");

        // Iteration of text file contents
        while (!feof($file_handle_r)) {
            $cur = fgets($file_handle_r);
            // A line identical to the previous one is replaced with a blank line
            if ($cur == $prev) {
                $to_append = "\n";
            } else {
                $to_append = $cur;
            }
            $prev = $cur;
            $write_text .= $to_append;
            print $to_append . "<br/>";
        }

        // Copies the read text file to new text file
        file_put_contents($file_to_write, $write_text);

        // Closes the text files
        fclose($file_handle_r);
    } else {
        // File not found
        print "ERROR: File not found.";
    }
}

function func_go_filcon_dir_trans()
{
    func_test_sqlite_support();

    // Set database filepath
    $dbfilepath = 'rsc/bilingual.db';

    // Connect to database
    try {
        $dbhandle = new PDO("sqlite:" . $dbfilepath);
    } catch (PDOException $errormsg) {
        echo "Connection failed: " . $errormsg->getMessage();
        // Without a connection there is nothing to query, so stop here
        return;
    }

    echo "<center><h2> FilCon </h2></center><hr/><br/>";

    $query = "SELECT * FROM FILCON_DIRECT_TRANSLATION LIMIT 1000";
    $result = $dbhandle->query($query);

    echo "
    <center>
    <table>
    <thead>
    <tr>
    <th>POS</th>
    <th>ID</th>
    <th>POS_SCORE</th>
    <th>NEG_SCORE</th>
    <th>ENGLISH_WORD</th>
    <th>FILIPINO_WORD</th>
    </tr>
    </thead>
    <tbody>
    ";

    if ($result) {
        // Iterate over the result already fetched instead of running the query again
        foreach ($result as $row) {
            echo "
            <tr>
            <td>$row[0]</td>
            <td>$row[1]</td>
            <td>$row[2]</td>
            <td>$row[3]</td>
            <td>$row[4]</td>
            <td>$row[5]</td>
            </tr>
            ";
        }
    } else {
        // The table has six columns, so the message spans all six
        echo "<tr><td colspan='6'>No results found.</td></tr>";
    }
    echo "</tbody></table></center>";

    // Close the database connection
    $dbhandle = null;
}

function func_go_filcon()
{
    func_test_sqlite_support();

    // Set database filepath
    $dbfilepath = 'rsc/bilingual.db';

    // Connect to database
    try {
        $dbhandle = new PDO("sqlite:" . $dbfilepath);
    } catch (PDOException $errormsg) {
        echo "Connection failed: " . $errormsg->getMessage();
        // Without a connection there is nothing to query, so stop here
        return;
    }

    echo "<center><h2> FilCon </h2></center><hr/><br/>";

    $query = "SELECT * FROM FILCON LIMIT 1000";
    $result = $dbhandle->query($query);

    echo "
    <center>
    <table>
    <thead>
    <tr>
    <th>POS</th>
    <th>ID</th>
    <th>POS_SCORE</th>
    <th>NEG_SCORE</th>
    <th>ENGLISH_WORD</th>
    <th>FILIPINO_WORD</th>
    <th>ACCURACY</th>
    </tr>
    </thead>
    <tbody>
    ";

    if ($result) {
        // Iterate over the result already fetched instead of running the query again
        foreach ($result as $row) {
            echo "
            <tr>
            <td>$row[0]</td>
            <td>$row[1]</td>
            <td>$row[2]</td>
            <td>$row[3]</td>
            <td>$row[4]</td>
            <td>$row[5]</td>
            <td>$row[6]</td>
            </tr>
            ";
        }
    } else {
        // The table has seven columns, so the message spans all seven
        echo "<tr><td colspan='7'>No results found.</td></tr>";
    }
    echo "</tbody></table></center>";

    // Close the database connection
    $dbhandle = null;
}

function func_put_homebutton()
{
    echo "\n\n <input type='button' name='btn_home' value='Go back to home' style='float:right; position:relative; bottom:0px;' onclick='window.location.href=\"index.php\";'> \n";
}

// Query: Pairing English words with Filipino Words
if (isset($_POST['btn_go_bilingual'])) {
    func_go_bilingual();
}
// Query: Stripping SentiWordNet Instances
elseif (isset($_POST['btn_go_strip_chars'])) {
    func_go_strip_chars();
}
// Query: Eliminating Duplicate Instances
elseif (isset($_POST['btn_go_elim_dup'])) {
    func_go_elim_dup();
}
// Query: Direct Translation of English SentiWordNet to Filipino
elseif (isset($_POST['btn_go_filcon_dir_trans'])) {
    func_go_filcon_dir_trans();
}
// Query: FilCon lexicon with translation accuracy
elseif (isset($_POST['btn_go_filcon'])) {
    func_go_filcon();
}

func_put_homebutton();
?>
</div>
</body>
</html>
Appendix C
Test Cases
The researchers provided Filipino sentences containing words available in FilCon to serve
as sample test data. FilCon was used to analyze word- and phrase-level subjectivity using
the POS_SCORE or NEG_SCORE of each word. The results produced are the following:
Appendix D
Summary Tabulation of Test Cases
Table D-1. Test Cases Evaluated (Sentiment Value) by Expert and FilCon
SENTENCE NO. EXPERTave FILCON
1 -0.833333333 -0.247619048
2 0.733333333 -0.166666667
3 0.733333333 -0.511904762
4 -0.733333333 -0.0875
5 0.1 0
6 0.766666667 0.025
7 0.866666667 0.6
8 0.9 0
9 0.466666667 0
10 0.666666667 0.125
11 0.366666667 -0.583333333
12 0.733333333 0
13 0.5 0
14 0 -0.25
15 0 0.05
16 0.666666667 -0.583333333
17 0.833333333 0.175
18 -0.233333333 0
19 0.566666667 -0.625
20 -0.166666667 -0.5
21 -0.033333333 -0.173611111
22 0.5 0
23 -0.666666667 0
24 0.133333333 -0.019736842
25 -0.3 -0.104166667
26 -0.466666667 -0.458333333
27 0.633333333 0.306818182
28 -0.066666667 0
29 0.7 0
30 -0.4 -0.166666667
31 0.3 0.0125
32 0.6 0.283333333
33 -0.066666667 -0.489583333
34 0.366666667 0.0625
35 0.3 0
36 0.6 0.716666667
37 0.8 0
38 0.633333333 -0.583333333
39 0.8 0
40 0.766666667 0
41 -0.633333333 0
42 -0.333333333 0.135416667
43 -0.633333333 0
44 -0.7 0
45 -0.766666667 0
46 -0.166666667 0.034375
47 0.4 -0.155555556
48 -0.233333333 0
49 0.233333333 0.035714286
50 -0.233333333 0
51 0.233333333 0
52 0.466666667 0.458333333
53 -0.733333333 -0.166666667
54 -0.633333333 0.05
55 -0.6 -0.4375
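The labels in Table D-2 are consistent with reading the sign of each value in Table D-1: a positive value is POSITIVE, a negative value is NEGATIVE, and zero is NEUTRAL. A minimal sketch of that reading follows. The function name is hypothetical, and the source does not state the exact rule the researchers applied.

```python
def polarity_label(score):
    """Map a sentiment value to a label by its sign (assumed rule)."""
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


# Rows copied from Table D-1: (sentence no., EXPERTave, FILCON)
rows = [
    (1, -0.833333333, -0.247619048),
    (5, 0.1, 0.0),
    (14, 0.0, -0.25),
]
for number, expert, filcon in rows:
    print(number, polarity_label(expert), polarity_label(filcon))
```

For these three rows the sketch prints the same EXPERT and FILCON labels that Table D-2 lists for sentences 1, 5, and 14.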
Table D-2. Test Cases Evaluation (Positive, Negative, Neutral) by Expert and FilCon
SENTENCE NO. EXPERT FILCON
1 NEGATIVE NEGATIVE
2 POSITIVE NEGATIVE
3 POSITIVE NEGATIVE
4 NEGATIVE NEGATIVE
5 POSITIVE NEUTRAL
6 POSITIVE POSITIVE
7 POSITIVE POSITIVE
8 POSITIVE NEUTRAL
9 POSITIVE NEUTRAL
10 POSITIVE POSITIVE
11 POSITIVE NEGATIVE
12 POSITIVE NEUTRAL
13 POSITIVE NEUTRAL
14 NEUTRAL NEGATIVE
15 NEUTRAL POSITIVE
16 POSITIVE NEGATIVE
17 POSITIVE POSITIVE
18 NEGATIVE NEUTRAL
19 POSITIVE NEGATIVE
20 NEGATIVE NEGATIVE
21 NEGATIVE NEGATIVE
22 POSITIVE NEUTRAL
23 NEGATIVE NEUTRAL
24 POSITIVE NEGATIVE
25 NEGATIVE NEGATIVE
26 NEGATIVE NEGATIVE
27 POSITIVE POSITIVE
28 NEGATIVE NEUTRAL
29 POSITIVE NEUTRAL
30 NEGATIVE NEGATIVE
31 POSITIVE POSITIVE
32 POSITIVE POSITIVE
33 NEGATIVE NEGATIVE
34 POSITIVE POSITIVE
35 POSITIVE NEUTRAL
36 POSITIVE POSITIVE
37 POSITIVE NEUTRAL
38 POSITIVE NEGATIVE
39 POSITIVE NEUTRAL
40 POSITIVE NEUTRAL
41 NEGATIVE NEUTRAL
42 NEGATIVE POSITIVE
43 NEGATIVE NEUTRAL
44 NEGATIVE NEUTRAL
45 NEGATIVE NEUTRAL
46 NEGATIVE POSITIVE
47 POSITIVE NEGATIVE
48 NEGATIVE NEUTRAL
49 POSITIVE POSITIVE
50 NEGATIVE NEUTRAL
51 POSITIVE NEUTRAL
52 POSITIVE POSITIVE
53 NEGATIVE NEGATIVE
54 NEGATIVE POSITIVE
55 NEGATIVE NEGATIVE
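Agreement between the expert labels and the FilCon labels in Table D-2 can be summarized as the share of sentences that receive the same label. The sketch below computes that share for the first five rows only. The function name is hypothetical, and the result describes those five rows, not the whole table.

```python
def agreement_rate(pairs):
    """Return the fraction of (expert, filcon) label pairs that match."""
    if not pairs:
        raise ValueError("at least one label pair is required")
    matches = sum(1 for expert, filcon in pairs if expert == filcon)
    return matches / len(pairs)


# First five rows of Table D-2 as (EXPERT, FILCON) pairs
first_five = [
    ("NEGATIVE", "NEGATIVE"),
    ("POSITIVE", "NEGATIVE"),
    ("POSITIVE", "NEGATIVE"),
    ("NEGATIVE", "NEGATIVE"),
    ("POSITIVE", "NEUTRAL"),
]
print(agreement_rate(first_five))  # 0.4
```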
Appendix E
Questionnaire
We are fourth-year BS Computer Science students of the Institute of Information and
Computing Sciences and the proponents of the thesis titled “FilCon: Filipino Sentiment
Lexicon Generation Using Word Level-Annotated Dictionary-Based and Corpus-Based
Cross Lingual Approach”.
Kindly answer the following. On a scale from -10 (most negative sentiment) to +10
(most positive sentiment), rate the degree of sentiment that each sentence expresses.
Answers must fall between -10 and +10.
Thank you.
Name: ________________________
Conformé: ____________________
Filipino Sentences    Sentiment Polarity Score
1. Para sa Senado na maging masigasig sa isang kaso at huwag
pansinin ang isa pang kaso ay nakapagdududa at hindi
katanggap-tanggap.
2. At sa kabila nito, mayroon tayong sapat na resources na
maaaring makapag-develop sa bansa na maraming
mahahangin na isla, maaraw na lagay ng panahon, at
geothermal sites pati na ang matatabang lupain na maaaring
magpayabong ng mga halaman na para sa biomass energy.
3. Sa magandang balitang ito mula sa norte at ang inanunsyong
plano ng maraming private firm na gamitin ang kanilang
sariling mga generator sa kritikal na panahon, maaaring hindi
na kailangan ng gobyerno ang makipagkontrata para sa
karagdagang enerhiya sa susunod na taon.
4. Sobra raw ang pagkahilig o “pagkagumon” ng boxing champ
sa iba’t ibang larangan bukod sa pagboboksing.
5. Akusado si Pemberton sa pagpatay kay Laude, isang
transgender, sa Subic.
6. Dapat ay pakinggan na ni PNoy ang boses ng kanyang mga
boss.
7. Maganda ang disiplina ng taumbayan at maging ng mga
kawani ng gobyerno.
8. Ipinagmamalaki naman ng Albay ang dumagdag sa
mahabang listahan nila ng mga beauty queen, ang modelong
si Valerie Clacio Weigmann na kinoronahan kamakailan
bilang Miss World Philippines.
9. Kung lumago ang ating ekonomiya at ikinalat sa lahat ang
biyaya nito, dapat maginhawa nang nagagamit ng
mamamayan ang sistema ng transportasyon natin.
10. Ang kainaman naman sa kanya ay nanatili siyang matatag sa
kanyang paninindigan.
11. Bilang pinuno ay hindi siya nasangkot sa anumang
katiwalian.
12. Nagpahayag ng ibayong suporta kontra droga si Antipolo
City Mayor Jun Ynares III upang lalong mapalakas ang
kampanya laban sa illegal drugs sa lungsod
13. Ang Oktubre sa mga taga-Cardona, Rizal ay pagbibigay-
buhay sa kanilang tatlong tradisyon na nakaugat na sa kultura.
14. Sa tagumpay ng fishpen sa lawa na naging pangunahing
pinagmumulan ng supply ng isda sa Metro Manila, hindi na
napigil ang pagdami ng malalaking fishpen sa Laguna de
Bay.
15. Hinimok ng isang obispo ang publiko laban sa pagpapakalat
ng pornographic images at videos sa social media na aniya’y
isa ito sa mga dahilan kung bakit nasasalaula ang isipan ng
kabataan.
16. Ayon kay Daet, Camarines Norte Bishop Gilbert Garcera na
dapat ay maging responsable sa paggamit ng Internet ang ilan
upang hindi malantad ang kabataan sa pornograpiya.
17. Umapela pa ang Obispo sa mga mamamayan na tulungan ang
mga mag-asawa na isabuhay ang katapatan at mga pangakong
binitawan nang sila’y ikasal.
18. Natuklasan ng mga dalubhasa ang posibilidad na may epekto
ang kape sa Deoxyribonucleic acid (DNA) ng isang
indibidwal.
19. Muling pinalawig ng Department of Health (DOH) ang
kanilang Ligtas-Tigdas program hanggang sa Oktubre 10
upang mabigyan pa ng pagkakataon ang mga batang hindi
nabakunahan na mabigyan ng proteksiyon laban sa sakit na
tigdas at polio.
20. Samantala, nakiusap si Emelo sa netizens na huwag
magpakalat ng mali-maling impormasyon na maaaring
magdala ng matinding takot sa kaniyang mga kababayan.
21. Sa kabila ng ulat na pagkamatay ng isang apat na taong
gulang na lalaki na nakitaan ng sintomas ng sakit na
meningococcemia, pinabulaanan ng Municipal Health Office
ng Rosario, Cavite, ang pagkalat ng balita na may meningo
scare sa nasabing lugar.
22. Nais ni Senator Sonny Angara na magkaroon ng feeding
program sa lahat ng public school sa bansa para matugunan
ang laganap na malnutrisyon.
23. Mistulang walang intensiyon ang 290 miyembro ng
Kongreso na imbestigahan ang umano’y overpriced na Iloilo
Convention Center (ICC).
24. Naniniwala ang kalihim na kailangan pag-aralan ang
magpairal ng ilang pagbabago sa crime fighting efforts ng
PNP upang higit na maging epektibo ito sa harap ng
lumalaking populasyon, lalo na sa Metro Manila.
25. Isang nakasinding kandila na natumba ang naging mitsa ng
isang sunog na tumupok sa 60 kabahayan sa Quezon City,
kahapon ng umaga.
26. Wala siyang iniisip kung hindi sarili niya.
27. Pinakamagandang araw ito ng buhay ko.
28. Nais kong maging normal ulit ang lahat.
29. Sa palagay ko, ang soccer ay isang kahanga-hangang
paligsahan.
30. Posible na nagsasabi siya ng kasinungalingan.
31. Nagtalumpati ang pangulo sa maraming tao.
32. Malaki ang tiwala niya sa kanyang sarili.
33. Hindi ka makakasalungat sa batas ng kalikasan.
34. Naghihintay ako ng magandang balita mula sa kanila.
35. Kailangan kitang makausap ngayon.
36. ‘Di lang maganda, magaling pa!
37. Walang kinikilingan, walang pinoprotektahan, serbisyong
totoo lamang.
38. Hindi madaling magalit, ang haba ng pasensya.
39. Sinisigurado niya ng naiintindihan ng lahat ang tinuturo niya.
40. Laging maaga sa klase, ganado magturo.
41. Nagagalit tuwing nagtatanong ang estudyante.
42. Mabilis magturo.
43. Yung totoo, tres o singko lang ang alam na ibigay, anong
klaseng prof, yan?
44. Parang binabasa lang niya yung libro kapag nagtuturo.
45. Walang kabuhay-buhay magturo, parang binabasa lang yung
binigay na hand-out
46. Agad dumating ang mataas na alon.
47. Maagang umalis ang malakas na bagyo.
48. Siya ay umawit ng kanta.
49. Kami ay huminga ng hangin.
50. Ginalaw ng nanay ang damit.
51. Kumain ng sorbetes si ama.
52. Umani ng mga papuri ang pari.
53. Kalimutan mo nang nakilala kita!
54. Isa kang malaking hadlang sa aking hinaharap!
55. Ako ay labis na humagulgol dahil namatay si nanay.
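The questionnaire collects scores from -10 to +10, while the EXPERTave column of Table D-1 lies between -1 and +1. One reading that reproduces values such as -0.833333333 is to average the respondents' scores and divide by 10. This is an assumption, since the source does not describe the computation, and the function name and the three ratings below are hypothetical.

```python
def normalized_average(ratings, scale=10):
    """Average questionnaire ratings and rescale them to the range [-1, 1]."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not -scale <= rating <= scale:
            raise ValueError(f"rating {rating} is outside -{scale}..+{scale}")
    return sum(ratings) / (len(ratings) * scale)


# Hypothetical ratings from three respondents for a single sentence
print(round(normalized_average([-8, -8, -9]), 9))  # -0.833333333
```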
Curriculum Vitae
DARLAN KEEN SABADO DOMINGO
1257 Alfredo St. Sampaloc, Manila
Mobile Number : +639175420697
Email: [email protected]
Career Objective
To be able to work in the IT industry and utilize my skills and abilities effectively, while
learning and understanding the innovative industry in the process.
Educational Attainment
Bachelor of Science in Computer Science (Dean's List Awardee)
University of Santo Tomas
2011-present
Secondary Education
Tarlac State University Laboratory School
2007-2011
Primary Education
Ecumenical Christian College
2001-2007
Skills Summary
Language
Filipino (Written and Spoken)
English (Written and Spoken)
Technical
Knowledgeable in C, C++, Java and Assembly programming languages
Completed SAP Business One Training Course (2012)
IBM DB2 Academic Associate: DB2 Database and Application Fundamentals
Knowledgeable in MS Word, Excel and PowerPoint
Knowledgeable in basic networking, webpage design
Work Experience / Trainings Attended:
Intern at Accenture Philippines
Extra-Curricular Activities
Computer Science Society Member (2011-present)
Local Commission on Elections, Deputy (2011)
Local Commission on Elections, Computer Science Commissioner (2012)
Local Commission on Elections, Finance Officer (2013)
Junior Philippine Computer Society, Member (2012)
Curriculum Vitae
JEROME LORENZO LIAO LOPEZ
26 Corolla St. Village East Executive Homes Cainta, Rizal 1900
Phone Number: 09277411565 || Landline: (02) 655-1612
e-mail: [email protected]
________________________________________________________________________
Career Objective
To be able to work in a career-oriented and challenging environment that will
promote personal growth and professional development.
___________________________________________________________________
Skills Summary
Competitive with knowledge of software engineering and programming using
Java and C/ C++ languages.
Knowledgeable in web design and development using HTML, CSS, and
JavaScript.
Basic knowledge in Database Management using IBM Express C, Enterprise Resource Planning using SAP Business One, and networking.
Basic knowledge of System Analysis and Design.
Willing to learn new concepts in the industry. Inclined to share ideas with
others and able to handle pressure.
________________________________________________________________________
Educational Attainment
University of Santo Tomas        Tertiary Level (BS Computer Science)   2011-2015
Sampaloc, Manila
Faith Christian School           Secondary Level                        2009-2011
Cainta, Rizal
Marikina Science High School     Secondary Level                        2007-2009
Sta. Elena, Marikina
San Benildo Integrated School    Primary Level                          2000-2007
Cainta, Rizal
_______________________________________________________________________
Extra-Curricular
UST Computer Science Society Member 2011 - 2014
UST Junior Philippine Computer Society Member 2011 - 2014
Curriculum Vitae
ALEXANDRA CABANTOG MONDARES
Nuestra Señora Dela Paz Subdivision, Brgy. Sta. Cruz
Sumulong Highway, Antipolo City, Rizal 1870
Home: (02) 213-7607 | Cell: 09273810105
Career Objective
To be able to attain a work position that will suit my knowledge and skills as a Computer
Science Major and to be able to acquire experience in a professional workplace while
contributing to the company’s goals.
Educational Attainment
University of Santo Tomas 2011 - Present
BS Computer Science
St. Clare Science High School 2007 - 2011
Secondary Education
First Honorable Mention
St. Clare Montessori School 2000 - 2007
Primary Education
Skills Summary
IBM DB2 Express C
Web Development and Design
In-depth knowledge in Systems Analysis and Design
Understanding of accounting principles related to SAP Business One
In-depth knowledge of Subnetting and network configurations using Cisco Packet
Tracer
Oriented in Microsoft Office Applications such as:
o Microsoft Office PowerPoint
o Microsoft Office Excel
o Microsoft Office Word
Capable of creating and editing videos, pictures, and files using Adobe Photoshop and
Sony Vegas
Basic knowledge in Linux and Windows Operating Systems
Languages:
o HTML and CSS
o ASSEMBLY
o C/C++
o JAVA
Certifications
IBM DB2 Academic Associate
SAP Business One Certificate of Completion
Seminars Attended
Techno Game Development February 13, 2014
Computer Science Society
IT Conference 2014: Today and Tomorrow January 24, 2014
Computer Society of the Ateneo (COMPSAT)
Ateneo De Manila University
Microsoft Student Partner Seminar December 2011
Junior Philippine Computer Society
Extra-Curricular Involvement
Code Jam 2014 February 14, 2014
Participant
Thomasian Engineer (Engineering Publication)
Project Head, 35th General Information Quiz Contest (GIQC) A.Y. 2013 - 2014
Photojournalist 2012 – Present
Computer Science Society
Team Head, Documentation Team A.Y. 2013 - 2014
Member 2011 - Present
Junior Philippine Computer Society
Staff, Special Development Team A.Y. 2014 - 2015