FilCon: Filipino Sentiment Lexicon Generation Using Word Level-Annotated Dictionary-Based and Corpus-Based Cross Lingual Approach

A Thesis

Presented to the

Department of Computer Science

Institute of Information and Computing Sciences

University of Santo Tomas

In Partial Fulfillment

of the Requirement for the Degree

Bachelor of Science in Computer Science

By

Domingo, Darlan Keen S.

King, Nyker Matthew C.

Lopez, Jerome Lorenzo L.

Mondares, Alexandra C.

Adviser:

Ponay, Charmaine S.

November 2014

Acknowledgements

First of all, we would like to thank the Almighty God for giving us the strength,

wisdom, knowledge and perseverance to make this great endeavor possible. We lift up all

praises and glory to Him.

We would like to express our deepest gratitude to our dear adviser, Ms. Charmaine Ponay, for guiding us throughout this endeavor, and to our coordinator, Assoc. Prof. Perla Cosme, for contributing her knowledge and ideas for the improvement of this study.

Lastly, we would like to share the success and thank our families and friends for all

the support and understanding they have extended to us all.

Abstract

FilCon is a generated subjective lexicon for the Filipino language. It contains 22,380 words drawn primarily from a Filipino-English bilingual dictionary and aligned with the polarity values of SentiWordNet 3.0. Identical entries from the first lexicon generation iteration were filtered according to their translation accuracy, which was produced using Moses training sets. The final iteration sorted the words and generated the final output, a .txt file that follows SentiWordNet’s format with the fields POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD. The results were analyzed and interpreted by applying FilCon to the subjective classification of opinions across various test cases and calculating the lexicon’s overall accuracy, which depended on the sentence-level sentiment analysis. Based on the results, the overall accuracy was 38.18%.

Table of Contents

Approval Sheet
Certificate of Authenticity and Originality
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables

Chapter I: The Problem and Its Background
    A. Introduction
    B. Background of the Study
    C. Theoretical Framework
    D. Conceptual Framework
    E. Statement of the Problem
    F. Objectives of the Study
    G. Scope and Limitations
    H. Significance of the Study
    I. Definition of Terms

Chapter II: Review of Related Literature and Studies
    A. Related Literature
        1. Natural Language Processing Overview
        2. Subjectivity
        3. Opinion Lexicon Generator
    B. Related Studies
        1. A Bengali (Bangla) SentiWordNet
        2. A Hindi SentiWordNet
        3. An Urdu Text Sentiment Analyzer
    C. Generalization

Chapter III: Research Design and Methodology
    A. Hypothesis
        1. Assumptions
    B. Research Methods
    C. Research Design
        1. Data Mining
        2. Words Translation
        3. Cross-lingual Projection
        4. Corpus Training
        5. Word Filtering
        6. FilCon Generation

Chapter IV: Analysis and Interpretation
    A. Presentation of the Solution
        1. System Architecture
        2. Description of Modules and Interfaces
    B. Results of Conducted Research
    C. Analysis and Interpretation of Results
    D. Summary of Findings

Chapter V: Summary, Conclusions, and Recommendations
    A. Summary
    B. Conclusions
    C. Recommendations

Bibliography
Appendix A: Source Code
Appendix B: FilCon Entries
Appendix C: Test Cases
Appendix D: Summary Tabulation of Test Cases
Appendix E: Questionnaire
Curriculum Vitae

List of Figures

Figure I-1: Theoretical Framework of the Study
Figure I-2: Hamouda and Rohaim’s (2011) Review
Figure I-3: Conceptual Framework of the Study
Figure II-1: The Natural Language Processing System Flow
Figure II-2: The Four Distinct Stages of NLP
Figure II-3: Classification Process of Urdu Lexical Construction
Figure IV-1: System Architecture of the Study
Figure IV-2: Translation Accuracy of Filipino Words
Figure IV-3: Formula for Calculating FilCon’s Accuracy
Figure IV-4: T-test Formula for Paired Two Sample for Means

List of Tables

Table II-1: Polarity-wise Performance using Bengali SentiWordNet
Table II-2: Results of Experimentation on Both Corpora (MR and PR)
Table IV-1: Sentiments Confusion Matrix
Table IV-2: Summary of the T-Test Computation
Table D-1: Test Cases Evaluated (Sentiment Value) by Expert and FilCon
Table D-2: Test Cases Evaluation (Positive, Negative, Neutral) by Expert and FilCon

Chapter I

The Problem and Its Background

A. Introduction

Researchers have developed many approaches and algorithms to give computers human-like conversational capabilities. These approaches have led to the field that makes such capabilities possible, known as Natural Language Processing (NLP). NLP, according to Liddy (2001), “is a range of computational techniques for analyzing and representing naturally occurring texts for the purpose of achieving human-like language processing for a range of tasks or applications”.

A language conveys messages through verbal or non-verbal communication. Non-verbal communication includes textual information, which can be categorized into two kinds: facts, or objective information, and opinions, or subjective information (Regaldo & Cheng, 2012). The subjectivity of a word can be determined by comparing it with the subjectivity scores in an existing lexicon. Subjectivity scores consist of positivity, negativity, and neutrality values expressed as decimals.

An example of an English lexicon is WordNet, a lexical-conceptual database developed at Princeton University, whose version 3.0 was released in 2006. This database contains lexical units and their interrelationships (Bondoc, Garcia, Lacaden, Ping, & Borra, 2010).

One resource built on WordNet is SentiWordNet, an enhanced lexical resource for sentiment analysis and opinion mining. This lexicon contains subjectivity scores in the form of positivity, negativity, and neutrality polarity values.

This study was conducted to aid studies in sentiment analysis. Several subjectivity lexicons exist; however, a polarity lexicon for the Filipino language did not yet exist. FilCon is a tool like SentiWordNet, except that FilCon is in Filipino whereas SentiWordNet is in English. FilCon was constructed using a bilingual dictionary aligned with the polarity scores provided by SentiWordNet. Since neither the lexicon nor the bilingual dictionary provides information about the semantic meaning of identical entries, FilCon relies on Moses to select the most probable sense in the target language.

B. Background of the Study

Finding out what people think is an essential part of information gathering (Pang & Lee, 2008). Opinion mining and sentiment analysis are tools used for this purpose. Sentiment analysis has been used by diverse disciplines and industries. The commercial, academic, and information technology sectors use sentiment analysis to analyze customer and survey feedback.

Part of sentiment analysis is subjectivity classification, which determines whether a sentence is objective or subjective. One approach to subjectivity classification is bootstrapping, which uses a high-precision classifier to automatically identify subjective and objective sentences. “It is based on manually collected lexical items and single words which are good subjective clues.” (Moghaddam & Ester, 2012)

With the current advancements in natural language processing, most sentiment

analysis applications utilize a lexical resource that provides the scores needed to determine

the positivity or negativity of a sentiment.

Since most sentiment analysis applications did not have a Filipino lexicon as a basis, this study focused on generating a lexical resource in the Filipino language.

C. Theoretical Framework

Opinion lexicons, also called subjective lexicons, are resources consisting of words annotated with sentiment polarity scores or subjectivity labels.

Two of the most frequently used opinion lexicons are OpinionFinder and SentiWordNet. OpinionFinder is a subjective lexicon, while SentiWordNet is a polarity lexicon. OpinionFinder was compiled from manually developed resources augmented with entries learned from corpora. Its entries are labeled for part of speech and for subjectivity strength: words that are subjective in most contexts are labeled strong, while those that are subjective less often are labeled weak.

The other lexicon, SentiWordNet, was created by Esuli and Sebastiani (2006) on top of WordNet. SentiWordNet assigns to each WordNet synset three sentiment scores: positive, negative, and objective.
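The relationship among the three scores can be sketched as follows. This is a minimal illustration, assuming the SentiWordNet convention that the three scores of a synset sum to 1, so the objective score is derived from the other two. The class name `SynsetScores` and the sample values are hypothetical and are not taken from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Hypothetical container for the scores of one synset."""
    pos_score: float  # positivity, between 0.0 and 1.0
    neg_score: float  # negativity, between 0.0 and 1.0

    @property
    def obj_score(self) -> float:
        # Objectivity is whatever remains after positivity and negativity.
        return 1.0 - (self.pos_score + self.neg_score)


# Made-up sample values, chosen only to show the arithmetic.
example = SynsetScores(pos_score=0.625, neg_score=0.125)
print(example.obj_score)  # 0.25
```

A synset with both scores at zero is therefore fully objective under this convention.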

Opinion lexicons were built using several approaches. The simplest approach that

has been attempted for building these opinion lexicons in a new language is the dictionary-

based approach, wherein the existing lexical resource is translated using a bilingual

dictionary.

In a study conducted by Mihalcea, Banea and Wiebe (2008), a dictionary-based approach was used to generate a subjectivity lexicon for the Romanian language. Starting with the English opinion lexicon from OpinionFinder, words were lemmatized for translation using an English-Romanian dictionary. However, in some cases, translations lose their subjectivity because of word ambiguity and lemmatization. Considering these problems, the researchers sought the most probable translation for each English entry that had a direct translation.

Figure I-1. Theoretical Framework of the Study

To evaluate the lexicon, two native speakers of Romanian annotated the subjectivity

of 150 randomly selected entries. The subjectivity of every entry was judged in the context

where it frequently appears on websites, accounting for its common meanings. Results

show that 123 entries are correctly translated.

[Figure I-1 block diagram. Blocks: English Opinion Lexicon from OpinionFinder; Lemmatization; Translation of Entries; Romanian-English Bilingual Dictionary; Annotation of Subjectivity; News Sources from Web; Frequency comparison and evaluation of entries; Romanian Subjectivity Lexicon]

Figure I-1 is a block diagram of Mihalcea, Banea and Wiebe’s (2011) study on building a Romanian subjectivity lexicon.

Furthermore, a study by Hamouda and Rohaim (2011) states that SentiWordNet can also be used as an important resource for sentiment classification tasks. These researchers conducted a study to classify sentiments and determine the subjectivity of English reviews. In preparation for the sentiment classification, the following linguistic analysis processes were applied to pre-defined input reviews:

Tokenization: splits the text into very simple tokens such as numbers, punctuation and words of different types.

Sentence Splitting: segments the text into sentences. This module is required for the part-of-speech tagging process. The splitter uses a list of abbreviations to help distinguish sentence-marking full stops from other kinds.

Part-of-Speech Tagging: produces a part-of-speech tag as an annotation on each word or symbol.

SentiWordNet Interpretation: matches the words with the SentiWordNet semantic entries.

SentiWordNet Orientation: estimates the polarity score of a word with regard to each sentiment class (negativity, positivity and neutrality) and its relative frequency of appearance in that class.

Figure I-2. Hamouda and Rohaim’s (2011) Review Sentiment Classification using

SentiWordNet

After the linguistic processes, two techniques were proposed to calculate the positive and negative scores for each review: ‘Sum on Review’ and ‘Average on Sentence and Average on Review’. The study reported that these techniques significantly improved overall accuracy, to 67% and 68.63%.
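The two scoring techniques can be illustrated with a short sketch. The exact formulas of Hamouda and Rohaim are not reproduced in this chapter, so the definitions below are assumptions: ‘Sum on Review’ is taken to add up the scores of every matched word in the review, while ‘Average on Sentence and Average on Review’ is taken to average the matched scores within each sentence and then average those sentence values. The lexicon `TOY_LEXICON`, its scores, and all function names are hypothetical.

```python
import re

# Hypothetical mini-lexicon: word -> (positive score, negative score).
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "slow": (0.0, 0.25),
}


def split_sentences(review):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_scores(sentence, lexicon):
    """Return (positive sum, negative sum, number of matched words)."""
    hits = [lexicon[t] for t in tokenize(sentence) if t in lexicon]
    return sum(p for p, _ in hits), sum(n for _, n in hits), len(hits)


def sum_on_review(review, lexicon):
    """Assumed 'Sum on Review': add the scores of all matched words."""
    pos = neg = 0.0
    for sentence in split_sentences(review):
        p, n, _ = sentence_scores(sentence, lexicon)
        pos += p
        neg += n
    return pos, neg


def average_on_review(review, lexicon):
    """Assumed 'Average on Sentence and Average on Review'."""
    per_sentence = []
    for sentence in split_sentences(review):
        p, n, count = sentence_scores(sentence, lexicon)
        if count:  # skip sentences with no matched words
            per_sentence.append((p / count, n / count))
    if not per_sentence:
        return 0.0, 0.0
    total = len(per_sentence)
    return (sum(p for p, _ in per_sentence) / total,
            sum(n for _, n in per_sentence) / total)


def label(pos, neg):
    """Compare the two totals to choose a class."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


review = "The food was good. The service was bad and slow."
print(label(*sum_on_review(review, TOY_LEXICON)))
print(label(*average_on_review(review, TOY_LEXICON)))
```

Under these assumed definitions the two techniques can disagree on the same review, since averaging reduces the weight of a sentence that contains several matched words.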

D. Conceptual Framework

Data mining is defined as the computer-assisted process of searching and modifying large sets of data, followed by the extraction of their meaning (Alexander, n.d.). Data mining in FilCon aimed to retrieve Filipino words as FilCon's primary input for further processing.

Figure I-3. Conceptual Framework of the Study

[Figure I-3 block diagram. Blocks: Retrieval of Filipino Words; Filipino Words; Filipino-English Bilingual Dictionary; Translation of Words; SentiWordNet; Annotation of Polarity Scores; Retrieval of Filipino Corpus; Filipino Sentences; Manual Translation of Corpus; Train corpus in Moses; Trained Sets; Removal of Identical Entries using Moses training; Evaluation of the FilCon using native speakers]

In the first text file generated from the retrieved words, each Filipino word was translated to English using a Filipino-English bilingual dictionary. The translated words were cross-referenced with the matching English entries in SentiWordNet 3.0, which are annotated with polarity scores. These polarity scores were then retrieved and aligned with the corresponding Filipino entries.
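The cross-referencing step can be sketched as a simple lookup. This is only an illustration under stated assumptions: the dictionary rows, the SentiWordNet-style index, its IDs and its scores are placeholders rather than real data, the function name `project` is hypothetical, and Filipino words whose translation has no match are assumed to be skipped.

```python
# Hypothetical bilingual dictionary rows: (Filipino word, English translation).
BILINGUAL_ROWS = [
    ("maganda", "beautiful"),
    ("masama", "bad"),
    ("bahay", "house"),
]

# Hypothetical SentiWordNet-style index: English word -> list of
# (POS, ID, POS_SCORE, NEG_SCORE). IDs and scores are placeholders.
SENTI_INDEX = {
    "beautiful": [("a", "00000001", 0.75, 0.0)],
    "bad": [("a", "00000002", 0.0, 0.625), ("n", "00000003", 0.0, 0.875)],
}


def project(bilingual_rows, senti_index):
    """Align each Filipino word with the scores of its English translation."""
    entries = []
    for fil_word, eng_word in bilingual_rows:
        # Words without a matching English entry produce no output rows.
        for pos, synset_id, pos_score, neg_score in senti_index.get(eng_word.lower(), []):
            entries.append({
                "POS": pos,
                "ID": synset_id,
                "POS_SCORE": pos_score,
                "NEG_SCORE": neg_score,
                "ENG_WORD": eng_word,
                "FIL_WORD": fil_word,
            })
    return entries


for entry in project(BILINGUAL_ROWS, SENTI_INDEX):
    print(entry)
```

Because one English word can belong to several synsets, a single Filipino word may yield several candidate entries at this stage.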

SentiWordNet 3.0, the basis of the polarity scores describing the positivity and/or negativity of a word, also supplied the format for FilCon's fields: POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD.

These processes can be compared with Figure I-1, where the dictionary-based approach was used. However, in the Romanian lexicon, the subjectivity of the words was annotated according to the English entries retrieved from OpinionFinder, while this study used Filipino words annotated with polarity scores according to the English entries from SentiWordNet.

Furthermore, the retrieved Filipino sentences served as the training corpora for the language tool Moses, which was used to find the most probable translation for each entry and thereby eliminate identical entries. This task was the counterpart of the direct translation step that addressed the accuracy of the Romanian lexicon.

FilCon generation applied text analysis processes similar to the approaches shown in Figure I-2. For instance, tokenization, SentiWordNet interpretation, and orientation were also adopted by FilCon. The translated Filipino words were tokenized and cross-referenced with the SentiWordNet English entries, aligning the proper polarity scores from SentiWordNet entries to FilCon entries. The reconstructed lexicon was then the output of the system, which can be used in opinion classification.

E. Statement of the Problem

In the field of sentiment analysis, the need for lexical resources in different languages as a basis for determining subjectivity remains an open problem. In this regard, this study aimed to generate a Filipino Subjective Lexicon by applying the word level-annotated dictionary-based and corpus-based cross lingual approach with the use of a Filipino-English bilingual dictionary, Moses, and SentiWordNet. Specifically, it intended to answer the following questions:

1. Were the Filipino-English bilingual dictionary, Moses and SentiWordNet a

possible combination in generating a Filipino Subjective Lexicon?

2. What was the accuracy level of the translated words against the entries of

SentiWordNet?

3. What was the accuracy level of the generated Filipino Subjective Lexicon using

the Filipino-English bilingual dictionary, Moses and SentiWordNet?

4. Was there a significant difference between the sentiment scores of the generated

Filipino Subjective Lexicon through the use of the Filipino-English bilingual

dictionary, Moses and SentiWordNet and expert’s sentiment scores?

F. Objectives of the Study

The main objective of this study was to develop a Filipino subjective lexical resource with acceptable polarity scores by applying the word level-annotated dictionary-based and corpus-based cross lingual approach through the use of a Filipino-English bilingual dictionary, Moses and SentiWordNet. This resource could contribute to the development of Natural Language Processing, specifically Sentiment Analysis, which is an area of NLP.

FilCon generation underwent the following phases and modules:

a. Data Mining. In this phase, the researchers retrieved Filipino words from a Filipino-English bilingual dictionary to use as FilCon's primary input, and retrieved Filipino-English sentences to serve as the training corpora for further processing of lexicon entries.

b. Word Translation. In the second phase, the Filipino words were translated to English so that the entries could be cross-referenced with SentiWordNet 3.0.

c. Cross-Lingual Projection. The translations of the Filipino words were matched with the same English words found in the SentiWordNet entries. The polarity scores of each matching SentiWordNet entry were retrieved and aligned with the Filipino word of the same semantic meaning.

d. Corpus Training. The retrieved Filipino-English sentences were used to train Moses, which provided each Filipino entry with its corresponding translation accuracy. The translation accuracy indicated the most probable translation for each word.

e. Word Filtering. This phase determined the acceptable polarity scores for Filipino words that had identical entries. It applied the translation accuracy to filter multiple entries with different translations and parts of speech.
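The word filtering phase can be sketched as keeping, for each Filipino word, the candidate entry with the highest translation accuracy. This is an assumption about how the filter behaves, not the study's source code. The `TRANS_ACC` values and the function name are hypothetical, and the first candidate is assumed to win a tie.

```python
def filter_by_translation_accuracy(entries):
    """Keep one entry per Filipino word: the one with the highest TRANS_ACC."""
    best = {}
    for entry in entries:
        key = entry["FIL_WORD"]
        # A strict comparison keeps the earliest entry when accuracies tie.
        if key not in best or entry["TRANS_ACC"] > best[key]["TRANS_ACC"]:
            best[key] = entry
    # Sort alphabetically by Filipino word for a stable output order.
    return sorted(best.values(), key=lambda e: e["FIL_WORD"])


# Hypothetical candidates: "masama" has two competing translations.
candidates = [
    {"FIL_WORD": "masama", "ENG_WORD": "bad", "TRANS_ACC": 0.8},
    {"FIL_WORD": "masama", "ENG_WORD": "evil", "TRANS_ACC": 0.6},
    {"FIL_WORD": "maganda", "ENG_WORD": "beautiful", "TRANS_ACC": 0.9},
]

for entry in filter_by_translation_accuracy(candidates):
    print(entry["FIL_WORD"], entry["ENG_WORD"])
```

In this sketch "masama" keeps only its "bad" translation, since that candidate has the higher assumed accuracy.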

G. Scope and Limitations

The study aimed to provide a lexical resource for the Filipino language annotated with polarity scores, using the word level-annotated dictionary-based and corpus-based cross lingual approach through the use of a Filipino-English bilingual dictionary, Moses, and SentiWordNet 3.0.

The study focused on the retrieved Filipino words as the primary input of FilCon. The output of this study, FilCon, provides only the following fields: (a) Part of Speech (POS), (b) ID, (c) Positivity Score (POS_SCORE), (d) Negativity Score (NEG_SCORE), (e) the corresponding English translation of the Filipino word (ENG_WORD), and (f) the Filipino word (FIL_WORD).
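A reader of the output file could load one entry along these lines. The delimiter is not stated here, so tab-separated fields in the order listed above are assumed. The sample line, its ID, and the names `FilConEntry` and `parse_line` are hypothetical.

```python
from typing import NamedTuple


class FilConEntry(NamedTuple):
    """One lexicon row with the six fields listed above."""
    pos: str
    synset_id: str
    pos_score: float
    neg_score: float
    eng_word: str
    fil_word: str


def parse_line(line):
    """Parse one line, assuming six tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    pos, synset_id, pos_score, neg_score, eng_word, fil_word = fields
    return FilConEntry(pos, synset_id, float(pos_score), float(neg_score),
                       eng_word, fil_word)


# Made-up sample line; the ID and scores are placeholders.
sample = "a\t00000001\t0.75\t0\tbeautiful\tmaganda\n"
entry = parse_line(sample)
print(entry.fil_word, entry.pos_score, entry.neg_score)
```

Keeping the ID as a string preserves any leading zeros, which matters if the POS and ID pair is used as a composite key.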

This study was limited to providing its users with the lexical resource and the polarity scores, which were determined by pairing each SentiWordNet English word’s polarity score with its corresponding translated Filipino word. Sentiment analysis of scenarios and instances was not covered by the study; however, it may be used for testing accuracy.

The researchers emphasized that this study is intended only to help writers and linguists construct their literary texts with good diction by referring to the sentiment values. The researchers also made the study to aid programmers in developing systems built on an acceptable basis for Filipino sentiment analysis. The sentiment values of the Filipino words generated by this study were based on SentiWordNet 3.0 and, hence, on its criteria.

H. Significance of the Study

According to a study conducted by Martha Perry, couples prefer Face to Face (FtF) communication over Computer Mediated Communication (CMC), because CMC was said to contribute to misunderstandings and frustration, which can lead to escalated conflict (Perry, 2010). If technology is said to bring people together (Chiala, 2013), then this study could help clear up ambiguities in CMC and make a person’s message easier to understand.

The study aimed to provide a Filipino Subjective Lexicon (FilCon) to be used in future Natural Language Processing applications. This study is of significant benefit especially to:

1. Sentiment Analysis Systems Programmers

FilCon can provide programmers with an automated Filipino lexicon that is as accurate as possible, so that they can efficiently integrate the analysis of subjectivity in Filipino text into their sentiment analysis systems.

2. Artificial Intelligence Systems Programmers

Artificial Intelligence (AI) systems mostly rely on subjectivity lexicons (pre-learned) and machine-learned experience. With subjectivity lexicons mostly made for the English language, it would be a great milestone to be able to develop an AI system in the Filipino language.

I. Definition of Terms

ENG_WORD. The translated word of a Filipino word using a Filipino-English bilingual

dictionary.

FIL_WORD. The corresponding Filipino word retrieved.

ID. The corresponding word identification number assigned by SentiWordNet to the synset.

Natural Language Processing. “Range of computational techniques for analyzing and

representing naturally occurring texts for the purpose of achieving human-like

language processing for a range of tasks or applications.” (Liddy, 2001)

NEG_SCORE. It is the negativity score assigned by SentiWordNet to the synset.

POS. Part of Speech. Together with ID, it uniquely identifies a WordNet synset as a composite key.

POS_SCORE. It is the positivity score assigned by SentiWordNet to the synset.

Sentiment analysis. The analysis of sentiments expressed in various subjective materials.

SentiWordNet. “It is a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications.” (Baccianella, Esuli, & Sebastiani, 2010)

Synset. A set of terms grouped together by semantic and lexical relations. It is also a field in SentiWordNet that contains the terms, each with its sense number, belonging to the synset.

TRANS_ACC. It is the produced translation accuracy for each Filipino word obtained

from the Moses corpora training of the Filipino sentences.

WordNet. It is a lexical database of English words. Words are grouped into sets of

cognitive synonyms (synsets), each expressing a distinct concept.

Chapter II

Review of Related Literature and Studies

A. Related Literature

1. Natural Language Processing Overview

Computational linguistics, often referred to as Natural Language Processing (NLP), is a subfield of both linguistics and computer science which enables language processing by a computer.

NLP provides a range of computational techniques for automated text analysis and representation of human language. It has two focuses, namely language understanding (NLU) and language generation (NLG). Language understanding takes language as input and analyzes it into structured representations that capture the meaning of the input, while language generation produces language from such representations.

Figure II-1. The Natural Language Processing System Flow

According to Indurkhya and Damerau (2010), NLP is concerned with “the design and implementation of effective natural language input and output components for computational systems”. This shows that the important problems of NLP occur in natural language input and output.

One example is the first attempt in NLP, in the 1950s, to automate translation between Russian and English (Locke & Booth). These systems were unsuccessful since human translators were needed to pre-edit the Russian and post-edit the English.

In the early and middle 1970s, serious developments in NLP took place as systems started to use more general approaches and attempted to formally describe the rules of the languages they worked with. Effective mechanisms for parsing languages and representing meanings were produced.

However, two problems in particular make NLP difficult and require techniques different from those used in processing artificial languages. These problems are (a) the level of ambiguity in natural languages and (b) the complexity of semantic information existing in simple sentences. To address these problems, processing follows four distinct stages in a sequential manner. The four distinct stages of NLP are:

Morphological Processing: breaking strings of language input into sets of tokens corresponding to discrete words, sub-words and punctuation forms

Syntax Analysis: checking whether a string of words is well-formed and breaking it up into a structure showing the syntactic relationships between the different words

Semantic Analysis: expanding the lexicon to include semantic definitions for each word it contains and extending the grammar to specify the semantics of any phrase

Pragmatics: interpreting the results of semantic analysis from the perspective of a specific context (the context of the dialogue or state of the world)
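The first stage, morphological processing, can be illustrated with a simple tokenizer. This is a minimal sketch, not the tokenizer used in this study: it assumes that words may contain internal hyphens or apostrophes (as in many Filipino verb forms) and that each punctuation mark is its own token. The function name `tokenize_text` is hypothetical.

```python
import re

# A word is a run of letters or digits, optionally joined by hyphens or
# apostrophes; any other non-space character is a separate token.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize_text(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize_text("Mag-aral ka, ha?"))
# ['Mag-aral', 'ka', ',', 'ha', '?']
```

Keeping hyphenated forms together is a design choice made for this sketch; a tokenizer that handles sub-words would split them further.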

Figure II-2. The Four Distinct Stages of NLP in Sequential Manner

As the development of NLP continues, it provides both theories and

implementations for a wide range of end applications such as:

Information Extraction: focuses on recognition, tagging, and extraction of information into a structured representation

Summarization: retains the meaning of the text while reducing a large corpus of multiple documents into a short set of words or a paragraph

Machine Translation: preserves the meaning of the original text in the translated language

Paraphrasing: an alternative way of conveying the same information, which allows the creation of more varied and fluent text

Part of Speech (POS) Tagging: indicates syntactic role by labeling each word with a tag

Furthermore, the goal of most NLP systems is to extract meaning from pre-defined

language input. NLP must create interpretations to capture and refine meanings from the

input. This meaning is the semantic representation of the system. Semantic representations

are built from semantic fragments attached to words in the lexicon. Research effort in NLP

includes formally specifying grammars for languages which have not previously been

analyzed.

2. Subjectivity

Cognitive Semantics (CS) divides semantics into meaning-construction and

knowledge-representation. It is concerned with the representation of conceptual structure

in language.

One topic in CS is the confrontation of Elizabeth Traugott’s (1988) conception of

“subjectivity” and the process of grammaticalization which leads to the establishment of

subjective meanings.

According to Edward Finegan (1995), subjectivity contrasts with objectivity. Objectivity, as defined by dictionary.com, relates to actual, external phenomena and is undistorted by emotions or personal bias, whereas subjectivity is defined as an expression of the self and of the speaker’s point of view, which can take on an array of meanings in a discourse.


Lyons (1982) says that subjectivity can be analyzed by labeling meaning as either “weak” (objective) or “strong” (subjective).

One process for uncovering subjective meaning is opinion mining. This sub-discipline focuses on opinionated ideas, which are conveyed through words known as opinion words.

There are two kinds of opinion words: positive and negative. Positive opinion words describe desired states, whereas negative opinion words describe undesired states in the context of a word in the text.

Another way of classifying opinion words is by type: base or comparative. Base-type opinion words render a meaning to a text without comparison. Comparative-type opinion words, on the other hand, use the comparative or superlative degree of the opinion word.

Opinion mining tackles the issue of determining subjective terms in a corpus and

deciding whether a term carries a positive, negative or no subjective connotation.

Opinion mining involves pairing a known corpus with an expressed opinion. It is further divided into several tagging subtasks identified by Esuli and Sebastiani (2006):

1. Determining document subjectivity by deciding whether the corpus has a factual nature

2. Determining document orientation (polarity) by deciding whether terms are

positive, negative or not subjective.


3. Determining the strength of document orientation by classifying its subject

matter as Weakly Positive, Mildly Positive or Strongly Positive.

The simplest approach to the second subtask, conceptualized by Turney (2003), is to take the algebraic sum of the orientations of the terms. However, there are no available lexical resources where terms are tagged with negativity and positivity scores.
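The algebraic-sum idea can be illustrated with a minimal sketch. The function name `document_orientation`, the toy term scores, and the zero threshold are assumptions made for illustration; they are not taken from Turney's work or from this study.

```python
def document_orientation(tokens, term_scores):
    """Classify a document by summing the signed orientation of its terms.

    term_scores maps a term to a signed score (positive > 0, negative < 0).
    Terms missing from the mapping contribute nothing to the sum.
    """
    total = sum(term_scores.get(token, 0.0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "not subjective"


# Hypothetical scores, for illustration only.
toy_scores = {"good": 1.0, "bad": -1.0}
label = document_orientation(["good", "good", "bad", "house"], toy_scores)
```

In this toy run the two positive terms outweigh the single negative term, so the document is labeled positive.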

3. Opinion Lexicon Generator

According to the Handbook of Natural Language Processing, Second Edition, the

main objective of the lexicon generator is to produce a lexical resource of opinion words

with appropriate polarity scores called opinion lexicon. An opinion lexicon is a collection

of generated opinion words. "Opinion words are also known as polar words, opinion-

bearing words, and sentiment words".

According to Bikel and Zitouni (2012), there are four main methods for creating an opinion lexicon for a new language, ranked from best to least preferred. However, the best method also largely depends on the monolingual resources and tools available for the new language, such as bilingual dictionaries, large corpora and natural language processing tools, and/or on the cross-lingual connections that can be made to a major language such as English.

The best scenario is the manual creation of an annotated corpus in the target language. However, manually annotated corpora exist only for some languages. An alternative is to mine online data such as reviews. Learning algorithms such as naïve Bayes, decision trees and SVMs can then be used for the subjectivity annotation of new text.


The second best method, introduced by Mihalcea et al. (2008), is to construct a lexicon using corpus-based cross-lingual projections. If a cross-lingual projection exists between the major language (such as English) and the target language, the corpus annotations can be transferred into the target language. The translation may be performed in either direction: from the target language to the major language, or from the major language to the target language. Regardless of the direction of translation, once the manual labels are projected, the result is a data set in the target language with subjectivity annotation.

The third best method is known as bootstrapping. Using a few manually selected

seeds, the lexicon can be expanded by finding the related synonyms, antonyms and

definitions found in an electronic dictionary. Running the process for several iterations may

produce a large lexicon with thousands of entries.
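The bootstrapping loop described above can be sketched as a breadth-first expansion from the seeds. This is only an illustration: the `bootstrap` function, the toy synonym and antonym tables, and the rule that antonyms receive the opposite sign are assumptions, and dictionary definitions (also mentioned above) are not used here.

```python
from collections import deque


def bootstrap(seeds, synonyms, antonyms, max_iterations=2):
    """Expand a seed lexicon through synonym and antonym links.

    seeds: dict mapping word -> polarity (+1 or -1).
    synonyms / antonyms: dict mapping word -> list of related words.
    A synonym inherits the polarity of its source word; an antonym
    receives the opposite polarity. Expansion stops after max_iterations
    hops away from the seeds.
    """
    lexicon = dict(seeds)
    frontier = deque((word, 0) for word in seeds)
    while frontier:
        word, depth = frontier.popleft()
        if depth >= max_iterations:
            continue
        for relation, sign in ((synonyms, 1), (antonyms, -1)):
            for neighbour in relation.get(word, ()):
                if neighbour not in lexicon:
                    lexicon[neighbour] = sign * lexicon[word]
                    frontier.append((neighbour, depth + 1))
    return lexicon


# Toy relations, for illustration only.
toy_synonyms = {"good": ["fine"], "fine": ["nice"]}
toy_antonyms = {"good": ["bad"], "bad": ["good"]}
expanded = bootstrap({"good": 1}, toy_synonyms, toy_antonyms)
```

Raising `max_iterations` lets the lexicon grow further from the seeds, which mirrors the remark that several iterations may yield a large lexicon.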

The fourth best method is by translating a lexicon. If the previous methods are not

applicable, translating a lexicon using a bilingual dictionary may be possible. However, the

accuracy of the method is low because of the challenges encountered in the translation

process. Words may lose their meanings due to lemmatization. To resolve the problem, human intervention or direct translation may help.

With the construction of a lexicon consisting of words and phrases annotated for sentiment and subjectivity, these word- and phrase-level annotations introduce three different approaches to generating the opinion lexicon: the manual approach, the dictionary-based approach, and the corpus-based approach (Liu, 2010).


The manual approach is used as a checking mechanism for the automated process

of lexicon generation. It is time-consuming and is hence considered inefficient.

The dictionary-based approach uses a small set of seed opinion words and an online dictionary. One example is WordNet, a lexical database developed by Princeton University (2006).

The corpus-based approach, an approach proposed by Hatzivassiloglou and

McKeown (1997), "relies on the syntactic or co-occurrence patterns and also a seed list of

opinion words to find other opinion words in a large corpus". The approach can be summarized as follows: (1) a list of seed opinion adjectives is used; (2) a list of linguistic constraints or conventions on connectives is also used to identify additional adjective opinion words.

One such constraint is the conjunction constraint, which is classified as either natural or unnatural. It concerns sentences in which two words, joined by a conjunction, both describe the subject of the sentence. For example, in the sentence "The knight is brave and handsome", the words brave and handsome are both positive; words of the same polarity joined by the conjunction 'and' form a natural conjunction constraint. "The knight is brave but handsome" is an example of an unnatural conjunction constraint: the sentence joins the same two positive words, brave and handsome, with the conjunction 'but', which normally signals a contrast in polarity.


In this regard, Esuli and Sebastiani (2006) developed a lexical resource, SentiWordNet, for determining term subjectivity. It expands the initial seeds through K iterations by means of WordNet lexical relations (synsets) and uses ternary classifiers, each labeling a synset as Positive, Negative or Objective. If all classifiers assign the same label to a synset, that label receives the maximum score for the synset; otherwise each label receives a score proportional to the number of classifiers that have assigned it.
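The scoring rule in the last sentence can be sketched as a simple vote count. The function name `synset_scores`, the vote lists, and the choice of 1.0 as the maximum score are assumptions for illustration; this is not the SentiWordNet implementation.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Objective")


def synset_scores(votes):
    """Turn the labels assigned by several ternary classifiers into scores.

    Each label's score is the fraction of classifiers that chose it, so a
    unanimous label scores 1.0 and the three scores always sum to 1.0.
    """
    if not votes:
        raise ValueError("at least one classifier vote is required")
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical votes from four classifiers for one synset.
mixed = synset_scores(["Positive", "Positive", "Objective", "Negative"])
unanimous = synset_scores(["Negative"] * 8)
```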

B. Related Studies

There are several studies conducted to generate a lexical resource of different

languages aside from English to determine the subjectivity of a corpus. These are Bengali

(Bangla) SentiWordNet, Hindi SentiWordNet and Urdu Text Sentiment Analyzer.

1. A Bengali (Bangla) SentiWordNet

Researchers Das and Bandhopadhyay (2009) developed SentiWordNet for Bengali

language by implementing a word level lexical-transfer technique. This technique involves

stemming cluster and SentiWordNet validation to translate ambiguous subjective words

that may lose their subjective meaning once lemmatized (converted to base form of words).

Each entry in the English SentiWordNet is translated to Bengali using an English-Bengali

dictionary. This process resulted in 35,805 Bengali entries.

However, several filtering techniques were applied to the produced entries to remove duplicates. A total of 8,427 opinionated words were extracted from SentiWordNet. Words whose orientation strength was below 0.4 were considered ambiguous and were deemed to have lost their subjectivity after translation. As a result, a total of 2,652 words were discarded (Wiebe and Riloff, 2005).

For subjectivity detection, two corpora, news and blog, were used as inputs to evaluate the developed Bengali SentiWordNet.

Table II-1 shows the results, which led to the conclusion that the polarity scores of the Bengali SentiWordNet are reliable (approximately 50% accurate).

Table II – 1. Polarity-wise Performance using Bengali SentiWordNet

2. A Hindi SentiWordNet

The Hindi SentiWordNet (H-SWN) was created by Joshi et al. (2010) using two lexical resources, namely English SentiWordNet and English WordNet Linking. The English entries in SentiWordNet were replaced with equivalent Hindi words to create H-SWN.

Two manually compiled English-Hindi electronic dictionaries were used to replace the entries: SHABDKOSH and Shabdanjali. The two dictionaries were merged automatically by replacing the duplicates, resulting in approximately 90,872 entries. The positive and negative sentiment scores for the Hindi words were copied from the English SentiWordNet, resulting in 22,708 Hindi entries.

POLARITY    ACCURACY
Negative    56.59%
Positive    75.57%

Hindi WordNet is a well-structured, manually compiled resource that has been continuously updated over the last nine years. An API is available for accessing the Hindi WordNet. Almost 60% of the final synsets in the Hindi SentiWordNet were generated by this method.

An arbitrary 100 words were chosen from the Hindi SentiWordNet for human

evaluation. Two persons were asked to manually check them.

Moreover, another approach, WordNet Graph Traversal, was used to generate a Hindi subjective lexicon. This method is language independent and uses only one resource (WordNet) for lexicon generation. As proposed by Arora, Bakliwal and Varma (2012), it exploits synonym and antonym relations. As a result, 8,048 words were generated, with 74% accuracy on the classification of reviews and 69% in the evaluation by human annotators.

3. An Urdu Text Sentiment Analyzer

Urdu is a South Asian language and is described as a resource-poor language (Mukund et al., 2010). Many researchers have attempted to construct an Urdu lexicon, although the construction of a sentiment-annotated Urdu lexicon still poses many challenges.

Afraz, Muhammad, and Martinez-Enriquez (2011) developed a lexicon-based Urdu Text Sentiment Analyzer incorporating two components: (a) the classifier, which analyzes and categorizes the given text, and (b) the lexicons, which contain information about the orientation (positive or negative) and subjectivity of the entries (words and phrases).


The Urdu lexical construction tasks are summarized as follows:

Identifying the sentiment-oriented words/phrases in Urdu language

Identifying morphological rules, (e.g. inflection or derivation)

Identifying grammatical rules (e.g. use of modifiers)

Identifying semantics between different entries (e.g. synonyms, antonyms and

cross-references)

Identifying and annotating the polarities of the entries.

Identifying modifiers and annotating intensities.

Differentiating between multiple Part of Speech (POS) tags for same entries.

Constructing lexicon

Figure II – 3. Classification Process of Urdu Lexical Construction (Afraz,

Muhammad, & Martinez-Enriquez, 2011)


The subjectivity of the entries was classified by orientation and intensity. Orientation was predicted from the marked polarity, and intensity was calculated by analyzing the modifiers. For example, the intensity of “much more” is greater than that of “more”.

After the Urdu lexicon was developed, Afraz, Muhammad and Martinez-Enriquez (2011) applied the classifier, which uses a classification algorithm to compare all words and phrases in a corpus of movie and product reviews in Urdu against the lexicon entries. The total polarity was calculated on the basis of the polarity scores of the individual entries.

The movie review corpus (MR) contained 450 reviews, comprising 226 positive and 224 negative reviews. The product review corpus (PR), covering electronic appliances, contained 328 reviews, with 177 positive and 151 negative reviews. The overall evaluation of the system measured the accuracy of the document classification suggested by the system against the actual sentiments present in the reviews. A series of experiments was performed on both corpora, one after another.

Table II-2 shows the results, with accuracy of 66-74% for MR and 77-79% for PR. It also gives the classification accuracy for positive and negative reviews separately.

Table II - 2. Results of experimentation on both corpora (MR and PR)

POLARITY    CORPORA    ACCURACY
Negative    MR         66%
Negative    PR         77%
Positive    MR         74%
Positive    PR         79%

C. Generalization

Among other areas of NLP, Sentiment Analysis (SA) is an emerging field. The need for sentiment analysis is the outcome of a sudden increase in opinionated or sentimental text. Researchers have made great attempts at determining subjectivity in texts using lexical resources. Since there were no lexical resources available, studies were conducted to generate lexicons annotated with polarity scores to be used in SA.

The SentiWordNet became successful in providing an English lexical resource for

SA. With this as the basis of the later studies, Bengali, Hindi and Urdu SentiWordNet were

produced, providing a lexical resource of different languages.

Bengali SentiWordNet was created by translating English SentiWordNet entries to

Bengali using a bilingual dictionary. Hindi SentiWordNet used the same method but with

two bilingual dictionaries and WordNet Graph Traversal. Urdu SentiWordNet used a

classification algorithm which predicted the polarity and the intensity by comparing words.

Despite the developments in SA of English text, it is a fact that for other languages,

this domain is still an open challenge. Among a number of possible future works, the most

important is the extension of the lexicon and its accuracy.

This is where the researchers of the study embarked: they applied these techniques to generate a Filipino subjective lexicon using the second best method of building a lexicon, corpus-based cross-lingual projection, applying word- and phrase-level annotations with a bilingual dictionary, Moses and SentiWordNet for the field of Sentiment Analysis.


Chapter III

Research Design and Methodology

A. Hypothesis

FilCon is an accurate polarity lexicon generated by a combination of several approaches, namely word- and phrase-level annotation, the dictionary-based approach, and corpus-based cross-lingual projection of Filipino words onto their English translations in SentiWordNet. This combined approach is believed to be achievable using a Filipino-English bilingual dictionary, Moses and SentiWordNet.

1. Assumptions

The use of a Filipino-English bilingual dictionary, Moses, and SentiWordNet was an effective approach to applying corpus-based cross-lingual projections in constructing the FilCon generator.

FilCon rendered an accurate translation of each Filipino word to English, as well as an accurate pairing of polarity values from SentiWordNet to FilCon, by applying cross-lingual projections.

FilCon has an acceptable level of accuracy for use in Natural Language Processing applications.

B. Research Methods

The main objective of this study was to provide a Filipino lexicon annotated with

the polarity scores of SentiWordNet.


First, Filipino words were retrieved and translated to English using a Filipino-

English bilingual dictionary. The translated words were paired with the same English

entries from SentiWordNet using cross-lingual projection in order to retrieve the

corresponding polarity score for each word.
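The pairing step described above can be sketched as a lookup keyed on part of speech and English word. Everything in the sketch is hypothetical: the function `project_polarity`, the in-memory dictionary standing in for SentiWordNet, the sample words, and the scores, which are placeholders rather than real SentiWordNet values. Skipping unmatched words is likewise an assumption, not a documented behavior of the study's system.

```python
def project_polarity(translations, sentiwordnet):
    """Attach SentiWordNet scores to Filipino words via their English translation.

    translations: list of (fil_word, pos, eng_word) tuples from the dictionary.
    sentiwordnet: dict mapping (pos, eng_word) -> (id, pos_score, neg_score).
    Words whose translation has no matching entry are skipped.
    """
    projected = []
    for fil_word, pos, eng_word in translations:
        entry = sentiwordnet.get((pos, eng_word))
        if entry is None:
            continue
        entry_id, pos_score, neg_score = entry
        projected.append({
            "POS": pos,
            "ID": entry_id,
            "POS_SCORE": pos_score,
            "NEG_SCORE": neg_score,
            "ENG_WORD": eng_word,
            "FIL_WORD": fil_word,
        })
    return projected


# Placeholder entries; IDs and scores are invented for illustration.
toy_swn = {("a", "beautiful"): ("00000001", 0.75, 0.0)}
toy_translations = [("maganda", "a", "beautiful"), ("bahay", "n", "house")]
rows = project_polarity(toy_translations, toy_swn)
```

Only "maganda" receives scores in this toy run, because "house" has no entry in the placeholder table.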

Identical translated entries generated in the lexicon were filtered using Moses. Filipino sentences were retrieved and manually translated to English to serve as the training corpus for Moses. The trained sets provide the translation accuracy that determines the most appropriate translation for a specific word. In most cases, the translated word with the highest translation accuracy was automatically chosen and considered the nearest possible translation.

If different translated words had the same translation accuracy for a specific Filipino word, the nearest translation was selected manually. However, if there were no identical translated entries for a specific word, its single translation was automatically considered the translation for the Filipino word.

The output of the study, FilCon, was an alphabetically-arranged polarity lexicon exported as a .txt file.

Experimental design was used in this research since there was a causal relationship between the variables involved. This enabled the researchers to gain control over all factors which may affect the outcome of the generation of the lexicon. The factor that determines the accuracy of subjectivity analysis is the input data being manipulated, namely the polarity lexicon.

C. Research Design

The study on FilCon generation had the following development phases:


1. Data Mining

The Filipino words were retrieved from a Filipino-English bilingual dictionary to serve as FilCon's primary input, and Filipino-English sentences were retrieved to serve as the training corpus for further processing of the lexicon entries.

2. Words Translation

Using a Filipino-English bilingual dictionary retrieved from an available application, the Filipino words were translated to English for the next phase. Automatic part-of-speech tagging of the words was also included, since the dictionary provides the part of speech appropriate for each word.

3. Cross-lingual Projection

The major language, English, served as a bridge to the target language, Filipino; this is known as cross-lingual projection. The English translations of the Filipino words were matched with the same English words found in the SentiWordNet entries. The polarity score of each matched SentiWordNet entry was retrieved and aligned with the Filipino word of the same semantic meaning.

4. Corpus Training

A separate corpus of 40,000 Filipino-English sentences was retrieved from the Internet. These sentences were used as the training corpus for Moses. Moses found the most probable translation for each entry, represented by a translation accuracy (TRANS_ACC) based on the frequency of appearance in the sentences. Words with low translation accuracy were not included in FilCon.


5. Word Filtering

This phase determined which polarity scores were retained for Filipino words that had identical entries. It applied the translation accuracy to filter multiple entries with different translations and parts of speech.

6. FilCon Generation

The final output of this study was a Filipino lexicon in .txt file format. FilCon based its structure on SentiWordNet, adopting the following fields: POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD. The lexicon was sorted and grouped by part of speech, and within each POS it was sorted again by ID.
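The ordering and layout of the final phase can be sketched as follows. The field names come from the text above; the tab delimiter, the function name `format_filcon`, and the sample rows (with invented IDs and scores) are assumptions, since the exact layout of the .txt file is not specified here.

```python
FIELDS = ("POS", "ID", "POS_SCORE", "NEG_SCORE", "ENG_WORD", "FIL_WORD")


def format_filcon(entries):
    """Return lexicon rows as text, grouped by POS and ordered by ID within each POS.

    entries: list of dicts holding the six FilCon fields. IDs are assumed to be
    zero-padded strings, so that string order matches numeric order.
    """
    ordered = sorted(entries, key=lambda entry: (entry["POS"], entry["ID"]))
    return "\n".join(
        "\t".join(str(entry[field]) for field in FIELDS) for entry in ordered
    )


# Invented rows, for illustration only.
sample = [
    {"POS": "n", "ID": "00000002", "POS_SCORE": 0.0, "NEG_SCORE": 0.0,
     "ENG_WORD": "house", "FIL_WORD": "bahay"},
    {"POS": "a", "ID": "00000005", "POS_SCORE": 0.0, "NEG_SCORE": 0.625,
     "ENG_WORD": "ugly", "FIL_WORD": "pangit"},
    {"POS": "a", "ID": "00000001", "POS_SCORE": 0.75, "NEG_SCORE": 0.0,
     "ENG_WORD": "beautiful", "FIL_WORD": "maganda"},
]
text = format_filcon(sample)
```

The resulting string could then be written to a .txt file with the standard `open(...).write(...)` call.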


Chapter IV

Analysis and Interpretation

A. Presentation of the Solution

This section presents the results of the analysis of the generated FilCon entries.

1. System Architecture

The architecture represents the system flow of the generation of Filipino words

aligned with polarity scores in FilCon.

Figure IV – 1. System Architecture of the Study

[Flowchart. Components: Filipino Words; Data mining; Filipino-English Bilingual Dictionary; Translation & POS Tagging; SentiWordNet 3.0; Cross-lingual Projection; Filipino-English Sentences Corpus; Moses Training; Lexicon with Translation Accuracy; Elimination of Identical Entries; Sorting; FilCon. Decision branches: Case 1, multiple translations with same TRANS_ACC (manual selection of translation); Case 2, multiple translations with different TRANS_ACC (selection based on highest TRANS_ACC); Case 3, no identical entry.]

This study focused on generating a Filipino Subjective Lexicon using a Filipino-English bilingual dictionary, Moses, and SentiWordNet 3.0. The researchers designed modules and interfaces patterned after the previous study mentioned in the conceptual framework of Chapter 1, as shown in Figure IV – 1.

2. Description of Modules and Interfaces

a. Data mining: The Filipino words were retrieved to serve as the initial entries

for FilCon. There was no separation of subjective words from the objective

words.

b. Translation and POS Tagging: The Filipino words, as input strings, underwent translation and POS tagging. Using a Filipino-English bilingual dictionary as a basis, the English translation and the part of speech of each Filipino word were provided.

c. Cross-Lingual Projection: Once the Filipino words had been given English translations, they were annotated with polarity scores. The English entries and their corresponding polarity scores from SentiWordNet 3.0 were matched with the identical English translation of each Filipino word. This module is also referred to as word matching and polarity score annotation.

d. Moses Training: This module trained Moses on the sets of corpora. Given Filipino sentences with their English translations, the translation accuracy for each word can be obtained by determining the frequency of appearance of the Filipino word in the Filipino sentences and then retrieving its English counterpart in the corresponding English sentences. In this way, the translation accuracy becomes the basis for the most probable translation of a word, creating a new database that stores each Filipino word with its translation accuracy.

e. Elimination of Identical Entries: This module sought to filter the words by finding the most probable translation for a specific word, reducing the number of candidate translations. Using the translation accuracy, the following cases were addressed:

a. Case 1: If there were different translated entries with the same translation accuracy for a specific Filipino word, the word filtering was done by manual selection of the most probable meaning of the word.

b. Case 2: If there were different translations with different translation accuracies for a specific Filipino word, the word filtering was done by automatic selection of the word with the highest translation accuracy.

c. Case 3: If there were no identical translated entries for a specific Filipino word, it was automatically included in the final output to be sorted.

f. Sorting: Through PHP scripting, the filtered lexicon entries obtained with translation accuracy were joined to form a sorted lexicon, FilCon, with the fields POS, ID, POS_SCORE, NEG_SCORE, ENG_WORD and FIL_WORD.
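The three cases handled by the Elimination of Identical Entries module can be sketched as below. This is an illustration in Python rather than the PHP used in the study; the function `filter_entries`, the sample words, and the accuracy values are hypothetical. Treating a tie for the highest TRANS_ACC as a Case 1 situation is an assumption about how the cases combine.

```python
from collections import defaultdict


def filter_entries(candidates):
    """Pick one English translation per Filipino word using TRANS_ACC.

    candidates: list of (fil_word, eng_word, trans_acc) tuples.
    Returns (selected, manual): selected maps a Filipino word to the chosen
    translation; manual maps a Filipino word to the tied translations that
    need manual selection.
    """
    by_word = defaultdict(list)
    for fil_word, eng_word, trans_acc in candidates:
        by_word[fil_word].append((eng_word, trans_acc))

    selected, manual = {}, {}
    for fil_word, options in by_word.items():
        if len(options) == 1:
            # Case 3: no competing entry, keep the only translation.
            selected[fil_word] = options[0][0]
            continue
        best = max(acc for _, acc in options)
        top = [eng for eng, acc in options if acc == best]
        if len(top) == 1:
            # Case 2: a single translation has the highest TRANS_ACC.
            selected[fil_word] = top[0]
        else:
            # Case 1: equal TRANS_ACC, defer to manual selection.
            manual[fil_word] = top
    return selected, manual


# Hypothetical candidates and accuracies, for illustration only.
toy_candidates = [
    ("maganda", "beautiful", 0.8), ("maganda", "pretty", 0.5),
    ("bahay", "house", 0.9),
    ("mahal", "expensive", 0.6), ("mahal", "love", 0.6),
]
selected, manual = filter_entries(toy_candidates)
```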

B. Results of Conducted Research

The Filipino-English bilingual dictionary, Moses and SentiWordNet 3.0 were combined to generate a Filipino Subjective Lexicon. Each Filipino word was assigned its most probable translation using the translation accuracy generated by Moses. FilCon was generated, containing 22,380 words annotated with polarity scores. However, around 30-40% of the FilCon entries were with lemma, while the remaining entries were in base form (root word). These results were used for analysis and interpretation.

C. Analysis and Interpretation of Results

In order to evaluate the accuracy of each word in the generated Filipino Subjective Lexicon, the translation accuracy obtained from the training sets in Moses was used. If there was more than one translation for a specific word, the translation with the highest translation accuracy was interpreted to be the nearest possible translation for the corresponding Filipino word. However, if there was only one translation for the word, it was considered the probable translation available.

\[
\mathit{Overall\_accuracy}_{\mathit{translation}} = \frac{\sum_{n=0}^{\mathit{max\_num\_entries}} \mathit{word}_{n-1}}{\mathit{max\_num\_entries}} \times 100\%
\]

Figure IV-2. Translation Accuracy of Filipino Words

Moreover, the translation accuracies of these Filipino words were averaged and multiplied by 100% to evaluate the accuracy level of the translation of entries. Applying the averaging formula in Figure IV-2, the overall translation accuracy was 63.622%. This result was obtained because some of the alignments produced during Moses training were not matched with the right translation, which may result in mistranslations in FilCon entries.

For the overall evaluation of the accuracy level of FilCon, the researchers tested the output of the system using sentences that were handpicked and randomly selected from various sources. The results of the system (see Appendix D, Table D-1) were then compared to the evaluation of an expert using a two-tailed t-test with α = 0.05, with the hypotheses H0: μ1 = μ2 and H1: μ1 ≠ μ2.

Table IV- 1. Sentiments Confusion Matrix

                       EXPERTS
FILCON      Positive   Negative   Neutral
Positive    11         3          1
Negative    8          10         1
Neutral     12         9          0

Table IV-1 shows a confusion matrix of the sentences evaluated by the system versus the evaluation of the experts, with the experts’ evaluation as the basis. The main diagonal of the matrix shows how many sentences were evaluated the same by the system and the experts. The other cells show the number of sentences that were evaluated differently by the system and the experts. There were 21 correctly evaluated sentences and 34 incorrectly evaluated sentences, for a total of 55 sentences evaluated.

\[
\mathit{Accuracy} = \frac{\text{Number of Correctly Evaluated Sentences}}{\text{Total Number of Sentences}} \times 100\%
\]

Figure IV-3. Formula for calculating FilCon’s accuracy

FilCon’s accuracy was then calculated using the formula given in Figure IV-3. Based on the calculation, FilCon’s accuracy is 38.18%.
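The reported figure can be reproduced from the counts in Table IV-1. The short sketch below assumes the rows are FilCon's labels and the columns are the experts' labels, in the order Positive, Negative, Neutral; the variable names are illustrative.

```python
# Counts from Table IV-1 (rows: FilCon, columns: experts).
confusion = [
    [11, 3, 1],   # FilCon: Positive
    [8, 10, 1],   # FilCon: Negative
    [12, 9, 0],   # FilCon: Neutral
]

# Sentences on the main diagonal received the same label from both sides.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = 100 * correct / total

print(f"{correct} of {total} sentences agree: {accuracy:.2f}%")
```

The diagonal sums to 21 and all cells sum to 55, which gives the 38.18% stated above.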

Figure IV-4. T-test Formula for Paired Two Sample for Means


Table IV- 2. Summary of the T-Test Computation

                    System          Expert
Mean                -0.058906422    0.140606061
Observations (n)    55              55
t computed          -2.63523816
t critical          ±2.004879288

Table IV-2 shows the means of the system’s and the expert’s results, the computed t obtained using the formula in Figure IV-4, and the critical t from the table of t-values. The computed t, -2.63523816, is outside the range of the critical t values, ±2.004879288; hence we reject H0. This denotes that there was a significant difference between the system’s sentiment scores and the expert’s sentiment scores.
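Since the formula of Figure IV-4 is not reproduced in the text, the sketch below assumes the standard paired-samples t statistic, that is, the mean of the pairwise differences divided by their standard error. The function name `paired_t` and the toy scores are hypothetical; the per-sentence scores of the study are in Appendix D and are not repeated here.

```python
import math
from statistics import mean, stdev


def paired_t(system_scores, expert_scores):
    """Paired-samples t statistic for two equally long lists of scores.

    Assumes the differences are not all identical, since the sample
    standard deviation would then be zero.
    """
    if len(system_scores) != len(expert_scores) or len(system_scores) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    diffs = [s - e for s, e in zip(system_scores, expert_scores)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Toy data only; the differences are 1, 2 and 3.
t_toy = paired_t([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])

# Decision rule applied to the values reported in Table IV-2.
t_computed, t_critical = -2.63523816, 2.004879288
reject_h0 = abs(t_computed) > t_critical
```

With the reported values, the absolute computed t exceeds the critical t, which corresponds to the rejection of H0 stated above.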

D. Summary of Findings

The generation of the FilCon yielded the following results:

1. The Filipino-English bilingual dictionary, Moses, and SentiWordNet were a viable combination for generating a Filipino Subjective Lexicon.

2. The translation accuracy (SentiWordNet from English to Filipino) of the FilCon entries was 63.622%.

3. The overall accuracy of FilCon was 38.18%.

4. The scores generated by FilCon were significantly different from the expert’s scores.


Chapter V

Summary, Conclusions, and Recommendations

A. Summary

Scaling means assigning a metric system as a standard unit of measurement for a certain object. To scale a subjective statement, sentiment analysis is required. Such analysis is conducted in order to draw an acceptable conclusion regarding a certain matter based on a standard scale. This standard scale is FilCon.

FilCon is an opinion polarity lexicon that determines the polarity (degree of positivity and negativity) of a word, which in turn contributes to the analysis of the subjectivity and sentiment of a given document. With FilCon, annotations of Filipino words for subjectivity analysis are automatically available to users without human intervention.

FilCon was constructed primarily through the dictionary-based approach. This approach gives a developing language the advantage of seamless generation of words in the lexicon through direct translation at the word/phrase level. FilCon made use of an English-Filipino bilingual dictionary in translating synsets in the SentiWordNet lexicon.

Words from FilCon were sampled using the Moses training tool. Moses was trained by aligning English sentences with human-provided Filipino counterpart translations, which were fed into Moses for alignment. This process yielded translation accuracy values for a group of words in FilCon. The accuracy of subjectivity was tested using the corpus-based approach.

B. Conclusions

The researchers were able to generate a Filipino polarity lexicon, FilCon, by adopting a combination of the word-level annotation approach, the dictionary-based approach and the corpus-based cross-lingual approach, using language tools comprising a Filipino-English bilingual dictionary, Moses and SentiWordNet.

The translation of FilCon entries and their cross-lingual projection onto SentiWordNet entries were made possible. For the 6,124 sample entries from the Moses training sets used for computing the translation accuracy, the result was 63.622%. This showed that the translation of the Filipino words is relatively accurate.

The overall accuracy of FilCon depended on the sentence sentiment analysis. Based

on the results, 38.18% was the accuracy.

In the t-test computation, there was a significant difference between the sentiment

scores of FilCon and sentiment scores of experts. It denoted that scores from the generated

lexicon were significantly different from expert’s scores.

C. Recommendations

The researchers encourage anyone who wishes to further study the generation of a Filipino Subjective Lexicon. The following recommendations may be used for its improvement:

1. Use the word mappings of WordNet after gathering Filipino words. This helps in understanding the relationship of one word to another. Bootstrapping, another method for building lexicons, can also be done using word mappings, provided that a few manually selected Filipino seed words are used.

2. Gather word entries from two or more Filipino-English bilingual dictionaries for a

different perspective. Since vocabulary continues to expand, new words are formed. Some dictionaries, such as UP Diksyunaryong Pilipino, may list these new words.

3. Document-level annotation may be applied for subjectivity classification. It may be

performed by determining the frequency of subjective or sentiment words in a

document. Since review classification, or web opinion mining, is a natural language processing application, document-level annotation may help FilCon develop faster.

4. Use a different method in finding the most probable translation. Since

lemmatization affects the word in a way that a word loses its meaning and context

when lemmatized, manual translation (native speakers) may be applied or different

language tools may be used.

5. Gather sentences with different structures as a training corpus in the Filipino language. Some sentences in Filipino tend to have different meanings depending on their sentence structures.


Bibliography

Afraz, Z. S., Muhammad, A., & Martinez-Enriquez, A. (2011, December 4). Sentiment-

annotated lexicon construction for an Urdu text based. Pakistan Journal of Science,

63(4). Retrieved March 8, 2014, from

http://www.paas.com.pk/images/volume/pdf/271647097-(7).%2047-

11%20modified%2010.11.2011.pdf

Alexander, D. (n.d.). Data Mining. Retrieved February 13, 2014, from Instructional

Technology Services:

http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/

Arora, P., Bakliwal, A., & Varma, V. (2012, January). Hindi subjective lexicon generation

using graph traversal. Retrieved March 8, 2013, from Dr. Alexander Gelbukh:

http://www.gelbukh.com/ijcla/2012-1/025-039-paper.pdf

Baccianella, S., Esuli, A., & Sebastiani, F. (2010, May 19-21). SentiWordNet 3.0: An enhanced lexical

resource for sentiment analysis and opinion mining. (N. Calzolari, K. Choukri, B.

Maegaard, & J. Mariani, Eds.) Retrieved February 13, 2014, from LREC Conferences:

http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf

Bikel, D. M., & Zitouni, I. (2012). Multilingual natural language processing applications:

From theory to practice. Westford, Massachusetts, United States of America: US:

International Business Machines Corporation.

Bondoc, R., Garcia, A., Lacaden, J., Ping, Y., & Borra, A. (2010, December). The Filipino

wordnet construction. (H. Adorna, Ed.) Philippine Computing Journal, V(2), 44-49.

Retrieved December 13, 2013, from

https://docs.google.com/file/d/0BxI8feCZhWsoNExBWDlKeVMwSE0/edit

Borra, A., Pease, A., Roxas, R. O., & Dita, S. (2010). Global wordnet conference: Research

paper details. Retrieved December 13, 2013, from Center For Indian Language

Technology:

http://www.cfilt.iitb.ac.in/gwc2010/pdfs/33_Filipino_Wordnet__Borra.pdf

Chiala, K. (2013, November 14). Tragedy and technology bring people together. Retrieved

December 13, 2013, from The Network: Cisco's Technology News Site:

http://newsroom.cisco.com/feature-content?type=webcontent&articleId=1289398

Conrad, C. (1974, January 1). Context effects in sentence comprehension: A study of the

subjective lexicon. Memory & Cognition, II(1), 130-138. doi:10.3758/BF03197504

Das, A., & Bandyopadhyay, S. (2009). SentiWordNet for Bangla. Retrieved March 8, 2014,

from Amitava Das:

http://www.amitavadas.com/Pub/SentiwordNet%20(Bengali).pdf

Das, A., & Bandyopadhyay, S. (n.d.). SentiWordNet for Indian languages. Retrieved

March 8, 2014, from Association for Computational Linguistics:

http://aclweb.org/anthology//W/W10/W10-3208.pdf

Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource.

Retrieved March 8, 2014, from Uni Digital:

http://gandalf.aksis.uib.no/lrec2006/pdf/384_pdf.pdf

Hamouda, A., & Rohaim, M. (2011, November 1). Reviews classification using

SentiWordNet lexicon. The Online Journal on Computer Science and Information

Technology, 2(1), 120-123. Retrieved March 8, 2014, from

https://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordN

et_Lexicon

Hamouda, A., Mahmoud, M., & Mahamed, R. (2011, November). Building machine

learning based senti-word. Journal of Advances in Information Technology, 2(4).

Retrieved March 8, 2014, from

http://www.academia.edu/1336653/Building_Machine_Learning_Based_Senti-

word_Lexicon_for_Sentiment_Analysis

Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the semantic orientation of

adjectives. Retrieved March 8, 2014, from ACL Anthology:

http://acl.ldc.upenn.edu/P/P97/P97-1023.pdf

Indurkhya, N., & Damerau, F. J. (2010). Handbook of natural language processing (2nd

ed.). (R. Herbrich, & T. Graepel, Eds.) New York, United States of America:

Chapman & Hall/CRC. Retrieved March 8, 2014, from

http://f3.tiera.ru/2/Cs_Computer%20science/CsNl_Natural%20language/Indurkhy

a%20N.,%20Damerau%20F.J.%20(eds.)%20Handbook%20of%20natural%20lan

guage%20processing%20(2ed.,%20CRC,%202010)(ISBN%209781420085921)(

O)(692s)_CsNl_.pdf

Liddy, E. D. (2001). Natural Language Processing. In Encyclopedia of Library and

Information Science (2nd ed.). New York: Marcel Dekker, Inc. Retrieved

December 13, 2013, from

http://surface.syr.edu/cgi/viewcontent.cgi?article=1043&context=istpub

Moghaddam, S., & Ester, M. (2012, August 12). Aspect-based opinion mining from online

reviews. Retrieved January 17, 2014, from Simon Fraser University:

http://www.cs.sfu.ca/~ester/papers/SIGIR2012.Tutorial.Final.pdf

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Retrieved February

13, 2014, from Cornell University:

http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf

Perry, M. (2010). Face to face versus computer-mediated communication: Couples

satisfaction and experience across conditions. University of Kentucky Master's

Theses. Retrieved December 13, 2013, from UKnowledge / University of Kentucky

Libraries: http://uknowledge.uky.edu/gradschool_theses/66

Regalado, R. J., & Cheng, C. K. (2012, November). Feature-based subjectivity classification

of filipino text. Asian Language Processing (IALP), 2012 International Conference,

57-60. doi:10.1109/IALP.2012.39

Titan Soft (2014). Filipino Dictionary (Athena Version) [Mobile application software].

Retrieved April 2014 from http://play.google.com/store/apps

Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of

semantic orientation from association. ACM Transactions on Information Systems,

21(4), 315-346. Retrieved March 8, 2014, from

http://acl.ldc.upenn.edu/eacl2006/main/papers/13_1_esulisebastiani_192.pdf

Appendix A

Source Code

<html>
<head>
  <title> FilCon </title>
  <link rel='shortcut icon' href='rsc/images/logo_filcon.ico'/>
  <link rel="stylesheet" type="text/css" href="main.css">
</head>
<body>
<img src='rsc/images/logo_filcon.png' style='float:left; position:absolute; bottom:0px;'/>
<div class='container'>
<?php
// Set default timezone
date_default_timezone_set('Asia/Manila');

// Stop early if the PDO SQLite driver used below is unavailable
function func_test_sqlite_support() {
    if (!class_exists('PDO') || !in_array('sqlite', PDO::getAvailableDrivers())) {
        die("SQLite 3 NOT supported.");
    }
}

// Connect to the bilingual database; returns null when the connection fails
function func_open_db() {
    func_test_sqlite_support();
    // Set database filepath
    $dbfilepath = 'rsc/bilingual.db';
    try {
        return new PDO("sqlite:" . $dbfilepath);
    } catch (PDOException $errormsg) {
        echo "Connection failed: " . $errormsg->getMessage();
        return null;
    }
}

// Run a query once and print its rows as an HTML table with the given headers
function func_print_table($title, $query, $headers) {
    $dbhandle = func_open_db();
    if ($dbhandle === null) {
        return;
    }
    echo "<center><h2> $title </h2></center><hr/><br/>";
    echo "<center><table><thead><tr>";
    foreach ($headers as $header) {
        echo "<th>$header</th>";
    }
    echo "</tr></thead><tbody>";
    $result = $dbhandle->query($query);
    $row_count = 0;
    if ($result) {
        foreach ($result as $row) {
            echo "<tr>";
            for ($col = 0; $col < count($headers); $col++) {
                // Escape cell values so that dictionary text cannot break the markup
                echo "<td>" . htmlspecialchars((string) $row[$col]) . "</td>";
            }
            echo "</tr>";
            $row_count++;
        }
    }
    if ($row_count == 0) {
        echo "<tr><td colspan='" . count($headers) . "'>No results found.</td></tr>";
    }
    echo "</tbody></table></center>";
    // Close the database connection
    $dbhandle = null;
}

function func_go_bilingual() {
    $query = "SELECT ENG.SERIAL AS ENG_ID, ENG.WORD AS ENGLISH_WORD, FIL.WORD AS FILIPINO_WORD
              FROM ENG ENG JOIN OTHER FIL ON ENG.SERIAL = FIL.SERIAL
              ORDER BY ENG.WORD ASC;";
    func_print_table("Bilingual Table", $query, array("ID", "English Word", "Filipino Word"));
}

function func_go_strip_chars() {
    // Set text file paths
    $file_to_read = "rsc/to_split.txt";
    $file_to_write = "rsc/splitted.txt";
    if (!file_exists($file_to_read)) {
        print "ERROR: File not found.";
        return;
    }
    // Read the whole text file
    $write_text = file_get_contents($file_to_read);
    // Unwanted characters: "#", the space, and the digits 0 to 9 (kept as strings)
    $arr_unwanted_chars = array("#", " ");
    for ($digit = 0; $digit <= 9; $digit++) {
        $arr_unwanted_chars[] = (string) $digit;
    }
    // Spaces become tab delimiters; every other unwanted character is removed
    foreach ($arr_unwanted_chars as $element) {
        $replacement = ($element === " ") ? "\t" : "";
        $write_text = str_replace($element, $replacement, $write_text);
    }
    // Write the normalized text to the new text file
    file_put_contents($file_to_write, $write_text);
    print $file_to_read . " successfully splitted/removed delimiting characters: <br/>\n";
    print "<ol>\n";
    foreach ($arr_unwanted_chars as $element) {
        print "\t<li>\t" . $element . "\t</li>\n";
    }
    print "</ol><br/>\n";
    print "Normalized form of the text file is now on: " . $file_to_write . "\n";
}

function func_go_elim_dup() {
    // Set text file paths
    $file_to_read = "rsc/read_trans_acc.txt";
    $file_to_write = "rsc/col1_trans_acc.txt";
    if (!file_exists($file_to_read)) {
        print "ERROR: File not found.";
        return;
    }
    $write_text = "";
    $prev = '$'; // Sentinel that cannot equal a line ending in a newline
    $file_handle_r = fopen($file_to_read, "r");
    while (($cur = fgets($file_handle_r)) !== false) {
        // A line identical to the one before it is replaced by a blank line
        $to_append = ($cur === $prev) ? "\n" : $cur;
        $prev = $cur;
        $write_text .= $to_append;
        print $to_append . "<br/>";
    }
    fclose($file_handle_r);
    // Write the filtered text to the new text file
    file_put_contents($file_to_write, $write_text);
}

function func_go_filcon_dir_trans() {
    func_print_table("FilCon", "SELECT * FROM FILCON_DIRECT_TRANSLATION LIMIT 1000",
        array("POS", "ID", "POS_SCORE", "NEG_SCORE", "ENGLISH_WORD", "FILIPINO_WORD"));
}

function func_go_filcon() {
    func_print_table("FilCon", "SELECT * FROM FILCON LIMIT 1000",
        array("POS", "ID", "POS_SCORE", "NEG_SCORE", "ENGLISH_WORD", "FILIPINO_WORD", "ACCURACY"));
}

function func_put_homebutton() {
    echo "\n\n <input type='button' name='btn_home' value='Go back to home' style='float:right; position:relative; bottom:0px;' onclick='window.location.href=\"index.php\";'> \n";
}

// Query: Pairing English words with Filipino words
if (isset($_POST['btn_go_bilingual'])) {
    func_go_bilingual();
}
// Query: Stripping SentiWordNet instances
elseif (isset($_POST['btn_go_strip_chars'])) {
    func_go_strip_chars();
}
// Query: Eliminating duplicate instances
elseif (isset($_POST['btn_go_elim_dup'])) {
    func_go_elim_dup();
}
// Query: Direct translation of English SentiWordNet to Filipino
elseif (isset($_POST['btn_go_filcon_dir_trans'])) {
    func_go_filcon_dir_trans();
}
// Query: FilCon entries with translation accuracy
elseif (isset($_POST['btn_go_filcon'])) {
    func_go_filcon();
}
func_put_homebutton();
?>
</div>
</body>
</html>

Appendix B

The FilCon Entries

Appendix C

Test Cases

The researchers provided Filipino sentences containing words available in FilCon to serve as the sample test data. FilCon was used to analyse word- and phrase-level subjectivity through the POS_SCORE or NEG_SCORE of each entry. The produced results are the following:
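For orientation, the scoring step described above can be sketched as follows. This is an illustrative assumption rather than the researchers' implementation: the sentence score is taken as the mean of POS_SCORE minus NEG_SCORE over the words found in the lexicon, and the entries in `LEXICON` are hypothetical.

```python
# Hypothetical FilCon-style entries: word -> (POS_SCORE, NEG_SCORE).
LEXICON = {
    "maganda": (0.75, 0.0),
    "magaling": (0.625, 0.0),
    "galit": (0.0, 0.5),
}


def sentence_score(sentence, lexicon=LEXICON):
    """Average POS_SCORE - NEG_SCORE over the words found in the lexicon.

    Returns 0.0 when no word of the sentence appears in the lexicon.
    """
    # Lowercase each token and strip surrounding punctuation.
    tokens = [tok.strip(".,!?;:\"'()").lower() for tok in sentence.split()]
    scores = [lexicon[t][0] - lexicon[t][1] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(sentence_score("Maganda at magaling!"))  # prints 0.6875
```

A score above zero would be read as positive, below zero as negative, and exactly zero as neutral.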

Appendix D

Summary Tabulation of Test Cases

Table D-1. Test Cases Evaluated (Sentiment Value) by Expert and FilCon

SENTENCE NO.    EXPERT (AVERAGE)    FILCON

1 -0.833333333 -0.247619048

2 0.733333333 -0.166666667

3 0.733333333 -0.511904762

4 -0.733333333 -0.0875

5 0.1 0

6 0.766666667 0.025

7 0.866666667 0.6

8 0.9 0

9 0.466666667 0

10 0.666666667 0.125

11 0.366666667 -0.583333333

12 0.733333333 0

13 0.5 0

14 0 -0.25

15 0 0.05

16 0.666666667 -0.583333333

17 0.833333333 0.175

18 -0.233333333 0

19 0.566666667 -0.625

20 -0.166666667 -0.5

21 -0.033333333 -0.173611111

22 0.5 0

23 -0.666666667 0

24 0.133333333 -0.019736842

25 -0.3 -0.104166667

26 -0.466666667 -0.458333333

27 0.633333333 0.306818182

28 -0.066666667 0

29 0.7 0

30 -0.4 -0.166666667

31 0.3 0.0125

32 0.6 0.283333333

33 -0.066666667 -0.489583333

34 0.366666667 0.0625

35 0.3 0

36 0.6 0.716666667

37 0.8 0

38 0.633333333 -0.583333333

39 0.8 0

40 0.766666667 0

41 -0.633333333 0

42 -0.333333333 0.135416667

43 -0.633333333 0

44 -0.7 0

45 -0.766666667 0

46 -0.166666667 0.034375

47 0.4 -0.155555556

48 -0.233333333 0

49 0.233333333 0.035714286

50 -0.233333333 0

51 0.233333333 0

52 0.466666667 0.458333333

53 -0.733333333 -0.166666667

54 -0.633333333 0.05

55 -0.6 -0.4375
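The 38.18% accuracy reported in the conclusions can be reproduced from Table D-1 if each score is mapped to a label by its sign. The zero threshold is an assumption, although it agrees with every row still visible in Table D-2. A minimal Python sketch (the names `SCORES` and `polarity` are illustrative, not from the thesis):

```python
# (expert average, FilCon score) for sentences 1 to 55, copied from Table D-1.
SCORES = [
    (-0.833333333, -0.247619048), (0.733333333, -0.166666667),
    (0.733333333, -0.511904762), (-0.733333333, -0.0875),
    (0.1, 0), (0.766666667, 0.025),
    (0.866666667, 0.6), (0.9, 0),
    (0.466666667, 0), (0.666666667, 0.125),
    (0.366666667, -0.583333333), (0.733333333, 0),
    (0.5, 0), (0, -0.25),
    (0, 0.05), (0.666666667, -0.583333333),
    (0.833333333, 0.175), (-0.233333333, 0),
    (0.566666667, -0.625), (-0.166666667, -0.5),
    (-0.033333333, -0.173611111), (0.5, 0),
    (-0.666666667, 0), (0.133333333, -0.019736842),
    (-0.3, -0.104166667), (-0.466666667, -0.458333333),
    (0.633333333, 0.306818182), (-0.066666667, 0),
    (0.7, 0), (-0.4, -0.166666667),
    (0.3, 0.0125), (0.6, 0.283333333),
    (-0.066666667, -0.489583333), (0.366666667, 0.0625),
    (0.3, 0), (0.6, 0.716666667),
    (0.8, 0), (0.633333333, -0.583333333),
    (0.8, 0), (0.766666667, 0),
    (-0.633333333, 0), (-0.333333333, 0.135416667),
    (-0.633333333, 0), (-0.7, 0),
    (-0.766666667, 0), (-0.166666667, 0.034375),
    (0.4, -0.155555556), (-0.233333333, 0),
    (0.233333333, 0.035714286), (-0.233333333, 0),
    (0.233333333, 0), (0.466666667, 0.458333333),
    (-0.733333333, -0.166666667), (-0.633333333, 0.05),
    (-0.6, -0.4375),
]


def polarity(score):
    """Map a score to a label by its sign (assumed threshold of zero)."""
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


# A sentence counts as correct when FilCon's label equals the experts' label.
matches = sum(polarity(expert) == polarity(filcon) for expert, filcon in SCORES)
accuracy = matches / len(SCORES)
print(f"{matches} of {len(SCORES)} labels agree ({accuracy:.2%})")
# prints: 21 of 55 labels agree (38.18%)
```

Most disagreements come from sentences where FilCon returns exactly 0, which the sign rule labels as neutral.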

Table D-2. Test Cases Evaluation (Positive, Negative, Neutral) by Expert and FilCon

SENTENCE NO. EXPERT FILCON

1 NEGATIVE NEGATIVE

2 POSITIVE NEGATIVE

3 POSITIVE NEGATIVE

4 NEGATIVE NEGATIVE

5 POSITIVE NEUTRAL

6 POSITIVE POSITIVE

7 POSITIVE POSITIVE

8 POSITIVE NEUTRAL

9 POSITIVE NEUTRAL

10 POSITIVE POSITIVE

11 POSITIVE NEGATIVE

12 POSITIVE NEUTRAL

13 POSITIVE NEUTRAL

14 NEUTRAL NEGATIVE

15 NEUTRAL POSITIVE



18 NEGATIVE NEUTRAL


20 NEGATIVE NEGATIVE


22 POSITIVE NEUTRAL

23 NEGATIVE NEUTRAL





28 NEGATIVE NEUTRAL

29 POSITIVE NEUTRAL






35 POSITIVE NEUTRAL


37 POSITIVE NEUTRAL


39 POSITIVE NEUTRAL

40 POSITIVE NEUTRAL

41 NEGATIVE NEUTRAL

42 NEGATIVE POSITIVE

43 NEGATIVE NEUTRAL

44 NEGATIVE NEUTRAL

45 NEGATIVE NEUTRAL



48 NEGATIVE NEUTRAL


50 NEGATIVE NEUTRAL

51 POSITIVE NEUTRAL





Appendix E

Questionnaire

We, the 4th year students of the Institute of Information and Computing Sciences taking

up BS Computer Science, are the proponents of a thesis titled, “FilCon: Filipino Sentiment

Lexicon Generation Using Word Level-Annotated Dictionary-Based and Corpus-Based

Cross Lingual Approach”.

Kindly answer the following. On a scale from -10 (most negative sentiment) to +10 (most positive sentiment), rate the degree of sentiment that each sentence expresses. Answers must be between -10 and +10.

Thank you.

Name: ________________________

Conformé: ____________________

Filipino Sentences    Sentiment Polarity Score

1. Para sa Senado na maging masigasig sa isang kaso at huwag

pansinin ang isa pang kaso ay nakapagdududa at hindi

katanggap-tanggap.

2. At sa kabila nito, mayroon tayong sapat na resources na

maaaring makapag-develop sa bansa na maraming

mahahangin na isla, maaraw na lagay ng panahon, at

geothermal sites pati na ang matatabang lupain na maaaring

magpayabong ng mga halaman na para sa biomass energy.

3. Sa magandang balitang ito mula sa norte at ang inanunsyong

plano ng maraming private firm na gamitin ang kanilang

sariling mga generator sa kritikal na panahon, maaaring hindi

na kailangan ng gobyerno ang makipagkontrata para sa

karagdagang enerhiya sa susunod na taon.

4. Sobra raw ang pagkahilig o “pagkagumon” ng boxing champ

sa iba’t ibang larangan bukod sa pagboboksing.

5. Akusado si Pemberton sa pagpatay kay Laude, isang

transgender, sa Subic.

6. Dapat ay pakinggan na ni PNoy ang boses ng kanyang mga

boss.

7. Maganda ang disiplina ng taumbayan at maging ng mga

kawani ng gobyerno.

8. Ipinagmamalaki naman ng Albay ang dumagdag sa

mahabang listahan nila ng mga beauty queen, ang modelong

si Valerie Clacio Weigmann na kinoronahan kamakailan

bilang Miss World Philippines.

9. Kung lumago ang ating ekonomiya at ikinalat sa lahat ang

biyaya nito, dapat maginhawa nang nagagamit ng

mamamayan ang sistema ng transportasyon natin.

10. Ang kainaman naman sa kanya ay nanatili siyang matatag sa

kanyang paninindigan.

11. Bilang pinuno ay hindi siya nasangkot sa anumang

katiwalian.

12. Nagpahayag ng ibayong suporta kontra droga si Antipolo

City Mayor Jun Ynares III upang lalong mapalakas ang

kampanya laban sa illegal drugs sa lungsod

13. Ang Oktubre sa mga taga-Cardona, Rizal ay pagbibigay-

buhay sa kanilang tatlong tradisyon na nakaugat na sa kultura.

14. Sa tagumpay ng fishpen sa lawa na naging pangunahing

pinagmumulan ng supply ng isda sa Metro Manila, hindi na

napigil ang pagdami ng malalaking fishpen sa Laguna de

Bay.

15. Hinimok ng isang obispo ang publiko laban sa pagpapakalat

ng pornographic images at videos sa social media na aniya’y

isa ito sa mga dahilan kung bakit nasasalaula ang isipan ng

kabataan.

16. Ayon kay Daet, Camarines Norte Bishop Gilbert Garcera na

dapat ay maging responsable sa paggamit ng Internet ang ilan

upang hindi malantad ang kabataan sa pornograpiya.

17. Umapela pa ang Obispo sa mga mamamayan na tulungan ang

mga mag-asawa na isabuhay ang katapatan at mga pangakong

binitawan nang sila’y ikasal.

18. Natuklasan ng mga dalubhasa ang posibilidad na may epekto

ang kape sa Deoxyribonucleic acid (DNA) ng isang

indibidwal.

19. Muling pinalawig ng Department of Health (DOH) ang

kanilang Ligtas-Tigdas program hanggang sa Oktubre 10

upang mabigyan pa ng pagkakataon ang mga batang hindi

nabakunahan na mabigyan ng proteksiyon laban sa sakit na

tigdas at polio.

20. Samantala, nakiusap si Emelo sa netizens na huwag

magpakalat ng mali-maling impormasyon na maaaring

magdala ng matinding takot sa kaniyang mga kababayan.

21. Sa kabila ng ulat na pagkamatay ng isang apat na taong

gulang na lalaki na nakitaan ng sintomas ng sakit na

meningococcemia, pinabulaanan ng Municipal Health Office

ng Rosario, Cavite, ang pagkalat ng balita na may meningo

scare sa nasabing lugar.

22. Nais ni Senator Sonny Angara na magkaroon ng feeding

program sa lahat ng public school sa bansa para matugunan

ang laganap na malnutrisyon.

23. Mistulang walang intensiyon ang 290 miyembro ng

Kongreso na imbestigahan ang umano’y overpriced na Iloilo

Convention Center (ICC).

24. Naniniwala ang kalihim na kailangan pag-aralan ang

magpairal ng ilang pagbabago sa crime fighting efforts ng

PNP upang higit na maging epektibo ito sa harap ng

lumalaking populasyon, lalo na sa Metro Manila.

25. Isang nakasinding kandila na natumba ang naging mitsa ng

isang sunog na tumupok sa 60 kabahayan sa Quezon City,

kahapon ng umaga.

26. Wala siyang iniisip kung hindi sarili niya.

27. Pinakamagandang araw ito ng buhay ko.

28. Nais kong maging normal ulit ang lahat.

29. Sa palagay ko, ang soccer ay isang kahanga-hangang

paligsahan.

30. Posible na nagsasabi siya ng kasinungalingan.

31. Nagtalumpati ang pangulo sa maraming tao.

32. Malaki ang tiwala niya sa kanyang sarili.

33. Hindi ka makakasalungat sa batas ng kalikasan.

34. Naghihintay ako ng magandang balita mula sa kanila.

35. Kailangan kitang makausap ngayon.

36. ‘Di lang maganda, magaling pa!

37. Walang kinikilingan, walang pinoprotektahan, serbisyong

totoo lamang.

38. Hindi madaling magalit, ang haba ng pasensya.

39. Sinisigurado niya ng naiintindihan ng lahat ang tinuturo niya.

40. Laging maaga sa klase, ganado magturo.

41. Nagagalit tuwing nagtatanong ang estudyante.

42. Mabilis magturo.

43. Yung totoo, tres o singko lang ang alam na ibigay, anong

klaseng prof, yan?

44. Parang binabasa lang niya yung libro kapag nagtuturo.

45. Walang kabuhay-buhay magturo, parang binabasa lang yung

binigay na hand-out

46. Agad dumating ang mataas na alon.

47. Maagang umalis ang malakas na bagyo.

48. Siya ay umawit ng kanta.

49. Kami ay huminga ng hangin.

50. Ginalaw ng nanay ang damit.

51. Kumain ng sorbetes si ama.

52. Umani ng mga papuri ang pari.

53. Kalimutan mo nang nakilala kita!

54. Isa kang malaking hadlang sa aking hinaharap!

55. Ako ay labis na humagulgol dahil namatay si nanay.

Curriculum Vitae

DARLAN KEEN SABADO DOMINGO

1257 Alfredo St. Sampaloc, Manila

Mobile Number : +639175420697

Email: [email protected]

Career Objective

To be able to work in the IT industry and utilize my skills and abilities effectively, while learning and understanding this innovative industry in the process.

Educational Attainment

Bachelor of Science Computer Science (Dean's List Awardee)

University of Santo Tomas

2011-present

Secondary Education

Tarlac State University Laboratory School

2007-2011

Primary Education

Ecumenical Christian College

2001-2007

Skills Summary

Language

Filipino (Written and Spoken)

English (Written and Spoken)

Technical

Knowledgeable in C, C++, Java and Assembly programming languages

Completed SAP Business One Training Course (2012)

IBM DB2 Academic Associate: DB2 Database and Application Fundamentals

Knowledgeable in MS Word, Excel and Powerpoint

Knowledgeable in basic networking, webpage design

Work Experience / Trainings Attended:

Intern at Accenture Philippines

Extra-Curricular Activities

Computer Science Society Member (2011-present)

Local Commission on Elections, Deputy (2011)

Local Commission on Elections, Computer Science Commissioner (2012)

Local Commission on Elections, Finance Officer (2013)

Junior Philippine Computer Society , Member (2012)

Curriculum Vitae

JEROME LORENZO LIAO LOPEZ

26 Corolla St. Village East Executive Homes Cainta, Rizal 1900

Phone Number: 09277411565 || Landline: (02) 655-1612

e-mail: [email protected]

________________________________________________________________________

Career Objective

To be able to work in a career oriented and challenging environment that will

promote personal growth and professional development

___________________________________________________________________

Skills Summary

Competitive with knowledge of software engineering and programming using

Java and C/ C++ languages.

Knowledgeable in web design and development using HTML, CSS, and

JavaScript.

Basic knowledge in Database Management using IBM Express C, Enterprise Resource Planning using SAP Business One, and networking.

Basic knowledge of System Analysis and Design.

Willing to learn new concepts in the industry. Inclined to share ideas with others and to handle pressure.

________________________________________________________________________


University of Santo Tomas BS Computer Science 2011-2015

Sampaloc, Manila Tertiary Level

Faith Christian School Secondary Level 2009-2011 Cainta, Rizal

Marikina Science High School Secondary Level 2007-2009

Sta. Elena, Marikina

San Benildo Integrated School Primary Level 2000-2007 Cainta, Rizal

_______________________________________________________________________

Extra-Curricular

UST Computer Science Society Member 2011 - 2014

UST Junior Philippine Computer Society Member 2011 - 2014

Curriculum Vitae

ALEXANDRA CABANTOG MONDARES

Nuestra Señora Dela Paz Subdivision, Brgy. Sta. Cruz

Sumulong Highway, Antipolo City, Rizal 1870

Home: (02) 213-7607 | Cell: 09273810105

[email protected]

Career Objective

To be able to attain a work position that suits my knowledge and skills as a Computer Science major, and to acquire experience in a professional workplace while contributing to the company’s goals.


University of Santo Tomas 2011 - Present

BS Computer Science

St. Clare Science High School 2007 - 2011

Secondary Education

First Honorable Mention

St. Clare Montessori School 2000 - 2007

Primary Education

Skills Summary

IBM DB2 Express C

Web Development and Design

In-depth knowledge in Systems Analysis and Design

Understanding of accounting principles related to SAP Business One

In-depth knowledge of Subnetting and network configurations using Cisco Packet

Tracer

Oriented in Microsoft Office Applications such as:

o Microsoft Office Powerpoint

o Microsoft Office Excel

o Microsoft Office Word

Capable of creating and editing a video, picture, files using Adobe Photoshop and

Sony Vegas

Basic knowledge in Linux and Windows Operating Systems

Languages:

o HTML and CSS

o ASSEMBLY

o C/C++

o JAVA

Certifications

IBM DB2 Academic Associate

SAP Business One Certificate of Completion

Seminars Attended

Techno Game Development February 13, 2014

Computer Science Society

IT Conference 2014: Today and Tomorrow January 24, 2014

Computer Society of the Ateneo (COMPSAT)

Ateneo De Manila University

Microsoft Student Partner Seminar December 2011

Junior Philippine Computer Society

Extra-Curricular Involvement

Code Jam 2014 February 14, 2014

Participant

Thomasian Engineer (Engineering Publication)

Project Head, 35th General Information Quiz Contest (GIQC) A.Y. 2013 - 2014

Photojournalist 2012 – Present

Computer Science Society

Team Head, Documentation Team A.Y. 2013 - 2014

Member 2011 - Present

Junior Philippine Computer Society

Staff, Special Development Team A.Y. 2014 - 2015
