Implementasi Metode..., Yustinus Vernanda, FTI UMN, 2018 (source: kc.umn.ac.id/5110/7/HALAMAN AWAL.pdf)
Team project ©2017 Dony Pratidana S. Hum | Bima Agus Setyawan S. IIP
Copyright and reuse:
This license lets you remix, tweak, and build upon this work non-commercially, as long as you credit the original creator and license your new creations under identical terms.
IMPLEMENTASI METODE N-GRAM DAN ALGORITMA
NAIVE BAYES UNTUK MENDETEKSI EMAIL SPAM
BERBAHASA INDONESIA BERBASIS WEB SERVICE
UNDERGRADUATE THESIS
Submitted as one of the requirements for the degree of
Sarjana Komputer (S.Kom.)
Yustinus Vernanda
13110110111
INFORMATICS ENGINEERING STUDY PROGRAM
FACULTY OF ENGINEERING AND INFORMATICS
UNIVERSITAS MULTIMEDIA NUSANTARA
TANGERANG
2017
Implementasi Metode..., Yustinus Vernanda, FTI UMN, 2018
PREFACE
Praise and thanks be to God Almighty, who was always present throughout the
work on this thesis and the thesis report entitled “Implementasi Metode N-gram
dan Algoritma Naive Bayes untuk Mendeteksi Email Spam Berbahasa Indonesia
Berbasis Web Service”, so that it could be completed properly. This thesis is
submitted to the Informatics Engineering Study Program, Faculty of Engineering
and Informatics, Universitas Multimedia Nusantara.
The completion of this thesis was also helped and supported by many parties,
including friends, supervising lecturers, and family. The deepest thanks are
therefore extended to:
1. Dr. Ninok Leksono, Rector of Universitas Multimedia Nusantara,
2. Hira Meidia, Ph.D., Vice Rector for Academic Affairs,
3. Ir. Andrey Andoko, M.Sc., Vice Rector for General Administration and
Finance,
4. Ika Yanuarti, S.E., MSF., Vice Rector for Student Affairs,
5. Prof. Dr. Muliawati G. Siswanto, M.Eng.Sc., Vice Rector for Relations and
Cooperation,
6. Maria Irmina Prasetiyowati, S.Kom., M.T., Head of the Informatics
Engineering Study Program at Universitas Multimedia Nusantara and thesis
supervisor,
7. Marcel Bonar Kristanda, S.Kom., M.Sc., first thesis supervisor,
8. Seng Hansun, S.Si., M.Cs., second thesis supervisor,
IMPLEMENTASI METODE N-GRAM DAN ALGORITMA
NAIVE BAYES UNTUK MENDETEKSI EMAIL SPAM
BERBAHASA INDONESIA BERBASIS WEB SERVICE
ABSTRAK
Indonesia ranks 18th in the worldwide spread of spam. A web service-based spam
filter exposed as a REST API can be used to detect Indonesian-language spam
email on email servers as well as in various kinds of email server
applications. With a REST API, applications can exchange JSON data with one
another using the commands available in HTTP. One type of spam filter is
Bayesian filtering, in which the Naive Bayes algorithm serves as the
classification algorithm. The N-gram method, in turn, is used to increase the
accuracy of the Naive Bayes implementation. The N-gram method and the Naive
Bayes algorithm for detecting Indonesian-language spam email via a web service
were implemented successfully, with accuracy ranging from about 0.615 to 0.94,
precision from 0.566 to 0.924, recall from 0.96 to 1, and F-measure from 0.721
to 0.942. The 5-gram method scored highest among the N-gram methods for
detecting Indonesian-language spam email, with an accuracy of 0.94, a precision
of 0.924, a recall of 0.96, and an F-measure of 0.942.
Keywords: N-gram, Naive Bayes, spam filter, web service
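The four scores reported in this abstract are normally derived from
confusion-matrix counts. The sketch below shows the usual definitions, assuming
spam is treated as the positive class. The function names are illustrative and
not taken from the thesis. The last line checks that the reported 5-gram
precision (0.924) and recall (0.96) agree with the reported F-measure (0.942).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return accuracy, precision, recall and F-measure.

    tp/fp/fn/tn are confusion-matrix counts with spam as the positive class
    (an assumption of this sketch).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Consistency check using only the values reported in the abstract.
print(round(f_measure(0.924, 0.96), 3))  # prints 0.942
```

Calling `evaluate` with the counts from any one confusion matrix gives all four
scores for that N-gram setting at once.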
IMPLEMENTATION OF N-GRAM METHOD AND NAIVE
BAYES ALGORITHM FOR DETECTING INDONESIAN
LANGUAGE EMAIL SPAM BASED ON WEB SERVICE
ABSTRACT
Indonesia ranks 18th in the world for the spread of spam. A web service-based
spam filter can detect spam emails in Bahasa Indonesia on email servers and in
various types of email client applications. With a REST API, applications
exchange data in JSON form using HTTP commands. One spam filtering technique is
Bayesian filtering, which uses the Naive Bayes algorithm to classify messages
into spam or non-spam categories. Meanwhile, the N-gram method can increase the
accuracy of the Naive Bayes implementation. The combination of the N-gram
method and the Naive Bayes algorithm to detect spam email in Bahasa Indonesia
via a web service was implemented successfully, with accuracy of about 0.615 to
0.94, precision of 0.566 to 0.924, recall of 0.96 to 1, and F-measure of 0.721
to 0.942. Moreover, the 5-gram method gives the best result among the tests
from 0-gram to 10-gram in spam detection, with an accuracy of 0.94, a precision
of 0.924, a recall of 0.96, and an F-measure of 0.942.
Keywords: N-gram, Naive Bayes, spam filter, web service
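As a rough illustration of how N-gram features can feed a Naive Bayes
classifier, the sketch below may help. It is a minimal example under stated
assumptions, not the thesis implementation. It assumes character N-grams taken
inside each word and add-one (Laplace) smoothing, and it omits the stopword
filtering and stemming steps listed in the contents. The class name, function
names, and training sentences are made up for the example.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Lower-case the text and split each word into overlapping character n-grams.

    Words no longer than n characters are kept whole (an assumption of this sketch).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n: int = 5):
        self.n = n
        self.feature_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.feature_counts[label].update(char_ngrams(text, self.n))

    def classify(self, text: str) -> str:
        # Requires at least one training example overall.
        vocab = set(self.feature_counts["spam"]) | set(self.feature_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        grams = char_ngrams(text, self.n)
        scores = {}
        for label, counts in self.feature_counts.items():
            if self.doc_counts[label] == 0:
                continue  # skip classes that have no training data
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up toy training data, for illustration only.
spam_filter = NaiveBayesSpamFilter(n=5)
spam_filter.train("promo gratis hadiah uang tunai klik sekarang", "spam")
spam_filter.train("menang undian hadiah jutaan rupiah gratis", "spam")
spam_filter.train("rapat proyek besok pagi di kantor", "ham")
spam_filter.train("laporan skripsi sudah dikirim ke dosen", "ham")
print(spam_filter.classify("klik untuk hadiah gratis"))  # prints spam
```

Working in log space avoids numeric underflow when many small probabilities are
multiplied, and add-one smoothing keeps an unseen N-gram from forcing a class
probability to zero.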
TABLE OF CONTENTS
THESIS APPROVAL SHEET
STATEMENT OF NON-PLAGIARISM
PREFACE
ABSTRAK
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF FORMULAS
CHAPTER I INTRODUCTION
1.1 Background of the Problem
1.2 Problem Statement
1.3 Scope and Limitations
1.4 Research Objectives
1.5 Research Benefits
1.6 Writing Structure
CHAPTER II THEORETICAL FOUNDATION
2.1 Email
2.2 Spam and Ham
2.3 Spam Classification with the Naive Bayes Algorithm
2.3.1 Previous Research
2.3.2 Spam Filtering Techniques
2.3.3 Text Mining
2.3.4 N-gram Method
2.3.5 Naive Bayes Algorithm
2.4 Spam Filter Effectiveness
2.5 REST
2.6 Email Spam Filter Architecture
CHAPTER III RESEARCH METHODOLOGY AND SYSTEM DESIGN
3.1 Research Methodology
3.2 System Design
3.2.1 Data Flow Diagram
3.2.2 Menu Hierarchy
3.2.3 Flowchart
3.2.4 Application Programming Interface (API) Design
3.2.4 Database Schema
3.2.5 Table Structure
3.2.6 User Interface Design
CHAPTER IV IMPLEMENTATION AND TESTING
4.1 System Specifications
4.2 Algorithm Implementation
4.3 User Interface Implementation
4.4 JSON Data Implementation
4.5 Web Service Implementation
4.6 Manual Calculation Test
4.7 Spam Filter Test
4.8 Data Analysis
CHAPTER V CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusions
5.2 Recommendations
REFERENCES
LIST OF APPENDICES
LIST OF TABLES
Table 2.1 Research Results of Mathew and Bai
Table 2.2 List of Eliding Prefixes
Table 2.3 List of Possible Prefix Changes
Table 2.4 List of Disallowed Prefix and Suffix Combinations
Table 2.5 List of Spam-Category Words
Table 2.6 List of Ham-Category Words
Table 2.7 List of Words to Be Classified
Table 2.8 Probability Values of the Classified Words for the Spam Category
Table 2.9 Probability Values of the Classified Words for the Ham Category
Table 3.1 Rules for Disallowed Prefix and Suffix Combinations
Table 3.2 Affix Elision Rules
Table 3.3 Structure of the Dataset Table
Table 3.4 Structure of the dataset_fil Table
Table 3.5 Structure of the dataset_kosakata Table
Table 3.6 Structure of the Keys Table
Table 3.7 Structure of the Klasifikasi Table
Table 3.8 Structure of the Limits Table
Table 3.9 Structure of the Login Table
Table 3.10 Structure of the ngram Table
Table 3.11 Structure of the Stopword Table
Table 3.12 Structure of the tb_katadasar Table
Table 4.1 Frequency of Each Word
Table 4.2 Probability of Each Word with Respect to Spam and Ham
Table 4.3 Confusion Matrix for the Spam Filter without the N-gram Method
Table 4.4 Confusion Matrix for the Spam Filter with the 1-gram Method
Table 4.5 Confusion Matrix for the Spam Filter with the 2-gram Method
Table 4.6 Confusion Matrix for the Spam Filter with the 3-gram Method
Table 4.7 Confusion Matrix for the Spam Filter with the 4-gram Method
Table 4.8 Confusion Matrix for the Spam Filter with the 5-gram Method
Table 4.9 Confusion Matrix for the Spam Filter with the 6-gram Method
Table 4.10 Confusion Matrix for the Spam Filter with the 7-gram Method
Table 4.11 Confusion Matrix for the Spam Filter with the 8-gram Method
Table 4.12 Confusion Matrix for the Spam Filter with the 9-gram Method
Table 4.13 Confusion Matrix for the Spam Filter with the 10-gram Method
Table 4.14 Accuracy Test Results
Table 4.15 Recall Test Results
Table 4.16 Precision Test Results
Table 4.17 F-measure Test Results
Table 4.18 Spam Filter Test Results
LIST OF FIGURES
Figure 2.1 How Email Works
Figure 2.2 Example of the Case-folding Process
Figure 2.3 Example of the Tokenizing Process
Figure 2.4 Example of the Filtering Process
Figure 2.5 Example of the Stemming Process
Figure 2.6 Format of Affixed Words in Indonesian
Figure 2.7 REST API Architecture
Figure 2.8 Example HTTP Request
Figure 2.9 Example HTTP Error Response
Figure 2.10 Example HTTP Success Response
Figure 2.11 Spam Filter Architecture as a Local Proxy
Figure 2.12 Spam Filter Architecture as a Parallel Process
Figure 2.13 Spam Filter Architecture in the Mail Client
Figure 2.14 Spam Filter Architecture in the Mail Store
Figure 3.1 Spam Filter System Flow
Figure 3.2 Context Diagram of the Spam Filter API Application
Figure 3.3 Level 1 Diagram of the Spam Filter API Application
Figure 3.4 Level 2 Diagram of Process 1.5 of the Spam Filter API Application
Figure 3.5 Level 2 Diagram of Process 1.7 of the Spam Filter API Application
Figure 3.6 Level 2 Diagram of Process 1.8 of the Spam Filter API Application
Figure 3.7 Level 2 Diagram of Process 1.9 of the Spam Filter API Application
Figure 3.8 Visitor and User Menu Hierarchy
Figure 3.9 Admin Menu Hierarchy
Figure 3.10 Flowchart of the Visitor and User Main Menu
Figure 3.11 Flowchart of the Register Menu
Figure 3.12 Flowchart of the Sign In Menu
Figure 3.13 Flowchart of the Case Folding Menu
Figure 3.14 Case Folding Flowchart
Figure 3.15 Flowchart of the Tokenizing Menu
Figure 3.16 Tokenizing Flowchart
Figure 3.17 Flowchart of the Filtering Menu
Figure 3.18 Filtering Flowchart
Figure 3.19 Flowchart of the Stemming Menu
Figure 3.20 Stemming Flowchart
Figure 3.21 Flowchart of the Del_Inflection_Suffixer Function
Figure 3.22 Flowchart of the Del_Derivation_Suffixes Function
Figure 3.23 Flowchart of the Del_Derivation_Prefix Function
Figure 3.24 Flowchart of the N-gram Menu
Figure 3.25 N-gram Flowchart
Figure 3.26 Flowchart of the Naive Bayes Menu
Figure 3.27 Naive Bayes Algorithm Flowchart
Figure 3.28 Flowchart of the API Menu
Figure 3.29 Flowchart of the Generate Key User Menu
Figure 3.30 Flowchart of the List User Menu
Figure 3.31 Flowchart of the Remove Key User Menu
Figure 3.32 Flowchart of the Add Dataset Menu
Figure 3.33 Flowchart of the Remove Dataset Menu
Figure 3.34 Flowchart of the List Dataset Menu
Figure 3.35 Flowchart of the Check Spam Menu
Figure 3.36 Flowchart of the Console Menu
Figure 3.37 Flowchart of the Profile Menu
Figure 3.38 Flowchart of the Upgrade Menu
Figure 3.39 Flowchart of the Admin Page Menu
Figure 3.40 Flowchart of the Active Key Menu
Figure 3.41 Flowchart of the Suspend Key Menu
Figure 3.42 Flowchart of the Upgrade Key Menu
Figure 3.43 Flowchart of the Sign Out Menu
Figure 3.44 Spam Filter API Flow
Figure 3.45 Header and Body of the Generate Key User Method
Figure 3.46 Header and Body of the Remove Key User Method
Figure 3.47 Header and Body of the List User Method
Figure 3.48 Header and Body of the List Dataset Method
Figure 3.49 Header and Body of the Remove Dataset Method
Figure 3.50 Header and Body of the Add Dataset Method
Figure 3.51 Header and Body of the Check Spam Method
Figure 3.52 JSON Data Structure for Failure
Figure 3.53 JSON Data Structure of the Generate Key User Method
Figure 3.54 JSON Data Structure of the Remove Key User Method
Figure 3.55 JSON Data Structure of the List User Method
Figure 3.56 JSON Data Structure of the List Dataset Method
Figure 3.57 JSON Data Structure of the Remove Dataset Method
Figure 3.58 JSON Data Structure of the Add Dataset Method
Figure 3.59 JSON Data Structure of the Check Spam Method
Figure 3.60 Spam Filter Database Schema
Figure 3.61 Content Page Design
Figure 3.62 Visitor Menu Design
Figure 3.63 User Menu Design
Figure 3.64 Sign In Menu Design
Figure 3.65 Register Menu Design
Figure 3.66 Console Menu Design
Figure 3.67 Admin Page Menu Design
Figure 4.1 Case Folding Code Snippet
Figure 4.2 Tokenizing Code Snippet
Figure 4.3 Filtering Code Snippet
Figure 4.4 Stemming Code Snippet
Figure 4.5 N-gram Code Snippet
Figure 4.6 Naive Bayes Algorithm Code Snippet
Figure 4.7 Implementation of the Content Page Design
Figure 4.8 Implementation of the Visitor Menu Design
Figure 4.9 Implementation of the User Menu Design
Figure 4.10 Implementation of the Sign In Menu Design
Figure 4.11 Implementation of the Register Menu Design
Figure 4.12 Implementation of the Console Menu Design
Figure 4.13 Implementation of the Admin Page Design
Figure 4.14 JSON Data Implementation for Failure
Figure 4.15 JSON Data Implementation of the Generate Key User Method
Figure 4.16 JSON Data Implementation of the Remove Key User Method
Figure 4.17 JSON Data Implementation of the List User Method
Figure 4.18 JSON Data Implementation of the List Dataset Method
Figure 4.19 JSON Data Implementation of the List Dataset Method
Figure 4.20 JSON Data Implementation of the Add Dataset Method
Figure 4.21 JSON Data Implementation of the Check Spam Method
Figure 4.22 Web Service Implementation Architecture
Figure 4.23 Gmail Connection Code Snippet
Figure 4.24 Email Retrieval Code Snippet
Figure 4.25 Spam Filter Web Service Request Code Snippet
Figure 4.26 Web Service Response Code Snippet
Figure 4.27 Screenshot of the Web Service Implementation
Figure 4.28 Screenshot of Classification Results Using the Spam Filter
Figure 4.29 Spam Filter Accuracy Chart
Figure 4.30 Spam Filter Recall Chart
Figure 4.31 Spam Filter Precision Chart
Figure 4.32 Spam Filter F-measure Chart
Figure 4.33 Spam Filter Test Results Chart
LIST OF FORMULAS
Formula 2.1 Naive Bayes Algorithm Calculation
Formula 2.2 Naive Bayes Classifier Calculation
Formula 2.3 Probability of a Word Given a Class
Formula 2.4 Class Probability
Formula 2.5 Accuracy Calculation
Formula 2.6 Recall Calculation
Formula 2.7 Precision Calculation
Formula 2.8 F-measure Calculation
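The formulas themselves are not reproduced in this front matter. For
orientation, the textbook forms of the quantities named in the list are given
below. The notation is an assumption of this note and may differ from the
thesis: $c$ is a class (spam or ham), $w_1, \dots, w_n$ are the words or
N-grams of a message $d$, $N_c$ is the number of training messages in class
$c$ out of $N$ in total, $\mathrm{count}(w, c)$ is the frequency of $w$ in
class $c$, and $V$ is the vocabulary. The add-one smoothing in the third line
is one common choice, and the thesis may use another.

```latex
\begin{align*}
P(c \mid d) &= \frac{P(d \mid c)\, P(c)}{P(d)} \\
c^{*} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c) \\
P(w \mid c) &= \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|} \\
P(c) &= \frac{N_c}{N} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{F-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```

Here $TP$, $TN$, $FP$, and $FN$ are the true-positive, true-negative,
false-positive, and false-negative counts of a confusion matrix, with spam
taken as the positive class.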