Post on 08-Dec-2015
description
Sequence• Sebuah sequence adalah urutan dari elemen-elemen (transaksi)
s = < e1 e2 e3 … >
Setiap elemen terdiri dari kumpulan kejadian-kejadian (item)ei = {i1, i2, …, ik}
Setiap elemen merupakan atribut yang dihubungkan dengan suatu lokasi atau waktu tertentu (spesifik)
• Panjang Sequence, |s|, adalah banyaknya unsur-unsur sequence yang diberikan.
• A k-sequence adalah sebuah sequence yang terdiri dari k kejadian (item)
Sequential Pattern 3
SequenceDatabaseSequence
Sequence Elemen (Transaksi) Kejadian(Item)
Customer Transaksi-transaksi penjualan yang dilakukan oleh konsumen tertentu
Item – item yang dibeli konsumen dalam waktu t.
Buku, diary Produk, CD, dll.
Web Data Aktifitas browsing pada pengunjung web tertentu
Sekumpulan File-file yang dilihat pengunjung web setelah melakukan proses single mouse click
Home page, index page, contact info, dll
Event data Kejadian – kejadian yang dihasilkan oleh sensor tertentu
Kejadian-kejadi yang timbul dari sensor saat waktu t
Jenis-jenis tanda(alarm) yang dihasilkan oleh sensor
Genome sequences
DNA sequence dari spesies tertentu
Elemen dari DNA sequence Bases A,T,G,C
Sequential Pattern 4
Sequence Web sequence:
< {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >
Sequence kejadian kecelakaan yang disebabkan oleh ledakan nuklir pada 3-mile Island:(http://stellar-one.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm) < {clogged resin} {outlet valve closure} {loss of feedwater}
{condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases}>
Sequence buku checked out pada perpustakaan: <{Fellowship of the Ring} {The Two Towers} {Return of the King}>
Sequential Pattern 5
CS632 - Data Mining 64/3/01
Mining Sequential Patterns (1)
• Sequential patterns are ordered list of itemsets.
• Market basket example:– Customers typically rent “star wars” then “empire
strikes back” then “return of the Jedi”– “Fitted sheets and pillow cases” then “comforter”
then “drapes and ruffles”
CS632 - Data Mining 74/3/01
Mining Sequential Patterns (2)
• Looks at sequences of transactions as opposed to a single transaction.
• Groups transactions based on customer ID.– Customer sequence.
SequenceDefinisi Subsequent
Sequential Pattern 8
Sebuah sequence <a1 a2 … an> terdapat dalam sequence lain <b1 b2 … bm> (m ≥ n) jika terdapat integer i1 < i2 < … < in maka a1 bi1 , a2 bi1, …, an bin
Support subsequence w didefinisikan sebagai bagian dari data sequence yang berisi w
Sequential pattern adalah subsequence yang sering muncul (yaitu, support subsequence ≥ minsup)
Data sequence Subsequence Contain?
< {2,4} {3,5,6} {8} > < {2} {3,5} > Yes
< {1,2} {3,4} > < {1} {2} > No
< {2,4} {2,4} {2,5} > < {2} {4} > Yes
Sequential Pattern MiningDefinisi
Sequential Pattern 9
Terdapat:
Database sequence
Minimum menetapkan user yang
mendukung(support), minsup
Task:
Menemukan semua subsequence dengan user
yang mendukung ≥ minsup
Sequential Pattern Mining
• Algoritma Sequential Pattern Mining
Sequential Pattern 10
1. Sort Phase
2. Large Itemset Phase
3. Transformation Phase
4. Sequence Phase
5. Maximal Phase
Sequential Pattern Mining• Contoh Kasus
Sequential Pattern 11
customerID itemBought
1111
10,50203040
2222
10304030,50
3333
10203040
444
103050
55
4050
Sequential Pattern Mining
Sequential Pattern 12
customerID itemBought
1111
10,50203040
2222
10304030,50
3333
10203040
444
103050
55
4050
customerID Customer Sequence
1 <(10,50) (20) (30) (40)>
2 <(10) (30) (40) (30,50)>
3 <(10) (20) (30) (40)>
4 <(10) (30) (50)>
5 <(40) (50)>
1.Sort PhaseMengurutkan berdasarkan customerID sebagai major key
Sequential Pattern Mining2. Large Itemset Phase
Sequential Pattern 13
• Menentukan Large Itemset
• Memetakan ItemsetcustomerID Customer Sequence
1 <(10,50) (20) (30) (40)>
2 <(10) (30) (40) (30,50)>
3 <(10) (20) (30) (40)>
4 <(10) (30) (50)>
5 <(40) (50)>
Itemset Support
(10) 4
(20) 2
(30) 4
(40) 4
(50) 4
(10,50) 1
(30,50) 1
min_sup = 40%
40% x 5 = 2 customer sequence
Large Itemset Dipetakan ke-
(10) 1
(20) 2
(30) 3
(40) 4
(50) 5
Catatan:Sehingga apabila terdapat itermset (10,50), Memenuhi minimum support akan dikodekan 6, dst
Sequential Pattern Mining3. Transformation Phase
Sequential Pattern 14
Customer ID Original Sequence Transformed CustomerSequence
Setelah Pemetaan
1 <(10,50) (20) (30) (40)> ⟨ {(10) (50)} {(20)} {(30)} {(40)} ⟩ ⟨ {1, 5} {2} {3} {4} ⟩
2 <(10) (30) (40) (30,50)> ⟨ {(10)} {(30)} {(40)} {(30) (50)}⟩ ⟨ {1} {3} {4} {3, 5}⟩
3 <(10) (20) (30) (40)> ⟨ {(10)} {(20)} {(30)} {(40)} ⟩ ⟨ {1} {2} {3} {4} ⟩
4 <(10) (30) (50)> ⟨ {(10)} {(30)} {(50)}⟩ ⟨ {1} {3} {5} ⟩
5 <(40) (50)> ⟨ {(40)} {(50)}⟩ ⟨ {4} {5} ⟩
• Menghapus non-Large Itemset • Memetakan Large Itemset ke suatu integer
Sequential Pattern Mining4. Sequential Phase
Sequential Pattern 16
Menggunakan set Large Itemset, untuk mencari hasil sequence tertentu
Dua jenis algoritma Count-All
1. Algoritma AprioriAll Count-Some
1. Algoritma AprioriSome2. Algoritma DynamicSome
Sequential Pattern Mining
⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩
⟨ {1} {3} {4} ⟩ ⟨ {4} {5} ⟩
Sequential Pattern 17
Customer Sequence
Sequential Pattern Mining
⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩
⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩
Sequential Pattern 18
Large 1-Sequence
Sequence Support<1> 4<2> 2<3> 4<4> 4<5> 4
Sequential Pattern Mining
⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩
⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩
Sequential Pattern 19
Large 2-Sequence
Sequence Support<1 2> 2<1 3> 4<1 4> 3<1 5> 2<2 3> 2<2 4> 2<3 4> 3<3 5> 2<4 5> 2
Sequential Pattern Mining
⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩
⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩
Sequential Pattern 20
Large 3-Sequence
Sequence Support<1 2 3> 2<1 2 4> 2<1 3 4> 3<1 3 5> 2<2 3 4> 2
Sequential Pattern Mining
⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩
⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩
Sequential Pattern 21
Large 4-Sequence
Sequence Support<1 2 3 4> 2<1 3 4 5> 1
Sequential Pattern Mining5. Maximum Phase
Sequential Pattern 22
S, set seluruh Large Itemset n, merupakan jarak terpanjang
sequence
Sequential Pattern Mining
Sequential Pattern 23
Sequence Support<1 2 3 4> 2
Sequence Support<1 2 3> 2<1 2 4> 2<1 3 4> 3<1 3 5> 2<2 3 4> 2
Sequence Support<1 2> 2<1 3> 4<1 4> 3<1 5> 3<2 3> 2<2 4> 2<3 4> 3<3 5> 2<4 5> 2
Sequence Support<1> 4<2> 2<3> 4<4> 4<5> 4
Latihan soal• Berikut daftar transaksi pembelian dari customer. Maka
carilah pola urutan pembelian item yang dilakukan customer.
Virtual Memory 25
Customer ID Transaction Time
Items Bought
11
June 25 93June 30 93
3090
222
June 10 93June 15 93June 20 93
10, 203040, 60, 70
3 June 25 93 30, 50, 70
444
June 25 93June 30 93July 25 93
3040, 7090
5 June 12 93 90
Minimum support: 2 item