Data Mining with SAS® Enterprise Miner - SUCourse



M.Sc. in Business Analytics Program

Spring 2017 BAN 504 – Data Mining with SAS® Enterprise Miner

Instructor: Onuralp Öztürk, M.Sc.
Office: SAS Turkey
Phone: (212) 212-9808
Fax: (212) 212-9824
E-mail: [email protected]
Web: SUCourse
Office Hours: Wednesdays 14:00-18:00

Class Meetings:
Type: Class
Time: 2:40 pm - 5:30 pm
Days: W
Where: SOM G-041

Course Objective: This course is designed to equip students with skills in using SAS® Enterprise Guide to perform statistical analyses and also to cover the skills that are required to assemble analysis flow diagrams using the rich tool set of SAS® Enterprise Miner for both pattern discovery (segmentation, association, and sequence analyses) and predictive modeling (decision tree, regression, and neural network models).

Learning Outcomes: Upon successful completion of the course, the student should be able to:
1. Generate descriptive statistics and explore data with graphs,
2. Perform analysis of variance,
3. Perform linear regression and assess the assumptions,
4. Use diagnostic statistics to identify potential outliers in multiple regression,
5. Use chi-square statistics to detect associations among categorical variables,
6. Fit a multiple logistic regression model,
7. Define a SAS Enterprise Miner project and explore data graphically,
8. Modify data for better analysis results,
9. Build and understand predictive models such as decision trees and regression models,
10. Compare and explain complex models,
11. Generate and use score code,
12. Apply association and sequence discovery to transaction data,
13. Be prepared for the Predictive Modeling Using SAS Enterprise Miner certification exam.


Course Material: The recommended course textbooks are:

SAS® Enterprise Guide®: ANOVA, Regression, and Logistic Regression Course Notes © 2012 (EGBS for short) was developed by Marc Huber. Additional contributions were made by Ron Cody, Stephanie Curtis, Robert Hamer, Gerry Hobbs, Dan Kelly, Bob Lucas, Paul Marovich, Danny Modlin, Liza Nirelli, Azhar Nizam, Mike Patetta, Jill Tao, Catherine Truxillo, and artwork by Stanley Goldman. Editing and production support was provided by the Curriculum Development and Support Department.

Applied Analytics Using SAS® Enterprise Miner® Course Notes © 2011 (AAEM for short) was developed by Peter Christie, Jim Georges, Jeff Thompson, and Chip Wells. Additional contributions were made by Tom Bohannon, Mike Hardin, Dan Kelly, Bob Lucas, and Sue Walsh. Editing and production support was provided by the Curriculum Development and Support Department.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

The textbooks will be given to students before the first lesson starts. There will be other readings throughout the semester. All readings are chosen with the aim of stimulating class participation. Read all assigned material for a given week before coming to class.

Additional list of references:

• Siddiqi, Naeem (2006) Credit Risk Scorecards. Wiley.
• Skoglund, Jimmy and Chen, Wei (2015) Financial Risk Management. Wiley.
• Chakraborty, Goutam and Pagolu, Murali and Garla, Satish (2013) Text Mining and Analysis. SAS Press.
• Pinheiro, C.A.R. (2011) Social Network Analysis in Telecommunications. Wiley.


List of Readings

Reading 1
Date: March 1, 2017
Case: Beware Spurious Correlations
Subject: Exploratory Data Analysis
Teamwork: No

Reading 2
Date: April 12, 2017
Case: Big Data’s Dangerous New Era of Discrimination
Subject: Data Mining
Teamwork: No

Reading 3
Date: April 26, 2017
Case: Let Algorithms Decide and Act for Your Company
Subject: Predictive Modeling
Teamwork: No

Reading 4
Date: May 17, 2017
Case: The Discipline of Business Experimentation
Subject: Experimentation
Teamwork: No

Course Web: In this course, students will actively use the SUCourse online system. Lecture notes, slides and additional material will be available on SUCourse. Students will be expected to visit the course web site at least two or three times a week. SUCourse will also be used for any in-class exercises, short assignments and uploading/downloading all course-related files. Students will submit all assignments via SUCourse. Sabancı University uses a very powerful web-based tool called Turnitin. Turnitin is the worldwide standard in online plagiarism prevention. It allows instructors to compare student papers against a database composed of millions of articles. Every paper submitted by students will be scanned and cross-checked by Turnitin, and results will be reflected in students’ grades.

Instructional Design: This course is intended to be highly interactive, engaging students in active learning through hands-on exercises, invited guest speakers, etc. in addition to standard lecture material. To facilitate this process, students are expected to come to class prepared by reading the assigned material, and to actively and meaningfully participate in class discussions.


Grading:
Participation: 20%
Homeworks: 20%
Midterm Exam: 25%
Final Exam: 35%

Requirements: General requirements regarding the grading items listed above are as follows:

a) Participation score is awarded for active in-class participation only (not attendance). Participation means joining in class discussions and intellectually contributing to the learning in the classroom by voicing one’s ideas, comments, feedback, etc. regarding the subject matter.

b) Homeworks are to be individually submitted via SUCourse no later than the posted due date and time. For team submissions, only 1 team member can submit. Students will be allowed late submissions up to 48 hours past the due date, but each day of late submission (after the first 5 minutes late) will cost 25% of the grade. After 48 hours, SUCourse will be closed for submission and late homeworks will not be accepted. There will be no deadline extensions for any homework or report.

c) Midterm Exam will include multiple-choice, short essay and problem solving type of questions. The exam will be closed book, closed notes and closed laptops. A make-up exam will not be offered for missing the Midterm Exam; students who have medical reasons (with accompanying doctor’s report) to miss the Midterm will have their grading percent added to the Final Exam.

d) The Final Exam will also serve as the Predictive Modeling Using SAS Enterprise Miner certification exam offered by SAS®. Your score on this exam will not only contribute to your course grade, but will also determine whether you are eligible for the certification.

e) If you miss a particular assignment (including class attendance) due to sickness, accident, etc., you must bring in an official doctor’s report describing the situation before you can request a make-up for the missed grade. No other excuses will be accepted for make-up purposes.
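The grading arithmetic described above can be sketched in a few lines of Python. This is only an illustration and not an official grade calculator: the function names are hypothetical, scores are assumed to be on a 0-100 scale, the late penalty is assumed to be deducted from the earned homework grade, and every started 24-hour period after the 5-minute grace window is assumed to count as one day.

```python
# Weights taken from the Grading section of this syllabus.
WEIGHTS = {"participation": 0.20, "homeworks": 0.20, "midterm": 0.25, "final": 0.35}


def late_multiplier(minutes_late: float) -> float:
    """Fraction of a homework grade kept after the late penalty (assumed reading)."""
    if minutes_late <= 5:            # grace period: the first 5 minutes carry no penalty
        return 1.0
    if minutes_late > 48 * 60:       # SUCourse is closed after 48 hours
        return 0.0
    days_late = 1 if minutes_late <= 24 * 60 else 2   # assumption: each started day counts
    return 1.0 - 0.25 * days_late


def course_grade(scores: dict, midterm_excused: bool = False) -> float:
    """Weighted course grade on a 0-100 scale."""
    weights = dict(WEIGHTS)
    if midterm_excused:              # medical excuse: midterm weight moves to the final
        weights["final"] += weights.pop("midterm")
    return sum(weights[item] * scores[item] for item in weights)
```

For example, under these assumptions a homework submitted one hour late keeps 75% of its grade, and a student excused from the Midterm has the Final Exam counted at 60% of the course grade.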

Academic Honesty: Learning is enhanced through cooperation and as such you are encouraged to work in groups, ask for and give help freely in all appropriate settings. At the same time, as a matter of personal integrity, you should only represent your own work as yours. Any work that is submitted to be evaluated in this class should be an original piece of writing, presenting your ideas in your own words. Everything you borrow from books, articles, or web sites (including those in the syllabus) should be properly cited. Although you are encouraged to discuss your ideas with others (including your friends in the class), it is important that you do not share your writing (slides, MS Excel files, reports, etc.) with anyone. Using ideas, text and other intellectual property developed by someone else while claiming it is your original work is plagiarism. Copying from others or providing answers or information, written or oral, to others is cheating. Unauthorized help from another person or having someone else write one’s paper or assignment is collusion.


Cheating, plagiarism and collusion are serious offenses that could result in an F grade and disciplinary action. Please pay utmost attention to avoid such accusations.

Classroom policies and conduct: Sabancı Business Analytics M.Sc. Program values participatory learning. Establishing the necessary social order for a participatory learning environment requires that we all:
• Come prepared to make helpful comments and ask questions that facilitate your own understanding and that of your classmates. This requires that you complete the assigned readings for each session before class starts.
• Listen to the person who has the floor. During class, no unnecessary conversations.
• Come to class on time. The Professor has the right to not let you in until the first break.
• Except for emergency and health-related excuses, do not leave and reenter the class during each fifty-minute-long lecture.
• All cell phones or other electronic devices should be turned off unless they are used as part of the lecture.

In some weeks, you may be asked to bring a laptop computer for in-class exercises. Laptops will be used in this course for purposes of hands-on learning and enhanced understanding of the course material. However, laptops will be used only for educational purposes and as instructed by the Professor; any other use during the lecture will be strongly discouraged.

Course Schedule:

Week 1
Date: Feb. 8, 2017
Topics:
• Course Introduction
• Introducing SAS Enterprise Guide
• Introduction to Statistics
Requirements:
• Reading (before class) – Chapter 1 in EGBS
• Reading (before class) – Chapter 2 in EGBS

Week 2
Date: Feb. 15, 2017
Topic:
• Analysis of Variance (ANOVA)
Requirements:
• Reading (before class) – Chapter 3 in EGBS
• Assigned – Homework #1

Week 3
Date: Feb. 22, 2017
Topic:
• Regression & Diagnostics
Requirements:
• Reading (before class) – Chapter 4 in EGBS
• Reading (before class) – Chapter 5 in EGBS
• Due – Homework #1
• Assigned – Reading 1: Beware Spurious Correlations

Week 4
Date: Mar. 1, 2017
Topics:
• Reading 1 Discussion
• Exploratory Data Analysis
• Categorical Data Analysis
Requirements:
• Reading (before class) – Chapter 6 in EGBS
• Assigned – Homework #2

Week 5
Date: Mar. 8, 2017
Topics:
• Introduction to SAS Enterprise Miner
• Accessing and Assaying Prepared Data
Requirements:
• Reading (before class) – Chapter 1 in AAEM
• Reading (before class) – Chapter 2 in AAEM
• Due – Homework #2


Week 6
Date: Mar. 15, 2017
Topics:
• Introduction to Predictive Modeling
• Decision Trees
Requirements:
• Reading (before class) – Chapter 3 in AAEM
• Assigned – Homework #3

Week 7
Date: Mar. 22, 2017
Topics:
• In-class Midterm Exam on WEDNESDAY, MARCH 22
Requirements:
• Textbook, notes, laptops
• Due – Homework #3

Week 8
Date: Mar. 29, 2017
Topics:
• Regressions
• Neural Networks
• Other Modeling Tools
Requirements:
• Reading (before class) – Chapter 4 in AAEM
• Reading (before class) – Chapter 5 in AAEM
• Assigned – Reading 2: Big Data’s Dangerous New Era of Discrimination

Week 9
Date: Apr. 12, 2017
Topics:
• Reading 2 Discussion
• Data Mining
• Model Assessment
• Model Implementation
Requirements:
• Reading (before class) – Chapter 6 in AAEM
• Reading (before class) – Chapter 7 in AAEM
• Assigned – Homework #4

Week 10
Date: Apr. 19, 2017
Topics:
• Ensemble Models
• Variable Selection
• Categorical Input Consolidation
• Surrogate Models
Requirements:
• Reading (before class) – Chapters 9, 1-39 in AAEM
• Due – Homework #4
• Assigned – Reading 3: Let Algorithms Decide and Act for Your Company

Week 11
Date: Apr. 26, 2017
Topics:
• Reading 3 Discussion
• Predictive Modeling
• SAS Rapid Predictive Modeler
• A Closer Look at Credit Scorecard Development – SAS perspective
Requirements:
• Reading (before class) – Chapters 9, 39-54 in AAEM

Week 12
Date: May 3, 2017
Topics:
• Introduction to Pattern Discovery
• Clustering
• SAS Recency, Frequency, and Monetary Analysis (Self-Study)
Requirements:
• Reading (before class) – Chapter 8, 1-59 in AAEM
• Assigned – Homework #5

Week 13
Date: May 10, 2017
Topics:
• A Closer Look at Social Network Analysis – SAS perspective
• Market Basket Analysis
• Sequence Analysis
Requirements:
• Reading (before class) – Chapter 8, 59-85 in AAEM
• Due – Homework #5


• Assigned – Reading 4: The Discipline of Business Experimentation

Week 14
Date: May 17, 2017
Topics:
• Reading 4 Discussion
• Experimentation
• Review for the Certification Exam
Requirements:

Weeks 15-16
Date: Final Exam Weeks
Requirements:
• Exam date to be announced