NCSU MSA Program Course Module

Text Mining

Spring 2009
Meeting Times: Tuesday/Thursday, 10:45am – 12:00pm, through Feb 5
See calendar: http://analytics.ncsu.edu/?page_id=88
Meeting Location: Venture 3 third-floor IAA classroom (Map)
Wolfware Course Web
Instructor: Tao Xie



Course Module Description

This module addresses practical techniques and tools for data mining when the “data” takes the form of text. We will first present an overview of text mining and the necessary background knowledge. To allow hands-on demonstration, we will then illustrate text mining principles by applying SAS Text Miner to various example data. In particular, the module will cover the basic steps and problems in dealing with textual data, showing what can be achieved without very sophisticated technology. It will then cover more sophisticated techniques for solving more difficult and challenging problems. Students will gain hands-on experience by doing exercises with SAS Text Miner and by completing a group project that applies SAS Text Miner to realistic textual data of their own interest. By the end of the course module, students should be able to apply common text mining techniques to textual data to accomplish specific tasks in practice.
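As a rough illustration of the "basic steps" mentioned above, the following sketch tokenizes a few documents, removes stop words, and counts terms per document. It is not part of the course materials, which use SAS Text Miner; the sample documents, the stop list, and the names `tokenize`, `term_counts`, and `doc_freq` are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical toy documents; these are not course data.
documents = [
    "The stock market fell sharply on Monday.",
    "Investors worry as the market falls again.",
    "The home team won the game on Monday night.",
]

# A tiny illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"the", "on", "as", "a", "an", "of", "and"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


# One term-frequency Counter per document: a simple term-by-document representation.
term_counts = [Counter(tokenize(doc)) for doc in documents]

# Document frequency: the number of documents in which each term appears.
doc_freq = Counter(term for counts in term_counts for term in counts)

# Print the terms shared by more than one document, most frequent first.
for term, df in sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0])):
    if df > 1:
        print(term, df)
```

Note that "fell" and "falls" are counted as different terms here; reducing such variants to a common form (stemming) is one of the refinements a dedicated text mining tool typically provides.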


Software: SAS Text Miner (Part of SAS Enterprise Miner)

In the VCL, we will use the "IAA Text Mining" image.

Projects are saved on the VCL blade to C:\EMINER. YOU MUST LOG OFF THE BLADE so that your project is copied to K:\EMINER. The next time you load the image, a script automatically copies the contents of K:\EMINER back to C:\EMINER, where you can continue working on your project. IF YOU DO NOT LOG OFF THE VCL IMAGE (for example, if your reservation times out or you simply click the X in the corner), YOUR PROJECT WILL NOT BE COPIED TO THE K DRIVE AND WILL NOT BE SAVED.

We recommend using the Text Miner in SAS Enterprise Miner 5.2.

You can also use SAS Text Miner 9.1, which is included in SAS 9.1 (specifically, in SAS Enterprise Miner 4.3 and later versions). You can use the copy installed in the NCSU VCL, or follow the instructions on the NCSU SAS web site to install a copy on your local machine. Note that this release includes only the SAS Enterprise Miner client, not the server software; after installation, you need to follow the instructions here to set the system path environment variable.


Class Schedule:

Week 1, 01/13
Topic: Text Mining Overview-I
[Office 2007 Slides][PDF]

Week 1, 01/15
Topic: Text Mining Overview-II
[Office 2007 Slides][PDF]
[News Sample Text]

Week 2, 01/20
Topic: Text Mining Overview-III
[Office 2007 Slides][PDF]

Week 2, 01/22
Topic: Text Mining Overview-IV
[Office 2007 Slides][PDF]
Assignment: Thursday Jan 22 11pm: submit via the Wolfware submission link a file (plain text, PDF, or MS Word/RTF) describing the textual data, either publicly available or available to you, to which you are most interested in applying SAS Text Miner. The submission can include a brief description of the textual data and, if the data is publicly available, the web link for accessing it.

Here is the list of textual data proposed by 2008 students. Here are some sample data sets.

Data proposed by 2009 students

Week 3, 01/27
Topic: Text Mining with SAS Text Miner I:
- Basics
[Slides PDF][Demo Notes][SAS Data Set][SAS startup code]

Week 3, 01/29
Topic: Text Mining with SAS Text Miner II:
- Exploratory Analysis
[Slides PDF][Demo Notes][SAS Data Set]
[DownTheMall Tool][SAS startup code]
Assignment: Thursday Jan 29 11pm: Homework Group Exercise 1 due.

Week 4, 02/03
Topic: Text Mining with SAS Text Miner III:
- Predictive Modeling
[Demo Notes][SAS Data Set][SAS startup code]
Assignment: Tuesday Feb 3 11pm: each project group submits (1) a description of the textual data being mined, including its web link if available (identify the units in the text collection); (2) the set of question(s) to be answered with text mining; and (3) the type of mining techniques to be used (e.g., text clustering, text categorization).

Week 4, 02/05
Topic: Project Presentations
Assignment: Thursday Feb 5 11pm: Group Project Report due.

The report should include an expanded version of the information submitted on Feb 3.

In addition, the report should include (1) the mining procedure, including the configurations used in each step of the procedure; and (2) the mining results (e.g., how the outputs of SAS Text Miner help answer the target questions). The report should include enough detail for anyone else to reproduce the same mining results given the same set of raw data.

The report should describe lessons learned (what works and what doesn't work), e.g., which particular configuration has a significant impact on the mining results.

Finally, the report should describe which part(s) of the project each group member contributed to.


Related Materials on SAS Text Miner

Related Tools

Related Courses

Related Survey Papers

Related Books/Book Chapters

Related Sample Data Sets

SAS Text Miner Basics