Data Mining Project
Daniel Brockman
Dan@DanielBrockman.com
Pursuant to
Course CIS366, SAS Data Mining
Summer 2006
University of California Berkeley Extension
Jianmin Liu, Instructor
jiliu@ggu.edu
Contents
Report
Lift Charts for Final Models
cumulative
noncumulative
Notes on the project
Conclusion
On working with fellow students
Candidate Neural Network Models
Model NN-D2
Model NN-D3
Lift Chart -- cumulative
Lift Chart -- noncumulative
Project Assignment
Data
Review by
Jaison K. Joseph
Cumulative Lift Chart for Final Models
Noncumulative Lift Chart for Final Models
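For readers unfamiliar with the two chart types, the difference between cumulative and noncumulative lift can be sketched in a few lines of Python. This is a minimal illustration of the common textbook definition, not necessarily the exact computation the SAS assessment node performs. The function name `lift_table` and its parameters are hypothetical, and the labels are assumed to be 0/1 with 1 marking the target event.

```python
def lift_table(scores, labels, n_bins=10):
    """Return (noncumulative, cumulative) lift lists, one value per bin.

    Hypothetical helper: cases are ranked by predicted score, split into
    n_bins equal-sized groups, and each group's response rate is divided
    by the overall response rate.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(scores) < n_bins:
        raise ValueError("need at least one case per bin")

    # Rank cases from highest to lowest predicted score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n
    if base_rate == 0:
        raise ValueError("no positive cases, lift is undefined")

    noncumulative, cumulative = [], []
    seen = 0   # cases counted so far, across bins
    hits = 0   # positive cases counted so far, across bins
    for b in range(n_bins):
        bin_labels = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        bin_hits = sum(bin_labels)
        seen += len(bin_labels)
        hits += bin_hits
        # Noncumulative lift looks at this bin alone.
        noncumulative.append((bin_hits / len(bin_labels)) / base_rate)
        # Cumulative lift looks at this bin and every bin ranked above it.
        cumulative.append((hits / seen) / base_rate)
    return noncumulative, cumulative
```

Under this definition the cumulative curve always ends at 1.0, because the final bin includes every case, while the noncumulative curve can drop to 0 in bins that contain no responders.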
Notes on the project
SAS Enterprise Miner impresses me with the ease with which one
can produce useful results, which makes it appear simple. At the
same time, it offers many opportunities to insert controls on the
modeling process, which makes it appear complex. Feeling
constrained by time, I focused on the specifically assigned tasks,
but did allow myself a bit of exploration.
1. Conclusion: The Tree model gives results as good as or better
than the other models, yet it runs more quickly than the Neural
Networks and considers more data than the Logit model. If the cost
of data processing becomes significant, the Tree model should
outperform the others in return on cost. If the cost of data
processing isn't significant, then the ensemble of three models,
Tree-D2, Logit-D2 and NN-D3, gives the best results.
2. The Neural Network Models: Being interested in Neural Networks,
I created several of these models and retained the two that proved
interesting, NN-D2 and NN-D3. I used an assessment node to choose
NN-D3 as the best Neural Network model.
3. Variable Transformation: The Neural Networks weren't interesting
until I used the Variable Transformation node to create some binned
variables, which showed up as "Ordinal" in the network diagrams.
4. Discarded Models: In all, I created two Logit models, two Tree
models and four Neural Network models, and discarded half of them
because they had no predictive value and were of no interest.
5. Working with fellow students: I had initially agreed to
collaborate with Chookij Vanatham, but he had scheduling conflicts
with unrelated activities and had to withdraw from the
collaboration. At that point, no opportunity remained to plan a
joint project with someone else. Jaison Joseph reviewed my work at
an intermediate stage. Nicholas Zemstov and Chookij asked me about
structuring the project. They and Jaison also asked me some
technical questions, and I provided what suggestions I could.
Jaison, Chookij and I worked together on the technical task of
getting copies of the SAS software for our use.
6. Considerations for the Future: At the end of a project, I always
notice what might have been done differently but for some
constraint. These observations serve to guide future projects. One
is that the assessments use "profit" to judge the goodness of a
model. In the future, I want to find the control on this behavior
and investigate alternatives. Also, I've given little attention to
CHAID, CART and C4.5 models, all of which have virtues about which
I want to learn more. Further, the capabilities of the Neural
Network models interest me immensely, and I want to explore them
more.
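The binning described in note 3 can be illustrated with a short Python sketch. The report does not say which binning method the Variable Transformation node applied, so equal-width buckets are an assumption here, and the function name `bin_equal_width` is hypothetical. The idea is only to show how a continuous input becomes an ordered category that a model can treat as "Ordinal".

```python
def bin_equal_width(values, n_bins=4):
    """Map each numeric value to an ordinal bin index from 0 to n_bins - 1.

    Assumed method: equal-width buckets spanning the observed range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no ordering, so every case shares one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the top bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Because the bin indices preserve the order of the original values, a downstream model can still exploit the ranking while ignoring small differences within a bin.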
Candidate Neural Network Models
Model NN-D2
Candidate Neural Network Models
Model NN-D3
Candidate Neural Network Models
Lift Chart -- Cumulative
Candidate Neural Network Models
Lift Chart -- Noncumulative