Cory's Wiki

Data mining has two complementary goals: understanding and prediction. It encompasses many technologies and techniques from AI, physics, statistics, and machine learning.

Data Sets

Data mining operates on observational, secondary, or retrospective data. Such data was not gathered through a designed experiment; it was usually recorded for some other purpose, which makes it cheap to obtain.

Data from a common type or source tends to share similar properties, e.g. web data or streaming data.

Properties of Data

  • numeric, text, visual
  • discrete or continuous
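To make the discrete/continuous distinction concrete, here is a minimal Python sketch. The names `Kind` and `attribute_kind` are hypothetical, and the rule they apply (strings and integers count as discrete, any other real number as continuous) is an illustrative assumption rather than a standard definition.

```python
from enum import Enum
from numbers import Integral, Real


class Kind(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


def attribute_kind(values):
    """Guess whether an attribute is discrete or continuous from its values.

    Assumed heuristic (illustrative only): strings and integers (including
    booleans) are discrete; any other real number, such as a float, makes
    the whole attribute continuous. Other types are rejected.
    """
    kind = Kind.DISCRETE
    for v in values:
        if isinstance(v, (str, Integral)):
            continue  # categories, counts, flags
        if isinstance(v, Real):
            kind = Kind.CONTINUOUS  # e.g. a measured float
        else:
            raise TypeError(f"unsupported value type: {type(v).__name__}")
    return kind


print(attribute_kind([3, 1, 4]))           # Kind.DISCRETE
print(attribute_kind(["red", "green"]))    # Kind.DISCRETE
print(attribute_kind([36.6, 37.2, 38.0]))  # Kind.CONTINUOUS
```

Storage type is only a proxy: a column of dice rolls stored as floats is still discrete, so in practice the decision should follow what the attribute measures.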

Main Data Mining Techniques

Descriptive Methods

Predictive Modelling

Typical Data Mining Process

  • Problem formulation
    1. Define the goals
    2. Understand the problem domain
  • Data definition
    • How does the available data address the problem?
  • Validation
    • Apply rigorous testing to proposed solutions
    • Can be considered “productization”
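One common way to test a proposed solution rigorously is a hold-out test: set part of the data aside, fit the solution on the rest, and measure its error only on the held-out part. The Python sketch below is a minimal example under assumed choices (a 30% test split, a deliberately trivial "predict the training mean" model, and mean absolute error as the score); the helper names are hypothetical.

```python
import random
from statistics import mean


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean_model(train):
    """A deliberately trivial baseline: always predict the mean training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def mean_absolute_error(model, test):
    """Average absolute difference between predictions and true targets."""
    return mean(abs(model(x) - y) for x, y in test)


# Toy data: (input, target) pairs following y = 2x + 1.
rows = [(x, 2 * x + 1) for x in range(20)]
train, test = holdout_split(rows)
model = fit_mean_model(train)
print(f"held-out MAE: {mean_absolute_error(model, test):.2f}")
```

Because every candidate is scored on the same held-out rows, which none of them saw during fitting, the scores can be compared fairly; a better model should beat this baseline's error.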

Emerging Fields