IN3011 Data Mining Coursework

Option 1 – Using Your Own Dataset


  1. Choose a data set.
  2. Identify a data mining problem that you might be able to solve.
  3. Identify a technique that might be useful for mining the data.
  4. Clean up and prepare your data for analysis.
  5. Run the technique on your data.
  6. Interpret your results and, if necessary, revisit steps 3-5 above.
  7. Draw your conclusions from your data analysis.
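
As a rough illustration of steps 4-6, here is a minimal Python sketch. Everything in it is an assumption made for the example: the data set is synthetic, and the "technique" is a deliberately simple one-feature threshold classifier standing in for whatever method you actually choose. It is not a required approach.

```python
import random
import statistics

def clean(rows):
    """Step 4: drop records with a missing feature or label."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_threshold(train):
    """Step 5: one-feature classifier; the threshold is the midpoint of the class means."""
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, rows):
    """Step 6: fraction of records whose predicted class matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)

# Steps 1-2: a synthetic two-class data set (hypothetical, for illustration only).
rng = random.Random(0)
raw = [(rng.gauss(10, 2), 0) for _ in range(50)]
raw += [(rng.gauss(20, 2), 1) for _ in range(50)]
raw += [(None, 1), (12.0, None)]  # two records with missing values

data = clean(raw)
rng.shuffle(data)
train, test = data[:70], data[70:]  # hold back 30 records for evaluation

threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)
print(f"threshold={threshold:.2f}, hold-out accuracy={test_accuracy:.2f}")
```

Evaluating on records held back from training is what makes step 6 meaningful: if the hold-out accuracy is poor, that is your cue to go back to steps 3-5.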

What is Required/Marking Criteria


  1. Description of Problem


This can be broken down into two parts:

1)      What is the data set? Where did it come from (give a reference), and what else do you know about it (background reading)?

2)      What do you propose to do with it? Is it a classification problem, or something more exploratory, e.g. clustering?



Please note that your problem does not have to be highly original. For example, you could choose to analyse hurricane data as a classification problem. The classification system is well known, but what matters is how you go about the analysis, not what you find out.



  2. Analysis of the Data


1)      What do you propose to do with the data?

2)      What technique do you propose to use and why?

3)      How do you propose carrying out your analysis?



  3. Interpretation of Results


OK, your data mining software has spat out the results, but what do they mean?


Option 2


Download the dataset dataset2.sav. It contains records of (fictionalised) first-year students along with their first-year performance. Your brief is to analyse entry qualifications in order to advise the admissions tutor on how to recruit an intake of 200 students who are likely to achieve a 90% pass rate in year 1. You will need to produce a report, with accompanying analysis, to support your findings.
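
One possible starting point, sketched in Python, is to tabulate intake size and pass rate for a range of entry cutoffs. Note the assumptions: the records below are synthetic, and the "entry points" field and its relationship to passing are invented for the example, since the actual variables in dataset2.sav are not described here. With the real file you might load the data with `pandas.read_spss` (which needs the `pyreadstat` package) and substitute the real columns.

```python
import random

def cutoff_table(records, cutoffs):
    """For each cutoff, count students at or above it and compute their pass rate."""
    table = []
    for cutoff in cutoffs:
        admitted = [passed for points, passed in records if points >= cutoff]
        if admitted:
            table.append((cutoff, len(admitted), sum(admitted) / len(admitted)))
    return table

def lowest_cutoff(table, min_intake, target_rate):
    """Lowest cutoff meeting both the intake size and the pass-rate target, or None."""
    ok = [row for row in table if row[1] >= min_intake and row[2] >= target_rate]
    return min(ok, key=lambda row: row[0]) if ok else None

# Synthetic stand-in for the real data: (entry_points, passed_year_1).
# Both the field and the pass model are hypothetical.
rng = random.Random(1)
records = []
for _ in range(1000):
    points = rng.randrange(200, 401, 20)        # hypothetical entry points
    p_pass = 0.5 + 0.5 * (points - 200) / 200   # assumed: pass chance rises with points
    records.append((points, rng.random() < p_pass))

table = cutoff_table(records, range(200, 401, 20))
for cutoff, n, rate in table:
    print(f"cutoff {cutoff}: n={n}, pass rate={rate:.2f}")

best = lowest_cutoff(table, min_intake=200, target_rate=0.90)
print("suggested cutoff:", best)
```

Bear in mind that a pass rate measured on the same cohort used to choose the cutoff is likely to be optimistic, so your report should say how you would check the recommendation on data not used to derive it.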


