## Conduct an exploratory data analysis of the weatherAUS.csv data set using RapidMiner to understand the characteristics of each variable and the relationship of each variable to the other variables in the data set.

Task 1.1 Conduct an exploratory data analysis of the weatherAUS.csv data set using RapidMiner to understand the characteristics of each variable and the relationship of each variable to the other variables in the data set. Summarise the findings of your exploratory data analysis in a table named Task 1.1 Results of Exploratory Data Analysis for weatherAUS Data Set. For each variable in the weather data set, the table should describe key characteristics such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, and relationships with other variables where relevant.
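The per-variable statistics requested for the Task 1.1 table can be cross-checked outside RapidMiner. The following is a minimal sketch in plain Python, not part of the required RapidMiner process. It assumes missing values are encoded as `NA` or an empty string; the column names `MaxTemp` and `RainToday` and the sample values are hypothetical and are not taken from weatherAUS.csv.

```python
import csv
import statistics
from collections import Counter

MISSING = {"", "NA"}  # assumption: how missing values are encoded in the CSV


def load_columns(path):
    """Read a CSV file into {column name: list of raw string values}."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        columns = {name: [] for name in reader.fieldnames or []}
        for row in reader:
            for name in columns:
                columns[name].append(row[name])
    return columns


def summarise_column(values):
    """Return descriptive statistics for one column of raw string values."""
    present = [v for v in values if v not in MISSING]
    summary = {
        "missing": len(values) - len(present),
        "mode": Counter(present).most_common(1)[0][0] if present else None,
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return summary  # nominal column: only mode and missing count apply
    if numbers:
        summary.update(
            min=min(numbers),
            max=max(numbers),
            mean=statistics.mean(numbers),
            stdev=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
        )
    return summary


# Hypothetical sample values for illustration only
sample = {
    "MaxTemp": ["22.9", "25.1", "NA", "25.7", "25.1"],
    "RainToday": ["No", "No", "Yes", "NA", "No"],
}
summaries = {name: summarise_column(col) for name, col in sample.items()}
```

To run it on the real file, pass the result of `load_columns("weatherAUS.csv")` through `summarise_column` column by column and compare the output with RapidMiner's Statistics Tab.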

Hint: The Statistics Tab and Chart Tab in RapidMiner provide a lot of descriptive statistical information and useful charts such as bar charts and scatterplots. You might also like to run some correlations and chi-square tests. Indicate in the Task 1.1 table which variables you consider to be the key variables that contribute most to determining whether it is likely to rain tomorrow.
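To make the chi-square test mentioned in the hint concrete, here is a small sketch of the Pearson chi-square statistic for two nominal variables, written in plain Python. The variable names `rain_today` and `rain_tomorrow` and the counts are hypothetical illustrations, not results from the data set.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two nominal variables."""
    n = len(xs)
    observed = Counter(zip(xs, ys))  # cell counts of the contingency table
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: 30 Yes/Yes, 20 Yes/No, 10 No/Yes, 40 No/No
rain_today = ["Yes"] * 50 + ["No"] * 50
rain_tomorrow = ["Yes"] * 30 + ["No"] * 20 + ["Yes"] * 10 + ["No"] * 40

stat, dof = chi_square(rain_today, rain_tomorrow)
```

A statistic well above the critical value for the given degrees of freedom (3.841 at the 5% level for one degree of freedom) suggests the two variables are not independent, which is one piece of evidence for treating a variable as a key predictor.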

Briefly discuss the key results of your exploratory data analysis and the justification for selecting your five top variables for predicting whether it is likely to rain tomorrow based on today’s weather conditions. (About 250 words)

Task 1.2 Build a Decision Tree model for predicting whether it is likely to rain tomorrow based on today’s weather conditions, using RapidMiner, an appropriate set of data mining operators and a reduced weatherAUS.csv data set determined by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) associated decision tree rules.

Briefly explain your final Decision Tree Model Process, and discuss the results of the Final Decision Tree Model, drawing on the key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting whether it is likely to rain tomorrow based on today’s weather conditions and on relevant supporting literature on the interpretation of decision trees.
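When interpreting a decision tree, it helps to see why one attribute is chosen for a split over another. The sketch below computes information gain, one common splitting criterion, for a numeric threshold split. It is an illustration only and is not necessarily the criterion configured in your RapidMiner process. The attribute name `Humidity3pm` and the six rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute, threshold):
    """Entropy reduction from splitting on rows[attribute] <= threshold."""
    left = [lab for row, lab in zip(rows, labels) if row[attribute] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[attribute] > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


# Hypothetical rows for illustration only
rows = [{"Humidity3pm": h} for h in (30, 45, 60, 75, 80, 90)]
labels = ["No", "No", "No", "Yes", "Yes", "Yes"]

gain_at_70 = information_gain(rows, labels, "Humidity3pm", 70)
gain_at_40 = information_gain(rows, labels, "Humidity3pm", 40)
```

In this toy example the threshold of 70 separates the two classes perfectly, so it yields a higher gain than the threshold of 40. A rule such as "if Humidity3pm > 70 then Yes" in a tree's rule output reflects this kind of comparison.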

Task 1.3 Build a Logistic Regression model for predicting whether it is likely to rain tomorrow based on today’s weather conditions, using RapidMiner, an appropriate set of data mining operators and a reduced weatherAUS.csv data set determined by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Logistic Regression Model process, (2) Coefficients, and (3) Odds Ratios. Hint: you will need to install the Weka Extension in RapidMiner and use the W-Logistic Regression Operator for this task, and you may need to change the data types of some variables.

Briefly explain your final Logistic Regression Model Process, and discuss the results of the Final Logistic Regression Model drawing on the key outputs (Coefficients, Odds Ratios) for predicting whether it is likely to rain tomorrow based on today’s weather conditions and relevant supporting literature on the interpretation of logistic regression models (About 250 words).
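The link between coefficients and odds ratios can be shown in a few lines: an odds ratio is the exponential of the corresponding coefficient. The sketch below assumes a standard binary logistic model. The coefficient values and attribute names are hypothetical and are not output from RapidMiner or from weatherAUS.csv.

```python
import math

# Hypothetical coefficients for illustration only
coefficients = {"Humidity3pm": 0.05, "Sunshine": -0.15, "RainToday=Yes": 0.7}

# Odds ratio = exp(coefficient): the multiplicative change in the odds of rain
# tomorrow for a one-unit increase in the attribute, other attributes held fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}


def predicted_probability(intercept, coefficients, features):
    """Probability of the positive class under a logistic model."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))
```

An odds ratio above 1 (a positive coefficient) means the attribute raises the odds of the positive class, and an odds ratio below 1 (a negative coefficient) means it lowers them.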

Task 1.4 You will need to validate your Final Decision Tree Model and Final Logistic Regression Model. Note: you will need to use the X-Validation Operator, Apply Model Operator and Performance Operator in your data mining process models here.
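The idea behind cross-validation is that every row is used for testing exactly once and for training in all other folds. The following is a rough sketch of k-fold index generation in plain Python. It is unshuffled and unstratified, so it does not reproduce the sampling options of RapidMiner's operator; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k folds over range(n_rows)."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

In each fold a model would be trained on the `train` rows, applied to the `test` rows, and scored. The per-fold performance figures are then averaged.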

Discuss and compare the accuracy of your Final Decision Tree Model with that of the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow based on today’s weather conditions, based on the results of the confusion matrix and ROC chart for each final model. You should use a table here to compare the key results of the confusion matrix for the Final Decision Tree Model and Final Logistic Regression Model (About 250 words).
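The figures compared in the confusion matrix table all derive from four counts. The sketch below shows how accuracy, precision and recall follow from them, assuming `Yes` is the positive class. The label lists are hypothetical and are not model output.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Return (tp, fp, tn, fn) counts for a binary classification."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def performance(tp, fp, tn, fn):
    """Accuracy, precision and recall derived from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for illustration only
actual = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes", "Yes", "No", "No"]

counts = confusion_counts(actual, predicted)
scores = performance(*counts)
```

Computing the same measures for both models and placing them side by side gives the comparison table. Accuracy alone can mislead when rainy days are much rarer than dry days, which is why precision and recall for the `Yes` class are worth reporting as well.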

Note: the important outputs from your data mining analyses conducted in RapidMiner for Task 1 should be included in your Assignment 3 report to support the conclusions reached for each analysis conducted in Task 1.1, Task 1.2, Task 1.3 and Task 1.4. You can export the important outputs from RapidMiner as jpg image files and include these screenshots in the relevant Task 1 parts of your Assignment 3 report.

Note: you will find the North textbook a useful reference for the data mining process activities conducted in Task 1, covering the exploratory data analysis, decision tree analysis, logistic regression analysis and evaluation of the accuracy of the Final Decision Tree Model and the Final Logistic Regression Model.