DATA MINING MODEL WITH RAPIDMINER WEATHER AUS

Task 1 Data mining model with RapidMiner WeatherAUS

Task 1.1 Conduct an exploratory data analysis (EDA) of the weatherAUS.csv data set using RapidMiner. Summarise the key findings of the EDA in a table and discuss them in relation to the weatherAUS.csv data set (about 250 words). Possible Marks 10

Dealing with missing data: values recorded as NA need either to be replaced (Replace operator) or to be declared as missing values (Declare Missing Value operator).
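For readers who want to check the effect of declaring NA as missing outside RapidMiner, the idea can be sketched in plain Python. This is only an illustration, not the RapidMiner process: the three sample rows are made up, and the column names (Date, MinTemp, Rainfall, RainTomorrow) are assumed to match the data set.

```python
import csv
import io

# Hypothetical three-row sample; the real data comes from weatherAUS.csv.
SAMPLE = """Date,MinTemp,Rainfall,RainTomorrow
2008-12-01,13.4,0.6,No
2008-12-02,NA,0,No
2008-12-03,12.9,NA,Yes
"""


def load_rows(text, missing_token="NA"):
    """Read CSV text and replace the missing-value token with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value == missing_token else value) for key, value in row.items()}
        for row in reader
    ]


def count_missing(rows):
    """Count None values per column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


rows = load_rows(SAMPLE)
missing = count_missing(rows)

# Keep only complete records, similar in spirit to filtering out rows with missing data.
complete = [row for row in rows if None not in row.values()]
```

A per-column count of missing values like this is one of the figures that could go into the EDA summary table.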

Certain variables in the weatherAUS.csv data set need to be set to the correct type:

Date

Text

Real

Integer

Useful operators: Nominal to Numerical; Nominal to Date; Nominal to Binominal
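The type conversions listed above can be illustrated with a short Python sketch. It is not how RapidMiner performs the conversion: the column names, the ISO date format (YYYY-MM-DD) and the Yes/No label values are all assumptions about the file.

```python
from datetime import date


def to_float(text):
    """Convert text to a real number; None, empty strings and 'NA' become None."""
    return None if text in (None, "", "NA") else float(text)


def convert_row(raw):
    """Convert one row of strings into typed values (date, real, binary label)."""
    return {
        "Date": date.fromisoformat(raw["Date"]),      # nominal -> date
        "MinTemp": to_float(raw["MinTemp"]),          # nominal -> real
        "Rainfall": to_float(raw["Rainfall"]),        # nominal -> real
        # Two-valued label mapped to 1/0; any other value becomes None.
        "RainTomorrow": {"Yes": 1, "No": 0}.get(raw["RainTomorrow"]),
    }


typed = convert_row(
    {"Date": "2008-12-01", "MinTemp": "13.4", "Rainfall": "NA", "RainTomorrow": "No"}
)
```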

Decide how you are going to deal with records that have missing data, for example with the Filter Examples operator.

Decision Tree is fairly robust in working with different data sets, but Logistic Regression requires the data to be in a particular format; in particular, the label variable must be of the correct type.

Task 1.2 Build a Decision Tree model for predicting whether it is likely to rain tomorrow based on today’s weather, using the weatherAUS.csv data set and RapidMiner. Provide the Final Decision Tree Model process, the Decision Tree Model and the Decision Tree Rules; explain the final decision tree model process and discuss the results of the Final Decision Tree Model (about 250 words). Possible Marks 6

Relevant operators: Cross Validation or Split Validation, Decision Tree, Apply Model, Performance (Classification) or Performance (Binominal Classification)
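To make the idea behind cross validation concrete, the following minimal Python sketch splits row indices into k folds, each used once for testing while the remaining rows are used for training. It is a simplification under stated assumptions: the function name is hypothetical, folds are contiguous, and there is no shuffling or stratified sampling, which a real validation run would normally add.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k contiguous, roughly equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 3))
```

Every row appears in exactly one test fold, so each record contributes to the performance estimate once.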

Task 1.3 Build a Logistic Regression model for predicting whether it is likely to rain tomorrow based on today’s weather, using the weatherAUS.csv data set and RapidMiner. Provide the Final Logistic Regression Model process, the Coefficients and the Odds Ratios; explain the final logistic regression model process and discuss the results of the Final Logistic Regression Model (about 250 words). Possible Marks 7

Use the Weka Logistic Regression operator, as it provides the odds ratio as well as the coefficient for each predictor variable. Alternatively, you can use the Logistic Regression operator provided in RapidMiner.
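If the operator you use reports only coefficients, the odds ratio for a predictor is the exponential of its coefficient. A minimal Python sketch follows; the predictor names and coefficient values are made up for illustration and are not results from weatherAUS.csv.

```python
import math

# Hypothetical coefficients for illustration only; not output from any model.
coefficients = {"Humidity3pm": 0.05, "Pressure3pm": -0.10, "Sunshine": 0.0}

# Odds ratio = e^coefficient: the multiplicative change in the odds of rain
# tomorrow for a one-unit increase in the predictor, other predictors held fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
```

An odds ratio above 1 means the predictor raises the odds of rain, below 1 means it lowers them, and exactly 1 means no effect.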

Relevant operators: Cross Validation or Split Validation, Logistic Regression, Apply Model, Performance (Classification) or Performance (Binominal Classification)

Task 1.4 Comment on the accuracy of the Final Decision Tree Model and the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow based on today’s weather, using the weatherAUS.csv data set and RapidMiner, based on the results of the confusion matrix and ROC chart for each final model (about 250 words). Possible Marks 7

Consider using a summary table here to compare the results of the Decision Tree and Logistic Regression models, based on the confusion matrix and ROC chart for each model.
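The figures for such a comparison table can be derived from the four cells of a confusion matrix and from the area under the ROC curve (AUC). The Python sketch below shows the arithmetic only; the counts, labels and scores are invented for illustration, and the function names are hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive standard rates from confusion-matrix counts (positive class = rain)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
    }


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly; ties count half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented counts and scores, for illustration only.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=130)
area = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Computing the same metrics for both models and placing them side by side gives a direct basis for the discussion of which model predicts rain more reliably.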

Task 2 Incorporating Big Data into Data Warehouse Architecture

Refer to Chapter 2 and Chapter 6 of the textbook, but at least 5 more key references are expected to support the analysis and discussion here.

Task 2.1 Provide a high-level data warehouse architecture design for a large state-owned water utility that incorporates big data capture, processing, storage and presentation, in a diagram called Figure 1.1 Big Data Analytics and Data Warehouse Combined. Possible Marks 8

The front end and back end of this diagram should be contextualised for a large state-owned water utility.

Task 2.2 Describe and justify the main components of the proposed high-level data warehouse architecture design with big data capability incorporated, as presented in Task 2.1 Figure 1.1, with in-text referencing support (about 750 words). Possible Marks 12

Task 2.3 Identify and discuss some key security, privacy and ethical concerns for organisations using a big data analytics and algorithmic approach to decision making, with appropriate in-text referencing support (about 750 words). Possible Marks 10

Task 3 Los Angeles Police Department Dashboard Tableau

A sample data set (10% of the original data set) is available on the Assignment 3 Discussion Forum.

Task 3.1 Specific Crimes within each Crime Category for a specific Police Department Area and a specific year. Provide a screenshot of this view in the report with a description of key trends and patterns (about 60 words). Possible Marks 5

Task 3.2 Frequency of Occurrence for a selected crime over 24 hours for a specific Police Department Area. Provide a screenshot of this view in the report with a description of key trends and patterns (about 60 words). Possible Marks 5

Task 3.3 Frequency of Crimes within each Crime Classification by Police Department Area and by Time. Provide a screenshot of this view in the report with a description of key trends and patterns (about 60 words). Possible Marks 5

Task 3.4 A Geographical (location) presentation of each Police Department Area for given crime(s) and year. Provide a screenshot of this view in the report with a description of key trends and patterns (about 60 words). Possible Marks 5

The Latitude and Longitude variables need to be numeric and converted to the latitude and longitude types.
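If the coordinates arrive as text, they can be checked and converted before loading them into Tableau. The following Python sketch is one possible pre-processing step, not part of Tableau itself; the function name and the sample coordinate values are assumptions for illustration.

```python
def parse_coordinate(text, limit):
    """Convert text to a float within [-limit, limit]; return None if invalid."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    # Out-of-range values (and NaN, which fails every comparison) are rejected.
    return value if -limit <= value <= limit else None


# Latitude is limited to +/-90 degrees, longitude to +/-180 degrees.
lat = parse_coordinate("34.0522", 90)
lon = parse_coordinate("-118.2437", 180)
bad = parse_coordinate("n/a", 90)
```

Rows where either value comes back as None cannot be plotted and can be excluded from the map view.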
