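Paper title: Data Mining Analysis
Paper language: English
Your research area: Management Science
Data processing required: Yes
Your country: United Kingdom
Your university background: top UK university, ranked 10
Required length: 35 pages
Purpose of paper: undergraduate coursework (BA Assignment)
Blind review required (applies to doctoral or master's students):
Additional requirements and notes: analyse the HMEQ data using SAS Enterprise Miner and write a report on the analysis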
Hi, I need to finish a piece of coursework on Data Mining Analysis. I was wondering whether there is a writer at your company who can use the software "SAS 9.1.3", as the coursework requires one of the data sets to be analysed with it.
Also, this coursework will be about 35 pages. Would that be a problem, given that your maximum is 20 pages?
Please reply to me ASAP, thanks.
The topic is shown below:
Using SAS software, develop the most suitable predictions for the test dataset of the HMEQ dataset and prepare a technical report documenting your modelling process and results. Use your knowledge of all aspects of model building, including the earlier phases of data sampling, partitioning, transformation and replacement, and the different algorithms. Discuss and justify your choices of preprocessing and methods in a report (see instructions at the end). The report should be no longer than 30 pages (excl. Appendices). This report should include graphs and appropriate analysis. (Create random seed: 47701)
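To illustrate what a seeded partition means in practice, here is a minimal sketch in Python rather than SAS. The 40/30/30 train/validation/test split and the function name `partition` are assumptions made for illustration; the brief only fixes the seed. Python's random generator will not reproduce the rows that SAS selects for the same seed, so this shows the idea only.

```python
import random

SEED = 47701  # random seed stated in the brief


def partition(n_rows, fractions=(0.4, 0.3, 0.3), seed=SEED):
    """Shuffle row indices reproducibly and split them into three partitions.

    The default fractions are an assumed train/validation/test split,
    not something the brief specifies.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    n_train = int(n_rows * fractions[0])
    n_valid = int(n_rows * fractions[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains
    return train, valid, test


train, valid, test = partition(1000)
print(len(train), len(valid), len(test))
```

Because the seed is fixed, rerunning the split returns the same three index lists, which keeps comparisons between candidate models on the same rows.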
Some ideas / questions which you may want to address:
--Does the dataset contain outliers, missing values etc.? How were those issues addressed in data preprocessing? You should evaluate at least two different candidate sets of relevant variables (argue why these could be relevant) and also evaluate at least two different preprocessing, transformation and replacement schemes for these variables. Use the findings from the data analysis to evaluate different preprocessing and transformation candidates. What are the most important variables / the ones with the highest discriminatory power (you may need to run models to determine this)? What is the impact of those data preprocessing choices on the performance of different algorithms? Consider that documenting preprocessing choices that do not have a significant impact on the final accuracy may also be of interest.
--What is the best method to predict this dataset? Are particular methods more or less suitable for solving this task? What could serve as a simple baseline solution? You are expected to build and evaluate at least four different candidate models (e.g. different types of decision trees, logistic regression, neural networks or just 4 types of neural nets) and justify your choice of algorithm parameters. How sensitive are the algorithms to the options used in setting them up?
--Interpret how well your models are performing. What is a suitable benchmark? What is an appropriate performance metric / measure? Classification rate, lift, ROC, costs...? Use at least two metrics and justify your choice. Carefully choose, describe and justify on what data partition you build, validate and evaluate your models. Provide evidence on errors on all data partitions!
--Consider whether this problem represents a balanced or imbalanced classification problem. Are the relevant classes balanced? Evaluate two different sampling strategies. Also, are the costs of misclassifying individual instances symmetric or asymmetric? Consider this in setting up your experiments with different target profiles to rebalance asymmetries and/or costs. Assume a cost relationship of “10” for failing to predict a default and “1” for failing to predict that an instance will pay back the loan. Document whether and how this improves your results!
--Clustering algorithms of unsupervised learning allow a type of modelling, and answers / solutions to questions, distinctly different from those of classification. Critically discuss the differences between clustering and classification, and where clustering algorithms may be employed within the scope of the aforementioned questions of the coursework on the HMEQ dataset. The essay should be no longer than 1500 words and make adequate use of current academic literature, case study evidence and examples to support your arguments.
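The 10:1 cost relationship in the list above can be turned into a decision rule with a little arithmetic. If p is a model's estimated probability of default, and correct decisions are assumed to cost nothing, predicting "pays back" has expected cost 10p and predicting "default" has expected cost 1(1 - p). Predicting default is cheaper when 10p > 1 - p, i.e. when p > 1/11 (about 0.09), not at the usual 0.5 cut-off. A minimal Python sketch of this follows; the function names, labels and probabilities are invented for illustration and are not HMEQ results.

```python
COST_MISSED_DEFAULT = 10.0  # failing to predict a default (from the brief)
COST_MISSED_PAYER = 1.0     # failing to predict a payer (from the brief)


def cost_threshold(c_fn=COST_MISSED_DEFAULT, c_fp=COST_MISSED_PAYER):
    """Probability cut-off that minimises expected cost.

    Assumes correct decisions cost nothing and probabilities are calibrated.
    """
    return c_fp / (c_fp + c_fn)


def classify(probabilities, threshold):
    """Label an instance 1 (default) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def total_cost(y_true, y_pred, c_fn=COST_MISSED_DEFAULT, c_fp=COST_MISSED_PAYER):
    """Sum misclassification costs; 1 = default, 0 = pays back."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            cost += c_fn  # missed default
        elif truth == 0 and pred == 1:
            cost += c_fp  # payer flagged as default
    return cost


# Made-up labels and predicted default probabilities, for illustration only.
y_true = [1, 0, 0, 1, 0]
p_default = [0.30, 0.05, 0.20, 0.60, 0.02]

for cutoff in (0.5, cost_threshold()):
    preds = classify(p_default, cutoff)
    print(round(cutoff, 4), preds, total_cost(y_true, preds))
```

On this toy data the 0.5 cut-off misses one default, while the cost-based cut-off instead flags one payer, which is the cheaper error under the stated costs. If the training sample has been rebalanced, the predicted probabilities would need adjusting back to the original class proportions before a cut-off like this is applied.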