632 HW: R Lab
Qianwen Shi, 2019/4

Question 1 (Carseats data)

library(ISLR)
library(tree)
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.5.
## Loading required package: rpart
library(rpart)
library(caret)
## Warning: package 'caret' was built under R version 3.5.
## Loading required package: lattice
## Loading required package: ggplot2
library(randomForest)
## randomForest 4.6-
## Type rfNews() to see new features/changes/bug fixes.
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2': margin

a.

set.seed(1)
train = sample(1:nrow(Carseats), nrow(Carseats) / 2)
training = Carseats[train, ]
testing = Carseats[-train, ]

b.

reg_tree = tree(Sales ~ ., data = Carseats, subset = train)
summary(reg_tree)
## Regression tree:
## tree(formula = Sales ~ ., data = Carseats, subset = train)
## Variables actually used in tree construction:
## [1] ShelveLoc   Price       Age         Advertising Income
## [6] CompPrice
## Number of terminal nodes: 18
## Residual mean deviance: 2.36 = 429.5 / 182
## Distribution of residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -4.2570 -1.0360  0.1024  0.0000  0.9301  3.

reg_model <- rpart(Sales ~ ., data = training, method = "anova")
rpart.plot(reg_model)

[rpart.plot output: regression tree splitting on ShelveLoc = Bad,Medium; Price >= 121; Age >= 67; CompPrice < 148; Advertising < 11; Age >= 51; Price >= 92; ShelveLoc = Bad; Price >= 105; Price >= 113; US = No]

yhat = predict(reg_tree, newdata = testing)
mean((yhat - testing$Sales)^2)
## [1] 4.148897

As we can see in the plot, the node with Price >= 121 contains 82% of the observations; within it, the Age >= 67 node holds 30%, the Age >= 51 node 53%, and the Price >= 113 node 18%. The test MSE is 4.148897.

c.

trControl <- trainControl(method = "cv", number = 10)
crossvali_model <- train(Sales ~ ., data = training, method = "rpart",
                         trControl = trControl,
                         tuneGrid = expand.grid(cp = seq(0, 0.4, length.out = 30)))
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
plot.train(crossvali_model)

[plot: cross-validated RMSE against the complexity parameter cp]

New_tree = prune.tree(reg_tree, best = 8)
rpart.plot(crossvali_model$finalModel)

[rpart.plot output: tree selected by cross-validation, splitting on ShelveLocGood, Price >= 121, Age >= 67, Age >= 51, Price >= 92, Price >= 113]

yhat = predict(New_tree, newdata = testing)
mean((yhat - testing$Sales)^2)
## [1] 5.09085

As the result shows, the pruned tree increases the test MSE to 5.09085.

d.

set.seed(1)
bagging = randomForest(Sales ~ ., data = training, mtry = 10, importance = TRUE)
yhat_bagging = predict(bagging, newdata = testing)
mean((yhat_bagging - testing$Sales)^2)
## [1] 2.614642

importance(bagging)
##                %IncMSE IncNodePurity
## CompPrice   16.4714051          126.
## Income       4.0561872           78.
## Advertising 16.2730251          122.
## Population   0.7711188           62.
## Price       54.5571815          512.
## ShelveLoc   42.4486118          320.
## Age         20.5369414          184.
## Education    2.7755968           42.
## Urban       -2.3962157            8.
## US           7.2258536           17.

varImpPlot(bagging)

[varImpPlot: Price and ShelveLoc rank highest on both %IncMSE and IncNodePurity]

As we can see in the plots, the most important variables are Price and ShelveLoc (the quality of the shelving location for the car seats at each site). The test MSE with the bagging approach is 2.614642.

e.

set.seed(1)
random_forst = randomForest(Sales ~ ., data = training, mtry = 3, importance = TRUE)
yhat_random = predict(random_forst, newdata = testing)
mean((yhat_random - testing$Sales)^2)
## [1] 3.237463

The test MSE is 3.237463, which is larger than the bagging test MSE. Thus the random forest does not improve on bagging in this case.
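To see whether some other choice of mtry would do better than 3 or 10, one could loop over a small grid of values and compare the resulting test MSEs. This is only a sketch under the split defined above; the mtry_grid and test_mse objects are illustrative and not part of the original assignment.

# Sketch: compare test MSE for several mtry values (assumes training/testing exist)
library(randomForest)

mtry_grid <- c(2, 3, 5, 7, 10)   # mtry = 10 is bagging, since Carseats has 10 predictors
test_mse  <- numeric(length(mtry_grid))

for (i in seq_along(mtry_grid)) {
  set.seed(1)
  fit  <- randomForest(Sales ~ ., data = training, mtry = mtry_grid[i], ntree = 500)
  pred <- predict(fit, newdata = testing)
  test_mse[i] <- mean((pred - testing$Sales)^2)   # test MSE for this mtry
}

data.frame(mtry = mtry_grid, test_mse = test_mse)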
Question 2 (OJ data)

a.

set.seed(1)
train = sample(dim(OJ)[1], 800)
training_OJ = OJ[train, ]
testing_OJ = OJ[-train, ]

b.

OJ_tree = tree(Purchase ~ ., data = training_OJ)
summary(OJ_tree)
## Classification tree:
## tree(formula = Purchase ~ ., data = training_OJ)
## Variables actually used in tree construction:
## [1] LoyalCH       PriceDiff     SpecialCH     ListPriceDiff
## Number of terminal nodes: 8
## Residual mean deviance: 0.7305 = 578.6 / 792
## Misclassification error rate: 0.165 = 132 / 800

As we can see, the training error rate is 0.165 and the tree has 8 terminal nodes.

c.

OJ_tree
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
##
##  1) root 800 1064.00 CH ( 0.61750 0.38250 )
##    2) LoyalCH < 0.508643 350  409.30 MM ( 0.27143 0.72857 )
##      4) LoyalCH < 0.264232 166  122.10 MM ( 0.12048 0.87952 )
##        8) LoyalCH < 0.0356415 57   10.07 MM ( 0.01754 0.98246 ) *
##        9) LoyalCH > 0.0356415 109  100.90 MM ( 0.17431 0.82569 ) *
##      5) LoyalCH > 0.264232 184  248.80 MM ( 0.40761 0.59239 )
##       10) PriceDiff < 0.195 83   91.66 MM ( 0.24096 0.75904 )
##         20) SpecialCH < 0.5 70   60.89 MM ( 0.15714 0.84286 ) *
##         21) SpecialCH > 0.5 13   16.05 CH ( 0.69231 0.30769 ) *
##       11) PriceDiff > 0.195 101  139.20 CH ( 0.54455 0.45545 ) *
##    3) LoyalCH > 0.508643 450  318.10 CH ( 0.88667 0.11333 )
##      6) LoyalCH < 0.764572 172  188.90 CH ( 0.76163 0.23837 )
##       12) ListPriceDiff < 0.235 70   95.61 CH ( 0.57143 0.42857 ) *
##       13) ListPriceDiff > 0.235 102   69.76 CH ( 0.89216 0.10784 ) *
##      7) LoyalCH > 0.764572 278   86.14 CH ( 0.96403 0.03597 ) *

Terminal nodes are the ones marked with an asterisk. For example, node 9 has the split criterion LoyalCH > 0.0356415; this branch contains 109 observations with a deviance of 100.90 and predicts MM. About 17.4% of the observations in the branch take the value CH and the remaining 82.6% take the value MM.

d.

rpart_modelOJ <- rpart(Purchase ~ ., data = training_OJ, method = "class",
                       control = rpart.control(cp = 0))
rpart.plot(rpart_modelOJ)

[rpart.plot output: full (cp = 0) classification tree with many splits, mainly on LoyalCH, PriceDiff, ListPriceDiff, StoreID, STORE, SalePriceMM, PriceMM, WeekofPurchase, SpecialCH, and DiscMM]

Given the information about a particular customer, the plot can be used to predict which brand of orange juice that customer will buy.

Prediction_tree = predict(rpart_modelOJ, newdata = testing_OJ, type = "class")
table(Prediction_tree, testing_OJ$Purchase)
##
## Prediction_tree  CH  MM
##              CH 132  25
##              MM  27  86

(132 + 86) / 270
## [1] 0.8074074

Thus about 81% of the test observations are classified correctly, so the test error rate is about 19%.

CV_OJ = cv.tree(OJ_tree, FUN = prune.misclass)
CV_OJ
## $size
## [1] 8 5 2 1
##
## $dev
## [1] 146 146 160 306
##
## $k
## [1]      -Inf  0.000000  4.666667  160.
##
## $method
## [1] "misclass"
##
## attr(,"class")
## [1] "prune"         "tree.sequence"

g.

tree_size <- CV_OJ$size
deviance <- CV_OJ$dev
plot(tree_size, deviance / nrow(training_OJ), type = "b",
     xlab = "Tree size", ylab = "Deviance")

[plot: cross-validated error rate (deviance / n) against tree size, for sizes 1 to 8]

As we can see in the plot, the lowest cross-validated classification error rate is attained at tree size 5 (tied with the full tree of size 8).

h.

pruned_tree <- rpart(Purchase ~ ., data = training_OJ, method = "class",
                     control = rpart.control(cp = 0.01))
rpart.plot(pruned_tree)

[rpart.plot output: pruned classification tree splitting on LoyalCH, PriceDiff, and SpecialCH]
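Since the cross-validation in part g. pointed to a tree of size 5, an alternative to refitting with rpart is to prune the original tree object directly with prune.misclass from the tree package. This is only a sketch, assuming OJ_tree and testing_OJ from above; the object names pruned_OJ and pred_pruned_OJ are illustrative and not part of the original write-up.

# Sketch: prune the tree object to the CV-selected size and check its test error
pruned_OJ <- prune.misclass(OJ_tree, best = 5)     # prune to the CV-selected size
plot(pruned_OJ)
text(pruned_OJ, pretty = 0)

pred_pruned_OJ <- predict(pruned_OJ, newdata = testing_OJ, type = "class")
mean(pred_pruned_OJ != testing_OJ$Purchase)        # test error of the size-5 tree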
i.

Pred_tree = predict(pruned_tree, newdata = testing_OJ, type = "class")
table(Pred_tree, testing_OJ$Purchase)
##
## Pred_tree  CH  MM
##        CH 147  47
##        MM  12  64

(147 + 64) / 270
## [1] 0.7814815

Thus the test accuracy of the unpruned tree in part d. is 0.8074074 (test error about 0.19), while the pruned tree in part i. has test accuracy 0.7814815 (test error about 0.22). On this test set the pruned tree therefore performs slightly worse than the unpruned tree.
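To make this comparison explicit, the two test error rates can also be computed directly from the predictions rather than from the confusion-matrix counts. A minimal sketch, assuming Prediction_tree, Pred_tree, and testing_OJ from the parts above:

# Misclassification (test error) rates for the unpruned and pruned trees
err_unpruned <- mean(Prediction_tree != testing_OJ$Purchase)   # (25 + 27) / 270 = about 0.193
err_pruned   <- mean(Pred_tree != testing_OJ$Purchase)         # (47 + 12) / 270 = about 0.219
c(unpruned = err_unpruned, pruned = err_pruned)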