Logistic regression analysis with R
Logistic regression is an analysis that allows us to predict a discrete value on the basis of a statistical analysis of the given distribution. Here we will run a simple analysis in R with the built-in dataset mtcars. I will not go into much detail about cleaning and studying this dataset; some useful information about preliminary data analysis can be found at the beginning of the linear regression example.
Creating model for Logistic regression analysis
mtcars contains the column vs, which represents the engine type: 0 for V-shaped and 1 for straight. We will try to understand whether we can work out the engine type from the car weight (wt), engine power (hp) and engine displacement (disp), if at all. The row names of this dataset hold the car models, so we will convert the row names into a regular column and remove the data we are not interested in. Because our result can take only two values, 0 and 1, we need to apply logistic regression analysis. Before we start, we need to load the libraries dplyr and tibble.
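The code listing for this step is not present in the text, so the following is only a minimal sketch of the preparation described above. The object name `cars` and the column name `model` are assumptions, not taken from the original; `rownames_to_column()` comes from tibble and `select()` from dplyr.

```r
library(dplyr)
library(tibble)

# Turn the row names (car models) into a regular column and keep
# only the columns used in this analysis.
cars <- mtcars %>%
  rownames_to_column(var = "model") %>%   # "model" is an assumed column name
  select(model, vs, wt, hp, disp)

head(cars)
```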
We now have all the data selected and named. The dataset has only 32 rows in total (not much), so we will create our model (train it) on a set of 26 randomly selected cars. The remaining data (6 observations) will be used as the test set for our model.
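One possible way to make this 26/6 split, shown as a sketch rather than the original script. The names `cars`, `train`, `test` and `train_idx` and the seed value 42 are assumptions; the data is rebuilt with base R here only so that the snippet runs on its own.

```r
# Rebuild the prepared data with base R so the sketch is self-contained.
cars <- data.frame(model = rownames(mtcars),
                   mtcars[, c("vs", "wt", "hp", "disp")],
                   row.names = NULL)

set.seed(42)                          # arbitrary seed, only for reproducibility
train_idx <- sample(nrow(cars), 26)   # 26 random row indices for training
train <- cars[train_idx, ]
test  <- cars[-train_idx, ]           # the remaining 6 rows

dim(train)
dim(test)
```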
Please note that every time I run this script, the random selection will be different. To avoid this, for test purposes only, you can fix the starting point of the random selection with the command set.seed(any number). We also use the generalized linear model function glm() with binomial logistic regression.
Prediction on the basis of our model
Now that our model is created and trained, we can calculate predictions on its basis. The function predict() does this, but the result is not discrete: by default it returns a continuous value of the linear predictor (the log-odds), so we need to understand how to split the results into two groups.
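A minimal sketch of fitting and predicting, under stated assumptions: the formula `vs ~ wt + hp + disp` follows the variables named earlier, while the object names and the seed value are made up for illustration. It shows that the default output of predict() is on the log-odds scale and that `type = "response"` gives probabilities instead.

```r
# Rebuild the data and the training set (names and seed are assumptions).
cars <- data.frame(model = rownames(mtcars),
                   mtcars[, c("vs", "wt", "hp", "disp")],
                   row.names = NULL)
set.seed(42)
train <- cars[sample(nrow(cars), 26), ]

# The binomial family with its default logit link gives logistic regression.
model <- glm(vs ~ wt + hp + disp, data = train, family = binomial)

# Default type = "link": values on the log-odds scale, not 0/1 labels.
link <- predict(model, newdata = train)

# type = "response" returns probabilities between 0 and 1 instead.
prob <- predict(model, newdata = train, type = "response")

summary(link)
summary(prob)
```

On a sample this small, glm() may print a warning about fitted probabilities of 0 or 1; the model object is still returned.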
We can use 0 as the separating point for this prediction or, if the two types of data are equally distributed, we can use the mean value (-3.186593). I think, however, that it is better to cluster the predicted values into two sets and separate them that way.
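The three separating points can be compared with a sketch like the one below. The text does not say which clustering method was used, so k-means with two centres is an assumption here, as are the object names, the seed and the choice to compute the mean on the training set. The numbers will differ from the value quoted above because they depend on the random split.

```r
# Rebuild data, training set and model (names and seed are assumptions).
cars <- data.frame(model = rownames(mtcars),
                   mtcars[, c("vs", "wt", "hp", "disp")],
                   row.names = NULL)
set.seed(42)
train <- cars[sample(nrow(cars), 26), ]
model <- glm(vs ~ wt + hp + disp, data = train, family = binomial)
link  <- predict(model, newdata = train)   # log-odds scale

thr_zero <- 0            # log-odds 0 corresponds to probability 0.5
thr_mean <- mean(link)   # mean of the predicted log-odds

# Assumed clustering method: k-means with two centres on the 1-D scores.
km <- kmeans(link, centers = 2)
thr_cluster <- mean(km$centers)   # midpoint between the two cluster centres

c(zero = thr_zero, mean = thr_mean, cluster = thr_cluster)
```

For one-dimensional scores and two clusters, the k-means boundary lies halfway between the two centres, which is why the midpoint is used as the threshold.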
And now we check the predictions on our test set.
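A sketch of such a check, assuming the same hypothetical names and seed as before and using 0 as the separating point. With a different seed, or with one of the other separating points, the table can change.

```r
# Rebuild data, split and model (names and seed are assumptions).
cars <- data.frame(model = rownames(mtcars),
                   mtcars[, c("vs", "wt", "hp", "disp")],
                   row.names = NULL)
set.seed(42)
train_idx <- sample(nrow(cars), 26)
train <- cars[train_idx, ]
test  <- cars[-train_idx, ]
model <- glm(vs ~ wt + hp + disp, data = train, family = binomial)

# Score the 6 held-out cars and turn the log-odds into 0/1 labels.
test$link <- predict(model, newdata = test)
test$pred <- as.integer(test$link > 0)   # separating point 0

test[, c("model", "vs", "link", "pred")]

# Confusion table: actual engine type against predicted engine type.
table(actual = test$vs, predicted = test$pred)
```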
As you can see, 5 of the points are predicted the same way with any of the separating points, and only one questionable point, for the Hornet 4 Drive, sits close to the border. This can happen because of the small training data set, or because of some unusual structural differences in the engine of this car.
Problem with this dataset
A detailed analysis of the dataset reveals that one of the cars in the training set stands out from the rest:
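The original listing is not present in the text. One way to see the outlier, sketched here with base R on the full mtcars data, is to sort the cars coded as V-shaped (vs equal to 0) by displacement; the object name `v_cars` is an assumption.

```r
# All cars coded as V-shaped (vs == 0), sorted by engine displacement.
v_cars <- mtcars[mtcars$vs == 0, c("vs", "wt", "hp", "disp")]
v_cars <- v_cars[order(v_cars$disp), ]
head(v_cars, 3)

# The single row in question.
mtcars["Porsche 914-2", c("vs", "wt", "hp", "disp")]
```

The car at the top of the sorted list is small and light, much like the straight-engine cars, yet it is coded 0.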
This Porsche 914-2 is equipped with an opposed (flat) engine, which is neither straight nor V-type. This fact can distort the training of our model.
© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.