R: dplyr library
The dplyr library is very useful for data manipulation in R. To use it, you need to install it once and then load it every time you start R or RStudio.
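A minimal sketch of the install-once, load-every-session routine. The requireNamespace() check is my own addition, so that the installation is skipped when dplyr is already present:

```r
# Install dplyr only if it is not available yet (needed once per machine)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the library (needed in every new R or RStudio session)
library(dplyr)
```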
Basic functions from dplyr in R

For these examples, I will use the built-in dataset airquality, and I will usually show only the first 5 rows of the result with the function head( … , n = 5).

filter() - filter the original dataset

filter() allows you to filter data on the basis of given conditions. For example, let's remove all rows with Temp less than 80 degrees.
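A short sketch of this filter. Removing rows with Temp below 80 is the same as keeping rows with Temp of 80 or more; the variable name warm_days is just an illustrative choice:

```r
library(dplyr)

# Keep only rows where Temp is at least 80 (drops everything below 80)
warm_days <- filter(airquality, Temp >= 80)

# Show the first 5 rows of the filtered data
head(warm_days, n = 5)
```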
mutate() and transmute() - calculate new data

Calculate a new column on the basis of existing columns and add it to the dataset with the function mutate(), or create it as a separate dataset with transmute(). In our dataset, all temperatures are given in Fahrenheit. Therefore we will convert them into Celsius values.
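A possible version with mutate(), using the standard conversion C = (F - 32) * 5 / 9. The column name TempC is my own choice for illustration:

```r
library(dplyr)

# Add a new column TempC (Celsius) next to the existing columns
with_celsius <- mutate(airquality, TempC = (Temp - 32) * 5 / 9)

# The original six columns are kept, TempC is appended as a seventh
head(with_celsius, n = 5)
```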
Or, to make an independent dataset:
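The same calculation with transmute(), which keeps only the columns named in the call. Again, TempC is an illustrative column name:

```r
library(dplyr)

# Build a new data frame that contains only the calculated column
celsius_only <- transmute(airquality, TempC = (Temp - 32) * 5 / 9)

# Only TempC is present; the original columns are dropped
head(celsius_only, n = 5)
```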
select() - select columns from a dataset

If we need only a few selected columns from our dataset, we can use the select() function.
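For instance, a sketch that keeps three columns. Which columns to keep is my assumption here, since any of the airquality columns would work the same way:

```r
library(dplyr)

# Keep only the Month, Day and Temp columns, in that order
temp_by_date <- select(airquality, Month, Day, Temp)

head(temp_by_date, n = 5)
```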
summarise() - generate a summary of a dataset

summarise() calculates summaries from our dataset according to the given parameters. Before doing this, it is necessary to make sure that all missing data are handled appropriately. In our example, we will exclude NA values by setting na.rm = TRUE. Let's calculate the average temperature.
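A minimal sketch of this summary. The output column name mean_temp is my own choice:

```r
library(dplyr)

# One-row summary: the mean of Temp, ignoring any NA values
avg_temp <- summarise(airquality, mean_temp = mean(Temp, na.rm = TRUE))

avg_temp
```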
And now we will calculate the average temperature for every month by grouping the data by month. Furthermore, we will round the average temperature to one decimal place.
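One way to do this is to group first with group_by() and then summarise. The names by_month, monthly_avg and mean_temp are illustrative:

```r
library(dplyr)

# Group the rows by Month, so that summaries are computed per month
by_month <- group_by(airquality, Month)

# Mean temperature per month, rounded to one decimal place
monthly_avg <- summarise(by_month,
                         mean_temp = round(mean(Temp, na.rm = TRUE), 1))

monthly_avg
```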
arrange() for arranging, or sorting, data by several columns

It is possible to sort data by several columns with the function arrange(). In this example we will sort our dataset by Day, and within the same day we will sort the data by Month.
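A short sketch of this sort. The columns are applied in the order given, so Day is the primary key and Month breaks ties:

```r
library(dplyr)

# Sort by Day first, then by Month within each day
sorted <- arrange(airquality, Day, Month)

# The first rows are day 1 of each month, in month order
head(sorted, n = 5)
```

To sort in descending order, a column can be wrapped in desc(), for example arrange(airquality, desc(Temp)).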
sample_n() and sample_frac() for random sampling

Sometimes, for some tests, such as selecting independent data, we need to randomly sample a chunk of the data. We can do this by an exact number of rows with sample_n() or by a fraction of the dataset with sample_frac(). Select 5 random rows from our dataset:
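A sketch with sample_n(). The set.seed() call is my addition, only to make the random selection repeatable; without it, every run returns different rows:

```r
library(dplyr)

# Fix the random seed so the same rows are drawn on every run (optional)
set.seed(42)

# Draw 5 random rows from the dataset
five_rows <- sample_n(airquality, 5)

five_rows
```

In dplyr 1.0.0 and later, sample_n() still works but is superseded by slice_sample(airquality, n = 5).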
And now let's select a random 5% of the rows from our dataset (8 rows in total).
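A sketch with sample_frac(). The airquality dataset has 153 rows, and 5% of that (7.65) is rounded to 8 rows. As before, set.seed() is my addition for repeatability:

```r
library(dplyr)

# Fix the random seed so the same rows are drawn on every run (optional)
set.seed(42)

# Draw a random 5% of the rows
five_percent <- sample_frac(airquality, 0.05)

nrow(five_percent)
```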
count() for counting data

It is possible to count observations with basic grouping using the count() function. Let's count how many rows we have for each month.
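A minimal sketch. count() groups by the given column and puts the number of rows per group into a column called n:

```r
library(dplyr)

# Number of rows (daily observations) for each month
rows_per_month <- count(airquality, Month)

rows_per_month
```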
Pipe %>% operations in dplyr

The pipe operator simplifies standard step-by-step operations by removing intermediate datasets. Let's calculate the average temperature for each selected month.

Traditional way without pipe

In this way we will first select the required months, then group our data by month, and then calculate the summary.
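A sketch of the step-by-step version. I assume here that "selected months" means July and August (Month 7 and 8); the choice of months and all variable names are only for illustration:

```r
library(dplyr)

# Step 1: keep only the selected months (assumed: July and August)
selected <- filter(airquality, Month %in% c(7, 8))

# Step 2: group the remaining rows by month
grouped <- group_by(selected, Month)

# Step 3: calculate the mean temperature for each group
result <- summarise(grouped, mean_temp = mean(Temp, na.rm = TRUE))

result
```

Each step stores an intermediate dataset (selected, grouped) that is used only once.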
Pipe %>% for this task

In fact, we do not need all the intermediate results, and we can omit them by using the pipe.
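A sketch of the piped version. As an assumption for illustration, the selected months are July and August (Month 7 and 8). The pipe passes the result of each step as the first argument of the next function:

```r
library(dplyr)

# Filter, group and summarise in one chain, with no intermediate variables
result_piped <- airquality %>%
  filter(Month %in% c(7, 8)) %>%   # assumed selection: July and August
  group_by(Month) %>%
  summarise(mean_temp = mean(Temp, na.rm = TRUE))

result_piped
```

The chain reads top to bottom in the same order as the steps are carried out.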
© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.