Title: | Simple but Efficient Rowwise Jobs |
---|---|
Description: | Creating efficiently new column(s) in a data frame (including tibble) by applying a function one row at a time. |
Authors: | Alexandre Courtiol [aut, cre] , Romain François [aut] |
Maintainer: | Alexandre Courtiol <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2024-10-25 03:00:20 UTC |
Source: | https://github.com/courtiol/lay |
Datasets containing information about the use of pain relievers for non medical purpose.
A tibble with either 100 or 55271 rows, and 8 variables:
The identifier code of the respondent
Ever use hydrocodone nonmedically?
Ever use ever percocet, percodan, tylox, oxycontin... nonmedically?
Ever used codeine nonmedically?
Ever used tramadol nonmedically?
Ever used morphine nonmedically?
Ever used methadone nonmedically?
Ever used vicodin, lortab or lorcert nonmedically?
These datasets are a small subset from the "National Survey on Drug Use and Health, 2014".
All variables related to drug use have been recoded into vectors of integers talking value 0 for
"No/Unknown" and value 1 for "Yes". The original variable names were the same as those defined
here but in upper case and ending with the number 2. The dataset called drugs
contain the first
100 rows of the one called drugs_full
.
https://www.icpsr.umich.edu/web/NAHDAP/studies/36361
United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2014. Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2016-03-22. doi:10.3886/ICPSR36361.v1
drugs drugs_full
drugs drugs_full
Create efficiently new column(s) in data frame (including tibble) by applying a function one row at a time.
lay(.data, .fn, ..., .method = c("apply", "tidy"))
lay(.data, .fn, ..., .method = c("apply", "tidy"))
.data |
A data frame or tibble (or other data frame extensions). |
.fn |
A function to apply to each row of
|
... |
Additional arguments for the function calls in |
.method |
This is an experimental argument that allows you to control which internal method is used to apply the rowwise job:
The default has been chosen based on these benchmarks. |
lay()
create a vector or a data frame (or tibble), by considering in turns each row of a data
frame (.data
) as the vector input of some function(s) .fn
.
This makes the creation of new columns based on a rowwise operation both simple (see Examples; below) and efficient (see the Article benchmarks).
The function should be fully compatible with {dplyr}
-based workflows and follows a syntax close
to dplyr::across()
.
Yet, it takes .data
instead of .cols
as a main argument, which makes it possible to also use
lay()
outside dplyr
verbs (see Examples).
The function lay()
should work in a wide range of situations, provided that:
The input .data
should be a data frame (including tibble) with columns of same class, or of
classes similar enough to be easily coerced into a single class. Note that .method = "apply"
also allows for the input to be a matrix and is more permissive in terms of data coercion.
The output of .fn
should be a scalar (i.e., vector of length 1) or a 1 row data frame (or
tibble).
If you use lay()
within dplyr::mutate()
, make sure that the data used by dplyr::mutate()
contain no row-grouping, i.e., what is passed to .data
in dplyr::mutate()
should not be of
class grouped_df
or rowwise_df
. If it is, lay()
will be called multiple times, which will
slow down the computation despite not influencing the output.
A vector with one element per row of .data
, or a data frame (or tibble) with one row per row of .data
. The class of the output is determined by .fn
.
# usage without dplyr ------------------------------------------------------------------------- # lay can return a vector lay(drugs[1:10, -1], any) # lay can return a data frame ## using the shorthand function syntax \(x) .fn(x) lay(drugs[1:10, -1], \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0))) ## using the rlang lambda syntax ~ fn(.x) lay(drugs[1:10, -1], ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))) # lay can be used to augment a data frame cbind(drugs[1:10, ], lay(drugs[1:10, -1], ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # usage with dplyr ---------------------------------------------------------------------------- if (require("dplyr")) { # apply any() to each row drugs |> mutate(everused = lay(pick(-caseid), any)) # apply any() to each row using all columns drugs |> select(-caseid) |> mutate(everused = lay(pick(everything()), any)) # a workaround would be to use `rowSums` drugs |> mutate(everused = rowSums(pick(-caseid)) > 0) # but we can lay any function taking a vector as input, e.g. median drugs |> mutate(used_median = lay(pick(-caseid), median)) # you can pass arguments to the function drugs_with_NA <- drugs drugs_with_NA[1, 2] <- NA drugs_with_NA |> mutate(everused = lay(pick(-caseid), any)) drugs_with_NA |> mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) # you can lay the output into a 1-row tibble (or data.frame) # if you want to apply multiple functions drugs |> mutate(lay(pick(-caseid), ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # note that naming the output prevent the automatic splicing and you obtain a df-column drugs |> mutate(usage = lay(pick(-caseid), ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # if your function returns a vector longer than a scalar, you should turn the output # into a tibble, which is the job of as_tibble_row() drugs |> mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x)))) # note that you could also wrap the output in a list and name it to obtain a list-column drugs |> mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x)))) }
# usage without dplyr ------------------------------------------------------------------------- # lay can return a vector lay(drugs[1:10, -1], any) # lay can return a data frame ## using the shorthand function syntax \(x) .fn(x) lay(drugs[1:10, -1], \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0))) ## using the rlang lambda syntax ~ fn(.x) lay(drugs[1:10, -1], ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))) # lay can be used to augment a data frame cbind(drugs[1:10, ], lay(drugs[1:10, -1], ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # usage with dplyr ---------------------------------------------------------------------------- if (require("dplyr")) { # apply any() to each row drugs |> mutate(everused = lay(pick(-caseid), any)) # apply any() to each row using all columns drugs |> select(-caseid) |> mutate(everused = lay(pick(everything()), any)) # a workaround would be to use `rowSums` drugs |> mutate(everused = rowSums(pick(-caseid)) > 0) # but we can lay any function taking a vector as input, e.g. median drugs |> mutate(used_median = lay(pick(-caseid), median)) # you can pass arguments to the function drugs_with_NA <- drugs drugs_with_NA[1, 2] <- NA drugs_with_NA |> mutate(everused = lay(pick(-caseid), any)) drugs_with_NA |> mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) # you can lay the output into a 1-row tibble (or data.frame) # if you want to apply multiple functions drugs |> mutate(lay(pick(-caseid), ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # note that naming the output prevent the automatic splicing and you obtain a df-column drugs |> mutate(usage = lay(pick(-caseid), ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) # if your function returns a vector longer than a scalar, you should turn the output # into a tibble, which is the job of as_tibble_row() drugs |> mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x)))) # note that you could also wrap the output in a list and name it to obtain a list-column drugs |> mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x)))) }