Data Transformacion Rstudio
Data Transformacion Rstudio
Data Transformacion Rstudio
dplyr
dplyr functions work with pipes and expect tidy data. In tidy data:
Manipulate Cases Manipulate Variables
A B C A B C
& EXTRACT CASES EXTRACT VARIABLES
pipes
Row functions return a subset of rows as a new table. Column functions return a set of columns as a new vector or table.
Each variable is in Each observation, or x %>% f(y)
its own column case, is in its own row becomes f(x, y) pull(.data, var = -1) Extract column values as
filter(.data, …) Extract rows that meet logical
a vector. Choose by name or index.
Summarise Cases
w
www
ww criteria. filter(iris, Sepal.Length > 7)
w
www pull(iris, Sepal.Length)
weight = NULL, .env = parent.frame()) Randomly Use these helpers with select (),
summary function select fraction of rows.
e.g. select(iris, starts_with("Sepal"))
summarise(.data, …)
Compute table of summaries.
w
www
ww sample_frac(iris, 0.5, replace = TRUE)
sample_n(tbl, size, replace = FALSE, weight =
contains(match) num_range(prefix, range) :, e.g. mpg:cyl
ends_with(match) one_of(…) -, e.g, -Species
w
ww summarise(mtcars, avg = mean(mpg))
Count number of rows in each group defined slice(.data, …) Select rows by position. MAKE NEW VARIABLES
slice(iris, 10:15)
by the variables in … Also tally().
w
ww count(iris, Species)
w
www
ww top_n(x, n, wt) Select and order top n entries (by
group if grouped data). top_n(iris, 5, Sepal.Width)
These apply vectorized functions to columns. Vectorized funs take
vectors as input and return vectors of the same length as output
(see back).
VARIATIONS vectorized function
summarise_all() - Apply funs to every column.
summarise_at() - Apply funs to specific columns. mutate(.data, …)
summarise_if() - Apply funs to all cols of one type. Logical and boolean operators to use with filter() Compute new column(s).
<
>
<=
>=
is.na()
!is.na()
%in%
!
|
&
xor() w
wwww
w mutate(mtcars, gpm = 1/mpg)
transmute(.data, …)
Group Cases See ?base::Logic and ?Comparison for help. Compute new column(s), drop others.
Use group_by() to create a "grouped" copy of a table.
dplyr functions will manipulate each "group" separately and
w
ww transmute(mtcars, gpm = 1/mpg)
mtcars %>%
arrange(.data, …) Order rows by values of a
column or columns (low to high), use with
w
www mutate_all(faithful, funs(log(.), log2(.)))
mutate_if(iris, is.numeric, funs(log(.)))
w
www
ww group_by(cyl) %>% w
www
ww desc() to order from high to low.
arrange(mtcars, mpg) mutate_at(.tbl, .cols, .funs, …) Apply funs to
A B C
vectorized function summary function Use bind_cols() to paste tables beside each
other as they are. + y
C v 3
d w 4
dplyr::near() - safe == for floating point sd() - standard deviation EXTRACT ROWS
numbers var() - variance x y
MISC A B.x C B.y D Use by = c("col1", "col2", …) to A B C A B D
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2019-08