Most data operations are done on groups defined by variables.
group_by()
takes an existing tbl and converts it into a grouped tbl
where operations are performed "by group". ungroup()
removes grouping.
A grouped data frame with class grouped_df
,
unless the combination of ...
and add
yields a empty set of
grouping columns, in which case a tibble will be returned.
These function are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
Currently, group_by()
internally orders the groups in ascending order. This
results in ordered output from functions that aggregate groups, such as
summarise()
.
When used as grouping columns, character vectors are ordered in the C locale
for performance and reproducibility across R sessions. If the resulting
ordering of your grouped operation matters and is dependent on the locale,
you should follow up the grouped operation with an explicit call to
arrange()
and set the .locale
argument. For example:
This is often useful as a preliminary step before generating content intended for humans, such as an HTML table.
Prior to dplyr 1.1.0, character vector grouping columns were ordered in the
system locale. If you need to temporarily revert to this behavior, you can
set the global option dplyr.legacy_locale
to TRUE
, but this should be
used sparingly and you should expect this option to be removed in a future
version of dplyr. It is better to update existing code to explicitly call
arrange(.locale = )
instead. Note that setting dplyr.legacy_locale
will
also force calls to arrange()
to use the system locale.
Other grouping functions:
group_map()
,
group_nest()
,
group_split()
,
group_trim()
example(read10xVisium)
#>
#> rd10xV> dir <- system.file(
#> rd10xV+ file.path("extdata", "10xVisium"),
#> rd10xV+ package = "SpatialExperiment")
#>
#> rd10xV> sample_ids <- c("section1", "section2")
#>
#> rd10xV> samples <- file.path(dir, sample_ids, "outs")
#>
#> rd10xV> list.files(samples[1])
#> [1] "raw_feature_bc_matrix" "spatial"
#>
#> rd10xV> list.files(file.path(samples[1], "spatial"))
#> [1] "scalefactors_json.json" "tissue_lowres_image.png"
#> [3] "tissue_positions_list.csv"
#>
#> rd10xV> file.path(samples[1], "raw_feature_bc_matrix")
#> [1] "/__w/_temp/Library/SpatialExperiment/extdata/10xVisium/section1/outs/raw_feature_bc_matrix"
#>
#> rd10xV> (spe <- read10xVisium(samples, sample_ids,
#> rd10xV+ type = "sparse", data = "raw",
#> rd10xV+ images = "lowres", load = FALSE))
#> # A SpatialExperiment-tibble abstraction: 99 × 7
#> # Features = 50 | Cells = 99 | Assays = counts
#> .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
#> <chr> <lgl> <int> <int> <chr> <int>
#> 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
#> 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
#> 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
#> 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
#> 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
#> 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
#> 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
#> 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
#> 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
#> 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
#> # ℹ 89 more rows
#> # ℹ 1 more variable: pxl_row_in_fullres <int>
#>
#> rd10xV> # base directory 'outs/' from Space Ranger can also be omitted
#> rd10xV> samples2 <- file.path(dir, sample_ids)
#>
#> rd10xV> (spe2 <- read10xVisium(samples2, sample_ids,
#> rd10xV+ type = "sparse", data = "raw",
#> rd10xV+ images = "lowres", load = FALSE))
#> # A SpatialExperiment-tibble abstraction: 99 × 7
#> # Features = 50 | Cells = 99 | Assays = counts
#> .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
#> <chr> <lgl> <int> <int> <chr> <int>
#> 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
#> 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
#> 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
#> 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
#> 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
#> 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
#> 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
#> 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
#> 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
#> 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
#> # ℹ 89 more rows
#> # ℹ 1 more variable: pxl_row_in_fullres <int>
#>
#> rd10xV> # tabulate number of spots mapped to tissue
#> rd10xV> cd <- colData(spe)
#>
#> rd10xV> table(
#> rd10xV+ in_tissue = cd$in_tissue,
#> rd10xV+ sample_id = cd$sample_id)
#> sample_id
#> in_tissue section1 section2
#> FALSE 28 27
#> TRUE 22 22
#>
#> rd10xV> # view available images
#> rd10xV> imgData(spe)
#> DataFrame with 2 rows and 4 columns
#> sample_id image_id data scaleFactor
#> <character> <character> <list> <numeric>
#> 1 section1 lowres #### 0.0510334
#> 2 section2 lowres #### 0.0510334
spe |>
group_by(sample_id)
#> tidySingleCellExperiment says: A data frame is returned for independent data analysis.
#> # A tibble: 99 × 7
#> # Groups: sample_id [2]
#> .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
#> <chr> <lgl> <int> <int> <chr> <int>
#> 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
#> 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
#> 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
#> 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
#> 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
#> 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
#> 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
#> 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
#> 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
#> 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
#> # ℹ 89 more rows
#> # ℹ 1 more variable: pxl_row_in_fullres <int>