Abstract
A brief overview of the tidySpatialExperiment package - demonstrating the SpatialExperiment-tibble abstraction, compatibility with the tidyverse ecosystem, compatibility with the tidyomics ecosystem and a few helpful utility functions.
tidySpatialExperiment provides a bridge between the SpatialExperiment package and the tidyverse ecosystem. It creates an invisible layer that allows you to interact with a SpatialExperiment object as if it were a tibble, enabling the use of functions from dplyr, tidyr, ggplot2 and plotly. Underneath, your data remains a SpatialExperiment object.
tidySpatialExperiment also provides five additional utility functions.
If you would like to learn more about tidySpatialExperiment and tidyomics, the following links are a good place to start:
The tidyomics ecosystem also includes packages for:
Working with genomic features:
Working with transcriptomic features:
SummarizedExperiment objects.
SingleCellExperiment objects.
Seurat objects.
Working with cytometry features:
And a few associated packages:
Package | Functions available |
---|---|
SpatialExperiment | All |
dplyr | arrange, bind_rows, bind_cols, distinct, filter, group_by, summarise, select, mutate, rename, left_join, right_join, inner_join, slice, sample_n, sample_frac, count, add_count |
tidyr | nest, unnest, unite, separate, extract, pivot_longer |
ggplot2 | ggplot |
plotly | plot_ly |
Utility | Description |
---|---|
as_tibble | Convert cell data to a tbl_df |
join_features | Append feature data to cell data |
aggregate_cells | Aggregate cell-feature abundance into a pseudobulk SummarizedExperiment object |
rectangle | Select cells in a rectangular region of space |
ellipse | Select cells in an elliptical region of space |
You can install the stable version of tidySpatialExperiment from Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("tidySpatialExperiment")
Or, you can install the development version of tidySpatialExperiment from GitHub.
if (!requireNamespace("pak", quietly = TRUE))
    install.packages("pak")
pak::pak("william-hutchison/tidySpatialExperiment")
Here, we attach tidySpatialExperiment and an example SpatialExperiment object.
# Attach tidySpatialExperiment
library(tidySpatialExperiment)
# Load the example SpatialExperiment object, available afterwards as `spe`
example(read10xVisium)
A SpatialExperiment object represents assay-feature values as rows and cells as columns. Additional information about the cells is stored in the reducedDims, colData and spatialCoords slots.
tidySpatialExperiment provides a SpatialExperiment-tibble abstraction, representing cells as rows and cell data as columns, in accordance with the tidy observation-variable convention. The cell data is made up of information stored in the colData and spatialCoords slots.
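The reshaping described above can be sketched in base R. This is only an illustration of the idea, not how tidySpatialExperiment builds its view: the toy matrix, the toy values and the names counts, col_data, spatial_coords and cell_view are all assumptions made for this example.

```r
# Toy stand-ins for the pieces a SpatialExperiment holds
# (hypothetical values, not the example data used in this vignette)
counts <- matrix(
    c(0, 2, 1,
      3, 0, 4),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)
col_data <- data.frame(
    in_tissue = c(TRUE, FALSE, TRUE),
    sample_id = "section1",
    row.names = colnames(counts)
)
spatial_coords <- cbind(
    pxl_col_in_fullres = c(10L, 20L, 30L),
    pxl_row_in_fullres = c(5L, 15L, 25L)
)
rownames(spatial_coords) <- colnames(counts)

# Tidy view: one row per cell, one column per cell-level variable
cell_view <- data.frame(
    .cell = colnames(counts),
    col_data,
    spatial_coords,
    row.names = NULL
)
cell_view
```

The assay values stay in the features-by-cells matrix. Only the cell-level information is laid out with cells as rows.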
The default view is now of the SpatialExperiment-tibble abstraction.
spe
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features = 50 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
However, our data remains a SpatialExperiment object, so we still have access to all SpatialExperiment functions.
spe |>
    colData() |>
    head()
## DataFrame with 6 rows and 4 columns
## in_tissue array_row array_col sample_id
## <logical> <integer> <integer> <character>
## AAACAACGAATAGTTC-1 FALSE 0 16 section1
## AAACAAGTATCTCCCA-1 TRUE 50 102 section1
## AAACAATCTACTAGCA-1 TRUE 3 43 section1
## AAACACCAATAACTGC-1 TRUE 59 19 section1
## AAACAGAGCGACTCCT-1 TRUE 14 94 section1
## AAACAGCTTTCAGAAG-1 FALSE 43 9 section1
spe |>
    spatialCoords() |>
    head()
## pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1 2312 1252
## AAACAAGTATCTCCCA-1 8230 7237
## AAACAATCTACTAGCA-1 4170 1611
## AAACACCAATAACTGC-1 2519 8315
## AAACAGAGCGACTCCT-1 7679 2927
## AAACAGCTTTCAGAAG-1 1831 6400
spe |>
    imgData()
## DataFrame with 2 rows and 4 columns
## sample_id image_id data scaleFactor
## <character> <character> <list> <numeric>
## 1 section1 lowres #### 0.0510334
## 2 section2 lowres #### 0.0510334
Most functions from dplyr are available for use with the SpatialExperiment-tibble abstraction. For example, filter() can be used to filter cells by a variable of interest.
spe |>
    filter(array_col < 5)
## # A SpatialExperiment-tibble abstraction: 6 × 7
## # Features = 50 | Cells = 6 | Assays = counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 2 AAACGAAGATGGAGTA-1 FALSE 58 4 section1 1487
## 3 AAAGAATGACCTTAGA-1 FALSE 64 2 section1 1349
## 4 AAACATGGTGAGAGGA-1 FALSE 62 0 section2 1212
## 5 AAACGAAGATGGAGTA-1 FALSE 58 4 section2 1487
## 6 AAAGAATGACCTTAGA-1 FALSE 64 2 section2 1349
## # ℹ 1 more variable: pxl_row_in_fullres <int>
And mutate() can be used to add new variables, or to modify the value of an existing variable.
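The call that produced the output shown next does not appear in this rendering. A minimal sketch consistent with that output, assuming in_region flags in-tissue cells whose array_row is below 10 (the exact expression is an assumption, and spe_flagged is a name used here only for illustration):

```r
library(tidySpatialExperiment)

# Create the example object `spe`, as earlier in this vignette
example(read10xVisium, echo = FALSE)

# Assumed expression: flag in-tissue cells in the first ten array rows
spe_flagged <-
    spe |>
    mutate(in_region = in_tissue & array_row < 10)

spe_flagged
```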
## # A SpatialExperiment-tibble abstraction: 99 × 8
## # Features = 50 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id in_region pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <lgl> <int>
## 1 AAACAAC… FALSE 0 16 section1 FALSE 2312
## 2 AAACAAG… TRUE 50 102 section1 FALSE 8230
## 3 AAACAAT… TRUE 3 43 section1 TRUE 4170
## 4 AAACACC… TRUE 59 19 section1 FALSE 2519
## 5 AAACAGA… TRUE 14 94 section1 FALSE 7679
## 6 AAACAGC… FALSE 43 9 section1 FALSE 1831
## 7 AAACAGG… FALSE 47 13 section1 FALSE 2106
## 8 AAACAGT… FALSE 73 43 section1 FALSE 4170
## 9 AAACATG… FALSE 62 0 section1 FALSE 1212
## 10 AAACATT… FALSE 61 97 section1 FALSE 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
Most functions from tidyr are also available. Here, nest() is used to group the data by sample_id, and unnest() is used to ungroup the data.
# Nest the SpatialExperiment object by sample_id
spe_nested <-
    spe |>
    nest(data = -sample_id)
# View the nested SpatialExperiment object
spe_nested
## # A tibble: 2 × 2
## sample_id data
## <chr> <list>
## 1 section1 <SptlExpr[,50]>
## 2 section2 <SptlExpr[,49]>
# Unnest the nested SpatialExperiment objects
spe_nested |>
    unnest(data)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features = 50 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
The ggplot() function can be used to create a plot directly from a SpatialExperiment object. This example also demonstrates how tidy operations can be combined to build up a more complex analysis.
spe |>
    filter(sample_id == "section1" & in_tissue) |>
    # Add a column with the sum of feature counts per cell
    mutate(count_sum = purrr::map_int(.cell, ~
        spe[, .x] |>
            counts() |>
            sum()
    )) |>
    # Plot with tidySpatialExperiment and ggplot2
    ggplot(aes(x = reorder(.cell, count_sum), y = count_sum)) +
    geom_point() +
    coord_flip()
Different packages from the tidyomics ecosystem are easy to use together. Here, tidygate is used to interactively gate cells based on their array location.
spe_regions <-
    spe |>
    filter(sample_id == "section1") |>
    mutate(region = tidygate::gate_chr(array_col, array_row))
The tidyomics ecosystem places an emphasis on interacting with cell data. To interact with feature data, the join_features() function can be used to append assay-feature values to cell data.
# Join feature data in wide format, preserving the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "wide") |>
    head()
## # A SpatialExperiment-tibble abstraction: 99 × 9
## # Features = 6 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id ENSMUSG00000025915
## <chr> <lgl> <int> <int> <chr> <dbl>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 0
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 0
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 0
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 0
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 0
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 0
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 0
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 0
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 0
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 0
## # ℹ 89 more rows
## # ℹ 3 more variables: ENSMUSG00000042501 <dbl>, pxl_col_in_fullres <int>,
## # pxl_row_in_fullres <int>
# Join feature data in long format, discarding the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "long") |>
    head()
## tidySpatialExperiment says: A data frame is returned for independent data
## analysis.
## # A tibble: 6 × 7
## .cell in_tissue array_row array_col sample_id .feature .abundance_counts
## <chr> <lgl> <int> <int> <chr> <chr> <dbl>
## 1 AAACAACGAA… FALSE 0 16 section1 ENSMUSG… 0
## 2 AAACAACGAA… FALSE 0 16 section1 ENSMUSG… 0
## 3 AAACAAGTAT… TRUE 50 102 section1 ENSMUSG… 0
## 4 AAACAAGTAT… TRUE 50 102 section1 ENSMUSG… 1
## 5 AAACAATCTA… TRUE 3 43 section1 ENSMUSG… 0
## 6 AAACAATCTA… TRUE 3 43 section1 ENSMUSG… 0
Sometimes, it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.
The aggregate_cells() function can be used to aggregate cells by a specified variable and assay, returning a SummarizedExperiment object.
spe |>
    aggregate_cells(in_tissue, assays = "counts")
## class: SummarizedExperiment
## dim: 50 2
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000002459 ENSMUSG00000005886 ...
## ENSMUSG00000104217 ENSMUSG00000104328
## rowData names(1): feature
## colnames(2): FALSE TRUE
## colData names(2): in_tissue .aggregated_cells
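To make the idea of pseudobulk aggregation concrete, the following base R sketch collapses the cells of a toy count matrix into one column per group. It assumes summation as the aggregation and is not the implementation of aggregate_cells(). The matrix, its values and the name pseudobulk are hypothetical.

```r
# Toy count matrix: features as rows, cells as columns (hypothetical values)
counts <- matrix(
    c(1, 0, 2, 3,
      0, 4, 1, 1),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("gene_a", "gene_b"), paste0("cell_", 1:4))
)
in_tissue <- c(TRUE, FALSE, TRUE, FALSE)

# rowsum() sums rows by group, so transpose to put cells in rows,
# sum within each group, then transpose back to features-by-groups
pseudobulk <- t(rowsum(t(counts), group = as.character(in_tissue)))
pseudobulk
```

The result has one column per value of the grouping variable, mirroring the two columns (FALSE and TRUE) of the SummarizedExperiment shown above.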
The ellipse() and rectangle() functions can be used to select cells by their position in space.
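The geometric tests behind this kind of selection can be sketched in base R. The helpers in_ellipse() and in_rectangle() below are hypothetical and written only for illustration. The package's own ellipse() and rectangle() may take different arguments and define their dimensions differently.

```r
# Hypothetical helpers illustrating the geometric tests; they are not the
# package's ellipse() and rectangle() functions
in_ellipse <- function(x, y, center, semi_axes) {
    ((x - center[1]) / semi_axes[1])^2 + ((y - center[2]) / semi_axes[2])^2 <= 1
}
in_rectangle <- function(x, y, center, width, height) {
    abs(x - center[1]) <= width / 2 & abs(y - center[2]) <= height / 2
}

# Toy array coordinates for four cells
array_col <- c(20, 35, 60, 20)
array_row <- c(40, 40, 40, 70)

# One logical value per cell: TRUE when the cell falls inside the region
in_ellipse(array_col, array_row, center = c(20, 40), semi_axes = c(20, 10))
in_rectangle(array_col, array_row, center = c(20, 40), width = 40, height = 20)
```

Because each helper returns one logical value per cell, a result of this shape can be used inside filter() or mutate() in the same way as any other cell-level variable.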
Removing the .cell column will return a tibble. This is consistent with the behaviour in other tidyomics packages.
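The call that produced the message and tibble shown next does not appear in this rendering. A minimal sketch consistent with that output, assuming the column is dropped with select() and the first rows are taken with head() (cell_data is a name used here only for illustration):

```r
library(tidySpatialExperiment)

# Create the example object `spe`, as earlier in this vignette
example(read10xVisium, echo = FALSE)

# Dropping the key column .cell means the result can no longer be a
# SpatialExperiment, so a plain tibble is returned
cell_data <-
    spe |>
    select(-.cell) |>
    head()

cell_data
```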
## tidySpatialExperiment says: Key columns are missing. A data frame is
## returned for independent data analysis.
## # A tibble: 6 × 4
## in_tissue array_row array_col sample_id
## <lgl> <int> <int> <chr>
## 1 FALSE 0 16 section1
## 2 TRUE 50 102 section1
## 3 TRUE 3 43 section1
## 4 TRUE 59 19 section1
## 5 TRUE 14 94 section1
## 6 FALSE 43 9 section1
The sample_id column cannot be removed with tidyverse functions, and can only be modified if the changes are accepted by SpatialExperiment’s colData() function.
# sample_id is not removed, despite the user's request
spe |>
    select(-sample_id)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features = 50 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
# This change maintains separation of sample_ids and is permitted
spe |>
    mutate(sample_id = stringr::str_c(sample_id, "_modified")) |>
    head()
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features = 6 | Cells = 99 | Assays = counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1… 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1… 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1… 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1… 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1… 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1… 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1… 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1… 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1… 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1… 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
# This change does not maintain separation of sample_ids and produces an error
spe |>
    mutate(sample_id = "new_sample")
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 1 was provided.
The pxl_col_in_fullres and pxl_row_in_fullres columns cannot be removed or modified with tidyverse functions. This is consistent with the behaviour of dimension reduction data in other tidyomics packages.
# Attempting to remove pxl_col_in_fullres produces an error
spe |>
    select(-pxl_col_in_fullres)
## Error in `select_helper()`:
## ! Can't select columns that don't exist.
## ✖ Column `pxl_col_in_fullres` doesn't exist.
# Attempting to modify pxl_col_in_fullres produces an error
spe |>
    mutate(pxl_col_in_fullres)
## Error in `dplyr::mutate()`:
## ℹ In argument: `pxl_col_in_fullres`.
## Caused by error:
## ! object 'pxl_col_in_fullres' not found
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidySpatialExperiment_1.1.0 ggplot2_3.5.1
## [3] ttservice_0.4.0 tidyr_1.3.1
## [5] dplyr_1.1.4 tidySingleCellExperiment_1.14.0
## [7] SpatialExperiment_1.14.0 SingleCellExperiment_1.26.0
## [9] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [11] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
## [13] IRanges_2.38.0 S4Vectors_0.42.0
## [15] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [17] matrixStats_1.3.0 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.3 magrittr_2.0.3
## [3] compiler_4.4.0 DelayedMatrixStats_1.26.0
## [5] systemfonts_1.1.0 vctrs_0.6.5
## [7] stringr_1.5.1 pkgconfig_2.0.3
## [9] crayon_1.5.2 fastmap_1.2.0
## [11] magick_2.8.3 XVector_0.44.0
## [13] ellipsis_0.3.2 labeling_0.4.3
## [15] scuttle_1.14.0 utf8_1.2.4
## [17] rmarkdown_2.27 UCSC.utils_1.0.0
## [19] ragg_1.3.2 purrr_1.0.2
## [21] xfun_0.44 zlibbioc_1.50.0
## [23] cachem_1.1.0 beachmat_2.20.0
## [25] jsonlite_1.8.8 highr_0.10
## [27] rhdf5filters_1.16.0 DelayedArray_0.30.1
## [29] Rhdf5lib_1.26.0 BiocParallel_1.38.0
## [31] parallel_4.4.0 R6_2.5.1
## [33] bslib_0.7.0 stringi_1.8.4
## [35] limma_3.60.0 jquerylib_0.1.4
## [37] Rcpp_1.0.12 bookdown_0.39
## [39] knitr_1.46 R.utils_2.12.3
## [41] Matrix_1.7-0 tidyselect_1.2.1
## [43] abind_1.4-5 yaml_2.3.8
## [45] codetools_0.2-20 lattice_0.22-6
## [47] tibble_3.2.1 withr_3.0.0
## [49] evaluate_0.23 desc_1.4.3
## [51] pillar_1.9.0 BiocManager_1.30.23
## [53] plotly_4.10.4 generics_0.1.3
## [55] sparseMatrixStats_1.16.0 munsell_0.5.1
## [57] scales_1.3.0 glue_1.7.0
## [59] lazyeval_0.2.2 tools_4.4.0
## [61] data.table_1.15.4 locfit_1.5-9.9
## [63] fs_1.6.4 rhdf5_2.48.0
## [65] grid_4.4.0 DropletUtils_1.24.0
## [67] edgeR_4.2.0 colorspace_2.1-0
## [69] GenomeInfoDbData_1.2.12 HDF5Array_1.32.0
## [71] cli_3.6.2 textshaping_0.3.7
## [73] fansi_1.0.6 S4Arrays_1.4.0
## [75] viridisLite_0.4.2 gtable_0.3.5
## [77] R.methodsS3_1.8.2 sass_0.4.9
## [79] digest_0.6.35 SparseArray_1.4.3
## [81] dqrng_0.4.0 farver_2.1.2
## [83] rjson_0.2.21 htmlwidgets_1.6.4
## [85] memoise_2.0.1 htmltools_0.5.8.1
## [87] pkgdown_2.0.9 R.oo_1.26.0
## [89] lifecycle_1.0.4 httr_1.4.7
## [91] statmod_1.5.0