Inferring trajectories

library(dyno)
library(tidyverse)

dynwrap::infer_trajectory is the main function to infer a trajectory. It requires two things:

  • A dataset, wrapped using dynwrap::wrap_expression
  • A TI method. This can be one of the 59 TI method from dynmethods, or a name of a method in which case it will retrieve the relevant method from dynmethods.
dataset <- wrap_expression(
  counts = example_dataset$counts,
  expression = example_dataset$expression
)
model <- infer_trajectory(dataset, ti_comp1())
## Loading required namespace: hdf5r

This model now contains the main information on the trajectory, i.e. the milestone_network and progressions:

model$milestone_network
##              from            to length directed
## 1 milestone_begin milestone_end      1    FALSE
head(model$progressions, 10)
##    cell_id            from            to percentage
## 1    Cell1 milestone_begin milestone_end 0.25088041
## 2    Cell2 milestone_begin milestone_end 0.36976202
## 3    Cell3 milestone_begin milestone_end 0.56873343
## 4    Cell4 milestone_begin milestone_end 0.91506325
## 5    Cell5 milestone_begin milestone_end 0.21259337
## 6    Cell6 milestone_begin milestone_end 0.91000325
## 7    Cell7 milestone_begin milestone_end 0.90920327
## 8    Cell8 milestone_begin milestone_end 0.64542720
## 9    Cell9 milestone_begin milestone_end 0.60957610
## 10  Cell10 milestone_begin milestone_end 0.06369673

While running methods inside a docker or singularity container reduces problems with dependencies and makes an analysis more reproducible, it can also create a considerable overhead. We plan to wrap wrap some more “popular” methods directly into R. See https://github.com/dynverse/dynmethods/issues/152 for an overview.

Parameters

Optionally, you can also give it some parameters. These are all documented within the relevant functions in dynmethods (also available in the reference section):

?ti_comp1
## Component 1
## 
## Description:
## 
##      Will generate a trajectory using Component 1.
## 
##      This method was wrapped inside a container.
## 
## Usage:
## 
##      ti_comp1(dimred = "pca", ndim = 2L, component = 1L)
##      
## Arguments:
## 
##   dimred: Which dimensionality reduction method to use. Domain: pca,
##           mds, tsne, ica, lle, landmark_mds, mds_sammon, mds_isomds,
##           mds_smacof, umap, dm_diffusionMap. Default: pca. Format:
##           character.
## 
##     ndim: . Domain: U(2, 30). Default: 2. Format: integer.
## 
## component: . Domain: U(1, 10). Default: 1. Format: integer.
## 
## Value:
## 
##      A TI method wrapper to be used together with 'infer_trajectory'
model <- infer_trajectory(dataset, ti_comp1(dimred = "landmark_mds"))

Priors

The method will error if it requires some prior information that is not provided in the dataset. Some methods also have optional prior information, in which case you can give them to the method using the give_priors argument:

dataset <- dataset %>% add_prior_information(start_id = example_dataset$prior_information$start_id)
model <- infer_trajectory(dataset, ti_comp1(), give_priors = c("start_id"))

An overview of all possible priors is given in the dynwrap tibble:

dynwrap::priors
prior_id name description type
start_id Start cell(s) One or more start cell identifiers soft
end_id End cell(s) One or more end cell identifiers soft
end_n # end states The number of end states soft
start_n # start states The number of start states soft
groups_id Cell clustering Named character vector linking the cell identifiers to different states/branches hard
groups_n # states Number of states/branches, including start, end and intermediary states soft
groups_network State network Dataframe containing the known network between states/branches. Contains a from and to column hard
timecourse_continuous Time course (continuous) Named numeric vector linking the cell ids to time points hard
timecourse_discrete Time course (discrete) Named numeric vector linking the cell ids to time course points hard
features_id Marker genes Genes/features known to be important in the dynamic process soft
dataset The full dataset The full dataset, including (if available) the gold standard hard

You can find all methods that can use a certain type of prior by filtering the dynmethods::methods. For example, to find all method wrappers that can use timecourse data:

dynmethods::methods %>%
  filter(map_lgl(wrapper_inputs, ~ any(c("timecourse_discrete", "timecourse_continuous") %in% .$input_id))) %>% 
  select(method_id, method_name)
## # A tibble: 2 x 2
##   method_id method_name
##   <chr>     <chr>      
## 1 grandprix GrandPrix  
## 2 scuba     SCUBA

Reproducibility

To make the execution of a method reproducible, fix the seed either using set.seed() or through the seed argument of infer_trajectory:

set.seed(1)
model <- dynwrap::infer_trajectory(dataset, ti_comp1())

Running multiple methods or datasets

Often it is useful to run multiple methods and/or use multiple datasets. While you can easily parallelise this yourself, we provide a helper function for this: dynwrap::infer_trajectories(). This function integrates well with future interpretation and plotting functions.

models <- dynwrap::infer_trajectories(
  dataset, 
  method = list(ti_comp1(), ti_angle())
)

This function generates a data frame containing the different models and extra information.

models
## # A tibble: 2 x 7
##   dataset_ix method_ix dataset_id    method_id method_name model    summary
##        <int>     <int> <chr>         <chr>     <chr>       <list>   <list> 
## 1          1         1 20190409_163… comp1     Component 1 <S3: dy… <tibbl…
## 2          1         2 20190409_163… angle     Angle       <S3: dy… <tibbl…

Errors

Some methods can generate errors which are beyond our control. To know what and where a method generated an error, you can turn on the verbosity:

dataset$expression@x[] <- NA
model <- dynwrap::infer_trajectory(dataset, ti_comp1(), give_priors = c("start_id"), verbose = TRUE)
## Error: Error during trajectory inference 
## Loading required package: dynutils
## Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
## Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> prcomp.default -> svd
## Execution halted
## Executing 'comp1' on '20190409_163010__data_wrapper__pbfhZY5hhE'
## With parameters: list(dimred = "pca", ndim = 2L, component = 1L),
## inputs: expression, and
## priors : 
## Input saved to /tmp/Rtmp50BQLE/file44714125ec76/ti
## Running method using babelwhale
## Running /usr/bin/docker run -e 'TMPDIR=/tmp2' --workdir /ti/workspace -v \
##   '/tmp/Rtmp50BQLE/file44714125ec76/ti:/ti' -v \
##   '/tmp/Rtmp50BQLE/file447173b4375f/tmp:/tmp2' \
##   'dynverse/ti_comp1:v0.9.9.01' --dataset /ti/input.h5 --output \
##   /ti/output.h5
## Loading required package: dynutils
## Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
## Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> prcomp.default -> svd
## Execution halted

Running from the command line

It is also possible to run each method in dynmethods from the command-line:

docker run dynverse/ti_comp1 --help
## Usage: LOCAL=/path/to/folder; MOUNT=/ti; docker run -v $LOCAL:$MOUNT dynverse/ti_comp1
## 
## 
## Options:
##  -h, --help
##      Show this help message and exit
## 
##  --dataset=DATASET
##      Filename of the dataset, example: $MOUNT/dataset.(h5|loom). h5 files can be created by cellranger or dyncli.
## 
##  --loom_expression_layer=LOOM_EXPRESSION_LAYER
##      If available, the name of the loom-layer containing normalised log-transformed data.
## 
##  --output=OUTPUT
##      Filename of the output trajectory data, example: $MOUNT/output.h5.
## 
##  --parameters=PARAMETERS
##      A file containing method-specific parameters, example: $MOUNT/parameters.(h5|yml).
## 
##  --dimred=DIMRED
##      Parameter: Which dimensionality reduction method to use
##      Domain: {pca, mds, tsne, ica, lle, landmark_mds, mds_sammon, mds_isomds, mds_smacof, umap, dm_diffusionMap}
##      Default: pca
##      Format: character
## 
##  --ndim=NDIM
##      Parameter: 
##      Domain: U(2, 30)
##      Default: 2
##      Format: integer
## 
##  --component=COMPONENT
##      Parameter: 
##      Domain: U(1, 10)
##      Default: 1
##      Format: integer
## 
##  --verbosity=VERBOSITY
##      The verbosity level: 0 => none, 1 => critical (default), 2 => info, 3 => debug.
## 
##  --seed=SEED
##      A seed to be set to ensure reproducability.

This container will output an HDF5 file, which can be read by tools such as dynutils::read_h5().