> Method users > User guide > Inferring trajectories

Inferring trajectories

Parameters
Priors
Reproducibility
Running multiple methods or datasets
Errors
Running from the command line

library(dyno)
library(tidyverse)

dynwrap::infer_trajectory is the main function to infer a trajectory. It requires two things:

A dataset, wrapped using dynwrap::wrap_expression
A TI method. This can be one of the 59 TI method from dynmethods, or a name of a method in which case it will retrieve the relevant method from dynmethods.

dataset <- wrap_expression(
  counts = example_dataset$counts,
  expression = example_dataset$expression
)
model <- infer_trajectory(dataset, ti_comp1())
## Loading required namespace: hdf5r

This model now contains the main information on the trajectory, i.e. the milestone_network and progressions:

model$milestone_network
##              from            to length directed
## 1 milestone_begin milestone_end      1    FALSE

head(model$progressions, 10)
##    cell_id            from            to percentage
## 1    Cell1 milestone_begin milestone_end 0.25088041
## 2    Cell2 milestone_begin milestone_end 0.36976202
## 3    Cell3 milestone_begin milestone_end 0.56873343
## 4    Cell4 milestone_begin milestone_end 0.91506325
## 5    Cell5 milestone_begin milestone_end 0.21259337
## 6    Cell6 milestone_begin milestone_end 0.91000325
## 7    Cell7 milestone_begin milestone_end 0.90920327
## 8    Cell8 milestone_begin milestone_end 0.64542720
## 9    Cell9 milestone_begin milestone_end 0.60957610
## 10  Cell10 milestone_begin milestone_end 0.06369673

While running methods inside a docker or singularity container reduces problems with dependencies and makes an analysis more reproducible, it can also create a considerable overhead. We plan to wrap wrap some more “popular” methods directly into R. See https://github.com/dynverse/dynmethods/issues/152 for an overview.

Parameters

Optionally, you can also give it some parameters. These are all documented within the relevant functions in dynmethods (also available in the reference section):

?ti_comp1

## Component 1
## 
## Description:
## 
##      Will generate a trajectory using Component 1.
## 
##      This method was wrapped inside a container.
## 
## Usage:
## 
##      ti_comp1(dimred = "pca", ndim = 2L, component = 1L)
##      
## Arguments:
## 
##   dimred: Which dimensionality reduction method to use. Domain: pca,
##           mds, tsne, ica, lle, landmark_mds, mds_sammon, mds_isomds,
##           mds_smacof, umap, dm_diffusionMap. Default: pca. Format:
##           character.
## 
##     ndim: . Domain: U(2, 30). Default: 2. Format: integer.
## 
## component: . Domain: U(1, 10). Default: 1. Format: integer.
## 
## Value:
## 
##      A TI method wrapper to be used together with 'infer_trajectory'

model <- infer_trajectory(dataset, ti_comp1(dimred = "landmark_mds"))

Priors

The method will error if it requires some prior information that is not provided in the dataset. Some methods also have optional prior information, in which case you can give them to the method using the give_priors argument:

dataset <- dataset %>% add_prior_information(start_id = example_dataset$prior_information$start_id)
model <- infer_trajectory(dataset, ti_comp1(), give_priors = c("start_id"))

An overview of all possible priors is given in the dynwrap tibble:

dynwrap::priors

prior_id	name	description	type
start_id	Start cell(s)	One or more start cell identifiers	soft
end_id	End cell(s)	One or more end cell identifiers	soft
end_n	# end states	The number of end states	soft
start_n	# start states	The number of start states	soft
groups_id	Cell clustering	Named character vector linking the cell identifiers to different states/branches	hard
groups_n	# states	Number of states/branches, including start, end and intermediary states	soft
groups_network	State network	Dataframe containing the known network between states/branches. Contains a from and to column	hard
timecourse_continuous	Time course (continuous)	Named numeric vector linking the cell ids to time points	hard
timecourse_discrete	Time course (discrete)	Named numeric vector linking the cell ids to time course points	hard
features_id	Marker genes	Genes/features known to be important in the dynamic process	soft
dataset	The full dataset	The full dataset, including (if available) the gold standard	hard

You can find all methods that can use a certain type of prior by filtering the dynmethods::methods. For example, to find all method wrappers that can use timecourse data:

dynmethods::methods %>%
  filter(map_lgl(wrapper_inputs, ~ any(c("timecourse_discrete", "timecourse_continuous") %in% .$input_id))) %>% 
  select(method_id, method_name)
## # A tibble: 2 x 2
##   method_id method_name
##   <chr>     <chr>      
## 1 grandprix GrandPrix  
## 2 scuba     SCUBA

Reproducibility

To make the execution of a method reproducible, fix the seed either using set.seed() or through the seed argument of infer_trajectory:

set.seed(1)
model <- dynwrap::infer_trajectory(dataset, ti_comp1())

Running multiple methods or datasets

Often it is useful to run multiple methods and/or use multiple datasets. While you can easily parallelise this yourself, we provide a helper function for this: dynwrap::infer_trajectories(). This function integrates well with future interpretation and plotting functions.

models <- dynwrap::infer_trajectories(
  dataset, 
  method = list(ti_comp1(), ti_angle())
)

This function generates a data frame containing the different models and extra information.

models
## # A tibble: 2 x 7
##   dataset_ix method_ix dataset_id    method_id method_name model    summary
##        <int>     <int> <chr>         <chr>     <chr>       <list>   <list> 
## 1          1         1 20190409_163… comp1     Component 1 <S3: dy… <tibbl…
## 2          1         2 20190409_163… angle     Angle       <S3: dy… <tibbl…

Errors

Some methods can generate errors which are beyond our control. To know what and where a method generated an error, you can turn on the verbosity:

dataset$expression@x[] <- NA
model <- dynwrap::infer_trajectory(dataset, ti_comp1(), give_priors = c("start_id"), verbose = TRUE)
## Error: Error during trajectory inference 
## Loading required package: dynutils
## Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
## Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> prcomp.default -> svd
## Execution halted
## Executing 'comp1' on '20190409_163010__data_wrapper__pbfhZY5hhE'
## With parameters: list(dimred = "pca", ndim = 2L, component = 1L),
## inputs: expression, and
## priors : 
## Input saved to /tmp/Rtmp50BQLE/file44714125ec76/ti
## Running method using babelwhale
## Running /usr/bin/docker run -e 'TMPDIR=/tmp2' --workdir /ti/workspace -v \
##   '/tmp/Rtmp50BQLE/file44714125ec76/ti:/ti' -v \
##   '/tmp/Rtmp50BQLE/file447173b4375f/tmp:/tmp2' \
##   'dynverse/ti_comp1:v0.9.9.01' --dataset /ti/input.h5 --output \
##   /ti/output.h5
## Loading required package: dynutils
## Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
## Calls: <Anonymous> ... <Anonymous> -> <Anonymous> -> prcomp.default -> svd
## Execution halted

Running from the command line

It is also possible to run each method in dynmethods from the command-line:

docker run dynverse/ti_comp1 --help
## Usage: LOCAL=/path/to/folder; MOUNT=/ti; docker run -v $LOCAL:$MOUNT dynverse/ti_comp1
## 
## 
## Options:
##  -h, --help
##      Show this help message and exit
## 
##  --dataset=DATASET
##      Filename of the dataset, example: $MOUNT/dataset.(h5|loom). h5 files can be created by cellranger or dyncli.
## 
##  --loom_expression_layer=LOOM_EXPRESSION_LAYER
##      If available, the name of the loom-layer containing normalised log-transformed data.
## 
##  --output=OUTPUT
##      Filename of the output trajectory data, example: $MOUNT/output.h5.
## 
##  --parameters=PARAMETERS
##      A file containing method-specific parameters, example: $MOUNT/parameters.(h5|yml).
## 
##  --dimred=DIMRED
##      Parameter: Which dimensionality reduction method to use
##      Domain: {pca, mds, tsne, ica, lle, landmark_mds, mds_sammon, mds_isomds, mds_smacof, umap, dm_diffusionMap}
##      Default: pca
##      Format: character
## 
##  --ndim=NDIM
##      Parameter: 
##      Domain: U(2, 30)
##      Default: 2
##      Format: integer
## 
##  --component=COMPONENT
##      Parameter: 
##      Domain: U(1, 10)
##      Default: 1
##      Format: integer
## 
##  --verbosity=VERBOSITY
##      The verbosity level: 0 => none, 1 => critical (default), 2 => info, 3 => debug.
## 
##  --seed=SEED
##      A seed to be set to ensure reproducability.

This container will output an HDF5 file, which can be read by tools such as dynutils::read_h5().