Processing bemovi output

bemovi processing

If you did not carry on with the analysis, re-load the necessary packages and paths:

library(bemovi)
library(randomForest)
library(data.table)
library(ggplot2)

# project directory (you create this one yourself)
to.data <- "/Users/Frank//Bemovi demo/"

# you also have to provide two folders:
# the first contains the video description
# the second holds the raw videos
video.description.folder <- "0 - video description/"
video.description.file <- "video.description.txt"
raw.video.folder <- "1 - raw/"

# the following intermediate directories are automatically created
particle.data.folder <- "2 - particle data/"
trajectory.data.folder <- "3 - trajectory data/"
temp.overlay.folder <- "4a - temp overlays/"
overlay.folder <- "4 - overlays/"
merged.data.folder <- "5 - merged data/"
ijmacs.folder <- "ijmacs/"

# video frame rate (in frames per second)
fps <- 25

# size of a pixel in micrometer
pixel_to_scale <- 1000/240

Process data

For further processing in R, the data has to be loaded from the merged database (Master.RData), which is placed in the merged.data.folder.

load(paste0(to.data,merged.data.folder,"Master.RData"))
Sys.sleep(30)

The data is saved under the name trajectory.data as a data.table, for efficient processing of the usually large tables (more than 1,000,000 rows).

Although the video segmentation should be fine-tuned to yield the highest signal-to-noise ratio, a fraction of the identified trajectories may still be artefactual trajectories. To remove these, the filter_data() function selects trajectories based on the minimum net displacement, their duration, the detection frequency and the median step length.

trajectory.data.filtered <- filter_data(trajectory.data,50,1,0.1,5)
trajectory.data.filtered$type <- "filtered data"
trajectory.data$type <- "raw data"

The following figure shows the effect of the filtering.

$The figure shows how the parameters of the function can be used to filter out artefactual trajectories (in red) due to erroneous detection of particles or moving debris versus real trajectories by target individuals (in black).$

The figure shows how the parameters of the function can be used to filter out artefactual trajectories (in red) due to erroneous detection of particles or moving debris versus real trajectories by target individuals (in black).

Trajectories contain information about morphology and movement on the frame or between-frame level. This information can be used for different purposes, for instance comparing morphology or movement of individuals between treatments, and also the prediction of species identities. To aggregate information on the individual level (i.e. the trajectory-level) we use the summarize_trajectories() function. This function can calculate either the mean and the standard deviation, or the median and the interquartile range (to choose the latter, set calculate.median=T). of the morphological properties (area, mean grey value, perimeter, width, length and aspect ratio). Likewise, the mean and variability (turning angle, step length, gross speed), as well as the extremes (min and max gross speed, max net displacement) and sums (gross and net displacement) for some movement metrics are calculated and aggregated at the trajectory level. The function takes as arguments the dataset to be summarized, and has and option to save the results to the hard disk (set write=T), which the requires a path to the destination directory (here the data gets written to the merged.data.folder).

morph_mvt <- summarize_trajectories(trajectory.data.filtered, write=T, calculate.median=F, to.data, merged.data.folder)

Visualization

Besides creating static plots to overlay the data from different processing (e.g., filter_data() function), dynamic overlays between the original video and the extracted data are very helpful in trouble shooting tracking and detection problems and for illustration of the results. Especially when species identities are predicted based on the trajectory properties, the overlay video can help to check the identities of species in mixed cultures, which would be difficult, just based on the extracted data. For these purposes, we have developed the create_overlays function, which can visualize results in different ways.

The arguments of the function are the paths to the merged.data.folder, the raw videos and a directory in which the resulting overlays videos are saved. The width and height of the original videos have to specified, as well as the difference lag. Two types of visualizations are possible: “traj” builds up the trajectories from the first frame, which remain present till the last frame. Especially, when many particles are tracked, this can result in cluttered videos. The type “label” outlines detected trajectories with their individual identifiers for that particular video. If species were predicted and the prediction added to the Master.RData file (as column predict_spec), the predict_spec option can be set to TRUE. In this case, the species label used to label the trajectories appears in different colours. The arguments IJ.path and memroy are used as previously, whereas the contrast.enhancement argument allows to increase the contrast on the original video.

create_overlays(to.data, merged.data.folder, raw.video.folder,
                      temp.overlay.folder, overlay.folder,
                      2048,
                      2048,
                      difference.lag,
                      type="traj",
                      predict_spec=F,
                      IJ.path,
                      contrast.enhancement = 1.0,
                      memory = memory.alloc)

A single frame showing an overlay between the original video and the data extracted from the video. The left image shows the plot with the “traj” option, whereas the right image shows the labelled particles.

Data analysis

After processing the data with bemovi, you can start extracting information such as the number of particles per video, distribution of traits such as body size and speed and predict species identities based on traits. But first we have to predict the species identities in the mixed culture. (Recall that each of the videos is of either a monoculture of one of three species of ciliate, or of a mixed-culture containing all three species; and that this information is stored in the video.description.txt file).

Predict species identities with randomForest

First we have to create a training dataset based on the videos of monocultures. We do so by only selecting trajectories which are well represented i.e. long trajectories which clearly show movement and specific speeds.

# create species variable for treatment cultures
morph_mvt$treatment_label <- ifelse(morph_mvt$treatment == "P", "Paramecium", ifelse(morph_mvt$treatment == "C", "Colpidium", "Tetrahymena"))

morph_mvt_P <- morph_mvt[morph_mvt$treatment == "P" & morph_mvt$net_speed > 15 & morph_mvt$net_disp > 200, ]
morph_mvt_C <- morph_mvt[morph_mvt$treatment == "C" & morph_mvt$net_speed > 15 & morph_mvt$net_disp > 200, ]
morph_mvt_T <- morph_mvt[morph_mvt$treatment == "T" & morph_mvt$net_speed > 2 & morph_mvt$net_disp > 100, ]
training_data <- rbind(morph_mvt_P,morph_mvt_C,morph_mvt_T)
rm(morph_mvt_P, morph_mvt_C, morph_mvt_T)

Then we can use this data to predict the different species in monocultures.

####### Classify species in mixed culture by RandomForest #################
library(randomForest)
training_data <- training_data[complete.cases(training_data),]
rf_fit_all <- randomForest(factor(treatment) ~ mean_area + sd_area + mean_perimeter + sd_perimeter + mean_major + sd_major +  mean_ar + sd_ar + 
                             mean_turning + sd_turning + gross_disp + max_net + net_disp + net_speed + max_step + 
                             min_step + sd_step + sd_gross_speed + max_gross_speed + min_gross_speed, data=training_data, importance=TRUE, proximity=TRUE)

The classification error among monocultures is very low with about 3 percent and the following confusion matrix shows which species get most often confounded:

print(rf_fit_all$confusion)

     C   P   T class.error
C 1006   6   5 0.010816126
P   10 410   0 0.023809524
T    4   0 512 0.007751938

We then use the randomForest classification model to predict species identities in the mixed cultures.

morph_mvt$predict_spec <- predict(rf_fit_all, 
    morph_mvt)
take_all <- as.data.table(morph_mvt)
take_all <- take_all[, list(id, predict_spec)]
setkey(take_all, id)
setkey(trajectory.data.filtered, id)
data <- trajectory.data.filtered[take_all]

# create species variable
data$species <- ifelse(data$predict_spec == 
    "P", "Paramecium", ifelse(data$predict_spec == 
    "C", "Colpidium", ifelse(data$predict_spec == 
    "T", "Tetrahymena", "unknown")))

Counts

Finally, we aggregate the data for each experimental unit. For the mixed cultures we can count the number of individuals per replicate, frame and species, and then take the mean of these counts over the whole video.

suppressMessages(library(dplyr))
# calculate automatically the densities
# from video

count_per_frame <- data %>% filter(treatment == 
    "PCT") %>% group_by(file, treatment, 
    replicate, sample, species, frame) %>% 
    summarise(count = length(trajectory)) %>% 
    # extrapolate to density per ml
mutate(dens.ml = count * 13.84)

mean_density_per_ml <- count_per_frame %>% 
    group_by(file, treatment, replicate, 
        sample, species) %>% summarise(mean.dens.ml = mean(dens.ml))
mean_density_per_ml

Source: local data table [30 x 6]
Groups: file, treatment, replicate, sample

        file treatment replicate sample
       (chr)    (fctr)     (int)  (int)
1  Data00061       PCT         1      1
2  Data00061       PCT         1      1
3  Data00061       PCT         1      1
4  Data00063       PCT         1      2
5  Data00063       PCT         1      2
6  Data00063       PCT         1      2
7  Data00065       PCT         1      3
8  Data00065       PCT         1      3
9  Data00065       PCT         1      3
10 Data00067       PCT         1      4
..       ...       ...       ...    ...
Variables not shown: species (chr),
  mean.dens.ml (dbl)

Likewise, we can use the predicted species identities to look at the cell size distribution in the mixed cultured.

scale_fill_discrete2 <- function(...) scale_fill_brewer(palette = "Set1")
ggplot(data = morph_mvt, aes(x = mean_area, 
    fill = predict_spec)) + geom_histogram(position = "identity", 
    alpha = 0.5) + theme_bw() + xlab("Area (in micrometer)") + 
    ylab("Counts") + theme(panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank(), axis.text.x = element_text(size = 14), 
    axis.title.x = element_text(size = 18), 
    axis.text.y = element_text(size = 14), 
    axis.title.y = element_text(angle = 90, 
        size = 18, vjust = 0.25), legend.key = element_blank(), 
    legend.title = element_blank(), legend.text = element_text(size = 12), 
    legend.direction = "horizontal", legend.position = "top", 
    strip.text.x = element_text(face = "italic", 
        size = 14), strip.text.y = element_text(size = 14)) + 
    scale_fill_discrete2() + scale_y_log10(breaks = c(1, 
    10, 100)) + scale_x_log10() + coord_cartesian(ylim = c(1, 
    1000))
# ggsave('/Users/Frank/Dropbox/bemovi
# demo/figures/size_distribution.png',
# dpi=300)

Cell size distribution of different species in the mixed culture.