1 Getting started

1.1 Intruduction

The talusR package provides a workflow for Talus mass-spectrum data—import, QC, differential analysis, and visualization. But it assumes that your data have already been processed by DIA-NN (e.g. *.tsv or *.csv outputs containing normalized, protein-level intensity values). It further expects one sample per mass-spectrometer run (with DIA-NN normalization applied across all runs), and that your sample metadata table includes a run column whose entries exactly match the sample names in the intensity signal file.

Key features include:

  • Data import & structure
    Import DIA-NN–processed protein intensities into S4 objects:

    • TalusDataSet for a whole dataset
    • TalusDataSetList (a SimpleList of TalusDataSet) for per-fraction splits

    Each object carries sample metadata and feature annotations.

  • Quality control: Run PCA or other multivariate checks to confirm that technical replicates and fractions cluster as expected.

  • Differential analysis: use a simple wrapper around matrixTests to t-test and limma to fit linear models, define contrasts, apply independent filtering by adjusted p-value or log-fold change thresholds, and return results in TalusResult or TalusResultList objects.

  • Visualization: Create PCA scatterplots, volcano plots, per-protein intensity plots.

  • Bioconductor integration: Built on SummarizedExperiment and S4Vectors, talusR classes inherit the generics and methods of the Bioconductor ecosystem, for convenient plug into annotation, pathway analysis, or reporting tools.

1.2 Installation

# if you've installed the Bioconductor dependencies, then
devtools::install_github('chaochaowong/talusR',
                         build_vignettes = TRUE)

# if haven't installed the Bioconductor dependencies, then
BiocManager::install("chaochaowong/talusR",
                     dependencies = TRUE
                     build_vignettes = TRUE
)

1.3 Loading libraries

library(talusR)
library(ggplot2)
library(dplyr)

2 Import Talus data

The core function read_talus() imports protein-level intensity signals for whole dataset or, if split_by_fraction = TRUE, splits by fractions.

2.1 Whole dataset

When split_by_fraction = FALSE, read_talus() returns a single TalusDataSet instance. This mode is ideal when:

  • Your data have no fraction annotations
  • You want to run PCA-based QC to verify that protein-level signals from each fraction cluster as expected
# get paths
protein_file <- system.file('extdata', 'test_data.tsv', 
                            package = 'talusR')
meta_file    <- system.file('extdata', 'test_meta.csv',
                            package = 'talusR')
# without split by fraction -> return TalusDataSet
tds  <- talusR::read_talus(file = protein_file,
                           meta_file = meta_file,
                           which_proteinid = "Protein.Ids",
                           which_run = "Run",
                           remove_few_measurements = TRUE,
                           split_by_fraction = FALSE)
tds
#> class: TalusDataSet 
#> dim: 1500 36 
#> metadata(3): intensity_group metric log_transform
#> assays(1): abundance
#> rownames(1500): P82675;P82675-2 Q6ZSR9 ... Q5JTZ9
#>   Q9H078;Q9H078-2;Q9H078-3
#> rowData names(5): Protein.Group Protein.Ids Protein.Names Genes
#>   First.Protein.Description
#> colnames(36): 240805_Sarthy_DMSO_chrom_03.d 240805_Sarthy_DMSO_nuc_05.d
#>   ... 240805_Sarthy_DMSO_cyto_06.d 240805_Sarthy_Etopo_cyto_02.d
#> colData names(3): Run Tx Frx
#> intensity_group: protein 
#> metric:          DIA-NN 
#> log_transform:   log2

2.2 Per-fraction splits

When split_by_fraction = TRUE, read_talus() returns a TalusDataSetList instance—a SimpleList of TalusDataSet instances, one per fraction, each holding its protein-level signals.

tdsl <- read_talus(file = protein_file,
                   meta_file = meta_file,
                   which_proteinid = "Protein.Ids",
                   which_fraction = "Frx",
                   which_sequence = NA,
                   which_run = "Run",
                   remove_few_measurements = TRUE,
                   split_by_fraction = TRUE,
                   log_transform = 'log2',
                   intensity_group = "protein",
                   metric = "DIA-NN")

tdsl
#> TalusDataSetList of length 3
#> names(3): chrom cyto nuc
#> 
#> [[1]] — name: chrom
#> class: TalusDataSet 
#> dim: 1269 12 
#> metadata(3): intensity_group metric log_transform
#> assays(1): abundance
#> rownames(1269): P82675;P82675-2 Q6ZSR9 ... Q5JTZ9
#>   Q9H078;Q9H078-2;Q9H078-3
#> rowData names(5): Protein.Group Protein.Ids Protein.Names Genes
#>   First.Protein.Description
#> colnames(12): 240805_Sarthy_DMSO_chrom_03.d
#>   240805_Sarthy_DMSO_chrom_01.d ... 240805_Sarthy_Etopo_chrom_02.d
#>   240805_Sarthy_DMSO_chrom_04.d
#> colData names(3): Run Tx Frx
#> intensity_group: protein 
#> metric:          DIA-NN 
#> log_transform:   log2 
#> 
#> [[2]] — name: cyto
#> class: TalusDataSet 
#> dim: 1435 12 
#> metadata(3): intensity_group metric log_transform
#> assays(1): abundance
#> rownames(1435): P82675;P82675-2 Q6ZSR9 ... Q5JTZ9
#>   Q9H078;Q9H078-2;Q9H078-3
#> rowData names(5): Protein.Group Protein.Ids Protein.Names Genes
#>   First.Protein.Description
#> colnames(12): 240805_Sarthy_DMSO_cyto_01.d 240805_Sarthy_DMSO_cyto_02.d
#>   ... 240805_Sarthy_DMSO_cyto_06.d 240805_Sarthy_Etopo_cyto_02.d
#> colData names(3): Run Tx Frx
#> intensity_group: protein 
#> metric:          DIA-NN 
#> log_transform:   log2 
#> 
#> [[3]] — name: nuc
#> class: TalusDataSet 
#> dim: 1226 12 
#> metadata(3): intensity_group metric log_transform
#> assays(1): abundance
#> rownames(1226): P82675;P82675-2 Q6ZSR9 ... Q5JTZ9
#>   Q9H078;Q9H078-2;Q9H078-3
#> rowData names(5): Protein.Group Protein.Ids Protein.Names Genes
#>   First.Protein.Description
#> colnames(12): 240805_Sarthy_DMSO_nuc_05.d 240805_Sarthy_DMSO_nuc_01.d
#>   ... 240805_Sarthy_Etopo_nuc_01.d 240805_Sarthy_Etopo_nuc_03.d
#> colData names(3): Run Tx Frx
#> intensity_group: protein 
#> metric:          DIA-NN 
#> log_transform:   log2

3 QC

Use plot_pca() to assess sample clustering and variance structure.

3.1 TalusDataSet

Run PCA on the full dataset to verify that protein-level signals from each fraction cluster as expected.

talusR::plot_pca(tds, color_by='Frx')
PCA on TalusDataSet.

Figure 1: PCA on TalusDataSet

3.2 TalusDataSetList

When using per-fraction splits, perform PCA on each fraction independently:

plot_pca(tdsl, color_by='Tx')
PCA performed separately on each fraction in the TalusDataSetList instance.

Figure 2: PCA performed separately on each fraction in the TalusDataSetList instance

4 Differential analysis and visualization

Core statistical methods:

  • talus_limma: a wrapper function for limma::lmFit() to fit linear models and test contrasts.
  • talus_row_t_welch: a wrapper function for matrixTests::row_t_welch() to perform per-protein Welch’s t-tests.

Summary & visualization:

  • summary_talus: summarize differential results, flag significant proteins, and return a data.frame of significant proteins.
  • plot_volcano: create a volcano plot of log-fold changes vs adjusted p-values.
  • plot_per_protein: display protein intensity profiles across fractions or conditions.

4.1 limma

Use talus_limma() to fit linear models with empirical Bayesian and moderate t-statistics across conditions (e.g. drug treatment) to perform differential analysis.

Here the condition group is Tx: DMSO, etopo, and doxo. Set the factor levels so that DMSO is the reference:

tx_levels <- c('DMSO', 'Etopo', 'Doxo')
tdsl <- endoapply(tdsl, function(x) {
  colData(x)$Tx <- factor(colData(x)$Tx, levels = tx_levels)
  x
})

Next, run talus_limma() with a ~ 0 + Tx design formula to compare each treatment against the DMSO control:

res_limma <- talus_limma(tdsl, design = ~ 0 + Tx)

Finally, summarize and filter for significant proteins with with adjusted p-value < 0.05 and |logâ‚‚ fold change| > 0.5.

limma_filt <- summary_talus(res_limma[['chrom']], 
                            alpha = 0.05, 
                            lfc_threshold = 0.5)
#> 
#> TxEtopo_vs_TxDMSO :
#> 
#> Threshold: alpha < 0.05; |lfc_threshod| > 0.5
#> Total features: 1269
#> Up-regulated:   1
#> Down-regulated: 1
#> 
#> 
#> TxDoxo_vs_TxDMSO :
#> 
#> Threshold: alpha < 0.05; |lfc_threshod| > 0.5
#> Total features: 1269
#> Up-regulated:   2
#> Down-regulated: 0
limma_filt
#> $TxEtopo_vs_TxDMSO
#>                                  id      logFC  AveExpr          t      P.Value
#> 1 P11388;P11388-2;P11388-3;P11388-4 -0.8769478 18.53434 -13.717557 5.507541e-08
#> 2                            P27694  0.8630436 16.38714   8.934821 3.400613e-06
#>      adj.P.Val        B                     Protein.Group
#> 1 6.614556e-05 8.195588 P11388;P11388-2;P11388-3;P11388-4
#> 2 2.042068e-03 4.853289                            P27694
#>                         Protein.Ids Protein.Names Genes
#> 1 P11388;P11388-2;P11388-3;P11388-4   TOP2A_HUMAN TOP2A
#> 2                            P27694    RFA1_HUMAN  RPA1
#>                          First.Protein.Description
#> 1                        DNA topoisomerase 2-alpha
#> 2 Replication protein A 70 kDa DNA-binding subunit
#> 
#> $TxDoxo_vs_TxDMSO
#>       id    logFC  AveExpr         t      P.Value    adj.P.Val        B
#> 1 Q9H8H0 1.298674 14.40518 12.532423 1.339523e-07 0.0001615464 7.586870
#> 2 Q9UMY1 1.006518 14.53278  8.657043 4.554389e-06 0.0027462964 4.613002
#>   Protein.Group Protein.Ids Protein.Names Genes First.Protein.Description
#> 1        Q9H8H0      Q9H8H0   NOL11_HUMAN NOL11      Nucleolar protein 11
#> 2        Q9UMY1      Q9UMY1    NOL7_HUMAN  NOL7       Nucleolar protein 7

Visualize the statistiics by plot_volcano():

plot_volcano(res_limma[['chrom']], 
             alpha = 0.05,
             lfc_threshold = 0.5, 
             use_adjP = TRUE, 
             label_top_n=10)

4.2 t-test

Apply talus_row_t_welch() to perform per-protein Welch’s t-tests on the TalusDataSetList:

res_t <- talus_row_t_welch(tdsl, design = ~ 0 + Tx)
names(res_t)
#> [1] "chrom" "cyto"  "nuc"

Filter significant proteins using summary_talus() (adjusted p-value < 0.05, |logâ‚‚FC| > 0.5):

# show contrast level
cat('\nContrast levels:\n')
#> 
#> Contrast levels:
names(res_t[['chrom']])
#> [1] "TxEtopo-vs-TxDMSO" "TxDoxo-vs-TxDMSO"

# summarize the result per fraction
ttest_filt <- summary_talus(res_t[['chrom']], 
                            alpha = 0.05, 
                            lfc_threshold = 0.5)
#> 
#> TxEtopo-vs-TxDMSO :
#> 
#> Threshold: alpha < 0.05; |lfc_threshod| > 0.5
#> Total features: 1269
#> Up-regulated:   1
#> Down-regulated: 1
#> 
#> 
#> TxDoxo-vs-TxDMSO :
#> 
#> Threshold: alpha < 0.05; |lfc_threshod| > 0.5
#> Total features: 1269
#> Up-regulated:   1
#> Down-regulated: 0

Visualize the statistics by plot_volcano().

# try to re-arrange the contrast_levels
plot_volcano(res_t[['chrom']], 
             alpha = 0.05,
             lfc_threshold = 0.5, 
             use_adjP = TRUE,
             contrast_levels = names(res_t[['chrom']])[c(2, 1)],
             label_top_n=10)

4.3 Per-protein plot

Extract a protein ID (e.g. TOP2A) and plot the intensities across conditions (e.g., fractions and treatments):

protein_id <- as.data.frame(rowData(tdsl[[1]])) %>% 
  dplyr::filter(Genes == 'TOP2A') %>% pull(Protein.Group)

plot_per_protein(tdsl, 
                 protein_id = 'P11388;P11388-2;P11388-3;P11388-4',
                 group_by = 'Tx') +
  labs(title = 'TOP2A')

Or just for chrom fraction only:

plot_per_protein(tdsl[['chrom']], 
                 protein_id = 'P11388;P11388-2;P11388-3;P11388-4',
                 group_by = 'Tx') +
  labs(title = 'TOP2A')

5 sessionInfo

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Ventura 13.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/Los_Angeles
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] readr_2.1.5                 dplyr_1.1.4                
#>  [3] ggplot2_3.5.2               talusR_1.0.3               
#>  [5] SummarizedExperiment_1.34.0 Biobase_2.64.0             
#>  [7] GenomicRanges_1.56.2        GenomeInfoDb_1.40.1        
#>  [9] IRanges_2.38.1              S4Vectors_0.42.1           
#> [11] BiocGenerics_0.50.0         MatrixGenerics_1.16.0      
#> [13] matrixStats_1.5.0           knitr_1.50                 
#> [15] BiocStyle_2.32.1           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6            xfun_0.52               bslib_0.9.0            
#>  [4] ggrepel_0.9.6.9999      lattice_0.22-6          tzdb_0.5.0             
#>  [7] vctrs_0.6.5             tools_4.4.0             generics_0.1.4         
#> [10] parallel_4.4.0          tibble_3.2.1            pkgconfig_2.0.3        
#> [13] Matrix_1.7-0            RColorBrewer_1.1-3      lifecycle_1.0.4        
#> [16] GenomeInfoDbData_1.2.12 compiler_4.4.0          farver_2.1.2           
#> [19] stringr_1.5.1           tinytex_0.57            statmod_1.5.0          
#> [22] htmltools_0.5.8.1       sass_0.4.10             yaml_2.3.10            
#> [25] tidyr_1.3.1             pillar_1.10.2           crayon_1.5.3           
#> [28] jquerylib_0.1.4         DelayedArray_0.30.1     cachem_1.1.0           
#> [31] limma_3.60.6            abind_1.4-8             tidyselect_1.2.1       
#> [34] digest_0.6.37           stringi_1.8.7           purrr_1.0.4            
#> [37] bookdown_0.43           labeling_0.4.3          fastmap_1.2.0          
#> [40] grid_4.4.0              cli_3.6.5               SparseArray_1.4.8      
#> [43] magrittr_2.0.3          S4Arrays_1.4.1          withr_3.0.2            
#> [46] UCSC.utils_1.0.0        scales_1.4.0            bit64_4.6.0-1          
#> [49] rmarkdown_2.29          XVector_0.44.0          httr_1.4.7             
#> [52] bit_4.6.0               matrixTests_0.2.3       hms_1.1.3              
#> [55] evaluate_1.0.3          rlang_1.1.6             Rcpp_1.0.14            
#> [58] glue_1.8.0              BiocManager_1.30.25     vroom_1.6.5            
#> [61] rstudioapi_0.17.1       jsonlite_2.0.0          R6_2.6.1               
#> [64] zlibbioc_1.50.0