DROMA_Set: Drug Response and Omics Multi-project Analysis Set

Overview

DROMA_Set is a comprehensive R package for managing and analyzing drug response and omics data across multiple projects. It provides a robust framework for handling complex multi-omics datasets with integrated drug sensitivity information, enabling seamless cross-project comparisons and analyses.

DROMA_Set is part of the DROMA project. Visit the official DROMA website for comprehensive documentation and interactive examples.

Key Features

  • 🔬 Multi-omics Data Management: Support for various molecular profile types (mRNA, CNV, mutations, methylation, proteomics)
  • 💊 Drug Response Integration: Comprehensive treatment response data handling and analysis
  • 🔗 Cross-Project Analysis: Advanced tools for comparing and analyzing data across multiple projects
  • 📊 Sample Overlap Detection: Automatic identification and analysis of overlapping samples between projects
  • 🗄️ Database Integration: Robust SQLite database connectivity with efficient data storage and retrieval
  • 📈 Flexible Data Loading: Smart data loading with filtering by data type, tumor type, and specific features
  • 🎯 Metadata Management: Comprehensive sample and treatment metadata handling with ProjectID tracking

Installation

From GitHub (Recommended)

# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install DROMA_Set
devtools::install_github("mugpeng/DROMA_Set")

Dependencies

The package requires the following R packages:

  • DBI (>= 1.1.0)
  • RSQLite (>= 2.2.0)
  • methods

These will be automatically installed when you install DROMA_Set.

Quick Start

1. Load the Package

library(DROMA.Set)

2. Connect to Database

# Connect to your DROMA database
connectDROMADatabase("path/to/your/droma.sqlite")

# List available projects
projects <- listDROMAProjects()
print(projects)

3. Create DromaSet Objects

# Create a single DromaSet for one project
gCSI <- createDromaSetFromDatabase("gCSI", "path/to/droma.sqlite")

# Create a MultiDromaSet for multiple projects
multi_set <- createMultiDromaSetFromDatabase(
    project_names = c("gCSI", "CCLE"),
    db_path = "path/to/droma.sqlite"
)

4. Load and Analyze Data

# Load molecular profiles
gCSI <- loadMolecularProfiles(gCSI, molecular_type = "mRNA", 
                             features = c("BRCA1", "BRCA2", "TP53"))

# Load treatment response data
gCSI <- loadTreatmentResponse(gCSI, drugs = c("Tamoxifen", "Cisplatin"))

# Cross-project molecular analysis
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set, 
                                              molecular_type = "mRNA",
                                              overlap_only = FALSE)

# Cross-project treatment response analysis
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                              drugs = c("Tamoxifen", "Cisplatin"),
                                              overlap_only = FALSE)

Core Classes

DromaSet Class

The DromaSet class represents a single project's drug response and omics data:

# Create DromaSet
dataset <- createDromaSetFromDatabase("project_name", "database.sqlite")

# Load all molecular profiles
dataset <- loadMolecularProfiles(dataset, molecular_type = "all")

# Check available data types
availableMolecularProfiles(dataset)
availableTreatmentResponses(dataset)

Key Methods:

  • loadMolecularProfiles(): Load omics data (mRNA, CNV, mutations, etc.)
  • loadTreatmentResponse(): Load drug sensitivity data
  • availableMolecularProfiles(): List available molecular data types
  • availableTreatmentResponses(): List available treatment response types

MultiDromaSet Class

The MultiDromaSet class manages multiple projects for cross-project analysis:

# Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"), "database.sqlite")

# Find overlapping samples
overlap_info <- getOverlappingSamples(multi_set)

# Load molecular data across projects
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set, 
                                              molecular_type = "mRNA")

# Load treatment response data across projects
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                              drugs = c("Tamoxifen", "Cisplatin"))

Key Methods:

  • getOverlappingSamples(): Identify samples present in multiple projects
  • loadMultiProjectMolecularProfiles(): Load molecular data across multiple projects
  • loadMultiProjectTreatmentResponse(): Load treatment response data across multiple projects
  • getDromaSet(): Extract individual DromaSet from MultiDromaSet
  • availableProjects(): List available projects
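
To make the idea of sample overlap concrete, here is a minimal base R sketch of how overlapping samples could be computed from per-project sample lists. It does not call getOverlappingSamples() and is not the package's implementation; the sample names and the third project (ProjectC) are hypothetical.

```r
# Hypothetical sample identifiers per project (illustrative only)
project_samples <- list(
  gCSI     = c("MCF7", "A549", "HeLa", "HT29"),
  CCLE     = c("A549", "MCF7", "K562"),
  ProjectC = c("MCF7", "A549", "U87")
)

# Samples present in every project
shared_by_all <- Reduce(intersect, project_samples)

# Samples present in at least two projects
# (unique() guards against a sample listed twice within one project)
sample_counts <- table(unlist(lapply(project_samples, unique)))
shared_by_some <- names(sample_counts[sample_counts >= 2])

print(shared_by_all)
print(shared_by_some)
```

Restricting an analysis to shared_by_all is the stricter choice; shared_by_some keeps more samples at the cost of unequal coverage across projects.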

Advanced Features

1. Load All Molecular Profiles

# Load all available molecular profile types
all_data <- loadMolecularProfiles(dataset, molecular_type = "all")

# Cross-project loading of all molecular types
all_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
                                                   molecular_type = "all")

2. Sample and Data Filtering

# Filter by data type and tumor type
filtered_data <- loadMolecularProfiles(dataset,
                                      molecular_type = "mRNA",
                                      data_type = "CellLine",
                                      tumor_type = "breast cancer")

# Load specific features and samples
specific_data <- loadMolecularProfiles(dataset,
                                      molecular_type = "mRNA",
                                      features = c("BRCA1", "TP53"),
                                      samples = c("sample1", "sample2"))

# Cross-project filtering by data type and tumor type
filtered_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
                                                        molecular_type = "mRNA",
                                                        data_type = "CellLine",
                                                        tumor_type = "breast cancer",
                                                        overlap_only = FALSE)

3. Database Management

# Connect to database
connectDROMADatabase("droma.sqlite")

# Add new data to database
updateDROMADatabase(expression_matrix, "new_project_mRNA")

# List all tables
tables <- listDROMADatabaseTables()

# Close connection
closeDROMADatabase()

4. Cross-Project Analysis Workflow

# 1. Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"))

# 2. Find overlapping samples
overlaps <- getOverlappingSamples(multi_set)
cat("Found", overlaps$overlap_count, "overlapping samples\n")

# 3. Load molecular data (all samples here; set overlap_only = TRUE to keep only overlapping samples)
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
                                              molecular_type = "mRNA",
                                              features = c("BRCA1", "BRCA2"),
                                              overlap_only = FALSE,
                                              data_type = "CellLine")

# 4. Load drug response data (all samples here; set overlap_only = TRUE to keep only overlapping samples)
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                              drugs = c("Tamoxifen", "Cisplatin"),
                                              overlap_only = FALSE,
                                              data_type = "CellLine")

# 5. Perform correlation analysis
for (project in names(mRNA_data)) {
    if (project %in% names(drug_data)) {
        # Analyze correlations between gene expression and drug response
        # Your analysis code here
    }
}
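
The placeholder in step 5 could be filled in many ways. The sketch below shows one option in base R: Spearman correlation between each gene and each drug over the samples they share. It runs on mock matrices rather than on DROMA_Set output, and it assumes a layout of features (or drugs) in rows and samples in columns, which should be checked against the objects the loaders actually return. The helper correlate_features() is hypothetical and not part of the package.

```r
set.seed(1)
samples <- paste0("sample", 1:20)

# Mock expression matrix: genes in rows, samples in columns (assumed layout)
expr <- matrix(rnorm(2 * 20), nrow = 2,
               dimnames = list(c("BRCA1", "BRCA2"), samples))

# Mock drug response matrix: drugs in rows, samples in columns (assumed layout)
resp <- matrix(rnorm(2 * 20), nrow = 2,
               dimnames = list(c("Tamoxifen", "Cisplatin"), samples))

# Hypothetical helper: gene-by-drug correlation over shared samples
correlate_features <- function(expr, resp, method = "spearman") {
  shared <- intersect(colnames(expr), colnames(resp))
  if (length(shared) < 3) stop("Need at least 3 shared samples")
  # cor() correlates columns, so transpose to samples x features
  cor(t(expr[, shared, drop = FALSE]),
      t(resp[, shared, drop = FALSE]),
      method = method, use = "pairwise.complete.obs")
}

cor_matrix <- correlate_features(expr, resp)
print(round(cor_matrix, 2))
```

Inside the loop, the same helper could be applied to mRNA_data[[project]] and drug_data[[project]] if those elements turn out to be matrices with matching sample names.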

Data Types Supported

Molecular Profiles

  • mRNA: Gene expression data
  • cnv: Copy number variation data
  • mutation_gene: Gene-level mutation data
  • mutation_site: Site-specific mutation data
  • fusion: Gene fusion data
  • meth: DNA methylation data
  • proteinrppa: Reverse-phase protein array data
  • proteinms: Mass spectrometry proteomics data

Treatment Response

  • drug: Drug sensitivity/response data

Database Structure

The DROMA database uses a standardized table naming convention:

  • {project}_{datatype}: Data tables (e.g., gCSI_mRNA, CCLE_drug)
  • sample_anno: Sample metadata with ProjectID tracking
  • drug_anno: Drug/treatment metadata with ProjectID tracking
  • projects: Project summary information
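
As an illustration of the naming convention, the following sketch builds a throwaway in-memory SQLite database with DBI and RSQLite, then splits each data table name into project and data type. The column names (feature_id, SampleID) and table contents are invented for the example and do not describe the real DROMA schema. Splitting at the first underscore assumes project names contain no underscore, which holds for the examples in this README but is an assumption.

```r
library(DBI)

# In-memory SQLite database for illustration only
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy tables following the {project}_{datatype} pattern (contents are made up)
dbWriteTable(con, "gCSI_mRNA",
             data.frame(feature_id = "BRCA1", sample1 = 1.2))
dbWriteTable(con, "CCLE_drug",
             data.frame(feature_id = "Cisplatin", sample1 = 0.4))
dbWriteTable(con, "sample_anno",
             data.frame(SampleID = "sample1", ProjectID = "gCSI"))

all_tables <- dbListTables(con)

# Separate metadata tables from data tables, then parse the data table names
meta_tables <- c("sample_anno", "drug_anno", "projects")
data_tables <- setdiff(all_tables, meta_tables)
parsed <- data.frame(
  table    = data_tables,
  project  = sub("_.*$", "", data_tables),      # text before the first "_"
  datatype = sub("^[^_]*_", "", data_tables)    # text after the first "_"
)

dbDisconnect(con)
print(parsed)
```

Because only the first underscore is used as the separator, a name such as gCSI_mutation_gene would parse to project gCSI and data type mutation_gene.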

Examples

Comprehensive examples are provided in the examples/ directory:

  • examples/produce_dromaset.R: Basic DromaSet usage
  • examples/produce_multidromaset.R: MultiDromaSet cross-project analysis
  • examples/produce_droma_database.R: Database creation and management

Performance Tips

  1. Use overlap_only = TRUE when loading cross-project data to restrict results to samples shared across projects
  2. Specify features parameter to load only genes/drugs of interest
  3. Use return_data = TRUE when you only need the data without updating the object
  4. Filter by data_type and tumor_type to reduce data loading time and focus on specific sample types
  5. Load molecular profiles incrementally rather than using molecular_type = "all" for large datasets

Contributing

We welcome contributions! Please see our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Citation

If you use DROMA_Set in your research, please cite:

Li, S., Peng, Y., Chen, M. et al. Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny. Commun Biol 7, 1200 (2024). https://doi.org/10.1038/s42003-024-06891-2

License

This project is licensed under the Mozilla Public License 2.0 (MPL-2.0); see the LICENSE file for details.

Support

Changelog

Version 0.4.4

Refactor updateDROMADatabase and updateDROMAProjects functions to improve project tracking and metadata handling; enhance listDROMADatabaseTables to filter out backup tables and include created/updated dates; update documentation for new parameters in updateDROMAAnnotation function to support vector inputs for age, data type, and other attributes.

Enhancements Made:

  • ✅ Removed projects table auto-updates from updateDROMADatabase
  • ✅ Added _mutation_raw table exclusion across all relevant functions
  • ✅ Added dataset_type parameter to updateDROMAProjects
  • ✅ Enhanced updateDROMAAnnotation with vector support and created_date logic
  • ✅ Improved parameter validation and documentation

Version 0.4.3

Add updateDROMAProjects function to manage project metadata in DROMA database; enhance listDROMADatabaseTables with feature and sample counts; minor adjustments in example script.

Version 0.4.1

  • Initial release
  • DromaSet and MultiDromaSet classes
  • Database integration and management
  • Cross-project analysis capabilities
  • Comprehensive molecular profile support
  • Sample overlap detection and analysis
  • Enhanced metadata management with ProjectID tracking
  • Support for loading all molecular profile types with molecular_type = "all"
  • Split cross-project data loading into specialized functions:
    • loadMultiProjectMolecularProfiles() for molecular data
    • loadMultiProjectTreatmentResponse() for treatment response data
  • Added data_type and tumor_type filtering parameters for enhanced sample selection

DROMA_Set - Empowering multi-project drug response and omics analysis 🧬💊
