DROMA_Set is an R package for managing and analyzing drug response and omics data across multiple projects. It provides a framework for handling multi-omics datasets together with drug sensitivity information, so that data from different projects can be compared and analyzed side by side.
It is part of the DROMA project. Visit the official DROMA website for full documentation and interactive examples.
- Multi-omics Data Management: Support for various molecular profile types (mRNA, CNV, mutations, methylation, proteomics)
- Drug Response Integration: Handling and analysis of treatment response data
- Cross-Project Analysis: Tools for comparing and analyzing data across multiple projects
- Sample Overlap Detection: Automatic identification and analysis of samples shared between projects
- Database Integration: SQLite database connectivity for data storage and retrieval
- Flexible Data Loading: Data loading with filtering by data type, tumor type, and specific features
- Metadata Management: Sample and treatment metadata handling with ProjectID tracking
```r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install DROMA_Set
devtools::install_github("mugpeng/DROMA_Set")
```
The package requires the following R packages:
- DBI (>= 1.1.0)
- RSQLite (>= 2.2.0)
- methods

These will be installed automatically when you install DROMA_Set.
```r
library(DROMA.Set)
# Connect to your DROMA database
connectDROMADatabase("path/to/your/droma.sqlite")
# List available projects
projects <- listDROMAProjects()
print(projects)
# Create a single DromaSet for one project
gCSI <- createDromaSetFromDatabase("gCSI", "path/to/droma.sqlite")
# Create a MultiDromaSet for multiple projects
multi_set <- createMultiDromaSetFromDatabase(
  project_names = c("gCSI", "CCLE"),
  db_path = "path/to/droma.sqlite"
)
# Load molecular profiles
gCSI <- loadMolecularProfiles(gCSI, molecular_type = "mRNA",
                              features = c("BRCA1", "BRCA2", "TP53"))
# Load treatment response data
gCSI <- loadTreatmentResponse(gCSI, drugs = c("Tamoxifen", "Cisplatin"))
# Cross-project molecular analysis
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
                                               molecular_type = "mRNA",
                                               overlap_only = FALSE)
# Cross-project treatment response analysis
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                               drugs = c("Tamoxifen", "Cisplatin"),
                                               overlap_only = FALSE)
```
The DromaSet class represents a single project's drug response and omics data:

```r
# Create DromaSet
dataset <- createDromaSetFromDatabase("project_name", "database.sqlite")
# Load all molecular profiles
dataset <- loadMolecularProfiles(dataset, molecular_type = "all")
# Check available data types
availableMolecularProfiles(dataset)
availableTreatmentResponses(dataset)
```
Key Methods:
- loadMolecularProfiles(): Load omics data (mRNA, CNV, mutations, etc.)
- loadTreatmentResponse(): Load drug sensitivity data
- availableMolecularProfiles(): List available molecular data types
- availableTreatmentResponses(): List available treatment response types
The MultiDromaSet class manages multiple projects for cross-project analysis:

```r
# Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"), "database.sqlite")
# Find overlapping samples
overlap_info <- getOverlappingSamples(multi_set)
# Load molecular data across projects
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
                                               molecular_type = "mRNA")
# Load treatment response data across projects
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                               drugs = c("Tamoxifen", "Cisplatin"))
```
Key Methods:
- getOverlappingSamples(): Identify samples present in multiple projects
- loadMultiProjectMolecularProfiles(): Load molecular data across multiple projects
- loadMultiProjectTreatmentResponse(): Load treatment response data across multiple projects
- getDromaSet(): Extract an individual DromaSet from a MultiDromaSet
- availableProjects(): List available projects
```r
# Load all available molecular profile types
all_data <- loadMolecularProfiles(dataset, molecular_type = "all")
# Cross-project loading of all molecular types
all_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
                                                    molecular_type = "all")
# Filter by data type and tumor type
filtered_data <- loadMolecularProfiles(dataset,
                                       molecular_type = "mRNA",
                                       data_type = "CellLine",
                                       tumor_type = "breast cancer")
# Load specific features and samples
specific_data <- loadMolecularProfiles(dataset,
                                       molecular_type = "mRNA",
                                       features = c("BRCA1", "TP53"),
                                       samples = c("sample1", "sample2"))
# Cross-project filtering by data type and tumor type
filtered_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
                                                         molecular_type = "mRNA",
                                                         data_type = "CellLine",
                                                         tumor_type = "breast cancer",
                                                         overlap_only = FALSE)
```
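Filtering by data_type and tumor_type can be thought of as picking sample IDs from the sample annotation and then keeping only those columns of a profile. The base-R sketch below applies that kind of filter to an in-memory table. The annotation column names (SampleID, DataType, TumorType), the data type values other than "CellLine", the expression values, and the helper select_samples() are all hypothetical, not the package's schema or API.

```r
# Hypothetical sample annotation; column names and values are assumptions
sample_anno <- data.frame(
  SampleID  = c("S1", "S2", "S3", "S4", "S5"),
  ProjectID = c("ProjectA", "ProjectA", "ProjectB", "ProjectB", "ProjectB"),
  DataType  = c("CellLine", "PDX", "CellLine", "CellLine", "PDO"),
  TumorType = c("breast cancer", "breast cancer", "lung cancer",
                "breast cancer", "breast cancer"),
  stringsAsFactors = FALSE
)

# Return sample IDs matching the requested data type(s) and tumor type(s);
# NULL means "no filter" for that field
select_samples <- function(anno, data_type = NULL, tumor_type = NULL) {
  keep <- rep(TRUE, nrow(anno))
  if (!is.null(data_type))  keep <- keep & anno$DataType %in% data_type
  if (!is.null(tumor_type)) keep <- keep & anno$TumorType %in% tumor_type
  anno$SampleID[keep]
}

# Hypothetical expression matrix: features in rows, samples in columns
expr <- matrix(seq_len(15), nrow = 3,
               dimnames = list(c("BRCA1", "TP53", "EGFR"), sample_anno$SampleID))

breast_lines <- select_samples(sample_anno, "CellLine", "breast cancer")
expr_subset <- expr[, colnames(expr) %in% breast_lines, drop = FALSE]
print(expr_subset)
```

drop = FALSE keeps the result a matrix even when only one sample passes the filter.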
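```r
# Connect to database
connectDROMADatabase("droma.sqlite")
# Add new data to the database; expression_matrix stands for your own data
# object, and the table name follows the {project}_{datatype} convention
updateDROMADatabase(expression_matrix, "new_project_mRNA")
# List all tables
tables <- listDROMADatabaseTables()
# Close connection
closeDROMADatabase()
```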
```r
# 1. Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"))
# 2. Find overlapping samples
overlaps <- getOverlappingSamples(multi_set)
cat("Found", overlaps$overlap_count, "overlapping samples\n")
# 3. Load molecular data for overlapping samples only
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
                                               molecular_type = "mRNA",
                                               features = c("BRCA1", "BRCA2"),
                                               overlap_only = TRUE,
                                               data_type = "CellLine")
# 4. Load drug response data for overlapping samples only
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
                                               drugs = c("Tamoxifen", "Cisplatin"),
                                               overlap_only = TRUE,
                                               data_type = "CellLine")
# 5. Perform correlation analysis
for (project in names(mRNA_data)) {
  if (project %in% names(drug_data)) {
    # Analyze correlations between gene expression and drug response
    # Your analysis code here
  }
}
```
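The loop above leaves the analysis itself open. One possible filling, sketched in base R, computes Spearman correlations between every gene and every drug on the samples that a project's two matrices share. It assumes each element of mRNA_data and drug_data is a matrix with features in rows and sample IDs as column names; that structure, the simulated values, and the helper correlate_project() are assumptions, so check what the loaders actually return before adapting it.

```r
# Hypothetical inputs: one matrix per project, features (genes or drugs) in
# rows and samples in columns. The real DROMA return structure may differ.
set.seed(1)
samples <- paste0("S", 1:8)
mRNA_data <- list(
  ProjectA = matrix(rnorm(16), nrow = 2,
                    dimnames = list(c("BRCA1", "BRCA2"), samples))
)
drug_data <- list(
  # Columns deliberately in a different order to show alignment by name
  ProjectA = matrix(rnorm(8), nrow = 1,
                    dimnames = list("Cisplatin", samples[8:1]))
)

# Correlate every gene with every drug on the samples both matrices share
correlate_project <- function(expr, resp, method = "spearman") {
  shared <- intersect(colnames(expr), colnames(resp))
  if (length(shared) < 3) return(NULL)  # too few samples to be meaningful
  # cor() on transposed matrices returns a genes x drugs correlation matrix
  cor(t(expr[, shared, drop = FALSE]), t(resp[, shared, drop = FALSE]),
      method = method, use = "pairwise.complete.obs")
}

results <- list()
for (project in names(mRNA_data)) {
  if (project %in% names(drug_data)) {
    results[[project]] <- correlate_project(mRNA_data[[project]],
                                            drug_data[[project]])
  }
}
print(results)
```

Subsetting both matrices by the same shared vector aligns the samples even when the columns arrive in different orders. Spearman correlation is a rank-based choice that is less sensitive to outliers; method = "pearson" works the same way.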
- mRNA: Gene expression data
- cnv: Copy number variation data
- mutation_gene: Gene-level mutation data
- mutation_site: Site-specific mutation data
- fusion: Gene fusion data
- meth: DNA methylation data
- proteinrppa: Reverse-phase protein array data
- proteinms: Mass spectrometry proteomics data
- drug: Drug sensitivity/response data
The DROMA database uses a standardized table naming convention:
- {project}_{datatype}: Data tables (e.g., gCSI_mRNA, CCLE_drug)
- sample_anno: Sample metadata with ProjectID tracking
- drug_anno: Drug/treatment metadata with ProjectID tracking
- projects: Project summary information
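Because data types such as mutation_gene contain underscores themselves, splitting a table name on the first underscore only works if project names never contain one. A more robust approach is to match the name against the list of known data types. The base-R sketch below does that and, as a side effect, skips _mutation_raw tables, which the package changelog says are excluded from listings. The table list and the helper parse_droma_tables() are hypothetical; this is not the package's own parsing logic.

```r
# Supported data types, as listed in this README
data_types <- c("mRNA", "cnv", "mutation_gene", "mutation_site", "fusion",
                "meth", "proteinrppa", "proteinms", "drug")

# Hypothetical table names from a DROMA database
tables <- c("gCSI_mRNA", "gCSI_drug", "CCLE_mutation_gene", "CCLE_cnv",
            "CCLE_mutation_raw", "sample_anno", "drug_anno", "projects")

parse_droma_tables <- function(tables, data_types) {
  # Anchor on a known data-type suffix so underscores inside types such as
  # "mutation_gene" do not break the split
  pattern <- paste0("^(.+)_(", paste(data_types, collapse = "|"), ")$")
  is_data <- grepl(pattern, tables)
  data.frame(
    table    = tables[is_data],
    project  = sub(pattern, "\\1", tables[is_data]),
    datatype = sub(pattern, "\\2", tables[is_data]),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_droma_tables(tables, data_types)
print(parsed)
```

Metadata tables (sample_anno, drug_anno, projects) and raw mutation tables do not end in a known data type, so they drop out without a separate exclusion list.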
Comprehensive examples are provided in the examples/ directory:
- examples/produce_dromaset.R: Basic DromaSet usage
- examples/produce_multidromaset.R: MultiDromaSet cross-project analysis
- examples/produce_droma_database.R: Database creation and management
- Use overlap_only = TRUE when loading cross-project data to restrict results to samples shared across projects
- Specify the features parameter to load only the genes or drugs of interest
- Use return_data = TRUE when you only need the data without updating the object
- Filter by data_type and tumor_type to reduce loading time and focus on specific sample types
- For large datasets, load molecular profiles one type at a time rather than using molecular_type = "all"
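To make the effect of overlap_only = TRUE and features concrete, the base-R sketch below performs comparable subsetting on matrices already in memory: it keeps only samples present in every project and only the requested features, in the same order everywhere. It is an illustration, not the package's implementation; the matrices and the helper restrict_to_overlap() are hypothetical.

```r
# Hypothetical per-project expression matrices (features x samples)
proj_a <- matrix(1:12, nrow = 3,
                 dimnames = list(c("BRCA1", "TP53", "EGFR"),
                                 c("S1", "S2", "S3", "S4")))
proj_b <- matrix(101:109, nrow = 3,
                 dimnames = list(c("TP53", "BRCA1", "MYC"),
                                 c("S3", "S1", "S9")))
data_list <- list(ProjectA = proj_a, ProjectB = proj_b)

# Keep samples present in every project and the requested features,
# ordered consistently across projects
restrict_to_overlap <- function(mats, features) {
  shared_samples <- Reduce(intersect, lapply(mats, colnames))
  lapply(mats, function(m) {
    keep_features <- intersect(features, rownames(m))
    m[keep_features, shared_samples, drop = FALSE]
  })
}

subset_list <- restrict_to_overlap(data_list, features = c("BRCA1", "TP53"))
print(subset_list)
```

Filtering at load time, as the tips above suggest, avoids reading the discarded rows and columns in the first place; subsetting afterwards like this only helps with downstream alignment.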
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
If you use DROMA_Set in your research, please cite:
Li, S., Peng, Y., Chen, M. et al. Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny. Commun Biol 7, 1200 (2024). https://doi.org/10.1038/s42003-024-06891-2
This project is licensed under the MPL-2 License - see the LICENSE file for details.
- Email: yc47680@um.edu.mo
- Issues: GitHub Issues
- Documentation: Package Documentation
Refactored the updateDROMADatabase and updateDROMAProjects functions to improve project tracking and metadata handling; enhanced listDROMADatabaseTables to filter out backup tables and include created/updated dates; updated the documentation for new parameters in updateDROMAAnnotation, which now accepts vector inputs for age, data type, and other attributes.
Enhancements made:
- Removed automatic updates of the projects table from updateDROMADatabase
- Added _mutation_raw table exclusion across all relevant functions
- Added a dataset_type parameter to updateDROMAProjects
- Enhanced updateDROMAAnnotation with vector support and created_date logic
- Improved parameter validation and documentation

Added the updateDROMAProjects function to manage project metadata in the DROMA database; enhanced listDROMADatabaseTables with feature and sample counts; made minor adjustments to the example script.
- Initial release
- DromaSet and MultiDromaSet classes
- Database integration and management
- Cross-project analysis capabilities
- Comprehensive molecular profile support
- Sample overlap detection and analysis
- Enhanced metadata management with ProjectID tracking
- Support for loading all molecular profile types with molecular_type = "all"
- Split cross-project data loading into specialized functions: loadMultiProjectMolecularProfiles() for molecular data and loadMultiProjectTreatmentResponse() for treatment response data
- Added data_type and tumor_type filtering parameters for enhanced sample selection

DROMA_Set - Empowering multi-project drug response and omics analysis