Data used in "comboFM: leveraging multi-way interactions for systematic prediction of drug combination effects"
Factorization machines
Drug combinations
Machine learning
3. Good health
DOI:
10.5281/zenodo.3782333
Publication Date:
2020-05-02
AUTHORS (7)
ABSTRACT
This repository contains the data used in [1] for predicting the responses of drug combinations in cancer cell lines. The data come from the NCI-ALMANAC dataset generated by the US National Cancer Institute (https://dtp.cancer.gov/ncialmanac/). The subset used in the experiments consists of 50 randomly sampled FDA-approved drugs, tested in 617 combinations and at various concentration pairs against all 60 cell lines of the NCI-60 panel. This subset contains 333 180 drug combination response measurements and 222 120 monotherapy response measurements of single drugs, expressed as percentage growth of the cell lines. The implementation of the method proposed in [1] is available at https://github.com/aalto-ics-kepaco/comboFM.

Description of the files:

NCI-ALMANAC_subset_555300.csv: Drug combination response dataset used in the experiments (a subset of NCI-ALMANAC_combinations_measured_across_all_cellines.csv). Each row represents one response measurement. There are 555 300 measurements in total, comprising 333 180 drug combination response measurements and 222 120 monotherapy response measurements of single drugs. The file has six columns: 'Drug1' and 'Drug2' contain the drug names, 'Conc1' and 'Conc2' contain the concentrations of the two drugs, and 'CellLine' contains the ID of the cell line against which the drug combination was screened. The last column, 'PercentageGrowth', contains the measured response as percentage growth of the cell line.

NCI-ALMANAC_combinations_measured_across_all_cellines.csv: Drug combination response dataset from which the subset was taken. This dataset was constructed from the data available on the NCI-ALMANAC website (https://wiki.nci.nih.gov/display/NCIDTPdata/NCI-ALMANAC). From the dataset available on the website, the median across studies (experiment IDs) was taken, and only combinations with measurements across all cell lines were selected.
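As a quick sanity check on NCI-ALMANAC_subset_555300.csv, the sketch below counts the rows and distinct cell lines using only the Python standard library. It assumes the file is comma-delimited and has a header row carrying the six column names listed above; neither detail is confirmed by the description, and the function name summarize_responses is hypothetical.

```python
import csv

# The six column names given in the file description.
EXPECTED_COLUMNS = {"Drug1", "Drug2", "Conc1", "Conc2", "CellLine", "PercentageGrowth"}


def summarize_responses(path):
    """Count measurement rows and distinct cell lines in a response table.

    Assumes a comma-delimited file with a header row (not confirmed by the source).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        n_rows = 0
        cell_lines = set()
        for row in reader:
            n_rows += 1
            cell_lines.add(row["CellLine"])
    return {"n_rows": n_rows, "n_cell_lines": len(cell_lines)}
```

If the file matches the description, summarize_responses("NCI-ALMANAC_subset_555300.csv") should report 555 300 rows (333 180 + 222 120) and 60 cell lines. A different result would suggest that one of the assumptions about the file layout does not hold.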
Directory data contains the drug combination responses and the feature matrices needed to run the experiments, each row corresponding to a row in CombALMANAC_555300.csv. It contains the following files:

cell_lines__one-hot_encoding.csv: One-hot encodings for the cell lines (555 300 rows, 60 columns).
drug1__one-hot_encoding.csv: One-hot encodings for the first set of drugs (555 300 rows, 50 columns).
drug2__one-hot_encoding.csv: One-hot encodings for the second set of drugs (555 300 rows, 50 columns).
drug1_concentration__one-hot_encoding.csv: One-hot encodings for the concentrations of the first set of drugs (555 300 rows, 46 columns).
drug2_concentration__one-hot_encoding.csv: One-hot encodings for the concentrations of the second set of drugs (555 300 rows, 46 columns).
cell_lines__gene_expression.csv: Gene expression data for the cell lines, with the 0.05% of genes with the highest variance selected (555 300 rows, 78 columns).
drug1__estate_fingerprints.csv: EState fingerprints for the first set of drugs, with zero-variance bits removed (555 300 rows, 34 columns).
drug2__estate_fingerprints.csv: EState fingerprints for the second set of drugs, with zero-variance bits removed (555 300 rows, 34 columns).
drug1_drug2_concentration__values.csv: Concentration values for the first and second set of drugs, both in the same file (555 300 rows, 2 columns).
drug2_drug1_concentration__values.csv: Same as above, but with the drugs in the opposite order (555 300 rows, 2 columns).
responses.csv: The drug combination responses (555 300 rows).

Subdirectory additional_data contains the additional data files from which the features were constructed:

NCI-60__gene_expression.txt: The full gene expression dataset obtained from the cellmineR package.
drugs__SMILES.csv: SMILES strings of the drug compounds, used for computing the fingerprints.
drugs__estate_fingerprints.csv: EState fingerprints of the drug compounds.
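Because the feature files in the data directory are row-aligned, a feature matrix can be built by concatenating them column-wise. The sketch below illustrates this with the standard library only; it is not the authors' pipeline (see the comboFM repository for that). The helper names read_matrix and assemble_features are hypothetical, and the code assumes comma-delimited numeric files with a single header row and no index column, which the description does not confirm.

```python
import csv
from itertools import chain
from pathlib import Path

# Column counts of the one-hot files, as stated in the file descriptions.
ONE_HOT_FILES = {
    "cell_lines__one-hot_encoding.csv": 60,
    "drug1__one-hot_encoding.csv": 50,
    "drug2__one-hot_encoding.csv": 50,
    "drug1_concentration__one-hot_encoding.csv": 46,
    "drug2_concentration__one-hot_encoding.csv": 46,
}


def read_matrix(path, has_header=True):
    """Read a numeric CSV file into a list of float rows.

    has_header=True is an assumption about the file layout.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    if has_header:
        rows = rows[1:]
    return [[float(value) for value in row] for row in rows]


def assemble_features(data_dir, file_names, has_header=True):
    """Concatenate row-aligned feature files column-wise."""
    blocks = [read_matrix(Path(data_dir) / name, has_header) for name in file_names]
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("feature files have different row counts")
    # Row i of the result is row i of every block, joined left to right.
    return [list(chain.from_iterable(block[i] for block in blocks))
            for i in range(n_rows)]
```

Under the stated column counts, concatenating the five one-hot files gives 60 + 50 + 50 + 46 + 46 = 252 columns per row. For the full 555 300-row files, a numerical library would be a more practical choice than nested Python lists, which are used here only to keep the sketch dependency-free.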
Directory experimental_validation_data contains the results of the experimental validation described in [1].

References:

[1] Julkunen, H.; Cichonska, A.; Gautam, P.; Szedmak, S.; Douat, J.; Pahikkala, T.; Aittokallio, T. and Rousu, J. comboFM: leveraging multi-way interactions for systematic prediction of drug combination effects.