Copy number variation (CNV) occurs when the number of copies of a particular
gene varies from one individual to the next. Following the completion of the
Human Genome Project, it became apparent that the genome experiences gains and
losses of genetic material. The extent to which copy number variation
contributes to human disease is not yet known, although it has long been
recognized that some cancers are associated with elevated copy numbers of
particular genes.
The goal of the Dependency Map (DepMap) portal is to empower the research
community to make discoveries related to cancer vulnerabilities by providing
open access to key cancer dependencies and to analytical and visualization
tools.
DepMap Copy Number data
To process DepMap Copy Number data, we need to download the following
datasets from the DepMap website.
Cell Line Sample Info
Static primary key assigned by DepMap to each cell line
Cell line name with alphanumeric characters only
Previous naming system that used the stripped cell line name followed by
the lineage; no longer assigned to new cell lines
Cell line identifiers (not a comprehensive list)
Cell line ID used in the COSMIC cancer database
sex: Sex of tissue donor if known
source: Source of cell line vial used by DepMap
Number of replicates used in Achilles CRISPR screen passing QC
Difference in the means of positive and negative controls normalized by
the standard deviation of the negative control distribution
Growth pattern of cell line (Adherent, Suspension, Mixed adherent and
suspension, 3D, or Adherent (requires laminin coating))
Medium used to grow cell line
Percentage of cells remaining GFP negative on days 12-14 of the Cas9
activity assay, as measured by FACS
Research resource identifier (RRID)
Tissue collection site
Indicates whether tissue sample is from primary or metastatic site
General cancer lineage category
Subtype of disease; specific disease name
age: If known, age of tissue donor at time of sample collection
Sanger Institute Cell Model Passport ID
Cancer type classifications in a standardized form
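One of the fields above is described as the difference in the means of
positive and negative controls, normalized by the standard deviation of the
negative control distribution. A minimal sketch of that calculation follows.
The function name and the control values are hypothetical, and the use of the
sample standard deviation (rather than the population one) is an assumption,
since the description does not say which is used.

```python
from statistics import mean, stdev


def normalized_mean_difference(positive_controls, negative_controls):
    """(mean(pos) - mean(neg)) / sd(neg), using the sample standard deviation.

    The choice of sample standard deviation is an assumption.
    """
    difference = mean(positive_controls) - mean(negative_controls)
    return difference / stdev(negative_controls)


# Made-up control scores, for illustration only.
positive = [-2.0, -3.0, -4.0]
negative = [0.0, 1.0, -1.0]
print(normalized_mean_difference(positive, negative))
```

A more negative value here means the positive controls sit further below the
negative controls, measured in units of the negative control spread.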
Gene-level copy number data, log2-transformed with a pseudocount of 1, that
is, log2(x + 1) for a copy number value x. It is generated by mapping genes
onto the segment-level calls.
Rows: cell lines
Columns: genes (HGNC symbol and Entrez ID)
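The transform and column layout described above can be sketched in a few
lines. This is an illustration under stated assumptions rather than DepMap's
own code: the helper names are hypothetical, and the header format
"SYMBOL (ENTREZ_ID)" is an assumed reading of "HGNC symbol and Entrez ID".

```python
import math
import re


def log2_plus_one(value):
    """Forward transform: log2 with a pseudocount of 1."""
    return math.log2(value + 1.0)


def to_untransformed(log2_value):
    """Invert log2(x + 1) to recover the untransformed value x."""
    return 2.0 ** log2_value - 1.0


def parse_gene_column(header):
    """Split a header such as 'TP53 (7157)' into (symbol, entrez_id).

    The 'SYMBOL (ENTREZ_ID)' layout is an assumption about the file.
    """
    match = re.fullmatch(r"\s*(\S+)\s*\((\d+)\)\s*", header)
    if match is None:
        raise ValueError(f"unexpected column header: {header!r}")
    return match.group(1), int(match.group(2))


print(to_untransformed(log2_plus_one(0.5)))
print(parse_gene_column("TP53 (7157)"))
```

Keeping the symbol and the Entrez ID as separate values makes it easier to
join the gene columns against other annotation sources later.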
1750 Cell Lines
35 Primary Diseases
Not all DepMap_IDs in the "sample_info.csv" file are present in the
"CCLE_gene_cn.csv" file. Moreover, it is better to keep the features (genes
or probes) in a separate file, based on the following data model. You can
download a file by clicking on its file name.
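A minimal sketch of both points, reconciling the IDs in the two files and
splitting the gene columns into their own feature table, might look like the
following. The in-memory CSV text stands in for the real files and contains
made-up values. The empty first header cell and the "feature_id" and "column"
field names are assumptions for illustration, not part of the DepMap files.

```python
import csv
import io

# Tiny stand-ins for sample_info.csv and CCLE_gene_cn.csv (made-up values).
SAMPLE_INFO = """DepMap_ID,sex,age
ACH-000001,Female,60
ACH-000002,Male,
ACH-000003,Female,45
"""
GENE_CN = """,A1BG (1),TP53 (7157)
ACH-000001,1.02,0.48
ACH-000003,0.97,1.10
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_INFO)))
cn_rows = list(csv.reader(io.StringIO(GENE_CN)))
header, data = cn_rows[0], cn_rows[1:]

# IDs that actually have copy number data (first cell of each data row).
cn_ids = {row[0] for row in data}

# Keep only the sample rows with copy number data; report the rest.
matched = [r for r in sample_rows if r["DepMap_ID"] in cn_ids]
missing = [r["DepMap_ID"] for r in sample_rows if r["DepMap_ID"] not in cn_ids]

# Separate feature table: one record per gene column, with a surrogate key.
features = [
    {"feature_id": i, "column": name}
    for i, name in enumerate(header[1:], start=1)
]

print([r["DepMap_ID"] for r in matched])
print(missing)
print(features)
```

With the real files, the same logic would read from disk instead of from
`io.StringIO`, and the `missing` list shows which cell lines have no copy
number data.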