Users supply a query gene, and the system finds putative
functional paralogs, namely genes that are similar to the query gene based on combinatorial similarity
of attribute annotations.
Genes Like Me Algorithm
Genes Like Me calculates similarity scores between each query gene and all remaining candidate
genes in the GeneCards database for the 8 attributes that appear in Table 1.
For all attributes except Gene Ontology and sequence paralogy,
the similarity score between a query gene and a candidate gene is calculated in the following
manner: each descriptor score (DS) is the result of dividing its rank by log10
of its frequency in the database:
DS = rank / log10(frequency)

Descriptor ranks are each assigned the value of 1, except for those associated with the Gene
Ontology (GO) attribute, which are assigned a value according to the descriptor's evidence code
(Buza et al. 2008); for example, Inferred from Direct Assay (IDA) receives a value of 5.
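The descriptor score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `GO_EVIDENCE_RANK`, `descriptor_rank` and `descriptor_score` are hypothetical, the rank table holds only the single IDA value mentioned in the text, and the handling of frequencies of 1 or less (where log10 is zero or negative) is a guess, since the text does not say how such cases are treated.

```python
import math

# Hypothetical rank table. Only IDA = 5 is taken from the text above;
# the remaining evidence codes would come from Buza et al. 2008.
GO_EVIDENCE_RANK = {"IDA": 5}


def descriptor_rank(attribute, evidence_code=None):
    """Return 1 for every attribute except Gene Ontology,
    where the rank depends on the evidence code."""
    if attribute == "Gene Ontology":
        return GO_EVIDENCE_RANK[evidence_code]
    return 1


def descriptor_score(rank, frequency):
    """DS = rank / log10(frequency of the descriptor in the database)."""
    if frequency <= 1:
        # log10(1) is 0 and log10 of smaller values is negative or undefined.
        # Assumption: such descriptors are rejected rather than scored.
        raise ValueError("frequency must be greater than 1")
    return rank / math.log10(frequency)
```

For instance, a descriptor with rank 1 that occurs 100 times in the database would score 1 / 2 = 0.5, so rarer descriptors contribute more than common ones.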

The attribute score (AS) is the sum of the descriptor scores for those descriptors shared by
both the query gene and the candidate gene, divided by the sum of the descriptor scores for all
descriptors associated with the query gene.
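As a rough sketch of the attribute score, assuming the query gene's descriptor scores are already available as a dictionary and the candidate's descriptors as a set (the function name, the data layout and the example descriptor names are hypothetical, as is returning 0 when the query gene has no descriptors):

```python
def attribute_score(query_ds, candidate_descriptors):
    """AS = sum of DS over shared descriptors / sum of DS over all query descriptors.

    query_ds: dict mapping each descriptor of the query gene to its descriptor score
    candidate_descriptors: set of descriptors annotated to the candidate gene
    """
    total = sum(query_ds.values())
    if total == 0:
        # Assumption: a query gene with no scored descriptors yields a score of 0.
        return 0.0
    shared = sum(ds for name, ds in query_ds.items() if name in candidate_descriptors)
    return shared / total


# Made-up descriptors and scores, for illustration only.
query = {"SH2 domain": 1.0, "SH3 domain": 0.5, "Kinase domain": 0.5}
candidate = {"SH2 domain", "PH domain"}
print(attribute_score(query, candidate))  # 1.0 / 2.0 = 0.5
```

Because the denominator uses only the query gene's descriptors, the score is not symmetric: swapping query and candidate can give a different value.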

For the sequence paralogy attribute, a candidate gene that is also identified as a sequence paralog (SP)
of the query gene is assigned a value of 1 for this attribute, and 0 otherwise.

Gene expression data were mined from BioGPS (http://biogps.org/). For the expression patterns attribute, the similarity score is the mean Pearson correlation (P.Corr) between all expression vectors for the query gene and the candidate gene.

This makes the search for genes with similar expression patterns less stringent, since it looks for correlations between expression vectors rather than
exact matches of binary expression patterns.
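The expression similarity can be illustrated as follows. This sketch assumes that "all expression vectors" means every pairing of one query vector with one candidate vector, and that all vectors cover the same tissues in the same order; neither detail is stated in the text, and the function names are hypothetical.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def expression_similarity(query_vectors, candidate_vectors):
    """Mean Pearson correlation over all (query vector, candidate vector) pairs."""
    pairs = list(product(query_vectors, candidate_vectors))
    if not pairs:
        raise ValueError("both genes need at least one expression vector")
    return sum(pearson(q, c) for q, c in pairs) / len(pairs)
```

Two genes whose expression rises and falls across the same tissues score close to 1 even when their absolute expression levels differ, which is what makes this measure more permissive than an exact match of binary patterns.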
Each attribute score is then multiplied by the weight given for that attribute, and the weighted attribute
scores are summed to give the Genes Like Me score (PHS).
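A minimal sketch of this final step, together with the 0/1 sequence paralogy score, might look like the following. The function names, gene symbols, attribute scores and weight values are all made up for illustration; the actual weights are not given in the text.

```python
def sequence_paralogy_score(candidate, query_paralogs):
    """1 if the candidate is a known sequence paralog of the query gene, else 0."""
    return 1.0 if candidate in query_paralogs else 0.0


def genes_like_me_score(attribute_scores, weights):
    """Weighted sum of attribute scores.

    attribute_scores: dict mapping attribute name to its score for one candidate
    weights: dict mapping attribute name to its weight (hypothetical values)
    """
    return sum(weights[name] * score for name, score in attribute_scores.items())


# Made-up scores and weights for a single candidate gene.
scores = {
    "Sequence paralogy": sequence_paralogy_score("GENE_B", {"GENE_B", "GENE_C"}),
    "Domains": 0.5,
    "Expression patterns": 0.25,
}
weights = {"Sequence paralogy": 1.0, "Domains": 2.0, "Expression patterns": 4.0}
print(genes_like_me_score(scores, weights))  # 1.0 + 1.0 + 1.0 = 3.0
```

Candidates would then be ordered by this score, highest first. Setting an attribute's weight to 0 removes it from the comparison.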
Table 1. The attributes used in the Genes Like Me algorithm with their contributing data sources.
| Attribute | Data Source |
| Sequence paralogy | |
| Domains | InterPro (Ensembl); Blocks |
| Super Pathways | |
| Expression patterns | |
| Phenotypes | Mouse Genome Informatics (MGI) |
| Compounds | Tocris Bioscience; Human Metabolome Database (HMDB); BitterDB; DrugBank; Novoseek (formerly Alma Knowledge Server); PharmGKB; FDA Approved Drugs; DGIdb; ClinicalTrials; ApexBio |
| Disorders | MalaCards; Online Mendelian Inheritance in Man (OMIM); UniProtKB; University of Copenhagen DISEASES; Novoseek (formerly Alma Knowledge Server); GENATLAS; GeneTests (formerly GeneClinics); The Breast Cancer Gene Database (BCGD) |
| Gene Ontology | Entrez Gene (National Center for Biotechnology Information - NCBI); Ensembl |