About Genes Like Me (formerly GeneDecks Partner Hunter)

Genes Like Me is a novel analysis tool which provides a similarity metric by highlighting shared descriptors between genes, based on the rich annotation within the GeneCards compendium of human genes.

Users supply a query gene, and the system finds putative functional paralogs, namely genes that are similar to the query gene based on combinatorial similarity of attribute annotations.

Genes Like Me Algorithm

Genes Like Me calculates similarity scores between each query gene and all remaining candidate genes in the GeneCards database for the 8 attributes listed in Table 1. For all attributes except Gene Ontology and sequence paralogy, the similarity score between a query gene and a candidate gene is calculated in the following manner: each descriptor score (DS) is the descriptor's rank divided by Log₁₀ of its frequency in the database.

Descriptor ranks are each assigned the value of 1, except for those associated with the Gene Ontology (GO) attribute, which are assigned a value based on the descriptor's evidence code (Buza et al. 2008); for example, Inferred from Direct Assay (IDA) receives a descriptor score of 5.
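The descriptor score described above can be sketched in a few lines of Python. This is a minimal illustration, not the GeneCards implementation: the function name `descriptor_score` and the example frequencies are hypothetical, treating the value 5 quoted for IDA as a rank is one reading of the text, and the handling of descriptors with a frequency of 1 (where Log₁₀ is zero) is an assumption, since the description does not cover that case.

```python
import math


def descriptor_score(rank: float, frequency: int) -> float:
    """Return rank / log10(frequency) for one descriptor."""
    if frequency <= 1:
        # log10(1) == 0 would divide by zero. How such descriptors are
        # treated is not stated in the text, so this sketch rejects them.
        raise ValueError("frequency must be greater than 1")
    return rank / math.log10(frequency)


# Hypothetical frequencies: a descriptor found on 100 genes versus 10,000 genes.
rare = descriptor_score(rank=1, frequency=100)
common = descriptor_score(rank=1, frequency=10_000)

# A GO descriptor with evidence code IDA, read here as rank 5 (assumption).
go_ida = descriptor_score(rank=5, frequency=1_000)
```

Dividing by the logarithm of the frequency means that a descriptor shared by few genes contributes more to the similarity than a ubiquitous one, and the logarithm keeps very common descriptors from being discounted too sharply.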

The attribute score (AS) is the sum of the descriptor scores for those descriptors shared by both the query gene and the candidate gene, divided by the sum of the descriptor scores for all descriptors associated with the query gene.
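As a rough sketch of the attribute score, assume the query gene's descriptors are held in a dictionary that maps each descriptor name to its descriptor score, and the candidate gene's descriptors in a set. The function name `attribute_score`, the data layout, and the example descriptors and scores are all hypothetical.

```python
def attribute_score(query_scores: dict[str, float], candidate: set[str]) -> float:
    """Shared descriptor scores divided by all of the query's descriptor scores."""
    total = sum(query_scores.values())
    if total == 0:
        # No scored descriptors on the query gene: treat as no similarity
        # (an assumption; the text does not describe this case).
        return 0.0
    shared = sum(ds for name, ds in query_scores.items() if name in candidate)
    return shared / total


# Hypothetical descriptor scores for one attribute of the query gene.
query_scores = {"kinase domain": 0.5, "SH2 domain": 0.25, "SH3 domain": 0.25}
# Descriptors annotated on a candidate gene for the same attribute.
candidate_descriptors = {"kinase domain", "SH3 domain", "PH domain"}

score = attribute_score(query_scores, candidate_descriptors)
```

Because the denominator uses only the query gene's descriptors, the score is not symmetric: swapping the query and the candidate generally gives a different value, and descriptors found only on the candidate (here "PH domain") do not lower the score.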

For the sequence paralogy attribute, a candidate gene that is also identified as a sequence paralog (SP) of the query gene is assigned a value of 1 for this attribute, and 0 otherwise.

Gene expression data was mined from BioGPS (http://biogps.org/). The similarity score is the mean Pearson correlation (P.Corr) between all expression vectors for the query gene and the candidate gene.

This improves the detection of similar genes by expression pattern, since the method looks for vector correlations rather than exact matches between binary expression patterns and is therefore less stringent.
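The expression similarity can be illustrated as follows. This sketch assumes that "all expression vectors" means every pairing of a query-gene vector with a candidate-gene vector, and that each vector holds one value per tissue in the same order; neither detail is spelled out in the description. The function names and the toy vectors are hypothetical, and returning 0 for a constant vector is likewise an assumption.

```python
import math
from itertools import product


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A constant vector has no defined correlation; 0.0 is an assumption.
        return 0.0
    return cov / (norm_x * norm_y)


def expression_similarity(query_vectors, candidate_vectors) -> float:
    """Mean Pearson correlation over all query/candidate vector pairs."""
    pairs = list(product(query_vectors, candidate_vectors))
    return sum(pearson(q, c) for q, c in pairs) / len(pairs)


# Toy data: one query vector, two candidate vectors over four tissues.
query_vectors = [[1.0, 2.0, 3.0, 4.0]]
candidate_vectors = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]

similarity = expression_similarity(query_vectors, candidate_vectors)
```

In this toy case the first candidate vector is perfectly correlated with the query and the second is perfectly anti-correlated, so the mean is close to zero. A binary exact-match comparison would instead require the two genes to be expressed in precisely the same set of tissues.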

Each attribute score is then multiplied by the weight given for that attribute, and the weighted attribute scores are summed to give the Genes Like Me score (PHS).
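The final combination step is a weighted sum, sketched below. The weights and attribute scores shown are made-up values for illustration only; the actual weights are not given in this description, and the function name `genes_like_me_score` is hypothetical.

```python
def genes_like_me_score(attribute_scores: dict[str, float],
                        weights: dict[str, float]) -> float:
    """Sum of attribute scores, each multiplied by its attribute weight."""
    return sum(weights[name] * score for name, score in attribute_scores.items())


# Hypothetical weights per attribute.
weights = {"Domains": 2.0, "Sequence paralogy": 1.0, "Expression patterns": 0.5}

# Hypothetical attribute scores for one candidate gene:
# a descriptor-based score, a 0/1 paralogy flag, and a mean correlation.
candidate_scores = {
    "Domains": 0.75,
    "Sequence paralogy": 1.0,
    "Expression patterns": 0.5,
}

total = genes_like_me_score(candidate_scores, weights)
```

Computing this total for every candidate gene and sorting the candidates in descending order would give a ranked list of genes similar to the query.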

Table 1

The attributes used in the Genes Like Me algorithm, with their contributing data sources.

Attribute	Data Sources
Sequence paralogy	Ensembl; HomoloGene
Domains	InterPro (Ensembl); Blocks
Super Pathways	GeneCards
Expression patterns	BioGPS
Phenotypes	Mouse Genome Informatics (MGI)
Compounds	Tocris Bioscience; Human Metabolome Database (HMDB); BitterDB; DrugBank; Novoseek (formerly Alma Knowledge Server); PharmGKB; FDA Approved Drugs; DGIdb; ClinicalTrials; ApexBio
Disorders	MalaCards; On-line Mendelian Inheritance in Man (OMIM); UniProtKB; University of Copenhagen DISEASES; Novoseek (formerly Alma Knowledge Server); GENATLAS; GeneTests (formerly GeneClinics); The Breast Cancer Gene Database (BCGD)
Gene Ontology	Entrez Gene (National Center for Biotechnology Information - NCBI); Ensembl