Coiled-Coil Domain Containing 142

Coiled-coil domain containing 142 (CCDC142) is a gene that in humans encodes the CCDC142 protein. The CCDC142 gene is located on chromosome 2 (at 2p13), produces a transcript of up to 4339 base pairs, and contains 9 exons. The encoded protein, coiled-coil domain-containing protein 142 (CCDC142), has no well-established function.^[1]^[2] There are two known isoforms of CCDC142.^[1] The CCDC142 proteins produced from these transcripts are 665 to 743 amino acids long and contain signals suggesting that the protein moves between the cytosol and the nucleus.^[3] Homologous CCDC142 genes are found in many animals, both vertebrates and invertebrates, but not in fungi, plants, protists, archaea, or bacteria.^[1] The protein contains a coiled-coil domain, within which lies a RINT1_TIP1 motif.^[3]^[4]

Locus

CCDC142 is found on the minus (reverse) strand of chromosome 2 (2p13.1), with the genomic sequence spanning bases 74,472,832 to 74,483,230.^[1] The coding region is 8292 base pairs long and encodes two protein isoforms of 743 and 665 amino acids.^[1] On the telomeric side, CCDC142 is flanked by the MOGS and MRPL53 genes; on the centromeric side, it is flanked by the C31, LBX2, LBX2-AS1, and PCGF1 genes.^[1]
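
The genomic span follows directly from the coordinates above, assuming NCBI's 1-based, inclusive coordinate convention. The minimal sketch below (the helper names `span_length` and `five_prime_end` are illustrative, not part of any library) computes the span and shows which coordinate marks the 5' end of a minus-strand gene.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive interval [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def five_prime_end(start: int, end: int, strand: str) -> int:
    """Genomic coordinate of a feature's 5' end.

    On the plus strand the 5' end is the lower coordinate; on the minus strand
    transcription runs toward lower coordinates, so the 5' end is the higher one.
    """
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError("strand must be '+' or '-'")


# CCDC142 coordinates on chromosome 2 as given in the text (minus strand)
START, END = 74_472_832, 74_483_230
print(f"span: {span_length(START, END):,} bp")        # span: 10,399 bp
print(f"5' end: {five_prime_end(START, END, '-'):,}")  # 5' end: 74,483,230
```

Because CCDC142 lies on the minus strand, its 5' end sits at the higher coordinate, 74,483,230.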

mRNA

In Homo sapiens, the CCDC142 gene encodes two alternatively spliced mRNA isoforms, called isoform 1 and isoform 2.^[3] Both isoforms have 9 exons. Isoform 1 is the longer of the two at 4339 bp, while isoform 2 is 2253 bp long.^[3] The main difference between them is that isoform 2 has a shorter exon 9 and 3' UTR.^[3] Isoform 1 is the longest variant of both the gene and the protein and is the subject of this article.^[1]

Conservation

Paralogs

Orthologs

The table below lists a variety of CCDC142 orthologs whose protein sequences were compared with the Homo sapiens CCDC142 amino acid sequence. CCDC142 has more than 73% amino acid similarity among mammals but is less conserved in other vertebrates and in invertebrates.^[16]

Phylogeny

CCDC142 is conserved across mammals, amphibians, reptiles, birds, fish, and mollusks.^[5] The gene can be traced back as far as Drosophila melanogaster, whose lineage split from the human lineage 847 million years ago. CCDC142 has accumulated amino acid changes faster than both cytochrome c (a highly conserved protein) and fibrinogen alpha (a rapidly evolving protein). This suggests that CCDC142 is a rapidly evolving gene whose sequence has diverged faster over time than either reference protein.
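
Rate comparisons like the one above are usually made by plotting corrected sequence divergence against divergence time for each protein and comparing the slopes. A minimal sketch, assuming a simple Poisson correction for multiple substitutions (the source does not state which correction was used); `poisson_distance` and `substitution_rate` are illustrative names.

```python
import math


def poisson_distance(percent_identity: float) -> float:
    """Poisson-corrected amino acid substitutions per site.

    The observed fraction of differing sites p underestimates the true number of
    substitutions because a site can change more than once; K = -ln(1 - p)
    corrects for this under a simple Poisson model.
    """
    p = 1.0 - percent_identity / 100.0
    if not 0.0 <= p < 1.0:
        raise ValueError("percent_identity must be in (0, 100]")
    return -math.log(1.0 - p)


def substitution_rate(percent_identity: float, divergence_mya: float) -> float:
    """Substitutions per site per million years, counting both diverging lineages."""
    if divergence_mya <= 0:
        raise ValueError("divergence time must be positive")
    return poisson_distance(percent_identity) / (2.0 * divergence_mya)
```

Computing `substitution_rate` for CCDC142, cytochrome c, and fibrinogen alpha from the same species pair and comparing the results gives the kind of comparison described above; no identity values are filled in here because the source does not list them.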

Protein

Primary Structure, Variants, and Isoforms

The main isoform of the CCDC142 protein is 743 amino acids long and the second isoform is 665 amino acids long. The 78-amino-acid difference is due entirely to residues missing from the C-terminus of isoform 2.^[1]

Domains and Motifs

The predicted coiled-coil domain of CCDC142 spans amino acids 308–719.^[2] A RINT1_TIP1 motif is also present at amino acids 490–621. The RINT1_TIP1 family includes RINT-1 (a protein involved in radiation-induced checkpoint control) and TIP-1 (a yeast protein involved in Golgi transport).^[4] The additional ~250 amino acids found in distant CCDC142 orthologs have no counterpart in the Homo sapiens genome near the CCDC142 gene.
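
Because both features are given as 1-based, inclusive residue ranges, their lengths and nesting can be checked directly. A minimal sketch; the `Region` class is a hypothetical helper, not taken from any annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Region") -> bool:
        """True if other lies entirely within this region."""
        return self.start <= other.start and other.end <= self.end


PROTEIN_LENGTH = 743  # isoform 1
coiled_coil = Region("coiled-coil", 308, 719)
rint1_tip1 = Region("RINT1_TIP1", 490, 621)

print(coiled_coil.length)                # 412
print(rint1_tip1.length)                 # 132
print(coiled_coil.contains(rint1_tip1))  # True
print(f"coiled-coil covers {coiled_coil.length / PROTEIN_LENGTH:.1%} of the protein")
```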

Post-Translational Modifications

CCDC142 is predicted to have 6 phosphorylation sites, 4 methylation sites, 1 palmitoylation site, 1 sumoylation site, and 1 weak nuclear localization signal.^[6]^[7]^[8]^[9]^[10] These predicted features suggest that CCDC142 localizes to both the nucleus and the cytosol. Refer to the Conceptual Translation for annotations of these sites in the protein.
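
The prediction servers cited above use their own trained scoring models. As a rough illustration of what a motif-level prediction looks like, the sketch below scans a sequence for two textbook consensus patterns: the sumoylation consensus ψ-K-x-E (taking ψ as a large hydrophobic residue, an assumption) and the classical monopartite NLS K-(K/R)-x-(K/R). It is not the algorithm of SUMOplot or NLS Mapper, and `find_motifs` is an illustrative name.

```python
import re
from typing import List, Tuple

# Simplified consensus patterns (assumptions, not the servers' models).
# Lookaheads let overlapping hits be reported.
SUMO_CONSENSUS = re.compile(r"(?=([VILMF]K.E))")  # psi-K-x-E
NLS_CONSENSUS = re.compile(r"(?=(K[KR].[KR]))")   # K-(K/R)-x-(K/R)


def find_motifs(sequence: str, pattern: re.Pattern) -> List[Tuple[int, str]]:
    """Return (1-based start, matched residues) for every hit of pattern."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


toy = "MAVKLEPKKRKG"  # hypothetical sequence, not CCDC142
print(find_motifs(toy, SUMO_CONSENSUS))  # [(3, 'VKLE')]
print(find_motifs(toy, NLS_CONSENSUS))   # [(8, 'KKRK')]
```

A bare consensus match says nothing about surface accessibility or context, which is one reason dedicated predictors report scores rather than simple hits.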

Structure prediction

Expression

Promoters and Regulatory Factors

The promoter region of CCDC142 was identified using the El Dorado program at Genomatix; it spans bases 74,482,896–74,483,908 on chromosome 2.^[14] This 1013 bp region lies 1071 to 58 bp upstream of the CCDC142 start codon.^[14] Part of the promoter contains predicted binding sites for a large number of Krueppel-like transcription factors and BED zinc-finger proteins.^[14] This part of the promoter contains no single-nucleotide polymorphisms (SNPs).^[15] Many of the transcription factors predicted to bind the CCDC142 promoter are involved in tumor suppression, neurogenesis, the DNA damage response, and photoreception.^[14] The promoter also contains a mammalian C-type LTR TATA box that overlaps the transcription start site of the gene.^[14]
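
Genomatix's binding-site tools score candidate sites against weighted nucleotide matrices rather than a fixed pattern. As a simplified stand-in, the sketch below searches a promoter sequence for the plain TATA-box consensus TATA(A/T)A(A/T) on both strands, which matters here because CCDC142 lies on the minus strand. The consensus choice and the `scan_tata` helper are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Plain TATA-box consensus TATA(A/T)A(A/T); lookahead allows overlapping hits.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_tata(seq: str) -> List[Tuple[str, int, str]]:
    """Return (strand, 1-based start on the + strand, site) for each consensus hit."""
    seq = seq.upper()
    hits = [("+", m.start() + 1, m.group(1)) for m in TATA_CONSENSUS.finditer(seq)]
    rc = reverse_complement(seq)
    n = len(seq)
    for m in TATA_CONSENSUS.finditer(rc):
        site = m.group(1)
        # A hit at 0-based index i of the reverse complement covers
        # + strand positions n - i - len(site) + 1 .. n - i (1-based).
        hits.append(("-", n - m.start() - len(site) + 1, site))
    return hits


print(scan_tata("GGTATAAAAGG"))  # [('+', 3, 'TATAAAA')]
print(scan_tata("CCTTTTATACC"))  # [('-', 3, 'TATAAAA')]
```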

RNA Binding Proteins

A number of RNA-binding proteins are predicted to bind both the 3’ and 5’ untranslated regions (UTRs) of the CCDC142 mRNA. Binding sites for PABPC1 and RBMX occur at high frequency in the 3’ UTR, with 49 and 21 sites respectively.^[16]
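
Site counts like these come from matching each protein's binding motif along the UTR sequence. A minimal sketch of the counting step, assuming exact-match motifs and counting overlapping hits; the motif and sequence in the usage line are hypothetical placeholders, not RBPDB entries, and `count_sites` and `tally` are illustrative names.

```python
import re
from typing import Dict


def count_sites(utr: str, motif: str) -> int:
    """Count possibly overlapping exact occurrences of motif in utr.

    U is treated as T so RNA and DNA strings can be mixed.
    """
    utr = utr.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    if not motif:
        raise ValueError("motif must be non-empty")
    return sum(1 for _ in re.finditer(f"(?={re.escape(motif)})", utr))


def tally(utr: str, motifs: Dict[str, str]) -> Dict[str, int]:
    """Map each label to its number of motif hits in utr."""
    return {label: count_sites(utr, motif) for label, motif in motifs.items()}


# Hypothetical motif and sequence, for illustration only
print(tally("GCAAAAAAAAGC", {"poly(A)-like": "AAAAAA"}))  # {'poly(A)-like': 3}
```

Whether overlapping hits are counted separately changes the totals for repetitive motifs such as poly(A) runs, so counts from different tools are only comparable under the same convention.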

Expression

Function and Biochemistry

Composition

CCDC142 has a relatively typical amino acid distribution compared with other Homo sapiens proteins,^[5] although some variation is seen across orthologs.^[5] Leucine is abundant relative to other proteins (over 15% of the protein), and asparagine is scarce (less than 0.7% of the protein).^[5]

The coiled-coil domain and RINT1_TIP1 motif of CCDC142 contain more leucine than the rest of the protein (over 16.6% of the region), more glutamine (over 8.4% of the region), and similarly little asparagine (less than 0.7% of the region).^[5]
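
Composition figures like these are residue counts divided by the length of the protein or region. A minimal sketch; `composition` is an illustrative helper, and the toy sequence stands in for the full CCDC142 sequence (UniProt Q17RM4), which is not reproduced here.

```python
from collections import Counter
from typing import Dict, Optional


def composition(sequence: str, start: int = 1, end: Optional[int] = None) -> Dict[str, float]:
    """Percent of each residue in sequence[start..end] (1-based, inclusive)."""
    end = len(sequence) if end is None else end
    region = sequence.upper()[start - 1:end]
    if not region:
        raise ValueError("empty region")
    counts = Counter(region)
    return {aa: 100.0 * n / len(region) for aa, n in sorted(counts.items())}


toy = "MLLLQNAL"  # hypothetical sequence, not CCDC142
print(composition(toy)["L"])        # 50.0
print(composition(toy, 2, 4))       # {'L': 100.0}
```

For CCDC142 itself, calling `composition(seq, 308, 719)` on the full sequence would give the coiled-coil region figures, to compare against `composition(seq)` for the whole protein.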

Interacting Proteins

Clinical Significance

Pathology and Diseases

A copy number gain spanning the CCDC142 locus and 25 other genes was associated with developmental delay and other significant developmental or morphological abnormalities.^[22] One record of a copy number loss spanning the CCDC142 locus and 29 other genes reported short stature, abnormal face shape, delayed speech and language development, overlapping toe, intrauterine growth retardation, patent ductus arteriosus, and delayed gross motor development.^[22] However, the contribution of CCDC142 to these phenotypes cannot be isolated, because these copy number changes also affected many other genomic regions.

Mutations

There are a number of SNPs in the CCDC142 gene. Some of those in the promoter region and 5’ UTR fall within the anchor sequences of transcription factor binding sites and could affect transcription factor binding if altered.

Many SNPs in the coding sequence change CCDC142's amino acid sequence. One SNP with a relatively high population frequency (1.8%) is notable for its chemical effect: it replaces tyrosine with asparagine at residue 548, exchanging an aromatic side chain for a smaller, polar amide side chain.^[15]
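
The tyrosine-to-asparagine change can be traced to the codon level. The sketch below, assuming the standard genetic code and a single-nucleotide substitution (the source does not give the nucleotide change for this SNP), enumerates every single-base change that turns a tyrosine codon into an asparagine codon; `single_base_paths` is an illustrative name.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order, first codon position varying slowest ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_paths(from_aa: str, to_aa: str) -> List[Tuple[str, int, str, str]]:
    """(codon, 1-based position, new base, mutant codon) for every single-nucleotide
    change on the coding strand that turns from_aa into to_aa."""
    paths = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    paths.append((codon, pos + 1, base, mutant))
    return paths


print(single_base_paths("Y", "N"))  # [('TAT', 1, 'A', 'AAT'), ('TAC', 1, 'A', 'AAC')]
```

Under these assumptions the change requires a T→A substitution at the first codon position. Because CCDC142 is on the minus strand, such a change would appear as A→T in plus-strand reference coordinates.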

There are also numerous SNPs in the large 3’ UTR of the gene, many of which fall within regions predicted to form stem-loop structures in the mRNA. One SNP with a 7.7% population frequency (guanine to adenine at position 4285) lies in the 3’ UTR but outside the conserved stem-loop region.^[15]

These SNPs have been annotated in the Conceptual Translation located in the Protein section above.

Multiple Sequence Alignment

In the Multiple Sequence Alignment above (created using the CLUSTALW and TEXSHADE programs at the SDSC Biology Workbench), organisms are labeled with the first letter of their genus followed by the first two letters of their species name. The whole CCDC142 protein is highly conserved in mammals.^[5] The regions corresponding to the Homo sapiens coiled-coil domain and RINT1_TIP1 motif are highly conserved in distant homologs.^[5] Of the 15 amino acids that are identical across all organisms in this region, 12 are nonpolar.^[5] Conserved Region 1 contains mostly nonpolar amino acids.^[5] Conserved Region 2 contains mostly nonpolar and basic amino acids. Conserved Region 3 contains both polar and nonpolar amino acids.^[5] Conserved Region 5 contains mostly nonpolar and basic amino acids.^[5]
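
Counts such as "12 of the 15 identical residues are nonpolar" come from finding alignment columns that are identical in every sequence and then classifying those residues. A minimal sketch, assuming gap-containing columns are excluded and using one common nonpolar set (polarity classes differ between textbooks); the toy alignment is hypothetical, and `species_label` and `conserved_columns` are illustrative names.

```python
from typing import Dict, List, Tuple

# Residues treated as nonpolar here; an assumption, since classifications vary
# (cysteine, for example, is sometimes included).
NONPOLAR = set("GAVLIMFWP")


def species_label(genus: str, species: str) -> str:
    """First letter of the genus plus the first two letters of the species."""
    return genus[0].upper() + species[:2].lower()


def conserved_columns(alignment: Dict[str, str]) -> List[Tuple[int, str]]:
    """(1-based column, residue) for every gap-free column identical in all sequences."""
    seqs = list(alignment.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for col in range(width):
        residues = {s[col] for s in seqs}
        if len(residues) == 1 and "-" not in residues:
            hits.append((col + 1, residues.pop()))
    return hits


# Toy alignment for illustration; these are not real CCDC142 sequences.
toy = {
    species_label("Homo", "sapiens"): "LK-AL",
    species_label("Mus", "musculus"): "LKEAL",
    species_label("Drosophila", "melanogaster"): "LRQAL",
}
conserved = conserved_columns(toy)
nonpolar = sum(1 for _, aa in conserved if aa in NONPOLAR)
print(list(toy))                                   # ['Hsa', 'Mmu', 'Dme']
print(conserved)                                   # [(1, 'L'), (4, 'A'), (5, 'L')]
print(f"{nonpolar} of {len(conserved)} nonpolar")  # 3 of 3 nonpolar
```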

Additional Transcription Factor Information

References

1. "CCDC142 coiled-coil domain containing 142 [Homo sapiens (human)] – Gene – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/gene/84865
2. "coiled-coil domain-containing protein 142 [Homo sapiens] – Protein – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/protein/NP_116168.3
3. "CCDC142 – Coiled-coil domain-containing protein 142 – Homo sapiens (Human) – CCDC142 gene & protein". www.uniprot.org. Retrieved 2016-05-01. https://www.uniprot.org/uniprot/Q17RM4
4. "SSDB Motif Search Result: hsa:84865". www.kegg.jp. Retrieved 2016-05-01. http://www.kegg.jp/ssdb-bin/ssdb_motif?kid=hsa:84865
5. "SDSC Biology Workbench". http://workbench.sdsc.edu/
6. "NetPhos 2.0 Server". www.cbs.dtu.dk. Retrieved 2016-05-01. http://www.cbs.dtu.dk/services/NetPhos/
7. "Memo:Protein Methylation Prediction". www.bioinfo.tsinghua.edu.cn. Retrieved 2016-05-01. http://www.bioinfo.tsinghua.edu.cn/~tigerchen/memo.html
8. "NBA-Palm – Prediction of Palmitoylation Site Implemented In Naive Bayesian Algorithm". www.bioinfo.tsinghua.edu.cn. Retrieved 2016-05-01. http://www.bioinfo.tsinghua.edu.cn/~tigerchen/NBA-Palm/prediction.php
9. "SUMOplot™ Analysis Program | Abgent". www.abgent.com. Retrieved 2016-05-01. http://www.abgent.com/sumoplot
10. "NLS_Mapper". nls-mapper.iab.keio.ac.jp. Retrieved 2016-05-01. http://nls-mapper.iab.keio.ac.jp/
11. Kelley, Lawrence. "PHYRE2 Protein Fold Recognition Server". www.sbg.bio.ic.ac.uk. Retrieved 2016-05-01. http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index
12. Remmert, Michael. "Quick2D". toolkit.tuebingen.mpg.de. Retrieved 2016-05-01. http://toolkit.tuebingen.mpg.de/quick2_d
13. "I-TASSER server for protein structure and function prediction". zhanglab.ccmb.med.umich.edu. Retrieved 2016-05-01. http://zhanglab.ccmb.med.umich.edu/I-TASSER/
14. "Genomatix – NGS Data Analysis & Personalized Medicine". www.genomatix.de. Retrieved 2016-05-01. https://www.genomatix.de/
15. snpdev. "SNP linked to Gene (geneID:84865) Via Contig Annotation". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?showRare=on&chooseRs=all&locusId=84865&mrna=NM_032779.3&ctg=NT_022184.16&prot=NP_116168.3&orien=reverse&refresh=refresh
16. "RBPDB: The database of RNA-binding specificities". rbpdb.ccbr.utoronto.ca. Retrieved 2016-05-01. http://rbpdb.ccbr.utoronto.ca/
17. "Microarray Data :: Allen Brain Atlas: Human Brain". human.brain-map.org. Retrieved 2016-05-01. http://human.brain-map.org/microarray/search/show?exact_match=true&search_term=CCDC142&search_type=gene&donors=14380,15496,10021,9861,12876,15697
18. "EST Profile – Hs.430199". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/UniGene/ESTProfileViewer.cgi?uglist=Hs.430199
19. geo. "Home – GEO – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/geo/
20. "GDS3596 / 1451178_at". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS3596:1451178_at
21. "GDS4795 / ILMN_3023885". www.ncbi.nlm.nih.gov. Retrieved 2016-05-01. https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS4795:ILMN_3023885
22. ClinVar. "No items found – ClinVar – NCBI". www.ncbi.nlm.nih.gov. Retrieved 2016-05-05. https://www.ncbi.nlm.nih.gov/clinvar/?term=ccdc142%255Bgene%255D
