22/06/2021
The first results of the international 'Non-canonical ORF Working Group', an effort to annotate novel translated open reading frames (ORFs) in the human genome, have been made publicly available. The initiative involves researchers from 37 institutions, including Mar Albà, ICREA researcher at the Research Programme on Biomedical Informatics (GRIB, IMIM-UPF), and her former PhD student Jorge Ruiz-Orera. Ruiz-Orera, now at the Max Delbrück Center, co-leads the project together with researchers at the European Bioinformatics Institute and the Broad Institute. The effort also has the support of GENCODE/Ensembl, HGNC and UniProt.
Evidence has accumulated in recent years that many ORFs that are not annotated as coding can nevertheless be translated into proteins. This means that the number of proteins encoded in the genome is larger than previously thought. These advances have been made possible by a technique known as ribosome profiling, or Ribo-seq. Analysis of Ribo-seq data can capture the movement of the ribosome in steps of three nucleotides, which correspond to codons, and thus detect bona fide translated ORFs. Until now, these Ribo-seq ORFs were scattered across the literature and difficult to access. The new initiative will allow easy access to the data through public databases.
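As a rough illustration of the three-nucleotide signal described above, the sketch below counts how many ribosome footprints fall in each reading frame of a candidate ORF. It is a toy example, not the working group's pipeline: the function names, the input format (P-site positions given as offsets from the ORF start) and the example data are all assumptions made for illustration.

```python
from collections import Counter


def frame_counts(psite_offsets):
    """Count footprints in each reading frame (0, 1, 2).

    psite_offsets: hypothetical input, the P-site position of each
    footprint in nucleotides relative to the first base of the ORF.
    """
    counts = Counter(offset % 3 for offset in psite_offsets if offset >= 0)
    return [counts.get(frame, 0) for frame in range(3)]


def in_frame_fraction(psite_offsets):
    """Fraction of footprints whose P-site falls in frame 0 of the ORF."""
    counts = frame_counts(psite_offsets)
    total = sum(counts)
    return counts[0] / total if total else 0.0


# Made-up offsets: most footprints sit on codon boundaries (frame 0).
offsets = [0, 3, 6, 6, 9, 12, 13, 15, 18, 20]
print(frame_counts(offsets))
print(in_frame_fraction(offsets))
```

A translated ORF would be expected to show a clear excess of footprints in one frame, whereas reads spread evenly across the three frames would not support translation. Real analyses additionally have to infer P-site positions from read lengths and apply statistical tests, which this sketch leaves out.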
The researchers have collected data from previously published studies and built a consensus list that contains 7,264 new translated ORFs, including many ORFs in long non-coding RNAs as well as upstream ORFs (uORFs) in coding transcripts. The list will be accessible from the GENCODE web page and, in subsequent phases of the project, will become part of standard gene annotations. The project paves the way for the large-scale annotation and characterization of small proteins that have remained hidden in the genome. A more complete annotation of the human proteome will be key to identifying mutations that inactivate small proteins and cause disease.
Preprint: Mudge, Ruiz-Orera, Prensner et al. A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq. bioRxiv June 10, 2021.