Live Blogger: Madison Fitzgerald
Editors: Lirong Shi and Ryan Schildcrout

This piece was written live during the 8th annual RNA Symposium, “Unmasking the Power of RNA: From Structure to Medicine,” hosted by the University of Michigan’s Center for RNA Biomedicine. Follow MiSciWriters’ coverage of this event on Twitter with the hashtag #umichrna.

For millennia, humans have been collecting and compiling information. Literally – the first encyclopedia was published in the 1st century by Pliny the Elder, a Roman statesman. Our funny-named friend compiled 37 books’ worth of information on topics such as astronomy, botany, geology, pharmacology, zoology, and human physiology. In the modern era, scientists look to other encyclopedic sources to find information. We go to PubMed to read journal articles, to GenBank or BV-BRC to view sequenced genomes, and to KEGG PATHWAY to browse metabolic pathways found in our favorite species. We use these databases, which contain information from disparate sources, to inform our research and facilitate scientific discovery. University of Connecticut Professor Dr. Brenton R. Graveley and his team take a different approach. Rather than compiling data from experiments performed using a variety of materials, methods, and data analysis pipelines, his consortium is generating data with standardized protocols. The end goal? A comprehensive Encyclopedia of RNA Elements (ENCORE). In his talk at the Center for RNA Biomedicine’s 2024 RNA Symposium, Dr. Graveley presented the progress towards this goal. But first, what are functional RNA elements?

When most people think of RNA, they remember lectures on messenger RNA (or mRNA) from a high school biology class. mRNA is the crucial intermediary that carries information encoded in our DNA genomes to the cellular machinery that builds proteins, which perform specific tasks in our cells. However, this type of RNA represents only 1-5% of the RNA in a single cell. The other types of RNA fulfill many roles in the cell. To name a few, transfer RNAs and ribosomal RNAs collaborate to translate mRNA into protein, while microRNAs, long non-coding RNAs, and enhancer RNAs all regulate gene expression. (With all of these types of RNAs, you can see how an encyclopedia would be useful.) Identifying regulatory RNAs can be quite challenging. Luckily, RNAs often act in concert with RNA-binding proteins (RBPs) to carry out their regulatory function. This enables ENCORE to leverage our extensive toolbox of protein-focused techniques to characterize RNA elements in the human genome.

Dr. Graveley began his talk by laying out the major hurdles to large-scale research endeavors like ENCORE. First, the validation of reagents like antibodies is key. Antibodies are proteins that recognize specific molecular structures, enabling our body to “see” a pathogen in order to fight off infection. Since antibodies bind in such a specific manner, they are used in many methods to study proteins. Many antibodies are commercially available, but they can vary widely in their effectiveness. After they “literally scoured the earth for antibodies,” Dr. Graveley and his team tested 2,146 antibodies, but only 1,103 (roughly half) met their quality standards. Another complication of large research studies is ensuring that all the experiments are done in the same way so that datasets are directly comparable to each other. To overcome this barrier, the ENCORE consortium ran experiments using the same types of cells, protocols, and data analysis pipelines. Dr. Graveley admitted that the two cell lines used for ENCORE “aren’t [his] favorite, but [he] didn’t pick them.” Rather, these cell lines were selected because they were extensively characterized by an earlier consortium that identified functional DNA elements. After the ENCORE data were collected, uniform quality standards were applied to all datasets. If a particular experiment did not meet these standards, it was not made publicly available. Finally, the ENCORE consortium did their best to account for batch effects, or non-biological variables that can affect the data that are collected. After carefully planning experiments to account for these hurdles, they began the data collection for ENCORE.

Since ENCORE is more of a foundational resource than a project studying a single molecular mechanism, Dr. Graveley focused on two of the types of data that were collected for ENCORE. To identify RNA sequences that are bound by specific RBPs, they performed eCLIP-seq. This technique leverages those validated antibodies to isolate a protein of interest. Then, researchers can use sequencing to identify RNAs that were bound by that protein. At the time this research was performed, 23% of RBPs had no known function. Determining which RNAs are bound by each RBP is an important step in understanding the role of each RBP in the cell. Next, Dr. Graveley described ENCORE’s use of RNA Bind-N-Seq, a technique that identifies the binding sequence preferred by a particular RBP. In RNA Bind-N-Seq, an RBP is mixed with a pool of RNA fragments with random sequences. Next, the RBP is isolated from the mixture and separated from the bound RNA fragments. Researchers can then determine which RNA fragments were bound to an RBP by sequencing them. Interestingly, Dr. Graveley and colleagues found that the RNA Bind-N-Seq data closely aligned with binding sites they had identified using eCLIP-seq. Because RNA Bind-N-Seq is performed outside the cell with only the RBP and RNA present, this agreement suggests that other proteins aren’t required to facilitate binding. In addition to eCLIP-seq and Bind-N-Seq, ENCORE also sequenced RNA from cells where each RBP was depleted, used ChIP-seq to identify DNA sequences bound by each RBP, and used fluorescence microscopy to locate each RBP within the cell.
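For readers who like to see the logic spelled out, here is a minimal sketch of one way sequencing reads from a Bind-N-Seq-style experiment could be turned into a preferred binding sequence: count short sequence "words" (k-mers) in the RBP-bound reads and in the starting random pool, then ask which words are over-represented in the bound reads. This is an illustration only, not ENCORE's actual pipeline, which was not described in the talk. The reads, the function names, the word length `k`, and the pseudocount are all hypothetical choices made for this example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of RNA reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_enrichment(bound_reads, input_reads, k, pseudocount=1.0):
    """Ratio of each k-mer's frequency in bound reads vs. the input pool.

    A pseudocount (an assumption of this sketch) keeps k-mers that are
    missing from one library from causing a division by zero.
    """
    bound = kmer_counts(bound_reads, k)
    background = kmer_counts(input_reads, k)
    kmers = set(bound) | set(background)
    bound_total = sum(bound.values()) + pseudocount * len(kmers)
    background_total = sum(background.values()) + pseudocount * len(kmers)
    return {
        kmer: ((bound[kmer] + pseudocount) / bound_total)
        / ((background[kmer] + pseudocount) / background_total)
        for kmer in kmers
    }


# Made-up toy reads, not real data: the "bound" reads share a common 5-mer.
input_reads = ["ACGUACGU", "GGCAUCAG", "UUAGCCGA", "CAGUGCAU"]
bound_reads = ["UGCAUGUA", "AGCAUGCC", "GCAUGAAC", "CUGCAUGG"]

enrichment = kmer_enrichment(bound_reads, input_reads, k=5)
top_kmer = max(enrichment, key=enrichment.get)
print(top_kmer, round(enrichment[top_kmer], 2))
```

In this toy example, the word shared by every bound read rises to the top of the ranking. A real analysis would involve millions of reads and additional statistical checks.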

Although each ENCORE dataset is interesting on its own, the true power of this encyclopedia is the ability to layer different types of data together to determine how each RNA-RBP pair alters gene expression. Paired with genetic markers of disease, ENCORE can offer insights into human health. Dr. Graveley offered an example where a patient has a mutation in a site bound by an RBP. Integrating RNA-seq, eCLIP-seq, and RNA Bind-N-Seq data from ENCORE revealed that this particular mutation disrupts the RNA-RBP binding interaction. Lack of RBP binding alters how this gene is expressed, impacting the patient’s health.

After four years of funding, ENCORE has been a massive success. Dr. Graveley and his colleagues have performed at least one of the assays described in this article for 562 RBPs. Most of these datasets are already publicly available through either the ENCODE Project (a completed consortium effort) or his lab’s webpage. All of the fluorescence microscopy data are available in an RBP Image Database. Measured against the goal of completing a map of all RNA-RBP interactions, Dr. Graveley admitted that they are only about halfway done. New RBPs are identified each year, which makes that goal a challenging one. Regardless, it is safe to say that ENCORE has earned its status as an encyclopedia and has become an important resource that will fuel research for decades to come.


Dr. Brenton R. Graveley earned his B.A. in Molecular, Cellular, and Developmental Biology at the University of Colorado and his Ph.D. in Microbiology and Molecular Genetics at the University of Vermont, and completed his postdoctoral studies at Harvard University. He is currently the Chair of the Department of Genetics and Genome Sciences at the University of Connecticut, Associate Director of the Institute for Systems Genomics, and the Health Net, Inc. Chair in Genetics and Developmental Biology. His research focuses on the role of RNA-binding proteins and RNA biology in human biology and disease. Information about his work can be found on his UConn faculty page.
