Ljiljana Pasa Tollic
Pacific Northwest National Laboratory, Richland, WA, USA

Combinatorial post-translational modifications (PTMs), signal peptide cleavage, proteolytic processing and site mutations are all biologically important, yet they largely go undetected in traditional bottom-up proteomic analyses. While many individual PTMs can be identified using bottom-up methods, information such as the stoichiometry of modifications on a single protein, or the co-occurrence of multiple modifications on a single proteoform,1 is practically impossible to infer from peptide-level data. Moreover, because most proteins in a typical global proteomic study are not identified with 100% sequence coverage, it is generally unknown whether the missing coverage reflects biologically relevant proteolytic processing events, sample preparation, or the limits of the MS duty cycle. The potential information gleaned from top-down (i.e. intact protein) studies, or from the integration of top-down and bottom-up approaches,2 is vast, and top-down analysis is rapidly becoming an important avenue for proteomic studies.
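The point about combinations can be made concrete with a toy calculation. The sketch below uses two hypothetical proteoform mixtures of a protein with two modifiable sites, and assumes each site lies on a different proteolytic peptide, so that bottom-up data report only per-site occupancy. The mixtures and the helper name `site_occupancy` are illustrative assumptions, not data from the studies described here.

```python
# Hypothetical proteoform mixtures for a protein with two modifiable sites.
# Each key is a tuple of per-site states (1 = modified, 0 = unmodified);
# each value is the relative abundance of that proteoform.
mixture_1 = {(0, 0): 0.5, (1, 1): 0.5}  # half unmodified, half doubly modified
mixture_2 = {(1, 0): 0.5, (0, 1): 0.5}  # half modified only at site 1, half only at site 2


def site_occupancy(mixture, n_sites):
    """Fraction of molecules modified at each site.

    This is all that peptide-level data can report, assuming each site
    falls on a separate proteolytic peptide (an illustrative assumption).
    """
    return [
        sum(abundance for states, abundance in mixture.items() if states[i] == 1)
        for i in range(n_sites)
    ]


occ_1 = site_occupancy(mixture_1, 2)
occ_2 = site_occupancy(mixture_2, 2)
print(occ_1, occ_2)           # identical per-site occupancies
print(mixture_1 == mixture_2)  # but the proteoform populations differ
```

Both mixtures give 50% occupancy at each site, so the peptide-level view cannot distinguish them, whereas intact-protein measurements would observe distinct proteoform masses.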

Recent advances in MS instrumentation, separation, and bioinformatics have significantly increased the throughput of top-down proteomics, allowing the identification of hundreds of intact proteins and their isoforms.3,4 However, most of these efforts involve additional sample pre-fractionation steps (e.g. GELFrEE), which are often labor intensive, require large sample amounts, and are poorly suited to quantitation. To tackle these challenges, we have optimized commercially available LC-MS platforms for high-throughput, comprehensive and sensitive quantitative top-down analysis. This approach has been successfully applied to several microbial systems, e.g. Salmonella.5

However, complications at both the experimental and data-analysis levels remain, particularly for proteins carrying a large number of co-occurring PTMs and other modifications, demonstrating a need for further improvement in the top-down proteomics field. Histones, for instance, carry a large number of combinatorial modifications and therefore present a significant analytical challenge. While many modifications are localized to the N-terminus, others span the entire sequence, emphasizing the need for top-down studies to accurately capture the number of relevant proteoforms. To this end, we have developed a histone-specific two-dimensional (2D) LC-MS/MS platform that enabled the identification of over 700 histone isoforms from 7.5 µg of starting material in a single 24-hour analysis.6 The increased dynamic range and sensitivity offered by the 2D LC-MS/MS platform enable detection of low-abundance proteoforms involved in chromatin regulation and allow top-down analysis of samples of limited quantity.

Bottom-up quantitative studies can provide information on several thousand proteins; however, depending on which biological processes are active, the data can be difficult to interpret when inferring protein abundance from peptide abundance. Initial applications of a novel top-down accurate mass and time (AMT) tag approach for quantitative analysis of intact proteins included characterization of the native forms of human salivary proteins potentially relevant to salivary diagnostics,7 and insights into the underexplored mechanism of epigenetic control of gene expression for generating valuable bioactive compounds in fungi. The examples featured here highlight the complexity of interpreting peptide abundance values in the context of protein abundance, and suggest that top-down studies may be required for comprehensive analysis of biological processes.
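To illustrate why peptide-to-protein roll-up can be ambiguous, the sketch below assumes a hypothetical protein with four quantified peptides, one of which covers a region removed by proteolytic processing in one condition. The log2 fold-change values and peptide names are invented for illustration, and mean and median are used only as two common, simple roll-up choices; they are not the quantitation scheme used in the studies above.

```python
from statistics import mean, median

# Hypothetical log2 fold changes (condition B vs. condition A) for four
# peptides of one protein. Three map to the mature chain and are unchanged;
# one maps to a region cleaved off in condition B, so it appears strongly
# decreased even though the mature protein's abundance is unchanged.
peptide_log2fc = {
    "mature_pep1": 0.0,
    "mature_pep2": 0.0,
    "mature_pep3": 0.0,
    "processed_region_pep": -3.0,
}

values = list(peptide_log2fc.values())

# Two common roll-up rules give different protein-level answers.
protein_fc_mean = mean(values)      # pulled down by the processed-region peptide
protein_fc_median = median(values)  # ignores it entirely

print("mean roll-up:  ", protein_fc_mean)
print("median roll-up:", protein_fc_median)
```

Neither number reports that the protein was processed rather than down-regulated; that distinction requires observing the intact proteoforms.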