

A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX ( ) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.

Proteomes constitute the backbone of cellular function by carrying out the tasks encoded in the genes expressed by a given cell type. Recent decades have seen rapid growth in high-throughput procedures capable of identifying the proteomic profile of a cell in any state 1, 2. It remains challenging, however, to efficiently classify the operational role of the individual protein entities identified in such procedures.

Functional properties of a protein domain, such as enzymatic activity 3 or the ability to interact with other proteins 4, can often be derived from the approximate spatial arrangement of its amino acid chain in the folded state. Knowledge of the structure of a newly discovered protein is thus highly valuable in determining the role it plays in biological processes, and it can serve as an important stepping stone in generating hypotheses or suggesting experiments to further explore the protein’s nature. Although the Protein Data Bank (PDB) 5 provides experimentally solved structural data for an increasing number of protein domains, solving protein structures remains costly, time-consuming and, in certain instances, technically difficult. Consequently, the vast majority of protein sequences available in public databases still lack a solved structure.
