Sample preparation for complete LC-MS/MS sequencing of a pure protein sample differs substantially from the preparation used for protein identification in complex samples. To sequence a protein de novo, the experimental design should aim to identify each amino acid (a.a.) multiple times within different peptides, rather than relying on only a few peptides per protein, as is done for complex protein mixtures. Efforts to increase protein sequence coverage typically rely on digesting the protein with multiple proteases. In this study, we conducted a large-scale statistical analysis of protein sequencing data from samples digested with multiple proteases to understand how different protease combinations affect the depth of sequence coverage in de novo protein sequencing. The data presented here can help guide the choice of proteases for maximum coverage during protein sequencing.


MS data for 166 monoclonal antibodies were compiled for use in this study. Each antibody protein sample was digested separately with different proteases and analyzed by LC-MS/MS. The new Pro/Ala (P/A) protease (not yet commercially available) was tested to characterize the NIST and mAb04Hc antibody standards. De novo peptide sequencing was performed with the Novor search algorithm. Protein sequences were assembled using REmAb™. Yeast complex mixtures were analyzed using SearchGUI and PeptideShaker.


Combining existing proteases with orthogonal activities significantly increases confidence scores in de novo protein sequencing; however, there is a need for new proteases that target specific amino acids or a.a. sequences to increase antibody sequencing accuracy.


The Pro/Ala Protease can be used to further validate protein sequencing

The numbers in brackets refer to the LC-MS/MS run order. The red vertical dotted lines indicate cleavage sites. Four of these proteases are C-terminal amino acid-specific (Trypsin, Pepsin, Chymotrypsin and Pro/Ala (P/A)), and one is N-terminal-specific (Asp-N). Our data highlight that Pepsin preferentially cleaves at the C₁ and N₁ sites, while Chymotrypsin mainly targets the C₁ site. Trypsin, Pepsin, Chymotrypsin and Asp-N show proline-related inhibition in the vicinity of the cleavage site; in contrast, the Pro/Ala protease has a preference for proline at the C₁ site. These findings demonstrate that the P/A protease can be used to efficiently complement protein sequencing data.

Figure 1. Seq2Logo sequence motif logo displaying analysis of C-ter (A) and N-ter (B) fragments of yeast protein extract digested with five different proteases.
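To illustrate how proteases with different cleavage rules cut the same sequence into different peptides, here is a minimal in silico digestion sketch. The rules are deliberately simplified assumptions and not the specificities measured in this study: Trypsin is taken to cut C-terminal to K or R, Asp-N N-terminal to D, and Pro/Ala C-terminal to P or A. Digestion is assumed complete, with no missed cleavages and no proline-related inhibition. The toy sequence and the function names are hypothetical.

```python
# Simplified, assumed cleavage rules: (target residues, side of the residue that is cut).
# These are illustrative only and ignore missed cleavages and proline inhibition.
RULES = {
    "Trypsin": ("KR", "C"),
    "Asp-N": ("D", "N"),
    "Pro/Ala": ("PA", "C"),
}


def cleavage_points(seq, residues, side):
    """Return positions i where the chain is cut between seq[i-1] and seq[i]."""
    points = []
    for i, aa in enumerate(seq):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            # Cuts at the very ends of the sequence produce no new peptide.
            if 0 < cut < len(seq):
                points.append(cut)
    return points


def digest(seq, residues, side):
    """Split seq into peptides at every cleavage point (complete digestion)."""
    cuts = [0] + cleavage_points(seq, residues, side) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


toy = "MKTDPALR"  # hypothetical toy sequence
for name, (residues, side) in RULES.items():
    print(name, digest(toy, residues, side))
```

Under these assumptions the three proteases produce peptides with different boundaries from the same sequence, which is what makes their peptides overlap.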

Protease combinations are important to achieve maximum coverage

De novo sequencing was performed using Novor, with b, c, y and z ions highlighted (from HCD and ETD spectra). Despite the non-tryptic nature of the peptides, good coverage of both the N- and C-terminal ends is observed. Coverage of the conserved region of the NIST antibody was estimated at 89% using Trypsin, 70% using Chymotrypsin, 91% using the Pro/Ala protease, 73% using Asp-N and 70% using Pepsin. We found that these values vary from one antibody to another.

Figure 2. MS/MS and annotation spectra resulting from Asp-N protease digestion (A) and Pro/Ala Protease (B) of the NIST antibody.

Use of diverse proteases facilitates maximum coverage of CDR3

Different proteases (Trypsin, Chymotrypsin, Pro/Ala, Asp-N and Pepsin), each depicted in a different colour, were used to cleave the antibodies and generate fragments of different sizes. The black arrows highlight the different cleavage sites found within a given sequence. Nearly complete coverage of both HCDR3s (bolded and underlined in red) is obtained after cleavage by each protease. The advantage of using different proteases is the resulting sequence overlap, which can be used to facilitate de novo protein assembly.

Figure 3. Graphical representation showcasing the overall sequence coverage spanning the HCDR3 peptide sequence for two different standard monoclonal antibodies, mAb04Hc (A) and NIST (B).
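The role of sequence overlap in assembly can be sketched with a simple suffix-prefix merge. This is only a toy illustration of the idea, not the algorithm used by REmAb™; the peptides, the function names and the minimum overlap of three residues are all assumptions.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` residues exists.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge(left, right, min_len=3):
    """Join two peptides through their overlap, or return None if they do not overlap."""
    k = longest_overlap(left, right, min_len)
    return left + right[k:] if k else None


# Hypothetical peptides from two different digests of the same toy sequence.
print(merge("MKTDP", "TDPALR"))  # peptides sharing the residues "TDP"
print(merge("MK", "TDPALR"))     # adjacent peptides from one digest share nothing
```

In this toy case, two adjacent peptides from a single digest cannot be ordered from sequence alone, whereas a peptide from a second protease that spans their junction provides the overlap needed to join them.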

The use of many different proteases translates to wider coverage

The median depths of coverage for the HCDR3 region and for the remaining portion of the variable regions are listed in the following table for each combination of the proteases used in this study. The first column lists the protease combination, where each protease is represented by a single-letter code: P (Pepsin), T (Trypsin), C (Chymotrypsin), A (Asp-N), and L (Lys-C). The second and third columns list the median depths of coverage of the HCDR3 and of the other variable portions, respectively. The “depth of coverage” for an amino acid is defined as the number of unique PSMs covering that amino acid. Repeated MS/MS scans of the same precursor were counted as a single PSM. We compared the median depth of coverage achieved by different combinations of proteases by examining all amino acids from 166 antibodies. As expected, using a greater number of different proteases increases coverage. In all cases, the HCDR3 has a lower depth of coverage than the rest of the variable region. Surprisingly, when a limited number of proteases is employed, Pepsin seems to contribute significantly to improved amino acid coverage. We propose that this is most likely associated with the generation of a greater number of peptides with miscleaved sites, and therefore a wider pool of different peptides.

Table 1. CDR coverage analysis of 166 antibodies.
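The depth-of-coverage definition above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: each PSM is represented as a hypothetical `(peptide, precursor_id)` pair, repeated scans of one precursor share the same `precursor_id`, and each peptide is placed at its first exact match in the protein sequence. The toy sequence, the identifiers and the region boundaries are invented for the example.

```python
from statistics import median


def depth_of_coverage(protein, psms):
    """Per-residue count of unique PSMs covering each amino acid.

    `psms` is an iterable of (peptide, precursor_id) pairs. Repeated scans of
    the same precursor collapse into one entry because a set is used.
    """
    unique_psms = set(psms)
    depth = [0] * len(protein)
    for peptide, _precursor_id in unique_psms:
        start = protein.find(peptide)  # assumption: first exact match only
        if start == -1:
            continue  # peptide does not map to this sequence
        for i in range(start, start + len(peptide)):
            depth[i] += 1
    return depth


protein = "ACDEFGHIKL"  # hypothetical toy sequence
psms = [
    ("ACDEF", "run1-precursor1"),
    ("ACDEF", "run1-precursor1"),  # repeated scan of the same precursor
    ("DEFGH", "run2-precursor7"),
    ("EFGHIKL", "run3-precursor3"),
]

depth = depth_of_coverage(protein, psms)
region = slice(3, 6)  # hypothetical region of interest (0-based, end-exclusive)
print(depth)
print(median(depth[region]), median(depth))
```

Reporting the median depth separately for a region of interest and for the whole sequence mirrors the layout of the table, where the HCDR3 and the remaining variable portions are summarized in separate columns.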

Materials and Methods

Proteases and Target Proteins

Trypsin, Lys-C, Chymotrypsin, Pepsin, Pro/Ala, and recombinant Asp-N proteases were used to digest NISTmAb humanized IgG1 monoclonal antibody RM8671, monoclonal mAb04 and yeast extracts.


Protein lysates were analyzed with an Orbitrap Fusion Series instrument coupled to an EvoSep One LC system, in both HCD and ETD modes. ETD spectra were acquired with three different collision energies.

Data Analysis

De novo peptide sequencing was performed with the Novor search algorithm, and the protein sequences were assembled with REmAb™. Yeast complex samples were analyzed with SearchGUI 3.3.13 and PeptideShaker 1.16.37 using no enzyme restriction, a 25 ppm mass tolerance, and a 1% FDR at the protein, peptide and PSM levels, while considering a combination of X! Tandem, MS-GF+ and Comet. Sequence motif analysis was performed using Seq2Logo (http://www.cbs.dtu.dk/biotools/Seq2Logo/).


Successful de novo antibody sequencing depends on full coverage of the protein of interest, which is best achieved through repeated identification of amino acids in different peptides with overlapping sequences; proteases with different cleavage site rules can be combined to make de novo antibody sequencing a success.

Less specific proteases such as chymotrypsin and pepsin generate more overlapping peptides than more specific proteases such as trypsin and Lys-C.

However, we observed that the presence of proline can result in inefficient cutting by trypsin, chymotrypsin, Asp-N, and pepsin. The Pro/Ala protease is capable of cutting peptides at C₁ proline sites. Our findings show that the Pro/Ala protease can be used as a complementary tool for de novo protein sequencing. In the case of antibody sequencing in particular, additional proteases will be important for targeting conserved amino acids or specific motifs to facilitate their sequencing. We found that this was especially important for the CDR3 region, which is often a difficult-to-sequence area of the antibody.


We would like to thank Maria Gerpe for proofreading this poster.

Reproduced from Le Bihan, T., Taylor, P., McDonald, Z., Liu, Q., Shen, J., Gorospe, K., Xu, X., Hosfield, C., Ma, B. (2019). Increased De Novo Protein Sequencing Coverage with Optimal Protease Cocktail. ASMS 2019 Atlanta, TP 020, with permission.