In-Depth Characterization of Monoclonal Antibodies with a Single Experiment and Fully Automated Data Analysis
Paul Taylor 1, Jonathan Krieger 1, Qixin Liu 2, Mingjie Xie 2, Lian Yang 2, 3, Bin Ma 2, 3.
1 The Hospital for Sick Children, Toronto, ON, Canada
2 Rapid Novor Inc., Waterloo, ON, Canada
3 University of Waterloo, Waterloo, ON, Canada
Monoclonal antibodies are among the most important protein pharmaceuticals. A critical step in antibody drug development is the in-depth characterization of the protein molecule, including the primary sequence, mutations, glycosylation, and other important modifications. Multiple experiments are usually required to obtain all of this information, and human intervention is the norm when analyzing data from the different sources. As a result, the in-depth characterization of an antibody protein is currently a long and error-prone process. In this work, a fully automated data analysis workflow based solely on LC-MS/MS is developed to characterize an antibody in depth.
The monoclonal antibody protein is reduced, alkylated, and digested with six enzymes: Trypsin, Chymotrypsin, AspN, GluC, Proteinase K, and Pepsin. LC-MS/MS is performed on each digest. Novor software is used for de novo peptide sequencing. An in-house database search software, FasterDB, is used to find reference sequences from an antibody database. The de novo peptides are mapped to the reference to determine their relative positions. The consensus of the de novo sequences is taken as the real protein sequence. Once the primary sequence is determined, the MS/MS spectra are mapped to the derived protein sequence again with FasterDB for PTM and glycosylation characterization. The peak area of each PTM and glycosylation form is calculated. Leucine and isoleucine are disambiguated by the combined use of their frequencies in the antibody database and the digestion specificity of Chymotrypsin and Pepsin.
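The consensus step described above can be illustrated with a minimal sketch. It is not the workflow's actual implementation: it assumes each de novo peptide has already been placed on the reference at a known, gapless 0-based offset, and it takes a simple per-position majority vote. The function name `consensus_sequence`, the reference fragment, and the peptides are all made up for illustration.

```python
from collections import Counter


def consensus_sequence(reference, mapped_peptides):
    """Per-position majority vote over de novo peptides placed on a reference.

    mapped_peptides is a list of (start, peptide) pairs, where start is the
    0-based offset of the peptide on the reference (gapless placement is an
    assumption of this sketch). Positions with no peptide coverage fall back
    to the reference residue.
    """
    votes = [Counter() for _ in reference]
    for start, peptide in mapped_peptides:
        for offset, residue in enumerate(peptide):
            pos = start + offset
            if 0 <= pos < len(reference):
                votes[pos][residue] += 1
    return "".join(
        v.most_common(1)[0][0] if v else ref_residue
        for v, ref_residue in zip(votes, reference)
    )


# Hypothetical data: the reference carries "MG" at positions 1-2, while most
# de novo peptides support "GM" there.
reference = "WMGWINTYTGE"
peptides = [(0, "WGMWIN"), (1, "GMWINTY"), (4, "INTYTGE"), (0, "WMGW")]
result = consensus_sequence(reference, peptides)
print(result)
```

In this toy input the vote overrides the reference wherever the de novo evidence disagrees with it, which is the property that lets a de novo based consensus report residues a reference-driven method would not. A real implementation would presumably also weight votes by peptide confidence and handle insertions and deletions, which this sketch does not.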
The workflow is tested with the Waters IgG-I antibody standard (product number 186006552). The sequences of both the heavy and light chains are fully recovered (Figure 1) with high coverage for each amino acid.
Compared to the sequence provided by Waters, two variations were discovered on the heavy chain. The first replaces two amino acids at 49-50 from MG to GM, and the second replaces three amino acids at 68-70 from SIT to TIS. Both changes are in the variable region and are supported by strong MS/MS signal peaks (Figure 2). Since the variations do not change the intact mass of any tryptic peptide, and only slightly change the MS/MS spectra, they can be detected only by de novo peptide sequencing. Peptide mapping or homology-based sequencing would have failed to detect these mutations.
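The arithmetic behind this point can be checked with a short sketch. It uses standard monoisotopic residue masses and computes singly charged b-ion m/z values as the sum of the prefix residues plus one proton; the helper names `peptide_mass` and `b_ions` are illustrative and are not part of Novor or FasterDB.

```python
# Standard monoisotopic residue masses (Da) for the residues involved.
MONO = {
    "G": 57.02146,
    "M": 131.04049,
    "S": 87.03203,
    "I": 113.08406,
    "T": 101.04768,
}
WATER = 18.010565   # added once per intact peptide
PROTON = 1.007276   # charge carrier for singly charged fragment ions


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def b_ions(seq):
    """Singly charged b-ion m/z values (prefix fragments), rounded to 4 places."""
    masses, running = [], 0.0
    for aa in seq[:-1]:
        running += MONO[aa]
        masses.append(round(running + PROTON, 4))
    return masses


for published, sequenced in [("MG", "GM"), ("SIT", "TIS")]:
    delta = abs(peptide_mass(published) - peptide_mass(sequenced))
    print(published, sequenced, "intact mass difference:", round(delta, 6))
    print("  b-ions", published, b_ions(published))
    print("  b-ions", sequenced, b_ions(sequenced))
```

Because each variation only reorders the same residues, the intact mass is identical for the published and sequenced versions, while the fragment ions that split the reordered residues shift. Fragments that contain the whole reordered stretch are unaffected, which is consistent with the spectra changing only slightly.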
Additionally, six glycosylation forms were identified (Figure 3). Five of the six identified forms have slightly different retention times. Since products of in-source fragmentation would co-elute with their precursor, these forms are not the result of in-source fragmentation during MS.
The workflow can routinely sequence antibody proteins de novo and profile their glycosylation. Amino acid inference is based on de novo sequencing (a “true de novo” method), which allows the detection of more mutations than a homology-based sequencing method.
Figure 1. The software’s interactive coverage view for the light chain. Each amino acid is covered by multiple peptides. Colors indicate peptides from different enzyme digests. A pale color indicates that the peptide’s peak area is < 0.1%. The ruler at the top highlights the CDR, variable, and constant regions of the antibody. Heavy chain coverage is similar.
Figure 2. Supporting spectra for the mutations at heavy chain 49-50 (published MG vs. our sequenced GM) and 68-70 (published SIT vs. our sequenced TIS). These mutations could not have been detected with a homology-based method, because the published (incorrect) sequences already yield significant matches to the spectra.
Figure 3. Glycosylation forms identified by the workflow.