A Large-Scale Comparison of MS-based Antibody De Novo Protein Sequencing and Targeted DNA Sequencing


The DNA sequences of antibodies are highly diverse due to V(D)J recombination and somatic hypermutation. As such, a new antibody of interest is unlikely to appear in any existing sequence database, so the database search approach commonly used in proteomics does not work for antibodies. De novo sequencing of antibody proteins directly from MS data avoids this dependence on a database. However, currently published de novo sequencing methods have been evaluated on only a few selected antibodies, which may not be representative of real samples. Here, we conduct the first large-scale test of the REmAb™ sequencing platform by comparing its protein sequencing results with DNA sequencing results for 24 myeloma cell lines.


The cells were grown in Gibco™ IMDM medium supplemented with 1% FCS. Supernatants were collected after 72 h in culture. After removal of BSA, samples were digested with trypsin, LysC, chymotrypsin, pepsin, and AspN (Promega, WI, US). MS data were collected on an Orbitrap Fusion instrument. Protein sequences were assembled with the REmAb™ sequencing platform. Ile and Leu were distinguished with the WILD™ method. The DNA sequences were generated with a novel hybrid capture approach using custom probes designed to target all annotated alleles of the V, J, and C regions in the IMGT database. The tBLASTn program was used to translate the DNA reads and align them with the protein sequences to check the correctness of the protein sequencing results.

Preliminary Results

Twenty-four myeloma cell lines have been processed thus far. Two cell lines (KMS-12BM and UTMC2) failed to express antibodies, as verified independently by intracellular flow cytometry and MS. Two others (AMO-1 and H929) produced very low amounts of antibody and will require additional enrichment before they can be sequenced. This leaves 20 cell lines that produced sufficient antibody for MS-based sequencing using our standard protocol. Four of these (EJM, LP1, OCI-My1, OCI-My6) express both heavy and light chains, whereas the other 16 express only light chains.

For each of the 20 protein samples, 5 LC-MS/MS runs were performed on an Orbitrap Fusion instrument. Each LC-MS/MS run produced roughly 5,000 to 10,000 MS/MS spectra, including HCD, ETD, and EThcD spectra. The data are of high quality, with a mass error of no more than 3 ppm for most spectra. All of the expressed heavy and light chains were sequenced with high confidence on the REmAb™ sequencing platform. Figure 1 shows the protein sequence of cell line OCI-My5 and its MS/MS spectra coverage. Each amino acid is supported by tens of unique peptide-spectrum matches (PSMs). A PSM is unique if it has a unique combination of sequence, PTM, charge state, and fragmentation parameter. The coverage of the other cell lines is similar.
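The uniqueness rule for PSMs and the resulting per-residue coverage depth can be illustrated with a minimal sketch. The `PSM` record, its field names, and the toy peptides below are hypothetical and chosen only for illustration; they are not the REmAb™ data model or output.

```python
from collections import namedtuple

# Hypothetical PSM record: 0-based start position in the protein, peptide
# sequence, PTM annotation, precursor charge, and fragmentation type.
PSM = namedtuple("PSM", "start peptide ptm charge fragmentation")

def unique_psms(psms):
    """Keep one PSM per (sequence, PTM, charge, fragmentation) combination."""
    seen = {}
    for psm in psms:
        key = (psm.peptide, psm.ptm, psm.charge, psm.fragmentation)
        seen.setdefault(key, psm)  # the first occurrence wins
    return list(seen.values())

def coverage_depth(protein_length, psms):
    """Count, for every residue, how many unique PSMs cover it."""
    depth = [0] * protein_length
    for psm in unique_psms(psms):
        for i in range(psm.start, psm.start + len(psm.peptide)):
            depth[i] += 1
    return depth

# Toy example on a 10-residue sequence (illustrative only).
example_psms = [
    PSM(0, "DIQMTQ", "", 2, "HCD"),
    PSM(0, "DIQMTQ", "", 2, "HCD"),                 # exact duplicate, ignored
    PSM(0, "DIQMTQ", "", 3, "HCD"),                 # different charge state
    PSM(0, "DIQMTQ", "Oxidation(M)", 2, "EThcD"),   # different PTM and fragmentation
    PSM(3, "MTQSPSS", "", 2, "ETD"),
]
example_depth = coverage_depth(10, example_psms)
```

In this toy case four of the five PSMs are unique, and the residues shared by both peptides receive the highest depth.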

Fig. 1: Screenshot of REmAb™ software showing the coverage of the protein sequencing result of cell line OCI-My5. The top sequence is obtained with REmAb™. Each colored bar below the sequence indicates a unique peptide-spectrum match (PSM) covering the area. Different colors indicate different enzymes used for proteolysis. Actual coverage depth is greater than shown as the figure is cropped.

Among the 20 cell lines, three (KHM11, MM1, OCI-My6) do not have DNA sequencing data. For the other 17 cell lines, the DNA reads were compared with the protein sequences by mapping the reads to the protein sequences. Figure 2 shows the alignment for cell line OCI-My5. Note that the DNA reads were translated into amino acid sequences before the alignment. The figure shows that each amino acid of the protein sequencing result is confirmed by tens to hundreds of DNA reads.
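The idea of translating DNA reads and mapping them onto a protein sequence can be sketched as follows. This is a simplified illustration, not the tBLASTn algorithm: it uses six-frame translation with the standard genetic code and exact substring matching, whereas tBLASTn tolerates mismatches and gaps. The function names, the `min_len` cutoff, and the toy sequences are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_translations(read):
    """Translate a read in all three frames on both strands."""
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (read, reverse) for offset in range(3)]

def read_support(protein, reads, min_len=5):
    """Count, per residue, the reads whose translation matches the protein exactly."""
    support = [0] * len(protein)
    for read in reads:
        for peptide in six_frame_translations(read):
            if len(peptide) < min_len or "*" in peptide:
                continue
            pos = protein.find(peptide)
            if pos >= 0:
                for i in range(pos, pos + len(peptide)):
                    support[i] += 1
                break  # count each read at most once
    return support

# Toy example (illustrative sequences only).
protein = "DIQMTQSPSS"
forward_read = "GATATTCAGATGACCCAG"                                   # encodes DIQMTQ
reverse_read = "ATGACCCAGAGCCCGAGCAGC".translate(COMPLEMENT)[::-1]    # MTQSPSS, opposite strand
support = read_support(protein, [forward_read, reverse_read])
uncovered = [i for i, n in enumerate(support) if n == 0]
```

Residues that end up in `uncovered` are positions with no supporting read, which is a coverage gap in the DNA data and not, by itself, evidence of a protein sequencing error.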

Fig. 2: Screenshot of the alignment between the protein sequence (top line) and the translation of the DNA sequencing reads (bottom lines) for OCI-My5. The number after each read indicates the number of identical reads aligning to the same location. Actual coverage depth is greater than shown as the figure is cropped.

The 17 compared cell lines produced 20 chains in total (17 light + 3 heavy). With the exception of the U266 light chain, all protein sequences were fully confirmed by DNA reads, including the identities of the isobaric residues Ile and Leu. For U266, all but the first 16 amino acids at the N-terminus were confirmed by DNA reads. Further examination of the data suggested that this was not a protein sequencing error: no DNA reads span the region encoding the first 16 amino acids. We also noticed that U266 had a significantly lower number of DNA reads than the other cell lines.


  • By using the right MS experiments and software tools, antibody proteins can be de novo sequenced routinely
  • The isobaric Ile and Leu can also be distinguished with w-ions from EThcD spectra
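The w-ion principle behind the second point can be sketched in a few lines. This is a generic illustration of the underlying chemistry and not the WILD™ method, whose details are not described here. A z-radical ion whose N-terminal residue is Leu can lose an isopropyl radical (C3H7, about 43.0548 Da), whereas Ile loses an ethyl radical (C2H5, about 29.0391 Da), so the two residues give w-ions at different m/z. The function names and the 10 ppm tolerance are assumptions made for the example.

```python
# Monoisotopic atomic masses in Da.
MASS_C = 12.0
MASS_H = 1.00782503207

# Neutral radical lost from the side chain of a z-radical ion to give a w-ion.
SIDE_CHAIN_LOSS = {
    "L": 3 * MASS_C + 7 * MASS_H,  # isopropyl radical, C3H7
    "I": 2 * MASS_C + 5 * MASS_H,  # ethyl radical, C2H5
}

def w_ion_mz(z_ion_mz, charge, residue):
    """Expected w-ion m/z given the z-radical ion m/z, its charge, and 'I' or 'L'."""
    return z_ion_mz - SIDE_CHAIN_LOSS[residue] / charge

def call_ile_or_leu(z_ion_mz, charge, peaks, tol_ppm=10.0):
    """Return 'I' or 'L' if exactly one w-ion is observed, otherwise None."""
    hits = []
    for residue in "IL":
        expected = w_ion_mz(z_ion_mz, charge, residue)
        if any(abs(mz - expected) / expected * 1e6 <= tol_ppm for mz in peaks):
            hits.append(residue)
    return hits[0] if len(hits) == 1 else None

# Toy example: a singly charged z-radical ion at m/z 500.0 (illustrative value).
leu_w = w_ion_mz(500.0, 1, "L")
ile_w = w_ion_mz(500.0, 1, "I")
call = call_ile_or_leu(500.0, 1, [300.1, leu_w])
```

Returning `None` when both or neither w-ion is found keeps ambiguous cases explicit instead of forcing a call.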

Reproduced from McDonald, Z., Chow, S., Gorospe, K., Xu, X., Taylor, P., Liu, Q., Li, Z., Han, Z., Pugh, T., Trudel, S., & Ma, B. (2019). A Large-Scale Comparison of MS-based Antibody De Novo Protein Sequencing and Targeted DNA Sequencing. ASMS 2019 Atlanta, WP 046, with permission.


