Proteomic antibody sequencing: an overview

If you search the internet for ‘antibody sequencing’, you will be rewarded with a long list of resources and services. The terms used to describe antibody sequencing are overloaded and highly similar. We felt a single resource that explained the why and how of proteomic antibody sequencing was needed.

Monoclonal sequencing versus repertoire sequencing

An antibody sample is often described based on its ‘clonality’. Clonality means the number of distinct antibody sequences or lineages that are present in the sample.

  • Monoclonal: The sample contains antibodies that share the same sequence. Monoclonality is a precondition for proteomic antibody sequencing. In rare cases, simple mixtures of monoclonal antibodies can be proteomically sequenced, but success is not generally guaranteed.
  • Polyclonal: The sample contains antibodies with diverse sequences that generally span a wide range of abundances. Antibodies from serum would be characterized as polyclonal.
  • Oligoclonal: The sample contains a limited number of antibody lineages, as defined by the complementarity determining regions (CDRs).

This article focuses exclusively on methods for sequencing monoclonal antibodies. For information on Digital Proteomics’ work on sequencing polyclonal antibodies, also called repertoire sequencing, visit our page on Alicanto.

Why would I need proteomic sequencing?

There are two general approaches to recovering the sequences of the heavy and light chains of a monoclonal antibody. The most straightforward approach is to sequence transcripts or genetic material derived from the source B-cell. This approach is applicable to hybridomas, phage or yeast display, and single-cell screening approaches, and it is the most cost-effective and reliable method for sequencing an antibody. However, the source cell is not always available. The genetic material may be unavailable if the hybridoma has been lost, or if the antibody was a gift. In this case, proteomic sequencing can be used to recover the antibody sequence using technologies such as Edman degradation or mass spectrometry.

Edman degradation

Edman degradation, also called Edman sequencing, is a mature technology for determining a protein sequence, reading one amino acid at a time starting from the N-terminus. A key advantage of Edman sequencing is its low material requirement (generally less than 1 µg is needed). Edman sequencing quality degrades as more amino acids are processed, so it is typically used to determine only the first 30-50 amino acids. It is rarely used to perform full-length antibody sequencing.
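To see why read length is limited, it can help to model the degradation with a deliberately simple assumption: every cycle succeeds with the same probability, so the fraction of molecules still "in step" after n cycles is that probability raised to the power n. The function names, the per-cycle yields, and the 10% cutoff below are hypothetical values chosen for illustration, not measured figures for any instrument.

```python
def in_phase_fraction(per_cycle_yield, cycles):
    """Fraction of molecules still in step after `cycles` cycles.

    Assumes (as a simplification) that every cycle succeeds with the
    same probability `per_cycle_yield`.
    """
    if not 0.0 < per_cycle_yield < 1.0:
        raise ValueError("per_cycle_yield must be between 0 and 1 (exclusive)")
    return per_cycle_yield ** cycles


def max_readable_cycles(per_cycle_yield, min_fraction):
    """Largest number of cycles for which the in-step fraction
    stays at or above `min_fraction`."""
    if not 0.0 < per_cycle_yield < 1.0:
        raise ValueError("per_cycle_yield must be between 0 and 1 (exclusive)")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    cycles = 0
    signal = 1.0
    while signal * per_cycle_yield >= min_fraction:
        signal *= per_cycle_yield
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical per-cycle yields and a hypothetical 10% signal cutoff.
    for assumed_yield in (0.90, 0.95, 0.98):
        n = max_readable_cycles(assumed_yield, 0.10)
        print(f"yield {assumed_yield:.2f}: about {n} readable cycles")
```

Under this toy model, even a small per-cycle loss compounds quickly, which is consistent with Edman sequencing being used for short N-terminal reads rather than full-length antibodies.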

Mass spectrometry-based sequencing

Mass spectrometry has emerged as the preferred method for full-length antibody sequencing. In nearly all mass spectrometry protocols for antibody sequencing, the antibody is digested into peptides roughly 15-20 amino acids long, which are then analyzed in the mass spectrometer. Ideally, each peptide yields a fragment spectrum, and each fragment spectrum, once interpreted, reveals a portion of the antibody sequence. Typically, the antibody is digested with multiple enzymes with distinct cleavage motifs, thereby ensuring that every region of the antibody generates several overlapping peptides. By generating overlapping peptides as shown in Figure 1, proteomic sequencing technologies can reassemble the antibody sequence in a similar fashion to how genomes are assembled from shotgun sequencing reads.
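The value of using several enzymes can be sketched with a small in-silico digest. The example below assumes simplified cleavage rules (trypsin cutting after K or R, chymotrypsin cutting after F, W or Y, neither cutting before P) and an arbitrary toy sequence; the function names are hypothetical, and real digests also produce missed cleavages and non-specific cuts that this sketch ignores.

```python
def digest(sequence, cleave_after, no_cleave_before="P"):
    """Split `sequence` after every residue in `cleave_after`, unless the
    following residue is in `no_cleave_before`.

    Returns a list of (start_index, peptide) tuples.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i + 1 == len(sequence)
        blocked = (not is_last) and sequence[i + 1] in no_cleave_before
        if residue in cleave_after and not blocked:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def cut_sites(peptides):
    """Internal positions where one peptide ends and the next begins."""
    return {start for start, _ in peptides if start > 0}


def is_bridged(site, peptides):
    """True if some peptide spans `site`, i.e. has residues on both sides."""
    return any(start < site < start + len(pep) for start, pep in peptides)


if __name__ == "__main__":
    # Arbitrary toy sequence, for illustration only.
    toy = "EVQLVESGGGLVKPGGSLRLSCAASGFTFSSYAMSWVR"
    tryptic = digest(toy, cleave_after="KR")        # simplified trypsin rule
    chymotryptic = digest(toy, cleave_after="FWY")  # simplified chymotrypsin rule

    print("tryptic:     ", [pep for _, pep in tryptic])
    print("chymotryptic:", [pep for _, pep in chymotryptic])
    for site in sorted(cut_sites(tryptic)):
        status = "bridged" if is_bridged(site, chymotryptic) else "not bridged"
        print(f"tryptic cut at position {site}: {status} by a chymotryptic peptide")
```

A cut site that is bridged by a peptide from another digest gives the assembly step evidence for how the two neighboring peptides connect, much like overlapping reads in genome assembly.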

Peptide mapping versus de novo sequencing

At this point, it’s worth making a distinction between peptide mapping and de novo sequencing technologies. In peptide mapping, the goal is to confirm a known sequence. De novo sequencing is the process of deriving a previously unknown sequence from a fragment spectrum or collection of spectra. Peptide mapping tends to require fewer enzymatic digests and fewer mass spectrometry runs than de novo antibody sequencing, which translates to a lower price tag. When we refer to proteomic sequencing in this article, we are talking about de novo sequencing.

Figure 1. Overview of the proteomic sequencing process. First, the antibody sample is digested with multiple enzymes. Each enzyme has a distinct cleavage pattern (represented as a different color), and no single enzyme produces peptides for every amino acid of the antibody sequence. Proteomic antibody sequencing utilizes the overlapping peptides generated from multiple enzymatic digests to achieve complete coverage of the antibody and confirm that no sequence has been omitted.

The Valens approach

At Digital Proteomics we have been sequencing monoclonal antibodies for almost a decade. We spun out of the University of California – San Diego and our Valens service builds on two unique technologies developed there: shotgun protein sequencing and template proteogenomics.

Shotgun protein sequencing [1]: Shotgun genome assembly merges overlapping sequencing reads into a full-length chromosome. Unlike in nucleic acid sequencing, the fragment spectrum that is produced by the mass spectrometer is not a sequence. Instead, the sequence of the peptide must be derived from the fragmentation spectrum, which is noisy and often incomplete. Therefore, de novo sequencing of a single spectrum is very error-prone, so much so that very few de novo sequencing software companies actually report how many full-length peptides they get correct. The idea behind shotgun protein sequencing is to avoid de novo sequencing a single spectrum. Instead, the algorithm recruits mass spectra from overlapping peptides (without sequencing the spectra) and merges them into large spectral contigs that have more complete information content and a higher signal-to-noise ratio. In shotgun protein sequencing, the sequence of the antibody is derived by de novo sequencing the spectral contig.
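The benefit of merging spectra before sequencing them can be illustrated with a toy model. This is not the algorithm of [1]; it is a minimal sketch that assumes idealized spectra consisting only of cumulative prefix masses, nominal integer residue masses, no noise peaks, and two hypothetical overlapping peptides that are each missing some peaks. All function names are illustrative.

```python
from collections import Counter

# Nominal (integer) residue masses in daltons.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Reverse lookup; residues with equal nominal mass share a label (e.g. "I/L").
_by_mass = {}
for _residue, _mass in sorted(RESIDUE_MASS.items()):
    _by_mass.setdefault(_mass, []).append(_residue)
MASS_LABEL = {mass: "/".join(residues) for mass, residues in _by_mass.items()}


def prefix_masses(peptide):
    """Idealized ladder: 0 followed by the cumulative residue masses."""
    ladder = [0]
    for residue in peptide:
        ladder.append(ladder[-1] + RESIDUE_MASS[residue])
    return ladder


def read_ladder(ladder):
    """Turn gaps between consecutive peaks into residues.
    A gap that matches no single residue is reported as '[mass]'."""
    peaks = sorted(ladder)
    return [MASS_LABEL.get(b - a, f"[{b - a}]") for a, b in zip(peaks, peaks[1:])]


def best_shift(spec_a, spec_b):
    """Mass shift of spec_b that aligns the most peaks with spec_a."""
    shifts = Counter(a - b for a in set(spec_a) for b in set(spec_b))
    return shifts.most_common(1)[0]  # (shift, number_of_matching_peaks)


def merge(spec_a, spec_b):
    """Align spec_b onto spec_a and pool the peaks into one toy 'contig'."""
    shift, _ = best_shift(spec_a, spec_b)
    pooled = sorted(set(spec_a) | {b + shift for b in spec_b})
    return [m - pooled[0] for m in pooled]


# Two hypothetical overlapping peptides, each observed with missing peaks.
spec_a = [m for m in prefix_masses("GASPVT") if m not in (215, 411)]
spec_b = [m for m in prefix_masses("SPVTDEF") if m not in (184, 628)]

if __name__ == "__main__":
    print("spectrum A alone:", read_ladder(spec_a))
    print("spectrum B alone:", read_ladder(spec_b))
    print("merged contig:   ", read_ladder(merge(spec_a, spec_b)))
```

In this toy case, each spectrum read on its own contains gaps that span two residues, while the merged ladder resolves most of them because a peak missing from one spectrum is present in the other. Real spectra contain noise, multiple ion types and mass error, so an actual implementation needs a far more careful alignment step.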

Template proteogenomics [2]: Peptide mapping is not a suitable method for sequencing an entire antibody, since a significant portion of the antibody is unique and generally unknown prior to sequencing. However, large portions of the antibody sequence are unmutated from the germline sequences and can be confidently determined using peptide mapping. Template proteogenomics treats the germline as a template, using peptide mapping to identify the partial sequences of the target antibody that are unmutated from germline. Template proteogenomics then uses the shotgun protein sequencing method to recover the unique portions (typically the complementarity determining regions, or CDRs).
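The division of labor between the two steps can be sketched as follows. This is a simplified stand-in, not the method of [2]: it assumes the peptides have already been identified as strings and uses exact string matching against a made-up template, whereas real peptide mapping matches spectra to candidate peptides with a database search engine. The template, peptides, and function names are hypothetical.

```python
def map_peptides(template, peptides):
    """Mark template positions covered by exactly matching peptides.

    Returns (covered, unmatched): a list of booleans, one per template
    residue, and the peptides that did not match the template anywhere.
    """
    covered = [False] * len(template)
    unmatched = []
    for peptide in peptides:
        position = template.find(peptide)
        if position == -1:
            unmatched.append(peptide)
            continue
        while position != -1:
            for i in range(position, position + len(peptide)):
                covered[i] = True
            position = template.find(peptide, position + 1)
    return covered, unmatched


def uncovered_regions(covered):
    """Half-open (start, end) intervals of the template with no peptide support."""
    regions = []
    start = None
    for i, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(covered)))
    return regions


if __name__ == "__main__":
    # Made-up template and peptides; the last peptide differs from the
    # template by one residue, standing in for a mutated region.
    template = "EVQLVESGGGLVQPGGSLRLSCAAS"
    peptides = ["EVQLVESGGG", "GGGLVQPGGSLR", "LSCVAS"]

    covered, unmatched = map_peptides(template, peptides)
    print("regions left for de novo sequencing:", uncovered_regions(covered))
    print("peptides that do not match the template:", unmatched)
```

Regions of the template confirmed by matching peptides can be accepted as germline, while the uncovered regions and unmatched peptides are the ones handed to the de novo step.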

By combining the strengths of peptide mapping with de novo sequencing, Valens is able to quickly and accurately recover the full-length sequence of a purified monoclonal antibody. Read more about our Valens service or contact us about your antibody sequencing needs and questions!


[1] Bandeira, N, Pham, V, Pevzner, P, Arnott, D, Lill, JR (2008). Automated de novo protein sequencing of monoclonal antibodies. Nat. Biotechnol., 26, 12:1336-8. PMID: 19060866

[2] Castellana, NE, McCutcheon, K, Pham, VC, Harden, K, Nguyen, A, Young, J, Adams, C, Schroeder, K, Arnott, D, Bafna, V, Grogan, JL, Lill, JR (2011). Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Proteomics, 11, 3:395-405. PMID: 21268269