Addressing intrinsic challenges for next generation sequencing of immunoglobulin repertoires.
Abstract
Antibodies are essential molecules that help to provide immunity against a vast population of environmental pathogens. This antibody-conferred protection depends on genetic diversification mechanisms that produce an impressive repertoire of lymphocytes expressing unique B-cell receptors. The advent of high-throughput sequencing has enabled researchers to sequence populations of B-cell receptors at an unprecedented depth. Such investigations can be used to expand our understanding of the mechanistic processes governing adaptive immunity, to characterize immunity-related disorders, and to discover antibodies specific to antigens of interest. However, next-generation sequencing of immunological repertoires is not without its challenges. For example, it is especially difficult to identify biologically relevant features within large datasets. Additionally, the immunology community severely lacks standardized and easily accessible bioinformatics analysis pipelines. In this work, we present methods that address many of these concerns. First, we present robust statistical methods for the comparison of immunoglobulin repertoires. Specifically, we quantified the overlap between the antibody heavy chain variable domain (VH) repertoires of antibody-secreting plasma cells isolated from the bone marrow, lymph nodes, and spleen of immunized mice. Statistical analysis showed significantly more overlap between the bone marrow and spleen VH repertoires than with the lymph node repertoires. Moreover, we identified and synthesized antigen-specific antibodies from the repertoire of a mouse that showed a convergence of highly frequent VH sequences in all three tissues. Second, we introduce a novel algorithm for the rapid and accurate alignment of VH sequences to their respective germline genes.
Our tests show that gene assignments reported by this algorithm were more than 99% identical to assignments determined using the well-validated IMGT software, and yet the algorithm is five times faster than an IgBLAST-based analysis. Finally, in an effort to introduce methods for the standardization, transparency, and replication of future repertoire studies, we have built a cloud-based pipeline of bioinformatics tools specific to immunoglobulin repertoire studies. These tools provide solutions for the curation and long-term storage of immunological sequencing data in a database, the annotation of sequences with biologically relevant features, and the analysis of repertoire experiments.
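As an illustration of what quantifying repertoire overlap between tissues can look like, the following is a minimal sketch and not the statistical method used in this work, which the abstract does not specify. It assumes each repertoire is reduced to a set of unique VH sequences and uses the Jaccard index as the overlap measure. The function names and the short toy sequences are hypothetical, and no significance testing is performed.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared unique sequences divided by all unique sequences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(repertoires):
    """Jaccard index for every pair of tissues, keyed by sorted tissue names."""
    return {
        (x, y): jaccard(repertoires[x], repertoires[y])
        for x, y in combinations(sorted(repertoires), 2)
    }


def shared_by_all(repertoires):
    """Sequences present in every tissue (candidate convergent sequences)."""
    sets = [set(seqs) for seqs in repertoires.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy data: made-up strings standing in for VH sequences.
toy_repertoires = {
    "bone_marrow": ["CARDYW", "CARGGW", "CTRSSW", "CARNYW"],
    "spleen": ["CARDYW", "CARGGW", "CTRSSW", "CAKLLW"],
    "lymph_node": ["CARDYW", "CVRPPW", "CARQQW"],
}

overlaps = pairwise_overlap(toy_repertoires)
for pair, score in overlaps.items():
    print(pair, round(score, 3))
print("shared by all tissues:", shared_by_all(toy_repertoires))
```

A set-based index like this ignores sequence frequency and sequencing depth. A real comparison would likely need to account for both, and would need replicate animals before any claim of statistical significance could be made.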