Computational Analysis of Zika Virus: Flavivirus Antibody Epitope Data Mapped Onto The Zika Virus Proteome Suggest Potential Shared and Unique Epitopes


Xiaojun Xu, Kerrie Vaughan*, Alessandro Sette and Bjoern Peters

Immune Epitope Database (IEDB) and Vaccine Discovery, La Jolla Institute, La Jolla,
CA. 92037, USA

*To whom correspondence may be addressed. Email: [email protected]

INTRODUCTION

The epitope targets of humoral and cellular immune responses in Zika virus (ZIKV) are
currently unknown due to the relatively recent emergence of ZIKV as a pandemic threat
associated with severe birth defects [Fauci 2016; Driggers 2016; Heymann 2016; Rubin
2016; Brasil 2016]. However, ZIKV is a member of the Flaviviridae family, a group of
viruses for which a wealth of data describing epitope targets is readily available in the
IEDB. Because of their phylogenetic relatedness and sequence similarity, it is likely that
some of these Flavivirus epitopes are conserved in ZIKV, possibly contributing to
some degree of preexisting immunity in areas where ZIKV and other Flaviviruses, such as
Dengue virus (DENV) and Japanese encephalitis virus (JEV), are co-circulating. Epitope
conservation also raises the possibility that preexisting antibodies to other Flaviviruses
might enhance ZIKV pathogenesis through antibody-dependent enhancement
(ADE). Finally, epitope similarity and viral cross-reactivity pose a potential
challenge to antibody-based diagnostic assays.

Conversely, instances where known epitopes map to regions of the Flavivirus proteome
that are significantly divergent between ZIKV and other Flaviviruses are also of interest.
In those cases, it is reasonable to speculate that the corresponding ZIKV sequences
might also be immunogenic, as they are likely to have similar exposure to adaptive
immune receptors. Such epitopes could be of significant diagnostic value,
allowing past ZIKV exposure to be discriminated from exposure to other co-circulating
Flaviviruses.

Here we use the cumulative data housed in the IEDB related to antibody epitopes
derived from all Flaviviruses to examine these issues. This is an ongoing analysis, which
we plan to update and expand as more data become
available. Please provide corrections and suggestions for improvements to Kerrie
Vaughan [[email protected]].

METHODS
ZIKV sequence database and determination of sequence conservation
The Entrez package from Biopython was utilized to query the NCBI protein data
repository for full-length ZIKV polyprotein sequences, identified as records having
taxonomic ID 64320 and length greater than 3,000. These records were then secondarily
processed to extract associated information (strain/isolate name, accession ID, year,
and location). Table 1 lists all ZIKV strains (including redundancies) retrieved.
Following removal of redundant sequences (different entries with 100% sequence
identity), unique Zika polyprotein sequences were aligned using MAFFT [Katoh and
Standley 2013]. The positional identity and deletion rate were then computed based on
the alignment profile.
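
The redundancy removal and per-position statistics described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the sequences have already been downloaded and aligned (for example with MAFFT), and the function names and the choice to divide by the total number of sequences (gaps included) are assumptions made for this example.

```python
from collections import Counter


def remove_redundant(records):
    """Keep the first record for each distinct sequence.

    records: list of (name, sequence) tuples. Two entries count as
    redundant only when their sequences are 100% identical.
    """
    seen = set()
    unique = []
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq))
    return unique


def column_profile(aligned_seqs, gap="-"):
    """Return (consensus residue, identity, deletion rate) per alignment column.

    identity      = fraction of sequences carrying the consensus residue
    deletion rate = fraction of sequences with a gap in that column
    Both fractions use the total number of sequences as denominator
    (an assumption for this sketch).
    """
    n = len(aligned_seqs)
    profile = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != gap]
        gap_rate = (n - len(residues)) / n
        if residues:
            consensus, count = Counter(residues).most_common(1)[0]
            identity = count / n
        else:
            consensus, identity = gap, 0.0
        profile.append((consensus, identity, gap_rate))
    return profile
```

Averaging the identity values over the columns of one protein would give a per-protein conservation figure of the kind reported in Table 2.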

Selection of DENV Sequences.


The sequence identity analysis of DENV polyproteins was conducted on a
representative set of DENV assembled in [Weiskopf 2014]. Briefly, full-length DENV
polyprotein sequences for DENV1-4 serotypes were retrieved from NCBI Protein
database using query: txid11053[orgn] AND 3000:5000[slen]. To avoid overrepresentation
of sequences from one particular geographical origin, the number of
sequences derived from each nation was limited to a maximum of 10. In the end, 162
DENV1, 171 DENV2, 169 DENV3, and 53 DENV4 non-redundant sequences were
selected for analysis.
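
The per-nation cap can be illustrated with a short sketch. The text does not state how the 10 sequences per nation were chosen, so keeping the first 10 encountered is an assumption of this example, as are the function and parameter names.

```python
from collections import defaultdict


def cap_per_country(records, max_per_country=10):
    """Keep at most max_per_country records per country, in input order.

    records: iterable of (accession, country) tuples.
    Which records are kept when a country exceeds the cap is an
    assumption here (first come, first kept).
    """
    counts = defaultdict(int)
    kept = []
    for accession, country in records:
        if counts[country] < max_per_country:
            counts[country] += 1
            kept.append((accession, country))
    return kept
```

Applied separately to each serotype, a filter like this limits the weight of heavily sampled countries before identity statistics are computed.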

IEDB epitope data retrieval


To integrate T and B cell epitope data from the IEDB into the sequence retrieval pipeline,
we queried a SQL version of the IEDB. The query specified the retrieval of records
where epitope_source_organism_taxonomic ID = 11051 (Flavivirus genus) for T cell and
B cell assays only (no binding or elution data), and included the following fields
associated with the records: epitope id, description (sequence), antigen name, position,
antigen id (accession), epitope source organism name, host, antibody purification status
(monoclonal/polyclonal), antibody name (for mAbs) and assay group.

Compilation and alignment of ZIKV sequences


A set of full-length ZIKV proteomes was retrieved by querying the NCBI Protein
database on May 3, 2016. This set includes isolates from the recent outbreak in South
America, as well as sequences from previous outbreaks in different geographic
locations. Redundant sequences were removed and the final set of sequences is
presented in Table 1.

Alignment of known epitope data with ZIKV
Epitope data from Flaviviruses extracted from the IEDB were compared to reference
ZIKV sequence MR766 [accession: YP_002790881]. Each epitope-containing Flavivirus
protein sequence was aligned to the Zika polyprotein sequence in order to identify its
corresponding positions. Next, the degree of sequence identity between each epitope
and its mapped position on ZIKV was calculated as the percentage of identical
residues in the epitope-aligned region. Finally, a positional response frequency
score (RFscore) was calculated using previously established parameters
(http://help.iedb.org/entries/91331263-Immunome-Browser-3-0). Data presented in the
figures are running averages with a window size of 9 amino acids. Comparison of
Flavivirus envelope protein sequences was accomplished by generating a percent
identity matrix using Clustal Omega (Clustal2.1).
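
The epitope identity calculation and the 9-residue running average can be sketched as below. The example assumes a pairwise alignment between the source protein and the ZIKV polyprotein is already available as two equal-length gapped strings; the function names, the 1-based coordinate convention and the truncation of the window at the sequence ends are assumptions of this sketch rather than details given in the text.

```python
def epitope_identity(aligned_source, aligned_zikv, start, end, gap="-"):
    """Percent identity between an epitope and its aligned ZIKV region.

    aligned_source, aligned_zikv: equal-length rows of a pairwise alignment.
    start, end: 1-based inclusive epitope positions in the ungapped
    source protein.
    """
    position = 0   # running index in the ungapped source sequence
    matches = 0
    length = 0
    for src, zikv in zip(aligned_source, aligned_zikv):
        if src == gap:
            continue  # insertion in ZIKV: no source residue at this column
        position += 1
        if start <= position <= end:
            length += 1
            if src == zikv:
                matches += 1
    if length == 0:
        raise ValueError("epitope positions fall outside the source sequence")
    return 100.0 * matches / length


def running_average(values, window=9):
    """Centered running mean; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For instance, an epitope at source positions 3-5 whose aligned ZIKV residues match at two of the three positions has an identity of about 66.7%.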

3D Structural Analysis
Structural analyses were performed using the cryo-EM structure of ZIKV (PDB 5IRE)
[Sirohi 2016]. Solvent accessibility scores were determined using ASAView: Solvent
Accessibility Graphics for proteins (http://www.abren.net/asaview) [Ahmad 2004]. 3D
renderings were generated using Jmol (JSmol), open-source software for
interactive 3D viewing of chemical structures that uses JavaScript. The 3D rendering of
epitopes identified by query of the IEDB was accomplished by mapping Flavivirus
epitope positions onto the ZIKV proteome.

RESULTS

Compilation of Zika virus sequences from NCBI


We compiled all full-length ZIKV sequences reported to date in the NCBI protein
sequence database. Table 1 summarizes the sequences retrieved in terms of isolate or
strain name, accession ID, as well as the year and location of isolation. As of May 3,
2016, a total of 85 ZIKV sequences were retrieved. Of these, 38 are from the 2015-2016
outbreak originating in the Americas, while another 47 sequences are from isolates
identified in Canada, Europe, Africa, Asia and Southeast Asia, including the original ZIKV
isolate MR766 identified in Uganda in 1947. From this list, 15 redundant sequences
were removed, resulting in a set of 70 unique polyproteins for subsequent analyses.

All ZIKV isolates have high sequence similarity


To assess the degree of variability among different ZIKV isolates, a multiple sequence
alignment was performed using all compiled ZIKV sequences. Figure 1 displays the
degree of conservation for the consensus residue at each position. Data are presented
from all isolates for all individual protein cleavage products of the ZIKV polyprotein (core,
prM/M, E, etc). High conservation was observed across all proteins with an average of
98.9%. An inspection of individual proteins shows slightly greater sequence variability
for the structural proteins (97.8%) compared to non-structural proteins (99.2%).
Individual data are shown in Table 2. These data show that the greatest sequence
divergence among ZIKV isolates resides within the anchor region (aa105-122) at 95.9%
identity; however, for those viral proteins likely to play major roles in immune targeting,
sequence similarity is high, even for surface-exposed antigens such as E.

To put these data into the larger context, we compared the degree of variability between
different ZIKV isolates with the degree of variability across other Flaviviruses, such as
DENV. The DENV virus group is composed of four antigenically distinct serotypes
having 65-70% sequence identity between each other. By contrast, the overall sequence
variation within individual DENV serotypes can be as much as 3% [Sukupolvi-Petty
2010; Holmes 2003; Rico-Hesse 1990]. Our analysis of sequence variability within
DENV serotypes showed that at the level of the genomic polyprotein, sequence identity was
98.6%, 98.1%, 98.9% and 99.0% for serotypes 1-4, respectively. However, the average
degree of polyprotein sequence variability across all four serotypes was greater (83% identity).

Next, to investigate the potential for cross reactivity at the antigen level, we assessed the
percent sequence identity between the E protein of several well-studied Flaviviruses and
ZIKV (Table 3). We found that sequence identity for this important antibody target
ranges from 42.7% to 58.2% for YFV, WNV, JEV, DENV1, DENV2, DENV3 and
DENV4 (values in red, Fig 2). Thus the degree of sequence similarity within ZIKV
isolates is comparable to the sequence similarity observed within a given DENV
serotype, and at the level of the E antigen, sequence identity between ZIKV and other
Flaviviruses is approximately 43-58%. These findings suggest that there may be sufficient similarity in
immune response targets between ZIKV and other Flaviviruses to make it feasible to
translate what is known immunologically for these viruses to ZIKV.

Retrieval of Flavivirus antibody/B cell epitopes from the IEDB


A significant amount of work has been done to define and characterize immune
response targets in different viruses within the Flavivirus genus. More than 8,500 unique
antibody and T cell epitopes have been described in the IEDB that are recognized in
humans and animal models. Of importance from the standpoint of potential serological
cross reactivity are DENV, West Nile virus (WNV), Japanese encephalitis virus (JEV)
and Yellow fever virus (YFV), because of their antigenic similarity to ZIKV (~42-58%),
and their overall circulation frequency. While responses have been observed to all three
surface exposed structural viral proteins (core, prM/M, and envelope (E)), and to a lesser
extent to the non-structural protein 1 (NS1), the majority of antibody responses among
flaviviruses are directed against the E protein [Dowd 2014; Dowd and Pierson 2011;
Heinz and Stiasny 2012]. Moreover, the E protein is the major target of neutralizing
antibodies and has therefore been identified as an immunological ‘hot spot’ of known
and potential protective epitopes [Pierson and Diamond 2015; Fibriansah 2015a,b; de
Alwis 2012; Dowd and Pierson 2011; Pierson 2008]. To survey the current ‘universe’ of
antibody/B cell epitope data reported in the literature for viruses within this genus, we
performed a query using the IEDB to retrieve all B cell epitopes defined in Flavivirus
antigens. The resulting 1,002 epitopes included 471 epitopes defined in humans and
537 defined in rodent models, as listed in Supplemental Table 1 (Bcell_Flavi, google
doc link).

Of note for the consideration of antibody data captured in the IEDB, two epitopes are
reported as distinct entities if they have any difference in molecular structures even if
they largely overlap. Thus in many cases, the epitopes reported in different studies
overlap the same antigenic site.

To date, the majority of antibody epitopes have been defined for DENV (73%), followed
by WNV (11%) and JEV (5%). Data from YFV are far fewer (<1%). A breakdown of
antibody responses by antigen shows that the vast majority of reported epitopes were
derived from the E protein (76%), followed by NS1 (14%), prM/M (4%), capsid (2%) and
NS5 (1%), with little or no data reported for NS2A, NS3, NS4A and NS4B. Next, using
the Immunome Browser feature available within the IEDB, the reported DENV antibody
epitopes were mapped along a reference proteome [DENV1 Nauru/West Pac/1974;
P17763]. The Immunome Browser feature enables the visualization of epitopes plotted
with their corresponding response frequency (number of subjects responding / number of subjects tested)
along the entire virus polyprotein, thus highlighting regions of immune prominence.
Figures 2a and 2b show the DENV epitope data for humans (418 epitopes) and mice
(343 epitopes), respectively. Specifically, the graph depicts the lower (dark pink) and
upper (light pink) bounds of response frequencies for each residue along the entire
3,392 aa proteome. Thus there is greater confidence in the degree of immune reactivity
where the upper and lower bounds are closer (more white background). This
visualization reveals areas targeted in both hosts, as well as regions that are associated
with differential reactivity between human and mouse reports. For example, while
antibody responses in both species target the E protein, relative prominence differs
among domains I, II and III.
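
To illustrate why the gap between lower and upper bounds narrows as more subjects are tested, the sketch below computes a response frequency with a 95% interval. The Wilson score interval used here is a stand-in chosen for this example; the exact interval method used by the Immunome Browser is not restated in this text, and the function name is hypothetical.

```python
import math


def response_frequency(responded, tested, z=1.96):
    """Return (point estimate, lower bound, upper bound).

    The bounds come from a Wilson score interval (an assumption of this
    sketch, not necessarily the Immunome Browser's own calculation).
    z = 1.96 corresponds to a 95% interval.
    """
    if tested <= 0:
        raise ValueError("tested must be positive")
    p = responded / tested
    denom = 1 + z * z / tested
    centre = (p + z * z / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z * z / (4 * tested * tested)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

With 5 of 10 subjects responding the interval is wide, whereas 50 of 100 gives the same point estimate with much tighter bounds, which corresponds to the narrower band in the plots.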

Similarly, while fewer data (119 epitopes) are available for WNV, Immunome Browser
analysis revealed a similar trend; however, the NS1 protein has been more heavily
studied in WNV (Supplemental Figure 1). For YFV and JEV, we also observed that a
majority of antibody epitopes were mapped to E, though with far fewer antibody data
(data not shown). Thus, taking the antibody data as a whole from viruses within the
Flavivirus genus, the vast majority of epitopes defined to date have been defined for the
E protein. Therefore our initial analyses will focus on the E protein, and a secondary
assessment of NS1 data will follow (see below).

Alignment of B cell epitopes in Flaviviruses to ZIKV sequences – E protein


To assess if dominant targets in the envelope protein of other flaviviruses are conserved
in ZIKV and therefore potentially cross-reactive, we aligned the protein sequences for
which epitopes had been mapped with the reference ZIKV proteome and determined the
conservation of the epitope positions. These data revealed that 71 of the known epitope
targets within the E protein in other Flavivirus species are completely conserved (100%
sequence ID) within ZIKV. Another 106 previously described epitopes are conserved
at 80-93% identity to ZIKV. Identity is calculated on a per-residue basis and covers both linear and
discontinuous determinants (monoclonal and polyclonal). This set includes epitopes defined
in all four DENV serotypes, as well as WNV, and represents neutralizing and/or
protective sites described in humans and in rodent models. In all, there are 211 unique
epitopes having ≥80% identity to ZIKV, representing distinct, as well as overlapping
(residues common to more than one epitope) regions on ZIKV E. Thus, our initial
analysis seeking to identify potential antigenic overlap between ZIKV and other more
well-studied Flaviviruses revealed a subset of known B cell/antibody epitopes with
identity to ZIKV.

Next, to further evaluate these conserved sites in terms of overall relative
immunodominance, the Immunome Browser was used to display response frequency
along the E protein. For this, a panel of 9 proteomes from representative Flaviviruses
was aligned with the ZIKV reference proteome to establish overall sequence identity
between ZIKV and a reference set of Flaviviruses. Response frequency scores
(RFscore) were then mapped per residue and overlaid with the sequence identity
calculations. Figures 3a and 3b show the degree of conservation (sequence ID) with
corresponding RFscore for each Flavivirus residue position mapped onto ZIKV E protein
for humans and mice, respectively (dotted horizontal lines represent mean values).

Several major sites showed high sequence conservation with ZIKV (≥65% identity).
These included residues in the N-terminus, including the region containing the
conserved, protective fusion loop, as well as residues in the C-terminus. Looking at
response frequency in humans, a similar picture emerged, with major response regions
(lower CI values ≥0.16) observed at sites throughout E. Regions of high sequence
identity and response frequency are summarized in Table 4, showing overlap in all three
domains of E.

Regions of low sequence identity (<35%) were also observed. Table 5 provides a
summary of these data. Such regions have diagnostic potential, as responses directed
against these particular sequences may provide evidence of past ZIKV infections,
allowing for discrimination among other Flavivirus infections.
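
Regions of high (≥65%) or low (<35%) identity such as those summarized in Tables 4 and 5 can be extracted from a per-residue identity profile by collecting contiguous runs of positions that satisfy a threshold. The sketch below is one way to do this; the function name and the optional minimum run length are assumptions for illustration.

```python
def find_regions(scores, predicate, min_length=1):
    """Return 1-based inclusive (start, end) runs where predicate(score) holds.

    scores: per-residue values (e.g. percent identity to ZIKV).
    predicate: function returning True for positions of interest.
    min_length: shortest run to report (hypothetical convenience parameter).
    """
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if predicate(score):
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # close a run that extends to the last residue
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions
```

Calling `find_regions(identity, lambda s: s < 35)` on a smoothed identity profile would list candidate ZIKV-specific stretches, and `lambda s: s >= 65` the conserved ones.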

Correlation between conservation and epitope reactivity and structural features of the
ZIKV envelope protein
Next, overall solvent-accessible surface area scores (SASA) were calculated for the
ZIKV E protein, to identify those regions on the protein that are surface exposed versus
those that are buried within the structure. SASA scores were then compared to the
sequence conservation and the response frequency scores separately. Figure 4 shows
SASA and RF scores determined per residue for the E protein. These data show that of
the 13 major regions of reactivity (RFscore ≥ 0.2), 5 correspond with surface-exposed
regions (SASA ~40%). In another 6 regions, high reactivity
corresponds to more intermediate exposure (SASA 10-30%). Only two regions of high
reactivity (aa115 and 140) correspond to SASA of less than 10% (mostly buried). Thus
observed reactivity corresponds most frequently with intermediate and exposed
residues, whereas few highly reactive sites are located in non-exposed regions.

The envelope ectodomain contains three domains: DI (aa1-51, 133-196), DII (aa51-133,
196-286) and DIII (aa302-403). The tip of DII contains the highly conserved fusion loop
(aa98-110), which enables fusion by interacting with the host endosomal membrane. DI
is a 9-stranded barrel structure that works with DII as a hinge to expose DII during
invasion. DIII (Ig-like) is thought to contain the receptor binding site [Kostyuchenko 2016;
Pierson 2008]. Regions that differ between neuroinvasive and febrile-illness-causing
Flaviviruses include aa67, the glycan loop (aa145-165), the kl-loop (aa281) and the DE-loop
(aa368-369). The α-helical transmembrane regions span aa456-477 and 484-502. To gain
greater insight with respect to the biologically active E protein, we made use of recently
published PDB structures and analyses [Sirohi 2016; Kostyuchenko 2016] to generate
3D maps of the above data with associated structural annotation. For this we mapped all
data from Figure 3a (conservation and RF scores) and Figure 4 (SASA and RF scores) onto the
PDB 5IRE structure (Figure 5). To access the interactive renderings of these data
please use the link: Figure 5. 3D Visualization of Env (http://moles.liai.org/zikaTest.html).

Analysis of these data on the 3D structure revealed, as expected, critical sites such as
the fusion loop (aa98-110), which is highly conserved, surface exposed and a high
frequency antibody target. However, there are also numerous regions with more
intermediate scores for which making a clear assessment is difficult. These include, for
example, sites that are partially exposed with good RF, as well as those with high
conservation and SASA but low RFscores.

Lastly, we sought to identify the subset of Flavivirus antibody epitopes that have been
associated with in vitro virus neutralization and in vivo live challenge studies. This subset
included 62 unique sites defined in human subjects and 193 sites defined in rodent
models. For this, the corresponding sites were mapped onto the ZIKV E PDB 5IRE
structure (Figure 6). Here, we find sites mapping within DII and DIII for the human and
mouse data; however, more sites from the murine studies mapped onto ZIKV. In both
cases, the well-known fusion loop site is conserved. In total, there are 286 monoclonal
antibodies, 65 from human (for 62 epitopes) and 221 from mice (for 193 epitopes)
associated with those assays. A list of these monoclonal antibodies and their references
is provided in Table 6.

Alignment of B cell epitopes in Flaviviruses to ZIKV sequences – NS1 protein


We next performed our pipeline analysis on the non-structural protein 1 (NS1). NS1
makes up the second largest set of Flavivirus-related data (14%), representing 131
antibody epitopes derived predominantly from DENV and WNV. There are several
notable characteristics making NS1 of interest for analysis herein. Firstly, among
Flaviviruses, NS1 genes share a high degree of homology. Further, NS1 exists in
multiple forms in different cellular locations, including membrane-bound, as well as in the
form of a soluble lipoparticle, making NS1 the only non-structural protein that is secreted
during pathogenesis and therefore becomes a target of humoral responses [Muller and
Young 2013]. NS1 has also been considered a potential target for therapeutic inhibitor
design due to its role in viral replication; both forms of this protein have been shown to
be immunogenic. Finally, NS1 has been used successfully as a diagnostic tool in
detecting early infection [Muller and Young 2013].

Analysis of the overall sequence identity of NS1 for all aligned ZIKV isolates showed
99.2% sequence similarity (Table 2). Thus sequence identity for ZIKV NS1 is higher
than that observed even within individual DENV serotypes, where we found DENV1
(98.4%), DENV2 (98.2%), DENV3 (99%) and DENV4 (98.4%) [data not shown]. Next, a
comparison of the ZIKV NS1 protein sequence to NS1 from other Flaviviruses showed a
similar range of percent identities as was observed with the E protein (Table 7).
Interestingly, the highest NS1 sequence identity was observed for WNV (56.1%) and
JEV (56.7%), whereas our earlier comparison of Flavivirus E proteins showed that
higher identity was observed for DENV1 (57.8%) and 3 (58.2%) (Table 3).

Next, in order to evaluate the relationship between NS1 sequence identity and epitope
reactivity, the same
panel of 9 representative proteomes was aligned with the ZIKV reference proteome, and
as before, response frequency scores were mapped per residue and overlaid with the
sequence identity calculations. Figures 7a and 7b show the degree of sequence
identity with corresponding response score for each residue position mapped onto ZIKV
NS1 protein for humans and mice, respectively. Here, surface accessibility scores were
also included.

Inspection of these data revealed notable differences in response reactivity between
human subjects and mice. Overall, a greater number of NS1 epitopes have been
described in mouse models than for human hosts (112 versus 31 epitopes, respectively).
Thus, of the total 131 epitopes described to date for NS1, human determinants represent
only 24% compared to 84% for mice. Moreover, Figure 7 shows that response
frequencies are much lower in humans (0.13, 95%CI) as compared to those observed
for mice (0.34, 95%CI). For those regions identified to date as having high response
frequency scores in humans, including aa1-20, 35-55, 110-130, 135-150 and 250-260,
there is, however, notable overlap with mouse responses. Other high response regions
observed in mice, such as aa60-90 and 289-320, are not present in the human data.
These regions may be as yet unstudied in humans and/or non-immunogenic. Overall,
the majority of high response regions on NS1 corresponded to sites where sequence
identity is >50%.

Similar to that observed for the E protein, analysis of surface accessibility by SASA
score for this NS1 protein showed a complex picture with no statistically significant
correlations. Of the ~10 regions showing high sequence identity, 6 sites corresponded
with SASA ≥40% (surface exposed). However, there were also several regions of high
sequence identity (e.g. aa60-64, 121-122, 335) that corresponded to inaccessible,
mostly buried sites (SASA ≤10%). A comparison of NS1 and E was also performed in
order to compare and contrast surface accessibility between these two very different
viral antigens targeted by antibody responses. Here we found that the average SASA
score for the E protein was 34% compared to 29% for NS1. Further, while nearly 40% of
all E residues are accessible at the surface (SASA ≥40%), surface-exposed residues comprise
only 25% of NS1. By contrast, nearly 50% of NS1 residues are buried within the
structure compared to just 20% for the E protein.
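
The exposed and buried fractions quoted above can be derived from per-residue relative SASA values with a simple tally. The sketch below uses the cutoffs stated in the text (≥40% exposed, ≤10% buried); treating everything in between as intermediate, and the function and key names, are assumptions of this example.

```python
def sasa_summary(sasa_percent, exposed_cutoff=40.0, buried_cutoff=10.0):
    """Summarize per-residue relative solvent accessibility (in percent).

    Residues with SASA >= exposed_cutoff count as surface exposed and
    residues with SASA <= buried_cutoff as buried; the remainder are
    labelled intermediate (an assumption of this sketch).
    """
    n = len(sasa_percent)
    if n == 0:
        raise ValueError("no SASA values supplied")
    exposed = sum(1 for s in sasa_percent if s >= exposed_cutoff)
    buried = sum(1 for s in sasa_percent if s <= buried_cutoff)
    return {
        "mean": sum(sasa_percent) / n,
        "exposed_fraction": exposed / n,
        "buried_fraction": buried / n,
        "intermediate_fraction": (n - exposed - buried) / n,
    }
```

Running the same function on the E and NS1 profiles gives directly comparable summaries of the two antigens.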

Regions of high sequence identity and response frequency for NS1 are summarized in
Table 8, showing overlap in all three domains. Regions of low sequence identity (<35%)
were also observed (Table 9). Such regions have diagnostic potential, as responses
directed against these particular sequences may provide evidence of past ZIKV
infections, allowing for discrimination among other Flavivirus infections.

Correlation between conservation and epitope reactivity and structural features of the
ZIKV NS1 protein
Data generated for NS1 (Figure 7, sequence ID, RFscores and SASA) were mapped
onto the recently published PDB 5IY3 structure [Song 2016]. To access the interactive
renderings of these data please use this link: http://moles.liai.org/zikaNS1.html. As was
observed for the E protein, inspection of these renderings with respect to sequence
identity and surface accessibility showed a complex picture with no clear relationships.
However, the 3D rendering of response frequency for NS1 reveals greater distinction
among its different regions. Here, we show a side-by-side comparison of these data for
humans and mice (Figure 8), reflecting response patterns observed in Figure 7. For
example, within the “β-roll” region corresponding to residues 1-29 [Akey 2014], mouse
antibody responses are high (blue), whereas human responses, although they overlap in this
region, have more intermediate scores (green). It is important to note that in the PDB
structure for ZIKV NS1, the N-terminal residues 1-164 are absent.

DISCUSSION

Epitope targets of humoral and cellular immune responses in ZIKV are not yet well
defined due to the relatively recent emergence of ZIKV as a pandemic threat. There is,
however, a large body of immunological data available for other viruses within the
Flavivirus genus. These data, representing more than 8,000 unique epitopes, are
available from the IEDB and can therefore be used for a comparative analysis against
the ZIKV proteome. Using a computational approach, we sought to determine the level
of sequence similarity among all ZIKV isolates reported to date, as well as the sequence
similarity between ZIKV and other closely related Flavivirus species to determine the
level of potential correspondence between immunogenic and non-immunogenic regions.
To this end, we were able to identify numerous regions of shared sequence identity with
known human antibody reactivity against the E and NS1 proteins from other
Flaviviruses, and conversely, we also highlight several regions of low sequence identity
of potential use for diagnostic purposes to discriminate among these pathogens.

The recent publication of the first cryo-EM structure of Zika virus [Sirohi 2016] provided
an important insight into the structure of ZIKV, and highlighted its similarity to other
Flaviviruses in general and DENV in particular. At the antigen level, the sequence and
structural analysis of ZIKV E protein in comparison to Dengue virus E by Kostyuchenko
et al. highlighted the high degree of structural similarity between the two antigens. At the
same time, it also elucidated features of the ZIKV E protein unique to both neuroinvasive
and febrile-illness-causing Flaviviruses, as well as greater thermal stability [Kostyuchenko
2016]. Since studies describing immunological characterization of ZIKV are not yet
available, we sought to use the data and tools available within the IEDB to determine
whether the data for taxonomically and antigenically related Flaviviruses, such as
dengue virus, YFV and WNV, may provide insight into potential targets of immune
recognition of ZIKV.

We found 71 known epitope targets within the E protein completely conserved within
ZIKV, with another 106 conserved at 80-93% identity to ZIKV. This included both linear
and discontinuous determinants (monoclonal and polyclonal) defined in all four DENV
serotypes, as well as WNV. In all, there are 211 unique epitopes having ≥80% identity to
ZIKV, representing distinct, as well as overlapping (residues common to more than one
epitope) regions on ZIKV E.

To visualize these data, we then used the Immunome Browser to compare sequence
conservation and antibody reactivity along the length of the E protein. Mapping of
response frequency scores (RFscore) with the sequence identity calculations showed
several regions of high conservation and high RFscore. These overlaps highlight regions
representing potentially conserved targets of immune reactivity among Flaviviruses.
Regions unique to ZIKV (low sequence ID to other Flaviviruses) were also noted. These
sites may be of interest for investigating their utility in diagnostics, as they may, for
example, help differentiate between ZIKV and Dengue infection.

3D analysis of the E protein with respect to sequence conservation, surface accessibility
and response frequency revealed a complex picture. Certain well-known sites such as
the fusion loop met expectations of being highly conserved, surface exposed and highly
reactive. However, we found no direct correlation among any of the three measures.
Finally, we were able to show that many of the epitope residues involved in virus
neutralization and/or in vivo survival in other Flavivirus species, in humans as well as
murine models, are conserved in the ZIKV E protein. The epitopes of many well-known
virus-neutralizing monoclonal antibodies described for dengue and West Nile, such as E16,
3H5, 1A1D-2, 4E11, E53 [Pierson 2008], 5H2, E24 and E34 [Diamond 2008], are not
conserved in ZIKV. Nevertheless, our study showed that the epitopes from 65 human
and 221 mouse neutralizing monoclonal antibodies are conserved with ZIKV E. Conversely, we
identified 10 sites that are unique to ZIKV E (low sequence identity, but immunogenic in
other Flaviviruses). Such unique regions would have important diagnostic potential, as
responses directed against these particular sequences would provide evidence of past
ZIKV infections, making it possible to discriminate among other Flavivirus infections, and
would provide a potential tool to study the suspected link with birth defects.

In addition to the E protein, the non-structural 1 protein (NS1) is a major target of
antibody reactivity among Flaviviruses. Comparison of ZIKV NS1 sequences to known
Flavivirus reactivity is also of considerable interest due to its role in pathogenesis,
having both membrane-bound and soluble forms, its potential use as a target for viral
inhibition therapeutic agents and because of its demonstrated utility as a biomarker for
detection of early infection. Recent structural analyses of ZIKV NS1 compared to other
Flaviviruses suggest significant differences for the ZIKV protein, which may in turn
contribute to the unique pathogenesis observed for this virus [Song 2016]. A second
pipeline analysis was therefore performed on NS1 to determine sequence identity and to
map existing NS1 epitope data onto the ZIKV. We conclude that the overall sequence
conservation for NS1 is similar to that of E, and that while more NS1-specific data are
currently available for murine hosts compared to human subjects, there is a large degree
of overlap among highly reactive regions. Finally, an analysis of surface accessibility
between NS1 and E revealed an interesting juxtaposition, namely that 50% of NS1
residues are buried with only 25% exposed, whereas only 20% of residues for the E
protein are buried and 40% are exposed; nevertheless both antigens are major targets
of antibody reactivity.

The data presented herein are encouraging as they suggest that there is
correspondence of potentially protective sites among the relative newcomer ZIKV and
other more well-studied Flaviviruses. Further, this work demonstrates the potential utility
of using existing epitope data from antigenically homologous and/or taxonomically-
related organisms to analyze potential overlap for emerging threats that were heretofore
understudied or entirely new. Going forward, it will be important to repeat these analyses
to include other antigens, such as NS1, NS4 and NS5 for full T cell epitope analysis.

REFERENCES

1. Fauci AS, Morens DM: Zika Virus in the Americas - Yet Another Arbovirus
Threat. N Engl J Med 2016, 374:601-604.
2. Driggers RW, Ho CY, Korhonen EM, Kuivanen S, Jääskeläinen AJ, Smura T,
Rosenberg A, Hill DA, DeBiasi RL, Vezina G, Timofeev J, Rodriguez FJ, Levanov
L, Razak J, Iyengar P, Hennenfent A, Kennedy R, Lanciotti R, du Plessis A,
Vapalahti O. Zika Virus Infection with Prolonged Maternal Viremia and Fetal
Brain Abnormalities. N Engl J Med. 2016 Mar 30. [Epub ahead of print]

3. Heymann DL, Hodgson A, Sall AA, Freedman DO, Staples JE, Althabe F,
Baruah K, Mahmud G, Kandun N, Vasconcelos PF, et al: Zika virus and
microcephaly: why is this situation a PHEIC? Lancet 2016.
4. Rubin EJ, Greene MF, Baden LR: Zika Virus and Microcephaly. N Engl J Med
2016.
5. Katoh, Kazutaka, and Daron M. Standley. 2013. “MAFFT Multiple Sequence
Alignment Software Version 7: Improvements in Performance and Usability.”
Molecular Biology and Evolution 30 (4): 772–780.
6. Shandar Ahmad, M. Michael Gromiha, Hamed Fawareh and Akinori Sarai BMC
Bioinformatics (2004) 5:51.
7. Sukupolvi-Petty S, Austin SK, Engle M, Brien JD, Dowd KA, Williams KL,
Johnson S, Rico-Hesse R, Harris E, Pierson TC, Fremont DH, Diamond MS.
Structure and function analysis of therapeutic monoclonal antibodies against
dengue virus type 2. J Virol. 2010;84(18):9227-9239.
8. Holmes, E. C., and S. S. Twiddy. 2003. The origin, emergence and evolutionary
genetics of dengue virus. Infect. Genet. Evol. 3:19-28.
9. Rico-Hesse, R. 1990. Molecular evolution and distribution of dengue viruses type
1 and 2 in nature. Virology 174:479-493.
10. Dowd KA, Mukherjee S, Kuhn RJ, Pierson TC. Combined effects of the structural
heterogeneity and dynamics of flaviviruses on antibody recognition. J Virol.
2014;88(20):11726-11737. Pierson TC, Diamond MS. A game of numbers: the
stoichiometry of antibody-mediated neutralization of flavivirus infection. Prog Mol
Biol Transl Sci. 2015;129:141-166.
11. Dowd KA, Pierson TC. Antibody-mediated neutralization of flaviviruses: a
reductionist view. Virology. 2011 Mar 15;411(2):306-315.
12. Fibriansah G, Ibarra KD, Ng TS, Smith SA, Tan JL, Lim XN, Ooi JS,
Kostyuchenko VA, Wang J, de Silva AM, Harris E, Crowe JE Jr, Lok SM.
DENGUE VIRUS. Cryo-EM structure of an antibody that neutralizes dengue virus
type 2 by locking E protein dimers. Science. 2015 Jul 3;349(6243):88-91.
13. Fibriansah G, Tan JL, Smith SA, de Alwis R, Ng TS, Kostyuchenko VA, Jadi RS,
Kukkaro P, de Silva AM, Crowe JE, Lok SM. A highly potent human antibody
neutralizes dengue virus serotype 3 by binding across three surface proteins. Nat
Commun. 2015;6:6341.
14. de Alwis R, Smith SA, Olivarez NP, Messer WB, Huynh JP, Wahala WM, White
LJ, Diamond MS, Baric RS, Crowe JE Jr, de Silva AM. Identification of human
neutralizing antibodies that bind to complex epitopes on dengue virions. Proc
Natl Acad Sci U S A. 2012;109(19):7439-7444.
15. Heinz FX, Stiasny K. Flaviviruses and their antigenic structure. J Clin Virol.
2012 Dec;55(4):289-295.
16. Diamond MS, Pierson TC. Molecular Insight into Dengue Virus Pathogenesis and
Its Implications for Disease Control. Cell. 2015;162(3):488-492.
17. Pierson TC, Fremont DH, Kuhn RJ, Diamond MS. Structural insights into the
mechanisms of antibody-mediated neutralization of flavivirus infection:
implications for vaccine development. Cell Host Microbe. 2008;4(3):229-238.

18. Sirohi D, Chen Z, Sun L, Klose T, Pierson TC, Rossmann MG, Kuhn RJ. The 3.8
Å resolution cryo-EM structure of Zika virus. Science. 2016 Mar 31. pii: aaf5316.
[Epub ahead of print].
19. Kostyuchenko VA, Lim EX, Zhang S, Fibriansah G, Ng TS, Ooi JS, Shi J, Lok
SM. Structure of the thermally stable Zika virus. Nature. 2016; doi:
10.1038/nature17994. [Epub ahead of print]
20. Diamond MS, Pierson TC, Fremont DH. The structural immunology of antibody
protection against West Nile virus. Immunol Rev. 2008;225:212-225.
21. Weiskopf D, Angelo MA, de Azeredo EL, Sidney J, Greenbaum JA, Fernando AN,
Broadwater A, Kolla RV, de Silva AD, de Silva AM, Mattia KA, Doranz BJ, Grey
HM, Shresta S, Peters B, Sette A. Proc Natl Acad Sci U S A. 2013 May
28;110(22):E2046-53. doi: 10.1073/pnas.1305227110. Epub 2013 Apr 11.
22. Muller DA, Young PR. The Flavivirus NS1: Molecular and structural biology,
immunology, role in pathogenesis, and application as a diagnostic biomarker.
Antiviral Res 2013;98:192-208.
23. Song, H., Qi, J., Haywood, J., Shi, Y., Gao, G.F. Zika virus NS1 structure reveals
diversity of electrostatic surfaces among Flaviviruses. Nat.Struct.Mol.Biol 2016;
23(5):456-458.
24. Akey DL, Brown WC, Dutta S, Konwerski J, Jose J, Jurkiw TJ, DelProposto J,
Ogata CM, Skiniotis G, Kuhn RJ, Smith JL. Flavivirus NS1 structures reveal
surfaces for associations with membranes and the immune system. Science
2014;343:881-885.

ACKNOWLEDGMENTS

We would like to thank Dr. Shandar Ahmad for his assistance with the ASAview
analysis. This work was supported by the Immune Epitope Database and Analysis
Program, contract # HHSN272201200010C.

Table 1. ZIKV sequences retrieved from NCBI Protein database

Table 1. ZIKV sequences, continued.

Figure 1. The degree of sequence conservation for the consensus residue at each position
of the ZIKV polyprotein. The red line represents the sequence identity and the blue line
represents the percentage of residue deletions (if any). Deletions present at the N- or
C-termini are likely the result of sequencing error (not of biological significance).

Table 2. Summary on ZIKV protein sequence identity

Table 3. Pairwise sequence comparison of E protein from other Flaviviruses

        ZIKA    DENV1   DENV2   DENV3   DENV4   YFV     WNV     JEV
ZIKA    100.0   57.8    53.5    58.2    55.8    42.7    54.6    55.1
DENV1           100.0   68.9    77.9    63.8    44.0    51.5    50.8
DENV2                   100.0   68.6    63.8    45.9    48.1    48.0
DENV3                           100.0   63.5    42.5    47.2    49.2
DENV4                                   100.0   41.1    50.5    49.0
YFV                                             100.0   44.3    44.4
WNV                                                     100.0   78.4
JEV                                                             100.0
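
A pairwise comparison of this kind can be sketched as follows. This is an illustrative example only, not the pipeline used in this report: it assumes the sequences have already been aligned (for example with MAFFT), and it assumes that identity is computed over alignment columns in which neither sequence has a gap, a convention the report does not specify. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either sequence has a gap are skipped,
    so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict) -> dict:
    """Upper-triangular matrix of pairwise identities, rounded to one decimal."""
    names = list(aligned)
    return {
        (p, q): round(percent_identity(aligned[p], aligned[q]), 1)
        for i, p in enumerate(names)
        for q in names[i:]
    }


if __name__ == "__main__":
    # Toy aligned fragments, not real Flavivirus sequences.
    toy = {"seqA": "MKT-AIL", "seqB": "MRTGAIL"}
    for pair, value in identity_matrix(toy).items():
        print(pair, value)
```

Other denominators (alignment length, or length of the shorter sequence) are also in common use and would give slightly different values.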

Figure 2. Immunome Browser plot of antibody response frequency scores (RFscores)
for Dengue viruses: a) Human and b) Mouse.

Figure 3. The Flavivirus sequence identity and response frequency visualized onto the
ZIKV E protein: a) Human reactivity and b) Mouse reactivity. The sequence identity data
shown (red) represent a running average (window of 9 aa) while the response frequency
scores represent individual residues.
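
The running average mentioned in this caption can be sketched as below. This is a minimal illustration under stated assumptions: the window is taken to be centered on each residue and truncated at the sequence ends, neither of which is specified in the report. The function name is hypothetical.

```python
def running_average(values, window=9):
    """Centered running average over per-residue values.

    Assumptions: the window is centered on each position, and near the
    ends of the sequence it is truncated to the residues that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    # Toy per-residue identity values (percent), not data from this report.
    identity = [0, 0, 0, 0, 90, 0, 0, 0, 0]
    print(running_average(identity, window=9))
```

Smoothing the identity track while leaving the response frequency scores per residue keeps narrow reactive sites visible against a less noisy conservation background.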

Table 4. Conservation and response frequency of ZIKV E protein

Table 5. Unique sites on E with low sequence ID to other Flaviviruses

Low Conservation (% identity)    Location on E
68-70 (12.5%)                    DII
138-142 (12.5-25%)               DI
149-153 (12.5%)                  DI
157-168 (12.5-25%)               DI
199-201 (12.5%)                  DII
232-235 (12.5-25%)               DII
280-283 (12.5%)                  DII
343-347 (12.5-25%)               DIII
349-352 (12.5-25%)               DIII
374-376 (12.5%)                  DIII

Figure 4. Comparison of surface accessibility as measured by SASA scores (green) with
human antibody responses. All data points above 40% (horizontal green line) are
considered surface exposed.
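
The exposure cutoff described in this caption can be sketched as a simple threshold on per-residue relative accessibility. This is an illustrative example only: the 40% cutoff is taken from the caption, while the function names, the 1-based numbering, and the toy values are assumptions introduced here.

```python
def exposed_positions(sasa_percent, threshold=40.0):
    """Return 1-based positions whose SASA score is strictly above the cutoff."""
    return [i for i, s in enumerate(sasa_percent, start=1) if s > threshold]


def exposed_fraction(sasa_percent, threshold=40.0):
    """Fraction of residues classified as surface exposed."""
    if not sasa_percent:
        return 0.0
    return len(exposed_positions(sasa_percent, threshold)) / len(sasa_percent)


if __name__ == "__main__":
    # Toy per-residue SASA scores (percent), not values from the ZIKV E structure.
    scores = [10.0, 40.0, 41.0, 85.5]
    print(exposed_positions(scores))
    print(exposed_fraction(scores))
```

A residue at exactly 40% is not counted as exposed here, following the caption's wording "above 40%".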

Figure 5. 3D rendering of sequence identity, lower bound 95% confidence interval
(RFscore), and surface accessible scores (SASA) mapped onto the ZIKV E protein
structure PDB 5IRE.

Figure 6. 3D rendering of all reported neutralizing and protective (in vivo survival)
antibody epitopes defined for other Flaviviruses mapped onto ZIKV E protein. Matching
residues for humans (red) include: 73, 76, 78, 79, 100, 101, 103, 104, 106, 107, 108,
109, 111, 309, 316, 391, 487. Matching residues for mice (blue) include: 67, 98, 99, 101,
102, 104, 106, 107, 108, 118, 182, 222, 236, 266, 309, 316, 323, 332, 334, 335, 336,
338, 386, 392, 397, 398, 399, 400, 401. Not all residues are annotated due to space
constraints.

Table 6. Neutralizing mAbs

Table 6. Neutralizing mAbs, continued

Table 7. Pairwise sequence comparison of NS1 from other Flaviviruses

        ZIKA    DENV1   DENV2   DENV3   DENV4   YFV     WNV     JEV
ZIKA    100.0   53.5    55.0    55.2    52.7    47.4    56.1    56.7
DENV1           100.0   72.4    75.7    68.5    41.6    49.7    51.1
DENV2                   100.0   73.3    71.0    43.6    55.1    52.8
DENV3                           100.0   72.4    42.5    52.3    50.9
DENV4                                   100.0   41.6    51.4    51.7
YFV                                             100.0   43.6    46.2
WNV                                                     100.0   77.8
JEV                                                             100.0

[Figure 7 plots, panels a and b. Series: Lower bound 95% CI, Seq ID to ZIKV, SASA Score.
Axes: Percentage (0-100) and Lower bound 95% CI (0-1); x-axis: Position along ZIKV NS1
protein (1-352).]

Figure 7. The Flavivirus sequence identity and response frequency visualized onto the
ZIKV NS1 protein: a) Human reactivity and b) Mouse reactivity. The sequence identity
data shown (red) represent a running average (window of 9 aa) while the response
frequency scores represent individual residues.

[Figure 8 image, panels a and b. Structural regions labeled: ‘β-roll’ (1-29), ‘Wing’
(30-180), ‘β-ladder’ (181-352).]

Color coding:
Blue = high Seq ID, RFscore, SASA
Orange, Yellow, Green, Cyan = intermediate
Red = low Seq ID, RFscore, SASA

Figure 8. 3D rendering of lower bound 95% confidence interval (RFscore) mapped onto
the ZIKV NS1 protein structure PDB 5IY3. A) Human reactivity and B) Mouse reactivity.

Table 8. Conservation and response frequency of ZIKV NS1 protein

High Conservation (% identity)   High RFscore (Lower CI)   Location on NS1
4-6 (61-63%)                     4-6 (0.31-0.36)           β-roll
18-28 (61-65%)                   18-19 (0.35)              β-roll
39-46 (61-63%)                   39-46 (0.44)              Wing
56-68 (61-66%)                   NA                        -
81-91 (61-75%)                   NA                        -
119-123 (65-71%)                 119-123 (0.37-0.67)       Wing
137-138 (61%)                    137-138 (0.26)            Wing
149-173 (61-76%)                 149-152 (0.24)            Wing
197-215 (61-79%)                 NA                        -
223-263 (61-80%)                 252-261 (0.22)            β-ladder
266-274 (62-69%)                 NA                        -
283-288 (61-69%)                 NA                        -
296-309 (62-84%)                 NA                        -
312-325 (62-82%)                 NA                        -
331-343 (63-99%)                 333-343 (0.28-0.31)       β-ladder
348-352 (61-73%)                 348-350 (0.26)            β-ladder

*Residue numbering represents positions on the NS1 protein ORF and not on the
polyprotein. Sequence conservation and RFscore regions are presented in
corresponding rows so regions of interest align. NA, not applicable. Structural features
on NS1: β-roll = 1-29; Wing region = 30-180; β-ladder = 181-352.

Table 9. Unique sites on NS1 with low sequence ID to other Flaviviruses

Low Conservation (% identity)    Location on NS1
98-114 (26-35%)                  Wing
128-132 (19-31%)                 Wing
128-132 (19-31%) Wing

Supplemental Figure 1. Immunome Browser plot of antibody response frequency
scores (RFscores) for WNV (all hosts).

Revision history

Version number: 1.0
Date: 05/17/2016
Archive location: http://moles.liai.org/ZIKV_report_1.0.pdf

Version number: 1.1
Date: 05/21/2016
Description: figure/table renumbering and minor reformatting; updating the link to the
interactive figure 5; addition of reference; fixing minor typographic errors.
Archive location: http://moles.liai.org/ZIKV_report_1.1.pdf

Version number: 1.2
Date: 05/24/2016
Description: ZIKV NS1 data were added, as reflected in the text and additional tables
and figures.
Archive location: http://moles.liai.org/ZIKV_report_1.2.pdf
