Immunity as a function of the unicellular
state: implications of emerging genomic data

Donald R. Forsdyke*, Christopher A. Madill and Scott D. Smith

Trends in Immunology (2002) 23, 575-579

With copyright permission from Elsevier Science Ltd. This version has small differences from the published version, two extra figures, and an end-note.

An elementary immune system

Polymorphism creates unpredictability

Junk DNA is transcribed

Repetitive elements transcribed in infected cells

Double-stranded RNA as an alarm signal

Purine-loading to avoid self-recognition

Intracellular protein “immune receptors”


End Note

Instead of being greeted as supporting the growing corpus of immunological theory, recent advances in the bioinformatic analysis of genomes have often surprised the discoverers and failed to attract the attention of immunologists.

    The view that multicellular immune systems are adaptations of already highly evolved unicellular immune systems that are capable of self/not-self discrimination can assist our comprehension of phenomena such as “junk” DNA, genetic polymorphism and the ubiquity of repetitive elements.

    The “hidden transcriptome,” revealed by run-on transcription of genes or repetitive elements, contains a diverse repertoire of RNA “immune receptors” with the potential to form double-stranded RNA with viral RNA “antigens,” thus triggering intracellular alarms.

Unicellular organisms are likely to have evolved some 800 million years before multicellular organisms. Brücke dubbed single cells “elementary organisms” [1], implying that many multicellular-level functions might have prototypic equivalents at the unicellular level. We here explore implications of the postulate that immune systems of multicellular organisms arose as extensions of immune systems preexisting at the unicellular level [2,3].

Fig. 1. The cell as an elementary immune organism. The left circle (a) represents a multicellular organism with Y-shaped antibodies of various specificities. The right circle (b) represents a unicellular organism with a repertoire of “antibody-like” protein and “antibody-like” RNA molecules (stem-loop structures). These are referred to as “immune receptors,” implying that parts of these molecules can interact with intracellular antigens.

An elementary immune system

In a clonal unicellular population where asexual reproduction predominates, self-destruction (i.e. apoptosis) is the simplest mechanism to prevent spread of a pathogen and to promote survival of a “selfish gene”. However, even such a primitive defense needs to be coupled to specific and adaptable sensors. We propose that such a sensory system is provided by a multiplicity of structurally distinct macromolecules, of which we emphasize here proteins and RNAs (Fig. 1). Many of these will have distinct properties (e.g. catalytic, structural, transporting, templating, etc.). 

    On the other hand, there is a high probability that, in the crowded cytosol [4], one or more of these molecules will be able to bind an invading virus with sufficient affinity to tag it as “not-self,” thus initiating an innate immune response. Such an “immunological repertoire” could develop either over evolutionary or, as in the case of antibodies, over somatic time [5]. Whatever the mechanism and timing of the diversification process, there is a need to eliminate receptors with an affinity for “self” antigens.

     Unfortunately, given the high replication and mutation rates of viruses relative to those of their hosts, it would be highly probable that viruses would preadapt to avoid interaction with hostile host macromolecules. What a virus had “learned” (by mutation and selective proliferation) in one host, it would exploit on the next host. New information on host genome polymorphism suggests that this difficulty may now not be so formidable as it once appeared (see below). 

    In an elementary unicellular immune system, viruses that, through mutation, acquired the ability to inactivate host apoptotic mechanisms would preferentially survive. In the ensuing arms race, an intracellular “inflammatory” host response would have evolved to limit viral activities. However, in multicellular organisms apoptosis of the primarily infected cell might limit the opportunity to alert other target cells and cells of the immune system (e.g. for MHC-peptide presentation). Sophistications developed at the multicellular level are considered below.

Polymorphism creates unpredictability

Fig. 2.  Specific and general functions of a protein as reflected in its structure. Dedicated functions are associated with conserved, internal, hydrophobic, globular domains. Potential immune receptor functions are associated with variable, external, hydrophilic, non-globular domains.  

On average, the haploid maternal and paternal contributions to your diploid genome are likely to differ from each other at least once every 0.5-2.0 kilobases, and general intraspecies differences may arise at least once every 185 bases [6]. Such polymorphism should decrease the extent to which a pathogen from one host can anticipate the genomic characteristics of its next host. When the polymorphism affects proteins, it probably affects sequences of relatively low complexity that correspond to hydrophilic non-globular domains at the protein surface [7]. Thus, these domains, usually not critical for the specialized function of the protein, are available for interaction with complementary molecular patterns of intracellular pathogens (“not-self;” Fig. 2). These same domains should also have the potential to react with “self” proteins, sometimes to an extent sufficient to trigger adverse responses in the host (intracellular “autoimmune” pathology). Organisms with mutations avoiding this would have been favoured over evolutionary time [8-10].


Junk DNA is transcribed

Had we not known of the existence of an antibody repertoire, the discovery of sets of V-genes would have been greeted with surprise. However, our surprise at learning that 98% of our DNA is non-genic has been somewhat blunted by a facile explanation: “junk” [11,12].

    To be functional it is likely that non-genic DNA would have to be transcribed [13]. Recent investigations of the transcriptional activities of the β-globin region of human chromosome 11 [14], and of entire chromosomes 21 and 22, reveal a “hidden transcriptome,” corresponding to a large number of low copy number cytoplasmic RNAs. It is estimated that there is “an order of magnitude” more transcriptionally active DNA than can be accounted for by conventional genes [15]. Can this be dismissed as mere cytoplasmic “junk,” an unavoidable consequence of the existence of genomic “junk”?

    To understand its role, if any, in the economy of the organism, we need to know, by analogy with known transcriptional processes, whether there are specific promoters, whether there are dedicated RNA polymerases, whether transcription occurs randomly or under specific conditions, and whether transcripts are diverse and include appreciable non-repetitive DNA.


Repetitive elements transcribed in infected cells

Fig. 3. Location of Alu elements is likely to permit downstream transcription of variable genomic segments. Alu and other repetitive elements are shown in part of the 100 kilobase segment of human chromosome one containing the two-exon gene, G0S2.

    Horizontal arrows indicate transcription directions of G0S2 (grey boxes), of Alu elements (red boxes demarcated by vertical dashed lines) and of other repetitive elements (cyan boxes). The abbreviated names of repetitive elements are printed vertically. 

    Purine-loading (excess of purines/kb over pyrimidines/kb; grey balls) and CpG frequency (dinucleotides/kb; green continuous line) were evaluated for 400 base windows moving in steps of 25 bases. When purine frequency equals pyrimidine frequency, purine-loading is zero. 

    Values for CpG frequency (plotted on the same scale) are zero or positive. The CpG peak (“CpG island”) associated with G0S2 indicates a gene expressed in the germ line. (Note that, if the sequences of a virus and its host are known, then it should be possible to locate host segments complementary to virus segments and, from displays such as this, determine the feasibility of their transcription.)
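The window calculation described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used to prepare Fig. 3: the function name `window_profile` is hypothetical, and counts are simply rescaled from the window length to a per-kilobase basis.

```python
def window_profile(seq, window=400, step=25):
    """Sliding-window purine-loading and CpG frequency, both per kilobase.

    Purine-loading is (A + G) minus (C + T) in the window; it is zero when
    purines equal pyrimidines and negative for pyrimidine-loaded windows.
    Returns a list of (window_start, purine_loading, cpg_frequency) tuples.
    """
    seq = seq.upper()
    scale = 1000.0 / window  # convert per-window counts to per-kilobase values
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        purines = w.count("A") + w.count("G")
        pyrimidines = w.count("C") + w.count("T")
        loading = (purines - pyrimidines) * scale
        cpg = w.count("CG") * scale  # CpG dinucleotides read on the displayed strand
        profile.append((start, loading, cpg))
    return profile
```

Applied to the displayed (top) strand of a genomic segment, the second element of each tuple corresponds to the grey balls of Fig. 3 and the third to the green line.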

Much non-genic DNA consists of repetitive elements, the most prominent of which in humans are the 1,090,000 Alu elements [16]. Both conventional genes and repetitive elements can provide promoters for the transcription of non-genic DNA. Some gene transcripts have been found to be longer than expected due to a failure of transcriptional termination (“run-on” transcription; [17,18]). Some classes of repetitive element contain promoters from which transcription can initiate and extend beyond the bounds of an element into neighbouring genomic regions [19-22].

    Are such extended transcripts generated randomly in time? In the case of Alu elements, transcription (by RNA polymerase III) has been observed to increase at times of cell stress (e.g. viral infection, heat shock). Indeed, viral infection can trigger the heat shock response with the induction of heat shock proteins (for Refs. see [23]). Thus, it is possible that Alu transcription reflects an adaptive response to virus infection (for Refs. see [24]).

    Is the location of Alu elements likely to permit downstream transcription of variable genomic segments? Figure 3 shows a segment of human chromosome one containing the G0/G1 switch gene 2 (G0S2), which is upregulated in activated lymphocytes [25]. The gene demonstrates the general phenomenon of “purine-loading” (more purines than pyrimidines), which is characteristic of most RNAs of most organisms. Thus, when transcription is to the right of the promoter, exons are purine-loaded, and when transcription is to the left of the promoter, exons are pyrimidine-loaded (i.e. negatively purine-loaded). In both circumstances the RNAs end up being purine-loaded [26], for reasons discussed below.
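The relationship between transcription direction and the sign of purine-loading can be checked with a short sketch. The helper names `purine_excess` and `transcript` are hypothetical, and the sketch assumes only that a rightward transcript has the sequence of the displayed top strand, whereas a leftward transcript is its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def purine_excess(seq):
    """Number of purines (A, G) minus number of pyrimidines (C, T, U)."""
    seq = seq.upper()
    return sum(seq.count(b) for b in "AG") - sum(seq.count(b) for b in "CTU")

def transcript(top_strand, direction):
    """RNA produced from a segment given as its displayed top strand (5' to 3').

    'right': the RNA has the same sequence as the top strand.
    'left':  the RNA is the reverse complement of the top strand.
    """
    dna = top_strand.upper()
    if direction == "left":
        dna = dna.translate(COMPLEMENT)[::-1]
    elif direction != "right":
        raise ValueError("direction must be 'left' or 'right'")
    return dna.replace("T", "U")
```

A top-strand segment that is pyrimidine-loaded, such as the toy sequence `CCTCTTCCTC`, therefore yields a purine-loaded RNA when transcribed to the left, which is the sense in which the RNAs end up purine-loaded in both circumstances.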

     The transcription direction of G0S2 being to the right (indicated by the horizontal arrow in Fig. 3a), the gene and the corresponding mRNA are purine-loaded. This purine-loading extends for about a kilobase downstream of G0S2 into a region with no repetitive elements. Thus, if there were conditions such that transcription did not terminate, then the extended transcript would itself be purine-loaded and contain non-repetitive DNA. 

    Also shown in Figure 3 are various repetitive elements with assigned potential transcription directions. Although within a class of repetitive element there is some variability, by definition the repetitive elements themselves tend to diminish genome variability. However, the regions downstream of Alu elements are often devoid of other repetitive elements. For example, the pyrimidine-loaded leftward-transcribing Alu element downstream of G0S2 has a clear downstream region that retains the pyrimidine-loading of the original transcript. On the other hand, several kilobases upstream of G0S2 are two leftward-transcribing Alu elements, one of which transcribes into a region that is purine-loaded and contains repetitive elements of the L2 family. These results are illustrative of the general features of this genomic region. A parallel study of the region of a much smaller human chromosome containing the FOSB/G0S3 gene (chromosome 19; [27]) revealed a much tighter packing of repetitive elements (data not shown).

Fig. 4. Two RNA molecules (blue and red) meeting, "kissing", and forming dsRNA. (For space reasons this figure was omitted from the final paper.)


Double-stranded RNA as an alarm signal

Although protein molecules can recognize specific nucleic acids (and the converse), it is convenient here to consider proteins recognizing proteins and RNAs recognizing RNAs. In the cytosol RNA molecules adopt characteristic stem-loop configurations (Fig. 1b), and RNA-RNA interactions can initiate by way of a “kissing” homology-search between bases at the tips of loops. If sequence complementarity is found (e.g. G pairing with C, and A pairing with U), then two RNA species can pair, partially or completely, to generate a length of double-stranded RNA (dsRNA) that in some circumstances can play a regulatory role [Fig. 4; see ref. 28].

     If a virus introduced its own RNA into a cell, would there be sufficient variability among host RNA species for a host “immune receptor” RNA to form a segment of dsRNA with the “not-self” RNA of the virus? Calculations made elsewhere [29] show this to be feasible, especially if the entire genome were available for transcription. Would the dsRNA be able to initiate an adaptive intracellular “inflammatory” response? How would the host cell prevent generation of “self” dsRNAs?
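Whether a given host RNA could form a duplex with a given viral RNA can be explored with a simple search for the longest perfectly paired antiparallel stretch. The sketch below is illustrative only and is not the calculation of ref. 29; the function names are hypothetical, only Watson-Crick pairs (A-U, G-C) are counted, G-U wobble pairs and secondary structure are ignored, and the quadratic-time search is suited to short sequences.

```python
def reverse_complement_rna(rna):
    """Reverse complement of an RNA sequence (raises KeyError on non-ACGU bases)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(rna.upper()))

def longest_duplex(host_rna, viral_rna):
    """Longest uninterrupted Watson-Crick duplex the two RNAs could form.

    Two RNAs pair in antiparallel fashion, so the problem reduces to finding
    the longest substring shared by the viral RNA and the reverse complement
    of the host RNA. Returns (length, start_in_viral_rna), with a 0-based start.
    """
    target = reverse_complement_rna(host_rna)
    viral = viral_rna.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(viral) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if viral[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending at (i-1, j-1)
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, best_end - best_len
```

The returned length can then be compared with whatever duplex length is considered sufficient to trigger an alarm; two purine-only sequences give a length of zero.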

     Formation of dsRNA has long been recognized as an early cellular response to viral entry. Protein synthesis can be inhibited non-specifically by very low concentrations of dsRNA [30].  This involves activation of dsRNA-dependent protein kinase (PKR), which inhibits a protein involved in the initiation of protein synthesis. Evasive viral strategies would include the acceptance of mutations to avoid formation of dsRNA (see below), and inhibition of cell components required for the formation of, or the response to, dsRNA [31,32].

    Virus-infected cells produce interferons, which can be considered part of the inflammatory response. The interferons, which spread together with various chemokines from the cell of origin to other cells, induce a general anti-viral state [33]. Their production is powerfully stimulated by dsRNA [34]. There is now growing evidence that, both in animals and plants, another more sequence-specific “inflammatory” response to dsRNA arises as part of an intracellular mechanism for self/not-self discrimination [35]. Just as in the antibody response there is amplification of the production of specific antibody, so, courtesy of enzymes such as RNA-dependent-RNA polymerase and “dicer,” there is amplification of the production of specific “immune receptor” RNA (for Refs. see accompanying paper of Martinez et al. [36]).

Fig. 5. Run-on transcription reveals the "hidden transcriptome." (For space reasons this figure was omitted from the original paper.)


Purine-loading to avoid self-recognition

Although it is currently believed that host cells detect dsRNA of virus origin [35], given the functioning of dsRNA as an alarm signal (Fig. 5), viruses should have evolved to avoid the formation of dsRNA replicative intermediates. Indeed, viruses with dsRNA genomes have adaptations that would appear to conceal their genomes from host cell surveillance mechanisms [37]. More than twenty base pairs are needed to activate PKR in vitro [38], or to silence specific genes [39].

    Among the RNA species of a cell there might be two whose members, by chance, happened to have enough base complementarity for formation of a mutual duplex of a length sufficient to trigger alarms. Thus, there would need to have been an evolutionary selection pressure favouring mutations in host RNAs that decrease the possibility of their interaction with other "self" RNAs in the same cell. In many cases mutations to a purine would assist this, since purines do not pair with purines. Indeed, interaction with “self” RNAs seems to have been avoided by "purine-loading" the loop regions of these RNAs, thus avoiding the initial loop-loop "kissing" reactions which precede more complete formation of dsRNA. The above-mentioned excess of purines, observed both at RNA and at DNA levels (in mRNA-synonymous DNA strands), is found in a wide variety of organisms and their viruses [26,40].          
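The effect of purine-loading on chance pairing can be illustrated with a deliberately crude model. The assumptions are ours and not from the references: bases are drawn independently, both RNAs have the same purine fraction p, purines are split equally between A and G and pyrimidines equally between C and U, and only Watson-Crick pairs count. Under these assumptions two aligned bases pair with probability p(1 - p), which is maximal at p = 0.5 and zero when only purines are present.

```python
def pair_probability(purine_fraction):
    """Chance that two random aligned bases form a Watson-Crick pair.

    Each of the four pairings (A-U, U-A, G-C, C-G) has probability
    (p/2) * ((1 - p)/2), so the total is p * (1 - p).
    """
    p = purine_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("purine_fraction must lie between 0 and 1")
    return p * (1.0 - p)

def duplex_probability(purine_fraction, length=20):
    """Chance that `length` consecutive aligned positions all pair."""
    return pair_probability(purine_fraction) ** length
```

For example, `duplex_probability(0.6) / duplex_probability(0.5)` gives the factor by which a modest purine excess lowers the chance of a 20-base duplex at a given alignment in this model. Real RNAs concentrate purines in loops, so the sketch conveys only the direction of the effect.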

     Exploratory "kissing" interactions between hybridizing nucleic acids involve transient base stacking interactions [28] with the exclusion of structured water. Such reactions have a strong entropy-driven component, and so might increase as temperature increases (i.e. fever [4]). Accordingly, purine-loading should be high in thermophiles, as is indeed found [41; R. Lambros, J. Mortimer and D. Forsdyke, unpublished work].

    Furthermore, proteins with a tendency to become involved in autoimmune reactions have acquired runs of charged amino acids with no known function at the protein level [42,43]. Charged amino acids correspond to codons rich in purines, which should countermand formation of dsRNA. Thus, the presence of runs of charged amino acids may be a consequence of the need to purine-load RNA, and not vice-versa.
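That charged amino acids correspond to purine-rich codons can be verified directly from the standard genetic code. The following sketch builds the codon table and compares the purine content of codons for Asp, Glu, Lys and Arg with that of all sense codons. It is an unweighted count that ignores codon usage, the choice to omit histidine is ours, and the variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def purine_fraction(codons):
    """Fraction of all bases in the given codons that are purines (A or G)."""
    codons = list(codons)
    purines = sum(c.count("A") + c.count("G") for c in codons)
    return purines / (3 * len(codons))

# Codons for the charged amino acids Asp (D), Glu (E), Lys (K) and Arg (R).
charged_codons = [c for c, aa in CODON_TABLE.items() if aa in "DEKR"]
# All codons except the three stop codons.
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

charged_purines = purine_fraction(charged_codons)
overall_purines = purine_fraction(sense_codons)
```

Comparing `charged_purines` with `overall_purines` shows the excess of purines in codons for charged residues relative to the code as a whole.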

    A general increase in transcription in cells exposed to “stress” (simulating virus invasion [44]) would dictate a period of preincubation without stress before testing for specific transcription. This has indeed been found as a requirement for studies with freshly explanted human lymphocytes [27].


Intracellular protein “immune receptors”

Unlike bases in nucleic acids, amino acids in proteins do not pair on a one-to-one basis. Nevertheless, similar considerations might apply in the case of protein molecules (Figs. 1b, 2). These would form heteroaggregates (aggregates of self-proteins with pathogen proteins), and “not-self” homoaggregates (aggregates of individual pathogen protein species) by mechanisms discussed elsewhere [4,8-10,45,46]. Recent observations of diseases associated with protein aggregation suggest an interconnection between protein “self” and RNA “self” homoaggregates, which may both be required for disease progression [38,46,47].


While the existence of an intracellular immune system remains unproven, a growing number of disparate observations appear comprehensible from this perspective. Non-genic “junk” DNA can be viewed in much the same way as we view the diverse genes encoding the variable regions of immunoglobulin antibodies. Just as B-cells capable of synthesizing a unique anti-self antibody would be eliminated during somatic time to prevent self-reactivity, so junk DNA would have been screened over evolutionary time (by positive selection of individuals in which favourable mutations had been collected together by recombination) to decrease the probability of two complementary “self” transcripts interacting to form dsRNA segments of more than 20 bases. High polymorphism of non-genic DNA would make it difficult for viruses to anticipate the “immune receptor” RNA repertoire of future hosts. Since viruses can be enriched for either purines or pyrimidines [29], the repertoire should include both purine-rich and pyrimidine-rich segments (Fig. 3). The initiating event is one of self/not-self discrimination, be it between two RNA species or between two protein species, and be it extracellular or intracellular.

Acknowledgements We thank Jim Gerlach for assistance with computer configuration, and Jerzy Jurka and coworkers for access to Repbase. The Canadian Bioinformatics Resource (Halifax) provided access to the GCG program suite. Andrew Reynolds kindly provided the Brücke text. Queen’s University hosts DRF’s web pages where full texts of several of the references may be found.

1 Brücke, E. (1861) Die Elementarorganismen. Sitzungsber. Math-Nat. Cl. K. Akad. Wiss. 44, 381-406

2 Forsdyke, D.R. (1991) Early evolution of MHC polymorphism. J. Theor. Biol. 150, 451-456

3 Forsdyke, D.R. (1992) Two signal model of self/not-self discrimination: an update. J. Theor. Biol. 154, 109-118

4 Forsdyke, D.R. (1995) Entropy-driven protein self-aggregation as the basis for self/not-self discrimination in the crowded cytosol.  J. Biol. Sys. 3, 273-287

5 Lewis, S.M. (1994) The mechanism of V(D)J joining. Adv. Immunol. 56, 27-150

6 Stephens, J.C. et al. (2001) Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489-493

7 Bustamante, C.D. et al. (2000) Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol. Biol. Evol. 17, 301-308

8 Forsdyke, D.R. (2001a) Adaptive value of polymorphism in intracellular self/not-self discrimination. J. Theor. Biol. 210, 425-434

9 Forsdyke, D.R. (2001b) The Origin of Species, Revisited. McGill-Queen’s University Press

10 Forsdyke, D.R. (2001c) Functional constraint and molecular evolution. In Encyclopedia of Life Sciences, vol. 7, pp. 396-403, Nature Publishing Group, London

11 Ohno, S. (1972) So much junk DNA in our genome. Brookhaven Symp. Biol. 23, 366-370

12 Pennisi, E. (2002) Charting a genome’s hills and valleys. Science 296, 1601-1603

13 Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep. 2, 986-991

14 Plant, K.E. et al. (2001) Intergenic transcription in the human β-globin gene cluster. Mol. Cell. Biol. 21, 6507-6514

15 Kapranov, P. et al. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916-919

16 Jurka, J. et al. (1996) CENSOR: a program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 20, 119-122

17 Heximer, S.P. et al. (1998) Expression and processing of G0/G1 Switch Gene 24 (G0S24/TIS11/TTP/NUP475) RNA in cultured human blood mononuclear cells. DNA Cell Biol. 17, 249-263

18 Iseli, C. et al. (2002) Long range heterogeneity at the 3’ ends of human mRNAs. Genome Res. 12, 1068-1074

19 Manley, J.L. and Colozzo, M.T. (1982) Synthesis in vitro of an exceptionally long RNA transcript promoted by an AluI sequence. Nature 300, 376-379

20 Feuchter, A.E. et al. (1992) Strategy for detecting cellular transcripts promoted by human endogenous long terminal repeats. Genomics 13, 1237-1246

21 Ferrigno, O. et al. (2001) Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nature Genet. 28, 77-81

22 Nigumann, P. et al. (2002) Many human genes are transcribed from the antisense promoter of L1 retroposon. Genomics 79, 628-634

23 Forsdyke, D.R. (1985) Heat shock proteins defend against intracellular pathogens. J. Theor. Biol. 115, 471-473

24 Kim, C. et al. (2001) Genome-wide chromatin remodelling modulates the Alu heat shock response. Gene 276, 127-133

25 Russell, L. and Forsdyke, D.R. (1991) A human putative lymphocyte G0/G1 switch gene containing a CpG-rich island encodes a small basic protein with the potential to be phosphorylated. DNA Cell Biol. 10, 581-591

26 Forsdyke, D.R. and Mortimer, J.R. (2000) Chargaff’s legacy. Gene 261, 127-137

27 Heximer, S.P. et al. (1996) Sequence analysis and expression in cultured lymphocytes of the human FOSB gene (G0S3). DNA Cell Biol. 12, 1025-1038

28 Eguchi, Y. et al. (1991) Antisense RNA.  Annu. Rev. Biochem. 60, 631-652

29 Cristillo, A.D. et al. (2001) Double-stranded RNA as a not-self alarm signal. J. Theor. Biol. 208, 475-491

30 Ehrenfeld, E. and Hunt, T. (1971) Double-stranded poliovirus RNA inhibits initiation of protein synthesis by reticulocyte lysates. Proc. Natl. Acad. Sci. USA 68, 1075-1078

31 Elia, A. et al. (1996) Regulation of the double-stranded RNA-dependent protein kinase PKR by RNAs encoded by a repeated sequence of the Epstein-Barr virus genome. Nucleic Acids Res. 24, 4471-4478

32 Mittelsten Scheid, O. (1999) New tool for Swiss army knife. Nature 397, 25

33 Suzuki, K. et al. (1999) Activation of target-tissue immune-recognition molecules by double-strand polynucleotides. Proc. Natl. Acad. Sci. USA 96, 2285-2290

34 Marcus, P. (1983) Interferon induction by viruses: one molecule of dsRNA as the threshold for induction. Interferon 5, 115-180

35 Plasterk, R.H.A. (2002) RNA silencing: the genome’s immune system. Science 296, 1263-1265

36 Martinez, M.A. et al. (2002) RNA interference of HIV replication. Trends Immunol.

37 Bamford, D.H. (2002) Those magnificent molecular machines: logistics in dsRNA virus transcription. EMBO Rep. 3, 317-318

38 Tian, B. et al. (2000) Expanded CUG repeat RNAs form hairpins that activate the double-stranded RNA-dependent protein kinase PKR. RNA 6, 79-87

39 Elbashir, S.M. et al. (2001) RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Devel. 15, 188-200

40 Saul, A. and Battistutta, D. (1988) Codon usage in Plasmodium falciparum. Mol. Biochem. Parasitol. 27, 35-42

41 Lao, P.J. and Forsdyke, D.R. (2000) Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res. 10, 228-236

42 Brendel, V. et al. (1991) Very long charge runs in systemic lupus erythematosus-associated autoantigens. Proc. Natl. Acad. Sci. USA 88, 1536-1540

43 Dohlman, J.G. et al. (1993) Long charge-rich alpha-helices in systemic autoantigens. Biochem. Biophys. Res. Comm. 195, 686-696

44 Suzuki, T. et al. (2000) Control selection for RNA quantitation. Biotechniques 29, 332-337

45 Forsdyke, D.R. (1999) Heat shock proteins as mediators of "danger" signals: implications of the slow evolutionary fine-tuning of sequences for the antigenicity of cancer cells. Cell Stress Chaperones 4, 205-210 

46 Forsdyke, D.R. (2000) Double-stranded RNA and/or heat-shock as initiators of chaperone mode switches in diseases associated with protein aggregation. Cell Stress Chaperones 5, 375-376

47 Peel, A.L. et al. (2001) Double-stranded RNA-dependent protein kinase, PKR, binds preferentially to Huntington’s disease (HD) transcripts and is activated in HD tissue. Hum. Mol. Genet. 10, 1531-1538

End Note (27 Dec 2003)

The above view of the role of "junk DNA" predicted that large sets of low abundance "non-coding transcripts" would be a feature of many eukaryotic genomes and that, in view of the postulated role in intracellular aspects of immunological defenses, they would not be evolutionarily conserved. This was greatly supported by the discovery of multiple "non-coding" transcripts in cDNA libraries prepared from humans and mice. In a paper entitled "Complete Sequencing and Characterization of 21243 full-length human cDNAs", Toshio Ota and coworkers noted:

"It is interesting to note this type of "non-coding" transcripts was also found in mouse cDNA collections. ... What was significant was that [the] majority of the examined cDNAs were not evolutionally conserved. In this dataset of mouse genes, identification of 11665  similar transcripts (which would be categorized as "unclassified" according to our scheme) has also been reported. This suggests that there are little conservation for these "unclassified" transcripts and/or that there are huge numbers of such transcripts (at least in the order of 100000). Interestingly, ... we have recently examined the promoter activities of randomly isolated genomic DNA fragments on a large scale and observed that there are cryptic promoter activities throughout the genomic DNA (unpublished data). It may be possible that those cryptic promoters may act at low frequency to produce aberrant (or sporadic) transcripts."

Ota et al. (2004) Nature Genetics 36, 40-45.

Last edited on 19 Jan 2005 by Donald Forsdyke