Tuesday, March 13, 2018

Making Sense of Genes by Kostas Kampourakis

Kostas Kampourakis is a specialist in science education at the University of Geneva, Geneva (Switzerland). Most of his book is an argument against genetic determinism in the style of Richard Lewontin. You should read this book if you are interested in that argument. The best way to describe the main thesis is to quote from the last chapter.

Here is the take-home message of this book: Genes were initially conceived as immaterial factors with heuristic values for research, but along the way they acquired a parallel identity as DNA segments. The two identities never converged completely, and therefore the best we can do so far is to think of genes as DNA segments that encode functional products. There are neither 'genes for' characters nor 'genes for' diseases. Genes do nothing on their own, but are important resources for our self-regulated organism. If we insist in asking what genes do, we can accept that they are implicated in the development of characters and disease, and that they account for variation in characters in particular populations. Beyond that, we should remember that genes are part of an interactive genome that we have just begun to understand, the study of which has various limitations. Genes are not our essences, they do not determine who we are, and they are not the explanation of who we are and what we do. Therefore we are not the prisoners of any genetic fate. This is what the present book has aimed to explain.
If you are interested in real facts about genes and the history of gene definitions, then you will be sorely disappointed because the author has fallen for the ENCODE hype. Similarly, if you want to know about genomes and junk DNA then don't read this book. The author takes his cues from Junk DNA by Nessa Carey and The Deeper Genome by John Parrington.

Genomes and junk are the topics that interest me so let's look at some other excerpts from the book, keeping in mind that the main part of the book is about genetic determinism and the large-scale phenotypic effects of genes and alleles.

The concept of a "gene" was poorly defined in the first part of the twentieth century. That fuzzy definition is still common today. It imagines a gene as a nebulous entity responsible for some visible trait. It's the way most people still think of a gene and it's the way students are often taught when they study genetics. Kostas Kampourakis does a pretty good job of describing the history of this idea up until 1953.

The next stage is something he calls the "molecularization" of genes. That's the transformation from a gene as the subject of genetics to the idea that a gene is the subject of biochemistry and molecular biology. This is an important shift and the author is justified in emphasizing the transformation.

From this point on, the book gets pretty confusing. The part I like is that the author doesn't get bogged down in the old-fashioned idea that genes only encode proteins. From fairly early on in the book he recognizes that a gene can specify either a protein or a functional RNA.[1] So far, so good.

The problems begin when he starts describing all the things that make a precise definition of a gene so difficult. Rather than treat these as exceptions that can be accommodated by a good working definition [What Is a Gene?], he focuses on the problems ...
Regulatory sequences, discontinuous genes, overlapping genes, trans-splicing, RNA editing, among other things, have made impossible the structural individualization of genes on DNA. Looking more closely into the phenomena presented in this chapter might make one argue that the RNA transcript should be considered as the "true" gene. ... The important conclusion from all these phenomena is that DNA does not contain distinct segments corresponding to the genes it is supposed to contain, or, in other words, that genes cannot be structurally individuated. These phenomena can therefore put the existence of genes into doubt. Do genes really exist? Perhaps they are a heuristic tool for research but nevertheless a human invention that we are still trying to force into existence.
Kampourakis has created a problem for himself by failing to point out that there are functional DNA sequences that don't count as genes using the molecular definition (regulatory sequences, centromeres, origins of replication) but do count as "genes" in the classic genetic sense since mutations in these sequences can produce an effect on the organism. His description would be much clearer if he had made this distinction.

In addition, he got confused by reading the ENCODE papers and falling for their paradigm shaft about the nature of genes [What is a gene, post-ENCODE?].

Now let's look at how the author deals with junk DNA. It's the subject of Chapter 11: "Genomes Are More than the Sum of Genes." That's an interesting title. It's correct, of course, especially if you take into account essential DNA sequences that aren't genes. However, it's a bit late in the book to be bringing up this topic. Here's what he says on page 210.
Is 98 percent of our DNA meaningless, as in the [example] above? Is it really "junk," perhaps the relic of our evolutionary history during which DNA sequences were simply accumulated? The answer is no, and in this chapter I explain why. The relevant knowledge has been emerging during the recent years, and we have come to know that much of what we used to call "junk" DNA seems to have important functions, particularly in the regulation of the expression of genes.
As I pointed out above, Kampourakis should have addressed this point early on when he was discussing how to define a gene. He left readers with the impression that the only important genome sequences were genes. He brings up the old canard that protein-coding regions are the only ones that count and all the rest was thought to be junk. Now he proposes to refute this strawman by explaining what he should have made clear 100 pages earlier.[2]

In fairness, he notes that the strawman view has been challenged in the past.
However, it should be noted that although the details have emerged recently, several researchers had been long aware that "junk" DNA was not entirely useless and that some DNA that does not code for proteins has important roles (Palazzo & Gregory, 2014).
I find it interesting that he quotes a four-year-old paper from my colleagues where they explain the real history of the problem. The details have not emerged recently as Kampourakis claims. We've known about important non-coding DNA for 50 years!

So, what is this recent data that calls into question the existence of junk DNA? You can probably guess the answer. Kampourakis recognizes that the genes for transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) had been identified long ago. But then he says,
By that time [late 1960s], it had already become clear that nontranslated or noncoding RNA molecules, such as rRNA and tRNA, have an important role in gene expression. But as the ENCODE project showed, there are other functional sequences outside protein-coding genes, which encode certain noncoding RNA molecules. This led to the expanded definition of genes presented in Chapter 4, which includes the genes for noncoding RNA as well. Except for tRNA and rRNA, these genes encode other types of RNA molecules, such as small nuclear RNAs (snoRNAs) that are involved in RNA editing and micro RNAs (miRNAs) that have important regulatory functions. Although the details are still under study, the emerging evidence suggest there are a lot more genes encoding regulatory RNAs than proteins in the human genome (Morris & Mattick, 2014).
There are several things wrong with those sentences. For one thing, they totally misrepresent the history of the field. Noncoding RNAs such as snRNAs, miRNAs, and others were well known for many decades before ENCODE was started. Also, the definition of a gene as a DNA sequence that specifies a functional RNA was common in textbooks long before ENCODE. The ENCODE results did not prompt a serious revision of the definition of a gene in spite of the claims of ENCODE researchers. Finally, it is not true that there are more genes for regulatory RNAs than for proteins. (There are about 20,000 protein-coding genes.) The final results are not in but it's very unlikely that there are 20,000 different genes for noncoding RNAs. And even if that claim turns out to be true, those genes wouldn't represent a significant fraction of the genome.

It's clear that Kampourakis is solidly in the ENCODE camp and it's clear that he does not understand the Palazzo & Gregory paper and does not understand the evidence for junk DNA [Five Things You Should Know if You Want to Participate in the Junk DNA Debate].

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

Sandwalk readers are probably annoyed at me for beating a dead horse but here's the problem. It's been more than ten years since the initial ENCODE results were published and more than five years since the main results were published in 2012 (along with the massive publicity campaign). Criticisms of the ENCODE hype have been widely available in the scientific literature and elsewhere since 2007. Many experts in evolutionary biology have explained the evidence for junk DNA and pointed out the limitations of the ENCODE conclusions.

All of this information is available to anyone who studies the problem. All knowledgeable scientists recognize that the case for junk DNA is very strong. Kampourakis addresses some of this criticism, notably the lack of conservation of presumed functional RNAs, but he ignores most of the other criticisms. Why? Why do so many authors perpetuate the ENCODE hype in the face of so much evidence that it's wrong? Is it because the publicity campaign organized by ENCODE researchers (with the help of Nature and Science) was so effective that it continues to overwhelm any attempt to correct the record? That's not a very good excuse for someone who is supposed to do the research before publishing a book on the subject of genes and genomes.

1. He's not very consistent. There are times in the second half of the book when he talks about genes as sequences that encode proteins.

2. Keep in mind that we include introns when we define a gene as a sequence that's transcribed. Thus, intron-containing protein-coding sequences make up 25% of our genome and known noncoding genes account for another 5%. Genes occupy 30% of our genome—a fact that should be mentioned in a book about genes.

Palazzo, A.F. and Gregory, T.R. (2014) The Case for Junk DNA. PLOS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]


  1. I don't have a intimate intellectual connection with genetics.
    Yet it's surprising he said, as a summary, genes do nothing on their own. No GENES for characters or diseases.
    I don't know but that seems a rejection of popular impressions of genes.
    I do suspect genes don't HAVE DISEASE in them.
    I suspect it's all breakdowns in the genetic system, so nobody has a gene for cancer from their family etc.
    Just possibly a weakness in gene systems, on a probability curve, leads to sooner cancer relative to the mean.

    It's also interesting that this denying genes are responsible for this or that is a rejection of evo psych in a previous thread.

  2. Obviously, nobody checked the orientation of the DNA molecules on the cover.

  3. Robert must be tolerated because on occasion he comes out with a real gem:

    I don't have a intimate intellectual connection with genetics.

  4. I think he's oversimplifying to say that genes don't code 'for' things. For most genes that's true. Genes that are embedded deep within networks cannot be assigned a simple role, but the genes at the terminal ends of the networks very often can. The gene that codes for the enzyme that makes the pigment that makes corn cobs purple, and when mutated gives yellow cobs, really does code for 'purpleness'.

  5. John Harshman. Well done and you got me. That's funny. I didn't notice that. I just meant I know little or less about genetics (although some lines of thought bump into creationist thought and I confidently partake then).
    However I'm someone who insists genetics has nothing to do with the intellect of humans.
    We are immaterial souls and that's the thinker. The rest is just a giant memory operation which is physical material.
    Thats my only excuse for not noticing what i said.

  6. Larry and I have had this conversation already – and I apologize for rehashing.

    Disagreeing with Larry fills me with trepidation – but loyalty is in my German DNA and I still defer to my profs of the old school of classical genetics.

    Genes can have dominant alleles, and there are two mechanisms for dominance:

    1 – dominance in trans
    2 – dominance in cis

    Larry would agree with #1 but dispute #2 - claiming regulatory regions of DNA do not constitute a “gene”.

    This is inherently counter-intuitive to me - what is a “gene” if not some factor or unit of inheritance responsible for a trait? That is how “genes” were discovered by Muller: if you can zap it and mutate it – then it’s a gene.

    Of course – with the advent of the New Synthesis, matters became more complicated as demonstrated by J. B. S. Haldane, Ernst Mayr and the Beanbag genetics dispute.


    So where does this leave us today?

    But from what I can garner – the notion of “gene” is in flux and may return to something originally along the lines of what Johannsen, the originator of the term, first proposed. Here’s the problem: you can’t always reliably predict the phenotype from the genotype:


    Johannsen was remarkably prescient.

    To cite Ford Doolittle:

    Minimally, gene means more than it used to mean…
    … regulatory loci are also informational even if not transcribed…

    If I may summarize in broad strokes: if heritable information is by definition “genetic” – then regulatory sequences such as the “operator gene” first described by Jacob and Monod are no differently “genetic” – part and parcel of whatever we finally end up defining as a “gene”.

  7. The logic is this:
    1. ENCODE was a big expensive project
    2. Big expensive projects are inherently better and more important science than smaller projects
    3. ENCODE is better, more important, and therefore correct because it was big and expensive

    Not the best logic, but logic nevertheless.

  8. The dead horses quote comes from:
    Zuckerkandl E & Pauling L (1965) Evolutionary divergence and convergence in proteins, pp. 97-166 in Evolving Genes and Proteins, p.101
    Another Zuckerkandl E & Pauling L (1965) paper in J. Theoret. Biol. makes the most prescient comment on the role of DNA in evolutionary history: "there probably is more evolutionary history inscribed in the base sequence of nucleic acids than in the amino-acid sequence of corresponding polypeptide chains. By its implications, a degenerate code thus emphasizes the role of nucleic acids as “master molecules” over polypeptides,"

    This can be seen in the determination of the genomes of Neanderthals and Denisovans, together with the evaluation of their evolutionary significance in comparison to the modern human genome.

  9. Another gem:

    Thats my only excuse for not noticing what i said.

  10. I will try this one last time.

    What is a “gene”? That notion has been consistently revised:

    One gene = one enzyme
    One gene = one polypeptide
    One gene = one cistron

    The notion of gene was successively atomized in the hands of Molecular Biologists. The problems caused by “atomization” are common to all problems of reductionism (as a philosophical notion).

    Ford Doolittle touched on this concept by citing Djebali et al.

    FD: Increasingly, genomics is expanding the boundaries of information as geneticists have typically understood it. Minimally, gene means more than it used to mean. Djebali et al. (Nature 489(7414):101–108) write

    …the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene. As this is a consistent characteristic of annotated genomes, we would propose that the transcript be considered as the basic atomic unit of inheritance. Concomitantly, the term gene would then denote a higher-order concept intended to capture all those transcripts (eventually divorced from their genomic locations) that contribute to a given phenotypic trait.

    As I understand it - Ford Doolittle is tipping his hat to Ernst Mayr's elucidation of the error of “Beanbag Genetics”. All “atoms” of inheritance (the “cistrons”?) are simultaneously pleiotropic, usually operating in overlapping and redundant networks. Meanwhile, all phenotypes are multigenic. The modern notion of a gene requires revisiting, if I am understanding Doolittle correctly.

    Ford Doolittle continues where Djebali leaves off:

    However, regulatory loci are also informational even if not transcribed, and ENCODE has documented many long-range interactions between chromosomal regions that may be brought together physically in the nucleus, a very complex and structure-rich molecular machine, at some time during the cell cycle. Therefore, in this sense, the gross structure of the chromosome set also carries information that may be relevant to the function of genes, broadly defined

    Doolittle expands his critique further in


    I am a big fan of Doolittle’s musings. The only point I am attempting here (when citing Doolittle) would be a caveat - when one discusses genetics as an explanation of how inheritance operates at a phenotypic level – we may need to revisit the suggestion that …
    …One gene = one transcript
    … constitutes an outmoded model and in need of reconsideration

    My 5 cents…