Links & What I've Been Reading Q4 2022

Links

Cormac

1. The Passenger: A brief and imperfect guide for the perplexed. A bit over the top, but I thought this was the best piece on The Passenger.

The Passenger is an omni-dissolver, an intergalactic acid rain, a necromantic encyclopedia whose entries are unfamiliar tarot cards.

2. A new conversation with David Krakauer. 100% worth listening to.

3. An article by Krakauer in Nautilus: The Cormac McCarthy I Know. Montaigne, Wittgenstein, Schopenhauer, Melville, and more.

It is over tea and lunch with our friends and colleagues that we discussed everything. A typical day might include new results in prebiotic chemistry, the nature of autocatalytic sets, pretopological spaces in RNA chemistry, Maxwell’s demon, Darwin’s sea sickness, the twin prime conjecture, logical depth as a model of evolutionary history, Godel’s dietary habits, the weirdness of Spengler’s Decline of the West, and allometric scaling of the whale brain. I believe Cormac’s recent novels The Passenger and Stella Maris have their origins partly in this foment of ideas that connect domains of unyielding precision to the frailty of life and the militancy of society.

4. James Wood's review is pretty good: "To traffic in serious mathematics is to commune with truth; to traffic in words, to merely write novels, is to produce dim approximations of the truth. This is what too many colloquies at the Santa Fe Institute will do to a novelist’s self-esteem."

5. Joy Williams is also not bad: Great, Beautiful, Terrifying

Perhaps the business of The Passenger, for all its somber romanticism and Gnostic leanings, is to defer to this unconsciousness, to give shape to that which might well be the soul, or at least its most faithful companion.

McCarthy is not interested in the psychology of character. He probably never has been. He’s interested in the horror of every living creature’s situation.

6. This negative(!) review in Slate compares the book to Pynchon, DeLillo, Ellroy, and Lovecraft.

Machine Learning/AI

7. Building A Virtual Machine inside ChatGPT

So, inside the imagined universe of ChatGPT's mind, our virtual machine accesses the url https://chat.openai.com/chat, where it finds a large language model named Assistant trained by OpenAI. This Assistant is waiting to receive messages inside a chatbox.

8. On the persistent mental effects of looking at AI art: Relaxed/Flawed Priors As A Result Of Viewing AI Art. "Since this period of consuming a large amount of this flawed AI art, perhaps a dozen notable times, I've recognized myself initially parsing some visual stimulus in an incorrect way - one that maps to some flaw common in AI art - only to moments later consciously realize that I must have parsed the stimulus incorrectly and fix my initial perception."

9. Ebook semantic search using AI.

10. Wordcraft Writers Workshop

The Wordcraft Writers Workshop is a collaboration between Google's PAIR and Magenta teams, and 13 professional writers from a diverse set of creative writing backgrounds. Together we explore the limits of co-writing with LaMDA and foster an honest and earnest conversation about the rapidly changing relationship between technology and creativity.

11. Midwit AI: Inverse scaling can become U-shaped

12. Riffusion: using an image model to generate images of spectrograms, which are then turned into audio.

13. Nintil makes predictions about AI in 2026.

Forecasting

14. Scoring the midterm election forecasts from PredictIt, 538, and Manifold.

15. Prediction market does not imply causation

Take the other 95% of the proposed projects, give the investors their money back, and use the SWEET PREDICTIVE KNOWLEDGE to pick another 10% of the RCTs to fund for STAGGERING SCIENTIFIC PROGRESS and MAXIMAL STATUS ENHANCEMENT.

16. Michael Story: Why I generally don't recommend internal prediction markets or forecasting tournaments to organisations.

Metascience

17. Nintil: Limits and Possibilities of Metascience.

The failure of meta-entrepreneurship to establish deep links with entrepreneurship, given stronger incentives for improvement, makes me be pessimistic about the possibilities of these bidirectional linkages from manifesting in metascience. Hence I predict metascience and metascience entrepreneurship will continue walking separate paths: The next big NIH reform or new institution started will not be strongly influenced by academic or theoretical metascience.

The Rest

18. The Sweet Life: The Long-Term Effects of a Sugar-Rich Early Childhood. Using the end of WWII rationing in the UK to look at the effects of early sugar consumption. "Excessive sugar intake early in life led to higher prevalence of chronic inflammation, diabetes, elevated cholesterol and arthritis." Not entirely convinced, a lot of marginal/non-significant results, but Figure 5 is really wild.

19. On Galton: How to keep cakes moist and cause the greatest tragedies of the 20th century (Straussian)

Here’s a few highlights of Galton’s many experiments, studies, and investigations:

  • Tries to learn arithmetic by smell, succeeds

  • Worships a puppet to see if he can convince himself it has godlike powers, succeeds

  • Makes a walking stick with a hidden high-pitched whistle inside it, takes it to the zoo and whistles at all the animals (most don’t care, but the lions hate it)

  • Replaces the blood of a silver-grey rabbit with the blood of a lop-eared rabbit to see if it can still breed (it can)

  • Tells himself that everyone is spying on him to see if he can make himself insane, succeeds

  • Tries to consciously control all of his automatic bodily processes, nearly suffocates

  • Hears animal magnetism is all the rage, learns it in secret (it’s illegal), magnetizes 80 people

20. Scott Sumner on...Robert Louis Stevenson?! A very good piece that will probably add some items to your to-read list. "So what’s going on here? It cannot be that Stevenson is too difficult for the literary establishment, as he’s also popular with average readers. I suspect it is more nearly the opposite problem—Stevenson is too pleasurable. Some critics wrongly equate greatness with difficulty."

21. What it's like to dissect a cadaver. One of the many hidden benefits of living in the Bay Area?

22. The robot on EA. Don't fully endorse it, but quite interesting.

23. Walking with Nietzsche

The path that Nietzsche took is documented, so I followed him in his walking again (this time solo), starting with the Le Chemin de Nietzsche from the Hotel Cap Estel, the exact hiking trail Nietzsche took almost daily, now dedicated to him. The 2.5-mile arduous ascent with coastal views of the Mediterranean, which goes from the village of Èze bord-de-Mer to the main town of Èze, is perhaps the most beautiful hike I had ever summited, crowned by Èze’s Église Notre-Dame-de-l’Assomption, perhaps also the most sublime church I had ever seen.

24. Pynchon's archive.

25. Erik Hoel on the MFA's influence on literature.

Faulkner didn’t finish high school, recent research shows Woolf took some classes in the classics and literature but was mostly homeschooled, Dostoevsky had a degree in engineering. Joyce did major in literature, but even he entered medical school (before leaving), and also failed multiple classes in his undergraduate days. Not one of these great writers would now be accepted to any MFA in the country. The result of the academic pipeline is that contemporary writers, despite a surface-level diversity of race and gender that is welcomingly different than previous ages, are incredibly similar in their beliefs and styles, moreso than writers in the past.

26. Stuart Ritchie on the NIH deliberately crippling human genetics research because the results are politically inconvenient: The NIH's misguided genetics data policy.

Audio-Visual

27. And here's the 37-minute live version of Sister Ray.

What I've Been Reading

  • The Passenger/Stella Maris, by Cormac McCarthy. Dark and beautiful. This may well be the last great novel of the human era in literature. It would be fitting for the 89-year-old McCarthy to be writing a coda for himself and humanity at the same time, especially since he views the 20th-century productions of science and engineering as far more important and groundbreaking than those in literature.

    The plot is mostly irrelevant. Both books consist mostly of conversations: bars and restaurants for the first, a psychiatric institution for the second. McCarthy grapples with every idea that's been on his mind for the last few decades: mathematics, physics, language, the unconscious, the sins of the father, Kant, evolution, psychology, gnosticism, genius. It's not just a novel of ideas, though—The Passenger is filled with yearning, regret, nostalgia, isolation...just an incredibly melancholic atmosphere in general. Stella Maris is geekier, and basically The Virgin Internal Voice vs The Chad Cerebration: The Novel.

    To the usual mix of Hemingway and Captain Ahab, McCarthy adds strains of Pynchon and DeLillo. It works.

    There's even a cool, oblique Borges allusion: toward the end, Bobby writes down a couple of lines from a 17th-century German poet, Daniel von Czepko. Those lines form the epigraph of A New Refutation of Time!

  • Tiger Technology: The Creation of a Semiconductor Industry in East Asia, by John A. Mathews. This book comes out of academic "management" studies, which entails a lot of bullshit. A lot of overdone abstract ideas that are never really tested, a lot of extremely silly diagrams, etc. And its predictions about the future (it came out in 2000) turned out quite wrong. Viewed purely as a collection of facts it's quite an interesting book, however.

  • Chip War: The Fight for the World's Most Critical Technology, by Chris Miller. Much stronger than the above, and also up to date as it just came out. Covers both the history of chip production across the world and current issues and where they will lead in the future.

  • Bouvard and Pecuchet, by Gustave Flaubert. A comic(?) novel of ideas, which is also about Ideas. Quite weird, very bad, very good, not sure if I can really recommend it to anyone. Full review forthcoming.

  • Do Androids Dream of Electric Sheep?, by Philip K. Dick. Some interesting differences between the book and the movie. The latter is vastly superior. There's very little of the cyberpunk aesthetic present here, and Scott wisely ripped out almost everything about the artificial animals, the futuristic cult with the TV host antagonist, etc. Still, it's not bad.

  • Children of Dune, by Frank Herbert. I can confirm these get sillier and worse as the series goes on.

  • The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World, by David W. Anthony. Pretty cool book on the origins of Indo-European, combining archaeological and linguistic evidence. Unfortunately it was written just before the ancient DNA era, so it contains some things we know today are inaccurate (though to Anthony's credit, he was leaning in the right direction). Dull in sections (dry lists of finds at various sites), but easily skimmable. It's difficult to recommend it when Reich's Who We Are and How We Got Here exists.

  • The Selected Poetry of Rainer Maria Rilke. I love some of his work, but overall not a fan of the average poem.

  • Flashman and the Mountain of Light, by George MacDonald Fraser. The audiobooks for this series are really well done. This time Flashman gets embroiled in the First Sikh War, a rather silly affair all around even without the fictional elements. Naturally, he gets his hands on the Koh-i-Noor. Not the best Flashman novel, but still good fun. The ending is pure perfection.

  • Murder as a Fine Art, by David Morrell. Historical detective fiction, in which an old, opium-addled De Quincey and his hot, spunky daughter are roped into a murder mystery and become citizen-detectives. Meticulously researched but not very good, unfortunately.

  • Nietzsche's On the Genealogy of Morals: Critical Essays. Just a dull collection of academic essays focusing on pointless minutiae and ignoring the big questions.




Forecasting Forecasting

The forecasting ecosystem is in a weird spot right now. The "traditional" approach began with large-scale experiments focusing on the wisdom of the crowds, market mechanisms, etc. People were inspired by the predictive power of financial markets and wanted to replicate their strengths in other domains—policy, diplomacy, war, disease, science.

We quickly began to see a divergence, starting with the "superforecaster" phenomenon: researchers noticed that certain people consistently outperformed the rest, and if you focused on their views you could outperform both the experts and the crowd. Competitive prediction aggregation/market platforms also have serious issues with information sharing between participants, which is a big part of why teams of top forecasters outperform markets as a whole. It has taken a long time for this movement to play out, but I feel it has been accelerating lately and wanted to write down a few comments on where I think forecasting is headed.

One issue is identifying superforecasters. If you don't have access to them (they are GJO's moat in a way), then you need to find them on your own (or at least find a way to reach out to them and attract them). Thus other projects like CSET/INFER ran crowd forecasting platforms and then picked out the top predictors for their "pro team", for example. Recent AI forecasting efforts have also tried to pick out a small number of top forecasters. And then you have groups like Swift and Samotsvety (as Scott Alexander says, "If the point of forecasting tournaments is to figure out who you can trust, the science has spoken, and the answer is “these guys”."). Why pay tens of thousands for a prediction market (which takes time and effort to organize) when you can just give a couple of grand to Nuño and get better answers, faster?

Others have tried to do away with the market mechanism even without having access to top forecasting talent. The DARPA SCORE program (which I've written about before) had two separate components for prediction, one market-based (Replication Markets) and another which used a structured group discussion format to arrive at estimates (RepliCATS). The results aren't out yet, but my understanding is that RepliCATS outperformed the markets.

Personally, I find the shift from open markets and various fancy scoring and incentive mechanisms to "put a handful of smart dudes in a room and ask them questions" a bit disappointing. Why did we even need the markets and forecasting platforms in the first place? To identify the smart dudes, of course—but is that all there is to it? As the top forecasters abandon markets and start competing against them, they are (in a way) pulling up the ladder behind them. We need the public tournaments to identify the talent in the first place, but if the money just goes straight into Samotsvety's pockets instead of open tournaments, new people can't join the ecosystem any more. Where is the next Samotsvety going to come from? Part of the problem is that there's a positive externality to identifying forecasting talent and it's difficult to capture that value, so we end up with a bit of a market failure.

Perhaps the only way to make markets competitive is to make them lucrative enough that it's worthwhile to form hedge fund-like teams which can generate internal benefits from information-sharing and deploy those onto the market, with the added benefit of honing them through competition. But that seems unlikely at the moment: the money just isn't there.

In one of the possible worlds ahead of us, the endpoint of this process will be the re-creation of the consulting firm—except for real this time. With the right kind of marketing angle I could easily see Samotsvety becoming a kind of 21st century McKinsey for the hip SV crowd that wants to signal that it needs actual advice rather than political cover. Could the forecasters avoid the pitfalls of the consultancy world?

What are the limits to forecasting accuracy? Eli Lifland is skeptical about the possibility of improving his abilities, but I'm not sure I buy that line entirely. We're still very early on, and many obvious low-hanging fruits have yet to be tried. If the forecasting-group-as-consultancy takes off, I would expect to see many serious attempts at improvement, starting with things like teaching domain experts forecasting and then putting them in close collaboration with top-tier generalists and forecasters.

What worries me is that this is a movement away from objective scoring and back towards reputation-based systems of trust. Once you leave the world of open markets and platforms, you become disconnected from their inescapable, public, and powerful error-correcting mechanisms—weak arguments can once again be laundered in the dirty soapwater of prominence and influence. Perhaps the current crop of top forecasters have the integrity to avoid going down that path, but how can that be maintained in the long run, with a powerful headwind of incentives and entryism blowing against us?




Against Effective Altruism

From Above: Metaethics

You're (probably) all theological anti-realists; just apply the same reasoning to the existence of moral facts! If the magical invisible sky god is obviously fake, why do you accept magical invisible sky moral facts? Just take the standard "rationalist" toolkit, apply it to realism, and it disappears in a puff of smoke. The arguments can just be copy pasted: for example, one of the classic (and most powerful) arguments from the New Atheism internet wars was that theists are really atheists about every god except their own. The moral realist is an anti-realist about all moral claims except the ones he happens to like! What is the base rate of moral truth, and why do you believe your inside view is enough to overcome that?

Realist arguments always come across as utterly absurd because their task is 1) extremely simple and obvious, and 2) impossible. All they have to do is say "we used methods x, y, z to uncover moral facts a, b, c, and you can replicate the procedure independently to verify our results". But it's never like that, it's always some interminable verbcel nonsense to cover up the fact that they don't actually have any access to the moral facts that their theory says they should have access to!

Whenever I hear the word "intuition" out of the mouth of a philosopher I reach for my Browning!1

It comes down to this:

  • Naturalism: no evidence
  • Non-naturalism: magic

Often they'll back off and come up with excuses about why moral facts aren't accessible in that way, but that just wrecks the whole thing. Even if realism is true, if there's no reliable empirical access to moral facts, then that's functionally equivalent to anti-realism. Any defense based on the immunity of moral propositions to empirical investigation also makes it impossible to find the truth. Your metaethics either has to have a way to determine what's true in ethics, or you're practically a nihilist.

Traditionally the problem has been solved with an appeal to God but EAs are, to a first approximation, 100% atheist. You can't pull an Euthyphro any more, so WHAT'S YOUR MORAL EPISTEMOLOGY MOTHERFUCKER? Why do epistemic standards seem to suddenly disappear when it comes to utilitarianism? Why am I constantly being asked to believe in the existence of these ontologically redundant entities?

The theologians at least have the decency to offer some kind of story: ask them about the origin of God and they might invoke the cosmological argument or the principle of sufficient reason. Ask a moralogian about the origin of moral facts and all you'll get in return is a stupefied bovine expression. The theologians can at least appeal to miracles. The moralogians appeal to nothing and expect you to accept it! Is the origin of moral facts natural, or supernatural? If natural, can we engineer our own? Why or why not? The universe, fundamentally, is dumb. You are positing the existence of entities that are very much not dumb. Where the hell did they come from?

Above all the moralogian is conspicuously shameless. Even in the 13th century, a man like Aquinas (who would not meet a single doubter in his entire life) felt it necessary to justify his faith and present arguments in favor of the existence of God. Today's moralogian on the other hand feels no such compulsion, although he is beset on all sides by skeptics! The EA.org page on meta-ethics speaks for itself:

Is this an excess of certainty, or is it because deep down the moralogian knows he has no real arguments?

And the moralogians are not stuck in airy castles of thought, they operate in the real world. The neoconservatives, for example, are a showcase of what happens when the moralogian takes hold of the reins of foreign policy—and it is a consistent ideology that genuinely seeks to spread the values it values. Burckhardt wrote that the foreign policy of Italian states of the Renaissance, free as it was from "moral scruples", gave him "the impression of a bottomless abyss". But who today would not prefer that naked self-interest to the neocon disaster of democracy and human rights? The effective altruists have yet to screw up that badly, but just look at the people who want to eliminate all wildlife and you have a good preview of what is possible—"Man, your head is haunted!"

Peter Singer offers one of the most memorable instances of intellectual cowardice in the entire history of philosophy. Like any reasonable person, he used to be an anti-realist. Then he read Parfit and realized that anti-realism meant utilitarianism was not the case (not sure why it took Parfit to point that out to him). Instead of abandoning utilitarianism, he became a realist just to salvage his ideology! Pathetic.

Hilariously, Parfit later abandoned realism for what he calls "non-realist cognitivism", which is basically the Sam Harris view except with bigger words.2 Part 7 of vol. 3 of On What Matters is an incredible trainwreck, worth skimming just to see what kind of pretzel shapes people will contort themselves into in order to avoid accepting the obvious. At least Parfit understands that adding a magical normative layer on top of reality is completely incompatible with the scientific weltanschauung.

"But Alvaro, your instrumental goals are sort of like morality, maybe we could just re-brand..." Just let it go, man.

From Straight Ahead: What's Going On Here?

You're not actually a utilitarian anyway. At best it's a kind of ideal. That's why you tithe 10%. Just go with the 90% of your intuition that says "this shit is whack, yo". How do ideologies avoid purity spirals? Heuristics against demandingness. It’s one heuristic battling another! Why not go all the way with the one that’s winning?

So you're probably not a realist, and probably not a utilitarian either...where does this EA compulsion come from? You must have been memed into it. Don't feel bad, it happens to all of us, that's how these things work.

From Below: Genealogy

Nature has never generated a terminal value except through hypertrophy of an instrumental value. To look outside nature for sovereign purposes is not an undertaking compatible with techno-scientific integrity, or one with the slightest prospect of success.

Banger tweet Mr. Land, as the */acc transsexual teenagers of twitter dot com like to say.

Instead of getting tangled up in all this philosophy mumbo jumbo we can just pulverize the question with Bulverism.3

What is morality for, exactly? What does it mean for altruism to be "effective"? EAs take it for granted that the most effective altruism is the altruism that helps its targets the most. I would argue that altruism is really meant to help the altruist, not the altruee. That's the only way it could have evolved. So here's my pitch to you: effective altruism is the altruism that raises your status the most. The conspicuous lack of caring about the "effectiveness" of altruism among normal people is a hint! Hundreds of millions per year for the NY Metropolitan Opera? Sure, why not! They're not misfiring, you are.

Of course the problem with optimizing for status is that if you're seen as optimizing for status rather than having a plausible excuse4, it's bad—the altruism that increases your status the most is also the one that you can credibly signal that you actually believe in. Thus we get Triversian self-deception where the altruist "really means it" (but of course if he really meant it he wouldn't be giving 10%). So Actual Effective Altruism is simply too gauche to exist. If you hang around the Bay Aryan rationalists then EA may well satisfy those goals (and I'm sure there are many in EA purely for cynical reasons). But if you're not part of that crowd...

Scott Alexander writes that "all of our values are unjustifiable crystallizations of heuristics at some level", and then continues specifically on utilitarianism:

To be absolutely brutal about it:

EXPLICIT MODEL: Helping others will key me in to networks of reciprocal altruism and raise my status in the community
EMOTIONAL EXPERIENCE: Desire to help others, empathy, horror at the suffering of others
REIFIED ESSENCE: “Utility”
ENDORSED VALUE: Utilitarianism, the belief that maximizing utility is the highest good regardless of what other goods it produces

It's spot on! How someone can write that and go on believing in utilitarianism is beyond me, and Scott offers no explanation.

Now, you might be thinking "But Alvaro, you idiot, we're adaptation executors, not fitness maximizers! This is all perfectly alright, you see." Sure, we're adaptation executors, but that doesn't give you a blank check to execute whatever retarded adaptation was bred into your hairy great-....great-grandfather 500,000 years ago, and is now incompatible with the world you live in (or worse, become enslaved to "unjustified crystallizations" and meticulously engineered hyperstimuli designed to abuse your adaptations). Effective altruism is the coca cola of morality, and you are morally obese!

The adaptation for helping out people in your community has hypertrophied in the toxic sludge of modern civilization into an absurd ideology about maximizing imaginary sky utilons by helping people you will never meet, or who do not yet (and may never) exist. Given the rapid shift in our environment it's unsurprising that there are maladaptations in our system; but we can recognize and avoid them. Your "adaptation execution" has been memetically hijacked—where once you would get good things in return for your "altruism" (a stronger community, status, reciprocal altruism, coalition-building, or even "niceness, community, and civilization"), a runaway meme has now convinced you that it's actually better to get nothing!5 You get all the costs of religion, and none of the prosocial benefits! Even worse, the infected are trying to spread this meme to others. Things are in the saddle, and ride you! It's a particularly dumb version of your typical California cult in which there isn't even a creepy guy with a harem of underage girls at the top. What's the point, man?

There was a type of deer in Ireland whose antlers hypertrophied (probably through sexual selection) to the point that it killed off the entire species. When I look at effective altruists, all I see is overgrown antlers pulling them to the ground.

The absurdity is heightened because we obviously know where these tendencies come from, regardless of what philosophers try to imagine. We know where the evolved desire to gobble up an entire cake comes from—as you resist the clarion call of the chocolate cake, so you must also resist the call of "effective" altruism. A serious valuing of values can only begin when this baggage is dispensed with and laughed at.

Fin

There are plenty of arguments against utilitarianism's internal logic: problems with interpersonal comparisons, aggregation, second-order effects, negative utility, average utility, discounting, etc. Whether you go negative, average, rule, or fluorescent there are tons of inescapable and fatal flaws. Empirically, human beings don't have coherent utility functions so what are we even maximizing? Above all, utilitarians ignore the value (and necessity) of suffering—for life and for Life. The fine porcelain of your being was forged in the fires of hell.

But I don't think it's necessary to meet utilitarianism on its own turf, so...6

Let me also say that atheism for the masses, in retrospect, was an enormous error. Organized religion as a social technology is invaluable and the modern atomized welfare state is a pathetic replacement. Atheism for the intellectual class is perfectly alright, but in the age of mass literacy there is really no barrier between them and the rest of society. Was atheism inevitable? Perhaps. But the New Atheists certainly didn't help. Extrapolating this line of reasoning is left as an exercise to the reader.

Are there values which are not merely instrumental? In a way—Gnon and all that. Do they have anything to do with the values of effective altruism? Of course not. But that's a story for another time.


  1. In case anyone actually wants to take the intuitionist route: why trust your intuition? If it's due to some second-level intuition you're stuck in an infinite regress, if it's due to some external fact verifying the intuition then we can just use the empirical procedure and skip your intuition altogether. Where does your intuition come from, and what does the process that created it optimize for? It certainly did not optimize for truth—read Hoffman!
  2. Unlike Wiblin, who believes in the Sam Harris view except with smaller words.
  3. But Alvaro, isn't Bulverism...Bad? No. Genealogy matters.
  4. Haha, would you look at that, I was just doing this other thing and by complete coincidence my status has gone up!
  5. "It is somewhat paradoxical that the tendencies and pressures in the direction of idealized moral systems should serve everyone in the group up to a point, but then be transformed by the same forces that molded them, into manipulations of the behavior of individuals that are explicitly against the interests of those being manipulated". Alexander, Biology of Moral Systems (1987).
  6. Daybreak 95: "In former times, one sought to prove that there is no God – today one indicates how the belief that there is a God arose and how this belief acquired its weight and importance: a counter-proof that there is no God thereby becomes superfluous."



Links & What I've Been Reading Q3 2022

Links

Machine Learning/AI

1. Language Models Can Teach Themselves to Program Better

2. Ajeya Cotra update on AI timelines (shorter, of course).

3. The Library of Babel, Stable Diffusion edition. I love this bit from the Borges story:

When it was announced that the Library contained all books, the first reaction was unbounded joy. All men felt themselves the possessors of an intact and secret treasure. The universe was justified; the universe suddenly became congruent with the unlimited width and breadth of humankind's hope.

The certitude that everything has been written negates us or turns us into phantoms.

4. On how various plans miss the hard bits of the alignment challenge.

5. Understanding Conjecture: Notes from Connor Leahy interview

We think that in order for things to go well, there needs to be some sort of miracle. But miracles do happen. When Newton was thinking about stuff, what were the odds that motion on earth was governed by the same forces that governed motion in the stars? And what were the odds that all of this could be interpreted by humans? Then you see calculus and the laws of motion and you’re like “ah yes, that just makes sense.”

6. Inverse scaling prize winners!

Forecasting

7. Five Questions for Michael Story: "Nearly all forecasters are paid more by their day jobs to do something other than forecasting. The market message is “don’t forecast”!"

8. On training experts to be forecasters. Lots of good points in this one, especially on the softer social aspects of forecasting.

Metascience

9. Rain, Rain, Go Away: 192 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable

10. Status bias in peer review. Would be curious to see an attempt at estimating how much of this is actually justified. After all, research quality follows a power law, and past results are certainly indicative of future performance. Perhaps there is not enough status bias in peer review!

The Rest

11. All the cool kids are listening to The Lunar Society. José Luis Ricón says Dwarkesh "is probably the best podcaster there is right now". Tyler Cowen says "highly rated but still underrated!". The Stephen Hsu episode is my favorite, but do check out the other ones too.

12. From the robot: the map is of the territory. "I am affirming that you have write access to the realm of the Gods."

13. From the banana, on the efficacy of depression treatments and more.

14. Dysgenics by the Numbers. In my view probably overstates the rate of loss within societies a bit. But overall completely right. Probably doesn't matter though.

15. Scraping training data for your mind. “But Karl Ove”, Renberg says about his writing, “there is… nothing _there_”.

16. A Future History of Biomedical Progress

Progress in tools has created the potential for a radically different research ethos that will end biomedical stagnation. But to understand this new research ethos, we must first understand the telos of the mechanistic mind and why it is at odds with the biomedical problem setting.

17. Good interview with Vitalik. "The kinds of communities you get when low taxes are the primary reason to come are just really boring and lame".

18. Evaluating Longtermist Institutional Reform. Public choice, counterfactuals, long-range forecasting.

Audio-Visual

What I've Been Reading

  • Nostromo by Joseph Conrad. Tangled, fragmented, unclear, conflicting, and circular narratives/motivations/goals/priorities. A chopped-up story from various points of view, taking a look at a world filled with great characters surrounding the titular Nostromo. Politics, heroism, revolt, the worth of social status, reputations, perceptions, allegiances, and material vs idealistic interests. Betrayals of all kinds. Private and public vindications and redemptions. Great stuff.

  • The Glass Bead Game by Hermann Hesse. Fascinating Borgesian novel about a futuristic game that combines all arts and sciences into some sort of grand unified plaything. It's about music, duty, the lifecycle of organizations, transcendence, the life of the mind, and probably much more on top of that. Highly recommended.

  • Imperial Twilight: The Opium War and the End of China's Last Golden Age by Stephen R. Platt. Kind of weirdly structured, it mainly takes a look at the era from the point of view of various minor players, mostly traders, "supercargos", and so on. The big politics don't get much attention. Somewhat revisionist I guess? It's fine.

  • Break-Out from the Crystal Palace: The Anarcho-Psychological Critique; Stirner, Nietzsche, Dostoevsky by John Carroll. A fairly shallow exegesis, written in an indefensible style. Just go straight to the primary sources.

  • The Twilight World by Werner Herzog. Another Herzog book! Nowhere near as brilliant as Conquest of the Useless, unfortunately, but still not bad. Concerns one of those Japanese soldiers who kept up the guerrilla war for decades after the end of WW2, refusing to surrender and refusing to face reality. Very Herzogian with the jungle and everything. Some wonderful metaphors.

  • City of Golden Shadow by Tad Williams. Gwern gave it 5 stars so I couldn't resist, but I didn't enjoy it at all. Absurdly overlong at 800 pages, it just ends with a cliffhanger (and there's more than one sequel). There's a series of parallel fantastical stories set in a virtual reality and they're all pointless and awful. A bit outdated in terms of how it imagines the internet, it does have a few interesting ideas but overall I don't think it's worth the effort.
  • Zero to One by Peter Thiel. I guess it's the best business book I've ever read. A bunch of concepts from it have penetrated the broader culture (definite vs indefinite optimism for example). It's a quick read so go for it.

  • Philip Larkin: Poems selected by Martin Amis by Philip Larkin. His best poems are great, but be warned that they are also extraordinarily pathetic in a way that can really fuck up your mood (if not your soul).

  • HHhH by Laurent Binet. Split into 257 short chapters, it blends a straightforward and minimally fictionalized retelling of Operation Anthropoid (the assassination of Reinhard Heydrich) with all sorts of metafictional elements, as Binet constantly comments on the issues with constructing a historical novel, compares his approach to other books and movies, and even brings his personal life into it. Irony, humor, self-consciousness (especially about the author's view of Heydrich), the tension between history and fiction, and a slow, horrific build-up that absolutely fills you with terror. Strange how powerful emotionally a book that is at the same time so detached can be. Quite good and very different.

  • The Man Who Loved Only Numbers: The Story of Paul Erdős and the Search for Mathematical Truth by Paul Hoffman. Fun and highly readable pop biography, I blasted through it in a day. "I doubt if he would have recognized my first name even though I worked with him for twenty years. The only person he called by his first name was Tom Trotter, whom he called Bill."

  • The Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution by Sean B. Carroll. A curious artifact from a different era. Perfectly captures the zeitgeist of the peak of internet atheism and creationism debates. One could make statements about human evolution then which would be quite dangerous today. The stuff on DNA and evolution is pretty wide and not that deep. All of it has been covered better elsewhere. There's a chapter on EvoDevo for example, but it stays on the surface of things and I would recommend reading Endless Forms Most Beautiful (by the same author!) instead. On top of the evolution stuff you also have a random sprinkling of skeptic-related causes (dull and cringey rants about chiropractors), plus a very generic liberal environmentalism which basically ignores everything the author had written up to that point. Probably more interesting as a marker of a (short, but memorable) era than a book about DNA and evolution.

  • The Moon and Sixpence by W. Somerset Maugham. A fictionalized retelling of the life of Paul Gauguin, as a middle-aged English man abandons his family to go be a painter in Paris (and eventually Tahiti). I wasn't convinced by the central character, and there's nothing to this novel beyond him. The "egotistic, single-minded genius" trope has been done much better elsewhere, and the novel really strays very far from the actual life of Gauguin.

  • The Golden Bowl by Henry James. I made it about 50 pages in. Not for me.

  • The Beginning of Infinity: Explanations That Transform the World by David Deutsch. An unstructured mishmash of warmed-over pop science and a cavalcade of bad arguments around abduction, philosophy of science, intelligence, infinity, qualia, etc. The arguments about superhuman general intelligence not being possible because humans are universal Turing machines are utterly absurd and could be added verbatim to "On the Impossibility of Supersized Machines". One of the worst treatments of abduction in the history of philosophy, and that's really saying something. Deutsch's comments on heritability are downright idiotic, and it's clear that he didn't even bother spending 30 seconds reading the Wikipedia page. He just makes stuff up (incorrectly). A lot of uppity commentary about shit he doesn't understand. And then it's just filled with a whole bunch of random shit, like a galaxybrained theory of why the UK has the best voting system, a terrible theory of aesthetics, etc.

  • First Light by Geoffrey Wellum. Fairly conventional WW2 memoir from a British fighter pilot. Not bad, not great.

  • The Marsh Arabs by Wilfred Thesiger. A standard tale of a British explorer somehow making himself accepted and comfortable among the primitive natives (look up Thesiger's pics), and bemoaning the disappearance of their way of life. This particular one, among the pastoral tribes of the marshes of southern Iraq. Perhaps what makes it unique is that it is set not in the 19th century, but in the late 1950s. Comfy but unexceptional, ultimately the Madan are just not that interesting.




Links & What I've Been Reading Q2 2022

Links

Machine Learning

1. Large Language Models are Zero-Shot Reasoners: "Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3." Here's how different prompts compare:

2. DALL·E 2 is pretty crazy. Tons of good threads on twitter featuring its work, here's one of my favorites.

3. Gwern comments on GPT-3's 2nd Anniversary

A psychologist thrown back in time to 2012 is a one-eyed man in the kingdom of the blind, with no advantage, only cursed by the knowledge of the falsity of all the fads and fashions he is surrounded by; a DL researcher, on the other hand, is Prometheus bringing down fire.

4. Speaking of AI and Prometheus...

5. A model trained on /pol/ data successfully(?) sends out thousands of shitposts.

6. Yarvin contra AI x-risk fears. I am not convinced.

Forecasting

7. Very good and important: Beware boasting about non-existent forecasting track records.

8. Future Fed Chair Basil Halperin on prediction markets and monetary policy.

9. Nuño Sempere released 3 short and sweet papers on designing prediction scoring rules. Also subscribe to his excellent forecasting newsletter if you haven't already.

Metascience

10. The New Science report on the NIH. Enormous but very much worth your time.

11. On that baby brainwave study and more general issues around that sort of research.

12. In the Guardian: The big idea: should we get rid of the scientific paper?

13. Ideological biases in research evaluations? The case of research on majority—minority relations

Within this field, social contact and conflict theories emphasize different aspects of majority—minority relations, where the former has a left-liberal leaning in its assumptions and implications. We randomized the conclusion of the research they evaluated so that the research supported one of the two perspectives. Although the research designs are the same, those receiving the social contact conclusion evaluate the quality and relevance of the design more favorably. We do not find similar differences in evaluations of a study on a nonpoliticized topic.

Note the effect is quite small though.

Economic History

14. On the role of millet, rice, and timing of agriculture in Chinese state formation.

Book Reviews

15. The SSC book review contest is pretty strong this year as well. My favorite thus far: The Dawn Of Everything

A “Gossip Trap” is when your whole world doesn’t exceed Dunbar’s number and to organize your society you are forced to discuss mostly people. It is Mean Girls (and mean boys), but forever. And yes, gossip can act as a leveling mechanism and social power has a bunch of positives—it’s the stuff of life, really. But it’s a terrible way to organize society. So perhaps we leveled ourselves into the ground for 90,000 years.

16. Judge Woolsey on Ulysses: "[i]n respect of the recurrent emergence of the theme of sex in the minds of [Joyce's] characters, it must always be remembered that his locale was Celtic and his season Spring. [...] [W]hilst in many places the effect of Ulysses on the reader undoubtedly is somewhat emetic, nowhere does it tend to be an aphrodisiac."

17. SMTM on Disco Elysium.

18. Davis Kedrosky reviews Koyama & Rubin's How the World Became Rich.

The Rest

19. Latest news from the Good, Actually dpt: incarceration cuts mortality by half.

20. Matt Lakeman continues his travel blogs, this time he reports from Ukraine.

21. What I learned gathering thousands of nootropic ratings.

22. Daniël Lakens has released a free ebook on improving your statistical inferences.

23. New evidence on the genetic history of Ashkenazi Jews: "our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ."

24. Mechanical Watch (lots of crazy shit on this blog)

25. On foreign aid and ethnic conflict.

26. Eigenrobot gives advice to academic refugees.

Academia is characterized by well-trodden problems, hashed over for decades, and negligible novel data for resolving them. Industry is by comparison a mass of green field areas of inquiry with large budgets, minimal bureaucracy, and ample data.

Audio-Visual

27. And here's Masayoshi Takanaka's The Rainbow Goblins.

What I've Been Reading

  • Democracy in America, by Alexis de Tocqueville. Lives up to its reputation. Fascinating observations on law, politics, psychology, sociology, America's Westward expansion, and more. Prefigures Timur Kuran in many ways. Incredibly prescient. Interesting both in terms of what has stayed the same since it was written, but also for a look at all that has changed. "The French lawyer is simply a man extensively acquainted with the statutes of his country; but the English or American lawyer resembles the hierophants of Egypt, for like them he is the sole interpreter of an occult science."

  • The Memoirs of the Baron de Tott, on the Turks and the Tartars, by the Baron de Tott. Found through Braudel. Written just a few decades before Democracy in America, the Baron de Tott went East instead of West. And instead of seeing the future, he saw the past. Roughly at the time the American revolution was happening, the same time when Johnson and Boswell were drinking too much claret at the Mitre, de Tott was joining the Crimean Tatars on a slave raid into Southern Russia. Most fascinating for its observations of Ottoman society, and the role de Tott played in the Russo-Turkish war of '68-'74. Somewhat niche and obviously nowhere near as insightful as Democracy in America, but definitely worth a read if this is your kind of thing. What causes the fall of empires? Culture, Tott says: all decay ultimately comes from within.

  • Collapse of Complex Societies, by Joseph Tainter. Tainter's theory mostly comes down to decreasing marginal returns to additional societal complexity, which eventually leads to collapse. Parts of it are highly reminiscent of Chaisson's Energy-Rate Density paper (which everyone should read), but much more limited in scope. He's too focused on explaining everything with a single theory, leaving little room for contingency in history. He ignores the aspect of time: just because a system works well for 10 years does not mean it can work for 1000. And he treats rulers as being virtually unconstrained in their policy choices.

    The examples he marshals in support of this theory are not particularly convincing, and (at least in some cases like the Western Roman Empire), the Mancur Olson view which focuses on public choice issues (which Tainter pretty much dismisses out of hand) seems like a vastly better fit to me. Especially when it comes to contemporary society, the examples Tainter brings up seem like a slam dunk in favor of Olson and against Tainter! Take education for example: is it really plausible that the ballooning costs and declining efficiency of educational spending over the past few decades is due to increased complexity? Of course not, it's clearly an issue of special interest groups with socially misaligned incentives. Tainter misses it because he never actually dives into the details of exactly how increased complexity is supposed to be working to produce all these effects.

  • The Machiavellians: Defenders of Freedom, by James Burnham. On Machiavelli and some of his successors: Mosca, Sorel, Michels, Pareto. Published in 1943 and it shows. Strong on the general ideas about the objective treatment of power and politics, divorced from sentimentality and moralizing. Pretty weak on the specifics. I was expecting something deeper based on its reputation. A bit dull overall.

  • Sartor Resartus, by Thomas Carlyle. Borges mentions that it inspired him to read German philosophy and that's how it ended up on my list. What can I even say about this crazy book? Carlyle invents a fictional German philosopher, who has written a treatise on clothing, and then also invents a fictional English editor who tries to explain the German philosopher's work, which turns out to be a philosophy of everything. Layer upon layer of irony and postmodern misdirection, and that outrageous Carlylean 19th century style to top it off. Heavily influenced by Tristram Shandy. Surprisingly influential (especially in America), though I'm really not sure how seriously one is meant to take the ideas presented within.

  • Liftoff: Elon Musk and the Desperate Early Days That Launched SpaceX, by Eric Berger. Fast-paced and exciting, mostly based on insider interviews, Liftoff gives a good idea of what the crazy early years were like at SpaceX. Once they start launching the Falcon 9 it skips over a decade in a few paragraphs, which kind of sucks. If you were wondering exactly what factors made SpaceX succeed where everyone else has failed, you will probably come away from the book disappointed. Still, recommended.

  • Apollo: The Race to the Moon, by Charles Murray (yes, that Charles Murray). One of the better Apollo books, this one is focused mostly on the bureaucratic aspects with a few glimpses into engineering as well. At 500 pages it still feels far too short, as some major events and personalities are given very little space. Overall very strong, and it's truly astonishing how there was almost nothing at all in terms of the space program in 1960, how young everyone was, how nobody really knew what they were doing, etc. For some reason it seems to be out of print.

  • The Book That Changed Europe: Picart & Bernard's Religious Ceremonies of the World, by Lynn Hunt. The story of the publication of the titular book, and a look at the religious environment of the 18th century. Freethinking Protestant refugees congregate in cosmopolitan Amsterdam and make waves through their printing presses. Fascinating subject, terrible execution. Unorganized, repetitive, badly written, and filled with pointless digressions. There's an irrelevant digression in the very first paragraph of the book! Maybe if a competent editor had gone to town on it... Also, I think the authors wildly overrate the book's ultimate importance.

  • A Distant Mirror: The Calamitous 14th Century, by Barbara Tuchman. Audiobook. A look at 14th century Europe, mostly as it was seen from the perspective of the French nobleman Enguerrand VII de Coucy. Mainly based on the Chronicles of Froissart. Plague, the 100 years' war, religious fanaticism, popes and antipopes, peasant revolts, crusades, etc. Very entertaining, but it sacrifices quite a lot of rigor to get there. Too many blatantly false statements from the 14th century are taken at face value. And there's more than a bit of Monty Python about this: at points, I thought I discerned the distant—but unmistakable—beat of coconuts in the background of the audiobook. This sentence gives you the vibe: "A decision was perforce taken to march straight through the dark, forbidding forest of the Ardennes, where, Froissart remarks with awed inaccuracy, "no traveler had ever before passed.""

  • Project Hail Mary, by Andy Weir. Audiobook. It's the same schtick as The Martian all over again, but with more plotholes and a more impressive setting. Pleasant scifi entertainment for the gym.




Links & What I've Been Reading Q1 2022

Links

Machine Learning

1. Incredibly cool from DeepMind: ML applied to ancient Greek fragments can generate restoration hypotheses for the missing text and locate the fragment's origin in both time and place. Paper in Nature.

2. Incredibly uncool:

These researchers built an AI for discovering less toxic drug compounds. Then they retrained it to do the opposite. Within six hours it generated 40,000 toxic molecules, including VX nerve agent and "many other known chemical warfare agents."

Sufficiently advanced AI alignment is indistinguishable from AI risk?

3. Fantastic Gwern theory-fiction: It Looks Like You're Trying To Take Over The World.

4. Also on LW, Brain Efficiency: Much More than You Wanted to Know:

Eventually advances in software and neuromorphic computing should reduce the energy requirement down to brain levels of 10W or so, allowing for up to a trillion brain-scale agents at near future world power supply, with at least a concomitant 100x increase in GDP. All of this without any exotic computing.

5. Also on LW, New Scaling Laws for Large Language Models.

Forecasting

6. Karger, Atanasov & Tetlock, Improving Judgments of Existential Risk: Better Forecasts, Questions, Explanations, Policies.

7. How good are generalist forecasters vs experts, really? Gavin Leech revisits the literature and argues against the superforecasters. They still do as well or slightly better than the experts, but not by much. I feel the way the results are presented is a bit misleading.

Metascience

8. Derek Thompson in the Atlantic on Silicon Valley science funding.

9. In what sense is the science of science a science?

What makes my spidey sense tingle is that the objects in any such theory are (in part) a hypothetical space of possible discoveries, of possible explanations of the world. I called it a theory of discovery just above, but it might equally well be called a theory of the unknown, or theory of exploration, or theory of theories. Of course, some of the objects of any such theory would also be amenable to more standard descriptions: things like exploration strategies, or group dynamics. But some would be a lot stranger: currently unknown types of explanation, currently unknown types of theoretical entity.

Economic History

10. WW2 Japanese internment camps? You guessed it, Good, Actually! Internment had a positive effect on long-run incomes on the order of 9-22%. And remember to burn the cities, too. h/t ADS

11. Some issues with Putterman & Weil (2010), judging by the new results it doesn't seem all that problematic to the deep roots lit?

Book Reviews

12. There's a new Landmark Edition out, Xenophon's Anabasis. Here's a short review.

13. ZHPL on TLP's Sadly, Porn.

14. Scott on the same (the reviews are complementary goods).

Covid

15. Vaccination Rates and COVID Outcomes across U.S. States finds that it takes about $5000 worth of vaccines to save a life. Would be interesting to see a comparison to molnupiravir in terms of dollars per life saved.

16. A report from a covid human challenge experiment. Hopefully this paves the path for a faster response against the next pandemic.

The Rest

17. Against the Naming of Fungi

The egotism and futility of these costly initiatives is quite mind-boggling as the human threat to biological diversity multiplies. Rather than competing with animal and plant taxonomists, mycologists should show pluck in asserting philosophical independence from the waning fields of zoology and botany. By turning our attention towards experimental questions and away from cataloguing, mycologists may escape the shackles of Linnean fundamentalism.

18. Related(?), SMTM on citrus taxonomy, "in which the Bene Gesserit attempt to breed the Kumquat Haderach".

19. Luttwak on China: The myth of Chinese supremacy

Always improbable, G-2 became impossible when Xi Jinping arrived. For him only G-1 is good enough. Not because he is a megalomaniac but the opposite: he thinks, accurately, that unless the Party establishes an unchallenged global hegemony, with its rule deemed superior to democratic governance, Communist China will collapse just as Soviet rule did. He is right.

20. Indian National Stock Exchange CEO scandal:

The drama intensified in February, when the Securities and Exchange Board of India released a 190-page regulatory order disclosing that Ramkrishna had sent sensitive information to an outsider described as a yogi in the Himalayas. [...] The yogi was non-corporeal, she said, but corresponded using the email address [email protected]

21. On the role of mathematics in the neolithic revolution. "The mathematical abilities of Neolithic humans advanced in concert with the new requirements of agricultural life. These needs can be summed up into three categories: Surplus, Trade, and Time." Here's Wikipedia on the Rhind Mathematical Papyrus, which dates to the 16th century BC.

22. From the new Institute for Progress, Progress is a Policy Choice.

23. Ed West on the coming demographic issues: 'Children of Men' is really happening (actually understates the problem imo).

24. Theses and counter-theses on sleep. Seems like one of those things where there's tons of variation and you're probably best off doing some rigorous self-experimentation?

25. Death Toll of Price Limits and Protectionism in the Russian Pharmaceutical Market. In 2012, Russia put price caps and protectionist regulations on various pharmaceuticals. The result was a decrease in supply, leading to a striking increase in mortality from diseases those drugs protect against.

26. Fluvoxamine-caffeine interaction:

Just learned that fluvoxamine, a common SSRI used to treat depression and other psychiatric conditions, increases the half-life of caffeine in the bloodstream. Like, to an absurd degree:

27. Modeling assortative mating and genetic similarities between partners, siblings, and in-laws

We found evidence of genetic similarity between partners for educational attainment (rg = 0.37), height (rg = 0.13), and depression (rg = 0.08). Common genetic variants associated with educational attainment correlated between siblings above 0.50 (rg = 0.68) and between siblings-in-law (rg = 0.25) and co-siblings-in-law (rg = 0.09). Comparisons between the genetic similarities of partners and siblings indicated that genetic variances were in intergenerational equilibrium. This study shows genetic similarities between extended family members and that assortative mating has taken place for several generations.

28. New EA GWAS with N=3 million, 12-16% variance explained.

29. "Las Pozas ("the Pools") is a surrealistic group of structures created by Edward James in a subtropical rainforest in the Sierra Gorda mountains of Mexico. It includes more than 80 acres (32 ha) of natural waterfalls and pools interlaced with towering surrealist sculptures in concrete."

30. The Senseless, Tragic Rape of Charles Bukowski’s Ghost by John Martin’s Black Sparrow Press

31. Stalin's amused notes on Lysenko.

32. A letter from Claude Shannon to Warren McCulloch, in behalf of L. Ron Hubbard.

33. No peeing towards Russia.

Audio-Visual

34. They found Shackleton's ship in the Antarctic, and it's perfectly preserved.

35. Kogonada's After Yang is one of my favorite new films in years. What if Roy Batty was a personal assistant, what happens to his adopted family after he dies? A poignant and wistful film about memory, death, and the legacy we leave behind us.

36. And here's DJ Shadow remixing King Gizzard & The Lizard Wizard.

What I've Been Reading

  • How to Think Like Shakespeare: Lessons from a Renaissance Education, by Scott L. Newstok. A Romantic old-man-yells-at-clouds tirade about modern education practices. It didn't change any of my views, but it didn't really attempt to do so in the first place: Newstok is a reformist, while I am strictly an abolitionist—and therefore far outside the target audience. I find it hard to separate mass education from the commoditization of knowledge, while Newstok believes we can have our cake and eat it too. In any case, if you want a passionate argument in favor of high-quality education interspersed with Shakespeare quotes, this is the book for you.
  • Dune, by Frank Herbert. Pretty great, Herbert constructs a deeply alluring world which pulls you in despite some rather hilariously implausible aspects. It's interesting how so much of the "plot" actually happens in the background. The audiobook is quite good.
  • Dune Messiah, by Frank Herbert. I was told the sequels get crazy, and this is a pretty good start in that direction! Can't wait to see where this nonsense ends up. This is basically a book of palace intrigue and scheming, with a rich religious/predestination/weird time loop sauce on top.
  • Star Maker, by Olaf Stapledon. I kept thinking that it felt like a really weird throwback to the 1920s-30s, then I looked it up and it was written in 1937. Whoops. It's a non-stop torrent of interesting science fiction ideas, but there's no continuity, no characters to latch on to, and the examination of the ideas stays at the surface level. It's just a series of "this happened, then this happened, then this happened" which I found rather boring.
  • The Island of Doctor Death and Other Stories and Other Stories, by Gene Wolfe. Some fantastic stories in this collection, in particular I loved Feather Tigers, Death of Dr. Island, Toy Theater, and Seven American Nights. Many of them are in that classic Wolfe style where you have to piece together what's going on from tiny hints left in the text, and it's all a bit ambiguous in the end and so on. There's a lot of focus on religion and death (with two stories, The Hero as Werwolf and The Doctor of Death Island, being fairly explicitly death-ist).
  • Orphans of the Sky, by Robert Heinlein. Fairly standard generation ship story. Juvenile and ham-fisted (there's a scene where the protagonist literally yells out "and yet it moves!"). Mutants and knife fights and all that. 12-year-old me would've loved it.
  • Wittgenstein's Nephew, by Thomas Bernhard. Bernhard documents his friendship with Paul Wittgenstein (not the pianist), a black sheep of the Wittgenstein family who suffered from various mental problems. They're both rejected by Austrian society, and they both reject it. Bernhard's attitude toward awards (he views them as a kind of insult and punishment) really sums up his relation to his country. A bitter book, sad and pathetic and miserly. Recommended if you're in the market for a feel-bad memoir.
  • The Status Game: On Social Position and How We Use It, by Will Storr. There's quite a bit of overlap with The Elephant in the Brain, but Storr's book is obviously more focused on status. Also reminiscent of Goffman's Presentation of Self in Everyday Life. Lots of references to Boehm, Henrich, Kuran, Wrangham, etc. (You're probably better off going straight to the source?) If I had to choose between this and Elephant I'd go for Elephant, but they're fairly complementary so it won't be a waste of your time to read both. Parts of the book are focused on contemporary culture war issues, which felt a bit shallow and tiresome. Overall it's not bad though.
  • The Biology of Moral Systems, by Richard Alexander. There's a great core here, but I wouldn't recommend it. The basic idea of approaching moral systems from an evopsych perspective is useful. However, huge swathes of text are wasted on dull and low-quality academic bickering, many of the specifics (eg the arguments on the development of religion) are completely off, and the last third of the book is dedicated to a mostly fruitless discussion of nuclear war and mutually assured destruction.



The Best and Worst Books I Read in 2021

The Best

Ibn Battutah, The Travels of Ibn Battutah

Also known as A Masterpiece to Those Who Contemplate the Wonders of Cities and the Marvels of Travelling, this is a wonderful travelogue from the 14th century (or, more appropriately, the 8th century of the Hegira). Battutah was born in Morocco; he was not wealthy, but he was well-educated and went into the family business of Islamic law. At age 21, he set out for the pilgrimage to Mecca. He would extend his journey for decades, however, following traders in ships and caravans, relying on generous Muslim institutions and his talent for befriending rulers. He eventually covered virtually the entire Islamic world and beyond, from North Africa to China.

Battutah gets into all sorts of adventures (luckily escaping death by disease, shipwreck, pirates, bandits, and so on) and provides us with some incredible ethnographic observations. In Constantinople, he meets the Emperor. In India, he becomes a prominent and wealthy administrator under the rule of an erratic Sultan. In the Maldives, he marries six local women and lives a life of leisure under the shade of the palm trees. Yet his wanderlust compels him to keep moving. Battutah himself as a person, however, remains tantalizingly obscure.

Having divorced my wives I set sail. We came to a little island in the archipelago in which there was but one house, occupied by a weaver. He had a wife and family, a few coco-palms and a small boat, with which he used to fish and to cross over to any of the islands he wished to visit. His island contained also banana bushes, but we saw no land birds on it except two crows, which came out to us on our arrival and circled above our vessel. And I swear I envied that man, and wished that the island had been mine, that I might have made it my retreat until the inevitable hour should befall me.

 

Don DeLillo, Libra

A semi-fictionalized biography of Lee Harvey Oswald in the Oliver Stone tradition, suffused with that great DeLillo style. There's also a kind of meta parallel story of an FBI agent trying to piece together all the evidence, meticulously going through even the tiniest element (much like DeLillo himself). It's quite Pynchonesque with all the criss-crossing conspiracies, the CIA, paranoia, axes of control and influence, a series of coincidences, taking liberty with history...and the ultimately mysterious "fate" that brought Oswald to the assassination. It lacks Pynchon's humor though.

"I don't know what they want me to do." "Of course you know." "Tell me where it happens." "Miami." "That means nothing to me." "You've known for weeks." "What happens in Miami?" Ferrie took a while to finish chewing his food. "Think of two parallel lines," he said. "One is the life of Lee H. Oswald. One is the conspiracy to kill the President. What bridges the space between them? What makes a connection inevitable? There is a third line. It comes out of dreams, visions, intuitions, prayers, out of the deepest levels of the self. It's not generated by cause and effect like the other two lines. It's a line that cuts across causality, cuts across time. It has no history that we can recognize or understand. But it forces a connection. It puts a man on the path of his destiny."

 

Christopher de Hamel, Meetings with Remarkable Manuscripts: Twelve Journeys into the Medieval World

Twelve chapters, each one dedicated to a different medieval manuscript, from the 6th century Gospels of St. Augustine to the 16th century Spinola Book of Hours. The book is filled with fantastic, gorgeous, high-quality prints from these manuscripts, interspersed with history and commentary in a pleasant conversational style. It's not just about the manuscripts themselves, but also who owned them, their condition, how they've been maintained or altered, where they're housed, and the people taking care of them. Cultural differences in library regulatory practices are a virtually infinite source of comedy. Just lovely all around. Make sure you get the hardcover as the paperback is apparently printed in black & white.

     

Confirmation that he was indeed both scribe and artist is found in the shape of the spaces left for the insertion of initials. Both scribes 2 and 3 (let us exclude 1 for the moment) left simple rectangular blank spaces where large initials were to be painted later, without thought to their shape or composition, and they added guidewords in the margins to indicate what letters were to be supplied. When Hugo came to fill them in, his flamboyantly fluid and multi-tentacled initials fitted uncomfortably into these big draughty square apertures. However, during the stint written by the last scribe from folio 185v onwards, the edges of the script are moulded line by line to fit around the curves and limbs of the painted initials, nestling together snugly like a newly married couple in bed. Text and decoration must have been executed simultaneously by the same person. In short, scribe 4 must be Hugo.

 

Ananyo Bhattacharya, The Man From the Future: The Visionary Life of John von Neumann

Short, dense, and with a great balance between accessibility and rigor. Bhattacharya approaches his subject by focusing on ideas. The first chapter takes care of JvN's early life, and the rest of the book is split up based on the subjects he worked on: mathematics, quantum mechanics, the nuclear bomb, computing, game theory, RAND, and artificial life. Large parts of the book (I'd say about a third) are dedicated not to von Neumann but rather the work other people did based on his ideas. The game theory chapter, for example, covers Nash, Schelling, Aumann, etc. in economics, and John Maynard Smith, Price, Hamilton, etc. in evolutionary game theory. Bhattacharya is good at making all these technical subjects accessible without dumbing them down too much. The one failing point is that JvN's personality, personal life, and professional relationships don't get much attention.

From 1944, meetings instigated by Norbert Wiener helped to focus von Neumann’s thinking about brains and computers. In gatherings of the short-lived ‘Teleological Society’, and later in the ‘Conferences on Cybernetics’, von Neumann was at the heart of discussions on how the brain or computing machines generate ‘purposive behaviour’. Busy with so many other things, he would whizz in, lecture for an hour or two on the links between information and entropy or circuits for logical reasoning, then whizz off again – leaving the bewildered attendees to discuss the implications of whatever he had said for the rest of the afternoon. Listening to von Neumann talk about the logic of neuro-anatomy, one scientist declared, was like ‘hanging on to the tail of a kite’. Wiener, for his part, had the discomfiting habit of falling asleep during discussions and snoring loudly, only to wake with some pertinent comment demonstrating he had somehow been listening after all.

 

Giorgio Vasari, The Lives of the Most Excellent Painters, Sculptors, and Architects

History by way of biography—Vasari tells a tale of rebirth and artistic progress as Europe emerges from the dark ages, rediscovers the ancients, and then strives to surpass them. Tons of interesting observations on competition, collaboration, the spread of technology, and the psychology of (artistic) greatness. More than 180 lives in over 2000 pages, starting with Cimabue in the 13thC and reaching a climax with Michelangelo in the 16th. Somewhat gossipy and often inaccurate, it nonetheless remains our best source of information on the art and artists of Renaissance Italy. Vasari was a fairly successful painter himself, and his personal acquaintance with both the technique and the business of painting gives us an inside view of the craft. Full review.

It is clear that Leonardo, through his comprehension of art, began many things and never finished one of them, since it seemed to him that the hand was not able to attain to the perfection of art in carrying out the things which he imagined; for the reason that he conceived in idea difficulties so subtle and so marvellous, that they could never be expressed by the hands, be they ever so excellent. And so many were his caprices, that, philosophizing of natural things, he set himself to seek out the properties of herbs, going on even to observe the motions of the heavens, the path of the moon, and the courses of the sun.

 

Arthur Schopenhauer, Essays and Aphorisms

Excerpts from Parerga und Paralipomena. Unexpectedly hilarious; Arthur would've been one hell of a poaster. Surprisingly similar to the pragmatists in many respects. Spans a huge number of topics: ethics, the will, intelligence, animal welfare, religion, suicide, writing, and much more.

Thus we see, for example, the Catholic clergy totally convinced of the truth of all the doctrines of its Church, and the Protestant clergy likewise convinced of the truth of all the doctrines of its Church, and both defending the doctrines of their confession with equal zeal. Yet this conviction depends entirely on the country in which each was born: to the South German priest the truth of the Catholic dogma is perfectly apparent, but to the North German priest it is that of Protestant dogma which is perfectly apparent. If, then, these convictions, and others like them, rest on objective grounds, these grounds must be climatic; such convictions must be like flowers, the one flourishing only here, the other only there.

 

Thucydides, The History of the Peloponnesian War

I'm a Herodotus man through and through, but I can appreciate the Thucydidean perspective as well. Though I'm not entirely sure what that perspective entails: how much of his work is prescriptive and how much of it is descriptive? He's obviously a skeptic when it comes to the supernatural, and there's very little room for morality in his history; is this an artifact of the lack of morality in the way the Athenians went about their affairs, or is this something Thuc projects onto them? In any case, while reading this, one must always keep in mind that the Athenians lost!

It's interesting to read an ancient historian write about battles with 60 hoplites and 20 archers, and that kind of accounting accuracy perfectly captures Thuc's personality.

"... For Athens alone of her contemporaries is found when tested to be greater than her reputation, and alone gives no occasion to her assailants to blush at the antagonist by whom they have been worsted, or to her subjects to question her title to rule by merit. Rather, the admiration of the present and succeeding ages will be ours, since we have not left our power without witness, but have shown it by mighty proofs; and far from needing a Homer for our eulogist, or other of his craft whose verses might charm for the moment only for the impression which they gave to melt at the touch of fact, we have forced every sea and land to be the highway of our daring, and everywhere, whether for evil or for good, have left imperishable monuments behind us. Such is the Athens for which these men, in the assertion of their resolve not to lose her, nobly fought and died; and well may every one of their survivors be ready to suffer in her cause."

 

J. A. Baker, The Peregrine

10 years of obsessive, monomaniacal peregrine-watching in the East of England distilled to 200 pure, intense, astonishing pages. An incredibly rich dish that you can only eat so much of before needing to take a break. Reflects and contains nature both in its form and content. Somewhat reminiscent of Urne-Buriall in that it starts out in a dry, scientific tone and then reaches stylistic extremes later on.

Famously recommended by Werner Herzog (along with Virgil and The Short Happy Life of Francis Macomber), and it is indeed extremely Herzogian. There's no green idealism here; the endless cycle of killing which sustains the peregrine is presented unapologetically. "Beauty is vapour from the pit of death", Baker writes.

 

He hovered, and stayed still, striding on the crumbling columns of air, curved wings jerking and flexing. Five minutes he stayed there, fixed like a barb in the blue flesh of the sky. His body was still and rigid, his head turned from side to side, his tail fanned open and shut, his wings whipped and shuddered like canvas in the lash of the wind. He side-slipped to his left, paused, then glided round and down into what could only be the beginning of a tremendous stoop. There is no mistaking the menace of that first easy drifting fall. Smoothly, at an angle of fifty degrees, he descended; not slowly, but controlling his speed; gracefully, beautifully balanced. There was no abrupt change. The angle of his fall became gradually steeper till there was no angle left, but only a perfect arc. He curved over and slowly revolved, as though for delight, glorying in anticipation of the dive to come. His feet opened and gleamed golden, clutching up towards the sun. He rolled over, and they dulled, and turned towards the ground beneath, and closed again. For a thousand feet he fell, and curved, and slowly turned, and tilted upright. Then his speed increased, and he dropped vertically down. He had another thousand feet to fall, but now he fell sheer, shimmering down through dazzling sunlight, heart-shaped, like a heart in flames. He became smaller and darker, diving down from the sun. The partridge in the snow beneath looked up at the black heart dilating down upon him, and heard a hiss of wings rising to a roar. In ten seconds the hawk was down, and the whole splendid fabric, the arched reredos and immense fan-vaulting of his flight, was consumed and lost in the fiery maelstrom of the sky.

And for the partridge there was the sun suddenly shut out, the foul flailing blackness spreading wings above, the roar ceasing, the blazing knives driving in, the terrible white face descending, hooked and masked and horned and staring-eyed. And then the back-breaking agony beginning, and snow scattering from scuffling feet, and snow filling the bill’s wide silent scream, till the merciful needle of the hawk’s beak notched in the straining neck and jerked the shuddering life away.

And for the hawk, resting now on the soft flaccid bulk of his prey, there was the rip and tear of choking feathers, and hot blood dripping from the hook of his beak, and rage dying slowly to a small hard core within.

And for the watcher, sheltered for centuries from such hunger and such rage, such agony and such fear, there is the memory of that sabring fall from the sky, and the vicarious joy of the guiltless hunter who kills only through his familiar, and wills him to be fed.

The Worst

William Hazlitt, Selected Writings

I despise the style of his political writings. Puffed up, aiming to dazzle rather than illuminate. The cheap rhetoric of the ochlagogue. Actively offensive. The non-political writings are much better: they are merely unreadable and sophomoric. Hazlitt's entire aesthetic philosophy just boils down to "art should imitate nature" repeated over and over again, and I can't stand the way he expresses it.

It is not denied that the people are best acquainted with their own wants, and most attached to their own interests. But then a question is started, as if the persons asking it were at a great loss for the answer,—Where are we to find the intellect of the people? Why, all the intellect that ever was is theirs. The public opinion expresses not only the collective sense of the whole people, but of all ages and nations, of all those minds that have devoted themselves to the love of truth and the good of mankind,—who have bequeathed their instructions, their hopes, and their example to posterity,—who have thought, spoke, written, acted, and suffered in the name and on the behalf of our common nature. All the greatest poets, sages, heroes, are ours originally, and by right.

 

Carlos Ruiz Zafón, The Shadow of the Wind

Just a dull airport novel. The coincidences pile on top of each other as we are treated to interminable exposition dumps from improbable sources that conveniently know everything. Stylistically it tries too hard and achieves nothing.

Destiny is usually just around the corner. Like a thief, a hooker, or a lottery vendor: its three most common personifications. But what destiny does not do is home visits. You have to go for it.

 

Ada Palmer, Too Like the Lightning

Love Palmer's blog but this book just wasn't for me. Even though I read plenty of older books, I found the affected faux-18thC style absolutely grating. The plot mostly seems to be based on the Star Wars prequels, with endless scenes of characters talking about the taxation of trade routes or some other similarly boring nonsense. And there's a magical boy thrown in there for good measure, as well.

I could ask any contemporary here, ‘Are you a majority?’ and I know what he or she would answer: Of course not, Mycroft. I have a Hive, a race, a second language, a vocation and an avocation, hobbies of my own; add up my many strats and you will soon reduce me to a minority of one, and hence my happiness. I am unique, and proud of my uniqueness, and prouder still that, by being no majority, I ensure eternal peace. You lie, reader. There is one majority still entrenched in our commingled world, a great ‘us’ against a smaller ‘them.’ You will see it in time. I shall give only one hint—the deadliest majority is not something most of my contemporaries are, reader, it is something they are not.




Aspects of the Seeker

In Averroës's Search, Borges tells the story of the Islamic philosopher Averroës trying, and failing, to understand Aristotle's writings on theater. Borges sums it up in the afterword:

In the preceding tale, I have tried to narrate the process of failure, the process of defeat. I thought first of that archbishop of Canterbury who set himself the task of proving that God exists; then I thought of the alchemists who sought the philosopher’s stone; then, of the vain trisectors of the angle and squarers of the circle. Then I reflected that a more poetic case than these would be a man who sets himself a goal that is not forbidden to other men, but is forbidden to him. I recalled Averroës, who, bounded within the circle of Islam, could never know the meaning of the words tragedy and comedy.

History and literature offer many cases of ironically failed quests for knowledge.

Some phenomena disappear immediately once someone describes them. Douglas Adams wrote of a theory "which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear". The modern world offers many such anti-inductive cases, above all in the movements of the stock market: successful trading strategies tend to stop working after they become known. On a civilizational scale, Malthusianism became irrelevant right at the time someone was able to articulate the idea, and it seems that the moment we are able to improve ourselves through genetic engineering, we will be wiped out by our artificial creations.

A second type of ill-fated seeker is one who finds what he is looking for, but his goal is also a punishment. William Beckford, categorically rejecting Ulysses' actions at the land of the Sirens (perhaps inspired by his own life, and perhaps commenting on all attempts to comprehend the universe) created the apostate Caliph Vathek whose obsessive quest for knowledge results in his damnation, and for whom Hell is both the object of desire and the punishment for that desire. There are those who argue that the libertine Beckford only adopted this biblical attitude against the Faustian spirit as an ironic orientalist façade, but the Caliph resists all attempts at interpretation.

Some seekers reach their goal, only to have it slip out of their hands. Scientists will occasionally chance on the right idea but lack the ability to prove it: Aristarchus of Samos was doomed by the apparent size of the stars and the lack of parallax. The Royal Navy discovered that lemons prevent scurvy, and then through terrible epistemic luck managed to lose that knowledge over the course of the 19th century: lemons were replaced by limes low in vitamin C, but nobody noticed because the ships were faster. The problem only reappeared when polar explorers started suffering from scurvy despite bringing lime juice with them—and the answer was only discovered by the miraculously good luck of experimenting on guinea pigs, one of very few animals that don't produce vitamin C on their own.

Finally the most ironic case of them all, that of the Dalmatian archbishop and heretic Marco Antonio de Dominis: a seeker who is able to find the answer, but is condemned to believe it is false. De Dominis, a contemporary of Kepler (who wrote in favor of the lunar theory of tides) and Galileo (who mocked it), was also an amateur astronomer and wrote a book on the tides titled Euripus.

The archbishop begins by presenting both empirical and theoretical arguments in favor of the thesis that the earth is a sphere. He then describes the luni-solar theory of tides: he (correctly) writes that tides are caused by the combined gravitational action of the sun and the moon, (correctly) predicts that high tide occurs simultaneously at antipodal points, and (correctly) shows that the cycle of spring and neap tides can be explained by the combined action of the sun and moon. He also (correctly) deduces that the diurnal inequality between tides will be greatest when the moon is above the tropic of Cancer or Capricorn. Finally, de Dominis explains (incorrectly) that since the two daily tides are always equal to each other, the theory must be false. The heretical archbishop died behind the bars of the Castel Sant'Angelo before his book could be published.




Links & What I've Been Reading Q4 2021

Metascience

1. Investigating the replicability of preclinical cancer biology: "50 experiments from 23 papers were repeated, generating data about the replicability of a total of 158 effects [...] for positive effects, the median effect size in the replications was 85% smaller than the median effect size in the original experiments"

2. A catastrophic failure of peer review in obstetrics and gynaecology: "I estimate that across these 46 articles, 346 (64%) of the 542 parametric tests (unpaired t tests, or, occasionally, ANOVA) and 151 (61%) of the 247 contingency table test (Pearson's Χ² or Fisher's exact test) that I was able to check were incorrectly reported."

3. The Business of Extracting Knowledge from Academic Publications: "Close to nothing of what makes science actually work is published as text on the web."

4. A large replication project in marketing, with fairly catastrophic results. Amusingly the abstract doesn't mention the rate of successful replication.

5. Increasing Politicization and Homogeneity in Scientific Funding: An Analysis of NSF Grants, 1990-2020. The methodology is somewhat questionable, but interesting nonetheless.

Covid

6. Scott Alexander on the Ivermectin literature and the trouble with trying to wade through a bunch of questionable papers. Alexandros Marinos responds.

7. Zvi's latest.

You are probably going to get Omicron, if you haven’t had it already. The level of precaution necessary to change this assessment is very high, and you probably don’t want to pay that price.

8. ADS on the Zvi-Holden bet and taking ideas seriously.

Making a blockchain game might genuinely be the best use of Zvi’s time, and he might be acting both rationally and ethically in choosing to pursue it. And so this situation is Good, but only in a very limited and local sense. The tragedy isn’t Zvi’s decision, it’s that a scenario even exists where this is the decision he has to make.

9. Omicron spreading faster than delta because of immune evasion? SARS-CoV-2 Omicron VOC Transmission in Danish Households. Plus twitter thread.

Forecasting

10. Forecasting in the Field: academics and non-experts try to predict the effects of development interventions.

the average correlation between predicted and observed effects is 0.75. Recipient types are less accurate than academics on average, but are at least as accurate for interventions and outcomes that are likely to be more familiar to them. The mean forecast of each group outperforms more than 75% of the comprising individuals, and averaging just five forecasts substantially reduces error, indicating strong “wisdom-of-crowds” effects. Three measures of academic expertise (rank, citations, and conducting research in East Africa) and two measures of confidence do not correlate with accuracy. Among recipient-types, high-accuracy “superforecasters” can be identified using observables. Small groups of these superforecasters are as accurate as academic respondents.
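The "averaging just five forecasts" effect is easy to see in a toy simulation. This is only a sketch, not the paper's method: it assumes every forecaster is unbiased and makes independent Gaussian errors, and the noise level, the range of true effects, and the `simulate` function are all made up for illustration.

```python
import random
import statistics

def simulate(n_forecasters=5, n_trials=2000, noise_sd=0.2, seed=0):
    """Compare individual forecast error with the error of the group mean.

    Assumption (not from the paper): each forecast is the true effect
    plus independent Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    individual_errors = []
    crowd_errors = []
    for _ in range(n_trials):
        truth = rng.uniform(-0.5, 0.5)  # hypothetical true effect size
        forecasts = [truth + rng.gauss(0, noise_sd) for _ in range(n_forecasters)]
        individual_errors.extend(abs(f - truth) for f in forecasts)
        crowd_errors.append(abs(statistics.mean(forecasts) - truth))
    return statistics.mean(individual_errors), statistics.mean(crowd_errors)

individual_mae, crowd_mae = simulate()
print(f"mean absolute error, single forecaster: {individual_mae:.3f}")
print(f"mean absolute error, average of five:   {crowd_mae:.3f}")
```

Under these assumptions the error of the five-person average shrinks by roughly the square root of five. Real forecasters share biases, so the gain in practice should be smaller.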

Economic History

11. The United Fruit Company? Good, Actually.

Using administrative census data with census-block geo-references from 1973 to 2011, we implement a geographic regression discontinuity design that exploits a land assignment that is orthogonal to our outcomes of interest. We find that the firm had a positive and persistent effect on living standards. Company documents explain that a key concern at the time was to attract and maintain a sizable workforce, which induced the firm to invest heavily in local amenities that can account for our result.

Book Reviews

12. Reviews of Moby Dick from 1851. "This is an odd book, professing to be a novel; wantonly eccentric; outrageously bombastic; in places charmingly and vividly descriptive." I love it when modern editions of old books include their contemporary reviews; unfortunately it's not done very often.

13. ADS on Stubborn Attachments and Straussian writing.

Crypto

14. Bloomberg report on Tether, including the story of how a French screenwriter ended up owning a Bahamian bank.

15. Vitalik Buterin on Crypto Cities.

16. A Glimpse of the Deep: Finding a Creature in Ethereum's Dark Forest.

This monster was watching Ethereum for an obscure mistake deep in the process of creating a transaction: the reuse of a number while signing a transaction. I went searching for this creature, laid bait, saw it in the wild, and found unexplained tracks. To understand how this bot works, we need to begin by reviewing ECDSA and digital signatures.
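Why reusing that number is fatal can be shown with a few lines of modular arithmetic. This is a minimal sketch of the standard ECDSA nonce-reuse algebra, not the bot's code or the article's: the private key, nonce, and message hashes are toy values, and `r` is an arbitrary stand-in for the x-coordinate of the nonce point (the algebra only needs both signatures to share it). The helper names `inv`, `sign`, and `recover_key` are made up for this example.

```python
# Order of the secp256k1 group used for Ethereum signatures.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def inv(x):
    """Modular inverse mod N (three-argument pow needs Python 3.8+)."""
    return pow(x % N, -1, N)

def sign(d, k, r, z):
    """The s half of an ECDSA signature: s = k^-1 * (z + r*d) mod N."""
    return (inv(k) * (z + r * d)) % N

def recover_key(r, s1, z1, s2, z2):
    """Recover the private key from two signatures that share a nonce."""
    k = ((z1 - z2) * inv(s1 - s2)) % N  # the reused nonce
    return ((s1 * k - z1) * inv(r)) % N  # the private key

# Toy values, all hypothetical.
d = 0x1F2E3D4C5B6A79880123456789ABCDEF  # private key
k = 0x0BADC0FFEE0DDF00D                  # nonce, wrongly used twice
r = 0x5EEDFACE12345678                   # stand-in for x(k*G) mod N
z1, z2 = 0xAAAA1111, 0xBBBB2222          # hashes of two different messages

s1 = sign(d, k, r, z1)
s2 = sign(d, k, r, z2)
recovered = recover_key(r, s1, z1, s2, z2)
print(recovered == d)
```

Subtracting the two signing equations cancels the `r*d` term, which leaves the nonce as the only unknown; once the nonce is known, either signature gives up the key.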

The Rest

17. Some answers to my questions about Borges, Browne, and Quevedo: On Borges and Quevedo. "The (sad) irony in Tlon’s ending is, therefore, not in a contrast Quevedo vs Browne, then, but in the contrast (Borges + Quevedo + Browne) vs Tlon. Or, maybe, grecolatin tradition versus modernity. With a tinge of sad resignation for the slow but unstoppable victory of the second over the first."

18. And here's a very interesting essay (in Spanish) on Borges's "francophobia".

19. SMTM wrap up the Chemical Hunger series on the causes of obesity after 20 posts.

20. On the NIH and the challenges of funding alcohol consumption RCTs. The big alcohol study that didn't happen: My primal scream of rage.

21. RCT of health insurance in India finds few positive effects: Effect of Health Insurance in India: A Randomized Controlled Trial.

22. "Many young females report joining Draco Malfoy as his girlfriend."

23. An interesting ACX comment on reversals in artistic "progress".

it's a pattern that has repeated throughout history and around the world, one of naturalist art executed with great skill being deliberately replaced with highly abstract art not requiring as much skill.

The cave paintings of Chauvet Cave in France ca 30,000 BP (before present) are more natural and technically much more sophisticated than any cave or rock paintings found after 20,000 BP (some of which are quite abstract and stylized).

Reminds me of this paper on bursts of technological development 60-80kya that lasted for a few thousand years and then disappeared. Related, a great new article on the Antikythera mechanism.

24. The Browser interview with QNTM.

25. Nemets on the genetic history of the ancient Greeks and the identity of the Sea Peoples.

26. Razib Khan: Out of Africa's midlife crisis

two San from different groups both living in Namibia’s Northern Kalahari desert, and speaking click languages from the same family, are more genetically distinct from one another, by a solid 20%, than a person from Stockholm is from a person from Shanghai.

27. Don't take psychedelics. "Results revealed significant shifts away from ‘physicalist’ or ‘materialist’ views, and towards panpsychism and fatalism, post use."

28. Blind people have a pretty good understanding of color.

Audio-Visual

29. Interface | Part II, cool animation project.

30. A project that made 999 forgeries of a Warhol drawing, then randomly mixed in the original, and sold them.

31. How to Build a Supersonic Trebuchet.

32. And here's a cool remix of Hugh Masekela's Stimela.

What I've Been Reading

Non-Fiction

  • The Man from the Future: The Visionary Life of John von Neumann by Ananyo Bhattacharya. Bhattacharya approaches his subject by focusing on ideas. The first chapter takes care of JvN's early life, and the rest of the book is split up based on the subjects he worked on: mathematics, quantum mechanics, the nuclear bomb, computing, game theory, RAND, and artificial life. Large parts of the book (I'd say about a third) are dedicated not to von Neumann but rather the work other people did based on his ideas. The game theory chapter, for example, covers Nash, Schelling, Aumann, etc. in economics, and John Maynard Smith, Price, Hamilton, etc. in evolutionary game theory. Bhattacharya is good at making all these technical subjects accessible without dumbing them down too much. JvN's personality, personal life, professional relationships, etc. on the other hand are given scant attention.

    Overall it felt a bit too short. In less than 300 pages we get such a wide array of ideas, and the story of how they influenced so many people, that it often feels like we're just skimming the surface in a speedboat. I'd like to take a deeper, more ponderous ride in a submarine some day.

  • Meetings with Remarkable Manuscripts by Christopher de Hamel. Fantastically gorgeous book, filled with high-quality prints of medieval manuscripts. Pleasant conversational style. Just lovely all around. Not just about the manuscripts themselves, but also who owned them, their condition, where they're housed, the librarians taking care of them, etc.

  • The Rings of Saturn by W. G. Sebald. A book of digressions. The frame is a walking tour of England, and on it are bolted various musings on Sir Thomas Browne, Joseph Conrad, silk manufacture, the Taiping rebellion, and so on. The subjects flow into each other so you don't know where one digression begins and the other ends. However, Sebald kind of undersells how interesting his subjects are; comparing his notes on FitzGerald to the famous Borges essay, for example, makes me wonder how Sebald managed to turn such a fascinating subject into such a dull essay.

  • Conquistador: Hernán Cortés, King Montezuma, and the Last Stand of the Aztecs by Buddy Levy. I didn't love the book (it felt a bit sloppy, and the style isn't great), but Cortés is an incredible character. The determination, the ingenuity, the absolute ruthlessness. When he murders his wife at the end of the book, all you can think is "well of course he did". And self-aware too: "I and my companions suffer from a disease of the heart that can be cured only with gold"! Perhaps it is the contrast against the Aztecs that, in a way, softens his image? Going to try Prescott's History of the Conquest of Mexico next.

  • Over the Edge of the World: Magellan's Terrifying Circumnavigation of the Globe by Laurence Bergreen. Solid narrative pop history. Feels a bit rushed after the point of Magellan's death. Exciting, adventurous stuff as you'd expect from the age of exploration.

  • A Man on the Moon: The Voyages of the Apollo Astronauts by Andrew Chaikin. Covers the entire thing plus a ton of backstory, very thorough (within its scope). Focused on the astronauts, and much of it is the product of interviews with those astronauts, which is kind of obvious at many points as you're only getting one person's perspective on certain events. It would have been better with a broader, more objective view, in my opinion. The latter parts (after the first moon landing) include a surprising amount of geology! I read three books on the early space program this year and none of them was completely satisfying; I'm still trying to find the Richard Rhodes of Apollo...




How I Made $10k Predicting Which Studies Will Replicate

Starting in August 2019 I took part in the Replication Markets project, a part of DARPA's SCORE program whose goal is to predict which social science papers will successfully replicate. I have previously written about my views on the replication crisis after reading 2500+ papers; in this post I will explain the details of forecasting, trading, and optimizing my strategy within the rules of the game.

The Setup

3000 papers were split up into 10 rounds of ~300 papers each. Every round began with one week of surveys, followed by two weeks of market trading, and then a one week break. The studies were sourced from all social science disciplines (economics, psychology, sociology, management, etc.) and were published between 2009 and 2018 (in other words, most of the sample came from the post-replication crisis era).

Only a subset of the papers will be replicated: ~100 papers were selected for a full replication, and another ~150 for a "data replication" in which the same methodology is applied to a different (but pre-existing) dataset.1 Out of the target 250 replications, only about 100 were completed by the time the prizes were paid out.

Surveys

The surveys included a link to the paper, a brief summary of the claim selected for replication, the methodology, and a few statistical values (sample size, effect size, test statistic values, p-value). We then had to answer three questions:

  1. What is the probability of the paper replicating?
  2. What proportion of other forecasters do you think will answer >50% to the first question?
  3. How plausible is the claim in general?

The papers were split up into batches of 10, and the top 4 scorers in each batch won awards of $80, $40, $20, and $20 for a total of $4,800 per survey round.

The exact scoring method was not revealed in order to prevent gaming the system, but after the competition ended the organizers wrote a technical blog post explaining the "surrogate scoring rule" approach. Since the replications were not completed yet, scoring predictions had to be done without reference to the "ground truth"; instead they generated a "surrogate outcome" based on all the survey answers and used that to score the predictions.2

Markets

Every user started each round with 1 point per claim (so typically 300).3 These points were the currency used to buy "shares" for every claim. Long share positions pay out if the paper replicates successfully and short positions pay out if it does not. Like a normal stock market, if you bought shares at a low price and the price went up, you could sell those shares for a profit.

The starting price of each claim was based on its p-value:

  • p<.05: 30%
  • p<.01: 40%
  • p<.001: 80%

The market did not operate like a typical stock market (ie a continuous double auction); instead, they used Robin Hanson's Logarithmic Market Scoring Rule which allows users to trade without a counterparty.4 Effectively it works as an automated market maker, making it costlier to trade the more extreme the price: taking a claim from 50% to 51% was cheap, while taking it from 98% to 99% was very expensive. Without any order book depth, prices could be rather volatile as it didn't take much for a single person to significantly shift the price on a claim; this also created profitable trading opportunities.
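To make the cost asymmetry concrete, here is a minimal Python sketch of a two-outcome LMSR market maker. The cost function is the standard one from Hanson's rule; the liquidity parameter `B` is a made-up value for illustration, since the value used by Replication Markets is not given here.

```python
import math

def lmsr_cost(q_yes, q_no, b):
    """Standard LMSR cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b))."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def lmsr_price(q_yes, q_no, b):
    """Instantaneous price of a YES share, which doubles as the implied probability."""
    e_yes = math.exp(q_yes / b)
    e_no = math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

def net_shares_for_price(p, b):
    """Net YES shares outstanding (q_yes - q_no) at which the price equals p."""
    return b * math.log(p / (1 - p))

def cost_to_move(p_from, p_to, b):
    """Points a trader pays to push the YES price from p_from to p_to."""
    q0 = net_shares_for_price(p_from, b)
    q1 = net_shares_for_price(p_to, b)
    return lmsr_cost(q1, 0.0, b) - lmsr_cost(q0, 0.0, b)

B = 100.0  # hypothetical liquidity parameter, not the real RM setting

cheap = cost_to_move(0.50, 0.51, B)      # one point of price near the middle
expensive = cost_to_move(0.98, 0.99, B)  # one point of price near the extreme
```

With this assumed `B`, the first move costs about 2 points and the second about 69, which is the same pattern described above: the closer the price is to 0% or 100%, the more each additional point of price costs.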

The payout for the markets was about $14k per round, awarded in proportion to winning shares in the papers selected for replication. Given the target of 250 replications, that means about 8% of the claims would actually resolve. The small number of actually completed replications, however, caused some issues: round 9, for example, only had 2 (out of the target 25) replications actually pay out.

Early Steps - A Simple Model

I didn't take the first round very seriously, and I had a horrible flu during the second round, so I only really started playing in round 3. I remembered Tetlock writing that "it is impossible to find any domain in which humans clearly outperformed crude extrapolation algorithms, less still sophisticated statistical ones", so I decided to start with a statistical model to help me out.

This felt like a perfect occasion for a centaur approach (combining human judgment with a model), as there was plenty of quantitative data, but also lots of qualitative factors that are hard to model. For example, some papers with high p-values were nevertheless obviously going to replicate, due to how plausible the hypothesis was a priori.5

Luckily someone had already collected the relevant data and built a model.6 Altmejd et al. (2019) combine results from four different replication projects covering 131 replications (which they helpfully posted on OSF). Here are the features they used ranked by importance:

Their approach was fairly complex, however, and I wanted something simpler. On top of that I wanted to limit the number of variables I would have to collect for every paper, as I had to do 300 of them in a week—any factors that would be cumbersome to look up (eg the job title of each author) were discarded. I also transformed a bunch of the variables, for example replacing raw citation counts with log citations per year.

I ended up going with a logistic ridge regression (shrinkage tends to help with out-of-sample predictions). The Altmejd sample was limited in terms of the fields covered (they only had social/cognitive/econ), so I just pulled some parameter values out of my ass for the other fields—in retrospect they were not very good guesses.7

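library(glmnet)
# Cross-validated logistic ridge regression (alpha = 0 selects the ridge penalty)
cv.ridge <- cv.glmnet(as.matrix(mydata), y_class, alpha = 0, family = "binomial")

# Coefficients at the lambda with the lowest cross-validated error
coef(cv.ridge, cv.ridge$lambda.min)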
Parameter                         Value
intercept                          0.40
log # of pages                    -0.26
p value                          -25.07
log # of authors                  -0.67
% male authors                     0.90
dummy for interaction effects     -0.77
log citations per year             0.37
discipline: economics              0.27
discipline: social psychology     -0.77
discipline: education             -0.40
discipline: political science      0.10
discipline: sociology             -0.40
discipline: marketing              0.10
discipline: orgbeh                 0.1
discipline: criminology           -0.2
discipline: other psychology      -0.2

This model was then implemented in a spreadsheet, so all I had to do was enter the data, and the prediction popped up:

=exp(Intercept+
F18*pval+
IF(N18="interaction",1,0)*interaction+
male*P18+
logauth*ln(O18)+
loglen*ln(L18)+
if(D18="Social",1,0)*social+
if(D18="Economics",1,0)*econ+
if(D18="PoliSci",1,0)*polisci+
if(D18="Education",1,0)*educ+
if(D18="Sociology",1,0)*sociology+
if(D18="Marketing",1,0)*marketing+
if(D18="OrgBeh",1,0)*OrgBeh+
if(D18="Criminology",1,0)*criminology+
if(D18="Other Psychology",1,0)*otherpsych+
ln(M18/(2019-E18))*logcitesperyear)
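For readers who prefer code to cell references, the same calculation can be sketched in Python using the coefficients from the table above. Two things here are assumptions rather than facts from the post: the function and argument names are invented, and "% male authors" is taken to be a fraction between 0 and 1. Note also that the formula as printed evaluates exp() of the linear predictor, which is the odds; this sketch applies the usual logistic inverse link to turn that into a probability.

```python
import math

# Coefficients copied from the parameter table above
COEF = {
    "intercept": 0.40,
    "log_pages": -0.26,
    "p_value": -25.07,
    "log_authors": -0.67,
    "pct_male": 0.90,
    "interaction": -0.77,
    "log_cites_per_year": 0.37,
}

# Discipline dummies; any discipline not listed gets 0 (the baseline)
DISCIPLINE = {
    "Economics": 0.27,
    "Social": -0.77,
    "Education": -0.40,
    "PoliSci": 0.10,
    "Sociology": -0.40,
    "Marketing": 0.10,
    "OrgBeh": 0.10,
    "Criminology": -0.20,
    "Other Psychology": -0.20,
}

def model_forecast(p_value, pages, n_authors, pct_male, interaction,
                   citations, pub_year, discipline, ref_year=2019):
    """Hypothetical helper: replication probability from the ridge coefficients.

    Like the spreadsheet formula, this breaks for papers with zero citations
    or with pub_year equal to ref_year (log of zero / division by zero).
    """
    cites_per_year = citations / (ref_year - pub_year)
    z = (COEF["intercept"]
         + COEF["p_value"] * p_value
         + COEF["interaction"] * (1 if interaction else 0)
         + COEF["pct_male"] * pct_male
         + COEF["log_authors"] * math.log(n_authors)
         + COEF["log_pages"] * math.log(pages)
         + COEF["log_cites_per_year"] * math.log(cites_per_year)
         + DISCIPLINE.get(discipline, 0.0))
    odds = math.exp(z)          # what the spreadsheet formula returns as printed
    return odds / (1.0 + odds)  # assumed logistic inverse link

example = model_forecast(p_value=0.03, pages=12, n_authors=3, pct_male=0.67,
                         interaction=True, citations=40, pub_year=2014,
                         discipline="Social")
```

The large negative coefficient on the p-value does most of the work: holding everything else fixed, a claim at p=0.001 gets a noticeably higher forecast than the same claim at p=0.04.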

While my model had significant coefficients on # of authors, ratio male, and # of pages, these variables were not predictive of market prices in RM. Even the relation of citations to market prices was very weak. I think the market simply ignored any data it was not given directly, even if it was important. This gave me a bit of an edge, but also made evaluating the performance of the model more difficult as the market was systematically wrong in some ways.

Collecting the additional data needed for the model was fairly cumbersome: completing the surveys took ~140 seconds per paper when I was just doing it in my head, and ~210 seconds with the extra work of data entry. It also made the process significantly more boring.

Predictions

I will give a quick overview of the forecasting approach here; a full analysis will come in a future post, including a great new dataset I'm preparing that covers the methodology of replicated papers.

At the broadest level it comes down to: the prior, the probability of a false negative, and the probability of a false positive.8 One must consider these factors for both the original and the replication.9
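One textbook way to combine these three ingredients is sketched below in Python. It is a deliberately simplified model, not the procedure I actually followed (which was mostly eyeballing): it ignores p-hacking, publication bias, and the sign of the effect, and all function names are invented for the example.

```python
def positive_predictive_value(prior, power, alpha):
    """P(effect is real | original study was significant).

    prior: probability the hypothesis is true before seeing the study
    power: 1 - probability of a false negative in the original study
    alpha: probability of a false positive in the original study
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

def replication_probability(prior, power_orig, alpha_orig, power_rep, alpha_rep):
    """P(replication is significant), given a significant original result."""
    ppv = positive_predictive_value(prior, power_orig, alpha_orig)
    # Real effects replicate with probability power_rep;
    # spurious ones "replicate" by chance with probability alpha_rep.
    return ppv * power_rep + (1 - ppv) * alpha_rep

plausible = replication_probability(0.5, 0.8, 0.05, 0.9, 0.05)
implausible = replication_probability(0.1, 0.8, 0.05, 0.9, 0.05)
```

Under these illustrative inputs, a claim with a 50% prior comes out at 0.85 and one with a 10% prior at about 0.59, so the same p-value and power can support very different forecasts depending on how plausible the claim was to begin with.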

What does that look like in practice? I started by reading the summary of the study on the RM website (which included the abstract, a description of the selected claim, sample size, p-value, and effect size). After that I skimmed the paper itself. If I didn't understand the methodology I checked the methods and/or conclusions, but the vast majority of papers were just straight regressions, ANOVAs, or SEMs. The most important information was almost always in the table with the main statistical results.

The factors I took into account, in rough order of importance:

  • p-value. Try to find the actual p-value, they are often not reported. Many papers will just give stars for <.05 and <.01, but sometimes <.01 means 0.0000001! There's a shocking number of papers that only report coefficients and asterisks—no SEs, no CIs, no t-stats.
  • Power. Ideally you'll do a proper power analysis, but I just eyeballed it.
  • Plausibility. This is the most subjective part of the judgment and it can make an enormous difference. Some broad guidelines:
    • People respond to incentives.
    • Good things tend to be correlated with good things and negatively correlated with bad things.
    • Subtle interventions do not have huge effects.
  • Pre-registration. Huge plus. Ideally you want to check if the plan was actually followed.
  • Interaction effect. They tend to be especially underpowered.
  • Other research on the same/similar questions, tests, scales, methodologies—this can be difficult for non-specialists, but the track record of a theory or methodology is important. Beware publication bias.
  • Methodology - RCT/RDD/DID good. IV depends, many are crap. Various natural-/quasi-experiments: some good, some bad (often hard to replicate). Lab experiments, neutral. Approaches that don't deal with causal identification depend heavily on prior plausibility.
  • Robustness checks: how does the claim hold up across specifications, samples, experiments, etc.
  • Signs of a fishing expedition/researcher degrees of freedom. If you see a gazillion potential outcome variables and that they picked the one that happened to have p<0.05, that's what we in the business call a "red flag". Look out for stuff like ad hoc quadratic terms.
  • Suspiciously transformed variables. Continuous variables put into arbitrary bins are a classic p-hacking technique.
  • General propensity for error/inconsistency in measurements. Fluffy variables or experiments involving wrangling 9-month-old babies, for example.

Things that don't matter for replication but matter very much in the real world:

  • Causal identification! The plausibility of a paper's causal identification strategy is generally orthogonal to its chances of replicating.
  • Generalizability. Lab experiments are replicated in other labs.

Some papers were completely outside my understanding, and I didn't spend any time trying to understand them. Jargon-heavy cognitive science papers often fell into this category. I just gave a forecast close to the default and marked them as "low confidence" in my notes, then avoided trading them during the market round. On the other hand, sometimes I got the feeling that the jargon was just there to cover up bullshit (leadership studies, I'm looking at you) in which case I docked points for stuff I didn't understand. The epistemological problem of how to determine which jargon is legit and which is not, is left as an exercise to the reader.

For Example

The data from Replication Markets are still embargoed, so I can't give you any real examples. Instead, I have selected a couple of papers that were not part of the project but are similar enough.

Ex. 1: Criminology

My first example is a criminology paper which purports to investigate the effect of parenting styles on criminal offending. Despite using causal language throughout, the paper has no causal identification strategy whatsoever. If criminologists had better GRE scores this nonsense would never have been published. The most relevant bits of the abstract:

The present study used path analyses and prospective, longitudinal data from a sample of 318 African American men to examine the effects of eight parenting styles on adult crime. Furthermore, we investigated the extent to which significant parenting effects are mediated by criminogenic schemas, negative emotions, peer affiliations, adult transitions, and involvement with the criminal justice system. Consonant with the study hypotheses, the results indicated that [...] parenting styles low on demandingness but high on responsiveness or corporal punishment were associated with a robust increase in risk for adult crime.

The selected claim is the effect of abusive parenting (the "abusive" parenting style involves "high corporal punishment" but low "demandingness" and "responsiveness") on offending; I have highlighted the outcome in the main regression table below. While the asterisks only say p<.01, the text below indicates that the p-value is actually <.001.

Make your own guess about the probability of replication and then scroll down to mine below.

I'd give this claim 78%. The results are obviously confounded, but they're confounded in a way that is fairly intuitive, and we would expect the replication to be confounded in the exact same way. Abusive parents are clearly more likely to have kids who become criminals. Although they don't give us the exact t-stat, the p-value is very low. On the negative side the sample size (318 people spread over 8 different parenting styles) isn't that big, I'm a bit worried about variance in the classification of parenting styles, and there's a chance that the (non-causal) relation between abusive parenting and offending could be lost in the controls.

This is a classic example of "just because it replicates doesn't mean it's good", and also a prime example of why the entire field of criminology should be scrapped.

Ex. 2: Environmental Psychology

My second example is an "environmental psychology" paper about collective guilt and how people act in response to global warming.

The present research examines whether collective guilt for an ingroup’s collective greenhouse gas emissions mediates the effects of beliefs about the causes and effects of global warming on willingness to engage in mitigation behavior.

N=72 people responded to a survey after a manipulation, on a) the causes and b) the importance of the effects of climate change. The selected claim is that "participants in the human cause-minor effect condition reported more collective guilt than did participants in the other three conditions (b* = .50, p <.05)". Again, make your own guess before scrolling down.

I'd go with 23% on this one. Large p-value, interaction effect, relatively small sample, and a result that does not seem all that plausible a priori. The lack of significance on the Cause/Effect parameters alone is also suspicious, as is the lack of significance on mitigation intentions. Lots of opportunities to find some significant effect here!

Spreadsheets

The worst part of Replication Markets was the user interface: it did not offer any way to keep track of one's survey answers, so in order to effectively navigate the market rounds I had to manually keep track of all the predictions. There was also no way to track changes in the value of one's shares, so again that had to be done manually in order to exit successful trades and find new opportunities. The initial solution was giant spreadsheets:

Since the initial prices were set depending on the claim's p-value, I knew ahead of time which claims would be most mispriced at the start of trading (and that's where the greatest opportunities were). So a second spreadsheet was used to track the best initial trades.11 The final column tracks how those trades worked out by the end of the market round; as you can see not all of them were successful (including some significant "overshoots"), but in general I had a good hit rate. There were also far more "longs" than "shorts" at the start: these were mostly results that were highly plausible a priori but had failed to get a p-value below 0.001.

["Final" is my estimate, "default" is the starting price, "mkt" is the final market price]
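The logic behind that second spreadsheet can be sketched in a few lines of Python. The starting prices are the ones listed in the Markets section; the claim IDs and forecasts are invented, and because the post does not say what happened to claims with p ≥ .05, the sketch simply refuses to price them.

```python
def default_price(p_value):
    """Starting market price implied by the claim's reported p-value."""
    if p_value < 0.001:
        return 0.80
    if p_value < 0.01:
        return 0.40
    if p_value < 0.05:
        return 0.30
    # Not covered by the pricing rule described in the post
    raise ValueError("no starting price given for p >= .05")

def rank_opening_trades(forecasts):
    """Rank claims by how far my forecast is from the starting price.

    forecasts: iterable of (claim_id, p_value, my_forecast) tuples.
    Returns rows of (claim_id, start_price, my_forecast, side, edge),
    largest mispricing first.
    """
    rows = []
    for claim_id, p_value, mine in forecasts:
        start = default_price(p_value)
        edge = mine - start
        side = "long" if edge > 0 else "short"
        rows.append((claim_id, start, mine, side, abs(edge)))
    return sorted(rows, key=lambda row: row[4], reverse=True)

# Hypothetical claims: (id, reported p-value, my forecast)
ranked = rank_opening_trades([
    ("a", 0.03, 0.75),    # plausible result with a weak p-value: big long
    ("b", 0.0005, 0.85),  # roughly fairly priced at the open
    ("c", 0.004, 0.20),   # implausible result: short
])
```

Here claim "a" tops the list as a long, which matches the pattern above: the biggest opening trades were plausible claims whose p-values put them in a low starting-price bucket.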

Finally, a third spreadsheet was used to track live trading during the market rounds. There was no clean way of getting the prices from the RM website to my sheet, so I copy/pasted everything, parsed it, and then inserted the values into the sheet. I usually did that a few times per day (more often at the start, since that was where most trading activity was concentrated). The claims were then ranked by the difference between my own estimate and the market. My current share positions were listed next to them so I knew what I needed to trade. The "Change" column listed the change in price since the last update, so I could easily spot big changes (which usually meant new trading opportunities).

["Live" is the current market price, "My" is my estimate, "Shares" is the current position]

Forget the Model!

After the third round I took a look at the data to evaluate the model and there were two main problems:

  • My own errors (prediction minus market price) were very similar to the errors of the model:
  • The model failed badly at high-probability claims, and failed to improve overall performance. Here's the root mean square error vs market prices, grouped by p-value:

Of course what the model was actually trying to predict was replication, not the market price. But market prices were the only guide I had to go by (we didn't even get feedback on survey performance), and I believed the market was right and the model was wrong when it came to low-p-value claims.

What would happen if everyone tried to optimize for predicting market prices? I imagine we could have gotten into weird feedback loops, causing serious disconnects between market prices and actual replication probability. In practice I don't think that was an issue though.

If I had kept going with the model, I had some improvements in mind:

  • Add some sort of non-linear p-value term (or go with z-scores instead).
  • Quantify my subjective judgment of "plausibility" and add it as another variable in the model.
  • Use the round 3 market data of 300 papers (possibly with extremized prices) to estimate a new model, which would more than triple my N from the original 131 papers. But I wasn't sure how to combine categorical data from the previous replications and probabilities from the prices in a single model.12

At this point it didn't seem worth the effort, especially given all the extra data collection work involved. So, from round 4 onward I abandoned the model completely and relied only on my own guesses.

Playing the Game

Two basic facts dictated the trading strategy:

  1. Only a small % of claims will actually be replicated and pay out.
  2. Most claims are approximately correctly priced.

It follows that smart traders make many trades, move the price by a small amount (the larger your trade the larger the price impact), and have a diversified portfolio. The inverse of this rule can be used to identify bad traders: anyone moving the price by a huge amount and concentrating their portfolio in a small number of bets is almost certainly a bad trader, and one can profitably fade their trades.

Another source of profitable trades was the start of the round. Many claims were highly mispriced, but making a profit depended on getting to them first, which was not always easy since everyone more or less wanted to make the same trades. Beyond that, I focused on simply allocating most of my points toward the most-mispriced claims.

I split the trading rounds into two phases:

  1. Trading based on the expected price movement.

  2. At the very end of the round, trading based on my actual estimate of replication probability.

Usually these two aspects would coincide, but there were certain types of claims that I believed were systematically mispriced by other market participants.13 Trading those in the hope of making profits during the market round didn't work out, so I only allocated points toward them at the end.

Another factor to take into consideration was that not all claims were equally likely to be selected for replication. In some cases it was pretty obvious that a paper would be difficult or impossible to replicate directly. I was happy to trade them, but by the end of the round I excluded them from the portfolio.14

Buying the most mispriced items also means you're stuck with a somewhat contrarian portfolio, which can be dangerous if you're wrong. Given the flat payout structure of the market, following the herd was not necessarily a bad idea. Sometimes if a claim traded strongly against my own forecast, I would lower the weight assigned to it or even avoid it completely. Suppose you think a study has a 30% chance of replicating, and a liquid market insists it has a 70% chance—how do you revise your forecast?

Reacting to Feedback

After every round I generated a bunch of graphs that were designed to help me understand the market and improve my own forecasts. This was complicated by the fact that there were no replication results—all I had to go by were the market prices, and they could be misleading.

Among other things, I compared means, standard deviations, and quartiles of my own predictions vs the market; looked at my means and RMSE grouped by p-value and discipline; plotted the distribution of forecasts, and error vs market price; etc.

One standard pattern of prediction markets is that extremizing the market prediction makes it better. Simplistically, you can think of the market price being determined by informed traders and uninformed/noise traders. The latter pull the price toward the middle, so the best prediction is going to be (on average) more extreme than the market's. This is made worse in the case of Replication Markets because of the LMSR algorithm which makes shares much more expensive the closer you get to 0 or 100%. So you can often improve on things by just extremizing the market forecast, and I always checked to see if my predictions were on the extremizing side vs the market.
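A common way to extremize a probability is to scale its log-odds by a constant greater than 1. The Python sketch below shows that transform plus the "am I on the extremizing side?" check; the exponent values are arbitrary illustrations, not values I estimated from RM data, and the function names are made up.

```python
def extremize(p, a=1.5):
    """Scale the log-odds of p by a; a > 1 pushes p away from 0.5."""
    return p ** a / (p ** a + (1 - p) ** a)

def on_extremizing_side(mine, market):
    """True if my forecast is further from 50% than the market, in the same direction."""
    return (mine - market) * (market - 0.5) > 0

market_price = 0.70
pushed = extremize(market_price, a=2.0)
```

With `a=2.0`, a market price of 70% becomes roughly 84%, while 50% stays at 50%. A forecast of 85% against a 70% market is on the extremizing side; a forecast of 60% is not.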

Here you can see the density plots of my own vs the market forecasts, split up by p-value category. (The vertical line is the default starting price for each group.)

And here's the same data in scatterplot form:

[My predictions vs the market.]

[Difference between my forecasts and the market, by discipline. The market was more confident in results from economics, at least in round 3.]

Over time my own predictions converged with the market. I'm not entirely sure how to interpret this trend. Perhaps I was influenced by the market and subtly changed my predictions based on what I saw. Did that make me more accurate or less? It's unclear, and based on the limited number of actual replication results it's impossible to tell. Another possibility is that the changing composition of forecasters over time made the market more similar to me?

Automated Trading

I think a lot of my success was due to putting in more effort than others were willing to. And by "putting in effort" I mean automating it so I don't have to put in any effort. In round 6 the trading API was introduced; at that point I dropped the spreadsheets and quickly threw together a desktop application (using C# & WPF) that utilized the API and included both automated and manual trading.15 Automating things also made more frequent data updates possible: instead of copy-pasting a giant webpage a few times a day, now everything updated automatically once every 15 minutes.

The main area on the left is the current state of the market and my portfolio, with papers sorted by how mispriced they are. Mkt is the current market price, My is my forecast, Position is the number of shares owned, Liq. Value is the number of points I could get by exiting this position, WF is a weight factor for the portfolio optimization, and Hist shows the price history of that claim.

On the right we have pending orders, a list of the latest orders executed on the market, plus logging on the bottom.

I used a simple weighting algorithm with a few heuristics sprinkled on top. Below you can see the settings for the weighting, plus a graph of the portfolio weights allocated by claim (the most-mispriced claims are on the left).

To start with I simply generated weights proportional to the square of the difference between the current market price and my target price (Exponent). Then,

  • multiplied that by a per-study weight factor (WF in the main screen),
  • multiplied that by ExtremeValueMultiplier for claims with extreme prices (<8% or >96%),
  • removed any claims with a difference smaller than the CutOff,
  • removed any claims with weight below MinThreshold,
  • limited the maximum weight to MaxPosition,
  • and disallowed any trading for claims that were already close to their target weight (NoWeightChangeBandwidth).
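The steps above can be sketched roughly as follows. This is a Python reconstruction for illustration, not the C# code from the app: the parameter names mirror the settings mentioned above, but every default value, the point at which weights are normalized, and the treatment of leftover weight as free points are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    name: str
    market: float                # current market price (0..1)
    target: float                # my forecast (0..1)
    wf: float = 1.0              # per-study weight factor (WF)
    current_weight: float = 0.0  # share of the portfolio already held

def portfolio_weights(claims, exponent=2.0, extreme_value_multiplier=0.5,
                      cutoff=0.05, min_threshold=0.01, max_position=0.10,
                      no_weight_change_bandwidth=0.01):
    """Return {claim name: target portfolio weight}; all defaults are hypothetical."""
    raw = {}
    for c in claims:
        diff = abs(c.target - c.market)
        if diff < cutoff:
            continue  # CutOff: not mispriced enough to bother with
        w = diff ** exponent * c.wf
        if c.market < 0.08 or c.market > 0.96:
            w *= extreme_value_multiplier  # extreme prices get special treatment
        raw[c.name] = w

    total = sum(raw.values())
    if total <= 0:
        return {}

    weights = {}
    for c in claims:
        if c.name not in raw:
            continue
        w = raw[c.name] / total  # normalize so the raw weights sum to 1
        if w < min_threshold:
            continue             # MinThreshold: too small to be worth a trade
        w = min(w, max_position)  # MaxPosition: cap concentration
        if abs(w - c.current_weight) < no_weight_change_bandwidth:
            w = c.current_weight  # already close enough, so no trade
        weights[c.name] = w
    return weights

# Hypothetical market snapshot
claims = [
    Claim("A", market=0.50, target=0.80),
    Claim("B", market=0.50, target=0.52),  # dropped by the cutoff
    Claim("C", market=0.60, target=0.70),
]
weights = portfolio_weights(claims, max_position=0.5)
```

Because of the cap and the thresholds, the weights generally sum to less than 1; in this sketch the remainder is simply left unallocated as free points.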

There was also another factor to take into consideration: the RM organizers ran some bots of their own. One simply traded randomly, while the other systematically moved prices back toward their default values. This created a predictable price pressure which had to be taken into account and potentially exploited: the DefDiffPenalizationFactor lowered the weight of claims that were expected to have adverse movements due to the bots.

Fading large price movements was automated, and I kept a certain amount of free points available so that I could take advantage of them quickly. Finally, turning the weighting algorithm into trades was fairly simple. If the free points fell below a threshold, the bot would automatically sell some shares. Most trades did not warrant a reaction however, and I had a semi-automated system for bringing the portfolio in line with the generated weights, which involved hitting a button to generate the orders and then firing them off.

High Frequency Trading

When there are a) obviously profitable trades to be made and b) multiple people competing for them, it's very easy to get into a competitive spiral that pushes latencies down to the minimum allowed by the available technology. That's how a replication prediction market ended up being all about shaving milliseconds off of trading algos.

By round 9 another player (named CPM) had also automated his trades, and he was faster than me, so he took all my profits by reacting to profitable opportunities before I could get my orders in. We were now locked in an HFT latency race. There was only one round left so I didn't want to spend too much time on it, but I did a small rewrite of my trading app so it could run on Linux (thanks, .NET Core), which involved splitting it into a client (with the UI) and a server (with the trading logic), and patching in some networking so I could control it remotely.16 Then, I threw it up on my VPS which had lower ping to the RM servers.

When I first ran my autotrader, I polled the API for new trades once every 15 minutes17. Now it was a fight for milliseconds. Unfortunately placing the autotrader on the VPS wasn't enough, the latency was still fairly high and CPM crushed me again, though by a smaller margin this time. Sometimes I got lucky and snagged an opportunity before he could get to it though.

The Results

In money terms, I made $6640 from the surveys and $4020 from the markets for a total of $10,660 (out of a total prize pool of about $190k).

In terms of the actual replication results, the detailed outcomes are still embargoed, so we'll have to wait until next summer (at least) to get a look at them. Some broad stats can be shared however: the market predicted a 54% chance of replication on average—and 54% of the replications succeeded (the market isn't that good, it got lucky).

Of 107 claims that resolved, I have data on 31 which I made money on. For the rest I either had no shares, or had shares in the incorrect direction. Since I only have data on the successes, there's no way to judge my performance right now.

Survey vs Market Payouts

The survey round payout scheme was top-heavy, and small variations in performance resulted in large differences in winnings. The market payout on the other hand was more or less communistic. Everyone got the same number of points, and it was difficult to either gain or lose too many of them in the two weeks of trading. As a result, the final distribution of prizes was rather flat. At best a good forecaster might increase earnings by ~10% by exploiting mispricings, plus a bit more through intelligent trading. The Gini coefficient of the survey payouts was 0.76, while the Gini of the market payouts was 0.63 (this is confounded by different participation levels, but you get the point).
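For reference, the Gini coefficient of a list of payouts can be computed as in the Python sketch below. The example payouts are invented (the real payout data are not public): one uses the $80/$40/$20/$20 survey awards spread over a hypothetical batch of ten participants, the other a perfectly flat split.

```python
def gini(payouts):
    """Gini coefficient of non-negative payouts: 0 is perfectly equal,
    values near 1 mean one person takes almost everything."""
    xs = sorted(payouts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, with ranks starting at 1
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

# Hypothetical batch of ten forecasters, four of whom win a survey award
top_heavy = gini([80, 40, 20, 20, 0, 0, 0, 0, 0, 0])
# Hypothetical flat split of the same $160
flat = gini([16] * 10)
```

The top-heavy example comes out at 0.725 and the flat one at 0, which gives a sense of scale for the 0.76 and 0.63 figures above.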

This was backwards. I think one of the most important aspects of "ideal" prediction markets is that informed traders can compound their winnings, while uninformed traders go broke. The market mechanism works well because the feedback loop weeds out those who are consistently wrong. This element was completely missing in the RM project. I think the market payout scheme should have been top-heavy, and should have allowed for compounding across rounds, while the survey round should have been flatter in order to incentivize broader participation.

Conclusion

If the market had kept going, my next step would have been to use other people's trades to update my estimates. The idea was to look at their past trades to determine how good they were (based on the price movement following their trade), then use the magnitude of their trades to weigh their confidence in each trade, and finally incorporate that info in my own forecast. Overall it's fascinating how even a relatively simple market like this has tons of little nuances, exploitable regularities, and huge potential for modeling and trading strategies of all sorts.

In the end, are subsidized markets necessary for predicting replication? Probably not. The predictions will(?) be used to train our AI replacements, and I believe SCORE's other replication prediction project, repliCATS, successfully used (cheaper) discussion groups. It will be interesting to see how the two approaches compare. Tetlock's research shows that working as part of a team increases the accuracy of forecasters, so it wouldn't surprise me if repliCATS comes out ahead. A combination of teams (aided by ML) and markets would be the best, but at some point the marginal accuracy gains aren't really worth the extra effort and money.

I strongly believe that identifying reliable research is not the main problem in social science today. The real issue is making sure unreliable research is not produced in the first place, and if it is produced, to make sure it does not receive money and citations. And for that you have to change The Incentives.


PS. Shoot me an email if you're doing anything interesting and/or lucrative in forecasting.

PPS. CPM, rm_user, BradleyJBaker, or any other RM participant who wants to chat, hit me up!


  1. For example a paper based on US GDP data might be "replicated" on German GDP data.
  2. The Bayesian Truth Serum answers do not appear to be used in the scoring?
  3. There were also some bonus points for continuous participation over multiple rounds.
  4. There would be significant liquidity problems with a continuous double auction market.
  5. I can't provide any specific examples until the embargo is lifted, sometime next year.
  6. Cowen's Second Law!
  7. If page count/# authors/% male variables are actually predictive, I suspect it's mostly as a proxy for discipline and/or journal. I haven't quantified it, but subjectively I felt there were large and consistent differences between fields.
  8. The RM replications followed a somewhat complicated protocol: first, a replication with "90% power to detect 75% of the original effect size at the 5% level". If that fails, additional data will be collected to reach "90% power to detect 50% of the original effect size at the 5% level".
  9. Scroll down to "Reconstruction of the Prior and Posterior Probabilities p0, p1, and p2 from the Market Price" in Dreber et al. 2015 for some equations.
  10. In fact it's a lot lower than the .001 threshold they give.
  11. In order to trade quickly at the start, I opened a tab for each claim. When the market opened, I refreshed them all and quickly put in the orders.
  12. I still haven't looked into it, any suggestions? Could just estimate two different models and take a weighted average of the coefficients - caveman statistics.
  13. Behavioral genetics papers, for example, were undervalued by the market. So were claims where the displayed p-value was inaccurate - most people wouldn't delve into the paper and calculate the p-value, they just trusted the info given on the RM interface.
  14. Another factor to take into consideration was that claims with more shares outstanding had lower expected value, especially during the first five rounds when only ~10 claims per round would pay out. The more winning shares on a claim, the less $ per share would be paid out (assuming the claim is replicated). At the end of the round I traded out of busy claims and into ignored ones in order to maximize my returns. After round 5 the number of claims selected for replication per round increased a lot, making this mostly irrelevant. Or so I thought: this actually turned out to be quite important, since only a handful of replications were actually completed for each round.
  15. The code is pretty ugly so I'm probably not going to release it.
  16. A basic familiarity with network programming is an invaluable tool for every forecaster's toolkit.
  17. The API had no websockets or long polling, so I had to poll the server for new trades all the time.
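
The dilution effect in footnote 14 can be put into rough numbers. This is a sketch, not the actual RM payout rule: it assumes a fixed prize pool per paying claim, split pro rata among winning shares, and all of the figures below are made up.

```python
def expected_value_per_share(p_selected, p_replicates, pool_per_claim, winning_shares):
    """Expected payout per 'yes' share, assuming a fixed pool per paying
    claim is divided equally among all winning shares on that claim."""
    return p_selected * p_replicates * pool_per_claim / winning_shares


# Two hypothetical claims with the same odds but different crowding.
busy = expected_value_per_share(0.1, 0.8, 100.0, winning_shares=1000)
ignored = expected_value_per_share(0.1, 0.8, 100.0, winning_shares=200)
print(busy, ignored)
```

Under these assumptions a share in the ignored claim is worth five times as much as a share in the busy one, which is why moving out of crowded claims at the end of a round could pay off. A lower chance of any given claim being resolved (`p_selected`) scales both values down but leaves that ratio unchanged.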