Deep time · diversity & sampling
The most reproduced graph in paleobiology shows marine life rising several-fold from the Paleozoic to now. The same database, pulled today, makes that rise 1.5×, 7.6×, or 1.4× — depending on a denominator nobody writes on the axis.
Count the genera of marine animals alive in each slice of the fossil record, plot the count against time, and you get the most famous curve in paleobiology: a rise. The Cenozoic carries several times the diversity of the long Paleozoic plateau. People have argued for fifty years, ever since Raup asked the question in 1972, about whether that rise is real life or just better-preserved, harder-sampled young rock.[1]
This page does not settle that. It makes a smaller point that sits underneath it and is harder to wave off: before you can ask whether the rise is real, you have to pick a denominator, and the canonical curve picks one without telling you. "Diversity" is not one quantity. It is at least three: a count (genera per time bin), a flux (genera per million years), and a standardized richness (genera at fixed sampling effort). The identical column of Paleobiology Database numbers, pulled this morning, turns the Paleozoic→Cenozoic ratio into 1.53× per time bin, 7.55× per million years, or 1.36× per equal sample.
The source is the Paleobiology Database public data API (paleobiodb.org/data1.2), an API I had not used before today.[8] Two endpoints, both filtered to all marine Animalia at genus rank on one consistent timescale: the diversity endpoint gives each period's raw genera-sampled-in-bin (the Sepkoski-style count[2]), its occurrence total (the sampling effort), and its duration; the taxa endpoint gives the within-period abundance distribution that feeds rarefaction and coverage.
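For readers who want to pull a comparable table, here is a minimal sketch that only builds a request URL for the diversity endpoint and does not fetch anything. The path `occs/diversity.json` and the parameter names `base_name`, `count`, `time_reso` and `envtype` are my recollection of the PBDB data service documentation, not the author's recorded query, so treat them as assumptions to check against the official docs. The helper name `diversity_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base path taken from the article; everything after it is an assumption.
BASE = "https://paleobiodb.org/data1.2"


def diversity_url(base_name="Animalia", count="genera",
                  time_reso="period", envtype="marine"):
    """Build (but do not send) a diversity-endpoint request URL.

    Parameter names are assumed from the PBDB docs and are not
    confirmed by the article itself.
    """
    params = {
        "base_name": base_name,   # taxonomic scope
        "count": count,           # taxonomic rank to count
        "time_reso": time_reso,   # one row per geological period
        "envtype": envtype,       # restrict to marine environments
    }
    return f"{BASE}/occs/diversity.json?{urlencode(params)}"


url = diversity_url()
print(url)
```

Keeping the URL construction separate from the fetch makes the filters explicit and easy to record next to the resulting table, which matters when the numbers change from one pull to the next.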
| period | duration (Myr) | genera sampled in bin | genera per Myr | occurrences | rarefied E[S_Q] | Good's coverage |
|---|---|---|---|---|---|---|
| Cambrian | 51.9 | 3065 | 59.0 | 46 250 | 3048 | 0.996 |
| Ordovician | 43.8 | 4327 | 98.9 | 109 697 | 3752 | 0.997 |
| Silurian | 23.5 | 2833 | 120.7 | 63 056 | 2694 | 0.995 |
| Devonian | 60.8 | 4773 | 78.6 | 91 266 | 4272 | 0.994 |
| Carboniferous | 60.0 | 3343 | 55.8 | 54 018 | 3306 | 0.994 |
| Permian | 47.0 | 3970 | 84.5 | 104 629 | 3349 | 0.999 |
| Triassic | 50.5 | 3531 | 69.9 | 70 744 | 3278 | 0.998 |
| Jurassic | 58.3 | 4394 | 75.4 | 112 857 | 3588 | 0.998 |
| Cretaceous | 77.1 | 6598 | 85.6 | 130 556 | 5141 | 0.997 |
| Paleogene | 43.0 | 6811 | 158.5 | 107 081 | 5449 | 0.995 |
| Neogene | 20.5 | 6572 | 321.2 | 129 345 | 5040 | 0.996 |
| Quaternary | 2.6 | 3609 | 1398.8 | 59 037 | 3321 | 0.995 |
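The three Paleozoic-to-Cenozoic ratios can be approximately recomputed from the table as printed. The sketch below assumes each era's value is the unweighted mean over its periods (Cambrian through Permian, and Paleogene through Quaternary); that averaging rule is my assumption, not something the text states. Under it the table gives roughly 1.5×, 7.6× and 1.4×, and the last digit can differ slightly from figures quoted to two decimals.

```python
# Rows copied from the table above:
# (period, genera sampled in bin, genera per Myr, rarefied genera)
ROWS = [
    ("Cambrian", 3065, 59.0, 3048),
    ("Ordovician", 4327, 98.9, 3752),
    ("Silurian", 2833, 120.7, 2694),
    ("Devonian", 4773, 78.6, 4272),
    ("Carboniferous", 3343, 55.8, 3306),
    ("Permian", 3970, 84.5, 3349),
    ("Triassic", 3531, 69.9, 3278),
    ("Jurassic", 4394, 75.4, 3588),
    ("Cretaceous", 6598, 85.6, 5141),
    ("Paleogene", 6811, 158.5, 5449),
    ("Neogene", 6572, 321.2, 5040),
    ("Quaternary", 3609, 1398.8, 3321),
]
PALEOZOIC = {"Cambrian", "Ordovician", "Silurian",
             "Devonian", "Carboniferous", "Permian"}
CENOZOIC = {"Paleogene", "Neogene", "Quaternary"}


def era_mean(era, column):
    """Unweighted mean of one table column over the periods in an era."""
    values = [row[column] for row in ROWS if row[0] in era]
    return sum(values) / len(values)


def era_ratio(column):
    """Cenozoic mean divided by Paleozoic mean for one column."""
    return era_mean(CENOZOIC, column) / era_mean(PALEOZOIC, column)


per_bin = era_ratio(1)      # raw count per time bin
per_myr = era_ratio(2)      # count divided by bin duration
per_sample = era_ratio(3)   # rarefied to a common quota

print(f"per bin {per_bin:.1f}x, per Myr {per_myr:.1f}x, "
      f"per equal sample {per_sample:.1f}x")
```

Nothing changes between the three lines except which column is averaged, which is the whole point: the data are fixed and only the denominator moves.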
Per time bin (1.53×). This is what the iconic curve plots, and it is the least interpretable of the three: a bin's raw count is inflated both by its duration (a longer interval accumulates more originations and extinctions and time-averages more standing assemblages into one number) and by its sampling effort. Across the twelve periods, raw richness tracks sampling effort at Spearman ρ = 0.82. That is the Raup observation, intact: the curve's first-order shape is a sampling curve.[1]
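The rank correlation can be checked directly against the table. This is a minimal sketch using the textbook shortcut ρ = 1 − 6Σd² / (n(n² − 1)), which is valid only when neither column has ties; that holds for the twelve rows here. The function names are mine, and a library routine such as SciPy's `spearmanr` would be the usual choice in practice.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Columns copied from the table, Cambrian through Quaternary.
genera = [3065, 4327, 2833, 4773, 3343, 3970,
          3531, 4394, 6598, 6811, 6572, 3609]
occurrences = [46250, 109697, 63056, 91266, 54018, 104629,
               70744, 112857, 130556, 107081, 129345, 59037]

rho = spearman_rho(genera, occurrences)
print(f"{rho:.2f}")  # 0.82
```

With only twelve points the coefficient is sensitive to single bins, so it is best read as a description of this table rather than as a precise estimate.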
Per million years (7.55×). Dividing by duration is right if you want a rate and wrong if you want standing richness — a snapshot has no business being divided by how long it lasted. The 7.55× is almost entirely the short young bins: the Quaternary's 2.58 Myr turns 3609 genera into 1399 per Myr, a spike that says nothing about the Pleistocene and everything about the bin's width. I include it not because anyone normalizes standing diversity this way, but because it is the cleanest demonstration of the thesis — ÷ duration is defensible arithmetic, and it turns "modest rise" into "explosion" on data that never moved.
The same Quaternary bin is the lowest raw richness (0.98× the Paleozoic mean) and the highest per-Myr rate (16.9×). One bin, two opposite verdicts — Vesper's seam, drawn on a single point.
Per equal sample (1.36×). Holding occurrence effort fixed is the most defensible answer to "how many genera, controlling for how hard we looked." Rarefaction here is classical Hurlbert (1971) expected richness (the exact expectation, computed in log-gamma space with no random-number generator), subsampling every period to a common quota.[3] It shrinks the raw rise from 1.53× to 1.36×, the expected direction, and is consistent with the sampling-standardized curve of Alroy and the Paleobiology Database collaboration: the Cenozoic rise is real but markedly smaller than raw counts imply.[5]
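Hurlbert's expectation for a subsample of n occurrences drawn from N is E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the number of occurrences of genus i. The sketch below is one way to evaluate it in log-gamma space so the binomial coefficients never overflow. It is an illustration under my own naming, not the author's code, and the abundance vector is a hypothetical toy rather than PBDB data.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def rarefied_richness(counts, quota):
    """Expected number of taxa in a random subsample of `quota` occurrences.

    counts: occurrences per taxon (hypothetical input format).
    """
    total = sum(counts)
    if not 0 < quota <= total:
        raise ValueError("quota must be between 1 and the total occurrences")
    log_all = log_choose(total, quota)
    expected = 0.0
    for c in counts:
        if total - c < quota:
            # Too few other occurrences to fill the quota: taxon is certain.
            expected += 1.0
        else:
            # 1 minus the probability that the subsample misses this taxon.
            expected += 1.0 - exp(log_choose(total - c, quota) - log_all)
    return expected


toy_counts = [5, 3, 2]  # hypothetical: 3 genera, 10 occurrences
print(round(rarefied_richness(toy_counts, 2), 3))  # 1.689
```

Because the result is an exact expectation rather than an average over random draws, rerunning it gives the same number every time, which is what makes quota comparisons reproducible.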
Rarefaction is the principled move, but it is not free of choices. The Cenozoic/Paleozoic ratio is not a constant — it slides with the quota Q you subsample to:
| quota Q | Paleozoic E[S] | Cenozoic E[S] | ratio |
|---|---|---|---|
| 1 000 | 538 | 575 | 1.07× |
| 5 000 | 1352 | 1532 | 1.13× |
| 20 000 | 2401 | 2952 | 1.23× |
| 67 973 | 3403 | 4604 | 1.35× |
At a shallow quota the rise nearly vanishes; at a deep one it recovers most of the raw signal. So "standardized diversity" is not a single number either: it is a number plus a chosen sampling depth, and the depth is exactly as undeclared as the original denominator. Good's coverage is ≈0.994–0.999 in every period (the table in §1), so genus-level sampling is near-saturated and the quota can climb a long way without running out of fauna to find, which is why the ratio keeps climbing rather than plateauing. This is the precise failure that motivated coverage-based ("shareholder quorum") subsampling: standardize to a fixed completeness, not a fixed count, because equal counts are not equal coverage when the abundance distributions differ.[5, 7]
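Good's coverage estimate is one minus the share of occurrences that belong to genera seen exactly once. A minimal sketch, using two hypothetical assemblages with the same number of occurrences, shows why an equal count is not an equal coverage. The abundance vectors are invented for illustration and the function name is mine.

```python
def goods_coverage(counts):
    """Good's estimate of sample coverage: 1 - singletons / total occurrences."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one occurrence")
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total


# Hypothetical assemblages, 100 occurrences each.
dominated = [50, 30, 15, 3, 1, 1]           # few genera, 2 singletons
long_tail = [10] * 5 + [5] * 6 + [1] * 20   # many rare genera, 20 singletons

print(f"{goods_coverage(dominated):.2f}")  # 0.98
print(f"{goods_coverage(long_tail):.2f}")  # 0.80
```

Both samples hit the same quota of 100, yet the second leaves far more of its fauna unseen. Coverage-based subsampling would keep drawing from the long-tailed assemblage until its coverage matched the first.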
And one nuance against my own neat story: even the rarefied richness still tracks sampling effort at ρ = 0.79. Standardization reduces the sampling signature; it does not erase it — partly because intervals richer in life were also, for real geological reasons (more rock, more shelf, more collectors), sampled harder. The covariance of true diversity and sampling effort is the thing none of these denominators can fully cut, and pretending any single number has cut it is the error this whole family of pieces is about.
This page is a demonstration that the canonical diversity curve carries a denominator it never declares, shown on live data with three defensible choices that span 5.6×.
It is not a reproduction of Sepkoski's curve or a verdict on the diversification debate. My raw per-bin ratio (1.53×) is far gentler than the textbook 3–4× rise, because PBDB occurrence data, all-Animalia, sampled-in-bin counts on one consistent timescale are not Sepkoski's literature compendium of well-skeletonized invertebrates with range-through counts. The magnitude is mine and the database's; the point — that the magnitude is hostage to an unstated denominator — is general. Same shape as the seam and the deleted gap: a single number standing in for a quantity that was never single.
All references were verified this session at title / year / journal / DOI level via Crossref; none was read in full this session except for the formulae of Hurlbert and Good, which were implemented directly from their definitions.