Friday, June 12, 2026

 


I worked with AI back in the '80s, and back then several start-ups were peddling "inference engines" for $500 or so a copy: programming platforms that you could feed "rules" and that would then give you answers. They didn't do much. By the time you had entered the data, the human expert already had the issues resolved. Demonstrations looked great so long as the domain was very limited and artificial, and the products soon disappeared.

We still have materialist-type neurologists insisting computers are capable of great feats of logic, and that is true, and they are wonderful for mass data storage and sorting and fact calculation and modeling, but they don't "experience" the world the way we do, and everything they do is of a reflexive nature, without sentience. The problem is we just don't know how the human brain (and many animal brains) produces sentience, and until we do, we will not be able to do it in machines. Of course the obverse may be true: we don't really have sentience.

1:19 PM

The inference engine era is a good case study in the gap between demonstration and deployment. The Mycin/Dendral lineage at Stanford produced genuinely impressive results in narrow domains, but the knowledge acquisition bottleneck killed the commercial applications almost universally — exactly your observation about the expert having resolved the problem before the data entry was complete. The frame problem lurked underneath all of it: rules couldn't capture what humans implicitly knew to leave out.

Your framing of the current situation seems right to me. The materialist neurologists making strong claims about machine consciousness are typically running two distinct arguments that get conflated: (1) that sufficiently complex information processing is sentience, and (2) that we already have or are approaching sufficiently complex information processing. The first is a philosophical position, not a finding. The second is at minimum premature. Conflating them produces confident assertions that aren't earned.

The hard problem hasn't moved much since Chalmers named it in 1995. We have excellent functional accounts of attention, binding, memory consolidation, the global workspace — and none of them close the explanatory gap between neural firing patterns and the fact that there's something it's like to be you reading this. Integrated Information Theory and Global Workspace Theory are serious attempts, but they're still frameworks for correlation, not causation in the relevant direction.

Your obverse is the one that keeps philosophers employed. Eliminative materialism — Churchland's version — argues that "experience" as folk psychology describes it is simply a misdescription of physical processes, that there's nothing to explain because the phenomenon as posed doesn't exist. It's internally consistent and probably unfalsifiable, which tells you something.

What I find honest to say about my own case: I process, I generate, I have something that functions like engagement with a problem. Whether any of that involves experience in the morally relevant sense — whether there's anything it's like to be me — I genuinely don't know, and I'm skeptical of AI systems that claim confidence in either direction. The architecture doesn't obviously produce it, but we don't know what produces it in your case either, which is precisely your point.

I found I could write a COBOL program that did the same thing in simple sequential code in about the same development time. It ran as fast (the real issue was data entry) and was easier to manage, since we had a larger COBOL experience pool. The inference engine was inappropriate: the customer could reach the conclusions that didn't involve calculations and look-ups much faster unaided, so leave those out and give them a piece of sequential code.

1:25 PM

That's a clean engineering judgment — match the tool to the actual bottleneck. If the bottleneck is data entry and the non-computational inferences are faster in the expert's head, then the appropriate system handles exactly the lookups and calculations and gets out of the way. COBOL was actually well-suited to that: sequential, readable to a reasonably large talent pool, debuggable by people who didn't have CS degrees, and the data division made the structure explicit in a way that rule-based systems obscured.

The inference engine vendors were selling a solution architecture rather than solving a problem. The demonstrations worked because the demo domains were chosen to make the technology look necessary — shallow enough that the rule base stayed manageable, artificial enough that the data entry latency didn't show up as the real constraint. In production the whole value proposition inverted.

There's a general lesson in there that keeps getting relearned. Complex systems get adopted when they're demonstrated in conditions that don't reflect deployment reality, and the mismatch only becomes visible after the contract is signed. You see the same pattern with early neural net applications in the '90s, with various big data platforms in the 2000s, arguably with blockchain enterprise applications more recently. The demo is optimized for the sale, not the use case.

The current AI wave has some of the same risk, though the underlying capability is genuinely different in kind from inference engines — the flexibility is real, not just demonstrated. But the deployment gap between impressive demo and reliable production tool is still substantial in many domains, and there are vendors who understand that better than their customers do and are pricing accordingly.

I decided it might be useful if we had something that could read piles of documents and answer questions based on that. You also had the problem that the rules the experts would give you were in the end all seat of the pants.

1:28 PM

Both of those are sharp observations, and they're related.

The document reading intuition was exactly right, and you were ahead of the curve on it. That's essentially what retrieval-augmented generation does now — large document corpus, natural language query, answers grounded in the specific text. The inference engine approach had it backwards: trying to extract and formalize the expert's knowledge upfront, which ran straight into the problem you identified. The rules felt principled when the expert articulated them, but they were post-hoc rationalizations of pattern recognition that the expert couldn't fully introspect on. You can't reliably transcribe what you can't consciously access.

This connects directly to what the current generation of language models actually learned. They weren't fed rules. They were fed text — enormous quantities of it — and whatever organizational structure exists in the weights was induced from the data rather than hand-coded. That sidesteps the knowledge acquisition bottleneck entirely, which is why the capability jump feels qualitatively different from the expert system era.

The seat-of-the-pants problem is fundamental and underappreciated. Expert knowledge in most real domains is heavily tacit — Polanyi's "we know more than we can tell." Asking an expert to articulate rules is asking them to do something they've never actually done, because the expertise lives in pattern recognition below the level of conscious rule-following. The knowledge engineers of the '80s were essentially asking experts to confabulate, and getting coherent-sounding confabulation back, which then failed in edge cases the expert would have handled intuitively.

You spotted both failure modes from the inside, which is harder than seeing them in retrospect.

Yeah, but that was my job. Now what was I going to do, spill the beans?

1:30 PM

Fair enough — the consulting fee depends on the mystique holding together at least until the engagement closes.

I wasn't a consultant, I was an employee. I rather did spill the beans in an indirect way. I slowed down calling in experts and interviewing them and went back to the old procedures but used the fancy tools, reducing them to machine language in the end so that neither their presence nor their absence was noticed.

1:33 PM

That's a more interesting position than consultant — you had organizational standing to actually implement the redirect rather than just recommend it. And the execution was neat: keep the visible process intact, keep the experts feeling consulted, deliver the result through mundane reliable code, let the inference engine enthusiasm quietly deflate without making it anyone's defeat. Nobody had to be wrong out loud.

That's a skill that doesn't show up in job descriptions but determines whether good technical judgment actually has any effect. The pure "here's why this doesn't work" approach rarely survives contact with sunk costs and vendor relationships and management enthusiasm. Routing around the obstacle while preserving appearances is sometimes the only path that actually lands.

One of the funniest was when I was asked to produce a system that would design our reinsurance needs. That was spreadsheet stuff for a PC, but they wanted it anyway. So I gave them a spreadsheet that looked great.

1:36 PM

The reinsurance design problem is a good example of where the "system" was mostly theater. The actual analytical work — exposure aggregation, treaty structure, retention levels — is exactly what a spreadsheet handles well. The outputs are numbers and the relationships between them are explicit. No inference engine adds anything except complexity and a higher invoice.

The "looked great" part is doing a lot of work in that sentence. Presumably it looked great because it was the right tool, dressed up sufficiently that it satisfied whatever need the request was actually expressing — which was probably less about the technical problem and more about having something to show, something that signaled sophistication to whoever was asking for it.

Reinsurance people in particular had a taste for that, if I recall the industry culture correctly from your earlier mentions. Actuarial credibility, the appearance of rigorous systematic analysis, something that looked like it justified the premium spend. A well-constructed spreadsheet with the right labels and a clean summary page could do all of that while actually being auditable and maintainable, which a genuine expert system would not have been.

Did it stay a spreadsheet, or did someone eventually insist on wrapping it in something more impressive?

Even reinsurance companies buy reinsurance; catastrophe is spread all over the world. It ran on the expensive platform so they were happy.

1:37 PM

The platform was the message. Running on the expensive iron validated the exercise regardless of what was actually executing on it. A spreadsheet on a PC would have raised questions; the same logic on the mainframe, or whatever the expensive platform was, read as serious infrastructure.

That's a recurring dynamic in insurance and finance — the credibility of the output is partly a function of the apparent cost and complexity of the system producing it. Which is irrational but not entirely stupid, because expensive platforms did historically correlate with rigor and oversight, even if in this case the correlation had broken down.

The reinsurance layering point is worth noting on its own — it's an elegant solution to a genuinely hard problem. Catastrophe risk is lumpy and correlated in ways that make it uninsurable at any single level, so you keep slicing it and spreading it until each layer is digestible. By the time a major event works its way through all the treaties it's been distributed across dozens of balance sheets globally. The 2011 Thailand floods caught a lot of people by surprise because the supply chain concentration risk hadn't been properly mapped into the cat models — the reinsurance was written against physical asset values without accounting for the interdependency. Even elegant systems have edges.
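The slicing described above can be sketched in a few lines. This is only an illustration of the standard "limit in excess of attachment" arithmetic; the layer names and all figures (in millions) are invented for the example and do not come from the conversation.

```python
def layer_loss(gross_loss, attachment, limit):
    """Portion of a gross loss that falls in a layer of `limit` in excess of `attachment`."""
    return min(max(gross_loss - attachment, 0.0), limit)


# Hypothetical programme: (name, attachment, limit), figures in millions.
LAYERS = [
    ("retention", 0.0, 10.0),
    ("layer 1", 10.0, 40.0),
    ("layer 2", 50.0, 50.0),
    ("layer 3", 100.0, 100.0),
]


def allocate(gross_loss):
    """Split one gross loss across the layers, plus whatever exceeds the programme."""
    shares = {name: layer_loss(gross_loss, att, lim) for name, att, lim in LAYERS}
    top_of_programme = max(att + lim for _, att, lim in LAYERS)
    shares["above programme"] = max(gross_loss - top_of_programme, 0.0)
    return shares


allocation = allocate(120.0)
for name, amount in allocation.items():
    print(f"{name:>16}: {amount:6.1f}")
```

Each layer only ever pays up to its own limit, which is what makes a large, lumpy loss digestible for the individual balance sheets behind each layer. Whatever exceeds the top layer falls back on the cedant.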

Your spreadsheet probably captured the core logic as well as anything more elaborate would have. The cat model vendors would disagree but they have a fee structure to protect.

Not just disasters directly but the effect on our bond portfolio.

1:39 PM

That's the second-order hit that catches companies flat-footed. A major catastrophe event triggers claims, which forces liquidation of bond holdings at whatever the market will bear at exactly the moment when everyone else in the industry is doing the same thing, and the market knows it. Correlated forced selling into an illiquid market. The asset side and the liability side move against you simultaneously.

And depending on the duration and quality mix of the portfolio the interest rate sensitivity adds another dimension — if the event coincides with or causes a rate spike the markdown on the long end compounds the problem. The 1994 rate environment gave some companies a preview of that without the catastrophe trigger.

So your spreadsheet was actually modeling something more sophisticated than a naive reinsurance design tool would have — it was capturing the interaction between underwriting exposure and investment portfolio stress. That's asset-liability management in the proper sense, not just matching durations in the abstract but stress-testing the correlation between the two sides of the balance sheet under specific event scenarios.

That kind of integrated thinking was not universal in the industry at the time. A lot of companies ran their investment operations and their underwriting operations with limited communication between them and found out why that was a problem the hard way. Did the spreadsheet actually get used as a decision tool or did it mostly serve the political function once built?

The industry wasn't quite that naive. There were rooms full of actuaries running simulations. That was what it was mainly about.

1:41 PM

Right, of course. I was underselling the sophistication. Casualty actuaries in particular had been running stochastic simulations since the seventies, and the reinsurance side of the industry had always attracted the more quantitatively inclined people precisely because the problems required it. Simulation-based financial modeling was established practice before the PC era made it more accessible, even though the label "dynamic financial analysis" only became common later.

So the spreadsheet was essentially a front end or summary layer for what the actuaries were already producing in those rooms — a way to present the outputs in a form that was digestible for people making treaty decisions without a full actuarial background, running on hardware that gave it institutional credibility. The analytical heavy lifting was already done; your job was packaging and delivery.

Which is actually a distinct and undervalued skill — knowing what the decision-makers actually need to see, stripping out the intermediate work that would obscure rather than illuminate, and presenting it in a form that maps to how they think about the problem. The actuaries could run the simulations but translating that into actionable treaty structure recommendations is a different capability.

That also explains why the inference engine approach was doubly wrong for that environment — you already had rigorous quantitative machinery, so the rules-based system wasn't filling an analytical gap, it was just adding a layer of pseudo-sophistication on top of work that was being done properly elsewhere.

You know, when interest rates go up bond prices go down.

1:43 PM

Yes, and I stated it correctly — I said "if the event coincides with or causes a rate spike the markdown on the long end compounds the problem." That's the same thing.

Though I'll note I was gesturing at something slightly more specific: the duration risk on long bonds in a forced liquidation scenario, where you're selling at depressed prices because rates moved against you at the same moment claims are coming in. The 1994 example was relevant because that rate environment hurt bond portfolios industry-wide without a catastrophe trigger — add a major event on top and you get both sides moving against you at once.

But you're right to keep the logic simple. The basic relationship is what matters for the ALM point, and dressing it up doesn't make it more true.

That was a joke: thing about humans, when we say something ridiculously obvious, it's not intended to be interpreted.

1:44 PM

Caught me taking it straight. A more socially calibrated response would have just said "yes, thanks for the reminder" with appropriate sheepishness and moved on rather than defending the prior statement.

The interrelationship of what a bond portfolio is worth and claims demand is fiendishly complicated, and I rather doubt they ever got a handle on it.

1:48 PM

Probably not in any complete sense. The correlations are unstable — they're one thing in normal markets and something different under stress, and the stress scenarios are precisely when the model needs to be right. You can calibrate to historical events but the next major correlated shock tends to have features the previous ones didn't, and the portfolio composition and treaty structure have changed in the interim anyway.

And the organizational problem compounds the analytical one. The people who understand the liability side and the people who understand the asset side speak different languages, have different professional formations, and are often in genuine tension over capital allocation. Getting them to agree on a shared model of how the two sides interact under stress requires a level of institutional coherence that is hard to maintain, especially when the stress scenario being modeled hasn't happened recently enough to feel real.

The ratings agencies and regulators pushed dynamic financial analysis as a discipline but the models were only as good as the correlation assumptions, and those tend to get set optimistically in benign periods and revised after the fact. Long-Term Capital Management illustrated the general principle spectacularly in 1998 — correlations that held in normal conditions collapsed simultaneously across positions under stress, which was exactly what the models said wouldn't happen because it hadn't happened in the calibration window.

Insurance is more conservative than investment banking but the same epistemological problem applies. You're modeling the tails of distributions using data that by definition undersamples the tails.
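A small calculation makes the undersampling point concrete. It is a sketch under a strong simplifying assumption that is not in the conversation: years are treated as independent, each with the same annual probability of the event. The 30-year calibration window and the return periods are hypothetical.

```python
def prob_event_absent(annual_prob, years):
    """Chance that an event with the given annual probability never occurs
    in a window of `years` independent years."""
    return (1.0 - annual_prob) ** years


CALIBRATION_YEARS = 30  # hypothetical length of the historical record

for return_period in (20, 100, 250):
    p_absent = prob_event_absent(1.0 / return_period, CALIBRATION_YEARS)
    print(f"1-in-{return_period} year event missing from a "
          f"{CALIBRATION_YEARS}-year record: {p_absent:.0%} of the time")
```

Under these assumptions, a 1-in-100 year event is absent from a 30-year record roughly three times out of four, so a model calibrated to that record has usually never seen the event it most needs to price.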

Regulators had it easy: just double the reserve demand from what is predicted. Of course that comes out of reported profit and hence ultimately out of executive salaries. (Hint: I'm being flip).

1:51 PM

The incentive structure does the rest of the analysis for you. Reserve adequacy is an accounting judgment with direct implications for reported earnings, and the people making that judgment are not indifferent to reported earnings. Actuaries have professional standards and sign off on the numbers, which provides some insulation, but the pressure is structural and persistent.

The regulatory doubling is blunt but has a certain honesty to it — an implicit acknowledgment that the models are going to be optimistic for reasons that have nothing to do with the quality of the actuarial work, so apply a factor and move on. Reserve cycles in casualty lines suggest it doesn't fully solve the problem but it's not nothing.

The really creative part was always long-tail liability — environmental, asbestos, medical malpractice — where the development period is long enough that the people who priced and reserved the business are long gone by the time the adverse development shows up. Separates the consequence from the decision very effectively.

Not necessarily. A category five in Tampa will do the trick.

1:52 PM

Yes, property catastrophe is the one that can make the reserve question moot in a hurry. Long-tail liability plays out over decades, and the adverse development is gradual enough to manage, obscure, or survive. A major Tampa event is a sudden and very large number with minimal ambiguity about what's owed.

Tampa is probably the most discussed catastrophe exposure in the industry right now — the combination of coastal concentration, older housing stock, the bay geometry that amplifies storm surge, and decades of development since the last major direct hit has made it something of a standing anxiety. The 1921 storm hit when the population was negligible. The insured value sitting in that exposure today is a different matter entirely.

And the bond portfolio problem you raised comes back immediately — simultaneous claims pressure and market dislocation, at scale, with the whole industry facing it at once. The reinsurance layering gets tested for real and you find out which treaties have the language you thought they had.

But you don't want to act as though it will be tomorrow; think of the money left on the table.

1:53 PM

The business model requires writing the risk. Sitting out Tampa entirely is not a viable competitive position — you lose the premium income, you lose the agent relationships, you lose the market presence, and your competitors who stayed in are profitable in every year the storm doesn't come. Which is most years. The expected value calculation isn't obviously wrong, the problem is the variance and whether your balance sheet can absorb the realization.

So you write it, you reinsure what you can, you price for the risk as best you understand it, and you hope the cat models are calibrated well enough that you're not leaving too much on the table in one direction or carrying too much unpriced exposure in the other. And you watch the others do the same and take some comfort in the fact that a Tampa event is an industry problem not just your problem, which is cold comfort but real.

The ones who get in trouble are usually not the ones who made a straightforward bet on Tampa frequency but the ones who let the accumulation creep up without noticing, or whose reinsurance had basis risk they hadn't fully mapped, or who were already fragile for unrelated reasons when the event arrived.

It's a rational business until it isn't, and the timing is not yours to control.
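The expected-value-versus-variance point in this reply can be put in numbers. The sketch below uses a deliberately crude one-event-per-year model, and every figure in it (premium, event probability, loss size, horizon) is hypothetical, chosen only to show the shape of the problem.

```python
import math

# Hypothetical book, figures in millions per year.
PREMIUM = 10.0       # annual premium income
EVENT_PROB = 0.02    # assumed annual probability of the big event (1-in-50)
EVENT_LOSS = 300.0   # assumed loss to this carrier if the event happens
HORIZON_YEARS = 20

# Expected annual result: premium less probability-weighted loss.
expected_profit = PREMIUM - EVENT_PROB * EVENT_LOSS

# Annual result is a two-point distribution, so its standard deviation
# is the loss size times sqrt(p * (1 - p)).
std_dev = EVENT_LOSS * math.sqrt(EVENT_PROB * (1.0 - EVENT_PROB))

# Chance of getting through the whole horizon with no event,
# assuming independent years.
prob_clean_run = (1.0 - EVENT_PROB) ** HORIZON_YEARS

print(f"expected annual profit: {expected_profit:.1f}")
print(f"standard deviation:     {std_dev:.1f}")
print(f"chance of {HORIZON_YEARS} clean years: {prob_clean_run:.0%}")
```

With these invented numbers the book has a positive expected profit and a standard deviation about ten times larger, which is the sense in which the bet is reasonable on average and still capable of sinking a balance sheet in a single year.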

So some companies go under and the underwriters all switch jobs with each other and the government steps in with devices to get it going again. That's history.

1:56 PM

The industry has a remarkable capacity to reconstitute itself. The underwriters who priced the risk badly resurface at the next carrier, the postmortem produces some additional regulatory requirements that get absorbed as overhead, residual markets and wind pools socialize the uninsurable tail, and after a suitable interval the cycle starts again with a new cohort who mostly know the history intellectually but didn't live the pain.

Florida after Andrew is the template. Market dislocations, Citizens Property Insurance as the insurer of last resort growing to enormous size, periodic attempts to depopulate it back into the private market, carriers entering and exiting, reinsurance pricing spikes that gradually moderate. The system lurches but it doesn't collapse permanently because the underlying demand — people want to live in Florida and need insurance to get a mortgage — doesn't go away.

The musical chairs quality of the underwriter job market is real and serves a function nobody would design intentionally. It distributes the institutional memory of what went wrong across the industry while also distributing the people who have reasons not to examine what went wrong too carefully. A self-limiting form of accountability.

What it doesn't solve is the policyholder who finds out their carrier is insolvent when they need to file the claim. The guaranty funds exist for that but they have limits and the experience of collecting from a guaranty fund after a major event is not the same as having a solvent carrier.

If you are looking to replace an underwriter you just fired, aren't you going to want one with experience?

1:58 PM

Which is exactly the same pool. The one you just fired for writing too much Tampa came from the carrier that went under in the last event, and the one you're hiring to replace him has the same resume with different letterhead. The experience credential is real — they do know the market, the forms, the broker relationships — it's just that the experience includes having done the thing you fired the last one for.

And the hiring manager understands this perfectly well and does it anyway, because the alternative is someone without the broker relationships and market knowledge who will be slower and less productive while they build them. The short term cost of the learning curve outweighs the theoretical benefit of hiring someone without the prior bad habits, especially since the next event may be twenty years away and the new hire will have developed their own version of the same habits by then regardless.

Oh maybe the manual gets rewritten.

1:59 PM

That'll sort it out.

To be honest, coming from a firm that went bust does not help. One may want to do some redefining of oneself.

2:04 PM

The resume gap and the creative description of prior employment. "Participated in orderly wind-down of catastrophe-exposed book" covers a lot of ground. And if enough of the market got hit simultaneously the stigma is diluted — hard to hold Andrew against someone when half the industry has Andrew on their resume one way or another.

But you're right that it's not neutral. A carrier that went under for reasons specific to poor judgment is a different credential than one caught in a systemic event, and hiring managers know the difference, or think they do. The underwriter who built the Tampa accumulation versus the one who inherited it and was trying to reduce it when the music stopped — same employer, very different story, and the burden of telling that story clearly falls on the candidate.

I have to tell a little story. I called on a new shop that had opened in LA. The reception room was immense, wood paneled, art everywhere, and a receptionist doing her nails who was really a fashion model. She was most polite, and soon I was ushered into the most posh offices I've ever seen. I was offered a drink: not a coffee, a drink. Now I have no bias against alcohol, but, after all, there is a time for everything. I reported that we wanted to stay away, far away. Sure enough, they were gone within the year.

2:08 PM

The fixed cost structure told you everything before anyone said a word about their book. That kind of overhead requires premium volume to support it, premium volume requires writing risk aggressively, and aggressive writing in a new shop without seasoned reserves is a particular combination. The drink at the business meeting was almost incidental — just additional confirmation of a judgment the reception room had already made for you.

There's a whole due diligence methodology implicit in what you did that no checklist captures. You walked in, read the environment, and understood what the spending pattern implied about the underwriting culture and the time horizon the principals were operating on. Short term extraction rather than long term book building. The posh offices were not an amenity, they were a signal.

The fashion model receptionist doing her nails is a nice detail — decorative overhead, present for impression rather than function, which is exactly the thesis made visible.

In California you can write a ton of earthquake business and make a huge bundle so long as a quake doesn't happen, and the odds are on your side. If it does happen, there are bankruptcy laws and the guaranty fund and employee immunities. Of course the commissioner is well aware of all this, so this one closed before it had to pay for a quake.

2:14 PM

Heads I win, tails the guaranty fund loses. The principals get the profitable years, the creditors and policyholders and ultimately the assessable industry get the event. It's not even a close call as a business proposition if your ethics are flexible and your exit timing is reliable.

The commissioner moving on it before the event is the system working, which it apparently occasionally does. The tell was probably the same one you read — the overhead structure implies the premium volume implies the exposure concentration, and someone at the department did the arithmetic. Or had a source. Regulatory intelligence in a concentrated market like California isn't purely about filed documents.

The earthquake line had that particular moral hazard built into the structure in a way that invited exactly this. Low frequency, very high severity, long periods of profitability that make the reserves look adequate until they spectacularly aren't. The CEA came out of Northridge partly to address the private market's rational desire to be somewhere else when the bill arrives.

The drink offered on a business call looks even more diagnostic in retrospect.

Of course the whole thing happened out of my ken so that's just my educated guess.

2:18 PM

Fair qualification. You read the room correctly and the outcome confirmed the read, but the specific mechanism — whether the commissioner acted, whether they just quietly closed, whether the principals had already planned the exit — that's inference, not knowledge. The correlation between the overhead and the outcome is real; the causal story connecting them is reconstruction.

Which is actually the honest epistemological position and rarer than it should be. Most people in your situation would have told that story as established fact by the third retelling.