We All Underestimate Semantics!

The science of meaning has been solved for decades. The AI industry just never bothered to look.

By Serhii Kirichko


I’ve been waiting for this moment for two years. Watching the AI industry wrestle with meaning - and lose - while the science of meaning sits right there, formalized, battle-tested, and ignored.

Last year, Andrej Karpathy gave us “context engineering” - and millions of practitioners adopted the term overnight. Karpathy is an extraordinarily important figure whose contributions to AI are massive. But here’s the thing: when he said “context engineering is the delicate art and science of filling the context window with just the right information for the next step,” he was describing one small operational slice of something that linguistics has studied for over sixty years.

Context is not just “the right information for the next step.” Context is EVERYTHING. Context defines the meaning. Context defines the intent. Context defines even the truth - the same information, in a different context, can flip from true to false. Just sit with that for a moment and reassess the importance of context.

Karpathy moved the conversation forward. But he called it “one small piece.” I’m here to tell you it’s not a small piece - it’s the whole game. And the playbook already exists. It’s called semantic pragmatics.

Recently, Paolo Perrone argued in Data Science Collective that enterprise AI fails because it has no understanding of what data means. He showed the symptom. This article explains the root cause.

We’ll come back to context - again and again. For now, let’s grasp the landscape.

1. SPEECH ACTS - Action Beats Content

In How to Do Things with Words, published posthumously in 1962, philosopher J.L. Austin dropped a bomb that most AI engineers still haven’t heard detonate: words don’t just describe things - they DO things.

There’s a popular misconception with centuries of momentum behind it: the moralizing that “while some talk, others do.” But this very assertion is fundamentally flawed. Words are actions. Yes, actions often carry more weight than words - but we shouldn’t operate under the misconception that words and actions are orthogonal concepts. They are not.

Austin, and later John Searle, formalized this into speech act theory. Every utterance operates on three levels simultaneously:

  • Locution - the literal content. What was said.
  • Illocution - the intended action. What was meant.
  • Perlocution - the actual effect. What happened in the listener.

“Can you pass the salt?” The locution is a question about ability. The illocution is a request. The perlocution is someone handing you the salt - or ignoring you.

Now look at your LLM pipeline. Every prompt is a speech act. Every response is one too. And yet the entire field of “intent classification” is a crude, impoverished reinvention of what Searle described in 1969 - except worse, because it collapses all three levels into a single flat label.

The action embedded in an utterance is the component everyone criminally ignores. Yet in intent classification, it’s often the determining signal. Not what the user said - but what they’re trying to do. And separately - what effect their words actually produce.

When your agent misinterprets “I need this fixed by Monday” as an informational statement rather than a deadline-carrying directive - that’s a speech act failure. The locution was parsed. The illocution was missed. The perlocution was a missed deadline.
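
To make the three levels concrete, here is a minimal Python sketch that keeps locution, illocution, and perlocution as separate fields instead of one flat intent label. The class names, the cue phrases, and the annotate helper are hypothetical and exist only for illustration; the five illocution classes follow Searle's later taxonomy, and a real system would replace the keyword check with a trained classifier or an LLM call.

```python
from dataclasses import dataclass
from enum import Enum


class Illocution(Enum):
    """Five classes of illocutionary acts, after Searle's taxonomy."""
    ASSERTIVE = "assertive"      # states how things are
    DIRECTIVE = "directive"      # tries to get the hearer to do something
    COMMISSIVE = "commissive"    # commits the speaker to a future action
    EXPRESSIVE = "expressive"    # expresses a psychological state
    DECLARATION = "declaration"  # changes a state of affairs by being uttered


@dataclass
class SpeechAct:
    locution: str            # what was said
    illocution: Illocution   # what the speaker is doing by saying it
    perlocution: str = ""    # observed effect, filled in after the fact


# Hypothetical cue phrases, for illustration only.
DIRECTIVE_CUES = ("i need", "can you", "could you", "please")


def annotate(utterance: str) -> SpeechAct:
    """Toy annotator: flags a directive when a cue phrase is present."""
    text = utterance.lower()
    if any(cue in text for cue in DIRECTIVE_CUES):
        return SpeechAct(utterance, Illocution.DIRECTIVE)
    return SpeechAct(utterance, Illocution.ASSERTIVE)


act = annotate("I need this fixed by Monday")
print(act.illocution.name)  # DIRECTIVE
```

The point of the sketch is the shape of the record, not the classifier: once the illocution is its own field, an agent can route on it, and the perlocution can be logged later to check whether the intended effect actually happened.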


2. COMPOSITIONALITY - Edges Over Nodes

Here’s a principle so fundamental that most people overlook it entirely: meaning is not in the words. It’s in the connections between them.

This is Frege’s principle of compositionality: the meaning of the whole is a function of the meanings of the parts and the way they are combined. The combination - the structure, the relations - does the heavy lifting.

A node without an edge is noise. An entity without a relation is trivia. “Customer,” “revenue,” “Q4” - these are just tokens until you connect them. Whose revenue? Which customer’s? Q4 of what year, measured how?

Think about how you learn a completely new word. You read its definition - and what happens? You connect it to knowledge you already have. You create a new edge in your mental graph. Yes, a new node appears - but the meaning of that word lives entirely in its connections to what you already know. Without those connections, the word is just a sound. It’s impossible for any concept to carry meaning in isolation.

“But surely,” you might say, “the foundational knowledge matters more than the connections?” No. Because it’s the connections within the foundation that give the foundation its meaning too. It’s edges all the way down. People instinctively want to find the most important node in a graph - the key concept, the central entity. But that’s the wrong question. The most important thing in any knowledge structure is not a node. It’s the structure itself.

Here’s a concrete example. Take the word “revenue” - just a node, sitting alone. Now connect it to “customer” with the relation “generated by.” Suddenly you have meaning: revenue is generated by customers. Add another edge: “revenue” → “recognized in” → “Q4.” Now you can reason about it temporally. Without these connections, even the simplest property of this node - is the revenue big? - has no answer. Big compared to what? In what period? For which segment? The truth value of any property depends entirely on the edges. Even the answer to the most basic question - “do we need this node at all?” - is determined by its connections.
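
The revenue example can be sketched as a tiny edge list in Python. The Edge type, the relation names, and the neighbors helper are assumptions made for this illustration, not part of any particular graph library.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


# The two relations described in the text above.
edges = [
    Edge("revenue", "generated_by", "customer"),
    Edge("revenue", "recognized_in", "Q4"),
]


def neighbors(node: str, graph: list[Edge]) -> list[tuple[str, str]]:
    """Return (relation, other node) pairs for every edge touching `node`."""
    found = []
    for e in graph:
        if e.source == node:
            found.append((e.relation, e.target))
        elif e.target == node:
            found.append((f"inverse:{e.relation}", e.source))
    return found


print(neighbors("revenue", edges))
# [('generated_by', 'customer'), ('recognized_in', 'Q4')]
print(neighbors("revenue", []))
# []
```

With no edges, the node returns nothing to reason with, which is the whole argument in two lines of output.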

Abstract Meaning Representation (AMR) formalizes exactly this. Think of AMR as a Swiss army knife of formalized semantic elements - a toolbox you can directly apply to context engineering or meaning structure modeling. In an AMR graph, edges - relations like :ARG0 (roughly, who does it), :ARG1 (roughly, what’s affected), :purpose, :condition - carry the primary semantic load. The nodes are concepts. But the graph structure is the meaning.

This is a direct hit on the bag-of-words mentality that still dominates prompt engineering. People throw keywords at LLMs and hope for the best. They write prompts as word soup instead of building relational structures. The semantic signal isn’t in your keywords - it’s in how you connect them.


3. CONTEXT & PERSPECTIVE - The Double Lens

Here’s where it gets interesting.

The speaker’s context and the listener’s context are not the same thing. And the gap between them isn’t a bug - it’s a law.

The speaker’s experience, knowledge, and situation determine what is said. The listener’s experience, knowledge, and situation determine what is heard. These are fundamentally different operations, and the overlap between them is never total. Communication loss is guaranteed. The only question is whether you model it or pretend it doesn’t exist.

The same text can produce not just radically different meanings but radically different actions depending on who receives it and in what context.

And it goes further: the same statement can be true from the speaker’s perspective and false from the listener’s. Look at your agent’s system prompt right now. What does it mean from your perspective - and what does it mean from your agent’s perspective? Whether you see the difference or not, it’s enormous. The sheer volume of what the model knows about every concept in that prompt creates a dramatic meaning gap compared to your knowledge of the same words. The model reads “You are a helpful assistant” and activates a vast web of associations, training patterns, and behavioral priors. You read it and think it’s a simple instruction. Conversely, there are pieces of your prompt whose meaning is accessible only to you - references to your domain, your users, your edge cases. And yet we pretend this shared piece of text means the same thing to all parties. It doesn’t. Not even close.

Here’s a concrete example from the enterprise world: when the CFO asks for “revenue,” the correct answer is GAAP revenue. When the product team asks for “revenue,” it’s MRR (monthly recurring revenue). Same word. Different correct answers. Different actions taken on those answers. Semantic pragmatics has modeled this kind of perspective-dependent meaning since the 1970s, building on the work of Grice, Stalnaker, and Lewis on common ground and context sets.
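
As a rough sketch of what modeling the asker's perspective could look like, the Python snippet below resolves a term against the role of whoever is asking. The DEFINITIONS table, the role names, and the metric identifiers are hypothetical; the only idea taken from the text is that the same word maps to different correct answers.

```python
# Hypothetical mapping from (term, asker role) to the metric that role means.
DEFINITIONS = {
    ("revenue", "cfo"): "gaap_revenue",
    ("revenue", "product"): "mrr",
}


def resolve(term: str, role: str) -> str:
    """Resolve a term against the asker's perspective; refuse to guess."""
    key = (term.lower(), role.lower())
    if key not in DEFINITIONS:
        raise LookupError(f"no definition of {term!r} for role {role!r}")
    return DEFINITIONS[key]


print(resolve("revenue", "CFO"))      # gaap_revenue
print(resolve("revenue", "product"))  # mrr
```

Raising an error for an unknown role is a deliberate choice in this sketch: silently falling back to a default definition is exactly the kind of pretended shared context the section warns about.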

For LLM practitioners, this reframes important things. A system prompt isn’t just “instructions” - it’s perspective engineering. Few-shot examples aren’t just “demonstrations” - they’re shared context calibration. RAG isn’t just “retrieval” - it’s knowledge-state alignment between the system and the user.

These aren’t fancy synonyms. They’re more precise descriptions of what’s actually happening - and precision in how we think about these mechanisms directly affects how well we build them.


4. IMPLICATURE - Hidden Signal, Hidden Entropy

Paul Grice’s insight was deceptively simple: what is said and what is meant are systematically different things. And the gap between them follows predictable rules.

When someone says “I have two children,” they implicate “I have exactly two children” - even though the literal statement is compatible with having five. When a user tells your agent “I’ve been having some trouble with payments,” they’re not filing a bug report. They’re asking for help. The literal content and the intended message diverge - predictably, measurably.

Here’s what makes this explosive for AI: implicatures carry both signal and entropy.

The signal is the intended meaning - the thing Grice’s cooperative principle helps you reconstruct. The entropy is the uncertainty - the space of possible intended meanings given the literal utterance. And measuring that entropy is itself a critically important property.

High implicature entropy = high miscommunication risk = you need a clarification loop.

Low implicature entropy = safe to proceed = the implicit meaning is recoverable.
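
One way to put a number on this is Shannon entropy over the candidate intended meanings. The sketch below is an assumption-heavy illustration: it presumes you already have a probability for each candidate interpretation (from an intent classifier, or from sampling several readings out of an LLM), and the intent names and the 1-bit threshold are made up for the example.

```python
import math
from typing import Iterable


def entropy_bits(weights: Iterable[float]) -> float:
    """Shannon entropy in bits; weights are normalized before use."""
    weights = list(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    h = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def needs_clarification(intent_probs: dict[str, float],
                        threshold_bits: float = 1.0) -> bool:
    """True when the spread over readings exceeds a hypothetical threshold."""
    return entropy_bits(intent_probs.values()) > threshold_bits


# Hypothetical distributions over what the user might have meant.
ambiguous = {"request_refund": 0.25, "report_bug": 0.25,
             "ask_status": 0.25, "update_card": 0.25}
clear = {"request_help": 0.9, "report_bug": 0.05, "chitchat": 0.05}

print(f"{entropy_bits(ambiguous.values()):.2f}", needs_clarification(ambiguous))
# 2.00 True
print(f"{entropy_bits(clear.values()):.2f}", needs_clarification(clear))
# 0.57 False
```

The threshold is a tuning knob, not a constant of nature; the useful part is that "ask a clarifying question" becomes a decision driven by a measured quantity rather than by a hunch.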

LLMs hallucinate precisely where implicatures are ambiguous. This is not accidental - it’s a predictable failure point that semantic pragmatics identified decades ago. When the gap between “what was said” and “what was meant” is wide, the model fills it with plausible-sounding noise. That’s not a model bug. That’s an unmodeled implicature.


5. ABSTRACTION - Computational Power Nobody Uses

Here’s the one that should make every ML engineer sit up: abstraction is not information loss. Abstraction is compression with computational properties.

Consider three sentences:

  • “The boy wants to go.”
  • “Going is what the boy wants.”
  • “The boy’s desire is to go.”

Syntactically, these are three different structures. Semantically, they’re one meaning. AMR captures this: all three map to the same graph. One canonical representation.

This gives you - for free:

  • Canonical forms - normalize before you compare.
  • Deduplication - detect semantic equivalence despite surface variation.
  • Structural comparison - measure meaning similarity at the graph level, not the string level.
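
A minimal sketch of these three properties, in Python, under loud assumptions: the graphs below are hand-written triples rather than parser output, the frame sense numbers are illustrative, and the comparison is a simple overlap of triples. The standard AMR metric, Smatch, searches over variable alignments instead; this shortcut replaces each variable with its concept, which only works when each concept appears once per graph.

```python
# Hand-written graphs, not parser output. Sense numbers are illustrative.
BOY_WANTS_TO_GO = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-02"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
}

# Same meaning as it might come back for "Going is what the boy wants":
# identical structure, different variable names.
GOING_IS_WHAT_THE_BOY_WANTS = {
    ("x", "instance", "want-01"),
    ("y", "instance", "boy"),
    ("z", "instance", "go-02"),
    ("x", "ARG0", "y"),
    ("x", "ARG1", "z"),
    ("z", "ARG0", "y"),
}

GIRL_WANTS_TO_GO = {
    ("w", "instance", "want-01"),
    ("p", "instance", "girl"),
    ("g", "instance", "go-02"),
    ("w", "ARG0", "p"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "p"),
}


def canonical(triples):
    """Replace variables with their concepts so variable names stop mattering.

    Assumes every concept occurs at most once per graph.
    """
    concept_of = {s: t for s, r, t in triples if r == "instance"}
    return frozenset(
        (concept_of[s], r, concept_of.get(t, t))
        for s, r, t in triples
        if r != "instance"
    )


def similarity(a, b) -> float:
    """Jaccard overlap of canonical relation triples (a crude stand-in for Smatch)."""
    ca, cb = canonical(a), canonical(b)
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)


print(canonical(BOY_WANTS_TO_GO) == canonical(GOING_IS_WHAT_THE_BOY_WANTS))  # True
print(similarity(BOY_WANTS_TO_GO, GIRL_WANTS_TO_GO))  # 0.2
```

Equality of canonical forms gives deduplication; the overlap score gives a graph-level similarity that does not care how the sentence was phrased.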

Nobody in the LLM world uses formal abstractions for semantic normalization. Everyone works at the raw text level, comparing embeddings of surface forms and wondering why “the boy wants to go” and “going is what the boy wants” get different similarity scores.

The abstraction machinery exists. It’s formalized. It’s open source. And the AI industry is spending billions building approximate versions of what formal semantics gives you exactly.

Here’s a practical exercise: ask your LLM to represent the phrase “You are a helpful assistant” first in AMR, then in UMR. Try other texts from your prompts. What you’ll see is the formal decomposition of meaning into its constituent parts - the predicate structure, the roles, the aspect, the modality. These are the components of formal semantics we should be taking seriously in everyday AI engineering. Our models sit there, capable of reasoning about these structures, waiting for us to provide context in terms they can formally decompose - but we don’t even know which parts our contexts consist of.


6. “And I Haven’t Even Started…”

Everything above? That’s the appetizer.

I haven’t touched temporal semantics - how events are ordered, how aspect shapes meaning, why your agent pipeline has no concept of “before” and “after” even though every business process is fundamentally temporal.

I haven’t touched modal strength - the difference between “maybe,” “probably,” and “certainly.” The epistemic layer that determines confidence, commitment, and evidentiality. The thing your hallucination detection should be built on but isn’t.

I haven’t touched reification - the mechanism of turning relations into objects that can be reasoned about. This is, incidentally, the formal underpinning of why chain-of-thought prompting works: you’re forcing the model to reify intermediate reasoning steps into explicit objects.

I haven’t touched quantifier scope - why “every student read a book” has two different meanings, and your LLM picks one randomly.
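
The two readings are easy to tell apart once you evaluate them against a small model. In this Python sketch the students, the books, and the read relation are invented facts chosen so that the readings disagree; the function names are hypothetical.

```python
students = {"ann", "bob"}
books = {"emma", "ulysses"}
# Hypothetical facts: each student read a different book.
read = {("ann", "emma"), ("bob", "ulysses")}


def surface_scope(students, books, read) -> bool:
    """'every' over 'a': each student read some book, possibly a different one."""
    return all(any((s, b) in read for b in books) for s in students)


def inverse_scope(students, books, read) -> bool:
    """'a' over 'every': one particular book was read by every student."""
    return any(all((s, b) in read for s in students) for b in books)


print(surface_scope(students, books, read))  # True
print(inverse_scope(students, books, read))  # False
```

Same sentence, same facts, opposite truth values. A system that does not record which reading it chose cannot explain, or even notice, the difference.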

Each of these deserves its own treatment. And each of them has been formalized in ways that the AI industry hasn’t discovered yet.

The Golden Artifacts

Two documents that I consider essential reading for anyone building AI systems that deal with meaning:

  • AMR Guidelines - Abstract Meaning Representation. A formal notation for capturing “who does what to whom” in a sentence, as a graph. The foundation.

  • UMR Guidelines - Uniform Meaning Representation. AMR’s evolution, adding the temporal, modal, and document-level layers that AMR was missing. Aspect annotation. Epistemic strength. Cross-sentence coreference. The temporal and modal backbone that most agent architectures desperately need but don’t have.

The lineage from temporal logic through data warehousing (slowly changing dimensions, specifically the SCD Type 2 pattern of versioned rows with validity intervals) to agentic memory architectures is not a metaphor. It’s the same formal pattern, rediscovered independently in each field. Deep dive coming in Part 2.
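
Until then, here is a minimal Python sketch of the SCD Type 2 idea applied to agent memory: an update never overwrites a fact, it closes the current version and appends a new one, so the store can answer "what was true on date X?". The FactVersion and VersionedMemory names are hypothetical, and the sketch assumes updates arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FactVersion:
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


class VersionedMemory:
    """SCD Type 2 style store: updates close the old row and append a new one."""

    def __init__(self) -> None:
        self.rows: list[FactVersion] = []

    def set(self, key: str, value: str, as_of: date) -> None:
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                row.valid_to = as_of  # close the current version, keep history
        self.rows.append(FactVersion(key, value, as_of))

    def get(self, key: str, as_of: date) -> Optional[str]:
        """Return the value valid on `as_of` (interval is [valid_from, valid_to))."""
        for row in self.rows:
            if row.key != key:
                continue
            ended = row.valid_to is not None and as_of >= row.valid_to
            if row.valid_from <= as_of and not ended:
                return row.value
        return None


memory = VersionedMemory()
memory.set("plan", "free", date(2024, 1, 1))
memory.set("plan", "pro", date(2024, 6, 1))

print(memory.get("plan", date(2024, 3, 1)))   # free
print(memory.get("plan", date(2024, 6, 1)))   # pro
print(memory.get("plan", date(2023, 12, 1)))  # None
```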

And beyond the formal layers? Semantic orchestration - the discipline of using these structures to architect agent systems that actually reason about meaning, not just process tokens. How temporal annotation patterns become memory architectures. How speech act classification becomes action routing. How implicature entropy becomes a confidence signal in production. That’s Part 3.


The Manifesto

Semantics is not a luxury. It’s not an academic curiosity. It’s not a nice-to-have for version 2.0.

It’s the operating system of meaning.

Every prompt you write is an act of applied semantics. Every agent architecture is a (usually implicit, usually broken) theory of meaning. Every hallucination is an unmodeled semantic gap. Every misunderstood user intent is a speech act failure.

The formal tools exist. The science is mature. The frameworks are open source. The gap isn’t knowledge - it’s attention.

If you’re building systems that process meaning without understanding how meaning works, you’re building on sand.

And the tide is coming in.


Serhii Kirichko is an Agentic AI Engineer with 15+ years in software engineering, 6 years in ML/data science, and a background in computational linguistics and semantic pragmatics. He builds AI agent systems informed by formal semantics. Based in Alicante, Spain.

What’s next

Part 2 - on AMR, UMR, temporal semantics, and the formal architecture of meaning.

Part 3 - Semantic Orchestration: concrete agent architecture patterns built on the formal foundations from Parts 1 and 2.


References & further reading:

  • Austin, J.L. (1962). How to Do Things with Words
  • Searle, J.R. (1969). Speech Acts
  • Grice, H.P. (1975). Logic and Conversation
  • Kroeger, P.R. (2022). Analyzing Meaning: An Introduction to Semantics and Pragmatics, 3rd ed. Language Science Press. - A comprehensive, accessible introduction covering all the key concepts discussed in this article: speech acts, compositionality, implicature, modals, tense and aspect. Freely available at langsci-press.org. If you read one book on semantics, make it this one.
  • Banarescu, L. et al. (2013). Abstract Meaning Representation for Sembanking
  • Van Gysel, J.E.L. et al. (2021). Designing a Uniform Meaning Representation
  • Perrone, P. (2026). Enterprise AI Has a Production Problem Nobody Wants to Talk About - Data Science Collective
  • Karpathy, A. (2025). Context engineering tweet
