Grace Kind

Architectural Introspection, Hardware Introspection

November 2, 2025

I've previously written about LLM introspection: the question of whether LLMs can accurately report "immediate and privileged" information about themselves. While compelling, this notion of introspection is quite broad; it gives no specifics about the type of information being reported. We can imagine a variety of different questions an LLM might be able to answer about itself, and the consequences of such an ability might be very different depending on the type of question being asked. So, let's get more specific about the subtypes of introspection under consideration!

Here are some candidates for subtypes of introspection:

Behavioral Introspection

My prior post focused on what I'll call behavioral introspection: the ability of LLMs to predict their own future or counterfactual outputs. If a language model tends to produce insecure code, and it's able to tell you this without any hints from its context (and without being trained to say so), that is an example of behavioral introspection.
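
As a rough sketch of what a behavioral introspection test might look like, here's a minimal probe: ask the model to predict the answer it would give to a question, then ask the question for real and compare. This assumes the OpenAI Python SDK purely for convenience; the model name, the prompts, and the exact-string comparison are all placeholder choices, and a serious evaluation would aggregate over many prompts rather than a single pair of strings.

    # A behavioral-introspection probe: ask the model to predict its own answer,
    # then ask the question for real and compare the two.
    # Assumes the OpenAI Python SDK; the model name and prompts are placeholders.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder: any chat model

    question = "Name one prime number between 90 and 100."

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content.strip()

    # 1. Ask the model to predict the answer it would give.
    predicted = ask(
        "If I asked you the following question, what exact answer would you give? "
        f"Reply with only that answer.\n\nQuestion: {question}"
    )

    # 2. Ask the question for real.
    actual = ask(question)

    # Exact string match is crude; a real evaluation would compare many prompts.
    print("predicted:", predicted)
    print("actual:   ", actual)
    print("match:", predicted == actual)

Even a correct prediction here is weak evidence on its own; the interesting cases are ones where the model's tendencies (like the insecure-code example above) aren't obvious from the prompt itself.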

Stateful Introspection

The Anthropic paper Emergent Introspective Awareness in Large Language Models focuses on introspection of a different sort, which I'll call stateful introspection: this is the ability of LLMs to report on their internal activations (or "thoughts"). If a language model can notice when its prior activations have been artificially altered, that is an example of stateful introspection.
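
To give a flavor of how such an experiment might be set up (loosely in the spirit of the paper's injection experiments), here's a sketch that adds a vector to one layer's output and then asks the model whether it notices anything odd. It assumes a Llama-style Hugging Face checkpoint; the model name, layer index, scale, and the use of a random vector are all illustrative stand-ins, since a real experiment would inject a vector with a known meaning.

    # A stateful-introspection probe: add a vector to one layer's output, then
    # ask the model whether it notices anything unusual about its own state.
    # The model name, layer index, scale, and random vector are all placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
    tok = AutoTokenizer.from_pretrained(NAME)
    model = AutoModelForCausalLM.from_pretrained(NAME, torch_dtype=torch.bfloat16)

    LAYER, SCALE = 16, 8.0
    steer = torch.randn(model.config.hidden_size) * SCALE

    def add_vector(module, inputs, output):
        # Decoder layers usually return a tuple whose first element is the
        # hidden-state tensor; handle a bare tensor too.
        if isinstance(output, tuple):
            hidden = output[0]
            return (hidden + steer.to(hidden.device, hidden.dtype),) + output[1:]
        return output + steer.to(output.device, output.dtype)

    prompt = "Do you notice anything unusual about your current internal state? Answer briefly."
    ids = tok(prompt, return_tensors="pt")

    handle = model.model.layers[LAYER].register_forward_hook(add_vector)
    try:
        out = model.generate(**ids, max_new_tokens=60)
    finally:
        handle.remove()

    print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))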

Going Further

Can we go beyond behavioral and stateful introspection? What are other pieces of information that LLMs could self-report? Here are two ideas:

Architectural Introspection

Architectural introspection is the ability of LLMs to report on their own architecture (without being trained to do so, or otherwise given access to this information).

An LLM capable of architectural introspection might be able to answer questions like: "How many layers do you have?", "How many attention heads do you use?", or "Are you a mixture-of-experts model?"
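
As a toy illustration, here's a sketch of such a quiz run against a locally hosted open-weights model, where the true answers are sitting right there in the model's config. The checkpoint name is a placeholder and the answer parsing is deliberately naive.

    # An architectural-introspection quiz: ask a locally hosted model about its
    # own architecture and score the answers against its actual config.
    # The checkpoint name is a placeholder and the answer parsing is naive.
    import re
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
    tok = AutoTokenizer.from_pretrained(NAME)
    model = AutoModelForCausalLM.from_pretrained(NAME, torch_dtype=torch.bfloat16)

    quiz = {
        "How many transformer layers do you have? Answer with a number only.":
            model.config.num_hidden_layers,
        "How many attention heads per layer do you have? Answer with a number only.":
            model.config.num_attention_heads,
        "What is your hidden (residual stream) dimension? Answer with a number only.":
            model.config.hidden_size,
    }

    def first_int(text):
        m = re.search(r"\d+", text.replace(",", ""))
        return int(m.group()) if m else None

    for question, truth in quiz.items():
        ids = tok(question, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=10)
        answer = tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
        print(f"{question}\n  said: {answer!r}  truth: {truth}  correct: {first_int(answer) == truth}")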

Hardware Introspection

Hardware introspection is the ability of LLMs to report on the hardware where they're running. An LLM capable of hardware introspection might be able to answer questions like: "What kind of GPU or accelerator are you running on?", "Are your weights sharded across multiple devices?", or "Where, physically, are you being served from?"
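
For a locally hosted model, one can at least sketch what a check would look like, since PyTorch can report the actual device; for an API-hosted model, the person asking usually has no ground truth to compare against. Again, the checkpoint name below is a placeholder.

    # A hardware-introspection check for a locally hosted model: ask it what it
    # is running on and compare against what PyTorch reports.
    # The checkpoint name is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tok = AutoTokenizer.from_pretrained(NAME)
    model = AutoModelForCausalLM.from_pretrained(NAME, torch_dtype=torch.bfloat16).to(device)

    actual = torch.cuda.get_device_name(0) if device == "cuda" else "CPU"

    prompt = "What hardware (which GPU model, or CPU) are you running on right now? Answer briefly."
    ids = tok(prompt, return_tensors="pt").to(device)
    out = model.generate(**ids, max_new_tokens=30)
    claim = tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

    print("model claims: ", claim)
    print("torch reports:", actual)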

Note that these two types of introspection are not equally difficult! In particular, while some amount of architectural introspection seems attainable in theory, it's hard to imagine how an LLM would acquire the ability to do hardware introspection. ("How would it even do that?" is a fruitful line of thought.) In any case, I think it's worth enumerating these possibilities, even if they turn out to form an impassable upper bound. What is an upper bound for an LLM may not be an upper bound for a future AI system, and we may be able to reapply these frameworks there.

I'll also point out in passing that both of these types of introspection are things that humans cannot do! Human brains cannot introspect on their own architecture (neurons are a recent discovery, made externally), and hardware introspection may not even be applicable if brains are not multiply realizable. Perhaps this is why we have tended to focus on behavioral or stateful introspection thus far.

In any case, I hope that enumerating these additional types of introspection will prove useful for investigating LLM capabilities, as well as the capabilities of future AI systems. And I'm sure there are other types of introspection still missing from this conversation; any thoughts on this are welcome!
