Tivadar Danka dismantles a fundamental misconception in artificial intelligence: that we can directly compare the internal "thoughts" of different large language models by lining up their numbers. In this technical deep dive, Danka reveals that while two models might speak entirely different numerical dialects, they often arrive at the same semantic conclusions, a discovery that reframes how we understand machine intelligence and safety.
The Illusion of Direct Comparison
Danka begins by stripping away the romanticized view of how these systems process language. He writes, "Language models do not process text; they process numbers." This is not merely a technical detail but a crucial distinction for anyone trying to interpret AI behavior. He explains that when a user prompts a chatbot, the text is converted into high-dimensional vectors—essentially coordinates in a space with thousands of dimensions. However, Danka warns that our human intuition fails here. "The dimensions are not human-crafted, nor do they correspond to human-interpretable traits like size and friendliness," he notes. In fact, he is blunt about the opacity of these systems: "humans cannot understand what the axes mean, because the axes don't mean anything in the sense of corresponding to physical characteristics of nature."
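Danka's point that models "process numbers" can be made concrete with a toy embedding lookup. Everything below (the four-word vocabulary, the eight-dimensional vectors, the random initialization) is an illustrative stand-in, not real model data:

```python
import numpy as np

# Toy illustration of "models process numbers": each token is looked up
# in an embedding table and becomes a vector. All values are random
# stand-ins; real vocabularies and dimensions are far larger.
rng = np.random.default_rng(0)

vocab = {"the": 0, "banana": 1, "is": 2, "yellow": 3}
embedding_dim = 8                                   # real models: hundreds+
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(tokens):
    """Map each token to its row of the embedding table."""
    return np.stack([embedding_table[vocab[t]] for t in tokens])

vectors = embed(["the", "banana", "is", "yellow"])
print(vectors.shape)                                # (4, 8)
```

Because the table starts out random and is shaped by training, nothing ties any particular axis to a human-readable trait, which is exactly the opacity Danka describes.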
This framing is vital because it stops readers from trying to force a human narrative onto mathematical structures that don't support it. The author argues that because these vectors are initialized with random numbers and trained on vast, unstructured datasets, the specific values for a word like "banana" will be completely different in a BERT model versus a GPT-2 model. Danka demonstrates this by attempting a direct correlation between the two, only to find "basically zero-valued correlations." The evidence is clear: if you try to compare the raw coordinates, the models look nothing alike.
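The near-zero result of a direct comparison is easy to reproduce in miniature. The sketch below uses random matrices as stand-ins for two independently initialized embedding tables; `model_a` and `model_b` are hypothetical, not the actual BERT or GPT-2 weights:

```python
import numpy as np

# Two independently initialized "models" assign unrelated coordinates to
# the same tokens, so per-token correlations hover around zero. The
# random matrices are hypothetical stand-ins for real embedding tables.
rng = np.random.default_rng(42)
n_tokens, dim = 100, 768
model_a = rng.normal(size=(n_tokens, dim))
model_b = rng.normal(size=(n_tokens, dim))

# Correlate each token's vector in model A with its vector in model B.
per_token_corr = np.array([
    np.corrcoef(model_a[i], model_b[i])[0, 1] for i in range(n_tokens)
])
print(round(float(np.mean(np.abs(per_token_corr))), 2))  # close to 0
```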
The Power of Relational Structure
So, if the raw numbers don't match, how do we know if the models are thinking similarly? Danka introduces Representational Similarity Analysis (RSA), a technique borrowed from neuroscience to solve exactly this problem. Instead of comparing the coordinates directly, RSA compares the relationships between the coordinates. Danka explains the logic simply: "Instead of correlating embeddings vectors directly between models, calculate similarities within models across embeddings, and then determine whether the patterns of across-embeddings similarities are similar."
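A minimal sketch of the RSA procedure as he describes it: compute within-model cosine similarities, then correlate the two similarity patterns. The function names and random toy data are illustrative assumptions, not Danka's actual code:

```python
import numpy as np

def cosine_sim_matrix(emb):
    """Pairwise cosine similarities between all rows (tokens) of emb."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

def rsa_score(emb_a, emb_b):
    """Correlate the within-model similarity patterns of two spaces.

    Only the number of tokens (rows) must match; the embedding
    dimensions may differ freely.
    """
    sim_a = cosine_sim_matrix(emb_a)
    sim_b = cosine_sim_matrix(emb_b)
    iu = np.triu_indices_from(sim_a, k=1)        # unique pairs, no diagonal
    return np.corrcoef(sim_a[iu], sim_b[iu])[0, 1]

# Sanity checks on random stand-in data:
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 768))
b = rng.normal(size=(50, 1024))                  # different width is fine
print(round(rsa_score(a, a), 3))                 # 1.0: identical structure
print(round(rsa_score(a, b), 3))                 # near 0: unrelated spaces
```

The key design point is that only the upper triangle of each similarity matrix is compared, so the models never need to share a coordinate system at all.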
He uses a striking analogy to make this abstract concept concrete. Imagine taking a picture of the number "7," then rotating that picture. If you compare the pixels of the original and the rotated image, the correlation is near zero. Yet, the information content is identical. Danka writes, "The token pairs that have higher similarity in GPT-2's embeddings also have higher similarity in BERT's embeddings." This indicates that while the "language" of the vectors is different, the underlying map of meaning is shared. This is a profound insight for the field of AI safety; it suggests that different architectures may converge on similar semantic structures despite their divergent training paths.
The embedding vector of "banana" might be very different between BERT and GPT-2, but the way that "banana" and "apple" relate to each other within each model might be similar.
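The rotation analogy can also be checked numerically. In the sketch below, a random orthogonal transformation plays the role of the rotated picture of the "7": coordinate-by-coordinate correlation collapses, while every pairwise distance (the relational structure) is preserved. The data and sizes are arbitrary choices for illustration:

```python
import numpy as np

# Numerical check of the rotation analogy: an orthogonal transformation
# (a rotation/reflection) wrecks coordinate-wise correlation but leaves
# every pairwise distance intact.
rng = np.random.default_rng(1)
points = rng.normal(size=(200, 50))

q, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # random orthogonal matrix
rotated = points @ q.T

# "Pixel-level" comparison: correlate raw coordinates (near zero).
coord_corr = np.corrcoef(points.ravel(), rotated.ravel())[0, 1]

def pairwise_distances(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Relational comparison: distances match to machine precision.
max_gap = np.abs(pairwise_distances(points) - pairwise_distances(rotated)).max()
print(round(float(coord_corr), 2), bool(max_gap < 1e-9))
```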
Danka's choice of cosine similarity for the within-model comparisons and correlation for the cross-model analysis is technically precise. He notes that mean offsets in the vectors, though meaningful within a single model, can bias comparisons across models; by focusing on the pattern of similarities rather than their absolute values, RSA cuts through the noise of random initialization. This approach mirrors historical developments in the field, such as the evolution of Word2vec, where the focus shifted from isolated word definitions to the geometric relationships between words. Danka's application of RSA to modern transformers shows that this relational approach remains the gold standard for understanding distributed representations.
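Why correlation suits the cross-model step can be shown in a few lines: Pearson correlation is blind to the scale and offset of the similarity values, so only the pattern survives. The five similarity values below are made-up toy numbers, assumed purely for illustration:

```python
import numpy as np

# Pearson correlation compares patterns, not absolute values: a shifted
# and rescaled copy of the same similarity pattern still correlates
# perfectly. The values are toy numbers, not real model similarities.
pattern_a = np.array([0.9, 0.2, 0.7, 0.1, 0.5])   # model A's pairwise sims
pattern_b = 0.5 * pattern_a - 0.3                  # same pattern, new scale/offset

corr = np.corrcoef(pattern_a, pattern_b)[0, 1]
print(round(float(corr), 3))                       # 1.0
```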
Robustness Across Architectures
One of the most compelling aspects of Danka's argument is the robustness of RSA across different model sizes. He points out that direct comparison fails when models have different embedding dimensions, such as comparing BERT-large (1024 dimensions) to GPT-2-XL (1600 dimensions). "Directly comparing their embeddings will not be insightful," he admits, noting that standard correlation requires identical vector lengths. However, RSA bypasses this limitation entirely. Because it operates on the similarity matrices rather than the raw vectors, it can compare a model with 768 dimensions against one with 1600 without error.
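The dimension-mismatch point is easy to verify: whatever the embedding width, the similarity matrix for n tokens is always n × n. The sketch below uses random matrices with the widths quoted in the article (1024 for BERT-large, 1600 for GPT-2-XL) purely as stand-ins for the real weights:

```python
import numpy as np

# Whatever the embedding width, n tokens yield an n x n similarity
# matrix, so cross-model comparison stays well-defined even when the
# raw vectors cannot be correlated directly. Data is random stand-in.
rng = np.random.default_rng(7)
n_tokens = 50
emb_bert = rng.normal(size=(n_tokens, 1024))   # BERT-large width
emb_gpt2 = rng.normal(size=(n_tokens, 1600))   # GPT-2-XL width

def cosine_sim_matrix(emb):
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T

sim_bert = cosine_sim_matrix(emb_bert)
sim_gpt2 = cosine_sim_matrix(emb_gpt2)
print(sim_bert.shape, sim_gpt2.shape)          # both (50, 50)
```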
Critics might note that while RSA demonstrates structural similarity, it does not explain why different models converge on these patterns or whether those patterns are truly robust across all semantic domains. The analysis relies on the assumption that the internal statistical structures are the primary carriers of meaning, which may overlook emergent behaviors that only appear at the surface level of text generation. Furthermore, Danka acknowledges that the embedding spaces are not simply rotated versions of each other; they are distinct. This raises the question of whether the shared structure is a universal truth of language or merely a byproduct of training on similar internet-scale datasets.
Bottom Line
Tivadar Danka's analysis provides a necessary corrective to the tendency to treat large language models as monolithic, comparable entities. By demonstrating that we must look at the relationships between data points rather than the points themselves, he offers a more accurate lens for evaluating AI alignment and safety. The strongest part of this argument is its demonstration that semantic convergence is real even when numerical divergence is absolute. The biggest vulnerability remains the interpretability gap: while we can measure that the structures are similar, we still cannot easily read the "axes" to understand what those structures actually represent to the machine.