A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026

If you have struggled a bit to keep up with open-weight model releases this month, this article should catch you up on the main themes.

In this article, I will walk you through the ten main releases in chronological order, with a focus on their architectural similarities and differences:

  1. Arcee AI’s Trinity Large (Jan 27, 2026)

  2. Moonshot AI’s Kimi K2.5 (Jan 27, 2026)

  3. StepFun Step 3.5 Flash (Feb 1, 2026)

  4. Qwen3-Coder-Next (Feb 3, 2026)

  5. z.AI’s GLM-5 (Feb 12, 2026)

  6. MiniMax M2.5 (Feb 12, 2026)

  7. Nanbeige 4.1 3B (Feb 13, 2026)

  8. Qwen 3.5 (Feb 15, 2026)

  9. Ant Group’s Ling 2.5 1T & Ring 2.5 1T (Feb 16, 2026)

  10. Cohere’s Tiny Aya (Feb 17, 2026)

(PS: DeepSeek V4 will be added once released.)

Since there’s a lot of ground to cover, I will reference my previous article, The Big LLM Architecture Comparison, for background on certain technical topics (such as Mixture-of-Experts, QK-Norm, and Multi-head Latent Attention) to avoid redundancy here.

1. Arcee AI’s Trinity Large: A New US-Based Start-Up Sharing Open-Weight Models

On January 27, Arcee AI (a company I hadn’t had on my radar until then) began releasing their open-weight 400B-parameter Trinity Large LLM on the model hub, along with two smaller variants:

  • Their flagship large model is a 400B param Mixture-of-Experts (MoE) with 13B active parameters.

  • The two smaller variants are Trinity Mini (26B with 3B active parameters) and Trinity Nano (6B with 1B active parameters).
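To make the "400B total, 13B active" distinction concrete: in an MoE layer, a router selects only a few experts per token, so most of the model's weights sit idle on any given forward pass. Below is a minimal, illustrative top-k routing sketch (toy sizes and a single weight matrix per expert; not Trinity's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2
# Toy "experts": each is a single weight matrix (real MoE experts are MLPs).
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    return sum(w * (x @ experts[e]) for w, e in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)

total_params = n_experts * d_model * d_model   # stored in memory
active_params = top_k * d_model * d_model      # actually used per token
print(f"total: {total_params}, active per token: {active_params}")
```

With 8 experts and top-2 routing, only a quarter of the expert parameters participate in each token's computation, which is the same principle that lets Trinity Large run with 13B of its 400B parameters per token.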

Along with the model weights, Arcee AI also released a detailed technical report on GitHub (as of Feb 18, also on arXiv).

So, let’s take a closer look at the 400B flagship model. Figure 2 below compares it to z.AI’s GLM-5, which, at 355B parameters, is perhaps the most similar model in terms of size.

As we can see in the Trinity and GLM-4.5 comparison, there are several interesting architectural components added to the Trinity model.

First, there are the alternating local:global sliding window attention (SWA) layers, as in Gemma 3, Olmo 3, Xiaomi MiMo, and others. In short, SWA is a sparse (local) attention pattern in which each token attends only to a fixed-size window of the most recent tokens (for example, 4,096) instead of attending to the entire input (which could be up to n = 256,000 tokens). This reduces the per-layer regular attention cost from O(n²) ...
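The sliding-window pattern described above can be sketched as a boolean attention mask, where entry (i, j) says whether query position i may attend to key position j (a generic illustration, not any specific model's implementation):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask.

    True means query position i may attend to key position j.
    Each token sees at most `window` recent tokens (itself included),
    so per-layer attention cost drops from O(n^2) to O(n * window).
    """
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    causal = j <= i                  # no attending to future tokens
    local = (i - j) < window         # only the last `window` tokens
    return causal & local

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))
```

A "global" layer is simply the same mask with `window >= seq_len`, which is why alternating local:global layers trade most of the quadratic cost away while periodically restoring full-context mixing.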

The full article by Sebastian Raschka is available on Ahead of AI.