World Models: Dreams Turned Into Reality

Sven Hoffmann

03.06.2026

AI
Immersive Technology

World models have become a hotbed of capital and literature over the past couple of months. Most of the conversation has centered on their applications in physical AI: robotics, autonomous vehicles, manufacturing, and more. However, the less visible opportunity lies in what world models can deliver to consumers: a new generation of games and other interactive experiences. This article focuses on that side of the story: what the consumer opportunity looks like for world models, who will drive the shift, and when it can realistically arrive.

World models have become one of the most heavily funded emerging research categories in AI that general consumers have never used. Over the last 18 months, more than $10 billion has been deployed into companies promising machines that dream the physical world. Almost none of that capital is underwriting a product a real player can pick up and put hours into. The technical genealogy of how we got here, from Schmidhuber and Ha through Dreamer, Sora, Genie, and Cosmos, plus the JEPA-versus-video-prediction debate and the convergence of RL world models with video generation, is well covered. MoE Capital’s recent article, “The Model That Dreams the World”, is the cleanest single survey of that landscape published in 2026 so far, and we would point any reader who wants the full primer there. This piece is about a different question.

The question is when a world model becomes something a consumer chooses on a Friday night, what they choose it for, and who builds the thing they choose. Robotics gets the headlines because robotics gets the checks. Almost all coverage of this capital moment so far is, reasonably, enterprise-shaped, and the enterprise case is real: NVIDIA’s Cosmos paired with 1X, Figure, Agility, and Waymo is a thesis with real product-market fit, and the 18-24 month forward pipeline will keep absorbing the majority of the funding. None of that is consumer. The single largest piece of received wisdom inside the world model debate, that this becomes a category once it has its “ChatGPT moment” for gamers, deserves more skepticism than it currently gets. The consumer moment for world models is unlikely to be ChatGPT-shaped. It will be slower, weirder, more design-driven, and more art-and-technology intersected than the funding chart suggests. The first surfaces real consumers touch are unlikely to look like a fully developed game at all: shorter-form creator tools, walkable scene-from-photos products that turn a wedding or a vacation into a 3D memory you can step back into, single-player generative experiments that go viral on TikTok before they go viral on Steam, and asymmetric experiences where one user prompts a world and another one explores it. By “art-and-technology intersected” we mean the Pixar pattern: the people who ship the winning consumer products will look more like animators who can engineer, paired with engineers who can tell a story, than like either pure model-research labs or pure AAA studios working in isolation.

A 90-second primer for the reader skipping the MoE piece. A world model is action-conditioned learned dynamics: you give it a state, you give it an action, and it predicts the next state in a way that stays consistent with the rules of the world it learned from. That is different from Sora, which predicts plausible video from a text prompt without a structured action input; different from Unreal, whose rules are hand-coded rather than learned; and different from a language model, which reasons over tokens, not over a dynamic environment. The frontier divides into roughly four camps (pixel-first video generation, explicit geometry via Gaussian splats, latent-space prediction in the JEPA tradition, and hybrids that pair learned dynamics with a structured simulator). Those camps map onto applications more cleanly than the marketing suggests. The most consequential split inside them, discussed less than it should be, is whether the model predicts from pixels or from the underlying game state. That distinction quietly decides which companies are positioned to ship a real consumer game and which are not, and we will come back to it below.
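To make "action-conditioned learned dynamics" concrete, here is a minimal sketch in Python. Everything in it is a hypothetical toy of our own (a five-cell track, a two-action vocabulary, the names `engine_step` and `TabularWorldModel`), not any lab's architecture. The hand-coded function stands in for an engine like Unreal; the lookup table stands in for a world model that never sees the rules and only learns them from observed transitions.

```python
from collections import Counter, defaultdict

# Hypothetical toy world: a 1-D track with positions 0..SIZE-1.
SIZE = 5
ACTIONS = {"left": -1, "right": 1}


def engine_step(state, action):
    """Hand-coded rules (the 'engine'): move one cell, clamp at the walls."""
    return max(0, min(SIZE - 1, state + ACTIONS[action]))


class TabularWorldModel:
    """Learned dynamics: maps (state, action) to the most frequently seen next state."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        # Record one transition; the model never reads engine_step's rules.
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        seen = self.counts.get((state, action))
        if not seen:
            return state  # no data for this pair: assume nothing changes
        return seen.most_common(1)[0][0]

    def rollout(self, state, actions):
        """'Dream' a trajectory: feed each prediction back in as the next state."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


# Train on one observed transition per (state, action) pair.
model = TabularWorldModel()
for s in range(SIZE):
    for a in ACTIONS:
        model.observe(s, a, engine_step(s, a))

dreamed = model.rollout(2, ["right", "right", "right", "left"])
print(dreamed)  # [2, 3, 4, 4, 3]: the wall at position 4 was learned, not coded
```

Real systems replace the lookup table with a neural network and the integer state with pixels, latents, or game state, but the contract is the same: state and action in, next state out, with rollouts produced by feeding predictions back into the model.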

What we believe about the consumer moment

1. What currently ships under the consumer banner is a walking simulator with extra steps

Spawn into a Marble scene or into a Genie 3 prompt-generated world. Look around. Pick a direction. There is no goal, no progression, no economy, no NPC who needs anything from you, no reason to come back tomorrow. These are tech demos with skyboxes, and a tech demo with a skybox is not a game. The category’s biggest tell is the language its own promoters use: “explorable,” “navigable,” “persistent for minutes at a time.” Notice that the verbs that turn a virtual space into a game (fight, build, trade, plot, betray, return, beat, collect) are entirely absent. We are several feedback loops away from a real consumer category, and pretending otherwise is the single most consistent failure mode of the existing coverage. The industry largely agrees on this read; what existing coverage tends to stop short of is the harder question, which is what actually ships first to bridge the gap.

2. Content creation, not games, is the most likely place world models prove their consumer value first

The most useful reframe inside this debate, and one most existing coverage misses entirely, is that the first widely-used consumer application of world models is more likely to come from content creation than from gameplay, even if the gap eventually closes. World models give creators meaningfully better control than text-prompting alone: instead of trying to wordsmith a camera shot, the creator steers the model with discrete actions and lets the consistency of the underlying dynamics carry the cinematography. Concretely, the near-term creator workflows world models unlock include short-form character and scene videos with consistent camera motion for TikTok and YouTube Shorts, machinima-style storytelling at speeds an indie creator can sustain alone, pre-visualization for ads and music videos, asset prototyping for indie studios that want to test a level before committing to engine work, and asymmetric viral content like “world prompts” that others can step into and explore.

The clearest current example is Higgsfield, whose generative video product gives creators discrete cinematic camera control over the shots they want instead of forcing them to wordsmith every camera movement into a text prompt. A separate but related candidate first wedge is personal memory reconstruction: tools that turn a small set of photos or videos from a wedding, a vacation, a childhood home, or a concert into a walkable 3D scene a user can revisit and share. The technology stack required for that product (Gaussian splatting from sparse photo sets, neural radiance fields, fast scene-from-text) is converging quickly enough that a credible consumer version is likely inside 24 months, and the use case has the rare property of being immediately legible to a non-technical buyer. That is a near-term TikTok and short-form video thesis, not a game thesis. It is also a thesis that already has demand on the supply side, with directors and creators partnering with diffusion-video labs to bring controlled cinematic shots to life, and the same control surface generalizes to gaming-adjacent creator content (Twitch overlays, machinima reborn, in-game film studios, asset prototyping). World models may turn out to be a flop for shipped games for several years while simultaneously becoming a meaningful tool inside the content economy. Both things can be true. Anyone underwriting the category strictly on “when will this be a hit game” is buying the wrong tail of the distribution.

3. When the consumer game moment does arrive, it unlocks four experience categories that engines cannot deliver

If content creation is the first wedge, the question becomes what the actual game wedge looks like once the technology and the design language catch up.
Four categories of experience plausibly justify the bet. The first is on-demand, prompt-shaped worlds: a player describes the universe they want, and a coherent, mechanic-aware version of it appears, complete with NPCs that have not been hand-scripted but still behave in ways that make sense. The second is personalized live-service worlds that meaningfully respond to a player’s history. Not procedurally generated dungeons grafted onto a fixed engine, but worlds whose geography, characters, and stakes shift across hundreds of hours in response to what a specific player has actually done. The third is the holodeck case: short, authored-feeling experiences that would be cost-prohibitive in a traditional studio pipeline because the content burn rate per playthrough exceeds anything a hand-built level can justify. The fourth is shared, fully adaptive multiplayer worlds, where the environment itself co-evolves with the population playing inside it. The closest cultural analogue is the early Minecraft server, where the map became a record of every player who ever logged in: their builds, their feuds, their abandoned projects.

Replace the engine with a generative substrate and the same dynamic compounds: terrain shifts in response to how players have traversed it, NPCs remember and act on community-scale events, and a server’s state becomes something genuinely impossible to author by hand. This is the category most often dismissed as a technical impossibility today; it is also the one that, if it ships, defines what a live-service game can be on this stack. None of the four is within reach today. Each becomes tractable inside a roughly three-to-ten-year horizon if a specific set of constraints relaxes in the right order, and the third becomes tractable first.

4. The teams that ship the first consumer wedge sit between the foundation labs and the game studios, not at either pole

If content creation is the first wedge and the eventual game wedge unlocks specific kinds of experiences, the next question is who actually builds the thing consumers touch. Almost none of the loudest names today are positioned for it.
Frontier research labs are organized around model quality, robotics applications, and academic-style publication cadence, not around shipping a fun consumer product. Pure game studios are organized around five-year AAA cycles, IP investment, and engine optimization; the compute budgets and recruiting reach they would need to commercialize a world model from scratch are not in the same universe as a foundation lab’s.
The teams positioned to ship will most likely be hybrid groups that sit between the two: forward-leaning game developers paired with researchers who understand world models, agents, large language models, and multi-modal systems together, not any one of those skills in isolation. The architectural question reinforces who has the right instinct. The labs raising the biggest checks are training on pixels (World Labs, General Intuition, Genie, and most of the diffusion-based stack), which is the right assumption for robotics and AGI-style perception and the wrong one for shipping a game.
A studio or team that already owns or builds its own simulation already owns the underlying state (positions, velocities, abilities, cooldowns, inventories, hit-boxes). Predicting next-state from that representation is faster, cheaper, and more accurate, and it avoids paying once to render frames and a second time to learn the physics the engine already encodes. The teams getting this right architecturally look closer to Moonlake (an environment model layered on top of an underlying game engine) and to NVIDIA’s Cosmos plus Omniverse than to pure-pixel video diffusion or pure Gaussian splat scenes.
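A rough sense of scale helps here. The sketch below compares how many bytes a model has to consume per step when it predicts from a rendered frame versus from game state. All figures are illustrative assumptions of ours (a 1280x720 RGB frame at one byte per channel, 64 entities, eight 32-bit floats per entity), not measurements from any studio or lab, and input size is only one of several factors behind the cost gap.

```python
# Back-of-envelope comparison: pixels vs. game state as model input.
# Every number below is a hypothetical assumption chosen for illustration.

FLOAT_BYTES = 4  # 32-bit floats

# Assumed per-entity state: field name -> number of floats.
ENTITY_FIELDS = {
    "position": 3,
    "velocity": 3,
    "cooldown": 1,
    "health": 1,
}


def state_bytes(num_entities):
    """Bytes needed to describe every entity's state for one step."""
    floats_per_entity = sum(ENTITY_FIELDS.values())
    return num_entities * floats_per_entity * FLOAT_BYTES


def frame_bytes(width, height, channels=3):
    """Bytes in one uncompressed frame at one byte per channel."""
    return width * height * channels


pixels = frame_bytes(1280, 720)  # 2,764,800 bytes
state = state_bytes(64)          # 2,048 bytes
ratio = pixels / state           # 1350.0

print(f"frame: {pixels:,} B, state: {state:,} B, ratio: {ratio:.0f}x")
```

Under these assumptions the pixel input is three orders of magnitude larger, and the state vector already contains the quantities (positions, velocities, cooldowns) that a pixel model would have to infer before it could predict anything.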
Roblox is one of the largest public-market examples of this exact pattern in motion: their April 2026 hybrid architecture announcement, the Cube foundation model, and the Morpheus AI acquisition collectively show a company that already owns the underlying simulator layering generative dynamics on top rather than trying to replace it. Outside of examples like that, there is currently no demo from such a team compelling enough to attract venture funding on its own merits, but the first one will likely attract a meaningful venture round in the near term, and a shippable product is roughly two-to-three years behind that.

5. Four constraints have to give simultaneously, and they are not on the same timeline

All four have to relax before the consumer game wedge can land:

● Technical: long-horizon persistence must move past the current one-to-two-minute drift wall; action-conditioned latency must drop under 50 milliseconds; inference must run cheaply on edge or co-located hardware so studios are not paying GPU rent for every frame.

● Architectural: the multiplayer problem (many players sharing a single dreamed world state) has to be solved or sidestepped, because most live-service economics are structurally multiplayer.

● Design: generative breadth has to be paired with the affordances and feedback loops that separate a game from a sandbox. Without them, the experience collapses back into a walking simulator.

● Economic: the per-minute cost of generating a coherent world has to fall below what advertisers and players will pay per minute of attention. LLM inference costs took roughly two years to cross that curve. World-model inference is at least 18 months behind.
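Two of these constraints reduce to simple arithmetic, sketched below in Python. The 50-millisecond budget comes from the list above; every other number (GPU price, players per GPU, revenue per minute, the latency split across stages) is a placeholder assumption of ours, not market data.

```python
# Back-of-envelope checks for the technical and economic constraints.
# All inputs are hypothetical placeholders; only the 50 ms budget is from the text.

LATENCY_BUDGET_MS = 50


def within_latency_budget(stage_ms, budget_ms=LATENCY_BUDGET_MS):
    """True if the summed per-stage latency stays under the budget."""
    return sum(stage_ms.values()) < budget_ms


def cost_per_player_minute(gpu_dollars_per_hour, players_per_gpu):
    """Inference cost attributed to one player for one minute of play."""
    return gpu_dollars_per_hour / 60 / players_per_gpu


def is_viable(cost_per_minute, revenue_per_minute):
    """The economic constraint: cost must fall below what a minute earns."""
    return cost_per_minute < revenue_per_minute


# Assumed scenario: a $2.40/hour GPU and $0.01 of revenue per player-minute.
solo = cost_per_player_minute(2.40, players_per_gpu=1)     # about $0.04
batched = cost_per_player_minute(2.40, players_per_gpu=8)  # about $0.005

print(is_viable(solo, 0.01))     # False: one player per GPU loses money
print(is_viable(batched, 0.01))  # True: eight players per GPU clears the bar

# Assumed latency split across network, model inference, and display.
print(within_latency_budget({"network": 20, "inference": 25, "display": 10}))  # False (55 ms)
print(within_latency_budget({"network": 15, "inference": 25, "display": 5}))   # True (45 ms)
```

The point of the sketch is the shape of the levers rather than the specific values: cost per player-minute falls with cheaper hardware or more players served per GPU, while the latency budget is shared across every stage between input and display, so a faster model alone does not guarantee it is met.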

The honest sequencing is that the technical and economic constraints get cracked first, the design constraint is slowest, and the architectural one is the furthest out, although, as point 7 argues, multiplayer is in fact arriving earlier than most people in this debate expected.

6. The VR parallel is half right. World models have broad value outside the consumer segment; VR did not

The parallel everyone reaches for is LLMs: world models will have their ChatGPT moment, and once they do, consumer adoption hockey-sticks. We think that is the wrong parallel.
The closer historical match is virtual reality, but the parallel has to be drawn carefully. VR had a capital moment, promised a new medium, and failed to find a mainstream consumer wedge for more than a decade because content was expensive to produce, the interaction layer was undefined, and the device-on-your-face accessibility tax outweighed the upside.
World models are at structural risk of replaying that timeline on the consumer side. Where the analogy breaks is that VR had no value to anyone unless it produced a consumer experience. World models do. They are already useful inside studios as reinforcement-learning dynamics models that compress week-long training runs into hours. They are useful as synthetic QA environments. They are useful for robotics, as the MoE piece extensively documents. They are useful as content creation tools, as covered above.
The category therefore cannot fail the way VR failed, because the technology pays for itself without ever needing to ship a consumer game. It can, however, fail to produce a compelling consumer game on the timeline the funding chart implies, which is a much narrower kind of failure. Something closer to “the Oculus DK2 of world models shipped, never became Pokémon Go” than “the entire category collapsed.”

7. Multiplayer arrives sooner than expected. It may not be the unlock everyone thinks

The multiplayer question is the one most often used as the gating test for whether this technology has arrived. Early multi-agent and shared-state world model papers are already in the literature, and the timeline is shorter than the consensus assumed twelve months ago. A reasonable estimate is now a two-to-three-year technical solve and a five-plus-year business solve. What is less examined is whether multiplayer on a generative substrate is actually more fun than multiplayer on a static map. Most of what makes competitive multiplayer compelling is the inverse of generative breadth: you learn the map, you memorize the angles, you internalize where opponents will hold and where the optimal rotation lives, and the depth of skill expression scales with how stable that environment is.
A generative Valorant or generative Counter-Strike where the map dreams itself into existence each round is more likely to be a novelty mode than a permanent one.
Where multiplayer on a world model substrate probably lands first is in co-creation and emergent-sandbox modes (LAN-party Doom on demand, asymmetric social experiences, world-jamming) rather than head-to-head competition. Treating multiplayer as the gating question for the category is, on our read, a category error.

Conclusion:

Our position: The consumer moment for world models is real and is coming.

It is further away than the breathlessness of the timeline suggests, it is closer than the most cautious framing implies, and the path to it does not run through the labs raising the biggest checks today. It runs through a smaller set of operators who treat generative video as a substrate rather than a product, who pair frontier model access with deep design discipline, and who first build the content-creation and toolchain layer that turns the technology into something a non-engineer can use.
Gaming is still the breeding ground for this technology, and it is also the proving ground for whether the consumer thesis pays off at all. If you want to know when world models become a consumer category, do not watch the funding chart. Watch which two or three teams quietly stop trying to compete with Unreal, and who builds the WAD editor for this generation of dynamics models. Those are the teams that will own the category when it arrives. They are probably not the ones currently being mentioned in your group chat.

by Spencer Ma

Disclaimer

This article reflects the views of BITKRAFT and is provided for informational purposes only. It does not constitute investment advice or a recommendation to invest in any company or strategy. References to specific companies or technologies are provided for illustrative purposes only and should not be construed as investment recommendations or an offer to invest. Certain statements contained herein are forward-looking and subject to uncertainty; there can be no assurance that the views expressed will materialize. This article is not an offer to sell or a solicitation of an offer to buy any interest in any fund or investment vehicle.
