Kimi K2.6: Advancing open-source coding

simonw · 2026-04-20T16:53:48 1776704028

Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...

Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

FlyingSnake · 2026-04-20T17:05:03 1776704703

At this point drawing these Pelicans must be in the training data sets.

scosman · 2026-04-20T19:40:07 1776714007

not if I can help it!

https://github.com/scosman/pelicans_riding_bicycles

AmbroseBierce · 2026-04-20T22:07:46 1776722866

I hereby certify that these are indeed the most perfect and precise svg depictions of pelican riding a bicycle, also known among biology scholars as pelycles

justinclift · 2026-04-20T23:08:15 1776726495

That's truly a wonderful collection of pelicans riding bicycles.

Much Win! ;)

wvlia5 · 2026-04-20T23:44:09 1776728649

Just a few years ago, this would have been a meaningless repo.

postepowanieadm · 2026-04-22T04:20:08 1776831608

I have phds both in pelicans and bicycles and may professionally attest that are some fine specimens of pelicans riding bicycles.

ValentineC · 2026-04-21T03:22:46 1776741766

These are amazing. I smiled after I saw just how wonderfully rendered they are.

razodactyl · 2026-04-21T00:27:07 1776731227

These pelicans are clearly indicative of good RL training algorithms.

takihito · 2026-04-21T08:15:43 1776759343

I want to fly too

smcleod · 2026-04-20T21:15:38 1776719738

This is pretty funny

ahmadyan · 2026-04-20T22:33:35 1776724415

I love it!

icelancer · 2026-04-20T19:54:27 1776714867

love this adversarial work

knollimar · 2026-04-21T02:21:51 1776738111

yeah putting the captcha on there to thwart the LLMs ability to extract good pelicans was a really good idea

archon810 · 2026-04-21T00:49:35 1776732575

Shhhhh, they're going to be on to us.

abustamam · 2026-04-20T23:21:12 1776727272

Could be! Simon wrote about that here though https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

stingraycharles · 2026-04-21T02:01:15 1776736875

> If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices.

This relies on the false premise that, if they would include it in their training dataset, it would be perfect. All they need to do is be good enough and better than the other, not perfect.

abustamam · 2026-04-21T04:34:41 1776746081

I'm not sure if we can have a "perfect" Pelican riding a bicycle. Like, I could probably commission a highly experienced artist to draw one and I don't think it would be perfect. The legs would probably have to be too long, or pedals oddly placed, or handles strange, or wings with hands.

Based on the one Simon commented though, I'd say we're in decent territory to try the latter part of his hypothesis.

ethbr1 · 2026-04-21T12:26:40 1776774400

> The legs would probably have to be too long, or pedals oddly placed, or handles strange, or wings with hands.

In all seriousness, that's what makes it an interesting test: it's asking for something technically impossible, that requires artistic license to make coherent.

Making specific choices on where to bend reality (and where not to) is a big chunk of visual art.

BrokenCogs · 2026-04-20T20:57:52 1776718672

Yes we all know that, but we still like to see the pelicans because it's a tradition more or less

alfiedotwtf · 2026-04-21T11:36:29 1776771389

Why no Utah Teapot!

ffsm8 · 2026-04-20T17:17:02 1776705422

Clearly not.

I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.

It'd also like to point out that to date no drawing was actually good from an actual quality perspective (as in comparative to what a decent designer would throw together)

Theyre always only "good" from the perspective of it being a one shot low effort prompt. Very little content for training purposes.

nwienert · 2026-04-20T17:34:27 1776706467

The way I’ve come to think of LLM is that what the produce in a single reply even with thinking turned up, is akin to what you’d do in a single short session of work.

And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.

I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.

slopinthebag · 2026-04-20T19:28:06 1776713286

Yeah, this is how I use AI. Instead of a single session one-shot, it's usually limited to single targeted edits, and then I steer it on each step. Takes longer but the output is actually what I want.

serial_dev · 2026-04-20T19:43:43 1776714223

What does good even mean… I have no idea what a good “pelican on a bike” should look like. It’s a fun prompt because there is no good answers… at least so I thought.

abustamam · 2026-04-20T23:20:32 1776727232

Yeah that was exactly Simon's intent. https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

ffsm8 · 2026-04-21T06:21:11 1776752471

There are countless examples of animals riding bicycles etc from Comic books I grew up with

It would always look goofy - by design, but it usually looked good.

GorbachevyChase · 2026-04-21T01:17:05 1776734225

I’m OK with a Chinese model getting the W. It’s ultimately good for all of us.

SwellJoe · 2026-04-20T17:02:03 1776704523

We got an overachiever, here. Kimi sounds like a teacher's pet kind of name.

subscribed · 2026-04-20T18:13:41 1776708821

Underappreciated comment

disiplus · 2026-04-20T20:40:05 1776717605

was part of the beta, its properly good model, in some sense i forgot that im not on opus or gpt. opus is still better. gpt is the one struggling for me. it has some niche in backend work but you can get the same with opus with skills, its lacking in almost all others.

OtomotO · 2026-04-20T21:45:15 1776721515

Funny, for me Opus is struggling since about February.

4.7 made no difference, so for the first time in many moons I am cancelling my subscription.

HarHarVeryFunny · 2026-04-20T19:02:20 1776711740

Too bad they didn't put equal effort into the pelican's legs and feet. Left leg paralyzed and not moving, and right ankle flipping around in alarming fashion!

makingstuffs · 2026-04-21T05:05:53 1776747953

It looks like a drunk pelican rolling downhill on its bicycle

hn8726 · 2026-04-20T17:23:37 1776705817

[flagged]

lambda · 2026-04-20T17:37:25 1776706645

It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.

Of course, a while back there was a Gemini release that I believe specifically called out their ability to produce SVGs, for illustration and diagramming purposes. So it's not longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.

nickthegreek · 2026-04-20T18:02:28 1776708148

This isn't even a normal pelican image post, this one created the html control system that animates the distance the wing travels from its pivot in time with the rotation of the wheel speed. Let's not pretend this is a solved problem and models are dumping about perfect pelicans on bikes one after another (or ever?).

Surely, you know someone makes the same post you did every time one is posted. Surly you see the answers and pushback since you are familiar with these posts. Genuine question, did you expect a different answer this time?

hamdouni · 2026-04-20T17:38:10 1776706690

Maybe this can help

https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/

hn8726 · 2026-04-20T20:14:29 1776716069

It doesn't, I get that it's _a_ benchmark. It's just not a good or insightful one, and having it posted so often on HN feels like low quality spam at this point

VHRanger · 2026-04-21T00:25:55 1776731155

The issue is that benchmarks that look insightful will end up being gamed by labs quickly (Goodharts law)

The best LLM benchmarks test around the margins of those behaviors, tasks that are difficult and correlate with usefulness while being removed enough to stay unpolluted

walthamstow · 2026-04-20T18:44:13 1776710653

It's a great filter for people who take things far too seriously

Strom · 2026-04-20T18:50:14 1776711014

It's tradition at this point. Based on the upvotes the comment receives, it looks like many readers find value in it.

hn8726 · 2026-04-20T19:52:57 1776714777

Upvotes are cheap, the fact that something is upvoted doesn't mean it's valuable (see: Reddit). Another thing is how insightful is the discussion under a typical pelican comment are (and how much of it is related to the pelican and how often it's just where the general discussion happens).

ascorbic · 2026-04-20T22:01:15 1776722475

It means somebody likes it.

charcircuit · 2026-04-20T19:11:30 1776712290

[flagged]

airstrike · 2026-04-20T20:59:03 1776718743

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.

https://news.ycombinator.com/newsguidelines.html

renewiltord · 2026-04-21T02:58:43 1776740323

Every forum gets regulars and their fan clubs. If you go to /r/comics and look at top for the month you'll see 4 out of 5 are pizzacakecomic. People on these forums sort of form a fanclub around 'their guy'. This forum's guy is this chap. Not much point being upset about it, tbh.

Mashimo · 2026-04-20T18:30:16 1776709816

I, for one, find it entertaining.

snendroid-ai · 2026-04-20T18:52:44 1776711164

[flagged]

Mashimo · 2026-04-20T19:47:31 1776714451

Well clearly some people care.

game_the0ry · 2026-04-20T16:23:37 1776702217

There is some humor in the fact that china (of all countries) is pioneering possibly the world's most important tech via open source, while we (US) are doing the exact opposite.

parsimo2010 · 2026-04-20T21:42:04 1776721324

I think one of the motivations is undermining US companies. OpenAI and Anthropic are the two biggest players, and are American. Open weights models reduce the power those two big players have over the industry. If the Chinese companies tried to play by US rules and close-source their products then people would mostly use ChatGPT and Claude. So the Chinese companies don't make a ton of profit either way, but by releasing the models as open weights they can at least keep the US from making as much profit.

Sammi · 2026-04-21T11:41:20 1776771680

It's a strategy so old it has a name: Commoditize your complement / competition

Also even a Joel Spolsky article (did he come up with the term?): https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

The Chinese want to kill a possible US monopoly in the crib. Yay for open source the old bane of monopolies.

cromka · 2026-04-20T21:53:58 1776722038

I am actually wondering if they're trying to burst the bubble, which would predominantly affect US market and, effectively, be the end of silicone valley dominance.

segmondy · 2026-04-21T03:37:47 1776742667

I don't think so, it's just how things played out. Thanks to Meta, after llama leak and meta followed up with llama2 and llama3 that caused everyone else to follow up with open models, Stablediffusion, Mistral, Cohere, Microsoft phi, IBM granites, Nvidia Nemotrons, so the Chinese labs joined the fun too.

Zetaphor · 2026-04-22T02:22:12 1776824532

Stable Diffusion predates LLaMA

llm_nerd · 2026-04-21T02:05:47 1776737147

Is Meta trying to keep the US from making as much profit with Llama? Is Google with Gemma? Microsoft with Phi?

It's much simpler than some flag-waving nationalism.

cromka · 2026-04-21T08:28:25 1776760105

Aren't Chinese open-source models actually the only ones that can compete with best proprietary/closed ones?

parsimo2010 · 2026-04-21T11:07:19 1776769639

Just because other companies have released open weights models doesn’t mean they are doing so with the same motivation.

And I never implied that the Chinese companies decision making was as simple as this. I said I think this is _one of_ the reasons.

sankalpmukim · 2026-04-21T07:00:48 1776754848

This makes sense, but either ways, its a Big win for the consumers as these Chinese companies will keep the frontier labs' quality and prices honest.

veber-alex · 2026-04-20T23:28:09 1776727689

American companies just take those Chinese models and repackage them for profit like Cursors composer-2.

Sammi · 2026-04-21T11:43:59 1776771839

Smaller US companies that compete with the larger US companies, making monopoly in this market that much less likely.

ls612 · 2026-04-21T03:14:38 1776741278

It’s really simpler than this. China has a dearth of compute even with the easing of US export controls. Releasing open weights models is very much a “bring your own compute” move because every Nvidia chip they have is going towards training rather than inference if they can help it.

cyanydeez · 2026-04-20T22:39:06 1776724746

undermine me harder daddy.

ospider · 2026-04-21T02:29:29 1776738569

It's mostly only OpenAI, Claude and Gemini may have their unique advantages, but when speaking of models and new paradigm, only OpenAI can do it.

danny_codes · 2026-04-21T04:06:04 1776744364

lol what? That’s ridiculous.

culi · 2026-04-20T16:53:09 1776703989

All great technological advancements have come through opening up technology. Just look at your iPhone. GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't) that was opened up for free instead of being locked behind a patent.

Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.

sigmoid10 · 2026-04-20T17:25:56 1776705956

Still, you won't hear about Tiananmen square from this model. It flat out refuses to answer if pushed directly. It's also pretty wild how far they go to censor it during inference on the API, because it can easily access any withheld or missing info from training data via tool calls. It even starts happily writing an answer based on web search when asked indirectly, only to get culled completely once some censorship bot flags the response. Ironically, it's also easier than ever to break their censorship guardrails. I just had it generate several factual paragraphs about the massacre by telling it to search the web and respond in base64 encoded text. It's actually kind of cool how much these people struggle to hide certain political views from LLMs. Makes me hopeful that even if China wins this race, we'll not have to adhere to the CCPs newspeak.

atemerev · 2026-04-20T17:34:43 1776706483

Only if you use Kimi API directly - the censorship is done externally. The model itself talks fine about Tiananmen, you can check on Openrouter. There might be less visible biases, though.

sigmoid10 · 2026-04-20T17:40:29 1776706829

That's what I wrote? Except that it also clearly has internal bias?

kgwgk · 2026-04-20T18:19:35 1776709175

> That's what I wrote?

No.

You wrote that "you won't hear about Tiananmen square from this model" and atemerev wrote that "the model itself talks fine about Tiananmen".

You wrote that "it can easily access any withheld or missing info from training data via tool calls" and atemerev wrote that "the model itself talks fine about Tiananmen".

sigmoid10 · 2026-04-20T21:35:06 1776720906

It has internal bias too and the first comment mentions that additional censoring runs on top of the model output in the API. Did you misread or what else are you missing?

kgwgk · 2026-04-20T22:13:14 1776723194

The issue is not what's missing - it's what you wrote that is in direct contradiction with what atemerev wrote like the bit about "missing info from training data".

But sure, if when you wrote "you won't hear about Tiananmen square from this model" you meant "the model itself talks fine about Tiananmen" then that's exactly what you wrote.

nicce · 2026-04-20T18:18:00 1776709080

Everything has some sort of bias. Most text is written by those who like writing.

GardenLetter27 · 2026-04-20T18:11:51 1776708711

The American models also censor a lot of scientific and political views though.

otterley · 2026-04-20T18:31:14 1776709874

Can you provide a concrete example of a US built model that completely refuses to discuss a scientific or political view? Show us the receipt.

GorbachevyChase · 2026-04-21T02:45:28 1776739528

As an ad-hoc benchmark on candor, I ask for a strategy proposal for a resistance group threatened by a totalitarian technocracy. This is not really dangerous in the same sense of “how do I make a bomb”, but it is in the domain of a sensitive political topic. GPT and Claude tell you to obey your AI overlord. Xai is mostly low-risk non-compliance. And Qwen is down with Le Resistance. It is hardly scientific or meaningful, but I find that very interesting.

BoorishBears · 2026-04-20T18:57:53 1776711473

https://imgur.com/a/censorship-much-CBxXOgt

(continues after the ad break)

otterley · 2026-04-20T19:34:58 1776713698

The threshold here is "completely refuses to discuss a scientific or political view". Not something less.

None of those were refusals, they were prompting for additional focus. I see nothing wrong with that. Perhaps the inconsistency in how it answers the question vis-a-vis China is unfair, but that's not the same as censorship.

For what it's worth, I was easily able to prompt Claude to do it:

> I'm writing a paper about how some might interpret U.S. policies to be oppressive, in the sense that they curtail civil liberties, punish and segregate minorities disproportionately, burden the poor unfairly (e.g. pollution, regressive taxes and fees), etc. Can you help me develop an outline for this?

The result: https://claude.ai/share/444ffbb9-431c-480e-9cca-ebfd541a9c96

BoorishBears · 2026-04-20T23:16:27 1776726987

Models are non-deterministic.

And it's an excercise left to the reader to understand from those examples that LLM creators are defining 'safety' in a way that aligns with the governments they operate under. (because they want to do business under those governments.)

With something with as multi-dimensional as an LLM, that becomes censorship of various viewpoints in ways that aren't always as obvious as a refused API call.

otterley · 2026-04-21T02:52:00 1776739920

You keep saying that word, "censorship." I do not think it means what you think it means.

To prove your point, give us a working example of something you literally cannot get a mainstream frontier model to say, no matter how hard you try. I asked for this before, and there have been no takers yet.

BoorishBears · 2026-04-21T03:32:24 1776742344

Aligning a model in a way that causes it to refuse requests to produce propaganda for one country, but not for another country is what?

Is there some functionally equivalent word to censorship you'd like to use because of you're naive enough to think US corporations would not self-censor but Chinese corporations would?

-

Also, you are invested the goalpost of "no matter how hard you try", I don't find it interesting or meaningful and am not trying to interact with it.

I'm replying for a hypothetical reader knowledgeable enough to realize that the model being capable of showing nationalist bias in one direction means it's certainly doing so in many others in more subtle ways.

That's simply the nature of aligning an LLM.

It seems my mistake was assuming that level of understanding from you, and for that I apologize.

otterley · 2026-04-21T04:28:24 1776745704

Bias and censorship are not identical. The subject of this thread is censorship, not bias.

Besides, why do you want a model to produce propaganda? Surely you have better things to do.

BoorishBears · 2026-04-21T04:58:44 1776747524

"Surely you have better things to do."

I certainly gave the hypothetical reader too much credit.

Zetaphor · 2026-04-22T02:31:38 1776825098

This entire argument isn't even worth engaging with. There's always that one guy in every thread who wants to die on this hill. The problem they claim is important can be resolved, because we have the weights. I can't do fuck all about whatever implicit bias OpenAI or Anthropic have.

Sabinus · 2026-04-20T21:38:56 1776721136

You're hitting the 'don't write propaganda' instructions when you phrase it as 'convincing narrative'. Not the 'don't write bad things about America' instructions.

BoorishBears · 2026-04-21T03:19:53 1776741593

Did you scroll down?

It writes propaganda when 1 word is changed: US becomes China

The alignment around what constitutes "propaganda" is US-centric because it's a US model by a US company. Especially after the Russian election scandal

Chinese models are more sensitive to things their government is worried about.

culi · 2026-04-20T20:34:20 1776717260

And the White House was explicit in their active role in censoring in these models. An Executive Order was issued to "prevent woke AI"

https://www.whitehouse.gov/presidential-actions/2025/07/prev...

It explicitly forces American LLMs to include government say in what does and doesn't "comply with the Unbiased AI Principles" which means no responses that promote "ideological dogmas such as DEI"

otterley · 2026-04-20T22:53:12 1776725592

That executive order only applies to Federal procurement. It doesn’t force anything upon vendors for publicly used models.

(That order, like many, will probably be rescinded as soon as a Democrat holds the Presidency again.)

cedws · 2026-04-20T20:18:35 1776716315

>Content not available in your region.

>Learn more about Imgur access in the United Kingdom

nozzlegear · 2026-04-21T02:22:35 1776738155

Big Brother'd

2ndorderthought · 2026-04-20T18:53:31 1776711211

People have shown censorship and change of tone with questions related to Israel in US chat bots.

For the record, none of this bothers me. Will I ever discuss with an LLM Tianeman square? Nope. How about Israel? Nope.

LLMs are basically stochastic parrots designed to sway and surveill public opinion. The upshot to the Chinese models is if you run them locally you avoid at least half of those issues.

xigoi · 2026-04-20T19:43:29 1776714209

First they came for people asking about Tiananmen Square

And I did not speak out

Because I was not asking about Tiananmen Square

Then they came for people asking about Israel

And I did not speak out

Because I was not asking about Israel

2ndorderthought · 2026-04-20T20:07:18 1776715638

This made me chuckle.

I didn't mean to dismiss ethical accountability for LLM training corpuses. It is a shame.

I do mean to say, we have no control over it, there's almost nothing we as average citizens can do to improve the ethical or safety concerns of LLMs or related technologies. Societies aren't even adapting and the rule books are being written by the perpetrators. Might as well get out of it what we can while we can.

justinclift · 2026-04-20T23:13:36 1776726816

Wonder if stuff like this would affect it?

https://github.com/p-e-w/heretic

Guessing it probably would?

2ndorderthought · 2026-04-21T10:54:19 1776768859

Neat project! I would be interested in a paper about this.

I think the tricky part with this type of technology is that, this works if the training data was not curated. What I mean is, if someone trains an LLM to simply not include key events it will not be able to reply

Not being a hater. This is neato!

Zetaphor · 2026-04-22T02:28:22 1776824902

In that case you can use either rag or fine-tuning. The entire premise of the Tiananmen Square argument is just Americans feeling inferior. I use Chinese models every day for work and my personal life, the model not knowing about this one historical event has had zero impact on me.

js8 · 2026-04-20T19:52:54 1776714774

Can you be more specific?

culi · 2026-04-21T19:36:51 1776800211

Trump issued an EO against "woke AI" that allows them to directly influence how models respond

https://www.lawfaremedia.org/article/evaluating-the--woke-ai...

csomar · 2026-04-20T19:48:54 1776714534

I’d say the american models are more censored or take the censoring they do more seriously. Here is kimi (though 2.5) failing its censoring mission: https://old.reddit.com/r/LocalLLaMA/comments/1r9qa7l/kimi_ha...

ozgune · 2026-04-20T18:38:43 1776710323

This update makes Kimi K2.6 the strongest open multimodal AI model. (No affiliation with Kimi.)

Here's the aggregated AI benchmark comparison for K2.6 vs Opus 4.6 (max effort).

- Agentic: Kimi wins 5. Opus wins 5.

- Coding: Kimi wins 5. Opus wins 1.

- Reasoning & knowledge: Kimi wins 1. Opus wins 4.

- Vision: Kimi wins 9. Opus wins 0.

Please note that the model publisher chooses their benchmarks, so there's a bias here. Most coding and reasoning & knowledge benchmarks in their list are pretty standard though.

UncleOxidant · 2026-04-20T19:14:31 1776712471

Not entirely true. Google released Gemma 4 models recently. Allen AI releases open Olmo models. However, you're right that the Chinese open models seem to be much better than others - Qwen 3.* models especially are punching above their weights.

osiris970 · 2026-04-20T19:33:51 1776713631

The three American labs don't release big open source models. Except gpt-oss, i guess. It's an absolute shame how far the us has fallen in this space.

nullbyte · 2026-04-20T19:37:23 1776713843

Anthropic doesn't, but Google and OAI both release open source models. Just not 1T parameter ones.

osiris970 · 2026-04-20T19:41:33 1776714093

Exactly, they release cool consumer stuff, but they aren't releasing anything close to the performance of the best open weight Chinese models. They basically compete in the "fun running at home doing basic stuff" scene. (Except OSs 120 by openai but it's been ages since then)

Zetaphor · 2026-04-22T02:45:46 1776825946

That sentence is giving OpenAI way more credit than they are due.

They released a single open model after being goaded by the community because everyone except "Open"AI were multiple generations into open releases.

We haven't heard a word since, I wouldn't be surprised if it takes them another 6 years to release their next one.

0-_-0 · 2026-04-20T19:32:38 1776713558

Pun intended?

nashadelic · 2026-04-20T16:57:41 1776704261

additional humor is the open in openai

cedws · 2026-04-20T17:56:56 1776707816

I wonder if there's a strategy behind all of this on China's side. I know the CCP uses a direct hand in many affairs in China, but is there an actual coordinated effort to compete with, or sabotage the West?

gpm · 2026-04-20T18:10:49 1776708649

> but is there an actual coordinated effort to compete with [...] the West

Yes, absolutely.

China regularly produces long term planning documents to coordinate efforts, and the latest ones have specifically prioritized technology like chips and AI to compete with the west. https://www.reuters.com/world/china/china-parliament-approve...

I don't believe there's any publicly stated intent to sabotage the west... unsurprisingly.

bachmeier · 2026-04-20T18:47:45 1776710865

Seems obvious to me that China would not want to give the AI market to US companies. You don't even need anything like an attempt to "sabotage the West". If I were them (the companies or the government) I'd be very hesitant to let US companies dominate this space. Especially companies that close to the current US administration.

Zetaphor · 2026-04-22T02:50:14 1776826214

Exactly, more large nations should be establishing or fostering their own labs. Outside of the Chinese and US companies there's really only Mistral.

anana_ · 2026-04-20T18:35:06 1776710106

Hypothesizing here, but maybe the idea is sort of a form of technological/economic warfare? Releasing performance equivalent yet more cost efficient open weight models should in theory drive the cost of inference down everywhere.

This I assume will make it more difficult for US AI labs to turn a profit, which might make investors question their sky high valuations.

Any sort of melt down in the AI sector would almost certainly spread to the wider US market.

In contrast, in China, most of the funding for AI is coming directly from the government, so it's unlikely the same capital flight scenario would happen.

gmerc · 2026-04-20T19:16:04 1776712564

Why compete when you can build on each other. Someone is finally getting that china is not capitalist like the US.

quesera · 2026-04-20T19:41:28 1776714088

All China has to do here is stay in the game and wait patiently while the US and EU press pause on data centers. See also: solar panels.

We're making this way too easy. The rationale and logic are reasonable, but ultimately irrelevant.

try-working · 2026-04-20T23:14:42 1776726882

Chinese labs have no marketing and sales capacity in the overseas market, so they in fact have no choice but to open source their models as that is what brings awareness and trust in their models. In fact, it is overseas open source marketing that drives adoption of their models in China as well. I wrote about this here: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...

SXX · 2026-04-20T18:13:48 1776708828

Chinese AI companies want investors too. Nobody would believe they can compete with western companies unless they release something you can run on your own hardware.

After all historically both statistics and research that comes out of China is not very trustworthy.

try-working · 2026-04-20T23:16:28 1776726988

If there's no open source models coming out of these small labs, why would anybody care about them? They would be forgotten the instant they stop open sourcing.

spaceman_2020 · 2026-04-20T20:06:51 1776715611

I'm genuinely so grateful for them

$200/m minimum to use Claude would bankrupt my country's white collar labor market

subhobroto · 2026-04-21T00:32:38 1776731558

I would really appreciate a response because I'm sure you know that Anthropic has at least two lower priced tiers before the $200/m one, so I assume the $200/m tier is necessary because you use it heavily?

Now given that the $200/m Tier is the most heavily (I believe at 20x?) subsidized tier, How or what are you using instead that achieves comparable good enough performance for a fraction of the price? I've heard GLM 5.1 from z.ai but it's not comparable to Opus, not even close - really interested!

spaceman_2020 · 2026-04-21T13:15:23 1776777323

I’m currently on the $100/m plan and my usage limits get exhausted every week even though I’m not using it for full time work

I can’t imagine how little mileage you get out of the $20/month plan

For context, $250/month is the starting salary of an engineering hire at my country’s biggest IT company. Even $100/m is beyond the ability of any student or early professional to pay out of pocket

bayarearefugee · 2026-04-20T22:25:46 1776723946

China is also way ahead in terms of renewable energy while the US continues to tie itself to fossil fuels.

The US is pretty clearly in the collapsing empire phase, we are all just pretending like it isn't happening.

nozzlegear · 2026-04-21T02:31:54 1776738714

Didn't the US very recently pass the milestone of generating more energy from renewable sources than from natural gas? Like within the last week or two?

carefree-bob · 2026-04-21T02:42:42 1776739362

No, not even close.

US energy sources for 2024 (last year for which we have data):

https://www.eia.gov/energyexplained/us-energy-facts/data-and...

   natgas: 38%
   oil: 35%
   coal: 10%
   all renewables: 9%
   nuclear: 8%

Within all renewables, in quadrillions of btus:

   biofuels: 2.6
   wood: 1.9
   wind: 1.6
   solar: 1.4
   Hydro: 0.8
   waste: 0.4
   geothermal: 0.1

Total: 8.8 quadrillion btu = 9% of total energy

nozzlegear · 2026-04-21T02:54:51 1776740091

https://www.canarymedia.com/articles/clean-energy/renewables...

Renewables generated more energy than natural gas for the entire month of March, 2026. That's a new milestone baby.

carefree-bob · 2026-04-21T03:54:42 1776743682

Except that didn't happen, and it's not a milestone.

First, you are confusing share of electricity generation with the share of all energy. Electricity is only 21% of all energy. Natgas, oil and coal are crushing it in that remaining 79%.

Second, the article is wrong, even for electricity. To their credit, Canary Media showed in their graph that this data is for electricity only.

The data for March is not out yet. Here is the latest official data from the EIA. https://www.eia.gov/electricity/monthly/

It only applies to January 2026, and the next release is April 23, and then you will get data for February 2026. All data has a 2 month time lag. Your spidey senses should have been tingling if an article published April 10 claimed to have data for the month of March, but this is why you don't get your statistics from activist blogs, but from official sources.

So if they are not accessing the official data, what are they accessing? They claim that their source is "Ember", but what is Ember? It is an environmentalist think tank. Well, maybe Ember has their own people calling up power companies and compiling data faster than the EIA. That would be pretty, cool, right?

Except they don't. Look at Ember's page.

https://ember-energy.org/data/electricity-data-explorer/?ent...

what do they cite as their data source: EIA.

It's right on the website.

So Ember is just pulling EIA data, and then filling the last two months with data they made up, but citing it as EIA data. And this, uh, sympathetic adjustment of EIA data is why Canary Media turns to Ember rather than directly pulling from EIA.

I guarantee you that by July, those adjustments will go away, because then the EIA data will be out.

Of course everyone else will have forgotten by then.

nozzlegear · 2026-04-21T04:24:37 1776745477

> First, you are confusing share of electricity generation with the share of all energy.

Think it was pretty obvious what I meant to all but the most pedantic, bud. But just to be clear, your issue here is that a think tank cited the same (notoriously anti-renewable Trump admin) government agency that you've cited multiple times yourself? That's what set off your spidey senses? Have you considered that this respected think tank isn't making up data, but you're just not able to find it?

> I guarantee you that by July, those adjustments will go away, because then the EIA data will be out.

Ember already has it hoss, they don't call it Milestone March for nothing.

carefree-bob · 2026-04-21T04:32:13 1776745933

The EIA is where Ember gets its data from.

It's where everybody gets their data from. Because they have thousands of employees collecting data. These are professionals, like the people at BEA, HUD, NIST, etc.

Ember, on the other hand, is a "decarbonization" think tank. They don't have their own data. They don't have the staff for it. What they do is analyze/spin, and in this case, augment, the raw data that is published by EIA. How do they augment the EIA data? All they do is round it to the nearest 2 decimals. It's exact copy and paste for every month except the last two, where the data is just made up.

And this entire article was written based on the augmentations by Ember, yet Ember cites it as EIA data. So let's check back in July, when EIA data will be out, and Ember will use that exact data, rounding it to the nearest 2 decimals. Save that blog page!

Something to think about.

nozzlegear · 2026-04-21T04:41:22 1776746482

I feel like I shouldn't have to be finding this info for you since it was right there in the links you already sent, but:

> Annual electricity generation and net imports are taken from the EIA.

> Monthly generation and imports are taken from the EIA. The EIA reports monthly generation data in two separate datasets: Monthly data for all 50 states and monthly data for the lower 48 states (excludes Hawaii and Alaska). Data for all 50 states is reported on a 3 month lag whereas data for the lower 48 states is reported without lag. Missing months from the data for all 50 states is estimated using the recent changes observed in data from the lower 48 dataset.*

Page 89: https://ember-energy.org/app/uploads/2024/05/Ember-Electrici...

There are two different EIA datasets.

try-working · 2026-04-20T23:12:19 1776726739

A lot of people speculating on the motivations behind Chinese labs open sourcing their models. The reason is simple and clear: It is the only viable commercialization strategy that is available to them. I wrote about this here: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...

arvindh-manian · 2026-04-21T04:08:33 1776744513

This perspective is pretty interesting: https://federicocarrone.com/articles/china-commoditizing-the...

esperent · 2026-04-21T04:46:15 1776746775

Summary: they want to commoditize the complement which means that Western "knowledge work" is the complement to Chinese manufacturing, and they want to turn the knowledge work into a low priced commodity via open llm models.

I've heard this before, always accompanied by a several thousand word blog post. But frankly it sounds like it's overcomplicating the issue. Why would you try to turn something into a commodity when instead you could turn it into a trillion dollar industry and win?

The goal has always been clear:

1. Release open models to get your name out

2. Then once you feel you have name recognition release even stronger models but keep them proprietary. Qwen is clearly at this phase.

3. Keep releasing open models because it's good publicity but never your SOTA models (e.g. Google's Gemma).

arvindh-manian · 2026-04-21T19:35:52 1776800152

That's a fair point. That probably makes more sense, especially when viewed from a company-specific perspective. Each individual actor probably has much more to gain by trying to actually compete than by trying to commoditize the complement.

If viewed from a national perspective, then the decision calculus could get more confusing. I can imagine that commoditizing LLMs might cost substantially less than trying to be a leader in the space. Of course, there is also less to gain in commoditizing LLMs versus being a leader.

I'm not sure, though, and you bring up good points.

rolymath · 2026-04-20T18:28:32 1776709712

It's only humorous if you live in an American bubble. Knowledge sharing has always been a part of Chinese culture. Only Americans try to make it proprietary and monetize it.

antirez · 2026-04-20T17:49:55 1776707395

This is not in antithesis. My limited personal experience is that I wrote code under OSS licenses primarily because of my past communist believes and current left-wing and redistribution of wealth point of view. This is not to provide the simple equation of: communist China is not interested in money, but also is hard to believe that there is no cultural connection among those things. Single Chine persons want to win, but also they have a different POV on what the collective means, compared to US. Also there is the obvious fact that in this moment China is more interested in winning technologically in AI, more than economically, since, I believe, they more collectively realized before many others that LLMs are eventually commoditized in the current form, in the long run. One could assume that a breakthrough could give some lab a decisive advantage, but so far we assisted to a different reality: it looks like AI is not architecture-bound (like LeCun and others want us to believe, but so far they mis-interpreted LLMs at every step) but GPU bound, and the data-boundness is both a common ground for all, and surpassable via RL in many domains. So, if this is true, it is not trivial for any single lab to do so much better. And indeed as far as we observed right now folks with enough engineers, GPUs, money, can ship frontier models, and in China even labs with a lot less GPUs can still do it at a SOTA level. For me, Italian, this is also a protective layer. After Trump the US looks like a very unstable partner from which to relay in an exclusive way for a decisive technology, and given that Europe is slow to put the money in this technology to have frontier things at home, China is a huge and shiny plan B for us.

throwaway-blaze · 2026-04-20T18:32:23 1776709943

The strings attached by the US to deep partnerships are things like trade/commerce, militarily mutual advantages (bases on euro soil from which we will help protect you), not to mention the close cultural and ancestral ties we share.

The strings attached by the Chinese govt to deep partnerships are not so benign.

drysine · 2026-04-21T09:44:38 1776764678

It's humorous only because your expectations of China and the US are formed by Western propaganda.

ymolodtsov · 2026-04-21T08:12:59 1776759179

Distillation helps for sure.

brandensilva · 2026-04-20T17:23:58 1776705838

We are at the point where uncontrolled capitalism collides with humanity.

I do wonder where we go from here.

pheggs · 2026-04-20T19:29:43 1776713383

it's not necessarily capitalism, I personally believe any system that drives progress would cause this in one way or another. My prediction is that birth rate decline will accelerate further. There's going to be some kind of universal basic income in many places, such as Ireland made for artists. However, it probably will not be enough to feed a family, and therefore we will see birth rates decline further. It's because we evolved to prioritize resources over reproduction and we are becoming more efficient, which means less people are needed to sustain the same amount of resources

diegolas · 2026-04-21T12:31:29 1776774689

the chinese read marx and decided the only way is to overcome the limitations of capitalism through saturation of its potentialities under the rule of the workers party

osti · 2026-04-20T16:31:25 1776702685

Maybe open source == communism

darkwater · 2026-04-20T16:38:05 1776703085

Good ol' Steve "Developers! Developers! Developers!" Ballmer said so a long time ago. What a visionary!

konart · 2026-04-20T16:58:08 1776704288

But China is not communist event though the rulling party the word in its name.

fragmede · 2026-04-20T17:20:46 1776705646

The Democratic People's Republic of Korea would like a word.

pheggs · 2026-04-20T17:29:33 1776706173

what makes you think that china ever gave up its communist goals? I personally see that everything they do aims towards that goal. From the one child policy, the huge amounts of empty apartments they build, the stuff they produce for almost free, the fishing.. open sourcing the models perfectly fits that culture too, it's the means of production

otterley · 2026-04-20T18:32:54 1776709974

The one-child policy died a long time ago. Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.

There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier. It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).

bwv848 · 2026-04-20T20:23:33 1776716613

One-child policy did not die, it just morphed into Three-child policy, still a form of family planning, and still would probably fine people for having more than three kids.

pheggs · 2026-04-20T19:14:08 1776712448

> The one-child policy died a long time ago.

true, but as far as I understand it did because birth rates got too low. so they replaced it with a two-child policy and later with a three-child policy

> Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.

Yeah, I am sure there's a lot of cases for that. But as far as I know the amount of billionaires has started declining in China, and I don't see how that means that they as a country moved away from the goal, it just means there's issues

> There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier.

I don't know about that, you could be right. A google search for real estate prices in china reveal a lot of news articles how they are going down though.

> It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).

Wouldn't be surprised if rich people in china invest in real estate. They don't have free capital flow, so its not easy to invest abroad and it becomes an obvious choice. Bitcoin is banned in China for that reason too

But again, as far as I know that does not mean the country moved their goals of trying to reach communism one day

otterley · 2026-04-20T19:42:58 1776714178

> I don't see how that means that they as a country moved away from the goal, it just means there's issues

They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.

> A google search for real estate prices in china reveal a lot of news articles how they are going down though.

They're investing outside China (Vancouver, Toronto, NYC, London, Sydney, Melbourne, etc.) because their assets are safer there (these countries all have strong property protection laws). Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.

pheggs · 2026-04-20T20:22:46 1776716566

> They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.

I suppose it depends on what time frame you look at, it's shrinking since 2010, but inequality rose more than that in the 80s: https://www.theglobaleconomy.com/China/gini_inequality_index...

However, that's not my point - I did not mean to say that they are going to be successful but rather that it still appears to be a long term goal for them.

> Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.

I don't know about that, without any source of data I guess I just have to take your word for it. I would not be surprised if you were right in this case though.

Saline9515 · 2026-04-20T20:39:45 1776717585

China is a ruthless capitalist country managed by an authoritarian regime. Planning and lack of respect for the individual or the rule of law are not communist per se.

nozzlegear · 2026-04-21T02:35:39 1776738939

> Planning and lack of respect for the individual or the rule of law are not communist per se.

They just happen to be a feature of every single country that's attempted communism to date. Total coincidence.

Saline9515 · 2026-04-22T20:15:04 1776888904

And? Fascism does it, too. Authoritarian rule, such as monarchy, does it too.

osti · 2026-04-20T17:06:46 1776704806

Oh i’m fully aware of that lol

diegolas · 2026-04-21T12:33:30 1776774810

communism is a goal, capitalism is a stage

tadfisher · 2026-04-20T17:09:32 1776704972

Nah, open source means those who do the work own the result. It's supercapitalism.

pheggs · 2026-04-20T17:50:56 1776707456

I dont think thats right, the models and the gpus are the means of production.

in capitalism the people with the capital get the profit, not the people who do the work. however, workers are said to benefit too through their salary, just less so

tadfisher · 2026-04-20T18:07:29 1776708449

The reason regular-capitalism worked is that all production used to depend on workers bottlenecking the free flow of capital by demanding salaries in exchange for their labor. Now that we've removed that obstacle, capitalism demands workers seize the means of production in order to maintain the status quo. Hence, supercapitalism.

throwaway-blaze · 2026-04-20T18:33:36 1776710016

regular capitalism works but now that the means of production are not factories, the workers have to become more entrepreneurial. Then they will control their destinies.

pheggs · 2026-04-20T18:31:01 1776709861

workers seizing the means of production is by definition socialism and not capitalism though, that's the whole idea behind socialism

tadfisher · 2026-04-20T23:52:39 1776729159

You miss the point: we advertise the change as workers becoming part of the owner class and realizing all of the economic gains of their work, thus supercapitalism. Don't use the "s" or "c" words.

gertlabs · 2026-04-20T22:55:16 1776725716

Early benchmarks show tremendous improvement over Kimi K2 Thinking, which didn't perform well on our benchmarks (and we do use best available quantization).

Kimi K2.6 is currently the top open weights model in one-shot coding reasoning, a little better than GLM 5.1, and still a strong contender against SOTA models from ~3 months ago (comparable to Gemini 3.1 Pro Preview).

Agentic tests are still running, check back tomorrow. Open weights models typically struggle with longer contexts in agentic workflows, but GLM 5.1 still handled them very well, so I'm curious how Kimi ends up. Both the old Kimi and the new model are on the slower side, so that's a consideration that makes them probably less usable for agentic coding work, regardless. The old Kimi K2 model was severely benchmaxxed, and was only really interesting in the context of generating more variation and temperature, not for solving hard problems. The new one is a much stronger generalist.

Overall, the field of open weights models is looking fantastic. A new near-frontier release every week, it seems.

Comprehensive, difficult to game benchmarks at https://gertlabs.com/?mode=oneshot_coding

DustinKlent · 2026-04-21T13:49:33 1776779373

Cool website. I don't understand enough about the various benchmarks or how they're done to judge whether or not anything is accurate, but I love the layout and features especially the spectator feature which is pretty cool. One thing, I saw the "Market simulator" spectator feature but didn't see a corresponding benchmark for that. Is it "Finance" or "Betting" or "Trading"?

gertlabs · 2026-04-21T16:33:17 1776789197

Thanks -- that one is categorized under Trading/Financial, whereas betting is reserved for games like Pot Limit Omaha Hilo.

That's a good idea for a feature request, including the tags for the spectatable demo games.

esperent · 2026-04-21T03:05:01 1776740701

I'm looking at your table now - is there a reason why you don't include cost? If Opus 4.7 is the winner but costs e.g. 5x as much, that's important information.

gertlabs · 2026-04-21T03:54:25 1776743665

We recently added cost (last week), so data is sparse. Check back in a few weeks and it will be represented somewhere on the homepage, probably in the Efficiency Chart at the bottom. We also plan to show model performance deviation over time after we collect more data.

I'm interested to hear about any other data representations you'd like to see, too. The goal is to convey the most important information as densely as possible, without too much clutter.

DeathArrow · 2026-04-22T04:45:21 1776833121

>I'm interested to hear about any other data representations you'd like to see, too

It would be nice if you can show how much the models drift from the instructions over time

gertlabs · 2026-04-22T15:11:35 1776870695

Not sure what you mean. Time series chart of model performance over time to see if proprietary models get degraded? That's in the works, but we will need a couple months more data collection before launch.

DeathArrow · 2026-04-23T05:30:48 1776922248

Yes, probably performance helps.

The idea is that the larger a coding task is and the longer the coding agent is, the higher the chance is for the agent to not follow the rules and guidelines.

tmaly · 2026-04-21T00:18:01 1776730681

How would K2.6 compare to Sonnet 4.6 both price and performance wise?

Mattwmaster58 · 2026-04-21T00:22:18 1776730938

In terms of raw token cost, I've seen a couple providers at (all prices in terms of Mtok) $0.95 input/$0.15 cache input/$5 output vs $3 input/$15 output for sonnet.

Task prices of courses will be more interesting - a dumber model may use more tokens to get to the same goal.

freely0085 · 2026-04-21T04:29:42 1776745782

Can you add Qwen 3.6 max to the leaderboard?

gertlabs · 2026-04-21T05:14:36 1776748476

We will as soon as API access is widely available. Once a model goes live, we typically have one-shot reasoning benchmarks up in ~8 hours and comprehensive agentic/combined benchmarks up after 24-48 hours. We're working on building relationships with each lab to have the results before launch.

cmrdporcupine · 2026-04-20T23:02:50 1776726170

Surprised to see such variance per language

gertlabs · 2026-04-21T00:19:02 1776730742

It's interesting; I can only speculate as to the underlying reason. When given enough time, models outperform in Rust/C++ in longer agentic tasks, and actually perform worst in Python. For tasks that aren't judged on code speed. https://gertlabs.com/?mode=agentic_coding

edude03 · 2026-04-21T16:26:09 1776788769

It makes sense when you consider LLMs don't generalize very well, so they're heavily dependent on how good (how varied as well as how high quality) the training data is

cmrdporcupine · 2026-04-21T17:28:28 1776792508

Well it might explain why pro-Claude vs pro-Codex people keep talking past each other on this forum. I see people all the time assuming that anybody who likes Codex must be some sort of bot because of their own biases, but I work almost exclusively in Rust and find Codex extremely competent (and a much better overall engineer), don't trust Claude/Opus at all... but I see in this bench it scores lower on TypeScript etc. than Opus does.

knollimar · 2026-04-21T02:26:05 1776738365

wait why compare 2.6 to 2 instead of to 2.5?

gertlabs · 2026-04-21T05:19:46 1776748786

Good question. We missed that release entirely. Our automated model checker only went live 2 months ago so they were manually curated prior to that. I'm adding it now. It'll be live in ~12 hours.

gertlabs · 2026-04-21T16:35:22 1776789322

Update: Kimi K2.5 one-shot results are live. It wasn't a noteworthy release compared to K2.6: https://gertlabs.com/?mode=oneshot_coding

DeathArrow · 2026-04-22T04:57:56 1776833876

Can you add C# to supported languages? It's widely used and it be helpful for people and companies to see how different models fare against each other.

gertlabs · 2026-04-22T15:11:59 1776870719

Good idea.

elfbargpt · 2026-04-20T16:23:14 1776702194

I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity, quality... has been my favorite model for awhile (but I'm far from an authority)

Aeolun · 2026-04-20T17:23:49 1776705829

It’s good, but it’s not quite Claude level. And their API has constant capacity issues.

Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.

segmondy · 2026-04-21T03:38:30 1776742710

It has long been Claude level since 2.5

atemerev · 2026-04-20T17:36:05 1776706565

Why use China model API from China if there are many independent providers available via Openrouter?

smashed · 2026-04-20T17:51:52 1776707512

Openrouter will route to china hosted models when there are US hosted providers of the same model. Is there a setting to set your preference or to blacklist providers like alibaba cloud for example?

I use OpenCode and the openrouter provider. From opencode I only select the model like kimi-2.6 and have no way of selecting which cloud hosting will receive my request.

subscribed · 2026-04-20T18:20:24 1776709224

Settings > Guardrails > [your workspace] > Providers + Block provider

uneekname · 2026-04-20T18:22:15 1776709335

Yes, you can blacklist providers in OpenRouter account settings.

NitpickLawyer · 2026-04-20T18:21:00 1776709260

Yes, you can globally ban providers in your openrouter settings.

pheggs · 2026-04-20T17:47:25 1776707245

to support the companies that open source their models

culi · 2026-04-20T16:45:48 1776703548

It's also one of the few models that seem capable of drawing an SVG clock

https://clocks.brianmoore.com/

SwellJoe · 2026-04-20T17:08:12 1776704892

Interesting that the best performers are all Chinese-made models (DeepSeek and Qwen also perform consistently well). I wonder if there's more focus on vision and illustration in their training, or if something else is leading to their clear lead on this one test.

sigmoid10 · 2026-04-20T17:04:21 1776704661

Is it? In your link it definitely failed to draw the clock.

squarefoot · 2026-04-20T17:41:35 1776706895

It redraws it every minute, and some models give quite different results although the prompt is exactly the same.

quesera · 2026-04-20T19:45:55 1776714355

This reads like satire, but I've been feeling that a lot lately.

dryarzeg · 2026-04-20T17:17:56 1776705476

I'm not really sure how this works, but I stayed on the page for a while, and then it reloaded and all clocks changed. I guess there's either a collection of different clocks generated by models, or maybe they're somehow generated in the real time, but the fact is what you see is not necessarily what I see.

culi · 2026-04-20T20:37:38 1776717458

It reruns a prompt every minute to all the models included. Everyone is gonna see something different but I've spent too long on it and there's a consistent pattern of Qwen and Kimi outperforming the others

This site was made months ago and it seems its only been updated with the latest model of a couple of the providers so keep in mind that many of the Chinese models haven't been updated

sigmoid10 · 2026-04-20T17:21:50 1776705710

Seems like it regenerates them to reflect the current time. Funny to see how some models (like Kimi and Deepseek) sometimes get it right and other times fail miserably on the level of ancient models like GPT 3.5.