After EV Cars, Drones - Get Ready For "Made in China" AI Models
Fresh & Hot curated AI happenings in one snack. Never miss a byte.
This snack byte will take approx 4 minutes to consume.
In the ever-evolving world of AI, China's startups are not just playing catch-up; they're sprinting toward the finish line, occasionally stopping to tie their shoelaces with impressive dexterity.
Despite facing U.S. restrictions on advanced chip acquisitions, these companies are proving that where there's a will (and a few thousand GPUs), there's a way.
Take DeepSeek, for instance. Backed by High-Flyer, a quantitative hedge fund with a penchant for AI, DeepSeek unveiled a large language model in November that it says rivals OpenAI's o1 reasoning model.
Not to be outdone, Moonshot AI, with heavyweights like Alibaba and Tencent in its corner, boasts a math-specialized model that purportedly gives o1 a run for its money. Alibaba itself chimed in, asserting that its experimental model outperformed OpenAI's in mathematical prowess. While these claims are harder to verify than a cat's age, U.S. experts are nodding in begrudging respect.
Andrew Carr, a former OpenAI fellow turned AI entrepreneur, remarked, "China is catching up faster." He noted that DeepSeek's researchers replicated OpenAI's reasoning model within a few months, leaving many of his colleagues pleasantly surprised, and perhaps a tad concerned.
One of the battlegrounds for these AI models is the American Invitational Mathematics Examination (AIME), a test designed to challenge the brightest high school mathletes. DeepSeek claims its model outperformed OpenAI's on the AIME.
However, when The Wall Street Journal put 15 problems from this year's AIME to the test, OpenAI's o1 model solved them faster than DeepSeek, Moonshot, and Alibaba's experimental model. In one word puzzle involving a hypothetical two-player game, OpenAI's program delivered the answer in 10 seconds, while DeepSeek took over two minutes. Speed isn't everything, but in the world of AI, it's a significant bragging right.
Despite U.S. restrictions on advanced AI chips, Chinese developers are finding creative workarounds. Moonshot's founder, Yang Zhilin, emphasized reinforcement learning, which mimics human trial and error, potentially reducing the need for high-end computing power.
Additionally, techniques like "mixture of experts" (MoE) are being employed, where an initial routing mechanism directs problems to specialized expert models, much like a head chef assigning the spaghetti order to the Italian cook. Tencent's MoE model, released in November, claims performance comparable to Meta Platforms' Llama 3.1 model, despite being trained with about one-tenth of the computing power.
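For the curious, here is a toy sketch of that head-chef routing idea in plain Python. It is an illustration only, not how Tencent, DeepSeek, or anyone else actually builds their models: the names `make_expert` and `route`, the gate weights, and the "experts" (which just multiply their input by a number) are all made up for this example. Production MoE models typically do this routing per token inside a neural network layer, with learned gate weights.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical 'expert': a toy function that scales its input."""
    return lambda x: [scale * v for v in x]


def route(x, gate_weights, experts, top_k=1):
    """Send input x to the top_k experts chosen by the gate.

    gate_weights holds one vector per expert; an expert's score is the
    dot product of its gate vector with the input.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # The "head chef" step: pick the experts with the highest gate probability.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]

    # Only the chosen experts run; the rest stay idle, which saves compute.
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


# Two toy experts and a gate that prefers expert 0 when the first
# input feature is large, and expert 1 when the second one is.
experts = [make_expert(2.0), make_expert(10.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

result, picked = route([3.0, 0.0], gate_weights, experts, top_k=1)
print(picked, result)
```

The compute saving comes from the fact that only the selected experts do any work for a given input, so a model can hold many experts while running just a few of them at a time.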
DeepSeek's journey is particularly noteworthy. Originating as the AI research unit of High-Flyer, a hedge fund managing $8 billion and known for leveraging AI in trading, DeepSeek connected around 10,000 of Nvidia's A100 chips in 2021 to form a cluster named Fire-Flyer 2.
In an August paper, DeepSeek claimed Fire-Flyer 2 achieved performance close to a similar Nvidia system but at a lower cost and with reduced energy consumption. Their May paper on an MoE model, incorporating efficient data processing techniques, garnered significant attention in the AI community.
Jack Clark, co-founder of AI startup Anthropic, observed that China's approach to circumventing export controls involves building highly efficient software and hardware training stacks with accessible hardware.
However, challenges remain. The U.S. continues to tighten export controls, and as Nvidia rolls out its latest AI data-center chip, Blackwell, the technology gap could widen. Companies like Elon Musk's xAI are constructing data centers with tens of thousands of Nvidia chips, raising the stakes.
Chinese AI startups, currently valued at a fraction of U.S. counterparts like OpenAI, face skepticism from financiers about their ability to monetize advancements.
Yet, the ingenuity and determination of Chinese developers are undeniable. By focusing on efficient algorithms, specialized models, and innovative training techniques, they're making significant strides in the AI arena. As the saying goes, necessity is the mother of invention, and in this case, it's birthing some remarkably clever AI solutions.
While the AI race between China and the U.S. continues, Chinese startups are proving that with resourcefulness and a dash of audacity, they can keep pace with, and occasionally outpace, their Western counterparts. And who knows? Maybe one day, we'll all be using AI models stamped with "Made in China," pondering how they managed to leap over trade restrictions with the grace of a cat avoiding a bath.