Google recently unveiled a technology that could fundamentally change how artificial intelligence (AI) models use memory. The announcement not only sent ripples through the stock prices of memory makers worldwide but also offered hope of quick relief from rising RAM prices. Cloudflare CEO Matthew Prince was among the first to flag its significance, calling the technology Google’s “DeepSeek” moment. The technology in question is called TurboQuant – and if it delivers on Google’s promises, it could reshape the economics of running AI at scale.
“This is Google's DeepSeek. So much more room to optimise AI inference for speed, memory usage, power consumption, and multi-tenant utilisation,” Prince said in a post on X (formerly Twitter).
What is TurboQuant?
At its core, TurboQuant is a compression algorithm designed to solve one of AI’s most pressing practical problems: memory. Every time a user holds a long conversation with an AI chatbot, the model needs to remember everything said previously in order to keep the conversation natural and flowing.
It stores that context in what is known as a key-value (KV) cache, which grows larger with every exchange.
The longer the conversation, the larger the cache, and the faster it consumes memory. The impact: AI tools slow down or run out of memory entirely before conversations get very far.
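To see why the cache balloons, consider a rough back-of-the-envelope calculation. Every figure in the sketch below is an illustrative assumption about a hypothetical mid-sized model, not a number from Google:

```python
# Back-of-the-envelope KV cache size for a hypothetical transformer.
# Every configuration number below is an illustrative assumption,
# not a figure from Google or any specific model.
num_layers = 32        # transformer layers
num_kv_heads = 8       # key/value attention heads per layer
head_dim = 128         # dimensions per head
bytes_per_value = 2    # fp16 storage

def kv_cache_bytes(context_tokens: int) -> int:
    # Keys AND values (hence the factor of 2) are cached at every
    # layer for every token seen so far.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_tokens

for tokens in (1_000, 10_000, 100_000):
    print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / 2**30:.2f} GiB of KV cache")
```

At this assumed configuration, the cache grows by roughly 128 KiB per token, so a 100,000-token conversation would occupy over 12 GiB before the model’s weights are even counted.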
TurboQuant attacks this problem directly. According to Google, the algorithm can shrink the memory consumed by a large language model’s KV cache by at least six times and deliver up to eight times faster processing – all with zero loss in accuracy.
“Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency,” Google Research announced. TurboQuant achieves its efficiency gains through two complementary techniques.
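Google’s announcement does not spell out what those two techniques are, so the sketch below illustrates only the general idea behind KV-cache quantization: storing each cached vector as low-bit integers plus a per-vector scale instead of 16-bit floats. The 4-bit scheme and every name in it are illustrative assumptions, not TurboQuant itself:

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Per-row symmetric 4-bit quantization: low-bit integers plus one
    float scale per cached vector. Purely illustrative -- NOT Google's
    actual TurboQuant algorithm, whose internals were not disclosed."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0   # int4 range: -7..7
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# A toy KV cache slice: 1,000 cached tokens, 128-dim key vectors.
keys = np.random.randn(1000, 128).astype(np.float32)
q, scale = quantize_4bit(keys)

# fp16 stores 16 bits per value; packed int4 stores 4, plus scale overhead.
fp16_bits = keys.size * 16
int4_bits = q.size * 4 + scale.size * 32
print(f"compression ratio: {fp16_bits / int4_bits:.1f}x")   # ~3.8x
print(f"max reconstruction error: {np.abs(dequantize(q, scale) - keys).max():.3f}")
```

Notably, a naive 4-bit scheme like this tops out around 4x compression and introduces small rounding errors; the striking part of Google’s claim is reaching at least 6x with zero accuracy loss, which is presumably where the two complementary techniques come in.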
Why TurboQuant is Google’s DeepSeek moment
When Chinese AI startup DeepSeek launched its R1 model in January 2025, it wiped billions of dollars from the US stock market, with Nvidia’s market value taking the biggest hit. The sell-off was driven by the company’s claim that it had used far less computing power than OpenAI or Google to train its models, suggesting that companies may not need costly chips from the likes of Nvidia to train accurate LLMs.
Prince likened Google’s breakthrough to DeepSeek’s: using fewer resources to achieve the desired results.
The market reacted fast
Just as with DeepSeek, Google’s announcement hit memory-chip makers worldwide, with their stocks dropping almost immediately. The logic behind the sell-off was straightforward: if AI models suddenly need far less memory to operate, the companies selling that memory face a potentially smaller market than investors had assumed – much as Nvidia did after DeepSeek’s claims.
But analysts were quick to draw an important distinction. TurboQuant’s efficiency gains are specific to inference and the KV cache, meaning the real threat falls on NAND flash memory rather than the high-bandwidth memory (HBM) that sits inside Nvidia’s AI accelerators and powers training infrastructure at companies like Microsoft and Meta.