Chinese AI startup DeepSeek has introduced a new large language model that reportedly surpasses counterparts from Meta and OpenAI in testing.
🚀 Introducing DeepSeek-V3!
Biggest leap forward yet:
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers
🐋 1/n pic.twitter.com/p1dV9gJ2Sd
DeepSeek (@deepseek_ai), December 26, 2024
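The model, DeepSeek V3, has 671 billion parameters, compared with 405 billion in Meta's Llama 3.1. A larger parameter count generally gives a model more capacity for complex tasks, though it does not by itself guarantee more accurate responses. Because V3 uses a mixture-of-experts (MoE) design, only about 37 billion of those parameters are activated for each token.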
The Hangzhou-based company trained the model in just two months with a budget of $5.58 million, using only 2,048 GPUs. This is significantly fewer resources than typically required by major tech firms. DeepSeek promises the best price-to-performance ratio in the market.
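The reported figures can be sanity-checked with simple arithmetic. The sketch below is only a rough estimate: it assumes that "two months" means about 60 days and that all 2,048 GPUs ran around the clock, neither of which the announcement states. The function names are illustrative.

```python
# Back-of-the-envelope check of the reported training budget.
# Assumptions (not from the announcement): 60 days of training,
# all GPUs busy 24 hours a day.

GPUS = 2048            # reported GPU count
DAYS = 60              # assumed length of "two months"
BUDGET_USD = 5.58e6    # reported training budget


def gpu_hours(gpus: int, days: int, utilization: float = 1.0) -> float:
    """Total GPU-hours for a cluster running `days` days at the given utilization."""
    return gpus * days * 24 * utilization


def implied_cost_per_gpu_hour(budget_usd: float, hours: float) -> float:
    """Budget divided by GPU-hours: the hourly rate the figures imply."""
    return budget_usd / hours


hours = gpu_hours(GPUS, DAYS)
rate = implied_cost_per_gpu_hour(BUDGET_USD, hours)
print(f"{hours:,.0f} GPU-hours, about ${rate:.2f} per GPU-hour")
```

Under these assumptions the budget works out to a little under $2 per GPU-hour. A shorter run or lower utilization would push the implied rate higher.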
🎉 What’s new in V3?
🧠 671B MoE parameters
🚀 37B activated parameters
📚 Trained on 14.8T high-quality tokens
🔗 Dive deeper here:
Model 👉 https://t.co/9iwEF6aLuk
Paper 👉 https://t.co/ruzwMFYAAH
🐋 2/n
DeepSeek (@deepseek_ai), December 26, 2024
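To put the MoE numbers in perspective, the short sketch below computes what share of V3's parameters are activated at a time and how that compares with the 405 billion parameters of Llama 3.1. It treats Llama 3.1 as a dense model in which every parameter is used, which is an assumption made here for the comparison. The variable and function names are illustrative.

```python
# Rough comparison of activated parameters, using the figures quoted above.
# Assumption: Llama 3.1 405B is dense, so all of its parameters are active.

V3_TOTAL_PARAMS = 671e9    # total MoE parameters
V3_ACTIVE_PARAMS = 37e9    # activated parameters
LLAMA_PARAMS = 405e9       # Llama 3.1 parameter count


def active_fraction(active: float, total: float) -> float:
    """Share of a model's parameters that are activated at once."""
    return active / total


def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


share = active_fraction(V3_ACTIVE_PARAMS, V3_TOTAL_PARAMS)
dense_vs_active = ratio(LLAMA_PARAMS, V3_ACTIVE_PARAMS)
print(f"V3 activates {share:.1%} of its parameters")
print(f"Llama 3.1 uses about {dense_vs_active:.0f}x as many active parameters")
```

By this measure V3 activates roughly 5.5% of its parameters at once, which helps explain how a model with a larger total size can be cheaper to run than its parameter count suggests.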
Future plans include introducing multimodality and “other advanced features.”
OpenAI co-founder Andrej Karpathy praised DeepSeek’s development, calling it impressive given the limited resources.
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being… https://t.co/EW7q2pQ94B
Andrej Karpathy (@karpathy), December 26, 2024
“This doesn’t mean large GPU clusters are unnecessary for cutting-edge LLMs, but it shows the importance of maximizing available resources. This project demonstrates there’s still much to optimize in both data and algorithms,” Karpathy added.
Previously, DeepSeek released DeepSeek-R1-Lite-Preview, an advanced “thinking” model billed as a “competitor to OpenAI’s o1.”
In July, Chinese company Kuaishou launched its video-generation AI model Kling, making it publicly available.