Seeking The Deep In GenAI

There is a whale, not an elephant, in the room. The Year of the Snake speaks to renewal, so we start by renewing the phrase ‘an elephant in the room’ to ‘a whale in the room.’ Chinese hedge fund High-Flyer recently released DeepSeek, a series of open-source models of various flavours. Just as OpenAI once had everyone downloading its app as it raced to No. 1 on the app stores, the same has happened with DeepSeek and its R1 model. This was no drop in the ocean or stirring of the pot. It was bolder: a displacement of over a trillion dollars’ worth of stock as investors panicked over the model’s capabilities and the purported cost of training it.
That trillion dollars will be mentioned here for the last time. Why? Because the stocks can be “recovered” once the shock passes, the methodologies are better understood, and the wounded companies make their planned releases.
Our focus, from here on, is separating the hype of clickbait experts from the truth.
R1 and the other flavours of DeepSeek models pave the way for more accessible and efficient AI models. With capabilities that measure up against OpenAI, Claude and Gemini, offered for free and via an API at a fraction of the cost, there was no way this app was not going to top the charts. But how was it made?
Fortunately, because it was released as an open-source model, researchers and machine learning engineers have a chance to peek under the hood and really see what makes it tick. Five key things made this possible.
- Using a mixture of experts to reduce computational overheads: inference does not run across the whole network, but only through the specific experts identified as contributing most to answering the prompt faster and better. Activating a smaller part of the network for a given problem yields computational and energy savings (a minimal sketch follows this list).
- Distillation, using a large, well-trained “teacher” model to train a smaller “student” model to perform like the teacher, is common practice. Some AI companies discourage or warn against using their models for distillation in their terms, but distillation is hard to prove without breaking users’ privacy rights (see the second sketch below).
- DeepSeek’s R1 model uses direct reinforcement learning, where the model learns to solve problems by receiving rewards based on the correctness of its answers, without being explicitly shown step-by-step solutions. This contrasts with supervised learning methods, which rely on large, labelled datasets of both questions and answers (see the third sketch below).
- DeepSeek has implemented mathematical optimisations that reduce the number of calculations needed to progress through the network, making it more efficient than other models. This means it can run on slower hardware and cost less while still giving competitive performance (see the final sketch below).
- High-Flyer is a hedge fund. Quants are professionals who use mathematical and statistical models to analyse financial markets and securities, and High-Flyer had some really good ones; this was not their first attempt at making a model. Eventually, these smart mathematicians found a way to take the previous four factors and turn them into a huge breakthrough for AI accessibility and development.
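To make the mixture-of-experts idea concrete, here is a minimal, hypothetical sketch in Python: a gating step scores the input, only the top two “experts” run, and the rest are never touched. The names, sizes and random “experts” are illustrative assumptions, not DeepSeek’s actual architecture.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, top_k=2):
    """Toy mixture-of-experts layer: only the top_k experts chosen by the
    gating scores are evaluated, instead of running every expert."""
    scores = x @ gate_weights                        # one relevance score per expert
    top = np.argsort(scores)[-top_k:]                # indices of the k most relevant experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen few
    # Only the selected experts do any computation at all
    return sum(weight * experts[i](x) for weight, i in zip(w, top))

# Usage: eight toy "experts" (random linear maps), but only two ever run per input
rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(16, 16)): v @ W for _ in range(8)]
gate_weights = rng.normal(size=(16, 8))
x = rng.normal(size=16)
print(moe_layer(x, experts, gate_weights).shape)     # (16,)
```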
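Distillation can likewise be sketched in a few lines. This is a generic, assumed illustration of the teacher–student idea, not any company’s actual training recipe: the student is penalised for drifting away from the teacher’s softened output distribution.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Toy distillation objective: cross-entropy between the teacher's
    softened probabilities and the student's, so the student learns to
    imitate the teacher rather than only matching hard labels."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -np.sum(p_teacher * np.log(p_student + 1e-9))

teacher = np.array([4.0, 1.0, 0.5])
print(distillation_loss(teacher, np.array([3.5, 1.2, 0.4])))  # student close to teacher: low loss
print(distillation_loss(teacher, np.array([0.1, 3.0, 2.0])))  # student far from teacher: higher loss
```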
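The direct reinforcement learning point boils down to rewarding outcomes rather than showing worked solutions. The snippet below is a deliberately simplified assumption of how such a reward signal could look; R1’s actual training pipeline is more involved.

```python
def correctness_reward(model_answer: str, reference_answer: str) -> float:
    """Toy reward: 1.0 if the model's final answer matches the reference,
    0.0 otherwise. No step-by-step solution is ever shown to the model."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

# A training loop would sample several attempts per question and nudge the
# model towards the attempts that earned a reward.
attempts = ["42", "41", "42"]
rewards = [correctness_reward(a, "42") for a in attempts]
print(rewards)  # [1.0, 0.0, 1.0]
```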
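Finally, as one assumed example of the kind of mathematical shortcut that cuts compute, the sketch below compares a matrix multiplication in full precision with the same calculation in half precision: the answer barely changes, yet every number takes half the memory. DeepSeek’s actual optimisations are more sophisticated; this only shows the flavour of the trade-off.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(256, 256)).astype(np.float32)
B = rng.normal(size=(256, 256)).astype(np.float32)

full = A @ B                                                            # full-precision result
reduced = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float32)

# The two results stay close, yet the half-precision inputs use half the
# memory, which is one flavour of the savings cheaper arithmetic buys.
print(float(np.max(np.abs(full - reduced))))
```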
What does this mean, though, for Africa and the rest of the Global South, which lack the capital, power and infrastructure to build the large models that came before DeepSeek? It’s all good news.
The methods used by DeepSeek and others point to a future where AI development is not limited to large corporations. The assumed figure of $5.5 million to develop the model is highly unlikely, but the cost is still significantly less than that of previous models. There is also an opportunity to build on and improve the model, thanks to its next advantage: it is open source. This might prompt competitors to push harder for AI regulation, even though open release works for the greater good. The accessibility of open-source models helps in so many ways.
As a continent, our barrier to entry, and that of the rest of the Global South, has been reduced significantly. I hope to see more AI start-ups leveraging these models to build more localised AI solutions. This means our focus should not be on breaking benchmarks; it needs to be on using AI to solve our problems and leverage our strengths.
But this does not mean the end of the large models that previously dominated and demanded exclusive computing power to train and run. They will still have a place: they are the teacher models, and they might be what is required to get us to AGI. The bullseye the large tech companies are aiming at is artificial general intelligence (AGI) and, beyond it, superintelligence. Their work will continue, and we will continue to react to and appreciate all the advancements that come from either side.
There is a lot more to discuss about these models, and to get ahead of all the misinformation and disinformation: comparing the training, inference, data used, privacy and data protection, geopolitical implications, effects on research and use, censorship and so much more. Not all of it will fit in one article, so consider this Part 1 of a series. And we already have a new model – Qwen from Alibaba – that is allegedly better even than R1. It, too, is open source, and may just be the first model I consider truly multimodal.
What would you like me to explore for Part 2?