It would be an understatement to say that Generative AI (GenAI) is having its day in the sun. Most of today's GenAI, powered by Large Language Models (LLMs), runs in the centralized cloud on power-hungry processors. However, it will soon have to be distributed across different parts of the network and value chain, including devices such as smartphones and laptops, as well as the edge cloud. The main drivers of this shift will be privacy, security, hyper-personalization, accuracy, and better power and cost efficiency.
AI model "training," which occurs less often and requires extreme processing, will remain in the cloud. However, the other part, "inference," where the trained model makes predictions based on live data, will be distributed. Some model "fine-tuning" will also happen at the edge.
Challenges of today's cloud-based GenAI
There is no question that AI will touch every part of human and even machine life. GenAI, one application of AI, will also be very pervasive. That means the privacy and security of the data GenAI processes will be critically important, and unfortunately, there is no easy or guaranteed way to ensure them in the cloud.
Equally important is GenAI's accuracy. For example, ChatGPT's answers are often riddled with demonstrable factual errors (search for "ChatGPT hallucinations" for details). There are many reasons for this behavior. One of them is that GenAI is derived intelligence: it "knows" 2+2=4 because more people than not have said so. GenAI models are trained on enormous generic datasets, so when that training is applied to specific use cases, there is a high chance that some results will be wrong.
Why GenAI needs to be distributed
There are many reasons for distributing GenAI, including privacy, security, personalization, accuracy, power efficiency and cost. Let's look at each of them from both consumer and enterprise perspectives.
Privacy: As GenAI plays a more meaningful role in our lives, we will share even more confidential information with it. That might include personal, financial and health data, emotions, and many details that your family, your closest friends, or even you may not know. You do not want all that information to be sent to and stored perpetually on a server you have no control over. But that's precisely what happens when GenAI runs entirely in the cloud.
One might ask: we already store so much personal data in the cloud, so why is GenAI any different? That's true, but most of that data is segregated, and in many cases, access to it is regulated by law. For example, health records are protected by HIPAA regulations. Giving all that data to GenAI running in the cloud and letting it aggregate everything is a disaster waiting to happen. So, it is apparent that most privacy-sensitive GenAI use cases should run on devices.
Security: GenAI will have an even more meaningful impact on the enterprise market, where data security is a critical consideration. Even today, concerns about data security lead many companies to opt for on-prem processing and storage. In such cases, GenAI has to run at the edge, specifically on devices and the enterprise edge cloud, so that data and intelligence stay within the secure walls of the enterprise.
Again, one might ask: since enterprises already use the cloud for their IT needs, why would GenAI be any different? As in the consumer case, GenAI's understanding of a business will be so deep that even a small leak anywhere could threaten a company's existence. At a time when industrial espionage and ransomware attacks are prevalent, sending all data and intelligence to a remote server for GenAI will be extremely risky. An eye-opening early example was the recent case of Samsung engineers leaking trade secrets by using ChatGPT to process confidential company data.
Personalization: GenAI has the potential to automate and simplify many things in life for you. To achieve that, it has to learn your preferences and apply the appropriate context to personalize the whole experience. Instead of hauling, processing and storing all that data and optimizing a large, power-hungry generic model in the cloud, a local model running on the device would be far more efficient. That will also keep all those preferences private and secure. Additionally, a local model can use the device's sensors and other on-device information to better understand the context and hyper-personalize the experience.
Accuracy and domain specificity: As mentioned, using generic models trained on generic data for specific tasks will result in errors. For example, a model trained on financial industry data can hardly be effective for medical or healthcare use cases. GenAI models must be trained for specific domains and further fine-tuned locally for enterprise applications to achieve the highest accuracy and effectiveness. These domain-specific models can also be much smaller, with fewer parameters, making them ideal for running at the edge. So, it is evident that running models on devices or the edge cloud is a basic need.
Since GenAI is derived intelligence, models are vulnerable to hackers and adversaries trying to derail or bias their behavior. A model within the protected environment of an enterprise is less susceptible to such attacks. Although hacking large models with billions of parameters is extremely hard, the stakes are high enough that the risk cannot be dismissed.
Cost and power efficiency: By some estimates, a simple exchange with GenAI costs 10x more than a keyword search. With the enormous interest in GenAI and its forecasted exponential growth, running all that workload in the cloud seems expensive and inefficient, all the more so because many use cases will need local processing for the reasons discussed earlier. Additionally, AI processing on devices is much more power efficient.
The question then becomes: "Is it possible to run these large GenAI models on edge devices like smartphones, laptops and desktops?" The short answer is yes. There are already examples, such as Google's Gecko and Stable Diffusion optimized for smartphones.
Prakash Sangam is the founder and principal at Tantra Analyst, a leading boutique research and advisory firm. He is a recognized expert in 5G, Wi-Fi, AI, Cloud, and IoT.
Industry Voices are opinion columns written by outside contributors—often industry experts or analysts—who are invited to the conversation by Silverlinings staff. They do not represent the opinions of Silverlinings.