
Nokia Event-Driven Automation: A Game-Changer for Infrastructure Management

In today’s fast-paced digital world, the need for reliable and agile infrastructure automation has never been greater. Nokia's Event-Driven Automation (EDA) is designed to streamline and enhance network operations using modern DevOps and NetOps methodologies.

At its core, EDA is built for the future, leveraging Kubernetes to manage not just workloads but entire infrastructures. It introduces a new level of reliability through atomic change sets, ensuring that updates roll out seamlessly and, if necessary, can be instantly reverted to a stable state. This approach significantly reduces risk and boosts confidence in deploying changes at scale.

EDA’s open approach also sets it apart. While the EDA core remains closed source, Nokia distributes key components, such as the intents that drive deployments, as open source, allowing enterprises to tailor automation processes to their specific needs. This flexibility ensures businesses can innovate at their own pace without sacrificing reliability.

With its emphasis on automation, reliability and adaptability, Nokia EDA is more than just an infrastructure tool—it is a vision for the future of data center automation. Watch the full video to see how Nokia is redefining the way networks operate in an era of rapid digital transformation.

Want more on EDA? Check out all the resources available at Nokia.ly/EDA


My name's Bruce Wallis and I lead product management for a number of Nokia's data center oriented products. The main two that I'm the lead PLM for are SR Linux, which is the operating system we run on our switches, and Event-Driven Automation, our infrastructure automation platform that we launched last year, and why I'm here, obviously.

It's hard to hit all the value props in a single question, but Nokia Event-Driven Automation, at a super high level, is an infrastructure automation platform designed for the modern NetOps and DevOps era we currently live in. It's been designed from the ground up for things like streaming telemetry and intent-based automation, the typical closed-loop stuff that others in the industry have been talking about for a while.

Some of the nuance, some of the stuff that I think is our secret sauce: we lean heavily into Kubernetes, using it as a platform for more than just managing workloads. In our case you can actually manage your entire infrastructure using Kubernetes if you so wish.
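A minimal sketch of what that could look like in practice, using the standard Kubernetes Python client to declare a fabric as a custom resource. The group, kind, and field names here are illustrative assumptions, not EDA's actual API.

```python
# Hypothetical sketch: declare a piece of infrastructure as a Kubernetes
# custom resource and apply it with the standard Python client.
# "eda.example.com", "Fabric", and the spec fields are placeholders.
from kubernetes import client, config

fabric = {
    "apiVersion": "eda.example.com/v1alpha1",   # assumed CRD group/version
    "kind": "Fabric",                           # assumed abstract resource kind
    "metadata": {"name": "dc1-fabric"},
    "spec": {
        "leafs": 4,
        "spines": 2,
        "underlay": {"protocol": "ebgp"},       # an opinionated default
    },
}

config.load_kube_config()                       # reuse the operator's kubeconfig
client.CustomObjectsApi().create_namespaced_custom_object(
    group="eda.example.com",
    version="v1alpha1",
    namespace="default",
    plural="fabrics",
    body=fabric,
)
```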

We also focused a lot on reliability, which I'm sure we'll talk about more. The whole theme is that device and human interactions have never been more important, and they've never happened more frequently than they do today, just based on the churn we're seeing in the IT space. So it's never been more important for those interactions to be reliable, and that's where we've invested. I would say the vast majority of our time goes into ensuring those interactions are reliable, which of course means you can roll out more changes faster with higher confidence that they're going to work, all of that kind of good stuff.

So we're initially targeting this platform at data center enterprises, that kind of market. But the platform itself is super generic. We don't necessarily even call it a data center automation platform. It truly is a generic infrastructure automation platform. We just happen to be biting off data center as our first use case.

I think I would break it into a few different areas. The first is that whole area of device-to-machine interactions that we're experiencing a lot more of. An automation platform needs to look more holistically, not just at a device level: what does it mean if you roll out a change and it goes to 9 devices but not all 10? So we spent a lot of time on this idea of atomic change sets that get rolled out at an infrastructure level, not at a per-device level.

And if any of those changes were to fail, we do these, again, atomic rollbacks, as well. So your entire infrastructure is stepping forwards all at the same time and that either works and now that's your new golden state, or it doesn't, and you immediately roll back to the previous golden state. So it's this whole theme of having these golden states and being able to move from one to the other as you roll out additional changes into your infrastructure. I think that's a big theme that we've hammered on around reliability.
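The snippet below is a rough sketch of that behavior, not EDA's implementation: a change set is staged on every device and only committed if all of them accept it, and any failure reverts the whole set to the previous golden state. The Device class is a hypothetical stand-in.

```python
# Illustrative-only model of an atomic, infrastructure-wide change set.
class Device:
    def __init__(self, name):
        self.name = name
        self.golden = {}            # last known-good (committed) config
        self.candidate = None       # staged but not yet committed config

    def stage(self, change):
        # A real device could reject the candidate here and raise.
        self.candidate = {**self.golden, **change}

    def commit(self):
        self.golden, self.candidate = self.candidate, None

    def revert(self):
        self.candidate = None       # fall back to the previous golden state


def apply_change_set(devices, change):
    staged = []
    try:
        for dev in devices:
            dev.stage(change)       # stage on all 10 devices, not just 9
            staged.append(dev)
    except Exception:
        for dev in staged:          # any failure rolls the whole set back
            dev.revert()
        return False
    for dev in devices:
        dev.commit()                # success: this becomes the new golden state
    return True
```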

There are also a few others. The platform itself basically tries to give you these nice little abstract resources, something like an interface, or you can even imagine something more abstract like an entire data center fabric being modeled using one of these abstractions. Those abstractions are, obviously, opinionated, so they're going to take away a lot of the complexity of deploying a data center fabric, for example. And that in turn results in reliability, simply because the number of code paths to get to the end result is smaller.
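As a hedged illustration of how such an opinionated abstraction reduces code paths, the sketch below expands two high-level fabric parameters into per-device settings, with the detailed decisions (ASN numbering, uplink naming) baked into the intent logic. The conventions are invented for the example, not Nokia's.

```python
# Illustrative expansion of an abstract "fabric" into per-device config.
def expand_fabric(name, spines, leafs):
    configs = {}
    for i in range(spines):
        configs[f"{name}-spine{i + 1}"] = {"role": "spine", "asn": 65000}
    for i in range(leafs):
        configs[f"{name}-leaf{i + 1}"] = {
            "role": "leaf",
            "asn": 65001 + i,       # opinionated choice: one ASN per leaf
            "uplinks": [f"ethernet-1/{u + 1}" for u in range(spines)],
        }
    return configs

# Two inputs in, a full per-device plan out: fewer knobs, fewer code paths.
print(expand_fabric("dc1", spines=2, leafs=4))
```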

And we've seen that having more opinions about what gets rolled out, and potentially giving customers less ability to have those opinions of their own, actually helps a lot. I know we have some questions later around open source, so I'll get to the antithesis of what I just said, which is that you can't be closed off. You need to be nice and open, as well.

So yeah, I think those are the big themes I would think of around reliability. Probably the other is that whole NetOps DevOps thing. If we look at the application world that exists around us, you're getting multiple updates delivered to your laptop for Microsoft products, or Google products, almost every day. And the reason they can do that is they do a lot of CI/CD pipeline stuff where they can roll out a change and have it run through a bunch of automated testing before any customers ever see it. And assuming that all of your tests eventually pass, you have an automated deployment into production.

So we haven't typically looked at infrastructure through that lens before, and I think this is the first time we're starting to do that. We very much promote that model: we'll give you a pipeline where you can run a bunch of pre-checks, eventually roll out your change, and then run a bunch of post-checks after. And if any of those things fail, again, that whole reliable atomic stepping forwards and back happens.
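A rough sketch of that pre-check / change / post-check flow is below. The check functions are hypothetical placeholders for telemetry- or test-driven gates, and the rollback reuses the atomic change-set idea sketched earlier; none of this is EDA's actual pipeline API.

```python
# Illustrative pipeline: pre-checks gate the change, post-checks confirm it,
# and any failure steps the infrastructure back to the previous golden state.
def run_pipeline(pre_checks, apply_change, post_checks, rollback):
    if not all(check() for check in pre_checks):
        return False                # never touch production if pre-checks fail
    if not apply_change():
        return False                # the change itself rolled back atomically
    if not all(check() for check in post_checks):
        rollback()                  # post-checks failed: step back to golden
        return False
    return True                     # the change is the new golden state

# Hypothetical usage, with placeholder check and change functions:
# run_pipeline([links_up], deploy_change, [bgp_sessions_established], revert_change)
```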

So we're applying some of the concepts we've seen in the surrounding application world, which has had to adopt a bunch of fairly emerging technologies out of sheer necessity; they have to move fast. We're starting to see that roll into infrastructure. And as much as those things help them move fast, they help them move fast reliably. So we're bringing that reliability down to the infrastructure layer.

So I just finished saying that we close everything off and give you no choice; you have to do what we say. The way we actually balance that is using what I'll loosely call intents: you can imagine the set of code that executes to deploy a fabric, let's call that an intent. We wanted to develop those intents themselves on top of a platform, so that the intents themselves could be open source.

We feel it's really, really important, when you're building a product like EDA where extensibility and customers' ability to change what the system does down to its core really matter, that you do that out in the open source community. So with EDA we have what we call the EDA core, which is actually closed source; that's our secret sauce.

And then on top of that, we've built a bunch of these intents, these abstract resources along with their corresponding logic, and all of that we distribute as open source. So not only can you look at what we did, this is how we deploy a fabric, these are our opinions, you can take the literal source code, change it, tweak it, make it work however you like, and then roll that into the system as your version of our fabric. So we built a platform, gave ourselves the tools to build these intents, and then made sure that the same functionality we leveraged can be leveraged by our customers, as well.

So I think that gives us this best-of-both-worlds system: it's hyper, hyper extensible, but it can also be very, very opinionated. It can be very turnkey. If you're an enterprise that just wants to consume the set of intents that we bake into the platform, you can do that. If you're the type of enterprise that really cares about things like specific BFD timers, or which routing protocols are used in the underlay, you have that capability too. You can build on top of the same framework that we're leveraging ourselves.
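To make that turnkey-versus-customized spectrum concrete, here is a hedged illustration: start from a hypothetical stock fabric spec and override only the opinions you care about, such as the underlay protocol or BFD timers. The field names are assumptions, not the schema of Nokia's published intents.

```python
# Illustrative stock spec shipped with a hypothetical turnkey fabric intent.
STOCK_FABRIC_SPEC = {
    "underlay": {"protocol": "ebgp"},
    "bfd": {"min_tx_ms": 300, "min_rx_ms": 300, "multiplier": 3},
}

def customize(base, overrides):
    merged = {section: dict(values) for section, values in base.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged

# An enterprise that wants an OSPF underlay and faster failure detection
# overrides just those opinions and keeps the rest of the stock intent.
my_fabric_spec = customize(
    STOCK_FABRIC_SPEC,
    {"underlay": {"protocol": "ospf"}, "bfd": {"min_tx_ms": 100, "min_rx_ms": 100}},
)
print(my_fabric_spec)
```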

I think we've seen just a ton of traction in some of our other products just by engaging with that open source community, giving them tools to help them play around. We're probably the only vendor, I think, that still…

The editorial staff had no role in this post's creation.