Intel's Open Source Strategy

by
Stephen Cass
from IEEE Spectrum on (#6EGWM)

Stephen Cass: Hello and welcome to Fixing the Future, a podcast from IEEE Spectrum. I'm your host Stephen Cass, a senior editor at Spectrum, and before we start, I just want to tell you that you can get the latest coverage from some of Spectrum's most important beats, including AI, climate change, and robotics, by signing up for one of our free newsletters. Just go to spectrum.ieee.org/newsletters to subscribe. With all that said, today's guest is Arun Gupta, vice president and general manager of Open Ecosystem Initiatives at Intel and chair of the Cloud Native Computing Foundation. Hi, Arun, thanks for joining me.

Arun Gupta: Hi, I'm very happy to be here.

Cass: So, Intel is very famously a hardware company. What does it get out of supporting open-source ecosystems?

Gupta: Well, I mean, Pat always says, "Software defined, hardware enabled." So, you can build the finest piece of hardware, but if the software is not going to run on it, it's not going to be very helpful, right? And that's honestly the reason that we contribute to open source all along, and we have been contributing for over two decades. Because our customers, they consume our product, which is silicon, using these open-source projects. So, you pick a project: OpenJDK, PyTorch, TensorFlow, scikit-learn, Kafka, Cassandra, Kubernetes, Linux kernel, GCC. And our customers who want to consume our silicon, they want to make sure that these open-source projects are consumed well on the Intel silicon, they behave well, and they are able to leverage all the features that are in the instruction set of the latest edition of the chip.

So, that's where over the last two decades Intel has been contributing to open source very actively because it truly aligns with our customer obsession. So, I mean, if you think about it, Intel has been the top contributor to Linux kernel for over 15 years. We are among the top 10 contributors to Kubernetes, and I just learned, I think a couple of days ago, our number is up to number seven now. We are among the top contributors to OpenJDK, number three contributor to PyTorch. So, if you think in terms of the scale that we are operating, there are hundreds of people, thousands of developers at Intel that are contributing to these open-source projects.

Cass: I know Intel probably doesn't have a formal opinion, but you yourself, what do you find the most exciting project?

Gupta: Oh, several. I mean, and I've been in the open-source community for over two decades as well. And I find excitement all over the place really. So, some of the names that I shared earlier, think in terms of OpenJDK, right? OpenJDK is the reference implementation of Java. We are talking about 12 million developers they need to use OpenJDK. And a large number of them continue to use Java on Intel architecture. And as they are continuing to use on Intel architecture, with Sapphire Rapids we have accelerators that have been attached to the silicon as well. Now, we want to make sure customers are able to leverage those accelerators whether you are using crypto or hashing or security, that's where we are making contributions in OpenJDK that can leverage that acceleration in the Intel silicon, and not just upstream. The fact the way we do the upstream contribution it goes to the main branch. And because it goes to the main branch, that means it's available in all the downstream distros.

So, it doesn't matter whether you're using Oracle JDK or Amazon Corretto or Eclipse Adoptium, it's available in the downstream distro. So, that pervasive nature of our upstream optimizations available all over the board I think is a key factor why we are excited about it. And that's sort of the philosophy we take for other projects as well. PyTorch for example, has their default oneDNN network on how you do optimization. And that's again done by the oneAPI team at Intel. And we do this in a very upstream manner because people will take the PyTorch distribution. PyTorch 2.0 was done a few weeks ago, and that's where a lot of our optimizations are available. So, you pick a project. Linux kernel, again, we do this in the upstream main branch so that it doesn't matter whether you're using Debian or Canonical or Ubuntu or what you're using, those optimizations are available for you over there. I mean, overall, if you think about it, Intel has been committed to driving collaboration, standardization, and interoperability in open-source software from the very beginning.

Cass: So, that actually leads me to my next question, which is about that issue of interoperability and standardization and so on. I have a feeling of dread whenever the words "oh, just compile it from source" or "just use it from source" come up. Because unless the project has reached a level of maturity where there are nice binaries that have been packaged up for my specific version of my operating system, using open-source software in that way is just a nightmare. How do I replicate the environment? Have I got this going on? Have I understood that and so on? It's really difficult to use unless I'm really deeply embedded in the community where that software comes from. So, can you talk a little bit about what are some of the solutions to that problem? Because standardization seems to be a very imaginary phantom when I'm doing this because I end up having to almost duplicate the exact reference setup that that particular community has used.

Gupta: Well, you can go down the rabbit hole very fast actually. So, as you said very rightly, I think that's where it's important that the contributions are done in such a manner where they have the biggest impact. So, as a developer, let's say you're building on a Linux machine, you want to be able to say apt-get or yum install, and that's sort of all that you should have to do. And that's where the impetus lies on Intel and their partners that once this gets into upstream, if there is a CVE, if there is a vulnerability, if there is a problem, if there is a patch that needs to be applied, it should just go straight up in the upstream contribution. And from there upstream it gets delivered in the right patches and then it goes into the right packages essentially.

So that at the end of the day you can just say yum update and voila, you have the right configuration in place for you. And compile from the source only works for people who are brave at heart, right? Because you don't know what the dependencies are, etc. So, I think within Intel we really think in terms of what contributions are we making upstream, how is it available in downstream distributions, and then how are the customers using it? And then the customer is really giving us feedback, "Hey, this is sort of the next set of the investment that you need to do in the open-source project." And that kind of makes a full circle, essentially. So, that's how we look at it. So, really Intel contributes at every layer of the stack, all the way from silicon to the app, where we are creating an environment where open-source developers can deploy their solutions to any corner of the globe. And that's sort of the main element here.

Cass: Turning to open source and security, you recently tweeted, "Automation is the only path to open-source security." Can you explain what you meant by that?

Gupta: Yeah, absolutely. This was actually from one of the keynotes that I attended at Open Source Summit North America in Vancouver. And Eric Brewer was giving that talk. So, that was not my quote, so it should be attributed to Eric Brewer from Google. And really, I fundamentally believe in that. So, every tweet that I do, I believe in that element. And really, if you think about why automation is the key, it is the only way to improve security. Because humans are meant to err, machines less likely, because that's what machines are really good at. They are very good at repetitive, boring tasks. If you say, here is a tool that is integrated as part of the CI/CD build, here is a CVE vulnerability scanning part, here is the static code analysis part. So, once you start putting those processes in place, once you start putting those tools in place, nobody is saying that the process is going to be perfect, but at least you have the process in place and then you start catching those bugs early as opposed to leaking them out.

And then once you find out where the process is failing, then you improve the process, then you inject more tools over there or you figure out what needs to be done. So, the whole point is make it to the point where it's super boring, where everything is automated. As they say, automation in this boring infrastructure is the exciting times. So, that is really the key on how you can improve the security. And then of course, open source, as Linus's law says, "Given enough eyeballs, all bugs are shallow." So, more people are looking at the source code. They all bring that unique diverse perspective that really allows you to kind of counter that what's going on here and that, oh, this doesn't serve my use case and maybe I tweak it this way but yet make sure it goes through the regression test. And for the regression test, again, the performance test, all of that, automation is the key. So, think in terms of push to prod, right? Every time I'm making a new commit to the GitHub repo, what all is happening after that? Is there a static code analysis? Is there a pull request review? Is there a regression test? Is there a performance test? Is there a scalability test? What all tests are happening automatically, because that improves your confidence in pushing it into production.
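The "push to prod" sequence of automated gates described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the commit dictionary, the three check functions, and the `KNOWN_VULNERABLE` list are all hypothetical stand-ins for real static analysis, CVE scanning, and test tooling.

```python
# Illustrative entry only; a real scanner would consult a vulnerability feed.
KNOWN_VULNERABLE = {("log4j-core", "2.14.1")}


def static_analysis(commit):
    """Hypothetical lint rule: reject diffs that introduce eval()."""
    return "eval(" not in commit["diff"]


def vulnerability_scan(commit):
    """Fail if any (name, version) dependency is on the known-bad list."""
    return not any(dep in KNOWN_VULNERABLE for dep in commit["dependencies"])


def regression_tests(commit):
    """Stand-in for a test run: pass only when no test failed."""
    return commit["tests_failed"] == 0


CHECKS = [
    ("static analysis", static_analysis),
    ("vulnerability scan", vulnerability_scan),
    ("regression tests", regression_tests),
]


def run_pipeline(commit, checks=CHECKS):
    """Run each gate in order and stop at the first failure."""
    results = []
    for name, check in checks:
        passed = check(commit)
        results.append((name, passed))
        if not passed:
            break
    return results


def ready_for_production(commit, checks=CHECKS):
    """A commit ships only if every gate ran and every gate passed."""
    results = run_pipeline(commit, checks)
    return len(results) == len(checks) and all(ok for _, ok in results)


if __name__ == "__main__":
    commit = {
        "diff": "+ total = sum(items)",
        "dependencies": [("log4j-core", "2.14.1")],
        "tests_failed": 0,
    }
    for name, passed in run_pipeline(commit):
        print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Stopping at the first failing gate keeps feedback fast; a real pipeline might instead run every gate and report all failures together.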

Cass: You talked recently about developing a software bill of materials as part of the way to attack this problem. Could you tell a bit more about that?

Gupta: Yeah, absolutely. Now, the software bill of materials is sort of coming from the executive order that was issued by the Biden government. This really happened after the Log4j incident that happened a couple of years ago. So essentially, when Log4Shell happened, people were like, "Where are Log4js used? We don't even know that." And it took companies a long time to figure out. We understand that this is a vulnerability, but how do we track where it is? And as part of that, that's where the executive order came about to be. And so the idea here is that the executive order says if you want to operate with federal government, which everybody wants to, if you want to sell to federal government, then we need to have a software bill of materials. Now, Intel is primarily a silicon company. It is a silicon company. So, in that sense, we have done the hardware bill of materials for a number of years, and that's always been the case. We're just extending that knowledge and domain to software bill of materials.

So, essentially what you could do is you can take a look at a software bill of materials, then you understand what the software is made of. You understand the dependencies, you understand the libraries, you understand the version number, you understand their licenses. So, there are tools by which you can look at an SBOM or software bill of materials and understand. So, tomorrow if Log4Shell happens, then inside you can say, "Hey, where is my SBOM database?" And if Log4j is happening, tell me all the softwares across Intel, for example, that are using this particular version of Log4j, and then hopefully I can nip it right in the bud itself. So, that's sort of the whole premise of SBOM. And of course, Intel works with the federal government all the time. The executive order requires any new orders, any new business with the government starting, I believe, June 15th, to have an SBOM. And I think there is a retrofit window for the next few months. So, we are ready for that as we launch out.
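The SBOM lookup described above ("tell me all the software that is using this particular version of Log4j") can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the in-memory `SBOMS` store, the product names, and the version range are invented, and the `name`/`version` fields only loosely resemble what SBOM formats record.

```python
# Hypothetical SBOM store: product name -> list of component records.
SBOMS = {
    "billing-service": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.0"},
    ],
    "inventory-api": [
        {"name": "log4j-core", "version": "2.17.1"},
    ],
    "docs-site": [
        {"name": "markdown-it", "version": "12.3.2"},
    ],
}


def parse_version(version):
    """Turn '2.14.1' into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_affected(sboms, component, first_bad, first_fixed):
    """Return (product, version) pairs with component in [first_bad, first_fixed)."""
    low, high = parse_version(first_bad), parse_version(first_fixed)
    affected = []
    for product, components in sboms.items():
        for comp in components:
            if comp["name"] != component:
                continue
            if low <= parse_version(comp["version"]) < high:
                affected.append((product, comp["version"]))
    return sorted(affected)


if __name__ == "__main__":
    # The range below is an assumption for the example, not an advisory.
    for product, version in find_affected(SBOMS, "log4j-core", "2.0", "2.15.0"):
        print(f"{product} ships log4j-core {version}")
```

The point of the sketch is that once every product has a machine-readable component list, finding exposure becomes a query rather than an investigation.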

Cass: I want to talk a little bit more about humans and open source as virtually all major open-source projects have accompanying large human communities. What are some of the other human problems you see recurring in those communities and what are some of the best ways you've seen to address or avoid those problems?

Gupta: Yeah, no, absolutely. First of all, never use humans for the job of a machine. This is a quote that was made by Agent Smith in the movie Matrix, and I really believe in that. And that's where automation is the key. The humans are honestly what makes the projects that much more interesting. Particularly if you are in an open-source project, you really need to think about- I won't name the company. One of my previous companies. We submitted a pull request. We were trying to get into a brand-new community. We submitted a pull request for a very fundamental change in a very popular open-source project. The pull request was denied within 30 minutes because the team did not do a good job of understanding the social dynamics, understanding the people, understanding the needs of the project. They just rolled in that nope, we want this [to be?] happen. Everybody just flipped on the table completely. Nope, not going to work.

And then eventually you start building trust because trust doesn't happen day one. Particularly in this open-source world, if you are co-opting where you are all working in sort of the OpenJDK implementation but you have your own product distribution as well. Similarly, if you're all working on Kubernetes, but you have your own managed service or your own distribution around Kubernetes. So, that's where the people problems happen, actually, because humans are squishy, right? As they say, they have feelings and those feelings get hurt. And they have their corporates who are paying their bills, and those corporates have sometimes competing priorities. So, that's where I've seen constantly all along. But I would say I'm part of the Cloud Native Computing Foundation and I definitely would highly give very high points to CNCF in terms of how they have been very diverse, very inclusive, and all sorts of efforts that are happening within CNCF to minimize the people problem. But humans are humans, that happens all the time.

Cass: I want to turn now to green software and sort of open source's place in it. And you've done a little bit of work in this area and commentary on this area. Can you tell people what green software is and why is open source important there?

Gupta: Yeah, absolutely. Well, green software is- think in terms of sustainability of the software, right? And that's what the Green Software Foundation is an open-source foundation under Linux Foundation. So, they have defined what are the Green Software Foundation principles. And when you think in terms of green software, what you're thinking in terms of when I'm writing the software, is it the most optimal software in terms of CPU, in terms of memory consumption, in terms of execution time? So, those are the tenets that are coming to your mind, essentially. When I am running my containers, for example, where I'm running my containers, are they run in a data center that is purely powered by electricity or are they powered by renewable electricity? Can I move my workloads around across the globe? Do I have that flexibility where I'm only running my workloads where the data centers are powered by the natural electricity? So, New Zealand to India to Europe to America back to New Zealand. So, if you can go around the world moving your workloads and if that is what your customer demands are, those are some of the elements that people talk about in terms of Green Software Foundation.

More recently, I think I tweeted about this as well. More recently, there was a report that came out from Green Software Foundation and there they were really talking about what is the state of green software essentially? And some of the highlights if you think about it were there, that the green software really requires a holistic approach. You can't just say, "Because I'm using such and such programming language, I'm green. Or because I'm deploying in such and such data center, I'm green." That's an important element. Then there is software legislation that is super important as well because the government's requiring it on how it needs to be done. And if you think about the emissions from software, how much tech-centric we have become over the years, the software emissions are equivalent to air, rail, and shipping combined. I think those are the key elements that we need to think about that how do we make sure that this is an important element? So, how do we cut it down?

And you talked about open source. Open-source solutions are really essential to greening the software essentially. And also there are lots of different tools available. There is an open-source Carbon Aware SDK that helps you build the carbon aware software solutions with the intelligence to use the greenest energy sources. That's the part that I was talking about. Then Cloud Carbon Footprint is one example of open-source tooling that is impacting the speed and quality of decarbonization approaches. So, there's a lot of work that is happening. There is LF Energy, a foundation. She wrote in a December article that, "one company cannot build the technologies needed to mitigate climate change and traditional black box approaches to proprietary software will only inhibit progress." So, that only emphasizes the importance of open software. So, I would highly recommend people to go to Green Software Foundation website, which is basically greensoftware.foundation, look at their principles essentially, and see what needs to be done.
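The idea of moving workloads toward data centers with greener electricity can be sketched in Python as below. This is a minimal illustration, not the Carbon Aware SDK's API: the region names and the grid carbon intensity figures (in gCO2e/kWh) are invented placeholders, and a real system would fetch live values from a data provider.

```python
# Hypothetical grid carbon intensity per region, in gCO2e per kWh.
CARBON_INTENSITY = {
    "nz-north": 110.0,
    "in-west": 630.0,
    "eu-north": 45.0,
    "us-east": 380.0,
}


def pick_greenest_region(intensity, allowed=None):
    """Return the region with the lowest carbon intensity.

    `allowed` optionally restricts the choice, for example to regions
    that satisfy a customer's latency or data-residency requirements.
    """
    candidates = {
        region: value
        for region, value in intensity.items()
        if allowed is None or region in allowed
    }
    if not candidates:
        raise ValueError("no candidate regions to choose from")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    print(pick_greenest_region(CARBON_INTENSITY))
    print(pick_greenest_region(CARBON_INTENSITY, allowed={"in-west", "us-east"}))
```

The `allowed` filter reflects the caveat in the conversation that shifting workloads around the globe only works when customer demands permit it.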

Cass: So, that leads me to my next question and this is sort of in your role as part of that Cloud Native Computing Foundation where one of the criticisms with sort of cloud computing and this model, I mean, you talk about, okay, it's great, you can shift your computing basically to follow the sun or the wind. But on a personal coding level, the low marginal cost of spinning up another virtual server, does that remove the incentives for efficiency? Because it's like, why do I have to be efficient? I'll just spin up another server. It would lose that efficiency. How do you really get it in the way that I need to be efficient because this is going to mean something to me personally, very directly, not in the abstract global sense?

Gupta: No, absolutely. And I think you are absolutely right. To some extent what we have done is the ease of spinning up a VM without giving enough information about it that, "Hey, by the way, when you spin up this VM, the carbon footprint of that VM is going to be such and such." Not necessarily metric ton, but 0.006 metric ton. So, I think that transparency needs to come out. What I would love to see is when I walk into Costco or Safeway, right, I pick up a product and I see here is the label of that product. I know how much proteins, sugars, carbohydrates it has. I would love to see that I want to buy an application that has its green footprint on that application where it says, "Hey, by the way, when you are consuming this website or when you're consuming this API, here is the label on it." And I think that level of transparency is going to be fundamental. I would love to walk into Costco and say, by the time this milk got here, it has made the way all the way through such and such farm, and really route it back to that was the farm really done in a green manner? The truck that traveled, what does it cost? So, what is the cumulative footprint? Because once we start raising awareness, and that's where the legislation angle would really help, and that's what is rapidly increasing. So, I think it really requires that holistic approach at policy level, at software level, at data center level, at visibility level. That once you are aware, hopefully you are becoming more and more conscious, essentially.
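A "nutrition label" for a VM could be approximated as energy used multiplied by the carbon intensity of the grid supplying it. The Python sketch below assumes exactly that simplified model; the power draw, runtime, and intensity inputs are hypothetical, and it ignores embodied emissions and data center overhead.

```python
GRAMS_PER_METRIC_TON = 1_000_000


def vm_footprint_tonnes(avg_power_kw, hours, grid_g_per_kwh):
    """Estimate operational emissions in metric tons of CO2e.

    energy (kWh) = average power (kW) * runtime (h)
    emissions (g) = energy (kWh) * grid intensity (gCO2e/kWh)
    """
    energy_kwh = avg_power_kw * hours
    return energy_kwh * grid_g_per_kwh / GRAMS_PER_METRIC_TON


def footprint_label(name, avg_power_kw, hours, grid_g_per_kwh):
    """Format the estimate as a short label a console could display."""
    tonnes = vm_footprint_tonnes(avg_power_kw, hours, grid_g_per_kwh)
    return f"{name}: {tonnes:.3f} t CO2e"


if __name__ == "__main__":
    # Assumed inputs: 0.05 kW average draw, 300 hours, 400 gCO2e/kWh grid.
    print(footprint_label("example-vm", 0.05, 300, 400))
```

Surfacing a figure like this at the moment a VM is created is one way the transparency discussed above could reach an individual developer.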

Cass: Turning back to the technical for the moment. You talked at the start about, hey, one of the reasons we're involved with these ecosystems is that we want to make sure people are using the full feature set, they're using all the tools available in our silicon. Have there been examples though where you've looked at the open-source community's needs and that has led to specific features being put into future revs of the silicon?

Gupta: Well, it's always a two-way cycle, right? Because silicon is typically a longer development cycle. So, in that sense, when we start working on a silicon it could take two to five years essentially. And so right about that time when we are creating that silicon feature is when the discussion needs to happen as well. Contributing a feature to Linux kernel could take about the same time. By the time you conceive the idea, by the time you propose the idea, by the time you write the code, it's reviewed, and by the time it's merged into the main branch and available in the downstream distro. Because our goal really here is by the time silicon is released and is made available in the CSPs and the data center and your client devices, we want to have all that work to be available in the downstream distros. So, that work happens hand in hand in terms of what is the feature that community is telling us that is important and what is the feedback that we're giving back to the community.

Cass: So, what kind of things does Intel have planned ahead for its roadmap in the next year or two with regard to open source?

Gupta: Yeah, no, I mean, my team is the open ecosystem team essentially, and we are constantly working on- my team is responsible for open ecosystem strategy across all of Intel. So, we work with all the BUs, business units, within Intel and helping them define their open ecosystem strategy. So, my team also runs the open.intel.com website. So, I would highly encourage people go and find out what are the latest and the greatest things that we are doing over there. We recently launched OpenFL or Open Federated Learning as a project that was just contributed to LF AI & Data Foundation. So, that's an exciting project where we talk about how Intel and UPenn or Penn Medical actually worked with their partners to create this federated learning platform. So, that's an exciting element. We continue to sponsor a lot of open-source conferences, and whether it's KubeCon or Open Source Summit or any other high profile developer events.

So, telling developers that whether you are operating at a silicon level or at an app level, Intel is relevant all around the stack. So, think about us, tell us we have that- and again, think of us from, we're not really creating a new language here, per se, but what we are really doing is giving you that leg up on your competition, giving you that performance, that optimization that you really need. Because oftentimes when customers run their application in the stack, they would think, "Oh, Intel is so far down below the stack, it doesn't matter." No, it does matter. And that's exactly the point we're trying to tell you. That because the fact that your Java application is running in a serverless environment, because the memory footprint is small, because it's operating a lot more efficiently, that brings down the cost of your serverless function that much lower. So, I think that's where customers, the developers need to think about the relevance of Intel, and those are the areas we're going to keep pushing and telling the story. I really call myself a chief storytelling officer around the efforts that Intel is doing and we would love to hear what else the developers would like to hear.

Cass: So, well that was fantastic, Arun. I really enjoyed talking with you today. And so today on Fixing the Future, we were talking with Arun Gupta of Intel. And for IEEE Spectrum, I'm Stephen Cass.

Gupta: Stephen, thank you for having me.
