GitHub Copilot: AI-based Code Generation and the Future of Software Engineering
Can AI make manual coding obsolete?
As recently as several years ago, the idea that programmers would be able to auto-complete code using natural language prompts seemed like a far-fetched dream, at least in the near term. But this dream became a reality in 2021 when Microsoft (which acquired GitHub in 2018) and OpenAI announced GitHub Copilot: an AI-based system that can make code recommendations in many programming languages based on natural language prompts.
GitHub Copilot: Real-time Code Suggestions using OpenAI’s Codex
GitHub Copilot works by leveraging OpenAI’s Codex AI model. Codex is a derivative of GPT-3, and its training data includes natural language and billions of lines of code available in GitHub repositories and other publicly available sources. With this training, Codex can translate natural language inputs into programming code in over a dozen languages.
This powerful ability to auto-suggest a programmer’s desired code was incorporated into a user-friendly interface in the form of GitHub Copilot. GitHub began to offer Copilot as an add-in for various Integrated Development Environments (IDEs), the coding environments in which programmers write and compile code. By using Copilot in these IDEs, programmers can have Copilot suggest code that completes full functions based on natural language prompts.
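To make this workflow concrete, here is a minimal illustration. It assumes a developer types only the comment and the function signature, and an assistant proposes the body. The function name `is_palindrome` and its body are hypothetical and were written by hand for this example; they are not actual Copilot output.

```python
# What the developer might type: a natural language comment and a signature.
# Check whether a string is a palindrome, ignoring case, spaces, and punctuation.
def is_palindrome(text: str) -> bool:
    # What an assistant might plausibly suggest as the function body.
    # Keep only letters and digits, lowercased, then compare with the reverse.
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]
```

The developer would still need to read and test a suggestion like this before accepting it, since the tool offers a proposal rather than a guarantee of correctness.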
Value Creation and Value Capture: How Copilot Enhances GitHub’s “Freemium” and Enterprise SaaS Business Models
GitHub is a company that provides a Git repository hosting service, or a service that helps programmers manage and store versions of their code by utilizing the Git open-source version control system. GitHub has a classic freemium/enterprise SaaS model, in which the freemium version allows individual users to store and manage code repositories, while the enterprise plans offer additional features including more robust collaboration and security.
For GitHub, Copilot is a breakthrough feature that facilitates both tremendous value creation and value capture. From a value creation standpoint, the ability to auto-complete code has been transformative for developers. As Mike Krieger, Co-Founder of Instagram, notes regarding the value provided by Copilot, “This is the single most mind-blowing application of machine learning I’ve ever seen.”
Copilot also provides GitHub with tremendous leverage for value capture. First and foremost, GitHub charges for Copilot rather than offering it for free, with users paying $10/month or $100/year. But GitHub is also positioning Copilot as a lead generation mechanism. For example, while Copilot is available as an add-on integration in various IDEs, the rare programmer who does not have a GitHub account must create one in order to install the add-on. Copilot furthermore has the potential to convert more users to higher-priced enterprise plans. GitHub recently announced that it is rolling out Copilot for Business, an initiative that it believes will significantly boost enterprise adoption.
A Powerful Flywheel: More Code, Better Suggestions
One need not extrapolate too much to imagine what a powerful flywheel Copilot affords GitHub in acquiring and retaining customers.
To start, Copilot already seems like a powerful lead generation and value creation feature in its current form. But as more developers start using Copilot, GitHub should be able to use the resulting data to further refine its AI models and make Copilot’s code-completion suggestions more accurate. In this manner, Copilot can facilitate a flywheel that could create a formidable moat: Copilot attracts users to GitHub’s platform, users create more code using Copilot, GitHub’s developers use the usage data to further refine and train the Copilot AI model, and the improved model in turn attracts even more developer adoption and usage, and so on.
Challenges: Potentially Buggy Software, IP Lawsuits, and Emerging Competition
One of the immediate challenges Copilot faces is the security of the code it generates. A study by NYU researchers found that Copilot generated vulnerable code about 40% of the time in the security-relevant scenarios it tested. The source of these vulnerabilities seems to be the fact that Copilot’s AI models are trained on public source code that is often filled with bugs and references to outdated APIs.
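As an illustration of what “vulnerable code” can look like in practice, the sketch below contrasts a SQL-injectable lookup with a parameterized one, using Python’s built-in `sqlite3` module. The table, the function names, and the malicious input are all hypothetical and chosen for this example; the code is not taken from the NYU study or from Copilot output.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)]
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable pattern: user input is pasted directly into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer pattern: the value is passed as a bound parameter and is never
    # interpreted as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
malicious = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, malicious)  # matches every row in the table
blocked = find_user_safe(conn, malicious)   # matches no rows
```

With the crafted input, the unsafe version returns both users while the parameterized version returns none. A model trained on public code that contains the first pattern can plausibly reproduce it, which is why generated suggestions still need security review.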
There is also the recent legal challenge over whether GitHub’s Copilot violates developers’ intellectual property rights. Matthew Butterick and several other litigators have filed a class action lawsuit in a US federal court against Microsoft, GitHub, and OpenAI over the legality of GitHub Copilot, arguing that by training their AI systems on public GitHub repositories, the defendants violated the legal rights of creators who posted code or other work on GitHub under certain open-source licenses.
Last, even as GitHub confronts these challenges on multiple fronts, it also faces emerging competition from many different angles. Many companies have started to offer rival AI code solutions similarly built on top of OpenAI’s models. Tabnine, Captain Stack, GPT-Code-Clippy, and many others now offer AI-coding solutions at competitive prices or even for free. In addition, IDE providers such as Replit have begun to offer their own ML-powered “pair programming” solutions. Replit recently announced Ghostwriter, which similarly completes code in real time and can transform code based on natural language inputs.
GitHub has several advantages in fending off competitors. Perhaps GitHub’s most powerful advantage is its parent company Microsoft’s partnership with OpenAI, through which Microsoft became OpenAI’s preferred partner for commercializing new AI technologies. This partnership grants Microsoft and GitHub early and supposedly exclusive access to OpenAI’s new models (e.g. Microsoft initially had an exclusive license on GPT-3 in 2020). Thus, as OpenAI’s models become more powerful, GitHub will be positioned to capitalize on these advancements first.
Who Will Own the Future of Programming?
Whether GitHub will be able to stay ahead of its competition, or whether it will be able to address both the vulnerability and legal challenges facing Copilot, remains unclear. However, what is clear is that the future of software engineering is here, and ML models will be generating many lines of code from here on. The question now becomes: when will AI take over coding responsibilities altogether?
Thanks for the post, Louis. This is a very impactful application of AI/ML and one that I did not expect – I always thought AI would come for the real economy first, but this is a clear disruption of the digital economy. My biggest concern regarding the growth and application of Copilot in the medium to long term is one that you raised at the end of your post: data privacy and IP. It reminds me of how Professor Greenstein wanted to use Darth Vader’s AI-generated image for a case but had to remove it due to copyright constraints. Data privacy and IP issues will limit the availability of training data tremendously, and tech companies like Microsoft/GitHub will need to get creative to fill in the gap. Perhaps synthetic data is the solution?
Really enjoyed reading your post! I used to refer to GitHub when I had to code for engineering classes and I find it fascinating that there is a potential to auto-complete code using natural language prompts. The flywheel that Copilot would generate for GitHub would be fantastic and it would also offer users what many of them are looking for – a quick, easy solution to their coding challenges. I am interested to see where these capabilities land in the next couple years!
Thanks for the exciting post! I really enjoyed reading it! I see enormous potential here, especially for the further democratization of AI! Do you see parallels here to the challenges and concepts of Hugging Face? And to what extent is being part of a tech giant an advantage here, or perhaps a disadvantage?