
Key Lessons from Census III on Open Source Software

With an estimated 96% of codebases incorporating Free and Open Source Software (FOSS), FOSS forms the backbone of modern business, driving innovation and reducing costs across industries. However, its decentralized and distributed nature makes its health, economic value, and security difficult to assess. The recently released report Census III of Free and Open Source Software, by Frank Nagle, Assistant Professor at Harvard Business School and faculty affiliate at D^3 and the Laboratory for Innovation Science at Harvard, and collaborators at Harvard and the Linux Foundation (see the full list of authors below), provides an in-depth analysis of the FOSS landscape, revealing key trends and risks.

The Census III report uses a methodology similar to that of Census II (2022), but with a more comprehensive dataset. It provides eight rank-ordered Top 500 lists of FOSS usage, based on more than 12 million data points from 2023 supplied by four Software Composition Analysis (SCA) partners. The authors caution: “Operating under data constraints, the findings of this report cannot – and do not purport to – be a definitive claim of which FOSS packages are the most critical.” The report’s findings should instead be viewed as the authors’ best estimate of which FOSS application library packages are most widely used. [1]

Key Insight: The Rise of Cloud-Specific FOSS Packages

“The use of cloud service-specific packages is increasing, with high-ranking components that did not rank in Census II.” [2]

As businesses migrate operations to the cloud, adoption of FOSS packages tailored to specific cloud services has surged. For instance, boto3, the AWS SDK for Python, and google-cloud-go, the Go client libraries for Google Cloud, ranked among the top FOSS packages. These cloud-specific tools help firms streamline operations and innovate quickly in competitive markets. Census III shows that businesses increasingly depend on these packages to address scalability, system integration, and service-specific automation challenges.

Key Insight: Persistent Challenges in Software Version Transitions

“There is an ongoing transition from Python 2 to Python 3, demonstrating the challenges of transitioning to new versions of software with incompatibilities.” [3]

Transitioning to updated software versions remains a significant hurdle for many organizations. The report highlights that in 2022, more than a decade after Python 3 was released, 7% of Python developers still used Python 2, which no longer receives security updates. The authors also note that Python 2 usage was considerably higher in some fields: 23% in DevOps, 24% in computer graphics, and 29% in data analysis. This reluctance to upgrade poses critical risks, because legacy software may contain unpatched vulnerabilities. The report also notes that organizations running outdated versions face compounding risks of inefficiency, escalating support costs, and regulatory noncompliance.
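Part of what makes the move from Python 2 to Python 3 hard is that some code keeps running but behaves differently, while other code stops compiling altogether. The short sketch below runs under Python 3 and demonstrates three well-known incompatibilities; the function names are hypothetical and chosen only for illustration, and the comments describe how Python 2 treated the same code.

```python
# A few Python 2 -> Python 3 incompatibilities, demonstrated by running under Python 3.
# Function names are illustrative, not from the report.


def true_division() -> float:
    # Python 3: "/" always performs true division, so 7 / 2 evaluates to 3.5.
    # Python 2: "/" on two ints performed floor division, so 7 / 2 evaluated to 3.
    return 7 / 2


def text_equals_bytes() -> bool:
    # Python 3: str (text) and bytes are separate types, so this comparison is False.
    # Python 2: "abc" was already a byte string, so the same comparison was True.
    return "abc" == b"abc"


def print_statement_compiles() -> bool:
    # Python 2 accepted the print statement form: print "hello".
    # Python 3 made print a function, so this source raises SyntaxError at compile time.
    try:
        compile('print "hello"', "<python2-style>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(true_division())             # 3.5
    print(text_equals_bytes())         # False
    print(print_statement_compiles())  # False
```

The first two cases are the riskier kind for legacy codebases: the code runs without error under Python 3 but silently produces different results, so they surface only through testing.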

Key Insight: The Need for Standardized Naming in FOSS

“There are promising efforts to implement a standardized naming schema for software components which would improve supply chain security and future census efforts.” [4]

A lack of standardized naming for FOSS components creates inefficiencies and security gaps. The Census III report notes that inconsistent naming conventions complicate dependency management and hinder software supply chain transparency. Proposed solutions, such as Package URL (PURL) and cryptographic hashes, offer promising paths forward: a PURL encodes a package's ecosystem, name, and version in a single identifier (for example, pkg:pypi/boto3@1.34.0), while a cryptographic hash identifies the exact bytes of a released artifact. These approaches simplify project tracking, enhance collaboration across ecosystems, and reduce the risk of misidentifying cross-platform dependencies that are critical to operational security.
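To make the two naming approaches concrete, here is a minimal sketch that builds simplified PURL strings of the form pkg:type/namespace/name@version and derives a SHA-256 identifier for an artifact's bytes. It is an assumption-laden illustration, not a conforming implementation: the PURL specification also defines qualifiers, subpaths, and per-ecosystem normalization rules that this sketch omits, and the helper names and version numbers are hypothetical.

```python
import hashlib
from typing import Optional
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str, namespace: Optional[str] = None) -> str:
    """Build a simplified Package URL of the form pkg:type/namespace/name@version.

    Sketch only: omits qualifiers, subpaths, and the per-ecosystem name
    normalization rules that the PURL specification defines.
    """
    segments = []
    if namespace:
        # Each namespace segment is percent-encoded separately; '/' separates segments.
        segments.extend(quote(part, safe="") for part in namespace.split("/"))
    # Percent-encode the name so characters such as '@' cannot break the structure.
    segments.append(quote(name, safe=""))
    return f"pkg:{pkg_type.lower()}/{'/'.join(segments)}@{quote(version, safe='')}"


def artifact_digest(data: bytes) -> str:
    """Identify an artifact by the SHA-256 hash of its exact bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    print(make_purl("pypi", "boto3", "1.34.0"))
    # pkg:pypi/boto3@1.34.0
    print(make_purl("npm", "core", "17.0.0", namespace="@angular"))
    # pkg:npm/%40angular/core@17.0.0
    print(make_purl("maven", "commons-lang3", "3.12.0", namespace="org.apache.commons"))
    # pkg:maven/org.apache.commons/commons-lang3@3.12.0
    print(artifact_digest(b"example artifact bytes"))
```

The two identifiers complement each other: the PURL says which package and version a project claims to depend on in which ecosystem, so identically named packages on npm and PyPI no longer collide, while the hash confirms that the bytes actually in use are the ones that were published.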

Key Insight: The Role of Individual Contributors in FOSS Development

“Among top non-npm(1) projects, 17% had only one developer and 40% had one or two developers accounting for more than 80% of commits authored.” [5]

Despite its collaborative nature, much of FOSS development relies heavily on a small number of contributors. In their review of 47 of the top 50 non-npm projects in 2023, the authors found that most projects (64%) had four or fewer developers authoring 80% of commits. This concentration of responsibility introduces risk, particularly if maintainers burn out or leave a project. The report emphasizes that such dependence raises concerns about the sustainability, scalability, and continuity of critical software ecosystems.
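The concentration statistics above boil down to a simple question per project: how few authors does it take to account for more than 80% of commits? The sketch below computes that number from per-author commit counts, which could come, for example, from `git shortlog -s -n`. It is a hypothetical illustration of the metric, not the report's exact procedure, which may handle author de-duplication, bots, and time windows differently; the author names and counts are invented.

```python
def authors_for_commit_share(commits_by_author: dict[str, int], percent: int = 80) -> int:
    """Return the fewest authors whose combined commits are more than `percent`% of all commits."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    total = sum(commits_by_author.values())
    if total == 0:
        raise ValueError("no commits to analyze")
    running = 0
    # Take the most prolific authors first, since they reach the threshold fastest.
    for k, count in enumerate(sorted(commits_by_author.values(), reverse=True), start=1):
        running += count
        # Integer comparison avoids floating-point rounding right at the threshold.
        if running * 100 > total * percent:
            return k
    # Not reachable: with percent < 100, all authors together always exceed the threshold.
    raise AssertionError("threshold was never exceeded")


if __name__ == "__main__":
    # Hypothetical projects with invented commit counts.
    solo = {"maintainer": 850, "contributor_a": 100, "contributor_b": 50}
    pair = {"a": 500, "b": 350, "c": 100, "d": 50}
    print(authors_for_commit_share(solo))  # 1
    print(authors_for_commit_share(pair))  # 2
```

Under this definition, the first hypothetical project would fall into the report's "only one developer" bucket and the second into the "one or two developers" bucket. Sorting authors by commit count first guarantees the result is the minimum number of people needed to cross the threshold.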

Key Insight: The Risks of Legacy Software in FOSS

“Legacy software persists in the open source space, making their security as important as their replacement packages.” [6]

Legacy FOSS packages continue to play a significant role, even when better alternatives exist. For example, packages such as minimist and request remain widely used despite being deprecated. This reliance on outdated software can introduce security vulnerabilities and reduce operational efficiency. The report highlights how these dependencies often become entrenched through familiarity, a lack of transition resources, and the complexity of updating integrated systems.
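A first step in reducing reliance on legacy packages is knowing where they appear. The following sketch scans the direct dependencies declared in a Node.js package.json and flags any that appear in a supplied set of deprecated names. It is a hypothetical helper: the deprecated set simply reuses the two packages the article names, whereas a real check would look up deprecation status in the package registry or an advisory database. It also covers only direct dependencies; transitive ones are recorded in package-lock.json.

```python
import json

# Illustrative only: the two packages the article names as deprecated.
# A real check would query deprecation status instead of hard-coding names.
ARTICLE_EXAMPLES = {"request", "minimist"}


def flag_deprecated(package_json_text: str, deprecated: set[str]) -> list[str]:
    """Return the direct dependencies in a package.json that appear in `deprecated`."""
    manifest = json.loads(package_json_text)
    declared: set[str] = set()
    for section in ("dependencies", "devDependencies"):
        # Each section maps package name -> version range; only the names are needed.
        declared.update(manifest.get(section, {}))
    return sorted(declared & deprecated)


if __name__ == "__main__":
    sample = """{
      "dependencies": {"express": "^4.18.2", "request": "^2.88.2"},
      "devDependencies": {"minimist": "^1.2.8"}
    }"""
    print(flag_deprecated(sample, ARTICLE_EXAMPLES))  # ['minimist', 'request']
```

Running a check like this in continuous integration turns entrenched legacy dependencies into a visible, reviewable list rather than a risk discovered only after a vulnerability is disclosed.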

Why This Matters

For business leaders, understanding the findings of Census III is crucial for leveraging FOSS effectively while mitigating risks. The report emphasizes that businesses must invest in FOSS to sustain its role as a foundation of modern innovation. To ensure security and future advancements, the authors suggest sharing data on FOSS usage to improve transparency, coordinating efforts to adopt standardized naming and best practices, and investing in critical projects through funding, talent, and time. By embracing these strategies, organizations can support the FOSS ecosystem while strengthening their own resilience and competitive edge in a rapidly evolving digital economy.

Footnotes

(1) Non-npm refers to software packages, libraries, or components that are not managed or distributed through npm (Node Package Manager). npm is a widely used package manager for JavaScript, primarily associated with Node.js applications. Non-npm packages are hosted on alternative package management systems such as Maven (for Java), PyPI (for Python), NuGet (for .NET), Cargo (for Rust), or others tailored to specific programming languages or ecosystems.

References

[1] Frank Nagle, Kate Powell, Richie Zitomer, and David A. Wheeler, Census III of Free and Open Source Software (Harvard Business School, Laboratory for Innovation Science at Harvard, and Open Source Security Foundation, December 2024): 1-187, 6.

[2] Nagle et al., Census III, 2.

[3] Nagle et al., Census III, 2.

[4] Nagle et al., Census III, 2.

[5] Nagle et al., Census III, 2.

[6] Nagle et al., Census III, 2.

Meet the Authors


Frank Nagle is an Assistant Professor in the Strategy Unit at Harvard Business School and a faculty affiliate of the Digital Data Design Institute at Harvard and the Laboratory for Innovation Science at Harvard. Professor Nagle studies how competitors can collaborate on the creation of core technologies while still competing on the products and services built on top of them, especially in the context of artificial intelligence. His research falls into the broader categories of the future of work, the economics of IT, and digital transformation, and considers how technology is weakening firm boundaries. His work frequently explores crowdsourcing, free digital goods, cybersecurity, and generating strategic predictions from unstructured big data.

Kate Powell is the Program Manager at the Laboratory for Innovation Science at Harvard. At LISH, she works closely with staff, faculty, and postdoctoral fellows to manage various projects and administrative processes. Before joining LISH, she was a Research Coordinator at Tufts University’s Center for Applied Brain and Cognitive Science, where she worked with scientists from the U.S. Army’s Combat Capabilities Development Command to test the effects of stress on active-duty soldiers. She graduated from the Harvard Graduate School of Education with an Ed.M. in Human Development and Psychology.

Richie Zitomer is a Predoctoral Fellow at Harvard Business School working with the Strategy Unit. Before joining HBS he was a data scientist, most recently at Reddit and Coursera. He received a Master of Data Science from the University of British Columbia and a Bachelor of Arts in Philosophy, Politics & Economics from the University of Pennsylvania.

David A. Wheeler is an expert on open source software (OSS) and on developing secure software. His work on developing secure software includes the Open Source Security Foundation (OpenSSF) Secure Software Development (LFD121) course. He is the Director of Open Source Supply Chain Security at the Linux Foundation and teaches a graduate course in developing secure software at George Mason University (GMU). Dr. Wheeler has a PhD in Information Technology, a Master’s in Computer Science, a certificate in Information Security, a certificate in Software Engineering, and a B.S. in Electronics Engineering, all from GMU.
