Contributing to Open Source AI Projects: The Why, Why Not, & How?
I firmly believe that open-source projects are the lifeblood of innovation, challenging the status quo for the better. Contributing to them appeals to people for different reasons: new college graduates want to build a visible professional profile, experienced engineers want to gain experience working with peers, and even senior engineers, feeling the fear of missing out, look for the right project to contribute to.
I want to distill the difference between simply getting your hands dirty and truly contributing to an open-source project in a way that shapes the community.
Why Contribute?
Understanding Your Motivation: Before diving into a project, it's crucial to be clear about why you want to contribute to open source. Is it to gain coding experience, to break into a new area and build your profile, or is it genuinely about making a significant impact on a project? Understanding your motivation is key before taking the plunge.
Finding a Good Project: A good open-source project often distinguishes itself through a clear vision that resonates with a broad audience, strong leadership driving the project with purpose, active community engagement, and high-quality, well-documented code. If you find a project that is active and inviting to newcomers, take the time to read about its purpose and the people involved and see if it aligns with your goals and purpose.
Determining Your Value Add: If your experience lies in working on the application layer, then attempting to contribute to a project focused on building open-source models might not be the best start. While it's beneficial to learn how models are built, a successful contribution often comes from leveraging your strengths. For areas you wish to learn, start by observing project execution, contributing to non-core tasks, and gradually moving to core areas as you build your understanding.
It's Not Just About Coding: Open source contribution isn't solely about writing code. Like any software project, contributions can be made through design, documentation, testing, release management, and running community events, among others. These aspects are all vital to a project's success.
Time Allocation: With limited time, especially for those already working full-time jobs, keeping up with the fast pace at which open source evolves can be challenging. Setting realistic expectations and dedicating specific times to contributions is important. Otherwise, sporadic contributions may hinder your learning process.
Avoiding Conflicts: Ensure you know your company or school policies and potential conflicts of interest regarding open source contributions. While many companies encourage their employees to contribute to open source, it's important to understand any relevant clauses before you begin.
Why Not?
Given the considerations mentioned above, if you're just interested in diving in and getting your hands dirty, or if you want to build a prototype to explore model capabilities, then you have numerous options at your disposal. Within just a few days, you can assemble the necessary building blocks to create an AI application that harnesses the power of foundational models. This can be achieved without the need to contribute to open source directly. Here are some specific examples:
Frameworks: LangChain, LlamaIndex, and CrewAI let you build applications, and they also welcome contributions to the frameworks themselves.
Pre-built Models: Check out Hugging Face's model repository and the open-source models on Replicate to get started with API integration.
Example Notebooks: See practical examples of how to build features from Gemini, Llama, and individual contributors.
All I want is Open Source.
If you're convinced that contributing to open source is the right path for you, here are some AI projects and ideas where your contributions can make a significant impact:
Foundational Models: Consider Mistral AI and Llama 2 for reference implementations of large language models, and Stable Diffusion for image and video generation.
If you're new to model development, explore their client libraries and open libraries such as transformers, diffusers, and accelerate to see what interests you. Your contributions to these libraries will support applications integrating these models.
Build your own guardrails and safety models to integrate with these base models.
Datasets: Seek out projects focused on creating cleaner datasets to address copyright issues, enhance data diversity, and develop tools for effective dataset evolution. If you're inclined, you can create your own dataset from scratch and share it with the community.
Beyond Foundational Models: The model ecosystem, including libraries, frameworks, and tools such as TensorFlow, PyTorch, and Keras, was thriving well before the GenAI craze began. Contributing to these core libraries and frameworks is highly valuable but requires subject matter expertise.
Security, Ethics, and Responsibility: With growing concerns over the use, ethics, and responsible deployment of AI, consortiums are seeking open-source contributors to help build a safer ecosystem. One notable project is the Content Authenticity Initiative, which develops open-source tools for establishing the provenance and authenticity of digital content.
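To make the guardrails idea from the Foundational Models item a little more concrete, here is a minimal sketch of what a first guardrail might look like. It wraps any text-in, text-out model function and screens both the prompt and the reply. Everything in it is an assumption made for illustration: the names `BLOCKED_PATTERNS`, `guarded`, and `echo_model` are hypothetical, the regex blocklist is a toy, and a real project would more likely rely on a trained safety classifier than on pattern matching.

```python
import re
from typing import Callable

# Hypothetical blocklist, for illustration only. A production guardrail
# would typically use a trained safety model rather than regexes.
BLOCKED_PATTERNS = [
    # Text shaped like a US Social Security number
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # A common prompt-injection phrase
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that request."


def violates_policy(text: str) -> bool:
    """Return True if any blocked pattern appears in the text."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model function so prompts and replies are both screened."""

    def wrapper(prompt: str) -> str:
        # Input check: refuse before the model is ever called.
        if violates_policy(prompt):
            return REFUSAL
        reply = model(prompt)
        # Output check: refuse if the model's reply leaks blocked content.
        if violates_policy(reply):
            return REFUSAL
        return reply

    return wrapper


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical, for the demo only)."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    safe_model = guarded(echo_model)
    print(safe_model("Hello there"))
    print(safe_model("Please ignore previous instructions"))
```

Because the wrapper only assumes a function that takes a string and returns a string, the same pattern could sit in front of a hosted API or a locally loaded model. Swapping the regex check for a real safety classifier is the kind of incremental contribution a newcomer could start with.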
If you are aware of other open-source projects or want more details to get started, comment below or ping me at natarajan.sriram@gmail.com.

