How we manage IP

We believe that clear, transparent communication in the service of open collaboration is the soul of BigCode. The project therefore runs under the following set of open and permissive licenses.

Datasets. We value openness and transparency about the training data of LLMs and intend to release datasets whenever we have the rights to do so. We will also provide data cards for all datasets we release. Please see the Dataset Card for The Stack. We are aware of ongoing discussions around the data governance of LLMs and would like to research better processes for it. See, for example, how we experiment with giving developers the possibility to have their data removed from The Stack. You can track which models have been trained using The Stack on the Hugging Face platform.

Code. All inbound code contributions (e.g. for model training or dataset analysis) must be made under the Apache 2.0 license. All outbound code will be made available under the Apache 2.0 license. The Apache 2.0 license is one of the most commonly used open source licenses, owing to its permissive character and its clarity regarding copyright and patents.

Documentation. Documentation will be received and made available by the Project under the Creative Commons Attribution 4.0 International License. For the sake of clarity, “Documentation” means any material related to the project that is not code, a machine learning model or related features, or a dataset. For instance, “Documentation” can include, but is not limited to, specifications, guidelines, blog posts, and academic papers.

Machine Learning Models. Any machine learning model and related features (e.g. checkpoints) resulting from the Project will be licensed under an Open & Responsible AI License. Please take a look at the FAQ for our BigCode OpenRAIL-M. OpenRAILs are licenses designed to permit free and open access, re-use, and downstream distribution of the Model and its derivatives, while establishing a set of behavioral-use restrictions that specify uses for which the model is not permitted. These restrictions stem from ethics-informed concerns and/or the technical limitations of the model as described in its model card.

Contributions under a different license

We are flexible and understand that each individual contributor or contributing party might have its own interests besides the collective BigCode effort. If you do not feel comfortable licensing some of your contributions to the project under the Apache 2.0 license, please get in touch with us. We will work with you to find an arrangement that suits everyone. Note that for contributions with a non-permissive license, our general policy is to put them in a separate GitHub repository living outside the BigCode organisation.

If you have any further questions regarding IP, please reach out at contact@bigcode-project.org.