Tabnine Introduces Ability to Flag Unlicensed Code in AI-generated Software
Tabnine, the originator of the AI code assistant category, today introduced Code Provenance and Attribution, a feature that lets enterprises use large-scale, popular LLMs for software development tasks while minimizing the likelihood of restrictively licensed code being injected into their codebase.
Large language models from Anthropic, OpenAI, and others have been trained on vast catalogs of content and code captured from publicly visible sources, many of which are not freely licensed. Because LLMs tend to generate content that matches what they have seen in training, using these vendors' models may introduce IP or copyright liabilities. With Provenance and Attribution, Tabnine checks code generated using AI chat or AI agents against code that is publicly visible on GitHub, flags any matches, and references the source repository as well as its license type. This information makes it easier for engineering teams to review the code being generated with the assistance of AI and decide whether the license of that code meets their specific standards and requirements.
With the new Provenance and Attribution capability, Tabnine can more easily support development teams, along with their legal and compliance teams, who want to leverage a wide variety of powerful models.
“Models trained on larger pools of data outside of permissively licensed open source code can offer superior performance, but enterprises who use them run the risk of running afoul of IP and copyright violations,” said Peter Guagenti, President at Tabnine. “Our Code Provenance and Attribution capability addresses this tradeoff, increasing productivity without sacrificing compliance. Experienced engineering teams expect to know the source and license of generative AI output and this feature ensures they do.”
Given that copyright law on the use of AI-generated content is still unsettled, Tabnine's proactive stance aims to drastically reduce the risk of IP infringement when enterprises use models like Anthropic's Claude, OpenAI's GPT-4o, and Cohere's Command R+ for software development.
Tabnine's license-compliant model, Tabnine Protected 2, which is trained exclusively on code that is permissively licensed, remains a critical offering. Many companies believe that the very use of an LLM trained on unlicensed software may introduce risk, so Tabnine will continue to support and develop this unique model. The new Provenance and Attribution capability adds support for legal and compliance teams who are comfortable using a wider variety of models as long as those models do not inject unlicensed code.
The Code Provenance and Attribution capability supports the full breadth of software development activities inside Tabnine, including code generation, code fixing, generating test cases, implementing Jira issues, and more. Since Tabnine reads code like a human, it flags not only output that exactly matches open source code on GitHub but also output that matches it functionally or in its implementation.
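To make the distinction between exact and functional matches concrete, the following is a minimal illustrative sketch in Python. It is not Tabnine's implementation, which the announcement does not describe. The `SnippetIndex` class, the `Match` record, the repository name, and the token-normalization approach are all assumptions made for illustration. The sketch only handles parseable Python input and uses identifier renaming as a crude stand-in for a functional match.

```python
import hashlib
import io
import keyword
import tokenize
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Match:
    repo: str      # hypothetical source repository name
    license: str   # license identifier recorded for that repository
    kind: str      # "exact" or "normalized"


def _sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_fingerprint(code: str) -> str:
    """Fingerprint of the code as written, ignoring leading/trailing whitespace."""
    return _sha(code.strip())


def normalized_fingerprint(code: str) -> Optional[str]:
    """Fingerprint that ignores comments, blank lines, and identifier names.

    Returns None when the input cannot be tokenized as Python.
    """
    parts = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(code).readline):
            if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
                continue
            if tok.type == tokenize.NEWLINE:
                parts.append("<NEWLINE>")
            elif tok.type == tokenize.INDENT:
                parts.append("<INDENT>")
            elif tok.type == tokenize.DEDENT:
                parts.append("<DEDENT>")
            elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                parts.append("ID")  # every non-keyword name collapses to one placeholder
            else:
                parts.append(tok.string)
    except (tokenize.TokenError, SyntaxError):
        return None
    return _sha(" ".join(parts))


class SnippetIndex:
    """Hypothetical in-memory index of known public snippets."""

    def __init__(self) -> None:
        self._exact: Dict[str, Tuple[str, str]] = {}
        self._normalized: Dict[str, Tuple[str, str]] = {}

    def add(self, repo: str, license_id: str, code: str) -> None:
        self._exact[exact_fingerprint(code)] = (repo, license_id)
        norm = normalized_fingerprint(code)
        if norm is not None:
            self._normalized[norm] = (repo, license_id)

    def check(self, generated: str) -> Optional[Match]:
        """Return the first match found, preferring an exact match."""
        hit = self._exact.get(exact_fingerprint(generated))
        if hit is not None:
            return Match(hit[0], hit[1], "exact")
        norm = normalized_fingerprint(generated)
        hit = self._normalized.get(norm) if norm is not None else None
        if hit is not None:
            return Match(hit[0], hit[1], "normalized")
        return None


if __name__ == "__main__":
    index = SnippetIndex()
    index.add("example-org/example-repo", "GPL-3.0-only",
              "def add(a, b):\n    return a + b\n")
    # Same logic with renamed identifiers and an added comment.
    print(index.check("def total(x, y):\n    # sum two values\n    return x + y\n"))
```

In this sketch a reviewer would see the repository and license attached to each flagged snippet and could decide whether that license is acceptable. A production system would need far more robust matching than identifier normalization.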
Tabnine expects to soon add the ability for users to identify specific repos, such as those maintained by competitors, and have Tabnine check generated code against them as well. Additionally, Tabnine plans to add a censorship capability, allowing Tabnine administrators to remove matching code before it is displayed to the developer.
Code Provenance and Attribution is in Private Preview, open to any Tabnine enterprise customer, and works on all available models, including Anthropic, OpenAI, Cohere, Llama, Mistral, and Tabnine. Learn more about Code Provenance and Attribution here.
About Tabnine
Tabnine helps development teams of every size use AI to accelerate and improve the software development life cycle. As the original AI coding assistant, Tabnine has been used by millions of developers around the world to boost code quality and developer happiness using generative AI. Unlike other coding assistants, Tabnine is the AI that you control; it is extensively personalized to your engineering team, private and secure (easily running in your controlled environments), never stores or trains on your company's code or user data, and offers models trained exclusively on open-source code with permissive licenses to eliminate IP risks. Learn more at tabnine.com or follow us on LinkedIn.
Contact
press@tabnine.com
2321 Rosecrans Avenue, Suite 2200
El Segundo 90245, United States