AI Code Generators Article - Part 1 0423

The document discusses potential legal issues with training AI models on open source code and using the output of AI code generators. It addresses whether these activities could constitute copyright infringement or violate open source licensing terms. It also presents some practical solutions to mitigate these risks.

Solving Open Source Problems With AI Code Generators - Legal Issues and Solutions

PART 1 - LEGAL ISSUES
By: James Gatto

AI-based code generators are a powerful application of generative AI. These tools use AI models to auto-complete or
suggest code based on developer inputs or tests. They raise at least three types of potential legal issues:

• Does training AI models using open source code constitute infringement or, even if the use is licensed, does
doing so require compliance with conditions or restrictions of the open source licenses?

• Does using the output of an AI code generator subject the developer to infringement claims?

• Does use of AI-generated code by developers creating a new software application require the application to be
licensed under an open source license and its source code to be made available?

This article will address these legal issues and discuss some practical solutions to abate these problems. Part 1 of the
article covers the legal issues. Part 2 will cover solutions.

AI Code Generators
These tools can greatly simplify and expedite the code development process. The AI models used are typically trained
on billions of lines of code, mostly publicly available open source code. Based on a developer request and existing
code, the tool can generate suggested code ranging from snippets of code to fully coded functions. This is done in
real time in a matter of seconds. These tools are easy to use and work with many programming languages. A simple
example is shown below.
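
As a hypothetical illustration of this kind of interaction (not the output of any particular tool), a developer might type only a function signature and a descriptive docstring, and the tool might suggest the body. The function name and the suggested lines are assumptions made up for this sketch.

```python
from datetime import date


# Developer input: the signature and the docstring describing the intent.
def days_between(start: str, end: str) -> int:
    """Return the number of days between two ISO dates (YYYY-MM-DD)."""
    # Suggested completion (illustrative only): parse both dates and
    # subtract them to get a timedelta, then return its day count.
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


print(days_between("2023-04-01", "2023-04-30"))
```

The developer can accept, edit or reject the suggestion. The legal questions discussed in this article arise when a suggestion reproduces code that came from the training data.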

PAGE 1 www.lawoftheledger.com
Training AI Code Generator Models
The training data for AI code generator models is typically based on huge repositories of open source code. Many
people think that because the code used is open source it can be freely used with no legal problem. After all, the
point of open source is to freely permit its use. And true open source licenses do not discriminate against the use to
which open source software is put.1

Open Source License Basics

Open source software is typically free to use but that freedom is based on a license that accompanies the software.
Most open source licenses permit the user to copy, modify and redistribute the open source code. However, these
freedoms come with conditions. These conditions vary by license and can range from simple compliance obligations
to more onerous, substantive requirements.

Examples of simple compliance obligations include maintaining copyright notices, providing attribution and including
the license terms with any redistribution. The more substantive provisions can include the requirement that any
software that includes or is derived from the open source software must be licensed under the terms of the open
source license and the source code for that software must be made freely available. The effect of these conditions is
often referred to as “tainting” of the software. The licenses with these conditions are often called “restrictive” open
source licenses.
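
The distinction between simple compliance obligations and substantive "restrictive" conditions can be sketched as a small lookup table. This is a simplified illustration, not a legal classification: the `LicenseTerms` type, the `obligations` helper and the three example entries are assumptions chosen for the sketch, and real license analysis turns on the full license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    keep_notices: bool   # simple obligation: retain copyright notice and license text
    share_alike: bool    # substantive obligation: derived work under same license, source available


# Hypothetical, simplified table keyed by SPDX license identifier.
LICENSES = {
    "MIT": LicenseTerms(keep_notices=True, share_alike=False),
    "BSD-3-Clause": LicenseTerms(keep_notices=True, share_alike=False),
    "GPL-3.0-only": LicenseTerms(keep_notices=True, share_alike=True),
}


def obligations(spdx_id: str) -> list[str]:
    """List the obligations recorded for a license, or flag it for manual review."""
    terms = LICENSES.get(spdx_id)
    if terms is None:
        return ["unknown license: review manually"]
    result = []
    if terms.keep_notices:
        result.append("retain copyright notices and license terms on redistribution")
    if terms.share_alike:
        result.append("license derived software under the same terms and provide source code")
    return result


for license_id in ("MIT", "GPL-3.0-only", "Some-Custom-License"):
    print(license_id, "->", obligations(license_id))
```

In this sketch, only the entry with `share_alike=True` raises the tainting concern; the other two entries carry notice obligations only.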

For commercial developers, who desire to develop proprietary software that can be licensed for a fee under a
proprietary license, tainting is a huge problem. The value of software is severely diminished if the developer must
license it under an open source license and make the source code available. The reason is that the open source license
gives recipients the right to copy, modify and redistribute that software at no charge.

Whether simple compliance or more substantive obligations, failure to comply with those terms can result in legal
problems. Failure to comply can be deemed a breach of contract. Or it can result in termination of the license and
loss of right to use the open source software. Continued use after termination can give rise to claims for copyright
infringement.

Open Source Legal Issues With AI Code Generators

Does training AI models using open source code constitute infringement or, even if the use is licensed, does doing so require
compliance with conditions or restrictions of the open source licenses?

Training AI models using open source code, by itself, likely does not constitute infringement.2 As explained above, open
source licenses typically do not impose restrictions on the use of the open source code. However, legal problems can
arise if the open source license compliance obligations are not satisfied.

A recent lawsuit against CoPilot, an AI code generator, alleges that in training the model using open source code, the
tool stripped copyright notices and license terms from the code in violation of the licenses. It alleges that the
output of CoPilot copies the code (or portions of it) yet does not include the copyright information or attribution
notices or satisfy other compliance obligations. The legal claims include breach of contract for violation of license
terms, violation of the DMCA Section 1202 for removing copyright management information (CMI) and various other
claims. Section 1202 prohibits intentionally removing or altering any CMI or distributing works knowing that CMI has
been removed or altered.

1 For example, one of the fundamental tenets of the criteria for open source is that the “license must not restrict anyone from making use of
the program in a specific field of endeavor.” The Open Source Definition, https://opensource.org/osd/

2 Depending on how the open source code is obtained, other issues may arise. For example, some open source code repository platforms
have terms of use that cover use of their platform. Violation of those terms could present certain problems, but those issues are beyond
the scope of this paper.

Training AI models and stripping out CMI might be a violation of the DMCA Section 1202. It may also constitute
breach of contract for failure to comply with the relevant open source license terms. However, each of these issues
will be fact specific. One relevant fact is the specific license terms. For example, some open source licenses
require that the compliance obligations be met if the open source code is redistributed.3 Arguably, if Company A downloads
open source code and uses it to train its own models, on its own servers, at that point there is not yet a redistribution by
the company. If Company A’s AI code generator outputs that code in response to a user request, that likely becomes
a redistribution.

Another factual issue relates to how Company A trains its model. If the model includes the open source code, it
likely needs to maintain the CMI in that code. However, if the model is generated by learning information about the
code and later creates new code based on this information, the issues may be different. In this case, the model itself
may not include a copy of the code. Some of the AI code generators claim they do not look up copies of code to
generate their output.4

The bottom line is that it is necessary to consider the license terms and the method of training and using the model
to assess whether any legal violation has occurred by training an AI model with open source code.
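
To make the CMI question concrete, the sketch below compares the notice lines in an original source file with those in a piece of output and reports which ones are missing. It is a rough heuristic only, assuming that notices appear as comment lines containing common markers; the `NOTICE` pattern and the helper names are hypothetical, and a missing line is not by itself a legal conclusion under Section 1202 or any license.

```python
import re

# Assumed markers of copyright management information in source comments.
NOTICE = re.compile(
    r"copyright|\(c\)|spdx-license-identifier|licensed under", re.IGNORECASE
)


def notice_lines(code: str) -> set[str]:
    """Collect the lines of a file that look like copyright or license notices."""
    return {line.strip() for line in code.splitlines() if NOTICE.search(line)}


def missing_notices(original: str, output: str) -> set[str]:
    """Return notice lines present in the original but absent from the output."""
    return notice_lines(original) - notice_lines(output)


original = (
    "# Copyright (c) 2020 Example Author\n"
    "# SPDX-License-Identifier: MIT\n"
    "def f():\n"
    "    return 1\n"
)
output = "def f():\n    return 1\n"

print(sorted(missing_notices(original, output)))
```

Here the output reproduces the function but neither notice line, which is the pattern the lawsuit described above alleges.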

Does using the output of an AI code generator subject the developer to infringement claims?

Because open source licenses permit copying, modifying and redistributing the open source code, outputting the
code from the AI tool alone may not be an infringement. However, if the code is output and license compliance
obligations are not satisfied, that may be breach of contract. Under some open source licenses, such breach may
result in termination of the license. Continued use after termination may constitute infringement.

Does use of AI-generated code by developers creating a new software application require the application to be licensed
under an open source license and its source code to be made available?

If the code output from an AI code generator is covered by a restrictive open source license, use of that code in
another program taints that program. As explained above, this requires the program as a whole to be licensed under
the same terms as the restrictive open source license and requires the source code for the entire program to be made
available. This means recipients can copy, modify and redistribute the program for free. This is not an ideal solution if
the desire is to build proprietary software that can be licensed for a fee.

Solving the Open Source Problems with AI-Based Code Generators

The trio of problems addressed above seems insurmountable to some people. Many companies are banning the use of
AI code generators by their developers to avoid tainting issues and to minimize the likelihood of getting dragged into
a lawsuit for infringement. This is a legally safe option but prevents developers from obtaining the benefits of AI code
generators. Fortunately, there are a number of practical solutions that can mitigate these risks and enable developers
to safely use AI code generators.

3 For example, some versions of the BSD license state: “Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.”

4 For example, the CoPilot website states that it “generates new code in a probabilistic way, and the probability that they produce the
same code as a snippet that occurred in training is low. The models do not contain a database of code, and they do not ‘look up’ snippets.
Our latest internal research shows that about 1% of the time, a suggestion may contain some code snippets longer than ~150 characters
that matches the training set.”

In part 2 of this article, I will discuss some of these solutions. As a preview, these solutions can include:

• filters to prevent the output of problematic code

• code referencing tools to flag problematic output

• code scanning tools to assist developers with open source compliance.
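
As a rough idea of what the first of these solutions could look like, the sketch below flags a suggestion when it shares a long verbatim run with known code. It is not how any particular tool implements its filter: the `overlaps_known_code` helper, the whitespace normalization and the 150-character default (borrowed from the figure quoted in footnote 4) are assumptions for illustration.

```python
def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences do not hide a match."""
    return " ".join(code.split())


def overlaps_known_code(suggestion: str, known_sources: list[str],
                        min_chars: int = 150) -> bool:
    """Return True if any min_chars-long run of the suggestion appears in known code."""
    text = normalize(suggestion)
    if len(text) < min_chars:
        return False
    corpus = [normalize(src) for src in known_sources]
    # Slide a fixed-size window over the suggestion and look for it verbatim.
    for start in range(len(text) - min_chars + 1):
        window = text[start:start + min_chars]
        if any(window in src for src in corpus):
            return True
    return False


known = ["x = 1\n" * 60]  # stand-in for indexed open source code
print(overlaps_known_code("x = 1\n" * 60, known))   # long verbatim overlap
print(overlaps_known_code("print('hi')", known))    # too short to flag
```

A tool could suppress flagged suggestions (a filter) or instead show the matching source and its license (a code referencing approach); a production system would need an index rather than this linear scan.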

Conclusion
Many companies have banned the use of AI code generators by their developers due to the legal risks and uncertainty
resulting from the use of open source code to train the models. In my view, these issues are manageable by using
various known solutions. Use of these solutions can significantly mitigate the risk and uncertainty of using AI code
generators. Developers, companies and their in-house counsel struggling with how to manage legal risks with AI code
generators will definitely want to learn about these solutions.

For Part 2 of this article and more details on generative AI, please contact:

James Gatto
Blockchain and Fintech Team Co-leader
202.747.1945
[email protected]

Sheppard Mullin’s Blockchain Technology and Fintech team helps clients develop innovative and comprehensive legal
strategies to take advantage of what may be the most disruptive and transformative technology since the Internet.
We focus on advising clients on how to meet their business objectives, without incurring unnecessary legal risk. Our
team includes attorneys with diverse legal backgrounds who collectively understand the vast array of legal issues
with and ramifications of blockchain technology and digital currencies.

This alert is provided for information purposes only and does not constitute legal advice and is not intended to form an attorney client relationship.
Please contact your Sheppard Mullin attorney contact for additional information.
