As AI-generated content becomes a larger share of the digital world that already occupies much of our lives, the laws regulating this content become more consequential for everyday freedoms and aggregate economic welfare. Perhaps none of these laws is more important than copyright.
The jurisprudence on how copyright law applies to generative AI is being written as we speak, and, insofar as our legal system makes it through the next decade, these decisions will shape the future.
For example, Thaler v. Perlmutter, decided in March of this year, set the precedent that human authorship is required for copyright protection over any work. So the artworks produced by Stephen Thaler’s “Creativity Machine,” a neural-network image model like DALL-E, don’t get copyright protection. This decision makes it much easier for anyone to use the outputs of AI tools in their work. It also probably prevented an arms race in which AI companies try to produce and copyright as much artistic material as possible before anyone else gets there first. More speculatively, this decision could reserve a role for humans working with AIs even in a fully superhuman world, as rubber-stampers bringing AI’s work under the legal protections of copyright.
Other examples include:
Denmark passing a law that gives people stronger copyright over their own voice and physical features in response to AI deepfakes.
Several lawsuits from authors, artists, and musicians against AI companies for using their work in training are currently moving through US courts. For example, Concord Music Group is suing Anthropic for copyright infringement, alleging that by training on lyrics and reproducing those lyrics in the chat interface when asked, the company is “reproducing, distributing, and displaying someone else’s copyrighted works to build its own business.”
Another case, against GitHub, alleges that its Copilot coding tool violates the terms of open-source license agreements by failing to attribute authorship when it reproduces similar blocks of code for users.
Some older precedent is relevant to these cases. In Authors Guild v. Google (2015), for example, Google’s use of scanned book text in its search tools was ruled transformative fair use, the court holding that “the ‘statistical information’ pertaining to ‘word frequencies, syntactic patterns, and thematic markers’ in that book are beyond the scope of copyright protection.”
The outcome of these cases could undermine the utility of current LLM tools. How much more expensive does training get if even open-source code is outside the scope of fair use? Or if, every time an AI produces a similar code block, it has to emit a pile of useless copyright management information (CMI) tokens alongside it?
The decisions in these cases could also extract significant settlements or fines from AI firms, and could end up advantaging China in the AI arms race. Certain decisions might completely change which content is produced in the future and who gets rich from it. Hollywood’s current sequel obsession, for example, is probably due to the continual extension of copyright terms for media properties. Perhaps more importantly, enforcing IP law against AI outputs can require substantial surveillance to make sure no infringing content is produced.
So what are the right decisions in these cases?
There are specifics that are obviously important to the right decision in each case, so there isn’t a single answer across all of them. But there is a frame of analysis that is missing from most discussions of AI and IP:
Most people analyze intellectual property as a natural property rights issue rather than a question of properly subsidizing a positive externality.
Copyright isn't about rights
The confusion of intellectual property and property rights is fair enough given the name, but intellectual property is not a property right at all. Property rights are required because property is rivalrous and exclusive: When one person is using a pair of shoes or an acre of land, other people’s access is restricted. This central feature is not present for IP: an idea can spread to an infinite number of people and the original author’s access to it remains untouched.
There is no inherent right to stop an idea from spreading in the same way that there is an inherent right to stop someone from stealing your wallet. But there are good reasons why we want original creators to be rewarded when others use their work: Ideas are positive externalities.
When someone comes up with a valuable idea or piece of content, the welfare maximizing thing to do is to spread it as fast as possible, since ideas are essentially costless to copy and the benefits are large.
But coming up with valuable ideas often takes valuable inputs: research time, equipment, fixed production costs, etc. So if every new idea is immediately spread without much reward to the creator, people won’t invest these resources upfront, and we’ll get fewer new ideas than we want. A classic positive externalities problem.
Thus, we have an interest in subsidizing the creation of new ideas and content.
There are many ways governments subsidize new ideas: grants, prizes, loans, scholarships, etc. A major challenge with all of these methods is selection: which ideas should be rewarded and which should be ignored?
Intellectual property law is best understood as a cleverly designed subsidy that mimics property rights in order to avoid this central challenge of selection. Here’s Adam Smith explaining this advantage in the context of patents:
Thus the inventor of a new machine or any other invention has the exclusive privilege of making and vending that invention for the space of 14 years . . .
For if the legislature should appoint pecuniary rewards for the inventors of new machines, etc., they would hardly ever be so precisely proportioned to the merit of the invention as this is. For here, if the invention be good and such as is profitable to mankind, he will probably make a fortune by it; but if it be of no value he also will reap no benefit.
Essentially, copyright on a movie that no one likes isn’t worth anything, and a patent on an invention that no one uses is worth just as little. The value of the subsidy from intellectual property protection scales automatically in proportion to the market value of whatever is protected. So by mimicking property rights, the government can assign huge rewards to blockbuster films and not spend a dime on crank inventions, without having to pick any winners.
To fund this property-mimicking subsidy, the government excludes anyone else from using a copyrighted idea unless the owner sells access. This rewards the original creator but it does so by taxing the value that the idea would provide to everyone else. So the intellectual property subsidy gets us more new ideas, but at the cost of less value from each one because they don’t get to spread as fast as they could.
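The trade-off described above can be made concrete with a toy model. All the numbers and the uniform-valuation assumption here are mine, chosen purely for illustration, not drawn from any real market:

```python
# Toy model of the IP trade-off: an idea costs F upfront to create,
# N potential users value it at levels spread uniformly over [0, 1],
# and copying is free once the idea exists.

N = 1000        # potential users (illustrative assumption)
F = 150.0       # upfront creation cost (illustrative assumption)

def welfare_public_domain():
    # Everyone uses the idea for free, so gross surplus is the sum of
    # all valuations: N * mean value = N * 0.5. But the creator eats
    # the cost F and earns nothing, so without some other subsidy the
    # idea is never created at all.
    gross = N * 0.5
    return gross - F  # realized only if the creator is somehow repaid

def welfare_under_copyright():
    # A monopolist creator charges the revenue-maximizing price p.
    # With uniform valuations, demand at price p is N * (1 - p), so
    # revenue p * N * (1 - p) is maximized at p = 0.5.
    p = 0.5
    buyers = N * (1 - p)                  # users who value it above p
    revenue = p * buyers                  # the creator's reward
    buyer_surplus = buyers * (1 - p) / 2  # average surplus above the price
    deadweight_loss = N * p * p / 2       # excluded low-value users: the "tax"
    created = revenue > F                 # creator only enters if repaid
    total = ((revenue - F) + buyer_surplus) if created else 0.0
    return total, deadweight_loss, created

total, dwl, created = welfare_under_copyright()
print(f"first-best welfare (creation somehow induced for free): {N * 0.5 - F:.0f}")
print(f"welfare under copyright: {total:.0f} (deadweight loss: {dwl:.0f})")
```

With these numbers, copyright gets the idea created (revenue 250 covers the cost of 150), but the exclusion of low-value users burns 125 units of surplus relative to free spreading: the subsidy works, and it is paid for by taxing the idea’s diffusion.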
Things get more complicated because copyrighted or patented ideas can be inputs into the creation of other new ideas. For example, genes that were sequenced by private contractors during the Human Genome Project and placed under IP protection saw 20-30% less subsequent scientific research and product development than genes in the public domain, and patents randomly assigned to judges who are more likely to invalidate them receive 50% more follow-on citations from later inventors than patents assigned to more lenient judges. So raising the IP subsidy by strengthening copyright protection can have second-order effects that decrease the creation of new ideas, possibly outweighing the effect of the increased subsidy entirely.
Striking the welfare-maximizing balance between increasing incentives for new ideas and avoiding taxes on the spread of existing ones is the purpose of intellectual property.
IP and AI
This frame of intellectual property as a subsidy to new ideas clarifies the IP fights around AI. It’s not a question of natural rights or just deserts for artists and authors; it’s a question of whether we should raise subsidies for new ideas and increase taxes on their spread, or do the opposite: decrease direct subsidies for new ideas while making existing ones easier to spread, remix, and build upon.
This is not always an easy question, but I argue that generative AI decreases the optimal strength of copyright, and thus that courts should allow AI developers to use the code and works of others in the training process without legal risk or license fees.
Intellectual property exists to repay creators for the upfront costs of work that are difficult to recoup when selling access at marginal cost (often close to zero). But generative AI lowers the upfront costs of creating the kinds of content it is trained on. Writing code, producing economic research, and generating images are all easier, so less subsidy is required to induce people to do them.
Beyond the direct effects of generative AI, the fixed costs for all of these activities have been decreasing for decades while copyright protections have only risen. Producing and distributing movies, music, research, or invention is now so cheap that millions of terabytes of such content are uploaded to the internet for free every day.
Simultaneously, AI raises the value of knowledge and content in the public domain. Just as search-engine indexing made knowledge on the internet more valuable by making it easier to access, AI tools increase the value of the knowledge they are trained on. AI tools not only make it easy to find particular information on request; they know when to reference a certain article or section of open source code even when the user has no idea what to ask for.
When individual artists can use AI tools to create entire films or video games in their apartments, the cost we pay to support Warner Bros.’ exclusive rights to The Lord of the Rings increases. Not many independent creators could afford to recreate or extend movies based on Tolkien’s work using traditional Hollywood technology, but with advanced AI tools the cost is much lower, and so we lose much more by banning anyone else from trying to use the ideas.
Generative AI is decreasing the upfront costs that IP exists to subsidize while simultaneously increasing the value of the spread, remixed, and followed-on ideas that IP bans. Thus, courts should not make decisions that increase intellectual property subsidies.
Since the pure outputs of AI models are already non-copyrightable under Thaler v. Perlmutter, protecting the inputs under fair use would leave everything between your keyboard and Stargate open for the exploration and remix of all human knowledge. Inviting copyright law into this interface would amputate it.
The courts writing AI and IP jurisprudence today are setting the economic incentive structure around the 21st century’s most important technology. It is therefore crucial that these decisions be grounded in IP's true purpose as a welfare-maximizing subsidy rather than confused appeals to natural property rights.
Even "natural" property rights can be understood in similar terms. E.g., Hume argues that we give people rights to the fruits of their labor and land because we want people to make long-term investments like agriculture. If you knew you probably wouldn't get to reap what you sow (because bandits might steal it), you would do a lot less sowing. But a world where nobody tills, sows, etc. is a much poorer world than one where people do. So I totally agree that IP law regarding AI should be approached from a broadly consequentialist perspective, by thinking about what incentives we want to create. I'd just say that's true of regular property too.