Sarah Silverman Sues Maker Of ChatGPT For Copyright Infringement

DL :)@lemmy.ml · 1 year ago

Sarah Silverman Sues Maker Of ChatGPT For Copyright Infringement

Max_Power@feddit.de · edit-2 1 year ago

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not a copyright infringement. If there is a case for fair-use it’s a summary.

The comic’s suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

jmcs@discuss.tchncs.de · 1 year ago

I guess they will get to analyze OpenAI’s dataset during discovery. I bet OpenAI didn’t have authorization to use even 1% of the content they used.

erogenouswarzone@lemmy.ml · 1 year ago

SS is such a tool. Does anybody remember the big anti-gay speech that launched her career in The Way of the Gun? She’ll do anything to get ahead.

Here’s the speech: https://www.youtube.com/watch?v=PAl5xGi7urQ

PipedLinkBot@feddit.rocks · 1 year ago

Here is an alternative Piped link(s): https://piped.video/watch?v=PAl5xGi7urQ

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

another_kbin_addict@lemmy.world · 1 year ago

Good piped bot

Tosti@feddit.nl · edit-2 9 months ago

deleted by creator

Margot Robbie@lemmy.world · 1 year ago

She’s going to lose the lawsuit. It’s an open and shut case.

“Authors Guild, Inc. v. Google, Inc.” is the precedent case, in which the US Supreme Court established that transformative digitalization of copyrighted material inside a search engine constitutes as fair use, and text used for training LLMs are even more transformative than book digitalization since it is near impossible to reconstitute the original work barring extreme overtraining.

You will have to understand why styles can’t and should not be able to be copyrighted, because that would honestly be a horrifying prospect for art.

patatahooligan@lemmy.world · 1 year ago

“Transformative” in this context does not mean simply not identical to the source material. It has to serve a different purpose and to provide additional value that cannot be derived from the original.

The summary that they talk about in the article is a bad example for a lawsuit because it is indeed transformative. A summary provides a different sort of value than the original work. However if the same LLM writes a book based on the books used as training data, then it is definitely not an open and shut case whether this is transformative.

Margot Robbie@lemmy.world · 1 year ago

But what an LLM does meets your listed definition of transformative as well, it indeed provides additional value that can’t be derive from the original, because everything it outputs is completely original but similar in style to the original that you can’t use to reconstitute the original work, in other words, similar to fan work, which is also why the current ML models, text2text or text2image, are called “transformers”. Again, works similar in style to the original cannot and should not be considered copyright infringement, because that’s a can of worm nobody actually wants to open, and the courts has been very consistent on that.

So, I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work, that they wouldn’t consider digitalizing copyrighted material in a database with very lossy compression (that’s a more accurate description of what LLMs are, please give this a read if you have time) fair use as well. Of course, with the current Roberts court, there is always the chance that weird things can happen, but I would be VERY surprised.

There is also the previous ruling that raw transformer output cannot be copyrighted, but that’s beyond the scope of this post for now.

My problem with LLM outputs is mostly that they are just bad writing, and I’ve been pretty critical against “”“Open”""AI elsewhere on Lemmy, but I don’t see Siverman’s case going anywhere.

patatahooligan@lemmy.world · 1 year ago

But what an LLM does meets your listed definition of transformative as well

No it doesn’t. Sometimes the output is used in completely different ways but sometimes it is a direct substitute. The most obvious example is when it is writing code that the user intends to incorporate into their work. The output is not transformative by this definition as it serves the same purpose as the original works and adds no new value, except stripping away the copyright of course.

everything it outputs is completely original

[citation needed]

that you can’t use to reconstitute the original work

Who cares? That has never been the basis for copyright infringement. For example, as far as I know I can’t make and sell a doll that looks like Mickey Mouse from Steamboat Willie. It should be considered transformative work. A doll has nothing to do with the cartoon. It provides a completely different sort of value. It is not even close to being a direct copy or able to reconstitute the original. And yet, as far as I know I am not allowed to do it, and even if I am, I won’t risk going to court against Disney to find out. The fear alone has made sure that we mere mortals cannot copy and transform even the smallest parts of copyrighted works owned by big companies.

I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work

Which case are you citing? Context matters. LLMs aren’t just a database. They are also a frontend to extract the data from these databases, that is being heavily marketed and sold to people who might otherwise have bought the original works instead.

The lossy compression is also irrelevant, otherwise literally every pirated movie/series release would be legal. How lossy is it even? How would you measure it? I’ve seen github copilot spit out verbatim copies of code. I’m pretty sure that if I ask ChatGPT to recite me a very well known poem it will also be a verbatim copy. So there are at least some works that are included completely losslessly. Which ones? No one knows and that’s a big problem.

Ziro@lemmy.world · edit-2 1 year ago

Let’s remove the context of AI altogether.

Say, for instance, you were to check out and read a book from a free public library. You then go on to use some of the book’s content as the basis of your opinions. More, you also absorb some of the common language structures used in that book and unwittingly use them on your own when you speak or write.

Are you infringing on copyright by adopting the book’s views and using some of the sentence structures its author employed? At what point can we say that an author owns the language in their work? Who owns language, in general?

Assuming that a GPT model cannot regurgitate verbatim the contents of its training dataset, how is copyright applicable to it?

Edit: I also would imagine that if we were discussing an open source LLM instead of GPT-4 or GPT-3.5, sentiment here would be different. And more, I imagine that some of the ire here stems from a misunderstanding of how transformer models are trained and how they function.

Margot Robbie@lemmy.world · 1 year ago

I’m tired of internet arguments. If you are not going to make a good faith attempt to understand anything I said, then I see no point in continuing this discussion further. Good day.

Sagrotan@lemmy.world · 1 year ago

Like the record labels sued every music sharing platform in the early days. Adapt. They’re all afraid of new things but in the end nobody can stop it. Think, learn, work with it, not against it.

diskmaster23@lemmy.one · 1 year ago

I think it’s valid. This isn’t about the tech, but the sources of your work.

Sagrotan@lemmy.world · 1 year ago

Of course it’s valid. And the misuse of AI has to be fight. Nevertheless we have to think differently in the face of something we cannot stop in the long run. You cannot create a powerful tool and only misuse it. I miscommunicated here, should’ve explained myself, I got no excuses, maybe one: I sat on the shitter and wanted to make things short.

Riptide502@lemm.ee · 1 year ago

AI is a duel sided blade. On one hand, you have an incredible piece of technology that can greatly improve the world. On the other, you have technology that can be easily misused to a disastrous degree.

I think most people can agree that an ideal world with AI is one where it is a tool to supplement innovation/research/creative output. Unfortunately, that is not the mindset of venture capitalists and technology enthusiasts. The tools are already extremely powerful, so these parties see them as replacements to actual humans/workers.

The saddest example has to be graphic designers/digital artists. It’s not some job that “anyone can do.” It’s an entire profession that takes years to master and perfect. AI replacement doesn’t just mean taking away their job, it’s rendering years of experience worthless. The frustrating thing is it’s doing all of this with their works, their art. Even with more regulations on the table, companies like adobe and deviant art are still using shady practices to unknowingly con users into building their AI algorithms (quietly instating automatic OPT-IN and making OPT-OUT options difficult). It’s sort of like forcing a man to dig their own grave.

You can’t blame artists for being mad about the whole situation. If you were in their same position, you would be just as angry and upset. The hard truth is that a large portion of the job market could likely be replaced by AI at some point, so it could happen to you.

These tools need to be TOOLS, not replacements. AI has it’s downfalls and expert knowledge should be used as a supplement to both improve these tools and the final product. There was a great video that covered some of those fundamental issues (such as not actually “knowing” or understanding what a certain object/concept is), but I can’t find it right now. I think the best comes when everyone is cooperating.

Steeve@lemmy.ca · 1 year ago

Even as tools, every time we increase worker productivity without a similar adjustment to wages we transfer more wealth to the top. It’s definitely time to seriously discuss a universal basic income.

givesomefucks@lemmy.world · edit-2 1 year ago

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

Both filings make a broader case against AI, claiming that by definition, the models are a risk to the Copyright Act because they are trained on huge datasets that contain potentially copyrighted information

They’ve got a point.

If you ask AI to summarize something, it needs to know what it’s summarizing. Reading other summaries might be legal, but then why not just read those summaries first?

If the AI “reads” the work first, then it would have needed to pay for it. And how do you deal with that? Is a chatbot treated like one user? Or does it need to pay for a copy for each human that asks for a summary?

I think if they’d have paid for a single ebbok Library subscription they’d be fine. However the article says they used pirate libraries so it could read anything on the fly.

Pointing an AI at pirated media is going to be hard to defend in court. And a class action full of authors and celebrities isn’t going to be a cakewalk. They’ve got a lot of money to fight, and have lots of contacts for copyright laws. I’m sure all the publishers are pissed too.

Everyone is going after AI money these days, this seems like the rare case where it’s justified

limeaide@lemmy.ml · 1 year ago

Can the sources where ChatGPT got it’s information from be traced? What if it got the information from other summaries?

I think the hardest thing for these companies will be validating the information their AI is using. I can see an encyclopedia-like industry popping up over the next couple years.

Btw I know very little about this topic but I find it fascinating

rainroar@lemmy.ml · 1 year ago

Yes! They publish the data sources and where they got everything from. Diffusers (stable diffusion/midjoirny etc) and GPT both use tons of data that was taken in ways that likely violate that data’s usage agreement.

Imo they deserve whatever lawsuits they have coming.

radarsat1@lemmy.ml · 1 year ago

likely violate that data’s usage agreement.

It doesn’t seem to be too common for books to include specific clauses or EULAs that prohibit their use as data in machine learning systems. I’m curious if there are really any aspects that cover this without it being explicitly mentioned. I guess we’ll find out.

rainroar@lemmy.ml · 1 year ago

I think with a book your standard digital license / copyright would forbid it, would it not?

radarsat1@lemmy.ml · 1 year ago

Maybe. I’m interested in the specifics.

Beej Jorgensen@lemmy.sdf.org · 1 year ago

It depends on if the summary is an infringing derivative work, doesn’t it? Wikipedia is full of summaries, for example, and it’s not violating copyright.

If they illegally downloaded the works, that feels like a standalone issue to me, not having anything to do with AI.

TWeaK@lemm.ee · 1 year ago

Wikipedia is a non profit whose primary purpose is education. ChatGPT is a business venture.

Rivalarrival@lemmy.today · 1 year ago

A book review published in a newspaper is a commercial venture for the purpose of selling ads. The commercial aspect doesn’t make the review an infringement.

A summary is a “Transformative Derivation”. It is a related work, created for a fundamentally different purpose. It is a discussion about the work, not a copy of the work. Transformative derivations are not infringements, even where they are specifically intended to be used for commercial purposes.

TWeaK@lemm.ee · 1 year ago

A book review is most likely critical, and thus falls under fair use.

A summary is not critical, so would not have a fair use exemption. I would also disagree that it is transformative. That argument is about work that is so different to the original that it must be considered a separate piece (eg new music that uses a sample from old music). A summary is inherently not transformative, because it is merely a shortened version of the original - the ideas expressed are the same.

Rivalarrival@lemmy.today · 1 year ago

Transformative doesn’t mean that the idea is different. It means the purpose for expressing the idea is different. Informing an individual or the general public of the general idea presented in a book is not an infringement. If it were, every book report every student is ever asked to write would be an infringement.

TWeaK@lemm.ee · 1 year ago

https://en.m.wikipedia.org/wiki/Transformative_use

Transformativeness is a characteristic of such derivative works that makes them transcend, or place in a new light, the underlying works on which they are based.

A summary would not place the original work in a new light. A summary is the same work but shorter. A summary would be infringement.

Student book reports are for educational purposes, which has its own specific exemption under fair use. As does work which is critical of the original, along with news. A critical piece, for example, is transformative because it introduces new ideas, talking about the work and framing it in new ways.

AI meets none of these exemptions with a summary. It’s debatable whether it even could meet these exemptions in the way that it functions.

Rivalarrival@lemmy.today · edit-2 1 year ago

Student book reports are for educational purposes, which has its own specific exemption under fair use. As does work which is critical of the original, along with news. A critical piece, for example, is transformative because it introduces new ideas, talking about the work and framing it in new ways.

You’re forgetting two other important categories of fair use. Paste that student’s book report in a newspaper, and it is no longer “educational”, but it is still “news reporting”. “Author publishes work” is a newsworthy event.

Paste it in response to an individual asking about the work, and again, it is no longer educational, but it is still “commentary”, which is much the same as news reporting but with a typically smaller audience.

Even if these two categories of fair use were not specifically included in copyright law, they would naturally arises from the right to free speech. Making a summary subject to the original copyright would make it unlawful for anyone to even discuss the work at all.

Peanut@sopuli.xyz · 1 year ago

Personally I find this stupid. If we have robots walking around, are they going to be sued every time they see something that’s copywrited?

It’s this what will stop progress that could save us from environmental collapse? That a robot could summarize your shitty comedy?

Copywrite is already a disgusting mess, and still nobody cares about models being created specifically to manipulate people en mass. “What if it learned from MY creations” asks every self obsessed egoist in the world.

Doesn’t matter how many people this tech could save after another decade of development. Somebody think of the [lucky few artists that had the connections and luck to make a lot of money despite living in our soul crushing machine of a world]

All of the children growing up abused and in pain with no escape don’t matter at all. People who are sick or starving or homeless do no matter. Making progress to save the world from immanent environmental disaster doesn’t matter. Let Canada burn more and more every year. As long as copywrite is protected, all is well.

Dr. Jenkem@lemmy.blugatch.tube · edit-2 1 year ago

How do you figure that AI is the answer to environmental collapse? Don’t get me wrong, copyright law is stupid, but I guess I just don’t buy into all of the AI hype to the extent that others are.

Peanut@sopuli.xyz · 1 year ago

I believe it will require a level and pace of informational processing that is far beyond what humans will accomplish alone. just having a system that can efficiently sift through the excess existing papers, and find correlations or contradictions would be amazing for development of new technology. if you are paying attention to any environmental sciences right now, it’s terrifying in an extremely real and tangible way. we will not outpace the collapse without an intense increase in technological development.

if we bridge the gap of analogical comprehension in these systems, they could also start introducing or suggesting technologies that could help slow down or reverse the collapse. i think this is much more important than making sure sarah silverman doesn’t have her work paraphrased.

PowerCrazy@lemmy.ml · 1 year ago

We already know how to stop climate change, but we, as in capitalist society, does not want to.