Linux lays down the law on AI-generated code, says yes to Copilot, no to AI slop, and humans take the fall for mistakes — after months of fierce debate, Torvalds and maintainers come to an agreement

Lee Duna@lemmy.nz · 1 month ago

Katherine 🪴@piefed.social · 1 month ago

Linux kernel being written by Microsoft’s AI.

Blue_Morpho@lemmy.world · 1 month ago

The title of the article is extraordinarily wrong, which makes it clickbait.

There is no “yes to Copilot.”

It is only a formalization of what Linux said before: all AI is fine, but a human is ultimately responsible.

“AI agents cannot use the legally binding ‘Signed-off-by’ tag, requiring instead a new ‘Assisted-by’ tag for transparency.”

The only mention of copilot was this:

“developers using Copilot or ChatGPT can’t genuinely guarantee the provenance of what they are submitting”

This remains a problem that the new guidelines don’t resolve. Even if AI is only used as a tool and a human reviews the result, the code the LLM produced could still have come from non-GPL sources.

marlowe221@lemmy.world · 1 month ago

Yeah, that’s also my question. Partially because I am a former-lawyer-turned-software-developer… but, yeah. How are the kernel maintainers supposed to evaluate whether a particular PR contains non-GPL code?

Granted, this was potentially an issue before LLMs too, but nowhere near the scale it will be now.

(In the interests of full disclosure, my legal career had nothing to do with IP law or software licensing - I did public interest law).

wonderingwanderer@sopuli.xyz · 1 month ago

If it’s flagged as “assisted by <LLM>” then it’s easy to identify where that code came from. If a commercial LLM is trained on proprietary code, that’s on the AI company, not on the developer who used the LLM to write code, unless they can somehow prove that the developer had access to said proprietary code and was able to personally exploit it.

If AI companies are claiming “fair use,” and it holds up in court, then there’s no way in hell open-source developers should be held accountable when closed-source snippets magically appear in AI-assisted code.

Granted, I am not a lawyer, and this is not legal advice. I think it’s better to avoid using AI-written code in general. At most use it to generate boilerplate, and maybe add a layer to security audits (not as a replacement for what’s already being done).

But if an LLM regurgitates closed-source code from its training data, I just can’t see how that would be the developer’s fault…

sem@piefed.blahaj.zone · 1 month ago

Pretty convenient.

This is how copyleft code gets laundered into closed source programs.

All part of the plan.

wonderingwanderer@sopuli.xyz · 1 month ago

How would they launder it? Just declare it their own property because a few lines of code look similar? When there’s no established connection between the developers and anyone who has access to the closed-source code?

That makes no sense. Please tell me that wouldn’t hold up in court.

ricecake@sh.itjust.works · 1 month ago

I believe what they’re referring to is the training of models on open-source code, which is then used to generate closed-source code.
The break in connection you mention means it isn’t legally infringement, but now code derived from open source is closed source.

Because of the untested nature of the situation, it’s unclear how it would unfold, likely hinging on how the request was formed.

We have similar precedent with reverse engineering, but the fact that a non-sentient tool is doing it makes it complicated.

wonderingwanderer@sopuli.xyz · 1 month ago

That makes sense. I see the problem with that, and I don’t have a good solution for it. It is a divergence of topic though, as we were discussing open-source programmers using LLMs which are potentially trained on closed-source code.

Training LLMs on open-source code is worth its own discussion, but I don’t see how it fits in this thread. The post isn’t about closed-source programmers using LLMs.

Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.

Still, training LLMs on open-source code is a questionable practice for that reason, particularly when it comes to training commercial models on GPL code. But it’s probably hard to prove what code was used in their datasets, since it’s closed-source.

ricecake@sh.itjust.works · 1 month ago

I don’t really see it as a divergence from the topic, since it’s the other side of a developer not being responsible for the code the LLM produces, like you were saying.
In any case, it’s not like conversations can’t drift to adjacent topics.

“Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.”

Yes, but that’s the point of laundering something. Before, if you put FOSS code in your commercial product, a human could be deposed in the lawsuit and the copying would be made public, and then there were consequences. Now you can do it openly and point at the LLM.

People don’t launder money so they can spend it, they launder money so they can spend it openly.

Regardless, it wasn’t even my comment, I just understood what they were saying and I’ve already replied way out of proportion to how invested I am in the topic.

stylusmobilus@aussie.zone · 1 month ago

“…any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.”

Watch Americans and their companies pull some mad gymnastics on apportioning blame for this.

Electricd@lemmybefree.net · 1 month ago

Well, yeah, it’s the human submitting the code and using a tool known to be imperfect.

Your comment is pretty dumb.

stylusmobilus@aussie.zone · 1 month ago

At this point the votes on that dumb comment stand at 23 to -5, sunshine.