The "AI is theft" debate - An argument

Christoffer

This statement has been used ever since the question of training data was first asked. It's starting to become a very polarized debate in which nuances get lost and the actual relevant questions get dismissed in all the shouting noise.

The main question, however, is: is training data theft?

Numerous research studies have found links between how the human mind generates new ideas and how AI models do it. Not in a self-aware way of intelligently guiding thought, but in the basic fundamental synthesis of data into new forms.

This poses a question: if the physical process is mimicked, then how can training data be considered theft?

Here are two scenarios.

A) A person has a perfect photographic memory. They go to the library every day for 20 years and read every book in that library. They then write a short story drawing upon all that has been read and seen in that library during these 20 years.

B) A tech company lets an algorithm read through all the books at the same library, which produces a Large Language Model based on that library as its training data. It's then prompted to write a short story and draws upon all it has read and seen through the algorithm.

Why is B considered theft? The system has within it the same type of storage as a human brain uses to store memories; the texts aren't direct copies of any book in the library any more than a person with photographic memory has copies of the same books in their memory. Both have outputs that synthesize from the same books, and both have physical processes that function the same way through their neural network structures. The only difference is that the person and the "prompter" are one and the same in scenario A, and separated into the system and the human prompter in scenario B.

The counter argument to this is that tech companies haven't paid for the books and texts they've put into the training data. But the person with a photographic memory in scenario A did not pay either, yet is able to recall these texts anyway. Not having paid for a book does not block people from reading a book they find or borrow.

Another counter argument is that the LLMs are able to write out direct copies of the texts they have "in memory", but this is also true for the person with photographic memory. This is also an alignment problem that some tech companies have moved to mitigate, and asking a model to write direct quotes from something copyrighted is not possible. In the end, a person needs to directly manipulate the system and intentionally force it to make direct copies. But how is this different from someone with a photographic memory intentionally writing down a chapter from a book in that library?

The intention of making a direct copy of something isn't made by the system, it's made by the person making that decision. It's like blaming the gun for murdering someone and not the killer. You can argue that taking away the access to guns makes it harder for the killer to commit murder, but someone set on killing someone will just find other means of doing it. It does not change the equation.

So if the act of gathering memories and training your own conscious identity (your mind) for the purpose of creating something is physically and systematically identical to how these AI models function, outside of the question of intent, then how can an argument for theft be formed against these systems?

How can an argument for these models being "plagiarism machines" be made when the system itself doesn't have any intention of plagiarism? It can accidentally plagiarize something, but in the same way a human can accidentally plagiarize a text. This happens all the time and it's always the responsibility of the writer to make sure it doesn't happen. The best AI systems are also tuned to reduce the chance of this happening, something that humans aren't.

...the real problem starts when students are asked to deliver well-written and unique assignments within a short span. This situation isn’t limited to academics only. Many writers often face this issue when they have to write content on similar topics repeatedly. The chances of accidental plagiarism become extremely high in such situations....

...it is essential for every writer to check for plagiarism before posting it or sharing it with their clients to avoid consequences.

We can also argue that these systems mitigate plagiarism overall. If a system is aligned to counteract plagiarism and make sure it doesn't plagiarize, then the risk that people doing research accidentally plagiarize is reduced further if they use these systems for that research.

This debate has its roots in the old Napster lawsuits. But a problem arises in the context of the actions taken by those using the system. Napster (and also Pirate Bay) enabled people to download the products themselves for passive consumption, and that consumption is exactly what the product's monetization is built on. Napster essentially provided the copying machine built for the specific purpose of spreading products for free. But these AI systems aren't spreading anything. The claim against these AI models is rather that they enable people to plagiarize. The claim is that these companies used copyrighted material, but they didn't spread it and they are actively working against their systems producing anything copyrighted. So the intent of plagiarizing fundamentally falls on the user of the system, if that user is even able to produce actual plagiarized material.

-------

Among the many critics of AI are concept artists. I've been a fan of concept art for a long time and I know a lot about how concept art is produced for film and game studios. There is a particular problem with concept artists' argument against AI: very often, in their own process, they utilize copyrighted material and directly use these images in their own work.

The argument against image generative models is that they copy elements into new images. But this isn't the case. Diffusion models generate images in the same manner as the human mind generates images. We form images through our pattern recognition and generate them in an almost hallucinatory way when we picture something. There are never any copyrighted images being directly copied into what diffusion models produce, and even when these models are pushed to make a direct copy they cannot duplicate anything better than a human who's really bad at duplicating a photograph.

But many concept artists have for years been manually scraping copyrighted photos through Google image searches and using them in a process called "photobashing".

Here's an example:

While many focus on using stock photos for this process, this is not true in all cases. Since the time when concept artists started to heavily criticize diffusion models, they've been more inclined to communicate that photos in their photobashing are from stock libraries.

But just a few years ago, manual scraping for images used in concept art was a fairly common practice. So these concept artists have been making money on photobashing copyrighted photos into their concept art for years, but now they criticize diffusion models (which don't even do this) for infringing on copyright, effectively calling it "theft". Is this not clearly a double standard?

--------

In conclusion, I have a hard time being convinced that these AI models can be criticized in terms of "theft" in the way they are today. The systems themselves do not function in the way people seem to think they do, and the way they function is so similar to the physical and systemic processes of human creativity that any ill-intent to plagiarize can only be blamed on a user having that intention. All while many artists have been directly using other people's work in their production for decades in a way that is worse than how these models synthesize new text and images from their training data.

We can blame these companies for things like using private emails and private information in their training data. But those are grounds of privacy infringement, arguments about security and unlawful use of private information. The actual debate, however, seems entirely focused on "theft" of copyrighted material, for which I cannot find any functioning reasoning.

Data that is officially released to the public and then used as training data is essentially out in the open. The critics of AI systems are essentially trying to argue that an unofficial copy that is made behind closed doors, without further spread or passive consumption by those making it, and that isn't released to the public, is "theft". That would then mean that if you borrowed a book, took a picture of a page and sent it to a friend, you have committed a form of theft that is more serious than what they accuse the AI systems of doing, even though these systems "remember" a page rather than making a direct copy.

And if there's no rational argument for it being theft, then artists and people criticizing AI like this risk being shut down by court rulings in lawsuits that set the groundwork for a society's laws around AI systems and their use. Proving theft could be impossible, and because of this I think the debate has to become more mature and get away from the polarized masses who end up in echo chambers and groupthink mentalities.

Pushing ill-conceived arguments based solely on disinformation out of the fear of losing one's job to an AI can end up giving these companies more power in the end.

Rather we should focus on how to co-exist with these systems and argue for them being in assistance of artists and writers rather than them being a threat to their existence. That there are people out there believing that these systems will make them writers and artists out of nothing isn't any different than when digital still cameras with video functions were supposed to enable everyone to be a professional cinematographer. That never happened; what happened was that real cinematographers started utilizing these cameras as part of their toolkit.

So it seems that this whole debate is purely driven by fear and not nearly enough by rational thought.

jkop

How can an argument for these models being "plagiarism machines" be made when the system itself doesn't have any intention of plagiarism? — Christoffer

The user of the system is accountable, and possibly its programmers as they intentionally instruct the system to process copyright protected content in order to produce a remix. It seems fairly clear, I think, that it's plagiarism and corruption of other people's work.

jkop

the way they function is so similar to the physical and systemic processes of human creativity that any ill-intent to plagiarize can only be blamed on a user having that intention. All while many artists have been directly using other people's work in their production for decades in a way that is worse than how these models synthesize new text and images from their training data. — Christoffer

What's similar is the way they appear to be creative, but the way they appear is not the way they function.

A machine's iterative computations and growing set of syntactic rules (passed for "learning") are observer-dependent and, as such, very different from a biological observer's ability to form intent and create or discover meanings.

Neither man nor machine becomes creative by simulating some observer-dependent appearance of being creative.

Christoffer

The user of the system is accountable — jkop

If the user asks for an intentionally plagiarized copy of something, or a derivative output, then yes, the user is the only one accountable as the system does not have intention on its own.

possibly its programmers as they intentionally instruct the system to process copyright protected content in order to produce a remix. It seems fairly clear, I think, that it's plagiarism and corruption of other people's work. — jkop

But this is still a misunderstanding of the system and how it works. As I've stated in the library example, you are yourself feeding copyrighted material into your own mind that's synthesized into your creative output. Training a system on copyrighted material does not equal copying that material, THAT is a misunderstanding of what a neural system does. It memorizes the data in the same way a human memorizes data as neural information. You are confusing the "intention" that drives creation with the underlying physical process.

There is no fundamental difference between you learning knowledge from a book and these models learning from the same book. And the "remix" is fundamentally the same between how the neural network forms the synthesis and how you form a synthesis. The only difference is the intention, which in both systems, the mind and the AI model, is the human input component.

So it's not clear at all that it's plagiarism, because the description of the system that you gave isn't correct about how it functions. And it's this misunderstanding of how neural networks and machine learning function, and this mystification of how the human mind works, that produce these faulty conclusions.

If we are to produce laws and regulations for AI, they will need to be based on the most objective truths about how these systems operate. When people make arguments that draw arbitrary lines between how artificial neural networks work and how the neural networks in our brains work, then we get into problematic and arbitrary differences that fundamentally spell out an emotional conclusion: "we shouldn't replicate human functions", and the followup question becomes "on what grounds?" Religious? Spiritual? Emotional? None of which is grounds for laws and regulations.

What's similar is the way they appear to be creative, but the way they appear is not the way they function. — jkop

That's not what I'm talking about or what this argument is about. The appearance of creativity is not the issue or a fundamental part of this. The function, the PHYSICAL function of the system, is identical to how our brain functions within the same context. The only thing that's missing is the intention, the driving force in the form of the "prompt" or "input request". Within us, our creativity is interlinked with our physical synthesis system and thus it also includes the "prompt"; the "intention" of creation. AI systems, however, only have the synthesis system, but that in itself is not breaking any copyrights any more than our own mind does when it experiences something and dreams, hallucinates and produces ideas. The intention to plagiarize is a decision and that decision is made by a human; that responsibility is taken on by a human at the point of the "intention" and "prompt", not before it.

And so, the argument can be made that a person who reads three books and writes a derivative work based on those three books is doing the same as someone who prompts an AI to write derivative work. Where do you put the blame? On reading the three books (training the models), or the intention of writing the derivative work (prompting the AI model to write derivative)?

If you put the blame on the act of training the models, then you also put blame on reading the books. Then you are essentially saying that the criminal act of theft is conducted by all people all the time they read, see, hear and experience someone else's work. Because that is the same as training these models, it is the same fundamental process.

But putting blame on that is, of course, absurd. We instead blame the person's intent of writing a derivative piece. We blame the act of wanting to produce that work. And since the AI models don't have that intent, you cannot logically put blame on the act of training these models on copyrighted material, because there's nothing in that act that breaks copyright. It's identical to how we humans consume copyrighted material, storing it in neural memory form. And a person with photographic memory excels at this type of neural storage, exactly like these models.

A machine's iterative computations and growing set of syntactic rules (passed for "learning") are observer-dependent and, as such, very different from a biological observer's ability to form intent and create or discover meanings. — jkop

I'm not sure if you really read my argument closely enough because you keep mixing up intention with the fundamental process and function of information synthesis.

There are two parts of creation: 1) The accumulation of information within neural memory and its synthesis into a new form. 2) The intention for creation that guides what is formed. One is the source and one is the driving principle for creation out of that source. The source itself is just a massive pool of raw floating data, both in our minds and in these systems. You cannot blame any of that for the creation because that is like blaming our memory for infringing on copyrighted material. It's always the intention that defines if something is plagiarized or derivative. And yes, the intention is a separate act. We constantly produce ideas and visions even without intent. Hallucinations and dreams form without controlled intent. And it's through controlled intent we actually create something outside of us. It is a separate act and part of the whole process.

And this is the core of my argument. Blaming the process of training AI models as being copyright infringement looks to be objectively false. It's a fundamental misunderstanding of how the system works, and such accusations seem to more or less come from people's emotional hate for big tech. I'm also critical of big tech, but people will lose much more control over these systems if our laws and regulations get defined by court rulings that we lose because the people's side made naive emotional arguments about things they fundamentally don't understand: misunderstandings of the system, of the science of neural memory and functions, and of how our own minds work.

I ask, how are these two different?

A) A person has a perfect photographic memory. They go to the library every day for 20 years and read every book in that library. They then write a short story drawing upon all that has been read and seen in that library during these 20 years.

B) A tech company lets an algorithm read through all the books at the same library, which produces a Large Language Model based on that library as its training data. It's then prompted to write a short story and draws upon all it has read and seen through the algorithm. — Christoffer

This is the core of this argument. The difference between intention and the formation of the memory core that creation is drawn from and how synthesis occurs. That formation and synthesis in itself does not break copyright laws, but people still call it theft without thinking it through properly.

Neither man nor machine becomes creative by simulating some observer-dependent appearance of being creative. — jkop

This isn't about creativity. This isn't about whether or not these systems "can be creative". The "creative process" does not imply these systems being able to produce "art". This type of confusion in the debate is what makes people unable to discuss the topic properly. The process itself, the function that simulates the creative process, is not a question of "art" and AI; that's another debate.

The problem is that people who fight against AI, especially artists fighting against these big tech companies, put blame on the use of copyrighted material in training these models, without understanding that this process is identical to how the human mind is "trained" as well.

All while many artists, in their own work and processes, directly use other people's work in the formation of their own, but still draw some arbitrary line between themselves and the machines when these machines seem to do the same, even though things like diffusion models are fundamentally unable to produce direct copies in the same way as some of these artists do (as per the concept art example).

It's a search for a black sheep in the wrong place. Antagonizing the people in control of these systems rather than trying to promote themselves, as artists, to be part of the development and help form a better condition for future artists.

RussellA

Numerous research studies have found links between how the human mind generates new ideas and how AI models do it. — Christoffer

As Pablo Picasso said: "good artists copy; great artists steal", which also references the difference between art and craft.

As Claude Lorrain painted figures and trees in the foreground with boats on water in the background, Andre Derain 300 years later painted figures and trees in the foreground with boats on water in the background. But was Derain a good artist copying Lorrain or was he a great artist stealing from Lorrain?

Both the good artist and the great artist generate new ideas, but there is a difference. The good artist copies what is immediately visible in a great painting whilst the great artist discovers what is hidden beneath the surface of a great painting.

It seems that at the moment AI is copying what is immediately apparent in existing texts, making it a good source, rather than being able to discover the hidden structure behind existing texts, potentially making it a great source, and then possibly rivalling the best of humans.

jorndoe

Apropos:

The Simpsons in the 1950s (video by demonflyingfox, Apr 28, 2024, 1m 7s)

I wouldn't call it theft/plagiarism, more like a demonstration.

Christoffer

↪RussellA

The definitions of how artists work upon inspirations and others' work are part of the equation, but still dependent on the intention of the artist. The intention isn't built into the AI models, it's the user that forms the intended use and guiding principle of creation.

So, artists stealing from other artists is the same as a user prompting the AI model to "steal" in some form. But the production of the text or image by the AI model is not theft in itself, it's merely a function mimicking how the human brain acts upon information (memory) and synthesizes that into something new. What is lacking within the system itself is the intention and thus it can't be blamed for theft.

Which means we can't blame the technicians and engineers for copyright infringement, as these AI models don't have "copies" of copyrighted work inside them. An AI model is trained on data (the copyrighted material) and forms a neural network that functions as its memory and foundation of operation, i.e. what weights and biases it has for operating.

So, it's essentially exactly the same as how our brain structure works when it uses our memory, which is essentially a neural network formed by raw input data, and through emotional biases and functions synthesizes those memories into new forms of ideas and hallucinations. Only through intention do we direct this into forming an intentional creative output, essentially forming something outside of us that we call art.

It's this difference that gets lost in the debate around AI. People reason that the training data is theft, but how can that be defined as theft? If I have a photographic memory and I read all books in a library, I've essentially formed a neural network within me on the same ground as training a neural network. And thus, there are no copies, there's only a web of connections that remember data.

If we criminalize remembering data, the formation of a neural network based on data, then a person with photographic memory is just as guilty by merely existing in the world. The difference between training an AI model and a human becomes arbitrary and emotional rather than logical.

We lose track of which actions are actually immoral and blame the wrong people. It's essentially back to the Luddites destroying machines during the industrial revolution, but this time, the immoral actions of the users of these AI systems get a pass and instead the attacks get directed towards the engineers. Not because they're actually guilty of anything, but rather because of the popularity of hating big tech. It's a polarizing situation in which rational reasoning gets lost in favor of people forming an identity around whatever groupthink they're in.

And we might lose an enormous benefit to humanity through future AI systems because people don't seem to care to research how these things actually operate and instead just scream out their hate due to their fear of the unknown.

Society needs to be smarter than this. Artists need to be smarter than this.

jkop


If the user asks for an intentionally plagiarized copy of something, or a derivative output, then yes, the user is the only one accountable as the system does not have intention on its own. — Christoffer

According to you, or copyright law?

But this is still a misunderstanding of the system and how it works. As I've stated in the library example, you are yourself feeding copyrighted material into your own mind that's synthesized into your creative output. Training a system on copyrighted material does not equal copying that material, THAT is a misunderstanding of what a neural system does. It memorizes the data in the same way a human memorizes data as neural information. You are confusing the "intention" that drives creation with the underlying physical process. — Christoffer

If 'feeding', 'training', or 'memorizing' does not equal copying, then what is an example of copying? It is certainly possible to copy an original painting by training a plagiarizer (human or artificial) in how to identify the relevant features and from these construct a map or model for reproductions or remixes with other copies for arbitrary purposes. Dodgy and probably criminal.

You use the words 'feeding', 'training', and 'memorizing' for describing what computers and minds do, and talk of neural information as if that would mean that computers and minds process information in the same or similar way. Yet the similarity between biological and artificial neural networks has decreased since the 1940s. I've never seen a biologist or neuroscientist talk of brains as computers in this regard. Look up Susan Greenfield, for instance.

Your repeated claims that I (or any critic) misunderstand the technology are unwarranted. You take it for granted that a mind works like a computer (it doesn't) and ramble on as if the perceived similarity would be an argument for updating copyright law. It's not.

RussellA

The intention isn't built into the AI models, it's the user that forms the intended use and guiding principle of creation. — Christoffer

HG Wells in his book The Time Machine wrote “It sounds plausible enough tonight, but wait until tomorrow. Wait for the common sense of the morning.”

This is data. Two writers may use this as a foundation and inspiration for their own works.

If the first writer did no more than repeat the same material and wrote “It sounds plausible enough tonight, but wait until tomorrow. Wait for the common sense of the morning”, then this would be plagiarism and considered theft.

However, if the second writer in using the same material was able to discover a deeper structure, and was able to base their writing on such a deeper structure, then their writing would not be considered as either plagiarism or theft.

For example, Wells i) draws attention to the concept of time by following the idea of tonight by the idea of tomorrow, ii) repeats the same word "wait" but gives it two different meanings when he writes "wait until tomorrow" and "wait for the common sense", and iii) contrasts opposites, when he infers that an idea first thought successful may in fact not be so.

If the second writer did no more than copy this deeper structure discovered in Wells's writing and wrote "One started by questioning the importance of training data in AI but ended by becoming more confused than ever, confused about the role of the algorithm and confused about the role of the engineer creating the algorithm, striving to find the nuances but only discovering the noise", then I am sure there would be no question of either plagiarism or theft.

There is data, and at a deeper level there is what the data means. Can an AI algorithm ever discover what data means at this deeper level?

Mr Bee

Numerous research studies have found links between how the human mind generates new ideas and how AI models do it. Not in a self-aware way of intelligently guiding thought, but in the basic fundamental synthesis of data into new forms. — Christoffer

That's one of the main issues, right? How comparable human creativity is to that of AI. When an AI "draws upon" all the data it is trained on, is it the same as when a human does so, as in the two scenarios you've brought up?

At the very least it can be said that the consensus is that AIs don't think like we do, which is why we don't see tech companies proclaiming that they've achieved AGI. There are certainly some clear shortcomings to how current AI models work compared to human brain activity, though given how little we know about neuroscience (in particular the process of human creativity) and how much less we seem to know about AI, I'd say that the matter of whether we should differentiate human inspiration and AI's "inspiration" is currently at best unclear.

But just a few years ago, manual scraping for images used in concept art was a fairly common practice. So these concept artists have been making money on photobashing copyrighted photos into their concept art for years, but now they criticize diffusion models (which don't even do this) for infringing on copyright, effectively calling it "theft". Is this not clearly a double standard? — Christoffer

It's not like photobashing isn't controversial too, mind you. So if you're saying that AI diffusion models are equivalent to that practice then that probably doesn't help your argument.

Christoffer

According to you, or copyright law? — jkop

According to the logic of the argument. Copyright law does not cover these things and the argument I'm making is that there are problems with people's reasoning around copyright and how these systems operate. A user who intentionally pushes a system to plagiarize and who carefully manipulates the prompts with that intention, disregarding any warnings by the system and its alignment programming not to do so... ends up solely being the guilty one. It's basically like if you asked a painter to make a direct copy of a famous painting and the painter says "no", pointing out that's plagiarism, yet you take out a gun and hold it to the painter's head and demand it. Will any court of law say that the painter, who is capable of painting any kind of painting in the world, is as guilty as you, just because he has that painting skill, knowledge of painters and different paintings, and the technical capability?

If 'feeding', 'training', or 'memorizing' does not equal copying, then what is an example of copying? It is certainly possible to copy an original painting by training a plagiarizer (human or artificial) in how to identify the relevant features and from these construct a map or model for reproductions or remixes with other copies for arbitrary purposes. Dodgy and probably criminal. — jkop

No one is doing this. No one is intentionally programming the systems to plagiarize. It's just a continuation of the misunderstandings. Training neural network systems is a field of computer science that pursues mimicking the human mind in order to generalize operation so that it functions beyond direct programming. If you actually study the history of computer science in artificial intelligence, the concepts of neural networks and machine learning have to do with forming neural networks in order to form operations that act upon pattern recognition, essentially forming new ideas or generalized operation out of the patterns that emerge from the quantity of analyzed information and how the pieces of information exist in relation to each other. This is then aligned into a system of prediction that emulates the predictive thinking of our brains.

A diffusion model therefore "hallucinates" forward an image out of this attempt to predict shapes, colors and perspective based on what it has learned, not from copies of what it used to learn. And the key component that is missing is the guiding intent; "what" it should predict. It's not intelligent, it's not a thinking machine, it merely mimics the specific process of neural memory, pattern recognition and predictive operation that we have in our brains. So it cannot just predict on its own, it can't "create on its own". It needs someone to guide the prediction.

Therefore, if you actually look at how these companies develop these models, you will also see a lot of effort put into alignment programming. They do not intentionally align the models to perform plagiarism; they actively work against it, making sure there are guardrails against accidental plagiarism and blocking users who try to intentionally produce it. But even so, these systems are black boxes, and people who want to manipulate them and find backdoors into plagiarism could be able to do so, especially on older models. But that only leads back to who's to blame for plagiarism, and it becomes even clearer that it's the user of the system who intentionally wants to plagiarize something who solely becomes the one guilty of it. Not the engineers, or the process of training these models.

You use the words 'feeding', 'training', and 'memorizing' for describing what computers and minds do, and talk of neural information as if that would mean that computers and minds process information in the same or similar way. Yet the similarity between biological and artificial neural networks has decreased since the 1940s. I've never seen a biologist or neuroscientist talk of brains as computers in this regard. Look up Susan Greenfield, for instance. — jkop

The idea behind machine learning and neural networks was inspired by findings in neuroscience, but the purpose wasn't to conclude similarities; it was to see if operations could be generalized and predictability improved in complex situations, such as robotics, by experimenting with similarities to what neuroscientists had discovered about the biological brain. It's only just recently that specific research in neuroscience (IBS, MIT etc.) has been focusing on the similarities between these AI models and how the brain functions, concluding that there are striking similarities between the two. But the underlying principles of operation have always been about imitating how memory forms. You confuse the totality of how a brain operates with the specific function of memory and predictability. Susan Greenfield is even aligned with how memory forms in our brain and hasn't published anything to the contrary of what other researchers have concluded in that context. No one is saying that these AI systems act as the totality of the brain, but the memory in our head exists as a neural network that acts from the connections rather than raw data. This is how neuroscientists describe how our memory functions and operates as the basis for prediction and actions. The most recent understanding is that memories are not stored in specific parts of the brain, but instead exist spread across the brain, with different regions featuring a greater or lesser concentration of connections based on the nature of the information. Essentially acting like weights and biases in an AI system that focus how memories are used.

The fact is that our brain doesn't store information like a file. And likewise, a machine-learned neural network doesn't either. If I read and memorize a page from a book (easier if I had photographic memory), I didn't store this page in my head as a file like a computer does. The same goes for a neural network that was trained with this page. It didn't copy the page, it has put it in relation to other pages, other texts, other things in the world that have been part of its training. And just like our brain, if we were to remove the "other stuff", then the memory and understanding of that specific page would deteriorate, because the memory of the page relies on how it relates to other knowledge, other information, about language, about visual pattern recognition, about contextual understanding of the text's meaning and so on. All are part of our ability to remember the page and our ability to do something with that memory.

Again, I ask... what is the difference in scenario A and scenario B? Explain to me the difference please.

Your repeated claims that I (or any critic) misunderstand the technology are unwarranted. You take it for granted that a mind works like a computer (it doesn't) and ramble on as if the perceived similarity would be an argument for updating copyright law. It's not. — jkop

It's not unwarranted since the research that's being done right now continues to find similarities to the point that neuroscientists are starting to utilize these AI models in their research in order to further understand the brain. I don't see anyone else in these debates actually arguing out of the research that's actually being done. So how is it unwarranted to criticize others for not fully understanding the technology when they essentially don't? Especially when people talk about these models storing copyrighted data when they truly don't, and that these engineers also fundamentally programmed the models to focus on doing plagiarism, when that's a blatant lie.

It seems rather that it's you who takes for granted that your outdated understanding of the brain is enough as a counter argument. And forget the fact that there are currently no final scientific conclusions as to how our brain works. The difference, however, is that I'm writing these arguments out of the latest research in these fields and that's the only foundation to form any kind of argument, especially when we're talking about the context of this topic. What you are personally convinced about when it comes to how the brain works is irrelevant, whoever you've decided to trust in this field is irrelevant. It's the consensus of up-to-date neuroscience and computer science that should act as the foundation for arguments.

So, what are you basing your counter arguments on? What exactly is your counter argument?

Christoffer

There is data, and at a deeper level there is what the data means. Can an AI algorithm ever discover what data means at this deeper level? — RussellA

The system doesn't think, the system doesn't have intention. Neither writer exists within the system, it is the user that informs the intention that guides the system. It's the level of complexity that the system operates on that defines how good that output becomes. But the fact remains that if the engineer programs the system not to plagiarize and the user doesn't ask for plagiarism, there's no plagiarism going on any more than with an artist who draws upon their memory of works of art that inspire them. These systems have built-in guardrails that attempt to prevent accidental plagiarism, something that occurs all the time with humans and has been spotted within the systems as well. But in contrast to human accidental plagiarism, these systems are getting better and better at discovering such accidents, because such accidents are in no one's interest. It's not good for the artist whose work was part of the training data, it's not good for the user and it's not good for the AI company. No one has any incentive to let these AI models be plagiarist machines.

But the problem I'm bringing up in my argument primarily has to do with claims that the act of training the AI model using copyrighted material is plagiarism and copyright infringement. That's not the same as the alignment problem of its uses that you are bringing up.

If alignment keeps getting better, will artists stop criticizing these companies for plagiarism? No. Even if AI models end up at a point where it's basically impossible to get accidental plagiarism, artists will not stop criticizing, because they aren't arguing based on rational reasoning. They want to take down these AI models because many feel they're a threat to their income, and they invent lies about how the system operates and about what the intentions are of the engineers and the companies behind these models. They argue that these companies "intentionally target" them when they don't. These companies invented a technology that mimics how the human brain learns and memorizes and how this memory functions as part of predicting reality, demonstrating it by predicting images and text into existence, and showing how this prediction starts to give rise to other attributes of cognition. It's been happening for years, but is now at a point at which there can be practical applications in society for their use.

We will see the same with robotics in a couple of years. The partly Nvidia-led research using LLMs for training robots that was just published showed how GPT-4 can be used in combination with robotics training and simulation training. Meaning, we will see a surge in how well robots perform soon. It's basically just a matter of time before we start seeing commercial robots for generalized purposes or business applications outside of pure industrial production. And this will lead to other sectors in society starting to criticize these companies for "targeting their jobs".

But it's always been like this. It's the Luddites all over again, smashing the industrial machines instead of getting to know what this new technology could mean and how they could use it as well.

That's one of the main issues right? How comparable human creativity is to that of AI. When an AI "draws upon" all the data it is trained on is it the same as when a human does the same like in the two scenarios you've brought up?

At the very least it can be said that the consensus is that AIs don't think like we do, which is why we don't see tech companies proclaiming that they've achieved AGI. There are certainly some clear shortcomings to how current AI models work compared to human brain activity, though given how little we know about neuroscience (in particular the process of human creativity) and how much less we seem to know about AI, I'd say that the matter of whether we should differentiate human inspiration and AI's "inspiration" currently is at best unclear. — Mr Bee

AGI doesn't mean it thinks like us either. AGI just means that it generalizes between many different functions and does so automatically based on what's needed in any certain situation.

But I still maintain that people misunderstand these things. It's not a binary question, it's not that we are looking at these systems as A) Not thinking like humans, therefore they are plagiarism machines or B) They think like us, therefore they don't plagiarize.

Rather, it is about looking at what constitutes plagiarism and copyright theft in these systems. Copyright laws are clear when it comes to plagiarism and stealing copyrighted material. But they run into problems when they're applied as a blanket statement against these AI models. These AI models don't think like us, but they mimic parts of our brain. And mimicking part of our brain is not copyright infringement or theft, because if it does so with remarkable similarity, then we can't criticize these operations without criticizing how these functions exist within ourselves. The difference between our specific brain function and these AI systems becomes arbitrary and starts to take the form of spirituality or religion in which the critics fall back on "because we are humans".

Let's say we build a robot that uses visual data to memorize a street it walks along. It uses machine learning and has a constantly updating neural network that mimics a floating memory system, like our own neural network constantly changing with more input data. While walking down the street it scans its surroundings and memorizes everything into neural connections, like we do. At some point it ends up at a museum of modern art and goes inside, and inside it memorizes its surroundings, but that also means all the paintings and photographs. Later in the lab we ask it to talk about its day; it may describe its route, and we ask it to form an image of the street. It produces an image that somewhat looks like the street, skewed, but with similar colors, similar weather and so on. This is similar to how we remember. We then ask it to draw a painting inspired by what it saw in the museum. What will it do?

Critics of AI would say it will plagiarize, copy and that it has stored the copyrighted photos and paintings through the camera. But that's not what has happened. It has a neural network that formed out of the input data, it doesn't have a flash card storing it as a video or photo of something. It might draw something that is accidental plagiarism out of that memory, but since the diffusion system generates from noise through prediction into form, it will always be different than pure reality, different from a pure copy. Accidental plagiarism happens all the time with people and as artists we learn to check our work so it doesn't fall under it. If the engineers push the system to do such checks, to make sure it doesn't get too close to copyrighted material, then how can it plagiarize? Then we end up with a system that does not directly store anything, it remembers what it has seen just like humans remember through our own neural network, and it will prevent itself from drawing anything too close to an original.

One might say that the AI system's neural memory is too perfect and would constitute being the same as having it on a normal flash card, but how is that different from a person with photographic memory? It then becomes a question of accuracy, effectively saying that people with photographic memory shouldn't enter a museum as they are basically storing all those works of art in their neural memory.

Because what the argument in here is fundamentally about is the claim that the act of training AI models on copyrighted material is breaking copyright. The use and alignment problem is another issue and an issue that can be solved without banning these AI models. But the promotion for banning these systems stems from claims that they were trained on copyrighted material. And it is that specific point that I argue doesn't hold due to how these systems operate and how the training process is nearly identical to how humans are "training" their own neural-based memory.

Let's say humans actually had a flash card in our brain. And everything we saw and heard, read and experienced, were stored as files in folders on that flash card. And when we wrote or painted something we all just took parts of those files and produced some collage out of them. How would we talk about copyright in that case?

But when a system does the opposite and instead mimics how our brain operates and how we act from memory, we run into a problem, as much of our copyright law is defined based on interpreting "how much" a human "copied" something. How many notes were taken, accidentally or not, how much of another painting can be spotted in this new work etc. But for AI generated material, it seems that it doesn't matter how far off from others' work it is; it could be provably as original as any other human creation deemed "original", but it still gets blamed as plagiarism because the training data was copyrighted material, not realizing that artists function on the same principles and sometimes even go further than these AI systems, as my example of concept artists showed.

The conclusion, or message I'm trying to convey here, is that the attempt to ban these AI models and call their training process theft is just Luddite behavior out of existential fear. And that the real problem is alignment to prevent accidental plagiarism, which is something these companies work hard to prevent as it's in no one's interest for that to happen in outputs. That this antagonizing pitchfork behavior that artists and other people have in this context is counter-productive and that they should instead demand to work WITH these companies to help mitigate accidental plagiarism and ill-willed use of these models.

It's not like photobashing isn't controversial too, mind you. So if you're saying that AI diffusion models are equivalent to that practice then that probably doesn't help your argument. — Mr Bee

No, I'm saying diffusion models don't do that and that there's a big irony to the fact that many concept artists who are now actively trying to fight AI with arguments of theft effectively have made money in the past through a practice that is the very same process they falsely accuse these diffusion models of doing, based on a misunderstanding of how they actually operate. The operation of these diffusion models, compared to that practice, actually makes the model more moral than the concept artists within this context, as diffusion models never directly copy anything into the images, since they don't have any direct copies in memory.

This highlights a perfect example of why artists' battle to ban these models and their reasoning behind it becomes rather messy and could bite them back in ways that destroy far more for them than if they actually tried to help these companies to instead align their models for the benefit of artists.

jkop

Again, I ask... what is the difference in scenario A and scenario B? Explain to me the difference please. — Christoffer

A and B are set up to acquire writing skills in similar ways. But this similarity is irrelevant for determining whether a literary output violates copyright law.

You blame critics for not understanding the technology, but do you understand copyright law? Imagine if the law was changed and gave Ai-generated content carte blanche just because the machines have been designed to think or acquire skills in a similar way as humans. That's a slippery slope to hell, and instead of a general law you'd have to patch the systems to counter each and every possible misuse. Private tech corporations acting as legislators and judges of what's right and wrong. What horror.

So, what are you basing your counter arguments on? What exactly is your counter argument? — Christoffer

If your claim is that similarity between human and artificial acquisition of skills is a reason for changing copyright law, then my counter-argument is that such similarity is irrelevant. What is relevant is whether the output contains recognizable parts of other people's work.

One might unintentionally plagiarize recognizable parts of someone else's picture, novel, scientific paper etc. and the lack of intent (hard to prove) might reduce the penalty, but it is hardly controversial as a violation.

Christoffer

A and B are set up to acquire writing skills in similar ways. But this similarity is irrelevant for determining whether a literary output violates copyright law. — jkop

Why is it irrelevant? The system itself lacks the central human component that is the intention of its use, while the human has that intention built in. You cannot say the system violates copyright law as the system itself isn't able to either have copyright on its output or by its own will break copyright. This has been established by the monkey selfie copyright dispute (https://en.wikipedia.org/wiki/Monkey_selfie_copyright_dispute) and would surely apply to an AI model as well. That leaves the user as the sole agent responsible for the output, or rather, for the use of the output.

Because I can paint a perfect copy of someone else's painting. It's rather in what way I use it that defines how copyright is applied. In some cases I can show my work crediting the original painter, in some cases I can sell it like that, or I will not be able to show it at all but still have it privately or unofficially. The use of the output defines how copyright applies and because of that we are far past any stage in which the AI model and its function is involved, if it has at all made anything even remotely close to infringing on copyright.

It's basically just a tool, like the canvas, paint and paintbrush. If I want to sell that art, it's my responsibility to make sure it isn't breaking any copyright laws. The problem arises when people make blanket statements that all AI outputs break copyright, which is a false statement. And even if there is a ruling that forbids the use of AI systems, they would only be able to criminalize monetization of outputs, not whether they're used privately or unofficially within some other creative process as a supporting tool.

All artists use copyrighted material during their work; painters usually cut out photos and print out stuff to use as references and inspiration while working. So all of this becomes messy for those proposing to shut these AI models down, and in some cases leads to double standards.

In any case it leads back to the original claim that my argument challenges. The claim that the training process is breaking copyright because it is trained on copyrighted material. Which is what the A and B scenario is about.

You blame critics for not understanding the technology, but do you understand copyright law? Imagine if the law was changed and gave Ai-generated content carte blanche just because the machines have been designed to think or acquire skills in a similar way as humans. That's a slippery slope to hell, and instead of a general law you'd have to patch the systems to counter each and every possible misuse. Private tech corporations acting as legislators and judges of what's right and wrong. What horror. — jkop

Explain to me what it is that I don't understand about copyright law.

And explain to me why you make such a slippery slope argument as some kind of "appeal to extremes" fallacy, thinking that such a scenario is what I'm proposing. You don't seem to read what I write when I say that artists need to work with these companies for the purpose of alignment. Do you understand what I mean by that? Because your slippery slope scenario tells me that you don't.

You keep making these straw men out of a binary interpretation of this debate. That me criticizing how artists argue against AI means I want to rid all copyright law from AI use. That is clearly false.

I want people to stop making uninformed, uneducated and polarized arguments and instead educate themselves to understand the systems so the correct arguments can be made that make sure artists and society can align with the development of AI. Because the alternative is the nightmare you fear. And when artists and people just shout their misinformed groupthink opinions as hashtags until a court rules against them because they didn't care to understand how these systems work, that nightmare begins.

If your claim is that similarity between human and artificial acquisition of skills is a reason for changing copyright law, then my counter-argument is that such similarity is irrelevant. — jkop

How do you interpret this being about changing copyright law? Why are you making stuff up about my argument? Nowhere in my writing did I propose we change copyright laws in favor of these tech companies. I'm saying that copyright law does not apply to the training process of an AI model, as the training process is not an action of copyright infringement any more than a person with photographic memory reading all books in a library is. You seem unable to understand the difference between the training process and the output generation. And it's this training process specifically that is claimed to infringe on copyright and is the basis of many of the current lawsuits. Not the generative part. Or rather, they've baked these lawsuits into a confused mess of uninformed criticism that good lawyers on the tech companies' side could argue against in the same manner I do. And the court requires proof of copyright infringement. If the court rules that there's no proof of infringement in the training process, it could spiral into dismissal of the case, and that sets the stage for a total dismissal of all artists' concerns. No one seems to see how dangerous that is. This is why my actual argument, which you seem to misunderstand constantly, is to focus on the problems with image generation and create laws that actually dictate mandatory practices for these tech companies to work with artists for the purpose of alignment. That's the only way forward. These artists are now on a crusade to try and rid the world of these AI models and it's a fool's errand. They don't understand the models and the technology, and try to bite off more than they can chew instead of focusing their criticism properly.

What is relevant is whether the output contains recognizable parts of other people's work. — jkop

Alignment is work already being conducted by these companies, as I've said now numerous times. It's about making sure plagiarism doesn't occur. It's in everyone's interest that it doesn't happen.

And the challenge is that you need to define "how similar" something is in order to define infringement. This is the case in every copyright case in court. Artists are already using references that copy entire elements into their own work without it being copyright infringement. Artists can see something in certain colors, then see something else with a nice composition, then see a picture in a newspaper that becomes the central figure in the artwork and they combine all three into a new image that everyone would consider "original". If an AI model does exactly the same, and at the same time only uses its neural memory, it's using even fewer direct references and influences, as a diffusion model never copies anything directly into an image. Even older examples of outdated misaligned models that show almost identical images still can't reproduce them exactly, because they aren't using a file as the source, they're using neural memory in the same way we humans do. Compare that to artists who directly use other people's work in their art; it happens more than people realize. Just check how many films there are in which directors blatantly copy a painting into a shot composition and style, or use an entire scene from another movie almost verbatim. How do you draw the line? Why would the diffusion models and LLMs be worse than how artists are already working? As I said, it ends up being an arbitrary line in which we just conclude... because it's a machine. But as I've said, the machine, like the gun that forced the painter to plagiarize, cannot be blamed for copyright infringement. Only the user can.

One might unintentionally plagiarize recognizable parts of someone else's picture, novel, scientific paper etc. and the lack of intent (hard to prove) might reduce the penalty, but it is hardly controversial as a violation. — jkop

Yes, which means the alignment problem is the most important one to solve. Yet, as mentioned, if we actually study how artists work, if we check their process, if I check my own process, it quickly becomes very muddy how works of art form. People saying that art magically appears out of our divine creativity are just religious and spiritual and that's not a foundation for laws. The creative process is part technological/biological function and part subjective intention. If the function can be externalized as a tool, then how does copyright get defined? Copyright can only be applied to intention, it cannot be applied to the process, otherwise all artists would infringe on copyright in their process of creation.

In the end, if alignment gets solved for these AI models, to the point that they are unable to copy anything beyond a certain level of plagiarism in an output, and this aligns with copyright laws for definitions of "originality", then these systems will actually be better at avoiding copyright infringement than human artists, because they won't try to fool the copyright system for the purpose of gaining something out of riding on others' success, which is the most common reason why people infringe on copyright outside the accidental. An aligned system does not care, it only sets the guardrails so that the human component cannot step over the line.

BC

The system itself lacks the central human component that is the intention of its use. — Christoffer

The processors in AI facilities lack intention, but AI facilities are owned and operated by human individuals and corporations who have extensive intentions.

Mr Bee

AGI doesn't mean it thinks like us either. AGI just means that it generalizes between many different functions and does so automatically based on what's needed in any certain situation. — Christoffer

AGI doesn't necessarily have to think exactly like us, but human intelligence is the only known example of a GI that we have, and with regards to copyright laws it's important that the distinction between an AGI and a human intelligence not be all that wide, because our laws were made with humans in mind.

It might draw something that is accidental plagiarism out of that memory, but since the diffusion system generates from noise through prediction into form, it will always be different than pure reality, different from a pure copy. — Christoffer

The question is whether or not that process is acceptable or if it should be considered "theft" under the law. We've decided as a society that someone looking at a bunch of art and using it as inspiration for creating their own works is an acceptable form of creation. The arguments that I've heard from the pro-AI side usually try to equate the former with the latter as if they're essentially the same. That much isn't clear though. My impression is that at the very least they're quite different and should be treated differently. That doesn't mean that the former is necessarily illegal though, just that it should be held to a different standard, whatever that may be.

Let's say humans actually had a flash card in our brain. And everything we saw and heard, read and experienced, were stored as files in folders on that flash card. And when we wrote or painted something we all just took parts of those files and produced some collage out of them. How would we talk about copyright in that case? — Christoffer

Depends on what we're talking about when we say that this hypothetical person "takes parts of those files and makes a collage out of them". The issue isn't really the fact that we have memories that can store data about our experiences, but rather how we take that data and use it to create something new.

jkop

Why is it irrelevant? — Christoffer

Because a court looks at the work, that's where the content is manifest, not in the mechanics of an Ai-system nor in its similarities with a human mind.

What's relevant is whether a work satisfies a set threshold of originality, or whether it contains, in part or as a whole, other copyrighted works.

There are also alternatives or additions to copyright, such as copyleft, Creative Commons, Public Domain etc. Machines could be "trained" on such content instead of stolen content, but the Ai industry is greedy, and to snag people's copyrighted works, obfuscate their identity but exploit their quality will increase the market value of the systems. Plain theft!

Christoffer

The processors in AI facilities lack intention, but AI facilities are owned and operated by human individuals and corporations who have extensive intentions. — BC

And those extensive intentions are what, in your perspective? And in what context of copyright do those intentions exist?

AGI doesn't necessarily have to think exactly like us, but human intelligence is the only known example of a GI that we have, and with regards to copyright laws it's important that the distinction between an AGI and a human intelligence not be all that wide, because our laws were made with humans in mind. — Mr Bee

Not exactly sure what point you're making here. The only time in which copyright laws apply to the system itself and independent of humans either on the back or front end is when an AGI shows real intelligence and provable qualia, but that's a whole other topic on AI that won't apply until we're actually at that point in history. That could be a few years from now, 50 years or maybe never, depending on things we've yet to know about AGI and super intelligence. For now, the AGI systems that are on the table mostly just combine many different tasks so that if you input a prompt it will plan, train itself and focus efforts towards the goal you asked for without constant monitoring and iterative inputs from a human.

Some believe this would lead to actual subjective intelligence for the AI, but it's still so mechanical and lacking the emotional component that's key to how humans structure their experience that the possibility for qualia is pretty low or non-existent. So the human input, the "prompter", still carries the responsibility for its use. I think, however, that the alignment problem becomes a bigger issue with AGI, as we can't predict in what ways an AGI will plan and execute for a specific goal.

This is also why AGI can be dangerous, like the paperclip scenario. With enough resources at its disposal it can spiral out of control. I think that the first example of this will be a collapse of some website infrastructure like Facebook as the AGI ends up flooding the servers with operations due to a task that spirals out of control. So before we see nuclear war or any actual dangers we will probably see some sort of spammed nonsense because an AGI executed a hallucinated plan for some simple goal it was prompted to do.

But all of that is another topic really.

The question is whether or not that process is acceptable or if it should be considered "theft" under the law. We've decided as a society that someone looking at a bunch of art and using it as inspiration for creating their own works is an acceptable form of creation. The arguments that I've heard from the pro-AI side usually try to equate the former with the latter as if they're essentially the same. That much isn't clear though. My impression is that at the very least they're quite different and should be treated differently. That doesn't mean that the former is necessarily illegal though, just that it should be held to a different standard, whatever that may be. — Mr Bee

The difference between the systems and the human brain has more to do with the systems not being the totality of how a brain works. It's simulating a very specific mechanical aspect of our mind, but as I've mentioned it lacks intention and internal will, which is why inputted prompts need to guide these processes towards a desired goal. If you were able to add different "brain" functions up to the point that the system is operating on identical terms as the totality of our brain, how do laws for humans start to apply to the system? When do we decide it has enough agency to be the one responsible for its actions?

But the fundamental core of all of this is whether or not copyright laws apply to a machine that merely operates by simulating a human brain function. It may be that neural networks that are floating and constantly reshape and retrain themselves on input data are all there is to human consciousness; we don't know until we reach that point for these models. But in the end it becomes rather a question of how copyright laws function within a simulation of how we humans "record" everything around us in memory and how we operate on it.

Because when we compare these systems to artists and how they create something, there are a number of actions by artists that seem far more infringing on copyright than what these systems do. If a diffusion model is trained on millions of real and imaginary images of bridges, it will generate a bridge that is merely a synthesis of them all. And since there's only a limited number of image perspectives of bridges that are three-dimensionally possible, where it ends up will weigh more towards one set of images than others, but never a single photo. An artist, however, might take a single copyrighted image and trace-draw on top of it, essentially copying the exact composition and choice of perspective from the one who took the photograph.

So if we're just going by the definition of a "copy" or that the system "copies" from the training data, it rather looks like there are more artists actually copying than there is actual copying going on within these diffusion models.

Copyright court cases have always been about judging "how much" was copied. It's generally about defining how many notes something was similar to, if lyrics or texts appeared in too many exact words or sentences after another. And they all depend on the ability of the lawyers and attorneys to prove that the actions taken were more or less based on a line drawn in the sand from previous cases that proved or disproved infringement.

Copyright law has always been shifting because it's trying to apply a definition of originality to determine if a piece of art is infringement or not. But the more we learn about the brain and creative process of the mind, the more we understand how little free will we actually have and how influential our chemical and environmental processes are in creativity, and how much less logical it is to propose "true originality". It simply doesn't exist. But copyright laws demand that we have a certain line drawn in the sand that defines where we conclude something is "original", otherwise art and creativity cannot exist within a free market society.

Anyone who studied human creativity in a scientific manner, looking at biological processes, neuroscience etc. will start to see how these definitions soon become artificial and non-scientific. They are essentially arbitrary inventions that over the centuries and decades since 1709 have gone through patch-works trying to make sure that line in the sand is in the correct place.

But they're also taken advantage of: by artists with a lot of power using them against lesser-known artists, and by institutions that have used them as a weapon to acquire pieces of work from artists who lose their compensation because they didn't have a dozen legal teams behind them fighting for their rights.

So, what exactly has "society" decided about copyright laws? In my view it seems to be a rather messy power battle rather than truly finding where the line is drawn in the sand. The reason why well-known artists try to prove copyright infringement within the process of training these models is that if they win, they will kill the models, as the models can't use the data that is necessary to train them. The idea of the existential threat to artists has skewed people's minds into making every attempt to kill these models, regardless of how illogical the reasoning behind it is. But it's all based on some magical thinking about creativity and on ignoring the social and intellectual relationship between the artist and the audience.

So, first, creativity isn't a magic box that produces originality, there's no spiritual and divine source for it and that produces a problem for the people drawing the line in the sand. Where do you draw it? When do you decide something is original? Second, artists will never disappear because of these AI models. Because art is about the communication between the artist and their audience. The audience wants THAT artist's perspective and subjective involvement in creation. If someone, artists or hacks who believe they're artists, thinks that generating a duplicate of a certain painting style through an AI system is going to kill the original artist, they're delusional. The audience doesn't care to experience derivative work, they care only about what the actual artist will do next, because the social and intellectual interplay between the artist and the audience is just as important, if not the most important aspect, rather than some derivative content that looks similar. That artists believe they're gonna lose money on some hacks forcing an AI to make "copies" and derivative work out of their style is delusional on both sides of the debate.

In the end, it might be that we actually need the AI models for the purpose of deciding copyright infringement:

Imagine if we actually train these AIs on absolutely everything that's ever been created and possible to use as training data. And then we align that system to be used as a filter where we decide the weights of the system to approximately draw that line in the sand, based on what we "feel" is right for copyright laws. Then, every time we have a copyright dispute in the world, be it an AI generation or someone's actual work of art, this artwork is put through that filter and it can spot if that piece of work falls under copyright infringement or not.

That would solve both the problem with AI generated outputs and normal copyright cases that try to figure out if something was plagiarized.

This is why I argue for artists to work with these companies for the purpose of alignment rather than battling against them. Because if we had a system that could help spot plagiarized content and define what's derivative, it will not only solve the problems with AI generative content, it will also help artists that do not have enough legal power to win against powerful actors within the entertainment industry.

But because the debate is so simplified down to two polarized sides, and because people's view of copyright laws is this belief that there is a permanent and rigid line in the sand, we end up in a power struggle about other things rather than a debate about artists' actual rights, creativity and the prospects of these AI models.

Judging the training process to be copyright infringement becomes a stretch and a very wiggly drawn line in that sand. Such a definition starts to creep into aspects that don't really have to do with copying and spreading files, or plagiarism and derivative work. And it becomes problematic to define that line properly based on how artists themselves work.

Depends on what we're talking about when we say that this hypothetical person "takes parts of those files and makes a collage out of them". The issue isn't really the fact that we have memories that can store data about our experiences, but rather how we take that data and use it to create something new. — Mr Bee

Then you agree that the training process of AI models does not infringe on copyright and that it's rather the problem of alignment, i.e. how these AI models generate something and how we can improve them not to end up producing accidental plagiarism, that the focus should be on. And as I mentioned above, such a filter in the system or such an additional function to spot plagiarism would maybe even be helpful to determine if plagiarism has occurred even outside AI generations; making copyright cases more automatic and fair to all artists and not just the ones powerful enough to have legal teams acting as copyright special forces.

Because a court looks at the work, that's where the content is manifest, not in the mechanics of an Ai-system nor in its similarities with a human mind. — jkop

If the court looks at the actual outputted work, then the training process does not infringe on copyright and the problem is about alignment, not training data or the training process.

Defining how the system works is absolutely important to all of this. If lots of artists use direct copies of others' work in their own work and such work can pass copyright after a certain level of manipulation, then something that never uses direct copies should also pass copyright. How a tool or technology functions is absolutely part of how we define copyright. Such rulings have been going on for a long time and not just in this case:

https://en.wikipedia.org/wiki/White-Smith_Music_Publishing_Co._v._Apollo_Co.
https://en.wikipedia.org/wiki/Williams_%26_Wilkins_Co._v._United_States
https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios,_Inc.
https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp.

What's relevant is whether a work satisfies a set threshold of originality, or whether it contains, in part or as a whole, other copyrighted works. — jkop

If we then look at only the output, there are cases like Mannion v. Coors Brewing Co., in which the derivative work can be argued to resemble the original even more closely than what a diffusion model produces even when asked to make a direct copy, and yet the court ruled that it was not copyright infringement.

So where do you draw the line? As soon as we start to define "originality" and we start to use scientific research on human creativity, we run into the problem of what constitutes "inspiration" or "sources" for the synthesis that is the creative output.

There is no clear line about what constitutes "originality", so it's not a binary question. AI generation can be ruled both infringement and not, so it all ends up being about alignment; how to make sure the system acts within copyright laws, and not that it, in itself, breaks copyright law, which is what the anti-AI movement is trying to prove on these shaky grounds. And the question of what constitutes "originality" is, within the history of copyright cases, a very muddily defined concept, to the point that anyone saying the concept is "clear" doesn't know enough about this topic and has merely made up their own mind about what they themselves believe, which is no ground for any law or regulation.

There are also alternatives or additions to copyright, such as copyleft, Creative Commons, Public Domain etc. Machines could be "trained" on such content instead of stolen content, but the Ai industry is greedy, and to snag people's copyrighted works, obfuscate their identity but exploit their quality will increase the market value of the systems. Plain theft! — jkop

And now you're just falling back on screaming "theft!" You simply don't care about the argument I've made over and over now. Training data is not theft because it's not a copy and the process mimics how the human brain memorizes and synthesizes information. It's not theft for a person with photographic memory, so why is it theft for these companies when they're not distributing the raw data anywhere?

Once again you don't seem to understand how the systems work. It's not about greed; the systems require such a large amount of data to function in a way that makes the technology function properly. The amount of data is key. And HOW the technology works is absolutely part of how we define copyright laws, as described with the cases above. So ignoring how this tech works and just screaming that they are "greeeeedy!" just becomes the same shouting polarized hashtag mantra that everyone else is doing right now.

And this attitude and lack of knowledge about the technology show up in your contradictions:

Because a court looks at the work, that's where the content is manifest, not in the mechanics of an Ai-system nor in its similarities with a human mind. — jkop

Machines could be "trained" on such content instead of stolen content, but the Ai industry is greedy... Plain theft! — jkop

...If the court should just look at the output, then the training data and process are not the problem, but still you scream that this process is theft, even though the court might only be able to look at what these AI models output.

The training process using copyrighted material happens behind closed doors. Just like artists gathering copyrighted material in their process to produce their artwork. If the training process on copyrighted material is identical to an artist using copyrighted material when working, since both happen behind closed doors... the only thing that matters is the final artwork and the output from the AI. If alignment is solved, there won't be a problem, but the use of copyrighted material in the training process is not theft, regardless of how you feel about it.

Based on previous copyright cases, if the tech companies win against those claiming the training process is "theft", it won't be because the companies are greedy and have "corrupted" legal teams, it will be because of the copyright law itself and how it has been applied in past rulings. It's delusional to think that all of this concludes in "clear" cases of "theft".

jkop

↪Christoffer

One difference between A and B is this:

You give them the same analysis regarding memorizing and synthesizing of content, but you give them different analyses regarding intent and accountability. Conversely, you ignore their differences in the former, but not in the latter.

They should be given the same analysis.

Christoffer

One difference between A and B is this:

You give them the same analysis regarding memorizing and synthesizing of content, but you give them different analyses regarding intent and accountability. Conversely, you ignore their differences in the former, but not in the latter. — jkop

No, the AI system and the brain function, as physical processes, have no accountability, because you can only put guilt on something that has subjective intent. And I've described how intent is incorporated into each. The human has both the process function AND the intent built into the brain. The AI system only has the process function, and the intent comes from the user of that system. But if we still put responsibility on the process itself, then it's a problem with alignment and we can fine-tune the AI system to align better. Even better than we can align a human, as human emotion gets in the way of aligning their intent, which is why accidental plagiarism happens all the time. We simply aren't smart enough in comparison to an AI model that's been properly aligned with copyright law. Such a system will effectively be better than a human at producing non-infringing material set within a decided value of "originality".

jkop

↪Christoffer

You ask "Why is B theft?" but your scenario omits any legal criteria for defining theft, such as whether B satisfies a set threshold of originality.

How could we know whether B is theft when you don't show or describe its output, only its way of information processing? Then, by cherry picking similarities and differences between human and artificial ways of information processing, you push us to conclude that B is not theft. :roll:

Christoffer

You ask "Why is B theft?" but your scenario omits any legal criteria for defining theft, such as whether B satisfies a set threshold of originality.

How could we know whether B is theft when you don't show or describe its output, only its way of information processing? Then, by cherry picking similarities and differences between human and artificial ways of information processing, you push us to conclude that B is not theft. :roll: — jkop

Because the whole argument I made is about the fact that claims of copyright infringement are being put on the process of training these models, not the output. When people scream "theft!" at the tech companies, they are screaming at the process of training the models using copyrighted material. What I demonstrated with the scenarios is that the process does not fall under copyright infringement, because it's an internal process that happens behind closed doors, either inside our head or in the AI lab. And so, that process cannot be blamed for copyright infringement and the companies cannot be blamed for violating any copyright other than through the output.

Because of this, the output is a question of alignment and the companies are actively working towards mitigating accidental plagiarism. Which means they're already working to address the problems that artists don't like about AI generations. And the users are then solely responsible for how they use the generated images and are solely the ones who need to make sure they don't end up with plagiarized content.

But the main issue, and the reason I'm making this argument, is that none of this matters to the artists filing lawsuits. They are attacking the first part, the process, the one that scenarios A and B are about, and have thereby shown that they are not interested in alignment or in making sure these models are safe from plagiarism. Instead, they either have no knowledge of the technology and make shit up about how it is theft, things that aren't true about how the technology works, because they think the companies just take their stuff and put it out there. Or they know how the technology works, but they intentionally target the part of the technology that would kill the models, as an attempt to destroy the machines as the luddites did. Both of these stances are problematic and could lead to court rulings that go against artists, effectively giving artists less of a voice in this matter rather than strengthening it.

The argument is about focusing on alignment and how to improve the outputs past plagiarism. It's about making sure LLMs always cite something if they use direct quotes, and have guardrails that self-analyze the outputs to make sure it falls within our rather arbitrary definitions of originality.
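
To make the guardrail idea a bit more concrete, here is a minimal Python sketch of one possible check: measuring how much of a generated text appears word-for-word in known source texts, and flagging the sources that would need a citation. Everything in it is my own assumption for illustration (the function names, the n-gram length `n`, the `threshold` value); it is not how any actual AI company does it, and verbatim overlap is only a crude stand-in for what a court would call originality.

```python
import re


def word_ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output, sources, n=8):
    """Map each source name to the fraction of the output's n-grams found verbatim in it.

    `sources` is a hypothetical dict of {name: full text}.
    """
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output is shorter than n words, so there is nothing to compare.
        return {}
    return {
        name: len(out_grams & word_ngrams(text, n)) / len(out_grams)
        for name, text in sources.items()
    }


def needs_citation(output, sources, n=8, threshold=0.2):
    """Names of sources whose verbatim overlap with the output meets the threshold."""
    scores = verbatim_overlap(output, sources, n)
    return sorted(name for name, score in scores.items() if score >= threshold)
```

Where to set `n` and `threshold` is exactly the "line in the sand" question: the code can measure overlap, but the value at which overlap counts as plagiarism is still something we have to decide.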

Because people stare blindly into the darkness about these AI models. The positive sides of them, all the areas in which we actually benefit, like medicine, the sciences and even artists in terms of certain tools, are actually partly using the same models that are trained on this copyrighted material, because the amount of data is key to the accuracy and abilities of the models. So when artists want to block the usage of copyrighted material in training data, they're killing far more than they might realize. If a cancer drug development project is utilizing GPT-4 and the model suddenly must be shut down and retrained on less data, that will stop the development of that drug, and the work may not be able to continue at all if a reworked model doesn't function the same due to the removal of a large portion of training data.

People simply don't understand this technology and run around screaming "theft!" just because others scream "theft!". There's no further understanding and no further nuance to this topic, and this simplified and shallow version of the debate needs to stop, for everyone's sake. These models are neither bad nor good, they're tools, and as such it's the usage of the tools that needs to be addressed, rather than letting luddites destroy them.

BC

And those extensive intentions are what, in your perspective? And in what context of copyright do those intentions exist? — Christoffer

#1. Make money.

I do not know what percent of the vast bulk of material sucked up for AI training is copyrighted, but thousands of individual and corporate entities own the rights to a lot of the AI training material. I don't know whether the most valuable part was copyrighted currently, or had been copyrighted in the past, nor how much was just indifferent printed matter. Given the bulk of material required, it seems likely that no distinction was made.

Perhaps using the English speaking world's copyrighted material to train AI is covered by "Fair Use", but perhaps not. If an AI company sells information containing content from the New York Times or National Enquirer without paying royalties, that would not be fair use. If an AI produces a novel which has a remarkable similarity to a novel by a known published author, that may not be fair use, either.

Social media makes money out of the communication between people. It isn't copyrighted and people voluntarily provide it, whether it be slop or pearls. The many people who produce copyrighted material haven't volunteered to give up their ideas.

AI is breaking new ground here, and legislation and courts have not had time to sort out the various ownership issues.

There is a matter of trust here. There is no reason we should trust AI technology and its corporate owners.

wonderer1

There is a matter of trust here. There is no reason we should trust AI technology and its corporate owners — BC

A just machine to make big decisions
Programmed by fellas with compassion and vision
We'll be clean when their work is done
We'll be eternally free, yes, and eternally young
- Donald Fagen, I.G.Y.

:chin:

BC

↪wonderer1

Nice! I like that. Here's another in like vein:

All Watched Over By Machines Of Loving Grace

I like to think (and
the sooner the better!)
of a cybernetic meadow
where mammals and computers
live together in mutually
programming harmony
like pure water
touching clear sky.

I like to think
(right now, please!)
of a cybernetic forest
filled with pines and electronics
where deer stroll peacefully
past computers
as if they were flowers
with spinning blossoms.

I like to think
(it has to be!)
of a cybernetic ecology
where we are free of our labors
and joined back to nature,
returned to our mammal
brothers and sisters,
and all watched over
by machines of loving grace.

..................Richard Brautigan, Poet in Residence, California Institute of Technology in Pasadena, California 1967

Mr Bee

The difference between the systems and the human brain has more to do with the systems not being the totality of how a brain works. It's simulating a very specific mechanical aspect of our mind, but as I've mentioned it lacks intention and internal will, which is why inputted prompts need to guide these processes towards a desired goal. If you were able to add different "brain" functions up to the point that the system is operating on identical terms as the totality of our brain, how do laws for humans start to apply to the system? When do we decide it has enough agency to be the one responsible for its actions? — Christoffer

So your claim is that adding intentionality to current diffusion models is enough to bridge the gap between human and machine creativity? Like I said before I don't have the ability to evaluate these claims with the proper technical knowledge but that sounds difficult to believe.

Because when we compare these systems to artists and how they create something, there are a number of actions by artists that seem far more infringing on copyright than what these systems do. If a diffusion model is trained on millions of real and imaginary images of bridges, it will generate a bridge that is merely a synthesis of them all. And since there's only a limited number of image perspectives of bridges that are three-dimensionally possible, where it ends up will weigh more towards one set of images than others, but never a single photo. An artist, however, might take a single copyrighted image and trace-draw on top of it, essentially copying the exact composition and choice of perspective from the one who took the photograph.

So if we're just going by the definition of a "copy" or that the system "copies" from the training data, it rather looks like there are more artists actually copying than there is actual copying going on within these diffusion models. — Christoffer

Okay, but in most instances artists don't trace.

Copyright law has always been shifting because it's trying to apply a definition of originality to determine if a piece of art is infringement or not. But the more we learn about the brain and creative process of the mind, the more we understand how little free will we actually have and how influential our chemical and environmental processes are in creativity, and how much less logical it is to propose "true originality". It simply doesn't exist. But copyright laws demand that we have a certain line drawn in the sand that defines where we conclude something is "original", otherwise art and creativity cannot exist within a free market society. — Christoffer

I don't see how originality is undermined by determinism. I'm perfectly happy to believe in determinism, but I also believe in creativity all the same. The deterministic process that occurs in a human brain to create a work of art is what we call "creativity". Whether we should apply the same to the process in a machine is another issue.

Anyone who studied human creativity in a scientific manner, looking at biological processes, neuroscience etc. will start to see how these definitions soon become artificial and non-scientific. They are essentially arbitrary inventions that over the centuries and decades since 1709 have gone through patch-works trying to make sure that line in the sand is in the correct place.

...So, first, creativity isn't a magic box that produces originality, there's no spiritual and divine source for it and that produces a problem for the people drawing the line in the sand. Where do you draw it? When do you decide something is original? — Christoffer

Indeed the definitions are very arbitrary and unclear. That was my point. It was fine in the past since we all agree that most art created by humans is a creative exercise but in the case of AI it gets more complicated since now we have to be more clear about what it is and if AI generated art meets the standard to be called "creative".

Second, artists will never disappear because of these AI models. Because art is about the communication between the artist and their audience. The audience wants THAT artist's perspective and subjective involvement in creation. If someone, artists or hacks who believe they're artists, thinks that generating a duplicate of a certain painting style through an AI system is going to kill the original artist, they're delusional. The audience doesn't care to experience derivative work, they care only about what the actual artist will do next, because the social and intellectual interplay between the artist and the audience is just as important, if not the most important aspect, rather than some derivative content that looks similar. That artists believe they're gonna lose money on some hacks forcing an AI to make "copies" and derivative work out of their style is delusional on both sides of the debate. — Christoffer

Artists will never entirely disappear, I agree. And indeed there will certainly continue to be a market for human-made art, as consumers will generally prefer it. The idea that artists can be "replaced" or could be made "obsolete" simply misunderstands the very concept of art itself, which is that it isn't a commodity and, short of completely cloning an individual artist, you can never truly make someone who creates art like they do. There are plenty of people who will pay good money to get a drawing by their favorite artist in spite of the number of human artists who can perfectly replicate their style. This is because they value their work in particular and I don't see the rise of AI changing that.

However, the problem is that in today's art industry we don't just have artists and consumers but also middlemen, publishers who hire the former to create products for the latter. The fact is a lot of artists depend on these middlemen for their livelihoods, and unfortunately these people 1) don't care about the quality of the artists they hire and 2) prioritize making money above all else. For corporations, artists merely create products for them to sell and nothing more, so when a technology like AI comes up which produces products for them for a fraction of the cost in a fraction of the time, they will more than happily lay off their human artists for what they consider to be "good enough" replacements, even if the consumers they sell these products to will ultimately consider them inferior.

There are people who take personal commissions but there are also those that do commissions for commercial clients who may want an illustration for their book or for an advertisement. Already we're seeing those types of jobs going away because the people who commissioned those artists don't care in particular about the end product so if they can get an illustration by a cheaper means they'll go for it.

Then you agree that the training process of AI models does not infringe on copyright and that it's rather the problem of alignment, i.e. how these AI models generate something and how we can improve them not to end up producing accidental plagiarism, that the focus should be on. And as I mentioned above, such a filter in the system or such an additional function to spot plagiarism would maybe even be helpful to determine if plagiarism has occurred even outside AI generations; making copyright cases more automatic and fair to all artists and not just the ones powerful enough to have legal teams acting as copyright special forces. — Christoffer

Of course the data collection isn't the problem, but what people do with it. It's perfectly fine for someone to download a bunch of images and store them on their computer, but the reason why photobashing is considered controversial is that it takes that data and uses it in a manner that some consider to be insufficiently transformative. Whether AI's process is like that is another matter that we need to address.

---------------

Sorry if I missed some of your points but your responses have been quite long. If we're gonna continue this discussion I'd appreciate it if you made your points more concise.

jkop

↪Christoffer

Ok, if your opponent's arguments are also about the nature of the information processing, then they cannot say whether B is theft. No-one can from only looking at information processing.

The painting of Mona Lisa is a swarm of atoms. Also a forgery of the painting is a swarm of atoms. But interpreting the nature of these different swarms of atoms is neither sufficient nor necessary for interpreting them as paintings, or for knowing that the other is a forgery.

Whether something qualifies for copyright or theft is a legal matter. Therefore, we must consider the legal criteria, and, for example, analyse the output, the work process that led to it, the time, people involved, context, the threshold of originality set by the local jurisdiction and so on. You can't pre-define whether it is a forgery in any jurisdiction before the relevant components exist and from which the fact could emerge. This process is not only about information, nor swarms of atoms, but practical matters for courts to decide with the help of experts on the history of the work in question.

Addition:
When the producer of a work is artificial without a legal status, then it will be its user who is accountable. If the user remains unknown, the publisher is accountable (e.g. a gallery, a magazine, book publisher, ad-agency etc).

Regarding the training of AI systems by allowing them to scan and analyse existing works, then I think we must also look at the legal criteria for authorized or unauthorized use. That's why I referred to licenses such as Copyleft, Creative Commons, Public Domain etc. Doesn't matter whether we deconstruct the meanings of 'scan', 'copy', 'memorize' etc. or learn more about the mechanics of these systems. They use the works, and what matters is whether their use is authorized or not.

Barkon

I don't think so (I don't think it's illegal). It is, in effect, browsing and sometimes buying. We are allowed to build ideas and conjectures about what we browse, and a computer article shouldn't be any different.

Shawn

Just a random question. Had someone sold the database of all posts of a forum (not this one, in my mind), would that be considered theft or public information?

Christoffer

#1. Make money. — BC

There are better ways to make money. It's easy to fall into the debate trap of always summarizing anything against companies as having pure capitalist interests, but trying to solve AI tools is kinda the worst attempt if all intentions are just profit. And the competition in this new field as an industrial component of society demands making sure it is the best product without risking courts taking them down and ruining their entire business.

I do not know what percent of the vast bulk of material sucked up for AI training is copyrighted, but thousands of individual and corporate entities own the rights to a lot of the AI training material. I don't know whether the most valuable part was copyrighted currently, or had been copyrighted in the past, nor how much was just indifferent printed matter. Given the bulk of material required, it seems likely that no distinction was made. — BC

Since the amount of data is key to making the systems accurate and better, it needs to be as large as possible. And since far more text and images exist today than were produced by people who died more than 70 years ago, it is necessary to use copyrighted material.

But since it is training data, why does it matter? People seem confused as to what "theft" really means when talking about training data. Memory is formed through "pattern networks" similar to the human brain; nothing is copied into the system. Weights and biases are programmed in to guide the learning and it learns how to recognize pattern structures in all the data. Since it's learning this, it's learning the commonalities in images, text or whatever data in order to form an understanding of how to predict next steps in the generation. When it accidentally plagiarizes something, it's similar to how we picture a memory in our head as clearly as we can. I can remember a Van Gogh painting with high clarity, but it is still not identical to the original. I can remember text I've read, but I often misremember the exact lines. This is because my mind fills in the gaps through predictive methods based on other memory and other information I've learned.

As I've mentioned numerous times in this thread, it's important to distinguish between training processes and generated outputs. The alignment problem is about getting the system to function in a way not destructive to society, in this case destructive to artists' copyright. But the data it was trained on is part of the information available in the world, and since it's used behind closed doors in their labs, it's no different from how artists use copyrighted images, music or text in their workflow while creating something. A painter who cuts out images from magazines and uses them as inspirational references while painting, copying compositions, objects, colors and similar from those images, is basically the same as an AI model trained on those images, and the model is maybe even less direct in copying when compared to some artists.

Have you ever scrolled through royalty-free library music? Their process is basically taking what's popular in commercial music or movie soundtracks, replicating the sound of all instruments, and taking enough notes but changing one or two so as to not copy some track directly. How is this different from anything that Suno or Udio is doing?

And with scale, with millions of images, text, music etc. it means that the risk of accidental plagiarism is very low compared to an artist using just a few sources.

In the end, it's still the responsibility of the one using the system to generate something to make sure they're not plagiarizing anything. It's their choice how to use the image.

The output and the training process are not one and the same thing, but people point to the nature of outputs and accidental plagiarism in outputs as proof that the training process and training data amount to theft, when this doesn't actually support such a conclusion. There's no database of copyrighted material on some cloud server somewhere from which these systems "get the originals". They don't store any copyrighted material anywhere but in their own lab. So how does that differ from an artist who's been scraping the web for references in their work and storing them on their own hard drives?

The many people who produce copyrighted material haven't volunteered to give up their ideas. — BC

In what way did they "give up their ideas"? If I create an image, upload it to something like Pinterest and someone else downloads that image to use as a reference in their own artistic work, then they didn't commit theft. Why is it theft if a company uses it as training data for an AI model? As long as the output and generations are aligned not to fall into plagiarism, why does that matter any more than if another artist used my images in their own workflow? Because the companies are part of a larger capitalist structure? That's not a foundation for defining "theft".

Here's an example of artwork that is "inspired" by the artist Giorgio de Chirico for the game cover of "ICO".

[Image: ICO PlayStation 2 front cover (mobygames.com)]

[Image: Giorgio de Chirico, "The Nostalgia of the Infinite" (soho-art.com)]

[Image: screenshot (ftn-blog.com)]

No one cares about that, no one screams "theft", in general they loved how the artist for the cover got "inspired" by Giorgio de Chirico.

But if I were to make an image with an AI diffusion model doing exactly the same, meaning some elements and the general composition are unique, but the style, colors and specific details are similar but not copied, and then use it commercially, then everyone would want to crucify me for theft.

If it was even possible, because if I ask DALL-E for it, it simply replies:

I was unable to generate the image due to content policy restrictions related to the specific artistic style you mentioned. If you'd like, I can create an image inspired by a surreal landscape featuring a windmill and stone structures at sunset, using a more general artistic approach. Let me know how you'd like to proceed!

And if I let it, this pops out:

It's vaguely resembling some aspects of Giorgio de Chirico's art style, but compared to the ICO game cover, it's nowhere near it.

This is an example of alignment for the usage of these systems, in which the system tries to recognize attempts to plagiarize. And this process is improving all the time. But people still use old examples of outdated models to "prove" what these systems are doing at the moment. Or they use the examples of companies or people who blatantly don't care about alignment to prove that a totally different company also does it because... "AI is evil", which is just the length of their entire argument.

And with past court rulings in favor of the accused, like in Mannion v. Coors Brewing Co., artists seem to be protected far more for blatant rip-offs than an AI diffusion model producing something far less of a direct copy.

So your claim is that adding intentionality to current diffusion models is enough to bridge the gap between human and machine creativity? Like I said before I don't have the ability to evaluate these claims with the proper technical knowledge but that sounds difficult to believe. — Mr Bee

Why is it difficult to believe? It's far more rooted in current understandings in neuroscience than any spiritual or mystifying narrative of the uniqueness of the "human soul" or whatever nonsense people attribute human creativity to. Yes, I'm simplifying it somewhat for the sake of the argument; the intention and the predictive/pattern recognition system within us are constantly working as a loop, influencing each other and generating a large portion of how our consciousness operates. Other parts of our consciousness function the same way; our visual cortex isn't getting fed by some 200 fps camera that is our eyes, but rather our eyes register photons that our visual cortex interprets by generating predictions in-between the raw visual data we feed through our eyes. It's the reason why we have optical illusions, and why, if we stare at some object in high contrast for a long period of time and then look at a white canvas, we see an inverted image as our brain tries to over-compensate by generating a flow of data to fill in gaps that aren't seen anymore in the raw data.

At its core, the actual structure of a neural engine or machine learning is mimicking the exact nature of how our brain operates with pathways. We don't have a raw data copy of what we see or hear; we have paths that form in relation to other memory paths, and the relations between them form the memories that we have. It's why we can store such vast amounts of information in our heads, because it's not bound to physical "bits", but to connections which become exponentially complex the more you have.

Inspired by these findings in neuroscience, machine learning using neural maps started to show remarkable increases of computing capabilities far beyond normal computers, but what they gained in compute power, they lost in accuracy. Which is key to understanding all of this.

They don't copy anything because that would mean an AI model would be absolutely huge in size. The reason I can download an AI model that is rather trivial in size is because it's just a neural map, there's no training data within them. It's just a neural structure "memory", similar to the neural maps in our own brains.

And they're using the same "diffusion model" operations that try to mimic how we "remember" from this neural map by analyzing the input (intention), finding pathway memories that link to the meaning of the input, and interpreting them into predictions that generate something new.

Recent science (that I've linked in above posts) has started to find remarkable similarities between our brain and these systems. And that's because they didn't make these AI models based on some final conclusions about our brain; they were instead inspired by what was found in neuroscience and just tried methods to mimic our brain, without knowing if it would work or what would happen.

This is the reason why no one still knows how an LLM could generate fluid text in other languages without direct programming of such functions and why many of these other functions just emerged from these large quantities of text data forming these neural maps.

It's rather that because these companies did all of this, neuroscientists are starting to bring the companies' research papers back into their own field, as they hint at how abilities emerge in our brain just from how prediction occurs within these neural pathways. It's basically someone trying something before people know exactly how something works and discovering actual results.

The point being, it mimics what we know about how our brain generates something like an image or text, and thus what's missing is everything that constitutes an "intention" in this process. "Intention" isn't just a computational issue, but something that reflects the totality of our mind, with emotions, behaviors and what might constitute everything about the physical nature of being a human. Human "intention" might therefore not be able to be copied without requiring everything that constitutes being a "human".

A good example of another technology that mimics a human function is the camera, or the recorder and speakers. These are more simplistic in comparison, but we've replicated the way our eyes register photons, especially in digital cameras, with lenses, rods and cones. And we've replicated how we record sounds using membranes and how our vocal cords produce sounds like membranes in a speaker and its hollow structure which forms sounds like our throat.

But when we mimic brain structures and we witness how they form behaviors similar to how our brain functions during creativity, we are all of a sudden thrown into moral questions about copyright, in which people who don't understand the tech generally argue like those sitting in the audience at the first film projection, believing the train is actually about to hit them, or like those who believed record players and cameras took the souls of people when they were recorded in those mediums.

As far as I see this, it's a religious and spiritual realm that makes people fear these AI models' core functions, not scientific conclusions. It's about people breaking cameras because they think they will capture their souls.

Okay, but in most instances artists don't trace. — Mr Bee

Neither do diffusion models, ever. But artists who trace will still come out unscathed compared to how people react to AI generated images. Where is the line drawn? What's the foundation on which definitions of such differences are made?

I don't see how originality is undermined by determinism. I'm perfectly happy to believe in determinism, but I also believe in creativity all the same. The deterministic process that occurs in a human brain to create a work of art is what we call "creativity". Whether we should apply the same to the process in a machine is another issue. — Mr Bee

That's not enough of a foundation to conclude that machines do not replicate the physical process that goes on in our brain. You're just attributing some kind of "spiritual creative soul" to the mind, that it's just this "mysterious thing within us" and therefore can't be replicated.

Attributing some uniqueness to ourselves only because we have problems comprehending how this thing within us works or functions isn't enough when we're trying to define actual laws and regulations around a system. What is actually known about our brain through neuroscience is the closest thing we have to an answer, and that should be the foundation for laws and regulations, not spiritual and emotional inventions. The fact is that human actions are always traceable to previous causes, to previous states, and a creative choice is no different from another type of choice.

The only reason why people attribute some uniqueness to our internal processing in creativity is because people can't separate their emotional attachment to the idea of divine creativity; It's basically just an existential question that when we break down the physical processes of the brain and find the deterministic behaviors and demystify creativity, people feel like their worldview and sense of self gets demolished. And the emotional reactions from that are no grounds for actual conclusions about how we function and what can be replicated in a machine.

Indeed the definitions are very arbitrary and unclear. That was my point. It was fine in the past since we all agree that most art created by humans is a creative exercise but in the case of AI it gets more complicated since now we have to be more clear about what it is and if AI generated art meets the standard to be called "creative". — Mr Bee

And when we dig into it, we see how hard it is to distinguish what actually constitutes human creativity from machine creativity.

However, I don't think this is a problem, due to the problem of "intention". These AI models aren't really creative in exactly the way we are. They mimic the physical processes of our creativity, which isn't the same as the totality of what constitutes being creative. We might be able to replicate this in the future, but for now, the intention is what drives the creativity; we are still asking the AI to make something, we are still guiding it. It cannot do it on its own, even though it has the physical neural pathway processing replicated. Even if we just hit a button for it to randomly create something, it then gets guided by the fundamental weights and biases that were there to inform its fundamental basic handling of the neural map.

To generate, we must combine intention with the process, and therefore before we can judge copyright infringement, those two must have produced something, i.e. the output. And so, the argument I've been making in here is that any attempt to blame the training process for using copyrighted data in the training data is futile, as nothing has been created until after intention and process generate an output.

Only then can plagiarism and other copyright problems be called into question.

However the problem is that in today's art industry, we don't just have artists and consumers but middlemen publishers who hire the former to create products for the latter. The fact is a lot of artists depend on these middlemen for their livelihoods, and unfortunately these people 1) don't care about the quality of the artists they hire and 2) prioritize making money above all else. For corporations, artists merely create products for them to sell and nothing more, so when a technology like AI comes up which produces products for them for a fraction of the cost in a fraction of the time, they will more than happily lay off their human artists for what they consider to be "good enough" replacements, even if the consumers they sell these products to will ultimately consider them inferior.

There are people who take personal commissions but there are also those that do commissions for commercial clients who may want an illustration for their book or for an advertisement. Already we're seeing those types of jobs going away because the people who commissioned those artists don't care in particular about the end product so if they can get an illustration by a cheaper means they'll go for it. — Mr Bee

Yes, some jobs will disappear or change into a new form. This has happened all the time throughout history when progress is rapidly changing society. Change is scary, and people are most of the time very comfortable in their bubble, which when popped leads them to strike out like an animal defending itself. This is where the luddite behavior comes into play.

But are we saying that we shouldn't progress technology and tools because of this?

When Photoshop arrived with all its tools, all the concept artists who used pencils and paint behaved like luddites, trying to work against concept art being made with these new digital tools. When digital musical instruments started becoming really good, the luddites within the world of composing started saying that people who can't write notes shouldn't be hired or considered "real" composers.

Whatever these companies and the luddites think of AI, they both forget that the working artist's biggest skill isn't that they can paint a straight line or play a note; it's that they have the eye for composition and design, that they have an ear for melody, a mind for poetry.

People seem to have forgotten what artists are actually hired for, and it's not the craft. A concept artist isn't really hired for their personal style (if they're not among the biggest names in the industry); they're hired to guide the design based on the requirements of the project. They're hired for their ability to evaluate what's being formed and created.

All forms of art made within a larger project at companies like game studios are enslaved to the overarching design of the entire project.

And because of this, the input that these artists have is very limited to the fundamental core of their expertise, i.e. the knowledge of the artist to guide design towards the need of the project.

Therefore, a company who fires an artist in favor of someone who's not an artist to start working with AI generation, will soon discover that the art direction becomes sloppy and uninspiring, not because the AI model is bad, but because there's no "guiding principles" and expert eye guiding any of it towards a final state.

This is why artists need to learn to work with these models rather than reject them. Find ways of fusing their art style, maybe even train a personalized AI on their own art and work in a symbiosis with it.

Because this type of work for corporations is fundamentally soulless anyway. These artists aren't working with something they then own, the corporation owns it. They're there to serve a purpose.

In reality, an artist speeding up their process with AI would leave them more time to actually create for themselves. More time to figure out their own ideas and explore in more meaningful ways. Because they don't have to work overtime for some insecure producer who constantly changes their mind making them have to do patch works of their art because other people are lacking in creativity or ability to understand works of art.

Anyone who's been working within these kinds of corporate systems and these types of corporations isn't actually happy. Because there's no appreciation for anything they do and no understanding of their ideas, as everything is filtered through whatever corporate strategy drives the project from above.

Why not then be the artist who's an expert with AI? Because you can't put an intern onto writing prompts, that's not how generative AI works. You need to have an eye for art even if you work with AI. Utilizing AI in your work for these companies does not destroy your artistic soul for what you make for yourself or your own projects.

The "good enough" companies, before these AI models, have never been good for artists anyway. Why would artists ever even care for their work towards these companies if they themselves won't care for the artists? So if they start becoming artists with expertise in AI, then these companies will soon hire them back once they realize it's just not viable to have non-artists handling their AI generations.

I don't think artists have really been thinking enough about this shift in the industry. Instead they're behaving like luddites thinking that's a good way forward. And companies who don't know the value of artists, didn't value artists before either. And those companies who are forced into using AI due to how it speeds up projects when trying to compete with others, but who still focus on the quality of art in their product, will still hire actual artists for the job of handling the generative AIs. Because they understand that the quality of the artist isn't in the brush, or photoshop, or a random prompt, it's in the eye, ear and mind to evaluate what's being created, to make changes, to guide something, a design, towards a specific goal.

How all of this just seems to get lost in the debate about generative AI is mind-boggling.

Maybe this is because there are more non-artists, who aren't even working with any of this, who drive the hate against AI. Just like all the people who hate CGI in movies and just suck up the marketing of "no CGI" in movies nowadays, when the truth is that CGI is used all of the time in movies and these people simply have no idea what they're talking about; they just want to start a brawl online to feed the attention economy with their ego.

Of course the data collection isn't the problem but what people do with it. It's perfectly fine for someone to download a bunch of images and store it on their computer but the reason why photobashing is considered controversial is that it takes that data and uses it in a manner that some consider to be insufficiently transformative. Whether AI's process is like that is another matter that we need to address. — Mr Bee

Then you agree that the ongoing lawsuits that target the training process, rather than the outputs, uses of outputs and the users misusing these models, are in the wrong.

Sorry if I missed some of your points but your responses have been quite long. If we're gonna continue this discussion I'd appreciate it if you made your points more concise. — Mr Bee

Sorry, read this too late :sweat: But still, the topic requires some complexity in my opinion, as the biggest problem is how the current societal debate about AI is often too simplified and consolidated down into shallow interpretations and analysis.

Christoffer

The painting of Mona Lisa is a swarm of atoms. Also a forgery of the painting is a swarm of atoms. But interpreting the nature of these different swarms of atoms is neither sufficient nor necessary for interpreting them as paintings, or for knowing that the other is a forgery. — jkop

Neither is an argument that compares laws based on actions that aren't within the actions being targeted by existing laws. The training of these models is similar to any artist who works behind closed doors with their own "stolen" copyrighted material. It's still the output that's the main focus of determining plagiarism and what actions are taken to mitigate that. An artist who accidentally plagiarizes does not get barred from ever making art again. A system that didn't handle accidental plagiarism in earlier versions, but is then optimized to mitigate it, should that be banned?

You still seem to be unable to separate the formation of these neural models from the actions in generating outputs. The training process is not breaking copyright just because the output and use of the system did so. That's purely a problem with alignment.

So this argument about Mona Lisa makes no sense in this context.

Whether something qualifies for copyright or theft is a legal matter. Therefore, we must consider the legal criteria, and, for example, analyse the output, the work process that led to it, the time, people involved, context, the threshold of originality set by the local jurisdiction and so on. You can't pre-define whether it is a forgery in any jurisdiction before the relevant components exist and from which the fact could emerge. This process is not only about information, nor swarms of atoms, but practical matters for courts to decide with the help of experts on the history of the work in question. — jkop

Training an AI model on copyrighted data is still not infringement any more than an artist using copyrighted material in their work is. The output is all there is, and if accidental plagiarism is happening all the time, courts can demand that these companies do what it takes to mitigate that from happening, which they are already doing... the rest is up to the intent of the user and should be the responsibility of the user. If the user is forcing a system to plagiarize, that's not the tech company's fault any more than Adobe is responsible if someone uses someone else's photo to enhance their own by manually changing it in Photoshop.

Blaming the training process for copyright infringement would fall into an arbitrary definition of how we can handle copyrighted data in our own private sphere. It would mean that I cannot take a favorite book and, in my private home, take a pencil and write down segments from that book onto a piece of paper; that would constitute copyright infringement, even if I don't spread that paper around officially.

How the tech works is out in the open, but people make emotionally loaded and wrong interpretations of it based on their own lack of understanding of the technology.

Like, if I ask: where's the copyrighted data if I download an AI model? Point to where that data is please. It's not encrypted, that's not why you can't point to it. It's in there in the model, so if you are to claim copyright infringement, you have to point at the copied material inside the AI model. Otherwise how do you define that the companies "spread copyrighted material"? This is the foundation of any claim of theft, because what they did in the lab is behind closed doors, just as with an artist taking copyrighted material into their workflow and process.

Regarding the training of AI systems by allowing them to scan and analyse existing works, then I think we must also look at the legal criteria for authorized or unauthorized use. — jkop

I can take a photo of a page in a book and have it for myself. That's not illegal. I can scan and analyze; I can do whatever I want as long as I'm not spreading those photos and scans around.

If you claim that training an AI model is unauthorized, then that would mean it is unauthorized for you to take screenshots or take photos of anything in your home (there is copyrighted material everywhere, from books to posters and paintings to the design of furniture etc.), because that is what the consequence of such definitions will be.

Then you can say, ok, we only make it unauthorized to train AI models with such analysis. That would mean that all algorithms in the world that work based on analysing data would be considered unauthorized, fundamentally breaking the internet as we know it. Ok, so you say we only do it for generative AI, then why? On what grounds do you separate that from any other scenario? If the defense says that such a ruling is arbitrarily in favor of artists based on very unspecified reasons other than that they are humans, and questions why engineers aren't allowed to utilize similar processes as artists, then what's morally correct here?

It all starts to boil down to how to define why artists have more rights than these engineers and why the machine isn't allowed while artists utilize even worse handling of copyrighted material in their private workflows.

If you actually start to break all of this down, it's not as clear-cut as you seem to believe.

Doesn't matter whether we deconstruct the meanings of 'scan', 'copy', 'memorize' etc. or learn more about the mechanics of these systems. They use the works, and what matters is whether their use is authorized or not. — jkop

Of course it matters. Artists scan, copy and memorize lots of copyrighted material in their workflows. Why are they allowed to and not the engineers training these models? Why are artists allowed to do whatever they want in their private workflows, but not these companies? Because that's what you are targeting here. What's the difference? I mean, what's the actual difference that produces such a legal difference between the two that you can conclude that one is illegal and the other is not?

Just a random question. Had someone sold the database of all posts of a forum (not this one, in my mind), would that be considered theft or public information? — Shawn

If the forum rules state that the forum in some way "owns" the posts, then the people who write on the forum can't do anything about it. However, I don't think forum owners own the posts written there, so they can't really sell them. That said, there's nothing illegal about scraping all posts from a forum if they're available as public information. Google's spider is already doing this for search results and, as I've pointed out in this thread, training an AI model on those forum posts shouldn't be considered illegal, as the model isn't copying anything any more than I am when I read others' posts and draw upon them when I write something myself.

It all, always, at every point, boils down to what the output of an AI system is. Did the user ask for a plagiarized post from one of the posters of a forum? Then the user is breaking copyright. If the system accidentally plagiarizes, then the tech company must be forced to implement mitigation systems to prevent that from happening.

But information, works of art and everything else that is officially available in some form can't be protected from being viewed, analyzed and decoded. That's not what copyright is for; it's for protecting against unlawful spreading, reselling or plagiarism of others' work. With enough original transformation, an output, or the spreading of such output, isn't any more illegal than if I get inspired by Giorgio de Chirico and create a cover for a game.
