Op-Ed: How to pay producers for materials used in AI training – Cheapskate or not?



OpenAI’s ChatGPT is one of the most powerful generative AI tools available to the public – Copyright AFP/File Kirill KUDRYAVTSEV

The discussion about payment for AI training materials got serious with The New York Times lawsuit. Now it’s reached government level in Australia, in a very tentative form.

Exactly how this glaring case of universal use of other people’s property is compensated will be interesting, and probably messy. There are no current precedents for AI, although simple royalty payments would do the trick.

…Which is where things stop being simple:

New generations of AI will of course have to be trained. That won’t stop.

But, and it’s a truly huge but – there will be existing databases that can simply be loaded into the new AI.

Do AI trainers pay for the load of that existing data into a new AI system? If it were any other type of media, it’d be an easy question.  

Do they want to pay? Probably not. The corporate world is not famous for paying anyone but themselves for anything.

Can they be made to pay? Almost definitely, but you’d need a blockchain system to ensure payments and human oversight. A system like that for films was pioneered by White Rabbit in California circa 2018.  

What about variations in language text from originals? Slight (and often inexcusable) variations in other media often make managing copyright extremely difficult.

In purely business terms there are other issues to address:

How valuable is the language content you’re buying? That’s a sticking point in multiple ways. How do you value Shakespeare as content value in dollars and cents? What use is it? Do you need it to understand references in other texts? You probably do, particularly in metaphoric usage.

If you assume that large language models use purely functional content, you might have a parameter or so. Language is however multifunctional. In context, the same word might mean something totally different.

For example, the subject is finance. The comment is “Fascinating”.  Is it sarcasm, or the more usual meaning? This is where language models have to deal with expressions which might seem out of context with the subject.

Therefore, you need literature, and a lot of it. You need up-to-date expressions and older slang. That’s a pretty broad palette, and demand is likely to increase as the language evolves.

The other issue here is mundane enough to cause suicide in bored house bricks. It’s called intellectual property rights. The copyright minefield is always active.

People will and do litigate over any sort of content, and that issue’s been at plague levels for decades. This racket needs to be shut down, and the parasites removed.   

This is largely due to holdovers from old copyright laws and an almost total lack of Copyscape evolution.

It’s not at all hard to do a “List > Read” function. It’s one of the first things you learn in coding. Any verbatim or near-verbatim text should be easy to find. At a certain percentage of content, it’s plagiarism. A court may decide that a lower percentage of content is still plagiarism, or that a key idea has been stolen.
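As a rough illustration of how basic that kind of check is, here is a minimal Python sketch that measures how much of one text reappears, word for word, in another. It is only a sketch under stated assumptions: the function names, the five-word window and the 50% threshold are all hypothetical choices made for this example, not an industry or legal standard, and nothing here detects a stolen idea.

```python
import re

def shingles(text, n=5):
    """Return the set of n-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(candidate, source, n=5):
    """Fraction of the candidate's n-word sequences that also occur in source."""
    cand = shingles(candidate, n)
    if not cand:
        # Candidate is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)

# Hypothetical cut-off for illustration only; a court or a licensing
# scheme would have to decide what percentage actually matters.
THRESHOLD = 0.5

def looks_copied(candidate, source, n=5, threshold=THRESHOLD):
    """True if the share of matching n-word sequences reaches the threshold."""
    return overlap_ratio(candidate, source, n) >= threshold
```

Calling `overlap_ratio(new_text, original_text)` gives a number between 0 and 1, and `looks_copied` turns that into a yes or no. A smaller window catches looser paraphrases at the cost of more false alarms, which is exactly the kind of detail all sides would need to agree on.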

This is LEGO for basic software. It’s also likely to be the basis of a lot of pretty murderous litigation worth big money. You need something better, which can be accepted by all sides without clogging up the courts with minutiae.

It also needs to be 2000% idiot-proof. It should be so simple that producers can check their own work against other people’s work. AI trainers should be in no doubt about whether content is copyrighted.

Then there’s the money. Perhaps millions of years’ worth of royalties could be on the line. The rate of payment for content needs to be bearable for those paying and those being paid.

This is hardly an insoluble issue. Payments of this type have already been set up. It shouldn’t even be an issue.

…But…

There are real ongoing risks here. The whole idea could be derailed by a cheapskate approach to which producers have no option but to react negatively. “Cheap” could turn out to be incredibly expensive by causing a dispute that otherwise wouldn’t arise.

If the payment system can’t be trusted, it can’t be viable. Loopholes in payment arrangements could become black holes. If a single payment scheme goes sour, so will all its clones.

This cannot be a drill, ever. Just get it right and make it industry standard.

