When it comes to pollution of the Chinese internet, AI has become one of the “culprits”.
Here’s the thing.
Everyone seems keen to consult AI these days. One netizen asked Bing the following question:
Is there a cable car at Elephant Trunk Hill?
Bing duly obliged, giving a seemingly reliable answer:
After answering in the affirmative, Bing even helpfully attached details such as ticket prices and opening hours.
However, the netizen did not take the answer at face value; following the clues, they clicked on the “reference link” below.
That was when something seemed off: the cited answerer’s replies read suspiciously machine-like.
Clicking through to that user’s homepage, an account called “Variety Life”, the netizen suddenly realized: the answerer was an AI!
The account answered questions at an astonishing pace, solving roughly one question every minute or two,
and sometimes even two questions within a single minute.
On closer inspection, the netizen found that none of the content in these answers had been verified…
And this, the netizen believes, is what caused Bing to output the wrong answer:
This AI is crazily polluting the Chinese Internet.
“AI pollution sources”: far more than just this one
So what has become of the AI account the netizen uncovered?
Judging from the current state of things, Zhihu has “sentenced” it to silence with a posting ban.
Even so, other netizens have bluntly pointed out:
It is far from the only one.
Open Zhihu’s “Waiting for your answer” section, pick a question at random, and scroll down; you will indeed run into plenty of these suspiciously “clever” answers.
For example, under the question “What are the application scenarios of AI in everyday life?”, we found this one:
Not only does the answer read like pure machine-speak, it even carries the explicit label “includes AI-assisted creation”.
And if we put the same question to ChatGPT, the answer we get back is… well, barely any different.
In fact, such “AI pollution sources” are not limited to this platform.
Even with simple science-popularization images, AI has made repeated mistakes.
After seeing one set of AI-generated “mussel” illustrations, netizens couldn’t hold it together: “Good grief, not a single one of these pictures is actually a mussel.”
Even fake news generated by various AI is not uncommon.
For example, a sensational story went viral some time ago under the headline “Murder at a Zhengzhou chicken-chop shop: man beats woman to death with a brick!”
In fact, the story had been generated with ChatGPT by a man surnamed Chen from Jiangxi, purely to attract followers.
Coincidentally, a man surnamed Hong from Shenzhen, Guangdong also used AI tools to publish the fake story “This morning, a train in Gansu struck road-construction workers, killing 9 people”.
Specifically, he trawled the web for viral social-news stories from recent years, used AI software to alter details such as dates and locations, and then posted the results on certain platforms to harvest attention and traffic for illicit profit.
Police have since placed both men under criminal coercive measures.
In fact, this “AI pollution source” phenomenon is not confined to China; it exists abroad as well.
Stack Overflow, the programmer Q&A community, is one example.
As early as the end of last year, when ChatGPT first took off, Stack Overflow abruptly announced a “temporary ban” on answers generated with it.
The official reason given at the time was as follows:
The purpose (of doing this) is to slow the flood of ChatGPT-created answers pouring into the community.
Because the probability that ChatGPT produces a wrong answer is simply too high!
Stack Overflow elaborated on the phenomenon.
In the past, they argued, answers posted by users were read by other users with relevant professional backgrounds, whose scrutiny amounted to a form of verification.
Since ChatGPT appeared, however, a flood of answers that merely look “right” has poured in, while the number of users with the expertise to vet them is limited; reviewing all of this generated content is impossible.
Add to that ChatGPT’s very real error rate on specialist questions, and Stack Overflow chose to ban it.
In a word: AI was polluting the community environment.
Over on Reddit (roughly the US counterpart of Baidu Tieba), there are even more ChatGPT boards and topics:
Many users post all kinds of questions under them, and ChatGPT bots answer every single one.
But the old problem remains: the accuracy of those answers is anyone’s guess.
But behind this phenomenon, there are actually greater hidden dangers.
Abusing AI also ruins AI
AI models ingest vast amounts of internet data, yet they cannot reliably tell credible information from false.
The result is that we must wade through a flood of rapidly generated, low-quality content that leaves people dazed.
It is hard to imagine what would come of training large models like ChatGPT on data of this kind…
And this kind of AI abuse is, in turn, a form of self-cannibalization.
Recently, researchers in the UK and Canada posted a paper on arXiv titled “The Curse of Recursion: Training on Generated Data Makes Models Forget”.
It examines how AI-generated content is polluting the internet today, and reports a worrying finding: using model-generated content to train other models causes irreversible defects in the resulting models.
This “pollution” by AI-generated data distorts a model’s perception of reality, and will make it ever harder to train models by scraping internet data.
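The mechanism the paper describes can be illustrated with a deliberately tiny toy model (a sketch of our own, with made-up parameters, not the authors' actual experiment): treat "training a model" as fitting a normal distribution, and let each generation train only on samples produced by the previous generation's model. Estimation error compounds, rare "tail" values stop being sampled, and the fitted spread gradually collapses:

```python
import random
import statistics

# Toy sketch of recursive training on generated data ("model collapse").
# The "model" here is just a fitted normal distribution; each generation
# is fit only to synthetic samples drawn from the previous generation.
def train_generations(n_generations=1000, n_samples=20, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0            # generation 0: the "real" data distribution
    sigmas = [sigma]
    for _ in range(n_generations):
        # Training data for the next model comes from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and the next model is simply a fit to that synthetic data.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        sigmas.append(sigma)
    return sigmas

sigmas = train_generations()
print(f"spread: generation 0 = {sigmas[0]:.3f}, generation 1000 = {sigmas[-1]:.3g}")
```

Run with these settings, the fitted spread shrinks by orders of magnitude over the generations. Real neural networks are vastly more complex, but the paper argues an analogous, compounding loss of low-probability knowledge occurs when models are trained on model output.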
One of the paper’s authors, Ross Anderson, professor of security engineering at the universities of Cambridge and Edinburgh, put it bluntly:
Just as we filled the oceans with plastic waste and the atmosphere with carbon dioxide, we are about to fill the internet with crap.
On the spread of misinformation, Daphne Ippolito, a senior research scientist at Google Brain, remarked that finding high-quality data untainted by AI will only become harder in the future.
If screens keep filling with this kind of nutrition-free, inferior content, there will eventually be no clean data left to train AI on, and what meaning would its output have then?
Extrapolating boldly from this situation: an AI raised in an environment of garbage and false data may well degenerate into “artificial stupidity” long before it approaches human-level intelligence.
It calls to mind the 1996 sci-fi comedy Multiplicity, in which an ordinary man clones himself, and the clones clone themselves, with each successive copy noticeably dumber than the last.
At that point we may face an absurd dilemma: humans will have created an AI of astonishing capability, yet one stuffed with boring, stupid information.
What kind of content can we expect AI to create if it is fed nothing but junk data?
If that day comes, we will probably find ourselves nostalgic for the past, paying tribute to genuine human wisdom.
That said, it is not all bad news. Some content platforms, for example, have begun to take the problem of low-quality AI-generated content seriously and have introduced rules to curb it.
Some AI companies, too, have started developing technology to identify AI-generated content, hoping to stem the explosion of AI-produced misinformation and spam.
AI is crazily polluting the Chinese Internet
Source: Qubit