a16z: The Combination of AI and Blockchain Creates Four New Business Models
Original video: Web3 with a16z, AI & Crypto
Authors: Dan Boneh (Professor at Stanford University and Senior Research Advisor at a16z crypto), whose work focuses on cryptography, computer security, and machine learning; Ali Yahya (General Partner at a16z crypto), who formerly worked at Google Brain and was one of the core contributors to Google's machine learning library TensorFlow.
Compiled by: Qianwen, ChainCatcher
Neal Stephenson wrote a science fiction novel called "The Diamond Age," in which an artificial intelligence device acts as a lifelong mentor. When you're born, you're paired with an AI that knows you intimately: your likes and dislikes. It follows you through life, helps you make decisions, and steers you in the right direction. That sounds great, but you would never want technology like this to fall into the hands of giant middlemen, because it would hand those companies enormous control and raise a host of privacy and sovereignty issues.
**We want this technology to be truly owned by its users, and a vision has emerged that you could achieve this with the blockchain.** You can embed artificial intelligence in smart contracts and keep your data private with the power of zero-knowledge proofs. Over the next few decades this technology will only get smarter. You could do whatever you want with it, or change it in any way you wish.
So what is the relationship between blockchain and artificial intelligence? What kind of world will artificial intelligence lead us to? What are the current state and challenges of artificial intelligence? And what role will blockchain play in this process?
AI and Blockchain: Rivals and Complements
The idea of artificial intelligence, including the scenario described in "The Diamond Age," has been around for a long time; only recently has the technology taken a leap forward.
**First, AI is largely a top-down, centrally controlled technology,** while crypto is a bottom-up technology of decentralized cooperation. In many ways, cryptocurrency is the study of how to build decentralized systems that enable large-scale human cooperation without any central controller. In that respect, there is a natural way for these two technologies to come together.
AI is a sustaining innovation: it reinforces the business models of incumbent technology companies and helps them make top-down decisions. The best example is Google, which decides what content to present across billions of users and billions of page views. Cryptocurrency, on the other hand, is essentially a disruptive innovation whose business model is fundamentally at odds with that of large technology companies. **Thus, it is a movement led by fringe rebels, not by those in power.**
Artificial intelligence is also closely tied to privacy, and the two are in constant tension. As a technology, AI has created incentives that leave users with less and less privacy, because companies want all of our data: models trained on more data become more effective. At the same time, AI is not perfect; models can be biased, and bias can lead to unfair outcomes, which is why there are so many papers on algorithmic fairness today.
I think we're headed down a path where everyone's data is aggregated into massive training runs to optimize models. Cryptocurrency moves in the opposite direction, increasing personal privacy and giving users control over their own data. **Arguably, cryptography is a counterweight to artificial intelligence, because it helps us distinguish human-created content from AI-created content, and in a world flooded with AI-generated material, cryptographic technology will be an important tool for preserving and authenticating human content.**
Cryptocurrency is the wild west: it's completely permissionless, anyone can participate, and you have to assume that some participants are malicious. **So there is a growing need for tools that help you tell honest players from dishonest ones, and machine learning and artificial intelligence, as intelligent tools, can be of great benefit here.**
For example, some projects use machine learning to identify suspicious transactions in wallets, flagging them before users submit them to the blockchain. This helps prevent users from accidentally sending all their funds to an attacker, or doing something they will later regret. Machine learning can also help you judge in advance which transactions may be exposed to MEV (maximal extractable value).
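As a rough illustration of how such a flagging tool might work, here is a minimal sketch using an unsupervised anomaly detector. The feature set (transaction value, gas price, age of the destination address) and all of the numbers are hypothetical; real systems use far richer signals.

```python
# Minimal sketch: flag anomalous transactions with an unsupervised model.
# The features (value, gas price in gwei, destination-address age in days)
# are hypothetical stand-ins for the richer signals real tools would use.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in for a wallet's transaction history: [value, gas_price, dest_age]
history = np.column_stack([
    rng.lognormal(mean=-1.0, sigma=1.0, size=500),  # typical small values
    rng.normal(loc=30.0, scale=5.0, size=500),      # ordinary gas prices
    rng.uniform(low=30, high=2000, size=500),       # well-aged destinations
])
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

# A draft transaction: the entire balance sent to a brand-new address.
draft_tx = np.array([[50.0, 200.0, 0.5]])
if model.predict(draft_tx)[0] == -1:                # -1 means "outlier"
    print("Warning: this transaction looks unusual; please review it.")
```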
**Just as LLMs can be used to detect fake data or malicious activity, they can also be used to generate fake data.** The most typical example is deepfakes: you can create a video of someone saying something they never said. But blockchains can actually help mitigate this problem.
For example, a blockchain timestamp can show that you said such-and-such a thing on a given date. **If someone later fabricates a video, you can use the timestamp to refute it: the real data is recorded on the blockchain and can be used to demonstrate that the deepfake is fake.** So I think blockchains might help in fighting counterfeiting.
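As a sketch of the mechanism, consider hashing the authentic recording and anchoring the digest on-chain; the block's timestamp then bounds when the content existed. The file contents and record format below are illustrative, not a specific protocol.

```python
# Minimal sketch of content timestamping: hash the media, then publish the
# digest on-chain (e.g., in a transaction's data field). The bytes and the
# record format here are illustrative.
import hashlib
import json
import time

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the media bytes; this is what gets anchored."""
    return hashlib.sha256(data).hexdigest()

video_bytes = b"...raw video bytes..."  # stand-in for the real recording
record = {
    "sha256": fingerprint(video_bytes),
    "claimed_time": int(time.time()),
}
# In practice the digest would be embedded in a transaction; the block's
# timestamp then proves the video existed no later than that block.
print(json.dumps(record, indent=2))
```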
We can also rely on trusted hardware. There is a standard called C2PA that specifies how devices like cameras and phones can sign the images and videos they capture. In fact, some Sony cameras can already take photos and videos and attach a C2PA signature to them. This is a complex topic, and we won't dwell on it here.
Usually, when newspapers publish pictures, they don't publish the camera's output untouched: they crop and retouch the photos. Once you start editing pictures, the final readers viewing them in a browser no longer see the originals, so C2PA signature verification can no longer be performed.
The question is, how do users confirm that the image they see really derives from one properly signed by a C2PA camera? This is where ZK techniques come in: you can prove that the edited image is the result of, say, downsampling and grayscale conversion applied to a correctly signed original. In this way, each published image can carry a succinct zk proof in place of the C2PA signature, and readers can still confirm that what they're seeing faithfully derives from a real photograph. So zk technology can be used to counter this kind of misinformation.
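Here is a plain-Python sketch of the relation such a proof would attest to. The grayscale and 2x-downsampling pipeline is illustrative; in a real system this check would be compiled into an arithmetic circuit and proved with a zk-SNARK, so the verifier never sees the original image.

```python
# Sketch of the statement a zk proof would establish: "edited is the result
# of grayscale conversion and 2x2-average downsampling of a signed original."
# Here the check runs in the clear; a real system proves it in a circuit.
import numpy as np

def edit_pipeline(original: np.ndarray) -> np.ndarray:
    """The publicly declared edit: grayscale, then 2x downsample."""
    gray = original.mean(axis=2)                    # HxWx3 -> HxW
    h, w = gray.shape
    return gray.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def relation_holds(original: np.ndarray, edited: np.ndarray) -> bool:
    """The witness (the signed original) satisfies the relation iff True."""
    return np.allclose(edit_pipeline(original), edited)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(4, 4, 3)).astype(float)  # toy "photo"
edited = edit_pipeline(original)
print(relation_holds(original, edited))             # True: edit is as claimed
```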
How can blockchain break the deadlock?
Artificial intelligence is essentially a centralized technology. It benefits heavily from economies of scale: everything runs much more efficiently out of a single data center. In addition, the data, the models, and the machine learning talent are usually controlled by a small number of technology companies.
**So how do we break the deadlock?** Cryptocurrency can help decentralize artificial intelligence through technologies such as ZKML, applied to data centers, databases, and the machine learning models themselves. For example, on the computing side, zero-knowledge proofs let a participant prove that an inference or training step was actually performed correctly.
That way, you can outsource the process to a large community. In such a distributed setup, anyone with a GPU can contribute computing power to the network and help train models, without relying on a single large data center where all the GPUs are concentrated.
**Whether this makes sense economically is uncertain, but at least with the right incentives, the long tail can be reached.** You can tap all available GPU capacity: if all these people contribute computing power to model training or inference, that replaces the big tech companies that control everything today. To get there, several important technical problems must be solved. In fact, companies such as Gensyn are building decentralized GPU computing markets, mainly for training machine learning models. In such a market, anyone can contribute their own GPU computing power, and conversely, anyone can draw on the computation available in the network to train their large machine learning models. This would be an alternative to centralized big tech companies such as OpenAI, Google, and Meta.
Consider a situation where Alice has a model she wants to protect. She sends the model to Bob in encrypted form, and Bob now needs to run his own data through the encrypted model. How? With so-called fully homomorphic encryption, you can compute directly on encrypted values: given the encrypted model and plaintext data, Bob can run the encrypted model on his data and obtain an encrypted result. He sends the encrypted result back to Alice, who decrypts it and sees the plaintext result.
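As a toy version of this flow, here is a sketch using the python-paillier (`phe`) library. Paillier is only additively homomorphic, not fully homomorphic, but it supports exactly what a linear layer needs: encrypted weights multiplied by plaintext inputs, then summed. The three-weight "model" is purely illustrative.

```python
# Sketch of the Alice/Bob flow with additively homomorphic encryption.
# Paillier (python-paillier's `phe`) supports enc * plaintext and enc + enc,
# which is enough to evaluate a toy linear model on encrypted weights.
from phe import paillier

# Alice: generate keys and encrypt her model's weights.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
weights = [0.5, -1.2, 3.0]                      # Alice's private model
enc_weights = [public_key.encrypt(w) for w in weights]

# Bob: run his plaintext input through the encrypted model.
x = [2.0, 1.0, 4.0]                             # Bob's plaintext data
enc_result = enc_weights[0] * x[0]
for w_enc, xi in zip(enc_weights[1:], x[1:]):
    enc_result = enc_result + w_enc * xi        # stays encrypted throughout

# Alice: decrypt the result Bob sends back.
print(private_key.decrypt(enc_result))          # 0.5*2 - 1.2*1 + 3.0*4 = 11.8
```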
**This technology already exists. The question is: current techniques work well for medium-sized models, but can we scale them up to larger ones?** That is quite a challenge and will require effort from many more teams.
Status, Challenges and Incentives
I think it comes down to decentralizing computation. The first issue is verification. You can use ZK to solve it, but current techniques can only handle smaller models. **The challenge we face is that the performance of these cryptographic primitives falls far short of what training or inference on very large models requires.** So a lot of work is going into improving the performance of proof systems, so that larger and larger workloads can be proved efficiently.
At the same time, some companies are using techniques beyond cryptography: game-theoretic mechanisms that have multiple independent parties redo the work. This optimistic, game-theoretic approach doesn't rely on cryptography, but it is still consistent with the larger goal of decentralizing AI and helping to create an open AI ecosystem, which is what these companies are pursuing.
**The second big problem is a distributed-systems problem.** How do you coordinate a large community contributing GPUs so that the network feels like one integrated, unified computing substrate? There are many challenges: how to partition machine learning workloads sensibly, how to assign the pieces to different nodes in the network, and how to do all of this efficiently.
Current techniques basically apply to medium-sized models, not to models as large as GPT-3 or GPT-4. Of course, there are other methods. For example, we can have multiple parties run the same training and compare the results, which creates a game-theoretic incentive not to cheat: if someone cheats, others can challenge their incorrect training results, and the cheater doesn't get paid.
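A minimal sketch of such a replication-based payout rule appears below. The quorum logic, reward value, and result hashes are all illustrative; real designs (and their dispute games) are considerably more involved.

```python
# Sketch of a replication-based payout rule: several workers run the same
# training task; those whose reported result matches the majority get paid,
# dissenters don't. Quorum logic and reward values are illustrative.
from collections import Counter

def settle(results: dict[str, str], reward: float) -> dict[str, float]:
    """results maps worker -> hash of the model checkpoint they report."""
    majority_hash, votes = Counter(results.values()).most_common(1)[0]
    if votes <= len(results) // 2:
        return {w: 0.0 for w in results}  # no majority: pay nobody, escalate
    return {w: (reward if h == majority_hash else 0.0)
            for w, h in results.items()}

reports = {"alice": "0xabc", "bob": "0xabc", "carol": "0xdef"}  # carol cheats
print(settle(reports, reward=10.0))
# {'alice': 10.0, 'bob': 10.0, 'carol': 0.0}
```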
We can also decentralize the sourcing of the data used to train large machine learning models: instead of a centralized organization collecting all the data and training the model, the community can do it. This can be achieved by creating a marketplace, similar to the computing market just described.
We can also look at it in terms of incentives: encourage people to contribute new data to a large dataset, which is then used to train models. The difficulty here is similar to the verification challenge. **You have to somehow verify that the contributed data is actually good data: not duplicates, not randomly generated junk, and not inauthentic data generated by some model.**
You also have to make sure the data doesn't subvert the model in some way, or its performance will simply get worse and worse. We may have to rely on a mix of technical and social solutions; for example, community members could build credibility through some kind of reputation metric, so that the data they contribute is more believable.
Otherwise, it will take a very long time to achieve good coverage of the data distribution. One of the challenges of machine learning is that a model can only really cover the distribution its training dataset reaches. If inputs fall far outside the training distribution, the model may behave completely unpredictably. For a model to perform well on edge cases, black-swan data points, and whatever inputs it might encounter in the real world, we need as comprehensive a dataset as possible.
**So an open, decentralized marketplace that feeds data into datasets is a much better way: anyone in the world with unique data can contribute it to the network. If you try to do this as a centralized company, you have no way of knowing who holds what data.** If you can create an incentive for these people to come forward and provide their data, you can get significantly better coverage of the long-tail data.
So we need some mechanism to make sure the data provided is real. One way is to rely on trusted hardware: embed trusted hardware in the sensor itself and only trust data correctly signed by that hardware. Otherwise, we need other mechanisms to distinguish authentic data from fake.
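As a sketch of the hardware-signing idea, suppose each sensor holds a signing key in a secure element, and the marketplace only accepts readings whose signatures verify against the device's registered public key. The key handling and record format below are simplified for illustration.

```python
# Sketch of hardware-signed data: the sensor signs each reading; the
# marketplace verifies the signature against the device's registered public
# key before accepting the data. Key handling is simplified here.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Device side (normally the key never leaves the sensor's secure element).
device_key = Ed25519PrivateKey.generate()
reading = b'{"sensor": "cam-42", "sha256": "0xabc...", "ts": 1700000000}'
signature = device_key.sign(reading)

# Marketplace side: verify against the registered public key.
public_key = device_key.public_key()
try:
    public_key.verify(signature, reading)
    print("accepted: data was signed by a registered device")
except InvalidSignature:
    print("rejected: signature does not match")
```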
There are two important trends in machine learning right now. First, methods for measuring model performance keep improving, though they are still early, and in practice it is hard to know how well a given model really performs. Second, we are getting better at explaining how models work.
Based on these two trends, at some point we may be able to understand a dataset's impact on a model's performance. **If we can tell whether datasets contributed by third parties improve a model's performance, then we can reward those contributions and create momentum for such a market to exist.**
Imagine an open market where people contribute trained models that solve specific types of problems. You could create a smart contract with some kind of test embedded in it; if someone supplies a model and proves with ZKML that the model passes the test, the payout is triggered. You would then have the tools to create a marketplace that rewards people for contributing machine learning models that solve particular problems.
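Here is a minimal sketch, in Python rather than an on-chain language, of the escrow logic such a contract might implement. `verify_zkml_proof`, the test identifier, and the threshold are hypothetical stand-ins; a real contract would call an actual zkml proof verifier.

```python
# Sketch of the escrow logic such a contract might implement, in Python for
# readability. `verify_zkml_proof` is a hypothetical stand-in for a real
# verifier of the statement "model M scores >= threshold on test T".
from dataclasses import dataclass

def verify_zkml_proof(proof: bytes, model_commitment: str,
                      test_id: str, threshold: float) -> bool:
    """Hypothetical zk proof check; a real one verifies the proof itself."""
    return bool(proof)  # placeholder logic

@dataclass
class ModelBounty:
    test_id: str
    threshold: float
    reward: float
    paid: bool = False

    def claim(self, model_commitment: str, proof: bytes) -> float:
        if self.paid:
            raise RuntimeError("bounty already claimed")
        if not verify_zkml_proof(proof, model_commitment,
                                 self.test_id, self.threshold):
            raise ValueError("invalid proof")
        self.paid = True
        return self.reward  # released to the model contributor

bounty = ModelBounty(test_id="mnist-v1", threshold=0.99, reward=100.0)
print(bounty.claim(model_commitment="0xmodelhash", proof=b"\x01"))  # 100.0
```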
How do AI and crypto form a business model?
**I think the vision behind the intersection of cryptocurrency and artificial intelligence is that you can create a set of protocols that distribute the value captured by this new technology to many more people: everyone can contribute, and everyone can share in the benefits.**
**The people who benefit will be those who contribute computing power, those who contribute data, and those who contribute new machine learning models to the network, so that better models can be trained to solve more important problems.**
The demand side of the network also benefits. These are the companies that use the network as infrastructure for training their own machine learning models, which might power something interesting, like a next-generation chat tool. Since these companies will have their own business models, they will be able to drive value capture themselves.
Whoever builds the network benefits as well. For example, a token created for the network can be distributed to the community. All of these people would have collective ownership of this decentralized network of compute, data, and models, and would capture some of the value of all the economic activity that flows through it.
As you can imagine, every transaction that goes through this network, every payment for computation, data, or models, could be charged a fee that flows into a vault controlled by the network as a whole and jointly owned by token holders. That is essentially the business model of the network itself.
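For concreteness, a toy sketch of that fee flow is below; the 1% rate and the payment amounts are illustrative, not from the source.

```python
# Toy sketch of the fee flow: every payment for compute, data, or models
# routes a small cut into a network-owned vault. The 1% rate is illustrative.
class NetworkVault:
    def __init__(self, fee_rate: float = 0.01):
        self.fee_rate = fee_rate
        self.balance = 0.0                      # owned by token holders

    def route_payment(self, amount: float) -> float:
        """Take the network fee; return what the provider receives."""
        fee = amount * self.fee_rate
        self.balance += fee
        return amount - fee

vault = NetworkVault()
for payment in (100.0, 250.0, 40.0):            # compute, data, model fees
    vault.route_payment(payment)
print(f"vault balance: {vault.balance:.2f}")    # 3.90
```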
Artificial intelligence for code security
Many listeners have probably heard of Copilot, a tool for generating code. **You can try to use these code-generation tools to write Solidity contracts or cryptographic code, but I want to emphasize that doing so is actually very dangerous, because these systems often generate code that runs but is not secure.**
In fact, we recently wrote a paper on this problem. It shows that if you ask Copilot to write a simple encryption function, it produces a correct encryption function but uses an incorrect mode of operation, so you end up with insecure encryption.
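As an illustration of that failure mode (not the paper's exact example), here is what "right cipher, wrong mode" looks like in practice: AES in ECB mode leaks plaintext structure, while an authenticated mode such as AES-GCM with a fresh nonce does not.

```python
# "Right cipher, wrong mode": AES is fine, but in ECB mode identical
# plaintext blocks encrypt to identical ciphertext blocks, leaking structure.
# An authenticated mode like AES-GCM with a fresh nonce avoids this.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = os.urandom(32)
plaintext = b"ATTACK AT DAWN!!" * 2      # two identical 16-byte blocks

# Insecure: ECB. The two ciphertext blocks come out identical.
enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ecb_ct = enc.update(plaintext) + enc.finalize()
print("ECB blocks equal:", ecb_ct[:16] == ecb_ct[16:32])   # True -> leaks

# Better: AES-GCM with a random 96-bit nonce; also authenticates the data.
nonce = os.urandom(12)
gcm_ct = AESGCM(key).encrypt(nonce, plaintext, None)
print("GCM blocks equal:", gcm_ct[:16] == gcm_ct[16:32])   # False
```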
You may ask why this happens. One reason is that these models are trained on existing code from GitHub repositories, and many GitHub repositories are vulnerable to various attacks. So the code these models learn from works, but it isn't safe: garbage in, garbage out. I hope people will be careful when generating code with these models, and double-check that the code actually does what it's supposed to do, and does it securely.
**You can combine AI models with other tools during code generation to keep the whole process sound.** For example, one idea is to ask an LLM to generate a specification for a formal verification tool, then ask the same LLM instance to generate a program that conforms to the specification, and then use the formal verification tool to check whether the program actually conforms. If there is a vulnerability, the tool will catch it. Those errors can be fed back to the LLM, and ideally the LLM then revises its work and produces a corrected version of the code.
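A minimal sketch of that generate-verify-repair loop follows. `llm_complete` and `formally_verify` are hypothetical stand-ins for a real LLM API and a real verifier; only the control flow is meant literally.

```python
# Minimal sketch of the generate-verify-repair loop. `llm_complete` and
# `formally_verify` are placeholders for a real LLM API and a formal
# verification tool; the loop structure is the point.
def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completions API)."""
    return "def add(a, b):\n    return a + b\n"

def formally_verify(spec: str, program: str) -> list[str]:
    """Placeholder verifier; returns a list of counterexamples/errors."""
    return []  # empty list means the program satisfies the spec

def generate_verified_program(task: str, max_rounds: int = 5) -> str:
    spec = llm_complete(f"Write a formal specification for: {task}")
    program = llm_complete(f"Write a program satisfying this spec:\n{spec}")
    for _ in range(max_rounds):
        errors = formally_verify(spec, program)
        if not errors:
            return program          # verified against the spec
        program = llm_complete(     # feed the errors back for repair
            f"This program fails its spec.\nSpec:\n{spec}\n"
            f"Program:\n{program}\nErrors:\n{errors}\nFix the program."
        )
    raise RuntimeError("could not produce a verified program")

print(generate_verified_program("add two integers"))
```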
If you iterate, you ideally end up with a piece of code that satisfies the specification exactly, with formal verification confirming that it does. And since humans can read the specification, you can go through it and confirm that this is the program you wanted to write. In fact, many people are already evaluating LLMs' ability to find software bugs, for example in Solidity smart contracts and in C and C++ code.
**So, are we approaching a point where LLM-generated code is less likely to contain bugs than human-written code?** It's like autonomous driving: what we care about is whether it crashes less often than a human driver. I think this trend will only get stronger as these models become more integrated into existing toolchains.
You can integrate them into a formal verification toolchain, and into other tools, like the aforementioned checkers for memory-management issues. You can also integrate them into your unit-testing and integration-testing toolchains, so the LLM doesn't act in a vacuum: it gets real-time feedback from tools that connect it to ground truth.
**I think that by combining very large machine learning models trained on all the data in the world with these other tools, it may be possible to produce programs better than human programmers can. Even if these systems still make mistakes, they may simply be superhuman. That will be a big moment in software engineering.**
Artificial Intelligence and the Social Graph
Another possibility is that we might build decentralized social networks that behave much like Twitter, but where the social graph lives entirely on-chain. It becomes almost a public good that anyone can build on. As a user, you control who you are in the social graph: your data, who you follow, and who can follow you. On top of that, a whole host of companies can build portals into the social graph, offering experiences like Twitter, Instagram, TikTok, or whatever else they want to build.
But it's all built on the same social graph: no one owns it, and no multibillion-dollar tech company in the middle fully controls it.
**It's an exciting world, because it can be more vibrant, with an ecosystem of people building together.** Each user has more control over what they see and do on the platform.
**But users also need to filter signal from noise.** For example, sensible recommendation algorithms are needed to sift through all the content and surface the feeds you actually want to see. This opens the door to an entire market, a playing field of players offering such services: AI-based algorithms that curate content for you. As a user, you decide whether to use a particular algorithm, maybe the one Twitter built, or something else. But again, you need tools like machine learning to help you sift through the noise, to parse all the junk in a world where generative models are producing it endlessly.
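Because the graph and the content are open data, ranking becomes a pluggable choice rather than a platform decision. The data shapes and scoring rules in this sketch are illustrative.

```python
# Toy sketch of pluggable feed ranking over an open social graph: posts and
# follows are public data, and each user picks their own ranking function.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    likes: int

def chronological(posts, follows):
    # List order stands in for time here.
    return [p for p in posts if p.author in follows]

def engagement_ranked(posts, follows):
    followed = [p for p in posts if p.author in follows]
    return sorted(followed, key=lambda p: p.likes, reverse=True)

posts = [Post("alice", "gm", 3), Post("bob", "new paper!", 42),
         Post("carol", "spam spam", 999)]
follows = {"alice", "bob"}

# The user, not the platform, chooses the algorithm:
for algo in (chronological, engagement_ranked):
    print(algo.__name__, [p.text for p in algo(posts, follows)])
```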
Why is proof of humanity important?
A very pertinent question is: how do you prove that you are indeed human in a world flooded with AI-generated fake content?
Biometrics is one possible direction. One project, Worldcoin, uses iris scans as biometric input to verify that you are a real, living person, not just a photo of an eye. The system relies on secure hardware that is very difficult to tamper with, so the proof that comes out the other end, a zero-knowledge proof that hides your actual biometrics, is very difficult to forge.
On the Internet, no one knows you're a robot. That's where proof-of-humanity projects become really important, because knowing whether you're interacting with a bot or a human is going to matter a great deal. Without proof of humanity, you can't tell whether an address belongs to one person or to a group, or whether 10,000 addresses really belong to 10,000 different people or to one person pretending to be 10,000.
**This is critical for governance. If every participant in a governance system can prove they are human, and uniquely so, because each person has only one pair of eyes, then governance becomes fairer and does not devolve into plutocracy (where influence is weighted by the largest amount locked in some smart contract).**
Artificial Intelligence and Art
AI models mean that we will live in a world of infinite media abundance, a world where the community around any particular piece of media, and the narrative surrounding it, will become increasingly important.
For example, Sound.xyz is building a decentralized music streaming platform where artists and musicians can upload music and connect directly with their community by selling them NFTs. You can comment on a track on the Sound.xyz site so that others who play the song see the comment, similar to the old SoundCloud feature. Buying an NFT also supports the artists, helping them sustain themselves and create more music. **But the beauty of it all is that it gives artists a platform to genuinely engage with their community. The artists become everyone's artists.**
Because of what crypto enables here, you can create a community around a piece of music, something that wouldn't exist if the music were simply produced by a machine learning model with no human element.
A lot of the music we encounter is going to be entirely AI-generated, so tools for building community and telling stories around art, music, and other kinds of media will be really important: they distinguish the media we truly care about, want to invest in, and take the time to engage with from everything else.
**There may be synergy between the two: a lot of music will be enhanced or generated by AI, but with a human element still involved. Say a creator uses an AI tool to make a new piece of music: they still have their own sonic signature, their own artist page, their own community, their own followers.**
There is a synergy between the two worlds: you get the best music because AI gives you superpowers, but you also have the human elements and stories, coordinated and enabled by crypto, that bring all these people together on one platform.
**It's definitely a whole new world for content generation. So how do we distinguish human-made art, the kind that deserves our support, from machine-generated art?**
This actually opens the door to collective art: art that emerges from the creative process of an entire community rather than a single artist. There are already projects doing this, where the community steers the output through some on-chain voting procedure, generating artwork from prompts fed to machine learning models. **Maybe you generate not one piece of art but ten thousand, then use another machine learning model, also trained on community feedback, to pick the best of those 10,000.**
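That last step is essentially best-of-N sampling with a learned scorer. Here is a minimal sketch; `generate_art` and `community_score` are hypothetical stand-ins for a generative model and a ranking model trained on community feedback.

```python
# Sketch of best-of-N selection: generate many candidates from a community
# prompt, then keep the one a feedback-trained scorer likes best.
# `generate_art` and `community_score` are hypothetical placeholders.
import random

def generate_art(prompt: str, seed: int) -> str:
    """Placeholder for a call to a generative image/music model."""
    return f"artwork({prompt!r}, seed={seed})"

def community_score(artwork: str) -> float:
    """Placeholder for a model trained on the community's feedback."""
    return random.random()

prompt = "a collectively chosen theme"   # e.g., decided by an on-chain vote
candidates = [generate_art(prompt, seed=s) for s in range(10_000)]
best = max(candidates, key=community_score)
print(best)
```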