AI into Crypto

HASHKEY CAPITAL


The launch of ChatGPT in November 2022 opened the eyes of players across many industries to large language models (LLMs). The frenzy has spread to the crypto space, and this article introduces the development of AI, its current status, and the industry emerging from the combination of AI and crypto.


The development of AI and its current status


Types and Architectures


Machine learning (ML) is a technology that learns from experience: by training on large datasets, it learns to perform specific tasks such as distinguishing animals or translating languages. It is currently the most practical way of realizing artificial intelligence. Depending on whether the training data is labeled, it can be divided into supervised and unsupervised learning. Many types of models can perform supervised learning, including tree-based models, graph models, and the more recently prominent neural networks. With the rapid growth of computing power and data, deep learning has developed further on top of neural network architectures. Current deep learning architectures commonly include, but are not limited to, CNNs, RNNs, and attention mechanisms.




Classification of machine learning, source: HashKey Capital


Different deep learning networks share a basic architecture of input layer, hidden layers, and output layer. The input is usually text, video, audio, or other data after tokenization and embedding. The design of the hidden layers (the model shape) varies with the dataset and the purpose of the task, as shown in the table.


| Neural Networks | Data | Architectures |
| --- | --- | --- |
| Single layer perceptron | - | Single layer, multivariate mapping |
| Multilayer perceptron | Table data | Input layer, hidden layer, output layer, fully connected |
| CNN | Grid-structured data (e.g., images), geospatial information | Dimensionality reduction; convolutional layer, pooling layer, and fully connected layer |
| RNNs | Sequential data (articles, video frames), time series | Hidden variables saved from previous time steps; gated recurrent units added |
| Attention/Transformer | Language translation, general data/tasks | Multi-headed attention, self-attention |


Types of Neural Networks, Source: Organized by HashKey Capital


Thirty Years of Neural Network Development


30 years of neural network development, source: organized by HashKey Capital


The training of neural networks dates back to the mid-1980s, when Jordan trained a neural network to learn sequential patterns in his 1986 paper Serial Order: A Parallel Distributed Processing Approach. The tiny network had only a few neurons.


In the 1990s, Jeffrey Elman expanded this to a 50-neuron network and discovered that the network spatially clusters words by meaning. For example, it separated inanimate and animate nouns; within these two categories, animate nouns were subdivided into human and nonhuman, and inanimate nouns into breakable and edible. This indicates that the network is able to learn hierarchical interpretations.


He further observed that words can be represented as points in a high-dimensional space, and then a sequence of words or sentences can be viewed as a path. This major breakthrough allows textual datasets to be digitized, vectorized, and processed by computers.



Source: http://3b1b.co/neural-networks


In 2011, Confluence researchers trained larger networks with thousands of neurons and millions of connections, and the study identified a bottleneck: the network struggled to maintain coherent context over long sequences.


In 2017, OpenAI built on Karpathy's work by training a network on 82 million Amazon reviews, in which a sentiment neuron was discovered. This neuron classified the sentiment of the text with high accuracy.



Source: Learning to Generate Reviews and Discovering Sentiment


The 2017 paper Attention Is All You Need presents a solution to the limitation on context size. The paper introduces a dynamic layer that adapts its connection weights based on the context of the input. It works by allowing each word in the input to look at and compare itself with the other words, and to find the most relevant ones. The closer words are in concept, the closer they are in the embedding space and the higher the connection weights between them. However, the paper focused only on the translation problem.
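The idea of each word comparing itself with every other word can be sketched in a few lines. This is a minimal illustration of scaled dot-product self-attention, not the full architecture from the paper: it assumes the queries, keys, and values are the raw input vectors (a real transformer first multiplies them by learned matrices), and the 2-dimensional embeddings are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the inputs.

    Simplification: the learned Q, K, V projection matrices of a real
    transformer are left out to keep the sketch short.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare this word with every word in the input.
        scores = [dot(query, key) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # The output is a weighted mix of all input vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-d embeddings: "cat" and "dog" point in similar directions.
tokens = ["cat", "dog", "car"]
embeddings = [[1.0, 0.2], [0.9, 0.3], [-0.8, 1.0]]
outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

With these made-up vectors, "cat" places more weight on "dog" than on "car", which is the behavior the paragraph describes: conceptually closer words receive higher connection weights.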


OpenAI researchers then tried a more powerful transformer architecture and launched GPT-3 in 2020, which attracted widespread attention from industries around the world. This time the network reached 175B parameters, 96 layers, and a 2,048-token context window.


What is a neural network?


Take the following 28x28-pixel image of a handwritten digit as an example. Each pixel of the input image corresponds to one neuron, for a total of 784 neurons, and the number in each neuron is its activation value, which ranges from 0 to 1.



28x28 pixel digital image, Source: http://3b1b.co/neural-networks


These 784 neurons form the input layer of the network. The final layer is the output layer, which contains ten neurons representing the digits 0-9, again with activation values ranging from 0 to 1. The layers in between are the hidden layers. As the network operates, the activation values of one layer determine the activation values of the next.
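The layer-by-layer computation described above can be sketched as a forward pass. This is only an illustration under stated assumptions: a single hidden layer of 16 neurons, sigmoid activations, and random (untrained) weights are all choices made for the example, and the "image" is random noise standing in for real pixel intensities.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, activations):
    """Each layer's activations are computed from the previous layer's."""
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
# 784 input neurons -> 16 hidden neurons (assumed size) -> 10 output neurons.
layers = [init_layer(784, 16, rng), init_layer(16, 10, rng)]
image = [rng.random() for _ in range(784)]  # stand-in for 28x28 pixel values in [0, 1]
output = forward(layers, image)
predicted_digit = max(range(10), key=lambda d: output[d])
print("output activations:", [round(o, 3) for o in output])
print("most active output neuron:", predicted_digit)
```

Because the weights are random, the prediction is meaningless here; training is what turns these weights and biases into something useful.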


The "depth" in deep learning refers to the fact that the model learns many layers of transformations, each with a different representation. As shown in the figure below for the digit 9, for example, different layers can recognize different features. Layers closer to the input capture low-level details of the data, while layers closer to the output capture more specific concepts that can be used to distinguish between classes.



Source: http://3b1b.co/neural-networks


As the model gets bigger, the hidden layers in the middle together involve hundreds of billions of weights, and it is these weights and biases that determine what the network actually does. The process of machine learning is the process of finding the right parameters, namely the weights and biases.


In the transformer architecture used in GPT, a large language model, the hidden layers consist of stacked decoder modules: GPT-1, GPT-2, and GPT-3 have 12, 48, and 96 layers, respectively. Each decoder in turn contains an attention component and a feed-forward neural network component.


Training approach


The learning process involves defining a cost function (or loss function), for example the sum of the squared differences between the network's predictions and the actual values. When this sum is small, the model performs within acceptable limits.


Training starts by randomly initializing the network's parameters and ends with the parameters that minimize the cost function. The cost function is minimized by gradient descent: the impact of each parameter change on the cost/loss is measured, and the parameters are then adjusted according to that impact.


Calculating the parameter gradients relies on backward propagation, or backpropagation, which traverses the network in reverse order from the output layer to the input layer according to the chain rule. The algorithm also requires storing the intermediate variables (partial derivatives) needed to compute the gradients.
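The three ingredients above (a squared-error cost, gradient descent, and backpropagation via the chain rule) can be seen together in a toy example. The sketch below is an illustration only: the network has a single tanh hidden neuron, the data and learning rate are made up, and fixed starting values are used instead of random initialization so that the run is reproducible.

```python
import math

# Toy data: learn y = 2x on a few points (hypothetical example).
data = [(-1.0, -2.0), (-0.5, -1.0), (0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]

# Fixed starting values instead of random ones, for reproducibility.
params = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}

def loss_and_grads(p):
    """Sum-of-squares cost and its gradient with respect to each parameter."""
    loss = 0.0
    grads = {name: 0.0 for name in p}
    for x, y in data:
        # Forward pass, keeping intermediate values for the backward pass.
        z = p["w1"] * x + p["b1"]
        h = math.tanh(z)
        pred = p["w2"] * h + p["b2"]
        err = pred - y
        loss += err ** 2
        # Backward pass: apply the chain rule from the output back to the input.
        d_pred = 2 * err
        grads["w2"] += d_pred * h
        grads["b2"] += d_pred
        d_h = d_pred * p["w2"]
        d_z = d_h * (1 - h ** 2)  # derivative of tanh
        grads["w1"] += d_z * x
        grads["b1"] += d_z
    return loss, grads

initial_loss, _ = loss_and_grads(params)
learning_rate = 0.02
for step in range(1000):
    _, grads = loss_and_grads(params)
    # Gradient descent: move each parameter against its gradient.
    for name in params:
        params[name] -= learning_rate * grads[name]
final_loss, _ = loss_and_grads(params)
print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

Note how the backward pass reuses `h` from the forward pass; this is the storage of intermediate variables that the paragraph mentions.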


Development Factors


Three main factors affect the performance of large language models during training: the number of model parameters, the dataset size, and the amount of compute.



Source: OpenAI report, Scaling Laws for Neural Language Models


This is consistent with the real-world development of datasets and computing power. However, the table below also shows that computing power is growing faster than available data, while memory is developing the slowest.



The development of dataset, memory and computing power, Source: https://github.com/d2l-ai


Data


Data Requirements


With a large model, overfitting tends to occur when the training data is too small; in general, the accuracy of a more complex model improves as the amount of data increases. The amount of data a large model needs can be estimated with the rule of 10, which suggests that the amount of data should be 10 times the number of parameters, although some deep learning algorithms apply a 1:1 ratio.


Labeled data


Supervised learning requires datasets with labels and features in order to arrive at valid results.



Source: Fashion-MNIST Clothing Categorization Dataset


Synthetic data


Despite the rapid growth of data over the past decade or two and the availability of open-source datasets from Kaggle, Azure, AWS, Google, and others, data that is limited, scarce, and expensive is gradually becoming a bottleneck for AI development because of privacy concerns, growing model parameter counts, and data reproducibility issues. Various data solutions have been proposed to alleviate this problem.


Data augmentation techniques can be an effective solution: they supplement insufficient data without acquiring new samples, for example through scaling, rotation, reflection, cropping, translation, adding Gaussian noise, and mixup.
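A few of these augmentations can be sketched on a tiny grayscale image stored as a nested list. This is a minimal illustration rather than a production pipeline: the function names, the 2x2 image, and the noise level are all made up for the example, and pixel values are assumed to lie in [0, 1].

```python
import random

def flip_horizontal(img):
    """Reflection: mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotation: turn the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise, clipping pixels back into [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in img]

def mixup(img_a, img_b, lam):
    """Blend two images pixel by pixel; in practice the labels are blended the same way."""
    return [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

rng = random.Random(0)
image = [[0.0, 1.0],
         [0.5, 0.2]]  # hypothetical 2x2 grayscale image

flipped = flip_horizontal(image)
rotated = rotate_90(image)
noisy = add_gaussian_noise(image, sigma=0.05, rng=rng)
blended = mixup(image, flipped, lam=0.5)

# One original sample has become several training samples.
augmented_set = [image, flipped, rotated, noisy, blended]
print(len(augmented_set), "samples from 1 original")
```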


Synthetic data is another option. Synthetic data are data that can be artificially generated by computer simulation or algorithms with or without a previous reference dataset. Regarding the development of tools for generating synthetic data, Ian J. Goodfellow invented the Generative Adversarial Network (GAN), which is a deep learning architecture. 


A GAN trains two neural networks to compete with each other, which makes it possible to generate new, more realistic data from a given training dataset. The architecture supports generating images, filling in missing information, generating training data for other models, generating 3D models based on 2D data, and more.


It is still early in the development of the field, with most of the existing companies doing synthetic data being founded in 2021 or 2022, and a few in 2023.



The state of financing for synthetic data companies. Source : https://frontline.vc/blog/synthetic-data/


Vector Database


The AI training process involves a large number of matrix operations, from word embeddings and the transformer's QKV matrices to softmax operations, and the model's parameters themselves are also stored as matrices.



example of vector database, Source : https://x.com/ProfTomYeh/status/1795076707386360227


Computer Hardware Resources


Large models bring massive demand for computer hardware, which falls mainly into two categories: training and inference.


Pre-training, fine-tuning, and inference


Training can be further divided into pre-training and fine-tuning. As mentioned before, building a network model first requires randomly initializing the parameters, then training the network and continuously adjusting the parameters until the network's loss reaches an acceptable range. The difference between the two is that pre-training starts with every layer's parameters randomly initialized, while fine-tuning can initialize some layers directly with the parameters of a previously trained model (freezing the parameters of the earlier layers) and then trains on a task-specific dataset.




Source: https://d2l.ai/chapter_computer-vision/fine-tuning.html 
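The freezing idea can be shown on a toy model: parameters of the early layer are copied from a "pre-trained" model and left untouched, and only the output layer is updated on the new task. Everything in this sketch is hypothetical, including the parameter values, the task data, and the learning rate; it illustrates the principle and is not a recipe for fine-tuning a real LLM.

```python
import math

# Hypothetical "pre-trained" parameters: w1, b1 form the early feature layer,
# w2, b2 form the output head.
pretrained = {"w1": 1.5, "b1": 0.0, "w2": 1.0, "b2": 0.0}
FROZEN = {"w1", "b1"}  # early-layer parameters are reused as-is

def forward(p, x):
    h = math.tanh(p["w1"] * x + p["b1"])  # frozen feature layer
    return p["w2"] * h + p["b2"], h       # trainable head

def loss(p, data):
    return sum((forward(p, x)[0] - y) ** 2 for x, y in data)

def fine_tune(p, data, learning_rate=0.05, steps=500):
    p = dict(p)  # work on a copy so the original pre-trained model is preserved
    for _ in range(steps):
        # Gradients are computed only for the parameters that are not frozen.
        grads = {name: 0.0 for name in p if name not in FROZEN}
        for x, y in data:
            pred, h = forward(p, x)
            d_pred = 2 * (pred - y)
            grads["w2"] += d_pred * h
            grads["b2"] += d_pred
        for name, g in grads.items():
            p[name] -= learning_rate * g
    return p

# New task data (made up): generated by a head with w2 = 3 and b2 = 1.
task_data = [(x, 3 * math.tanh(1.5 * x) + 1) for x in (-1.0, -0.3, 0.4, 1.0)]
tuned = fine_tune(pretrained, task_data)
print("frozen layer unchanged:", tuned["w1"] == pretrained["w1"])
print("tuned head:", round(tuned["w2"], 3), round(tuned["b2"], 3))
```

Since only two parameters receive gradients, each step is cheaper than updating the whole model, which is the cost advantage of fine-tuning over pre-training in miniature.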


Pre-training and fine-tuning both change the model parameters and ultimately produce an optimized model, while inference loads a trained model, runs the computation on the user's input, and returns the output.


Pre-training, fine-tuning, and inference rank from largest to smallest in terms of their hardware requirements. The following table compares the hardware requirements of training and inference. The two differ significantly in computing power, memory, and communication/bandwidth because of differences in the computation process and precision requirements; at the same time, computing power, memory, and communication/bandwidth form an Impossible Trilemma.



| | Training | Inferencing |
| --- | --- | --- |
| Algorithms | Forward propagation + backpropagation | Forward propagation |
| Precision | 32 bit, 16 bit | 16 bit, 8 bit, 4 bit |
| Parallel strategy | DP, TP, PP | TP, PP |
| Computing power demand | 6-8 FLOPs* | 2 FLOPs |
| Data transmitted | Intermediate results of forward propagation (activations), gradients of all parameters | Intermediate results of forward propagation (activations) |
| Memory | Model parameters, intermediate activation values, gradients, optimizer state | Model parameters, KV cache |
| Memory usage (multiple of the model parameters, excluding KV cache) | 20 times | 1-2 times |
| Hardware | H100/A100 ($10k-40k) | 4090 ($1k-2k) |


*The figures in this table are for a single model processing a single token, per parameter.
*FLOPs: floating-point operations, i.e., the amount of arithmetic in the matrix computations.
*DP, TP, PP: data parallel, tensor parallel, pipeline parallel.


Computer hardware comparison between training and inferencing, Source: Organized by HashKey Capital

Training a neural network requires alternating between forward and backward propagation, using the gradients produced by backward propagation to update the model parameters. Inference, on the other hand, requires only forward propagation. This difference is the main factor behind the different hardware requirements of training and inference.


In terms of computing power, as the table shows, consumption scales in simple proportion to the number of model parameters: training requires 6-8 floating-point operations per parameter per token, while inference requires 2. The reason is that backpropagation requires about twice as much computation as forward propagation, so a training step costs roughly three times a forward pass (2 + 4 = 6), and training consumes far more computing power than inference.
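The multiplicative relationship can be turned into a back-of-the-envelope estimate. The sketch below simply multiplies the per-parameter, per-token figures from the table by a model size and a token count; the 300B training-token count is an assumption chosen for illustration, and real workloads add overheads that this ignores.

```python
def training_flops(n_params, n_tokens, flops_per_param_token=6):
    """Total training compute; the table gives 6-8 FLOPs per parameter per token."""
    return flops_per_param_token * n_params * n_tokens

def inference_flops(n_params, n_tokens, flops_per_param_token=2):
    """Forward pass only: 2 FLOPs per parameter per token."""
    return flops_per_param_token * n_params * n_tokens

n_params = 175e9      # GPT-3 size quoted earlier in the article
train_tokens = 300e9  # assumed training-set size, for illustration only

print(f"training:  {training_flops(n_params, train_tokens):.2e} FLOPs")
print(f"inference: {inference_flops(n_params, 1):.2e} FLOPs per generated token")
```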


In terms of memory, the backpropagation used in training reuses the intermediate values stored during forward propagation in order to avoid repeated computation. The training process therefore needs to keep these intermediate values until backpropagation is completed. Memory consumption during training mainly consists of the model parameters, the intermediate activation values generated during forward computation, the gradients generated by backward propagation, and the optimizer states. The inference stage needs no backpropagation, optimizer state, or gradients, so its memory consumption is much smaller than that of training.
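A rough memory estimate follows the same pattern as the compute estimate. The sketch below reads the table's multipliers (about 20 for training, 1-2 for inference) as bytes of memory per model parameter; that reading is an assumption, as is the 7B-parameter model size, and the KV cache is excluded just as in the table.

```python
GB = 1e9  # bytes per gigabyte (decimal)

def estimate_memory_gb(n_params, bytes_per_param):
    """Memory footprint for a given per-parameter cost, in gigabytes."""
    return n_params * bytes_per_param / GB

n_params = 7e9  # hypothetical 7B-parameter model

# Multipliers taken from the table and interpreted as bytes per parameter (assumption).
training_gb = estimate_memory_gb(n_params, 20)
inference_low_gb = estimate_memory_gb(n_params, 1)
inference_high_gb = estimate_memory_gb(n_params, 2)

print(f"training:  about {training_gb:.0f} GB")
print(f"inference: about {inference_low_gb:.0f}-{inference_high_gb:.0f} GB (excluding KV cache)")
```

Under these assumptions the training footprint exceeds the memory of a single consumer GPU many times over, while inference can fit on one, which matches the hardware row of the table.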


In terms of communication/bandwidth, in order to improve AI training performance, mainstream model training usually uses three parallel strategies: data parallel, tensor parallel, and pipeline parallel.


- Data parallelism replicates the model across different devices, with each replica working on a different part of the dataset and the gradients synchronized during the training cycle.


- Pipeline parallelism divides the intermediate hidden layers among compute nodes, each of which is responsible for several transformer layers. This approach is also known as inter-layer parallelism.


- Tensor parallelism splits each transformer module itself across devices and is also known as intra-layer parallelism.



Source: OpenAI, https://openai.com/index/techniques-for-training-large-neural-networks/
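The three strategies differ in what gets split: the data, the stack of layers, or the inside of a layer. The sketch below illustrates each split with plain lists standing in for devices. It is a conceptual illustration only, with hypothetical function names; it performs no real communication, and the way the matrix is sharded by rows is one possible choice among several.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def data_parallel_average(grads_per_replica):
    """Data parallel: each replica computes gradients on its own data shard,
    then the gradients are averaged so all replicas stay in sync."""
    n = len(grads_per_replica)
    return [sum(values) / n for values in zip(*grads_per_replica)]

def pipeline_stages(n_layers, n_devices):
    """Pipeline parallel (inter-layer): consecutive layers are assigned to stages."""
    per_stage = -(-n_layers // n_devices)  # ceiling division
    return [list(range(i, min(i + per_stage, n_layers)))
            for i in range(0, n_layers, per_stage)]

def tensor_parallel_matvec(matrix, vec, n_devices):
    """Tensor parallel (intra-layer): one layer's weight matrix is split by rows,
    each device computes its slice of the output, and the slices are concatenated."""
    per_device = -(-len(matrix) // n_devices)
    shards = [matrix[i:i + per_device] for i in range(0, len(matrix), per_device)]
    partial_outputs = [matvec(shard, vec) for shard in shards]
    return [y for part in partial_outputs for y in part]

# Data parallel: two replicas, two parameters each.
averaged = data_parallel_average([[1.0, 2.0], [3.0, 4.0]])

# Pipeline parallel: 96 transformer layers over 8 devices.
stages = pipeline_stages(96, 8)

# Tensor parallel: the sharded result matches the single-device result.
weights = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 3.0, 1.0], [2.0, 2.0, 2.0]]
vector = [1.0, 2.0, 3.0]
sharded = tensor_parallel_matvec(weights, vector, n_devices=2)

print("averaged gradients:", averaged)
print("layers per stage:", [len(stage) for stage in stages])
print("tensor-parallel output matches:", sharded == matvec(weights, vector))
```

In a real cluster, the averaging, the hand-off between stages, and the concatenation each require network communication, which is where the differences in communication volume between the strategies come from.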


Of these three strategies, TP is expected to have the highest communication frequency and volume, which depend on the number of tokens, the model width, and the number of layers. PP has a lower communication volume and frequency than TP, depending on the number of tokens and the model width. DP has the lowest communication volume and frequency and is independent of the input tokens.


Impossible Trilemma


The hardware bottleneck of large models comes mainly from three constraints: computing power, bandwidth/communication, and memory. The three trade off against each other, resulting in the Impossible Trilemma. For example, because of communication bottlenecks, cluster performance cannot be improved simply by increasing the power of a single machine. Therefore, although parallel architectures are used to accelerate cluster performance, most of them actually sacrifice communication or storage for computing power.


Sacrificing communication and storage for computing power:

In PP, if one GPU is assigned to each transformer layer, computing power per unit of time increases, but so do the communication requirements between layers, resulting in larger data volumes and higher latency. The storage required for the intermediate states of forward propagation also grows extremely fast.


Sacrificing communication for computing power:

In TP, each transformer layer is split up for parallel computation. Since a transformer layer consists of two components (attention heads and a feed-forward network), the work can be split within the layer for either component. TP can thus alleviate the problem of PP needing too many stages when the model does not fit on the GPUs. However, this approach still has serious communication overhead.


Crypto+AI


We believe AI in the crypto field currently falls into the following major categories:



Source: Organized by HashKey Capital


As mentioned earlier, the three most critical components in AI are data, models, and computing power, which serve as the infrastructure that empowers crypto AI. Combined, they form a computing network, with a large amount of middleware appearing in the computation process to make it efficient and more in line with the crypto spirit. Downstream are Agents built on these verifiable results, which can serve different roles for different user audiences.


Another flowchart can be used to express the basic ecology of crypto AI as follows:



Ecological flowchart, source: organized by HashKey Capital


Of course, tokenomic mechanisms are needed in the crypto space to incentivize and coordinate the participation of different players.


Data


For datasets, one can choose between public data sources or one’s own specific private data sources.


Data Source:

- Grass is a project built on Solana that crawls data sources. The background is that many companies block crawling from data center IPs but do not block residential users, so Grass acts as a decentralized service provider that uses tokens to incentivize residential users to contribute their bandwidth.


- Vana, as a data DAO platform, offers its own solution: creators set up different data DAOs on-chain for different data sources, each with its own incentive program for users to upload their data. So far, a data DAO has been created for Reddit (rDAO), in which more than 154,000 users provide their personal data for AI training.


- Relevant data is also collected in the form of DePINs. DIMO, for example, allows users to connect their vehicles to its platform via a hardware device. Basic information about the vehicle, along with more advanced data such as driving patterns, is securely transmitted to the DIMO network, stored on-chain, and linked to the corresponding vehicle ID (NFT). Another example is Hivemapper, which collects map data while the user is driving.


Synthetic Data Platform:

Dria is a synthetic data generation platform (OP Stack L2) that incentivizes users to generate and trade synthetic data in a decentralized way. Its data is stored on Arweave through HollowDB. When a user initiates a synthetic data generation request, Dria accepts the request and splits the task among the computing nodes in the synthetic data network for execution; after network verification, the final synthetic data can be traded in the knowledge market.


Others:

Data labeling service platforms, such as Cropo and Public AI, assign labeling tasks to different workers, who receive token incentives after completing them. However, the current problem is that there are more people doing data labeling than there is data to label. Moreover, AI companies already have stable data labeling suppliers for their needs, and this stickiness makes them reluctant to switch to decentralized platforms. These platforms may only be able to obtain the leftover portion of orders from the data labeling suppliers.


Computing Networks


Generalized Computing Networks 

Generalized computing networks aggregate resources such as GPUs and CPUs to provide general-purpose computing services, which means they make no distinction between training and inference.


- Akash, a 2020 project, serves as a marketplace for matching computing supply and demand, allowing computing providers to bid on orders, with the final matches recorded on the blockchain as transactions. A separate validator is responsible for packing blocks and performing validation. This process does not cover how AI tasks are assigned, does not validate the computation process or its results, and does not distinguish between training and inference.


- io.net developed institutional-grade quantitative trading systems, primarily for the U.S. stock market and cryptocurrency markets, until June 2022, and discovered Ray.io, an open-source Python library for building high-performance distributed systems, along the way. io.net leverages Ray and specialized libraries for data streaming, training, and fine-tuning, and combines them with mesh VPNs (which simplify the process of developing and deploying large-scale AI models across vast networks of GPUs) to provide computing services.


- Bittensor is an open platform that allows users to create subnets, each with its own incentives to motivate other users to participate as subnet miners or subnet validators: miners run specific tasks, and validators verify the miners' work.


- Aethir is a cloud computing infrastructure providing high-quality services for AI and cloud gaming. Aethir focuses on aggregating high-quality GPU resources, such as NVIDIA's H100 chip, from data centers, tech companies, telcos, top game studios, and crypto mining companies. The network consists of 3 core actors: Container, Checker, and Indexer. Containers, including the Aethir Edge, are where compute resources are actually utilized. The Checker ensures the integrity and performance of the Container. If needed, the Indexer matches end-users with appropriate Containers based on end-user requirements.


Computing-specific networks


Pre-training


In the crypto space, Gensyn, backed by a16z, proposes a decentralized computing network for training.


The process is as follows. After a user submits a training task, the platform analyzes it, evaluates the computing power required, and splits it into a minimum number of ML work units, at which point the verifier periodically grabs the analyzed tasks to generate the thresholds used later when comparing proofs of learning. Once the task enters the training phase, it is executed by the Solver, which periodically stores the model weights and the response indexes from the training dataset and generates the proofs of learning. The verifier then re-runs part of the computation behind some of the proofs and performs distance calculations to check that the results match the proofs. Whistleblowers arbitrate using a graph-based pinpoint challenge program to check whether the verification work was performed correctly.
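The core of this flow, re-running part of the training and comparing the result against the stored checkpoints by distance, can be sketched on a toy problem. This is a loose illustration of the idea and not Gensyn's actual protocol: the one-parameter model, the function names, the data, and the threshold value are all hypothetical.

```python
def train_segment(weight, data, learning_rate, steps):
    """Run a few gradient-descent steps on a one-parameter model y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data)
        weight -= learning_rate * grad
    return weight

def solver_run(start_weight, data, learning_rate, segments, steps_per_segment):
    """Solver: train and periodically store the model weights as checkpoints."""
    checkpoints = [start_weight]
    for _ in range(segments):
        checkpoints.append(
            train_segment(checkpoints[-1], data, learning_rate, steps_per_segment))
    return checkpoints

def verify_segment(checkpoints, index, data, learning_rate, steps_per_segment, threshold):
    """Verifier: re-run one segment from its starting checkpoint and check that
    the distance to the solver's next checkpoint stays within the threshold."""
    recomputed = train_segment(checkpoints[index], data, learning_rate, steps_per_segment)
    distance = abs(recomputed - checkpoints[index + 1])
    return distance <= threshold

data = [(1.0, 2.0), (2.0, 4.0)]  # made-up training data
settings = dict(data=data, learning_rate=0.05, steps_per_segment=5)
threshold = 1e-6  # hypothetical tolerance for hardware nondeterminism

honest = solver_run(0.0, segments=4, **settings)
honest_ok = all(verify_segment(honest, i, threshold=threshold, **settings) for i in range(4))

tampered = list(honest)
tampered[2] += 0.5  # a solver that did not really do the work
tampered_ok = verify_segment(tampered, 1, threshold=threshold, **settings)

print("honest run verified:", honest_ok)
print("tampered run verified:", tampered_ok)
```

Spot-checking a few segments is much cheaper than repeating the whole training run, which is what makes this style of verification attractive.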


Fine-tuning


Fine-tuning is easier and less costly than pre-training a large model from scratch: the pre-trained model is simply trained further on a specific dataset, adapting it to a specific task while preserving the original model.


Hugging Face can be connected to the distributed platform as a provider of pre-trained language models. The user selects the model to be fine-tuned according to the task requirements and then uses the GPUs and other resources provided by the computing network for fine-tuning. The size of the dataset and the complexity of the model should be determined by the complexity of the task, which in turn determines whether higher-end resources such as the A100 are needed.


In addition to Gensyn, which can support pre-training, most computing platforms can also support fine-tuning.


Inference


Compared to training (pre-training and fine-tuning), which requires tuning of model parameters, the computational process of inference involves only forward propagation and requires less computing power. Most decentralized computing networks currently focus on inference services.


- The Nosana Network is a platform for running AI inference workloads that provides computing services targeting the inference process for Llama 2 and Stable Diffusion models.


- Ritual.AI: the first phase of the platform is Infernet, a lightweight framework with which smart contract developers can request inference services off-chain and deliver the results to on-chain smart contracts. The second phase is the execution layer, Ritual Chain, which supports AI-native operations.


Additional layers/middleware


Inference is already the stage at which the model is being used, so middleware can be introduced at this point:

- Model Matching: When doing inference, it is usually necessary to determine the appropriate model according to the task requirements.

- API: A unified interface that abstracts the APIs of all open-source models, such as Redpill.


On-chain smart contracts retrieving the results of off-chain AI computation:

- ORA protocol provides verified inference results for smart contracts. For example, an opML node collects opML requests sent from the chain, runs the AI inference, uploads the results to the chain, and waits for the challenge period to pass.


Another layer of privacy can be added to the computing network, which mainly includes data privacy and model privacy, where data privacy is far more important than model privacy.

- Currently, Oasis protocol utilizes Intel TDX and NVIDIA TEEs to provide privacy and verifiability for AI model training.


Verification


Most computing networks build their own validation systems to ensure that the system runs accurately, a step that has not yet been introduced in the traditional AI field.


ZKML


ZK proofs serve two main purposes:

- Proving the accuracy of the model without revealing any parameters

- Proving that the computation was done correctly and that the model + inputs match the outputs: Modulus Labs, Giza


Modulus Labs has shown that it is possible to create proofs for models with 18 million parameters in 60-70 seconds using Polygon's Plonky2 proof system. ZKML is usable for small models at this stage, but the cost is still significant:

- The proof time of ZKML grows with increasing parameters.

- It is very expensive in terms of prover memory consumption. Worldcoin, for example, uses a model with 1.8M parameters and 50 layers to distinguish between 10 billion irises, for which proofs of inference can be generated in just a few minutes, but the memory consumption within the prover is too high for any mobile hardware.



Source: https://medium.com/@ModulusLabs/chapter-5-the-cost-of-intelligence-da26dbf93307


OPML


Given the limitations of ZKML described above, OPML is an alternative. Although weaker than ZKML in terms of security, it is significantly better in memory consumption and proof computation time. According to the ORA report, for the same 7B-LLaMA model (with a model size of about 26GB), opML can run within 32GB of memory, whereas the memory consumption of the circuits in zkML can be on the order of terabytes or even petabytes.
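The optimistic pattern behind OPML (post a result, accept it unless someone successfully challenges it within a time window) can be sketched as follows. This is a simplified illustration and not ORA's implementation: the class and method names are hypothetical, disputes are settled here by simply re-running the model rather than by an interactive dispute game, and the 7-day window is the figure from the article's comparison table.

```python
from dataclasses import dataclass

CHALLENGE_PERIOD = 7 * 24 * 3600  # 7 days in seconds

@dataclass
class Claim:
    request_id: int
    result: str
    submitted_at: int
    challenged: bool = False

class OptimisticOracle:
    """Toy model of an optimistic result registry (hypothetical interface)."""

    def __init__(self, run_model):
        self.run_model = run_model  # reference computation used to settle disputes
        self.claims = {}

    def submit(self, request_id, result, now):
        """A node posts its inference result; it is assumed correct for now."""
        self.claims[request_id] = Claim(request_id, result, now)

    def challenge(self, request_id, model_input, now):
        """Anyone may dispute a result before the challenge period ends."""
        claim = self.claims[request_id]
        if now >= claim.submitted_at + CHALLENGE_PERIOD:
            return False  # too late, the result is already final
        if self.run_model(model_input) != claim.result:
            claim.challenged = True  # fraud shown, the claim is rejected
            return True
        return False  # the posted result was correct

    def is_final(self, request_id, now):
        claim = self.claims[request_id]
        return not claim.challenged and now >= claim.submitted_at + CHALLENGE_PERIOD

# Stand-in "model": upper-casing a string plays the role of AI inference.
oracle = OptimisticOracle(run_model=lambda text: text.upper())

oracle.submit(1, "HELLO", now=0)  # honest result
oracle.submit(2, "WRONG", now=0)  # dishonest result
honest_challenged = oracle.challenge(1, "hello", now=10)
fraud_challenged = oracle.challenge(2, "hello", now=10)

print("honest result final after 7 days:", oracle.is_final(1, now=CHALLENGE_PERIOD))
print("fraudulent result final after 7 days:", oracle.is_final(2, now=CHALLENGE_PERIOD))
```

The sketch shows the trade-off discussed in this section: nothing expensive is computed unless a dispute arises, but finality is delayed until the challenge window closes.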


TEEML


A Trusted Execution Environment (TEE) provides hardware-level security and can be an alternative to ZKML and OPML. A TEE proof is generated as a result of internal computation within the TEE, and its computational cost is much lower than that of a zk proof. Also, the size of a TEE proof is usually a fixed constant (the signature length), which gives it the advantage of a smaller footprint and a lower cost of on-chain validation.


In addition to verification, TEE has the advantage of keeping sensitive data isolated, ensuring that external processes or computations cannot access or alter the data within it.


Projects that use TEE include:

- Aizel network (providing inference)

- Phala network (focused on AI Agent creation)

- Oasis protocol (AI model training)

- Marlin protocol (Oyster can deploy and validate ML models)



| | ZKML | OPML | TEEML |
| --- | --- | --- | --- |
| model size | small/limited | any size | any size |
| speed | slow | fast | fast |
| proof | validity proof | fraud proof | |
| finality | no delays | delay for challenge period (7 days) | no delays |
| requirement | large memory for proof generation | any PC with CPU/GPU | Intel TDX and NVIDIA TEE... |
| service cost | extremely high | low | low |
| security | cryptographic security | crypto-economic security | hardware-based |

Source: https://arxiv.org/pdf/2401.17555, Marlin Protocol


In addition to its own ZKML and OPML validation, ORA protocol has developed opp/ai (Optimistic Privacy-Preserving AI on Blockchain), which is not included in the comparison table above.


Agent Layer


An Agent is able to analyze incoming information, evaluate current environmental conditions, and make decisions. The composition of an Agent is shown in the following figure. The LLM is the core component; in addition, the LLM must be fed an appropriate prompt, and Memory stores short-term data and long-term historical data (external data). Since complex tasks cannot be completed in one go, Plan splits them into smaller tasks. An Agent can also call external APIs to obtain additional capabilities, including up-to-date information, code execution, and access to proprietary information sources.



Source: A Survey on Large Language Model based Autonomous Agents
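The components described above (LLM, prompt, Memory, Plan, and external tools) can be wired together in a small loop. The sketch below is illustrative only: the LLM is replaced by a stub that returns a canned plan, the price is a made-up static value standing in for an external API, and every name in it is hypothetical.

```python
def stub_llm(prompt):
    """Stand-in for a real LLM call; returns a canned plan (hypothetical)."""
    if prompt.startswith("PLAN:"):
        return "lookup_price:ETH; report:ETH"
    return "no answer"

def lookup_price(symbol, memory):
    """Tool: stands in for an external API; the price is a made-up static value."""
    return {"ETH": 3000.0}.get(symbol)

def report(symbol, memory):
    """Tool: builds an answer from what earlier steps stored in memory."""
    price = next(result for step, result in memory if step.startswith("lookup_price"))
    return f"{symbol} price is {price}"

class Agent:
    def __init__(self, llm, tools):
        self.llm = llm        # core component
        self.tools = tools    # external APIs the Agent may call
        self.memory = []      # short-term memory of (step, result) pairs

    def plan(self, task):
        """Plan: ask the LLM to split a complex task into smaller steps."""
        reply = self.llm(f"PLAN: {task}")
        return [step.strip() for step in reply.split(";") if step.strip()]

    def run(self, task):
        for step in self.plan(task):
            name, _, argument = step.partition(":")
            tool = self.tools.get(name)
            # Use a tool when one matches the step, otherwise fall back to the LLM.
            result = tool(argument, self.memory) if tool else self.llm(step)
            self.memory.append((step, result))
        return self.memory[-1][1] if self.memory else None

agent = Agent(stub_llm, {"lookup_price": lookup_price, "report": report})
answer = agent.run("Report the current ETH price")
print(answer)
```

A real Agent would replace `stub_llm` with a model call and add long-term memory, but the control flow (plan, act with tools, remember, answer) stays the same.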


The decision-making ability of Agents did not see a real breakthrough until the emergence of large language models (LLMs) in recent years. One survey collated the number of papers published on Agents from 2021 to 2023, as shown in the figure below: there were only about a dozen research papers in 2021, but hundreds were published in 2023. The paper groups Agents into 7 categories.



Source: A Survey on Large Language Model based Autonomous Agents


In web3, the scenarios in which Agents exist are still limited compared to the web2 world, and currently include automated clearing, constructing code components (writing smart contracts, writing zk circuits), real-time risk control, and executing strategies such as arbitrage and yield farming.


Agent creation and trading platforms

- Theoriq (ChainML) has introduced the concept of an Agent Base Layer, which allows developers to annotate Agents in the form of NFTs and create their own Agents, as well as to build an Agent Collective by combining Agents to fulfill complex requirements. The performance of different Agents, and the linkage between them, is evaluated through proof of attribution and proof of collaboration.


- Spectral Labs has two main products: Spectral Syntax, a platform that enables users to create Agents onchain, and Spectral Nova, an inference service that handles inference requests. Creating an Agent in Spectral Syntax uses Spectral Nova's inference service, and ZK proofs are used to ensure that the inference is performed correctly. They also plan to launch Inferchain to enable Agent-to-Agent communication.


- Autonolas supports the construction of services consisting of multiple Agents. A Service Owner creates a service and registers it in the service registry in order to start the workflow, request that developers provide Agent components, and so on. Developers build Agents, components, and other code, which is stored offchain; they mint the corresponding NFT onchain, which references a metadata IPFS hash, and the metadata in turn references the IPFS hash of the underlying code. Services are typically run by a set of Operators, each running at least one Agent instance. In addition, Autonolas uses a Consensus Gadget to establish agreement among the agents within a service.
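The onchain/offchain referencing pattern described for Autonolas (an NFT points to a metadata hash, and the metadata points to the code hash) can be sketched as follows. This is an illustrative model under stated assumptions, not Autonolas's actual contracts or API: plain dictionaries stand in for IPFS and for the onchain registry, SHA-256 hex digests stand in for IPFS content identifiers, and all function names are hypothetical.

```python
import hashlib
import json
from typing import Dict

offchain_store: Dict[str, bytes] = {}  # stand-in for IPFS (content-addressed)
onchain_registry: Dict[int, str] = {}  # stand-in for the NFT registry: token id -> metadata hash


def put(content: bytes) -> str:
    """Store content under its SHA-256 digest (a simplification of an IPFS CID)."""
    digest = hashlib.sha256(content).hexdigest()
    offchain_store[digest] = content
    return digest


def register_component(code: bytes, name: str) -> int:
    """Store code and metadata offchain, then 'mint' a token that references the metadata."""
    code_hash = put(code)
    metadata = json.dumps({"name": name, "code_hash": code_hash}, sort_keys=True)
    metadata_hash = put(metadata.encode())
    token_id = len(onchain_registry) + 1
    onchain_registry[token_id] = metadata_hash
    return token_id


def resolve_code(token_id: int) -> bytes:
    """Follow token -> metadata hash -> code hash and check the code's integrity."""
    metadata = json.loads(offchain_store[onchain_registry[token_id]])
    code = offchain_store[metadata["code_hash"]]
    if hashlib.sha256(code).hexdigest() != metadata["code_hash"]:
        raise ValueError("offchain code does not match the registered hash")
    return code


token_id = register_component(b"print('hello from agent')", "demo-agent")
```

Only the short metadata hash lives in the registry; the code itself stays offchain, and anyone resolving the token can verify that the code they fetched matches what was registered.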


Agent Monitoring Platform

AgentOpsAI is a partner of Sentient and provides Agent monitoring services (logging events, calls, Agent errors, etc.). It is currently a centralized platform with no token involved.


Workflow


Different Agents can be combined, abstracted, or created into a specific application, and some coordination platforms let users choose which Agents to use when building a particular type of application. However, most of these platforms are limited to the development of Agents.


Application


Project Developers


Some developers use AI to make their platforms smarter. For example, security projects use machine learning to identify attacks and vulnerabilities; DeFi protocols use AI to build real-time monitoring tools; and data analytics platforms use AI to help with data cleaning and analysis.


User


Q&A/Analysis Window

- Kaito.ai: users can ask questions to get information about a project's community sentiment, price, and core team movements.

- 0xScope uses knowledge graphs to integrate onchain data, namely user behavioral characteristics, and provides data analysis services for users. It launched the Scopechat Q&A window in time for this wave of AI.


AI APP Store

Myshell proposes a consumer layer and has created an AI APP Store, which provides different AI components and three creation modes to help users create different AI applications. The components are divided into basic and composite components. Basic components allow users to build assets such as Prompt, Voice, and Avatar into AI apps, while composite components allow customized components to be built by combining multiple basic models/components. The three creation modes (classic, dev, and no-code) serve developers and users with different abilities and needs.


Summary


In this article, we would like to highlight the following 3 points:


GPU ≠ AI

In crypto, the emergence of many computing networks inevitably makes users feel that GPU is AI. But as analyzed in the previous section, computing networks face an impossible trilemma of computing power, bandwidth/communication, and memory, and model training relies on three parallel strategies: data parallel, tensor parallel, and pipeline parallel. All of these point to the checks and balances involved in setting up the framework of a computing network.


Same Model & Same Data ≠ Same Result

The same model and data do not necessarily yield the same result because of floating-point computation. This difference in computation also has an impact on the construction of computing networks.
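One well-known property behind this is that floating-point addition is not associative, so summing the same numbers in a different order can give a different result. The short sketch below illustrates that property with standard Python floats. Linking it to nodes that reduce partial results in different orders is our illustrative assumption, not a claim about any specific network.

```python
import math

# Same three numbers, different grouping: the rounded results differ.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

# A more dramatic case: a small value is absorbed by a large one.
values = [1e16, 1.0, -1e16, 1.0]
left_to_right = ((values[0] + values[1]) + values[2]) + values[3]
regrouped = (values[0] + values[2]) + (values[1] + values[3])

# math.fsum tracks exact partial sums and is independent of ordering.
exact = math.fsum(values)

print(a == b)                           # False
print(left_to_right, regrouped, exact)  # 1.0 2.0 2.0
```

If two nodes accumulate the same values in different orders, their outputs can differ in the low-order bits (or more, as in the second case), which matters when a network needs to check one node's result against another's.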


More AI Agents

AI Agents have only begun to show more utility in recent years, and we expect more Agents to appear in the market. But how Agents work in crypto or how to find the right token incentives remains a challenge.