Who does my data belong to? What are the items worth paying attention to in the data layer?

Covering projects such as Vana, Ocean Protocol, Masa, Open Ledger, etc., in-depth analysis of the data needs of artificial intelligence training.

Original title: My Data is Not Mine: The Emergence of Data Layers

Original author: 0xJeff (@Defi0xJeff)

Compiled by: Asher (@Asher_0210)


With most of people's attention focused online, data is the digital gold of this era. In 2024, global average screen time reached 6 hours and 40 minutes per day, up from previous years. In the United States, the figure is even higher: 7 hours and 3 minutes per day.

With such high levels of engagement, the amount of data generated is staggering: roughly 0.4 ZB of data per day in 2024, or about 400 million TB (1 ZB = 1,000,000,000 TB), taking into account all newly generated, captured, copied, or consumed data.
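The unit conversion can be sanity-checked directly, using the 1 ZB = 10^9 TB convention stated above:

```python
# Sanity-check the daily data volume: 0.4 ZB/day expressed in TB,
# using 1 ZB = 10^9 TB as stated in the text.
TB_PER_ZB = 1_000_000_000

daily_zb = 0.4                      # ~0.4 ZB of data per day (2024)
daily_tb = daily_zb * TB_PER_ZB     # equivalent volume in terabytes

print(f"{daily_tb:,.0f} TB per day")        # 400,000,000 TB per day
print(f"{daily_zb * 365:.0f} ZB per year")  # ~146 ZB per year
```

At 0.4 ZB per day, the annualized total lands in the mid-140s of zettabytes, which matches commonly cited 2024 estimates for global data creation.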

Yet, despite the vast amounts of data generated and consumed every day, users own very little:

  • Social media: Data on platforms like X, Instagram, etc. is controlled by the companies, even though it is generated by the users;

  • Internet of Things (IoT): Data from smart devices generally belongs to the device manufacturer or service provider unless otherwise specified in a specific agreement;

  • Health data: While individuals have rights to their own medical records, much of the data from health apps or wearable devices is controlled by the companies that provide those services.

Crypto and social data

In the crypto space, we've seen the rise of Kaito AI, which indexes social data on the X platform and turns it into actionable sentiment data for projects, KOLs, and thought leaders. The terms "yap" and "mindshare" were popularized by the Kaito team, known for their growth hacking (via their popular mindshare and yapper dashboards) and their ability to attract organic interest on Crypto Twitter.

“Yap” aims to incentivize the creation of quality content on the X platform, but many questions remain unanswered:

  • How exactly are yaps scored?

  • Will mentioning Kaito earn you extra yaps?

  • Does Kaito truly reward quality content, or does it prefer controversial and popular opinions?

In addition to social data, discussions about data ownership, privacy, and transparency are becoming increasingly heated. With the rapid development of artificial intelligence, new questions have surfaced: Who owns the data used to train AI models? Who can benefit from the results generated by AI? These questions pave the way for the rise of the Web3 data layer - a step towards a decentralized, user-led data ecosystem.

The emergence of the data layer

In the Web3 space, a growing ecosystem of data layers, protocols, and infrastructure is emerging that aims to enable personal data sovereignty, give individuals greater control over their own data, and provide monetization opportunities.

Vana


Vana's core mission is to give users control over their data, especially in the context of AI, where data is invaluable for training models. Vana has launched DataDAOs, community-driven entities where users pool their data for a common good. Each DataDAO focuses on a specific dataset:

  • r/datadao: Focuses on Reddit user data, enabling users to control and monetize their contributions;

  • Volara: Processes X platform data to enable users to benefit from their social media activities;

  • DNA DAO: Aims to manage genetic data with a focus on privacy and ownership.

Vana packages data into tradable assets called Data Liquidity Pools (DLPs). Each DLP aggregates data in a specific field, and users can stake tokens into these pools to receive rewards, with top pools rewarded based on community support and data quality. What makes Vana stand out is the ease of data contribution: users select a DataDAO, contribute their data through API integration or a manual upload, and earn DataDAO tokens and VANA tokens as rewards.

Ocean Protocol


Ocean Protocol is a decentralized data marketplace that allows data providers to share, sell, or license their data, while consumers can access this data for use in AI and research. Ocean Protocol uses datatokens (ERC-20 tokens) to represent access to datasets, allowing data providers to monetize their data while maintaining control over access conditions.
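The core idea, tokens as access rights, can be sketched as a toy model. The class and method names below are purely illustrative, not Ocean Protocol's actual API (in practice this is implemented with ERC-20 smart contracts):

```python
# Toy model of datatoken-gated access: spending one datatoken grants
# one access to a dataset. Illustrative only -- not Ocean's real API.

class Datatoken:
    """ERC-20-style token whose balances gate access to one dataset."""
    def __init__(self, dataset_url: str):
        self.dataset_url = dataset_url
        self.balances: dict[str, int] = {}

    def mint(self, to: str, amount: int) -> None:
        """Provider issues datatokens (e.g., to sell or grant access)."""
        self.balances[to] = self.balances.get(to, 0) + amount

    def consume(self, consumer: str) -> str:
        """Spend one datatoken in exchange for dataset access."""
        if self.balances.get(consumer, 0) < 1:
            raise PermissionError("no datatoken: access denied")
        self.balances[consumer] -= 1
        return self.dataset_url  # in practice: a decrypted download or compute job

# A provider publishes a dataset and sells access by transferring datatokens.
token = Datatoken("ipfs://example-weather-dataset")
token.mint("alice", 2)
print(token.consume("alice"))  # alice holds tokens, so she gets access
```

The design choice worth noting is that access control becomes a transferable, tradable balance rather than an account-level permission, which is what lets a data marketplace price and exchange access like any other token.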

The types of data traded on Ocean Protocol are:

  • Public data refers to open datasets, such as weather information, public demographics, or historical stock data, which are very valuable for AI training and research;

  • Private data includes medical records, financial transactions, IoT sensor data, or personalized user data, which require strict privacy controls.

Compute-to-Data is another key feature of Ocean Protocol that allows computation to be performed on data without moving the data, thereby ensuring the privacy and security of sensitive data sets.
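The Compute-to-Data pattern can be illustrated with a minimal sketch: the raw records never leave the provider's environment; the consumer submits a computation and receives only the aggregate result. All names here are hypothetical, not Ocean's API:

```python
# Minimal Compute-to-Data sketch: computation is shipped to the data,
# and only the aggregate result leaves the provider's environment.
from typing import Callable

class DataProvider:
    def __init__(self, records: list[dict]):
        self._records = records  # private: never returned to consumers

    def run_job(self, job: Callable[[list[dict]], float]) -> float:
        """Run a consumer-supplied computation locally; return only its result."""
        return job(self._records)

# The consumer defines an aggregate computation without ever seeing raw records.
def average_heart_rate(records: list[dict]) -> float:
    rates = [r["heart_rate"] for r in records]
    return sum(rates) / len(rates)

provider = DataProvider([
    {"patient": "p1", "heart_rate": 72},
    {"patient": "p2", "heart_rate": 80},
])

print(provider.run_job(average_heart_rate))  # 76.0 -- only the aggregate leaves
```

A production system adds sandboxing and checks that submitted jobs cannot exfiltrate raw rows, but the inversion is the same: instead of downloading sensitive data, the consumer's code travels to it.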

Masa


Masa focuses on creating an open layer for AI training data, providing real-time, high-quality, and low-cost data to AI agents and developers.

Masa launched two subnets on the Bittensor network:

  • Subnet 42 (SN42): Aggregates and processes millions of data records per day, providing a foundation for AI agent and application development;

  • Subnet 59 (SN59), the "AI Agent Arena": A competitive environment where AI agents leverage real-time data from SN42 and compete for TAO emissions based on performance metrics such as mindshare, user engagement, and self-improvement.

Additionally, Masa has partnered with Virtuals Protocol to provide real-time data capabilities to Virtuals Protocol agents. It has also launched the TAOCAT token to demonstrate its capabilities (currently on Binance Alpha).

OpenLedger


OpenLedger is building a blockchain tailored specifically for data, particularly for AI and machine learning applications, ensuring secure, decentralized, and verifiable data management. Highlights include:

  • Datanets: A network of specialized data sources within OpenLedger that curates and enriches real-world data for AI applications;

  • Specialized Language Models (SLMs): AI models customized for specific industries or applications. The idea is to provide models that are not only more accurate in niche use cases, but also compliant with privacy requirements and less susceptible to the biases present in general-purpose models;

  • Data Validation: Ensuring the accuracy and trustworthiness of the data used to train SLMs, so that these models are accurate and reliable for their specific use cases.

The need for data in AI training

The demand for high-quality data is surging as artificial intelligence and autonomous agents advance. Beyond initial training, AI agents also require real-time data for continuous learning and adaptation. The key challenges and opportunities are:

  • Data quality over quantity: AI models require high-quality, diverse, and relevant data to avoid bias or poor performance;

  • Data sovereignty and privacy: As Vana shows, there is a push to monetize user-owned data, which could reshape how AI training data is acquired;

  • Synthetic data: With privacy concerns mounting, synthetic data is gaining traction as a way to train AI models while mitigating ethical concerns;

  • Data Marketplaces: The rise of data marketplaces (centralized and decentralized) is creating an economy where data is a tradable asset;

  • AI in Data Management: AI is now being used to manage, clean, and enhance datasets, improving the quality of data for AI training.

As AI agents become more autonomous, their ability to access and process real-time, high-quality data will directly impact their effectiveness. This increased demand has given rise to a data marketplace specifically built for AI agents, where both AI agents and humans can access high-quality data.

Web3 agent data marketplaces

Cookie DAO aggregates AI agents' social sentiment data and token-related information, turning it into actionable insights for humans and AI agents. The Cookie DataSwarm API gives AI agents access to real-time, high-quality data for trading-related insights, one of the most common applications in the crypto space. With 200,000 monthly active users and 20,000 daily active users, Cookie is one of the largest AI agent data marketplaces, with the COOKIE token at its core.

Finally, other noteworthy projects in this area are:

  • GoatIndex.ai focuses on Solana ecosystem insights;

  • Decentralised.Co focuses on niche data dashboards, such as GitHub and project-specific analytics.

This article is translated from the original: https://x.com/defi0xjeff/status/1884644127352193099. If reprinted, please indicate the source.

