Tether-backed QVAC unveils Genesis II, boosting world’s largest synthetic AI education dataset

Genesis II adds 10 new education domains and introduces structured reasoning methods to enhance AI training quality.

Key Takeaways

  • QVAC, Tether Data’s AI research division, released QVAC Genesis II, adding 107 billion tokens to what is now the largest public educational synthetic dataset for AI pre‑training.
  • Independent evaluations show that models trained on Genesis II data deliver stronger reasoning accuracy and clearer answers than models trained on prior synthetic datasets.

Tether Data’s AI division QVAC has released Genesis II, adding 107 billion tokens to its open-source synthetic dataset for AI pre-training. The full dataset now spans 148 billion tokens across 19 education-focused domains, making it the largest of its kind.

Genesis II expands into new fields such as computer science, statistics, and machine learning, and introduces an “Option-Level Reasoning” approach that teaches models to reason through multiple-choice answers. This builds on the failure-analysis method QVAC used in Genesis I.

Tether CEO Paolo Ardoino said the initiative moves AI beyond fluency toward structured understanding. The dataset is available under a Creative Commons license on QVAC’s blog and Hugging Face, supporting open research and local model development outside centralized AI platforms.
