High-quality fine-tuning and pre-training data
Quality training data is essential for performant models. Get access to labelled and unlabelled datasets across all modalities with quality and performance benchmarks.
![](https://cdn.prod.website-files.com/661e598ecb89e63ddae89bfa/66675ab2868cec2391d07270_Screenshot%202024-06-10%20at%2020.57.20.png)
We have licensed data across multiple domains suitable for your training tasks. We can also source bespoke datasets tailored to your specific requirements. Reach out and we can help you commission your dataset.
Text-to-Video, Image-to-Video, Video Classification, Image-to-Text, Image-to-3D, Text-to-3D
Music, TTS, ASR, Interactive Voice Over, Scripted Dialogues, 80+ Languages & Dialects
Pretraining Corpora, Translation, Zero-Shot Classification, Text Generation, Text Retrieval, QnA and other Instruct-Tuning Datasets
Browse our extensive library of over 3000 datasets by different criteria and learn about the provenance and quality of each dataset.
Frequently asked questions about Valyu's datasets and the provenance tool. Please let us know if you have any questions or comments.
Valyu is a distribution network for a large collection of high-quality datasets encompassing text, video, audio, and images across multiple domains, including healthcare, finance, retail, and technology. We can also curate and source structured, unstructured, or semi-structured datasets according to your specific needs.
Our data cards and quality benchmarks provide detailed descriptions, metadata, and assessments for each dataset, helping you judge its provenance, relevance, and suitability. You can also explore sample data and preview features to better understand the dataset's contents and characteristics.
Yes, you can access sample data or preview features of datasets on our platform to evaluate their quality and suitability for your projects before making a purchase decision.
We have a scoring system in place that assesses each dataset's characteristics and licenses to ensure that datasets listed on Valyu meet high standards of accuracy, completeness, and relevance. Each dataset has a score, which is derived from two critical aspects:
Yes, Valyu offers custom dataset creation services to cater to your unique requirements. Our team can work with you to curate custom datasets or incorporate specific data features based on your project needs.