From a13b0f1116ccb422f1b853aa07235dabbff24ceb Mon Sep 17 00:00:00 2001
From: Shanks0465
Date: Tue, 22 Oct 2024 09:59:24 +0530
Subject: [PATCH] Info Update

---
 frontend/components/Dynamic/Area.tsx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/frontend/components/Dynamic/Area.tsx b/frontend/components/Dynamic/Area.tsx
index 0605100..e02bbae 100644
--- a/frontend/components/Dynamic/Area.tsx
+++ b/frontend/components/Dynamic/Area.tsx
@@ -33,7 +33,7 @@ Additionally, we introduced the Bharat Parallel Corpus Collection (BPCC), which
  description: `At AI4Bharat, our dedication to building language models and datasets for all 22 constitutionally
  recognized Indian languages is central to our mission. We employ a multifaceted approach, leveraging
  large-scale data crawling, synthetic data creation, and human annotation/crowd collections to create
- comprehensive datasets. Our efforts have resulted in an extensive pretraining corpus of 251 million
+ comprehensive datasets. Our efforts have resulted in an extensive pretraining corpus of 251 billion
  tokens across 22 languages, complemented by 74.7 million prompt-response pairs in 20 Indian languages.
  Tools like Setu play a crucial role in large-scale crawling and data cleaning, enabling us to build
  state-of-the-art models such as Airavata, IndicBART, and IndicBERT. We also emphasize