האיגוד הישראלי לטכנולוגיות שפת אנוש
الرابطة الإسرائيلية لتكنولوجيا اللغة البشرية
The Israeli Association of Human Language Technologies

Hebrew & Arabic Corpus Linguistics Infrastructure
Corpora List
1. Hebrew Text Corpora
1.1 HTB (Hebrew Treebank): 6,500 sentences with automatic annotation quality above 90%.
This is the only UD 2.8-compliant dataset for Hebrew. It undergoes manual QA to make it as accurate as possible, with the aim of reaching 100% accuracy.
1.2 Israel Hayom newspaper, in Hebrew and English
1.3 More than 150 blogs written by women
1.4 More than 40,000 high-tech news items (2009-2021) from https://www.geektime.co.il
1.5 DICTA Hebrew Wikipedia, with more than 3,000 tagged sentences
1.6 The Israel Securities Authority (ISA): thousands of annual financial reports of public companies
1.7 Thousands of articles from Kol Zchut ("כל זכות"), the leading site about rights and entitlements in Israel: https://www.kolzchut.org.il/he/עמוד_ראשי
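
Since the HTB entry (1.1) is described as UD-compliant, and UD treebanks are normally distributed as CoNLL-U files, the following is a minimal sketch of how such a file could be read without external libraries. The `Token` type, the `read_conllu` function, and the two-word sample are illustrative assumptions, not taken from the HTB release; the sample annotation is a toy and not actual treebank data.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    """A subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: str
    form: str
    lemma: str
    upos: str
    head: str
    deprel: str


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of syntactic words per sentence from CoNLL-U text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(Token(cols[0], cols[1], cols[2], cols[3], cols[6], cols[7]))
    if sentence:
        yield sentence


# Toy sample (invented for illustration): one surface token split into two words.
ROWS = [
    ["1-2", "בבית", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["1", "ב", "ב", "ADP", "_", "_", "2", "case", "_", "_"],
    ["2", "בית", "בית", "NOUN", "_", "_", "0", "root", "_", "_"],
]
SAMPLE = "# text = בבית\n" + "\n".join("\t".join(row) for row in ROWS) + "\n\n"

sentences = list(read_conllu(SAMPLE))
upos_counts = Counter(tok.upos for sent in sentences for tok in sent)
print(len(sentences), dict(upos_counts))
```

Skipping the multiword-token range lines matters for Hebrew in particular, because prefixed prepositions and similar clitics are split into separate syntactic words, and counting both the range line and its parts would inflate token statistics.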
2. Audio with transcriptions
600 hours of parliamentary audio with transcriptions