Corpora List

1. Hebrew Text Corpora

1.1 HTB (Hebrew Treebank): 6,500 sentences, automatically annotated with over 90% accuracy.
This is the only UD 2.8-compliant dataset for Hebrew. It undergoes manual QA to make it as accurate as possible, with the goal of 100% accuracy.

1.2 Israel Hayom newspaper, in Hebrew and English

1.3 More than 150 blogs written by women

1.4 More than 40,000 high-tech news items (2009-2021) from https://www.geektime.co.il

1.5 DICTA Hebrew Wikipedia, with more than 3,000 tagged sentences

1.6 The Israel Securities Authority (ISA): thousands of annual financial reports of public companies

1.7 Thousands of articles from Kol Zchut ("כל זכות"), the leading site about rights and entitlements in Israel: https://www.kolzchut.org.il/he/עמוד_ראשי

 

2. Audio with Transcriptions

600 hours of parliamentary audio with transcriptions