האיגוד הישראלי לטכנולוגיות שפת אנוש
الرابطة الإسرائيلية لتكنولوجيا اللغة البشرية
The Israeli Association of Human Language Technologies

Hebrew & Arabic Corpus Linguistics Infrastructure
IAHLT HTB (Hebrew Tree Bank)
• Update existing HTB:
• Valid by UD standards
• Comparable to other languages (esp. Arabic)
• Findable named mentions, consistent part-of-speech tagging
• Removal of inserted tokens – all words are the sum of their segments
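
The "sum of their segments" property can be checked mechanically on CoNLL-U data. The sketch below is a minimal illustration, not part of the IAHLT tooling: the function name `check_segment_sums` and the two toy samples are hypothetical, and the samples carry only the ID and FORM columns (real CoNLL-U lines have ten tab-separated columns, of which this check reads only the first two). It assumes a multiword token is written as a range line such as `1-2` followed by its segment lines.

```python
def check_segment_sums(conllu_text):
    """Return (sentence_index, surface_form, joined_segments) for each
    multiword token whose form is not the concatenation of its segments."""
    mismatches = []
    for sent_idx, block in enumerate(conllu_text.strip().split("\n\n")):
        # Keep token lines only; comment lines start with '#'.
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Plain integer IDs are the syntactic words (segments).
        forms = {row[0]: row[1] for row in rows if row[0].isdigit()}
        for row in rows:
            if "-" in row[0]:  # range ID marks a multiword token
                start, end = map(int, row[0].split("-"))
                joined = "".join(forms[str(i)] for i in range(start, end + 1))
                if joined != row[1]:
                    mismatches.append((sent_idx, row[1], joined))
    return mismatches


# Toy sample (hypothetical): segments concatenate to the surface form.
SAMPLE_OK = "1-2\tבבית\n1\tב\n2\tבית\n"

# Toy sample (hypothetical): an inserted article segment breaks the sum.
SAMPLE_INSERTED = "1-3\tבבית\n1\tב\n2\tה\n3\tבית\n"

if __name__ == "__main__":
    print(check_segment_sums(SAMPLE_OK))
    print(check_segment_sums(SAMPLE_INSERTED))
```

An empty result means every multiword token in the input equals the concatenation of its segments; any tuple returned points to a token where a segment was inserted or altered.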
Prof. Amir Zeldes presents the HTB Project
A Second Wave of Gold Standard Hebrew Treebanks:
Challenges and Applications
Version controlled and available from:
https://github.com/IAHLT/UD_Hebrew
• Build new treebanks to supplement HTB:
• Current topics
• A wide range of genres
• Written and spoken data
• Larger-scale data to support ‘high-resource’ language quality