Workshop on Classical Arabic Digital Humanities

I recently had the opportunity to build on the work I did in classical Arabic digital humanities during my PhD by helping to design and teach a four-day workshop, held July 1-4 and titled “From Classical Machine Learning to AI in Arabic and Islamic Studies,” for the Evolution of Islamic Societies (EIS1600) project led by Maxim Romanov at the University of Hamburg. The project analyzes a large corpus (~100 million words) of biographical and chronicle works from Spain to North Africa to better understand the development of Islamic societies. As Maxim puts it, the project focuses on three main areas. The first concerns major ethnic, religious, and professional groups, and how they shaped the development of local communities and fused them into what we call the Islamic world. The second concerns dynastic cycles: the patterns of the rise and fall of regional powers, their conflicts with rivals, and their interactions with local communities. The third traces environmental factors (plagues, famines, droughts, pest infestations, earthquakes, and climate change) and their effect on the life of local communities. Accomplishing this at scale requires new models and datasets tuned specifically for the domain of classical Arabic. I co-taught the workshop with computer scientist Tariq Yousef of the University of Southern Denmark.

The workshop was split across four days, each of which interleaved lectures by Tariq and me with practical sections in which participants used example code to apply what was discussed in the lecture to example corpora taken from the OpenITI corpus. The first day was an introduction to language models and word embeddings, followed by a session on document classification. Day two was devoted to data annotation and to case studies on information extraction from the EIS1600 project. The third and fourth days covered citation analysis (much of which drew on my own PhD research), generative language models, and topic modeling. The workshop materials and example code will be made freely available online in the near future.