This method identifies Arabic text in a UTF-8 multilingual document and returns
an array of start and end positions for each Arabic text segment. Knowing the language
and encoding of a document is an essential first step in working with unstructured multilingual text.
Without this basic knowledge, applications such as information retrieval and text mining cannot
process the data accurately, and important information may be missed or misrouted.
Any application that handles Arabic within multilingual documents can
benefit from this functionality. It allows a fully automated approach to
processing Arabic text by quickly and accurately locating the Arabic segments within a
multilingual document.
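The idea of returning start and end positions for Arabic segments can be illustrated with a short, self-contained sketch. This is not the library's own algorithm, which may use different rules. It is a minimal approximation that relies on PCRE's Unicode script property \p{Arabic}. The function name findArabicSegments is hypothetical, and the positions are byte offsets into the UTF-8 string, which is an assumption of this sketch.

```php
<?php
// Hypothetical helper, not part of ArPHP: returns a flat array of byte
// offsets [start1, end1, start2, end2, ...] for runs of Arabic-script text.
function findArabicSegments(string $text): array
{
    $positions = [];

    // One segment is a sequence of Arabic-script words, where adjacent words
    // may be joined by whitespace or punctuation. The /u modifier makes PCRE
    // treat the subject as UTF-8.
    $pattern = '/\p{Arabic}+(?:[\s\p{P}]+\p{Arabic}+)*/u';

    // PREG_OFFSET_CAPTURE reports each match together with its byte offset.
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$segment, $offset]) {
            $positions[] = $offset;                    // start (inclusive)
            $positions[] = $offset + strlen($segment); // end (exclusive)
        }
    }

    return $positions;
}

// Usage: extract each segment from its start and end offsets.
$text = "Peace سلام שלום";
$p = findArabicSegments($text);
for ($i = 0; $i + 1 < count($p); $i += 2) {
    echo substr($text, $p[$i], $p[$i + 1] - $p[$i]), PHP_EOL;
}
```

Because the offsets are in bytes, they pair with substr() rather than mb_substr(). Combining marks such as Arabic diacritics are not covered by \p{Arabic} in this sketch, so vocalized text may be split into more segments than expected.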
Example Output:
Peace سلام שלום Hasîtî
शान्ति Barış 和平 Мир
Say Peace in all languages!
The people of the world prefer peace to war and they deserve to have it.
Bombs are not needed to solve international problems when they can be solved
just as well with respect and communication. The Internet Internationalization
(I18N) community, which values diversity and human life everywhere, offers
"Peace" in many languages as a small step in this direction.
Arabic: نص عربي
أنطقوا سلام بكل
كل شعوب العالم تفضل السلام علي الحرب وكلها تستحق أن تنعم به.
إن القنابل لا تحل مشاكل العالم ويتم تحقيق ذلك فقط بالاحترام
مجموعة تدويل الإنترنت (I18N) ، والتي تأخذ بعين
التقدير الاختلافات الثقافية والعادات الحياتية
بين الشعوب، فإنها تقدم "السلام" بلغات كثيرة، كخطوة متواضعة في هذا
אמרו "שלום" בכל השפות! אנשי העולם מעדיפים את השלום על-פני המלחמה והם
ראויים לו. אין צורך בפצצות כדי לפתור בעיות בין-לאומיות, רק בכבוד
ובהידברות. קהילת בינאום האינטרנט (I18N), אשר מוקירה רב-גוניות וחיי אדם
בכל מקום, מושיטה יד ל"שלום" בשפות רבות כצעד קטן בכיוון זה.
Some Authors:
Frank da Cruz, New York City (USA)
Marco Cimarosti, Milano (Italy)
Michael Everson, Dublin (Ireland)
فريد عدلي / Farid Adly,
Editor in Chief, Italian-Arab News Agency ANBAMED
(Notizie dal Mediterraneo - أنباء البحر المتوسط),
Acquedolci (Italy)
Example Code:
<?php
require '../src/arabic.php';
$Arabic = new \ArPHP\I18N\Arabic();