1. Romani language as Indo-Aryan language

The Romani people, the Roma, and their language have been the subject of scientific interest since the 16th century. Scholarly research into the Romani language began in the 18th century, while Romani linguistics and the study of the Roma were founded in the 19th century. To date, numerous works have been written on Roma history, society, culture, literature, and language, including dictionaries and grammars. Some of them describe a particular dialect; others aim at standardizing the Romani language. Today, a number of dictionaries, lexical databases, and grammars are also available digitally.

In the 18th century, on the basis of the similarities between Romani and Hindi, it was agreed that Romani is an Indian language and that the Roma originated from the Indian subcontinent. The Romani language belongs to the Indo-Aryan branch of the Indo-Iranian subfamily of Indo-European languages. It is a New Indo-Aryan (NIA) language that evolved from Old Indo-Aryan (OIA) through Middle Indo-Aryan or Prakrit languages (Pkr), as all other New Indo-Aryan languages did. However, its departure from its ancient homeland of India during the Middle Indo-Aryan linguistic phase, as well as contact with numerous other languages, resulted in its present unique structure and features. As a result, Romani today preserves some original Indo-Aryan features (some phonemes and consonant groups, synthetic verb forms, elements of grammatical structure) in a form significantly closer to Prakrit and Old Indo-Aryan than is found in other New Indo-Aryan languages, whose speakers have remained in the Indian subcontinent.
On the other hand, Romani has distinct features that clearly distinguish it from the rest of the Indo-Aryan languages: a phonological system that largely corresponds to that of the contact languages, including the loss of retroflex consonants; new parts of speech such as prepositions and articles; the syntactic order S-V-O; and a vocabulary containing Middle Persian loanwords as well as Armenian, Greek, Romance, Slavic, and other loanwords.

On the basis of the sound changes in Romani, R. L. Turner showed that the Roma people had originated from the central part of northern India. Around the 4th century BC, they migrated to the northwest of the Indian subcontinent, and then, probably around the middle of the first millennium AD, from India to Persia, where they may have stayed until the middle of the 7th century. Thereafter, the Roma stayed in Armenia until about the 10th century and in Byzantium until about the 13th century before moving to other parts of Europe and, more recently, to other parts of the world. While the Romani language in Byzantium seems to have been a unified common form, Early Romani, the later migrations have led to a considerable diversification of the language, both through the influence of contact languages and through internal innovations, so that today there are many Romani dialects. Because of the intense migrations of the various Romani groups, their mutual contact, and interdialectal influences, it is very difficult to determine with certainty a genealogy of these dialects on which all scholars would agree.

2. Phonology

As mentioned above, Romani phonology has remained in some respects more conservative than the phonology of other New Indo-Aryan languages, in which the distinctions between sibilants and single intervocalic consonants were lost and consonant groups were greatly simplified (through assimilation, etc.).
On the other hand, the Romani language lost some older Indo-Aryan phonological features under the influence of the contact languages. For example, the retroflex consonants (ṭ(h), ḍ(h), ṇ) were lost because they do not occur in non-Indo-Aryan contact languages such as Persian, Armenian, or the European languages. In some dialects the loss of aspirates is also in progress (ph > p, th > t, kh > k, čh > č). Moreover, some new phonemes have entered the Romani phonological system (f, x, c, z, ž), depending on the dialect and the phonology of the language the dialect was in contact with.

3. Morphology

The declension system of Romani also differs in part from that of other New Indo-Aryan languages. On the one hand, the declension of adjectives in Romani is almost identical to that of Hindi, with the endings -o, -i, -e, and the declension of nouns and (to some extent) pronouns in Romani coincides with New Indo-Aryan in so far as it is characterized by the addition of postpositions. On the other hand, it is important to emphasize that a significant typological change has taken place in Romani: postpositions have turned into agglutinative case endings. This can be observed, for example, in the phrase “with mother, father, sister and brother,” which reads in Hindi as mātā jī, pitā jī, behen aur bhāī ke sāth. At the very end of this phrase is a single postposition meaning “with”, which refers to all nouns in the phrase. In Romani, the same phrase reads e dajasa, e dadesa, e phenjasa thaj e phralesa. Each noun in the phrase ends in -sa, which means “with”. In this case, -sa is not a postposition but a case ending for the instrumental.

In addition, the Romani language has developed a category of prepositions, which is not typical of the New Indo-Aryan languages.
Since these prepositions also convey case meanings and in Romani are usually associated with the nominative form (sometimes with the locative and genitive forms) of nouns, there is a tendency to replace the case forms with prepositional constructions, which means that Romani noun inflection is currently moving from synthetic to analytic in a number of dialects.

Romani partly inherited and partly borrowed the affixes used to derive forms of different word classes and to compare adjectives. In addition, similar analogical changes took place in morphology, both in the formation of the intransitive perfect participle and in the comparison of adjectives. Old Indo-Aryan has a comparative suffix -tara, as in ucca-tara, the comparative of the Old Indo-Aryan word ucca, meaning “tall”. In Romani, the comparative suffix is -eder, as in the comparative form učeder. It arose from the Old Indo-Aryan suffix, which merged with the final vowel of the Old Indo-Aryan adjectival base ending in -a: Old Indo-Aryan ucca-tara > Rom. uče-der > uč-eder (OIA -a-tara > Rom. -eder). By analogy, comparatives of other adjectives were formed in this way, e.g., šukareder from šukar “beautiful”, which established -eder as the comparative suffix (OIA śukra-tara, without this analogy and with assimilations, would give *šukeder).

Romani is the only Indo-Aryan language with a definite article. In numerous dialects, the definite article can be o, i, and e (often le). The definite article was derived from the Proto-Romani and Early Romani demonstrative pronouns (themselves derived from the OIA so, eta-), which came into use as definite articles while the Roma lived in Byzantium, owing to intense contact with the Greek language, which has a definite article in its grammatical system (the Romani definite articles were vocalically assimilated to the Greek articles).
Since the articles precede the noun, usually in combination with prepositions, articles and prepositions often merge, e.g., an-o ker, meaning “in the house”.

Demonstrative pronouns in Romani are a particularly complex word class with opaque derivation. They show traces (in their central element) of the Old Indo-Aryan demonstrative pronoun (eta-), but Romani experts emphasize that the vowel a predominates in pronouns referring to what is near, while o predominates in those referring to what is far, e.g., adava, akava, kadava, etc. versus odova, okova, kodova, etc. Matras has attempted to reconstruct, in as much detail as possible, the emergence of the present demonstrative pronouns in three phases during the Proto-Romani period. In my opinion, the initial a- can be derived from the OIA pronoun aya(m) “this” (> *ā > Rom. a), and the initial o- from the OIA pronoun asau “that” (as already derived by Sampson for personal pronouns and the article; asau > *aho > Rom. o). The second element of pronominal compound stems (*lo etc.) may be derived from the OIA base eta- “that”. As for the third element, formally envisaged (but not interpreted) by Matras in his ‘third phase’, it seems to me that the Romani adverbs kaj “where” and daj “somewhere” (after a- or o-) have been added to the beginning of the already compound base of the pronoun, to denote (as Matras says) “definite” and “indefinite” deixis. Finally, the forms odova, okova or adava, akava, etc. must have arisen from *o-da-va, *o-ka-va through analogical generalization and polarization of the vowels a or o in all internal syllables. Although this interpretation of mine is hypothetical, it may bring a great deal of morphological and semantic order and meaning into the confusing variety of demonstrative pronouns in Romani.

The analytical derivation of compound numerals is yet another feature that distinguishes Romani from other New Indo-Aryan languages.
In other New Indo-Aryan languages, numerals are simply inherited from Old Indo-Aryan and have become quite opaque through a series of sound changes in Prakrit, e.g., Hin. caudah “14” < Pkr. caudassa < OIA caturdaśa. Romani constructs them analytically, as in dešuštar “14” (deš “10” + u + štar “4”). The reason for this could be the intensive contact of Romani with Middle Persian, in which compound numerals from 21 onwards are formed analytically using the conjunction u “and”. It must therefore have happened in Proto-Romani. Numeral borrowing is a very unusual and rare phenomenon. Nevertheless, Romani no longer has the original Indo-Aryan terms for the numbers 7, 8, and 9 (OIA sapta, aṣṭa, nava) and has instead borrowed them from Greek (Romani efta, oxto, enja), together with the terms for 30, 40, and 50 (trianda, saranda, peninda) and the suffix for the derivation of ordinal numbers.

On the border between nouns and verbs there are participles. The main suffixes for the intransitive perfect participle in Romani are -do and -lo, derived from the OIA suffix -ta. Old Indo-Aryan participles such as tāpita “warmed” and mukta “released, let go” have evolved, through different sound changes, into participles with different final syllables, namely -do and -lo, as in tavdo and muklo, from the verbs tavel “to warm” and mukel / mućel “to release, let go”. By analogy with such forms, many other participles were derived with the final syllables -do and -lo, simply by adding them to the base of Romani verbs. That is how new regular participles arose in Romani: the suffix -do is added to verb bases ending in -n, -l, -d, -r, and -lo is added otherwise. Irregular participles, by contrast, were inherited from Old Indo-Aryan, having undergone sound changes.
Therefore their bases look different from the bases of the verbs they belong to, e.g., gelo “gone” (< OIA gata) from džal “to go” or suto “fallen asleep” (< OIA supta) from sovel “to sleep”. Sometimes the suffix -no is used, as in dino from del “to give” and lino from lel “to take”. While dino can historically be traced to Pkr. dinna-, some verbs ending in -d follow, by analogy, the derivation of the participle with the suffix -ino: čhudino from čhudel “to throw”.

These intransitive perfect participles play an important role in morphology, since in Romani they are the basis (ending in -d, -l, -n) for the formation of the tenses of the perfect system, the (present) perfect and the (past) pluperfect. The verb forms of the present system (present, imperfect), on the other hand, are derived from the present basis. Furthermore, the intransitive perfect participles provide the basis for the formation of passive forms. These forms of the perfect system were usually presented unsystematically in earlier grammars (e.g., an l-preterite and a d-preterite were distinguished without explaining the principle by which verbs take one form or the other). They were erroneously called preterite, aorist, or imperfect. They are forms of the (present) perfect tense. The past perfect tense (pluperfect) is derived analogously from them, just as the imperfect is derived from the present forms. In this doctoral thesis, I have tried to present these forms systematically, as well as the correct relationship between the formation of the passive and the formation of the perfect.

The passive in Romani no longer requires an agentive construction, as the corresponding forms in most New Indo-Aryan languages do. When the passive is constructed from the participle base, the terminations for grammatical persons are derived from the forms of the auxiliary verb (copula < OIA bhavati) ovel.
When the perfect tense is constructed from the same participle base, the grammatical persons are expressed (at least in part) by endings derived from the other auxiliary verb (copula < OIA asti) si (some endings may also bear traces of the agentive pronoun). All this paints a much more systematic picture of verb morphology than the grammars have usually offered so far. In considering possible contact influences historically, I have concluded that such an (initially) analytical system of passives may have developed in the Proto-Romani period (during the stay of the Roma in Persia and Armenia), and that such an (initially) analytical system of perfect forms may have developed only in the Early Romani period, when the Roma lived in Byzantium. I hope that my systematization yields a more precise periodization of the history of Romani morphology than we have had so far.

Romani has adopted another kind of participle from Old Indo-Aryan, namely the present participle. It is formed in Romani by adding the suffix -amno or -avno to the base of the verb in the present tense, which corresponds to the Old Indo-Aryan suffix for the present middle participle, -māna (OIA -a-māna > Pkr. *-a-mano > Rom. -amno > Rom. -avno). In Romani, however, these participles have shifted in meaning and have often become adjectives and generated adjectival nouns, such as dikhavno “careful”, which as a noun means “supervisor” and is derived from the verb dikhel “to look”. Thus, its literal meaning is “looking”. Although these adjectives are not used in everyday speech with their literal participial meaning, and are classified as adjectives in the literature, it should be taken into consideration that they are derived exclusively from verbs. Therefore, and for the sake of a systematic description of the Romani language in historical perspective, they should be singled out as a separate category and defined as (originally) present (middle) participles.
The grammatical structure of verb inflection in Romani is distinctly synthetic. It partly preserves the old synthetic forms (in the present tense), and has partly created new synthetic forms from the Middle Indo-Aryan analytic forms (in the perfect tense and passive). It differs from the structure of other New Indo-Aryan languages, which is almost completely analytic. Thus, in Hindi, a complete restructuring of the verb inflection system occurred in the New Indo-Aryan phase, so that the forms of verb tenses are constructed analytically, using present participles or intransitive perfect participles with the verb “to be” as copula. There are also distinctive complex verb compounds, which may consist of two or more verbs in different forms. These compound verbs consist of a basic verb that provides the basic semantic information; a modifier verb that determines perfectiveness/imperfectiveness (whether the action is finished or not) and modes of action (Aktionsart), possibly followed by a modal verb in a participial form; and finally, an auxiliary verb that conveys tense or mood, person, and number. The only synthetic verbal forms in Hindi are the verb moods: the conjunctive, the imperative, and, to some extent, the conditional.

Unlike in Hindi, the present (and imperfect) in Romani is formed synthetically, using both the base and the endings inherited from Old Indo-Aryan. The (present) perfect (and pluperfect), on the other hand, is formed from a perfect base derived from the intransitive perfect participle. Endings derived from the copula si are added to the base, but, as stated, they may have preserved some traces of the agentive pronoun. The fusion of this perfect base and the endings gave rise (in the Early Romani period, i.e., during the stay in Byzantium) to new synthetic forms of the perfect (and pluperfect) from the preceding analytic forms.
The future tense can be constructed either with the agglutinative suffix -a or analytically by means of the auxiliary verb kamel, or the particles kam or ka derived from it. Modal constructions consist of the main verb forms and modal verbs or particles (conjunctions).

The new synthetic verb forms derived from the analytic ones, such as the forms of the perfect tenses and the passive, arose, as mentioned, from the old periphrastic constructions of the intransitive perfect participles and auxiliary verbs, e.g., *gelo sem (gelo “gone”, sem “am”) > gelem “I have gone” (perfect); *phučlo ovel (phučlo “asked”, ovel “is”) > phučlol “(he / she) is asked” (passive). The participles and the forms of the auxiliary verb si “to be” have merged into distinctive perfect forms, which can now be analysed into perfect verb bases and perfect endings. This is the reason why the perfect endings partially coincide with the present tense forms of the verb si “to be”. (However, they could also partially preserve traces of agentive pronouns.) The same thing happened with the construction of the passive, except that the participle merged with the forms of the auxiliary verb ovel “to become, to be”, so that today they can also be considered (new) synthetic verb forms derived by suffixes. Denominatives are derived in Romani from adjectives and nouns following the same pattern, so that their bases merge with the auxiliary verb ovel, e.g., *baro ovel (baro “big”, ovel “becomes”) > barjovel “grows”.

Similarly to Hindi, Romani has no special verb meaning “to have” and therefore uses a construction with the verb “to be” and a logical subject in the form of an oblique case/accusative, e.g., si man kher “I have a house”. The origin of this construction is not completely clear. One possibility is that this ‘accusative’ form is an Old Indo-Aryan genitive, from which most of the oblique forms emerged, so that the literal meaning of si man kher would be “of me is the house”.
The second possibility is that it reflects the influence of the Persian construction, which also consisted of a logical subject in the accusative and the verb “to be”: ma rā (...) ast “I have”.

Unlike in Hindi, the adverbial system in Romani is also distinctly synthetic. Some of the adverbs were inherited from Old Indo-Aryan, especially the adverbs ending in -e and -al, e.g., upre “up”, upral “from above”; some are derived from adjectives with the suffix -e(s), e.g., lačho “good (masculine)” > lačhe(s) “well”; some are adverbial expressions merging prepositions with nouns, e.g., džibrše “next year”; and some are loanwords, e.g., tehara “tomorrow” (from Greek). Some phrases were derived from adverbs of Middle Indo-Aryan origin, such as sar “how” > “as, like”; upre “up” > upro “over”; paše “near” > paša “at, next to”, etc. Romani also forms the adverbial present participle with the suffix -indo/-ando (etc.), which originates from the Old Indo-Aryan present active participle ending in -ant, cf. OIA hasant- > Rom. asando “laughing”.

Most Romani conjunctions are loanwords, such as Slavic pa “so”, niti “neither”, a “and, while”, ali “but”, Romanian numa(j) “but”, or Turkish ama “but”. The Indo-Aryan conjunctions are thaj “and” and vaj “or”, inherited from Old Indo-Aryan (cf. OIA tathāpi “as well as” and vāpi “or”), followed by the more recent (go)doleske “therefore” and by translated ones such as maškaradava “however” and others. The particles va “yes”, na “no”, and ma “no” are also inherited from Old Indo-Aryan (cf. OIA evam “so”, na “no”, and mā “no”), while most of the remaining particles are borrowed, as are the exclamations.

4. Syntax

As far as syntax is concerned, Romani, unlike Hindi, is no longer an ergative (or partly ergative) language. Its syntax differs greatly from the syntax of the Indo-Aryan languages.
In other words, while Old Indo-Aryan, the Prakrits, and the New Indo-Aryan languages have S-O-V as their dominant word order, in Romani the dominant word order is S-V-O. This is the result of intense language contact, especially with Greek.

5. Lexicon

As mentioned above, the ancestors of the Roma left the Indian subcontinent and lived in Persia (before the Arab conquest in the 7th century), then in Armenia, in Byzantium (10th to 13th centuries), and finally in various parts of Europe and the world. Throughout this time, the Romani language was in intense contact with Persian, Armenian, Greek, and other European languages, which is best observed in its vocabulary. The Romani lexicon therefore consists of (at least) five layers: Indo-Aryan (the original Romani lexicon), Iranian (borrowed from Iranian languages), Armenian, Greek, and a lexical layer borrowed from various other European languages (Romance, Slavic, Germanic, Albanian, etc.).

Owing to the constant migration of the Roma and their struggle for survival among the majority peoples throughout history, the Romani language has lost much of its original Indo-Aryan vocabulary and has not had the opportunity to reintroduce the necessary Indo-Aryan vocabulary systematically, as the Indo-Aryan languages in India were able to do. Speakers compensate for the lack of this vocabulary, which is necessary for the systematic development of Romani as a language capable of communication in all spheres of modern life, through lexical borrowings from the languages of the area in which they live. Educated speakers nowadays attempt to compensate for it by borrowing words from Hindi, the official language of the Republic of India. However, lacking the word formation capacity to develop its own vocabulary, the Hindi language expands its vocabulary through borrowing from Sanskrit (or, historically under Muslim rulers, from Persian).
Therefore, expanding the Romani lexicon by borrowing words from Hindi is not the best solution, for several reasons. There are some difficulties in trying to adjust the Hindi lexicon phonologically to Romani. In addition, there are problems in choosing the lexicon because there are numerous Arabic and Persian loanwords in Hindi. There is no linguistic argument that Hindi should be the source of the new Romani lexicon, because Hindi is only one of the languages related to Romani, just like Punjabi, Gujarati, Bengali, and others, and not an ancestral or classical language that could provide the lexical material for its systematic development.

Nor can adequate linguistic material be provided by the European languages from which Romani speakers nowadays borrow the necessary vocabulary, depending on the linguistic area where they live, thus creating a mutual “lexical distance” between the speakers of different dialects. Adopting vocabulary from contact languages will not lead to the systematic development of a standard Romani; moreover, such a method of adoption would lead to an even greater division and differentiation of the Romani dialects, leaving the Roma without a unified standard language. Therefore, I endorse the views of Marcel Courthiade, who points out that the most appropriate source of the language material needed for the development of the Romani lexicon should be the dialects that still preserve their old lexical heritage. In these dialects, there are terms that have fallen out of use elsewhere but could be revived in everyday communication (interdialectal borrowing).

The systematic development of vocabulary also requires the creation of neologisms. These should be created from Romani material, to the extent permitted by the word formation patterns of Romani, through the consistent application of those patterns.
However, where there are semantic gaps and new words cannot be created using the existing patterns in Romani, lexical borrowing is inevitable. And if Romani aspires to a single common standard language, the most appropriate source of the lexical material, as in the case of other New Indo-Aryan languages, would be the ancestral classical language, Sanskrit, which preserves forgotten and hidden lexical treasures for Romani. Moreover, Romani is still very close to Sanskrit both phonologically and morphologically.