Abstract (English)
This thesis examines the influence of thematic roles as semantic factors on the formation of synthetic compounds in English and Croatian (e.g. firefighter, vatrogasac ‘firefighter’). Synthetic compounds represent a particularly fruitful area of research, primarily because their formation has been described as lying at the intersection of morphological, syntactic and semantic factors. The starting point of this dissertation is an overview of the most influential formal and functionalist models of synthetic compound formation (Lees 1963, Roeper and Siegel 1978, Botha 1980, Selkirk 1982, Lieber 1983, Grimshaw 1990, Oshita 1994, Ryder 1999, Ackema and Neeleman 2004, Booij 2010b, Gaeta 2010). These approaches are grouped into several main strands (approaches based on syntactic configuration, approaches based on argument structure, and approaches based on semantic factors), and the advantages and shortcomings of each are discussed in turn. One of the most commonly assumed properties of synthetic compounds is that their formation is governed primarily by syntactic factors. This assumption is typically expressed as a rule stating that synthetic compounds can only be formed by incorporating an element whose thematic role, such as Theme or Patient, functions as an internal argument of the verb (in the sense of Williams 1981a and 1981b). According to Roeper and Siegel (1978), this generalization explains why a compound like peace-maker is grammatical whereas a compound like *quick-maker is not. Different formalizations of this rule have been proposed in the literature, such as the First Sister Principle in Roeper and Siegel (1978), the Deep Structure Hypothesis in Botha (1980), the First Order Projection Condition in Selkirk (1982), and the Feature Percolation Conventions in Lieber (1983).
Given the existence of synthetic compounds like winter warmer and axe murderer, which clearly contradict this rule, the aim of this dissertation is to challenge the syntactocentric assumptions and to analyse the formation of synthetic compounds through the lens of semantic factors. This dissertation approaches the formation of synthetic compounds from the perspective of Construction Morphology (Booij 2005 and 2010b, Gaeta 2006, Gaeta 2010, Gaeta and Zeldes 2017, Gaeta and Angster 2018, Tsujimura and Davis 2018). This theoretical framework forms part of a wider network of Construction Grammar (CxG) approaches (Lakoff 1987, Fillmore et al. 1988, Goldberg 1995 and 2006, Kay and Fillmore 1999, Bergen and Chang 2005, Croft 2007, Boas and Sag 2012, Van Trijp et al. 2012) and usage-based approaches (Langacker 1988, Kemmer and Barlow 1999, Bybee 2006, Traugott 2008). Though different CxG approaches operationalize the term construction in various ways, constructions are typically defined as form-meaning pairings which can contain both lexically filled (specified) and lexically open (schematic) parts. Constructions can be fully lexically specified, as is the case with morphologically simple words like cat, mouse, jump; partially lexically specified, as in constructional idioms like to jog X’s memory; or completely schematic, like the Ditransitive construction ‘Subj V OBJ1 OBJ2’ (cf. Goldberg 1995 and 2006). These constructions of varying size and complexity are mutually connected into a network called the constructicon, which represents our entire knowledge of language (Goldberg 2003). From the perspective of Construction Morphology (CxM), all words are constructions at the word level, and synthetic compounds in English are generally assumed to be licensed by constructions in which at least one part is schematic (Booij 2010b).
By using corpus-linguistic and psycholinguistic methods, we are able to test this and other assumptions about the creation of synthetic compounds in English and Croatian. The fact that Construction Morphology and Construction Grammar form part of the usage-based paradigm is an additional argument in favour of using these research methods. Corpus-linguistic methods were used in this dissertation to analyse the productivity, schematicity and frequency of use of synthetic compounds in English and Croatian and the factors which might influence their formation. The corpus of English compounds was collected from the Daily Mail sub-corpus (comprising 23,192,074 tokens) of the English Broadsheet Newspapers 1993–2013 corpus available on Sketch Engine. The corpus of Croatian synthetic compounds was gathered from the Večernji list sub-corpus (49,237,340 tokens) of the HrWaC corpus (Ljubešić and Klubička 2014). Using language-specific CQL queries, a corpus of 18,720 English synthetic compounds and 16,520 Croatian synthetic compounds was collected. These compounds were further annotated for token and lexeme frequency, number of hapaxes (lexemes occurring only once in a corpus), the individual nouns, verbs and affixes they consist of, general frequency of use of verbs, frequency of use of verbs in synthetic compounds, number of different left constituents in compounds, and the thematic roles which they contain. Statistical analysis of the corpus data showed that the formation of synthetic compounds in English is a highly productive and schematic process, as reflected in a relatively high hapax-to-token ratio (0.110), a relatively high number of individual lexemes (3,825), a high number of different left constituents that a verb forms compounds with (4.98), and a relatively wide range of thematic roles assigned to the left constituents of compounds.
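The productivity measures named above can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis: the function names and the toy data are hypothetical, and reading the left-constituent figure as a mean number of distinct left constituents per verb is an assumption made here for illustration.

```python
from collections import Counter


def productivity_measures(tokens):
    """Count tokens, lexemes and hapaxes in a list of compound occurrences.

    `tokens` holds one entry per corpus occurrence of a compound lexeme.
    A hapax is a lexeme that occurs exactly once in the list.
    """
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "lexemes": len(counts),
        "hapaxes": n_hapaxes,
        "hapax_to_token": n_hapaxes / n_tokens if n_tokens else 0.0,
    }


def mean_left_constituents(pairs):
    """Mean number of distinct left constituents per verb.

    `pairs` is an iterable of (left_constituent, verb) tuples, one per lexeme.
    Assumption: the per-verb figure is averaged over all verbs.
    """
    by_verb = {}
    for left, verb in pairs:
        by_verb.setdefault(verb, set()).add(left)
    if not by_verb:
        return 0.0
    return sum(len(lefts) for lefts in by_verb.values()) / len(by_verb)


# Toy data, invented for illustration only.
toy_tokens = ["firefighter", "firefighter", "peacemaker",
              "winter warmer", "body warmer", "draft dodger"]
toy_pairs = [("fire", "fight"), ("peace", "make"), ("winter", "warm"),
             ("body", "warm"), ("draft", "dodge")]

measures = productivity_measures(toy_tokens)
mean_lefts = mean_left_constituents(toy_pairs)
```

On the toy data, four of the five lexemes occur once, so the hapax-to-token ratio is 4/6; real corpus values would of course depend on the annotation decisions described in the thesis.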
The analysis also revealed that both the frequency with which a verb is used to form synthetic compounds and the number of different left constituents it combines with are significantly correlated with the formation of compounds with unprototypical thematic roles. These results indicate that verbs which are more frequently used to form synthetic compounds and which form synthetic compounds with a wider range of different left constituents (nouns) are more likely to form compounds in which the left constituent is not an internal argument of the verb, thus challenging the syntactocentric assumptions of formal models of formation. Statistical analysis of the Croatian corpus data indicates that synthetic compounding is only a semi-productive and semi-schematic word-formation process, as reflected in a considerably lower hapax-to-token ratio (0.013), a lower number of individual lexemes (494), a lower number of different left constituents with which a verb forms compounds (2.15), and a relatively narrow range of thematic roles assigned to the left constituents of compounds. In contrast to the English data, the results for Croatian compounds show that neither the frequency of use of verbs in the formation of synthetic compounds nor the number of different left constituents used to form synthetic compounds with a particular verb is correlated with the formation of compounds with unprototypical thematic roles. This result is in line with the obtained values for productivity and schematicity, as it indicates that synthetic compounds in Croatian are formed by partially lexically specified word-formation patterns which depend on individual verbs. As a generalization of the corpus data collected for both languages, two continua of nominal synthetic compounds are proposed in the thesis.
The continuum for English synthetic compounds contains constructions at all levels of lexical specificity: lexically fully specified and idiomatic constructions (e.g. brainteaser), partially schematic constructions like [[X]N [[dodge]V er]N]N, which denotes ‘a person who avoids something literally/metaphorically’ (e.g. draft dodger, soap dodger), and a completely schematic construction proposed as a generalization for creating all semantically compositional and non-idiomatic compounds, [[X]N [[Y]V er]N]N ‘a person/thing performing an action Y which involves X’ (e.g. body warmer, winter warmer). Since the productivity and schematicity values showed that synthetic compounding in Croatian is only a partially schematic word-formation pattern, the continuum of constructions assumed for Croatian synthetic compounds contains only fully specified constructions (e.g. stihoklepac ‘versemonger’) and verb-specific partially specified constructions like [XNi- -o- gradiV- -telj]Nk denoting ‘a person/thingk which builds Xi’ (e.g. mostograditelj ‘bridge builder’ and cestograditelj ‘road builder’). The results of the corpus analysis were further tested using psycholinguistic methodology, specifically a lexical decision task experiment. The experiments for both languages (English and Croatian) were created and conducted using the IBEX Farm experimental platform (Drummond 2011). The experiments were conducted with native speakers of English and Croatian, respectively, and had the same 5 × 2 design, with thematic role type (5 levels – Patient, Theme, Goal, Instrument and Adjunct) and prime-target thematic role congruence (2 levels – congruent/incongruent) as factors. However, due to the low number of verbs which formed synthetic compounds with particular thematic roles, the Croatian version of the experiment also included corpus attestation of the verb as an additional third factor (2 levels – attested/unattested).
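The factorial design described above can be made concrete by enumerating its cells. The sketch below is only illustrative: the helper name `design_cells` is hypothetical, and treating the Croatian design as fully crossed (5 × 2 × 2) is an assumption, since the abstract does not state how the third factor was combined with the other two.

```python
from itertools import product

# Factor levels as listed in the abstract.
ROLES = ["Patient", "Theme", "Goal", "Instrument", "Adjunct"]
CONGRUENCE = ["congruent", "incongruent"]
ATTESTATION = ["attested", "unattested"]  # Croatian version only


def design_cells(*factors):
    """Return every combination of factor levels (a fully crossed design)."""
    return list(product(*factors))


english_cells = design_cells(ROLES, CONGRUENCE)
# Assumption: the third factor is fully crossed with the other two.
croatian_cells = design_cells(ROLES, CONGRUENCE, ATTESTATION)
```

Under these assumptions the English design has 10 cells and the Croatian design 20, which helps explain why a small pool of suitable verbs makes the Croatian design harder to fill.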
Both experiments measured the reaction time and accuracy of participants when assessing novel synthetic compounds as possible words in English and Croatian. In the English version of the experiment, target compounds were primed by sentences containing an existing synthetic compound with different thematic roles between compound constituents. The experiment contained 40 synthetic compounds (8 per thematic role) and 60 filler tasks which contained nonsensical combinations of words (e.g. adolescent broom) and non-words (e.g. zilmer chan) as target words. The Croatian version of the experiment used the same methodology, with minor modifications in prime type and number of tasks. In this version of the experiment, the prime sentence did not contain a synthetic compound; instead, the thematic relation between the elements of the prime sentence (verbs and complements/modifiers) was either congruent or incongruent with that of the target compound. The Croatian version of the experiment included only 30 synthetic compounds as targets (6 per thematic role) due to the lower overall number of synthetic compounds found in the Croatian corpus. Analysis of the experimental data revealed that thematic roles have a statistically significant and systematic effect on the processing of synthetic compounds in English, which is reflected in slower reaction times and lower acceptability ratings for less prototypical thematic roles (Instrument and Adjunct). The same effect was not established for synthetic compounds in Croatian, which indicates that semantic factors do not affect the processing of synthetic compounds in Croatian. These results were also confirmed by a mixed-effects model (Bates et al. 2015) which used thematic roles, thematic congruence, frequency of use of verbs, and the number of different left constituents as fixed effects and individual participants and tasks as random effects.
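One way to write down a model of the kind described above is given below. This is a sketch under stated assumptions, not the specification reported in the thesis: the abstract does not say whether the response was transformed, whether interactions were included, or whether random slopes were fitted, so the formula assumes main effects only and random intercepts for participants and tasks. All symbols are introduced here for illustration.

```latex
% y_{ij}: response (e.g. reaction time) of participant i on task j
% Role_j is a five-level factor, so \boldsymbol{\beta}_1 is a vector of contrast coefficients
\[
y_{ij} = \beta_0
       + \boldsymbol{\beta}_1^{\top}\,\mathrm{Role}_{j}
       + \beta_2\,\mathrm{Congruence}_{ij}
       + \beta_3\,\mathrm{VerbFreq}_{j}
       + \beta_4\,\mathrm{LeftConst}_{j}
       + u_i + w_j + \varepsilon_{ij},
\]
\[
u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad
w_j \sim \mathcal{N}(0, \sigma_w^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\]
```

Here the fixed effects correspond to the four predictors named in the abstract, while $u_i$ and $w_j$ are the by-participant and by-task random intercepts.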
Although the psycholinguistic data yielded different results for synthetic compounds in the two languages, both sets of results largely reflect the conclusions of the corpus analysis, which suggest two different word-formation patterns for the creation of synthetic compounds in English and Croatian. These results represent a strong argument in favour of usage-based models of language and support the assumption of these models that language capacity emerges as a result of linguistic experience and exposure to language, whereas structure and regularity within language stem from the frequency of use of particular linguistic patterns.