Introduction: According to evolutionary leadership theory, the leader-follower relationship is an adaptation that evolved to facilitate coordination during group activities (Van Vugt, Hogan, et al., 2008; Van Vugt & Ahuja, 2010). Important group activities in the era of human ancestors included foraging and group hunting, maintaining intra-group relations, and managing relations with other groups, which ranged from peaceful coexistence to warfare (Van Vugt, Hogan, et al., 2008). Success in these activities is hypothesized to have increased the fitness of both leaders and followers. According to the evolutionary mismatch hypothesis (Li et al., 2018), modern humans may still be sensitive to cues that signaled good leadership to human ancestors even though those cues are no longer related to the quality of leadership today. Because dominant and masculine ancestral leaders had greater chances of success in group activities such as hunting and fighting, modern humans are thought to be particularly sensitive to cues of these traits (Petersen & Laustsen, 2020). Furthermore, the human face has been shown to be a rich source of information in non-verbal communication (Little et al., 2011), and people form rapid, implicit impressions of personality traits based on facial appearance (Todorov, 2017). Consequently, many evolutionary studies have investigated preferences for leaders based on facial cues that signaled good leadership to human ancestors, and past research has focused largely on facial signals of dominance, often operationalized through experimental manipulations of facial masculinity. Faces perceived as leader-like are also perceived as more competent, attractive, and masculine (Little, 2014), and stronger men have been shown to attain higher ranks in companies (Lukaszewski et al., 2016); facial masculinity might therefore be a universal cue in leader preferences.
Furthermore, preferences for leaders with dominant and masculine faces have been shown to increase when the followers' group is under threat of physical conflict with other groups, which indicates that social conflict moderates preferences for masculine leaders (Banai et al., 2022; Laustsen & Petersen, 2015; Little et al., 2007); facial masculinity has also been shown to be a more important signal than biological sex (Ferguson et al., 2019; Spisak, Dekker, et al., 2012). While the perceived risk of social conflict may reflect current societal conditions, it may also be reflected in individual differences in political ideology, as conservatives are more likely to perceive the world as threatening and competitive (Duckitt & Sibley, 2010). Political ideology has been tested as a moderator of preferences for leaders with masculine faces, and there is substantial experimental evidence that conservative followers prefer masculine and dominant leaders (Banai et al., 2022; Laustsen, 2017; Laustsen & Petersen, 2015). Experimental research further indicates that followers prefer leaders whose stances are congruent with their facial appearance (e.g., masculine-looking leaders sending a conservative message; Spisak, Homan, et al., 2012), which might reflect an ally-finding mechanism (Pietraszewski et al., 2015). These preferences for masculine-looking leaders were obtained with experimental research designs, mostly mock elections in which participants chose a leader from computer-generated or manipulated face images. While consistent replications of these effects speak to their validity, it is unknown whether the effects are pronounced enough to affect real-world leader selection outcomes.
To address this issue, the current research tests evolutionary hypotheses about preferences for leaders' facial masculinity using a highly ecologically valid research design and real-world outcomes of leader selection. To this end, the research was conducted on a sample of political election results, facial photographs of real politicians, macro-level estimates of conflict and political ideology, and politicians' ideological positions, with facial masculinity determined by a novel method of algorithmic image processing. Research aims and hypotheses: The aim of the current study is to investigate the relationship between politicians' facial masculinity and real-world political election outcomes. H1: Since more masculine faces are perceived as more dominant, masculine-looking politicians are expected to be more successful in political elections. H2: Since contextual information has been shown to moderate the relationship between facial masculinity and followers' preferences, masculine-looking politicians are expected to be preferred in countries with ongoing conflict, and feminine-looking politicians in conflict-free countries. H3: Since conservative voters have been shown to see the world more often as a dangerous and threatening place, masculine-looking politicians are expected to be preferred in conservative countries, and feminine-looking politicians in liberal countries. H4: Since followers have been shown to prefer congruence between leaders' physical appearance and their stances, masculine-looking politicians are expected to be preferred when nominated by conservative parties, and feminine-looking politicians when nominated by liberal parties.
Preliminary study: The preliminary study was conducted to construct and validate the algorithmic method for assessing facial masculinity. The approach combines computer vision, to determine facial landmarks (standardized points that define anatomical features of a face), with predictive modeling, to establish an algorithmic function that is then used to estimate the facial masculinity of politicians' faces in the main research. Method: Sample of faces. In the preliminary study, the Chicago Face Database (CFD; Ma et al., 2015) was used as the sample of human faces. It consists of 586 standardized photographs of males and females of different ethnicities. Each photograph is accompanied by two additional sets of information that were used in this study. First, it comes with a set of anthropometric measurements: nose width, lip thickness, average eye height, average eye width, pupil-upper lip length (left and right), lower lip-chin length, eye shape (average eye height divided by average eye width), and facial width-to-height ratio. Second, the CFD contains subjective ratings of each face. Each face was rated by multiple naïve respondents on perceived age, gender, race, masculinity, femininity, babyfacedness, attractiveness, trustworthiness, and unusualness, as well as on the extent to which it expresses happiness, sadness, disgust, surprise, or fear. Estimating facial landmarks. Facial landmarks were estimated using Face++, a commercial computer vision software for facial recognition that has been used in previous psychological research (Kosinski, 2017). Its facial landmarks feature was used to map 83 facial landmarks onto the CFD face images. These landmarks cover face shape (19 landmarks), eye shape (10 landmarks per eye), eyebrows (8 landmarks per eyebrow), mouth (18 landmarks), and nose (10 landmarks).
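To make the 83-point layout concrete, the Python sketch below stores the group sizes reported above and derives an anthropometric measure as the Euclidean distance between two landmarks, the kind of computation needed to compare landmark output with the CFD's human measurements. The names `LANDMARK_GROUPS`, `group_slices`, and `distance` are hypothetical, and the contiguous index ranges derived from the group order are an assumption for illustration; they are not the actual point ordering returned by Face++.

```python
import numpy as np

# Group sizes as reported for the 83-point layout. The contiguous index
# ranges derived from this order are HYPOTHETICAL and do not reflect the
# actual ordering of points in the Face++ response.
LANDMARK_GROUPS = {
    "face_contour": 19,
    "left_eye": 10,
    "right_eye": 10,
    "left_eyebrow": 8,
    "right_eyebrow": 8,
    "mouth": 18,
    "nose": 10,
}


def group_slices(groups):
    """Map each group name to a slice of rows in an (n_landmarks, 2) array."""
    slices, start = {}, 0
    for name, size in groups.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


SLICES = group_slices(LANDMARK_GROUPS)
N_LANDMARKS = sum(LANDMARK_GROUPS.values())  # 19 + 2*10 + 2*8 + 18 + 10 = 83


def distance(landmarks, i, j):
    """Euclidean distance between landmark rows i and j of an (83, 2) array."""
    return float(np.linalg.norm(landmarks[i] - landmarks[j]))
```

For example, a nose-width measure would be `distance(points, a, b)` for the two landmarks at the left and right edges of the nose; the real indices `a` and `b` would have to be taken from the Face++ documentation rather than from the hypothetical ranges above.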
The facial recognition algorithm analyzes facial images and returns a data frame containing the x- and y-coordinates of each landmark for each image. These coordinates were estimated for each face in the CFD and used as inputs for further analysis. Results: Validation of algorithmic landmark accuracy. To validate the accuracy of the computer vision landmark estimation, correlations between computer-based and human-based anthropometric measures were calculated. First, Euclidean distances between computer-generated landmarks were calculated to reproduce the anthropometric measures that human coders provided for the faces in the CFD. Second, Pearson correlation coefficients were calculated between each pair of corresponding computer- and human-generated measures, for the total sample of faces and for each ethnic group separately. All correlations between human- and computer-generated measures were statistically significant, positive, and high, the lowest being 0.74. Furthermore, no systematic differences were found between correlations calculated for subsamples of different ethnicities, indicating that landmark estimation performs comparably well for faces of different ethnicities. These results indicate that the Face++ facial landmarks feature estimates facial landmarks comparably well to human participants. Validation of the algorithmic masculinity estimation method. In the first step of constructing the method for algorithmic facial masculinity estimation, geometric morphometric (GMM) analysis was applied. GMM is a multivariate method for analyzing two- or three-dimensional representations of shapes (Adams & Otárola-Castillo, 2013). Specifically, Generalized Procrustes analysis was conducted to standardize the face configurations for position, size, and rotation in two-dimensional space. The result of this analysis is a set of Procrustes coordinates suitable for multivariate statistical analyses.
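Generalized Procrustes analysis can be sketched in a few lines of NumPy. This is an illustrative re-implementation under stated assumptions, not the software used in the study: configurations are assumed to be stored as an `(n_faces, n_landmarks, 2)` array, and each one is centered, scaled to unit centroid size, and then iteratively rotated (via the SVD solution to the orthogonal Procrustes problem, with reflections disallowed) onto a consensus shape until that consensus stops changing. The function names are hypothetical.

```python
import numpy as np


def _center_and_scale(shape):
    """Translate a (k, 2) configuration to its centroid and scale to unit centroid size."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)


def _rotate_onto(shape, reference):
    """Rotate `shape` to best match `reference` (orthogonal Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    r = u @ vt
    if np.linalg.det(r) < 0:  # disallow reflections: flip the weakest axis
        u[:, -1] *= -1
        r = u @ vt
    return shape @ r


def generalized_procrustes(shapes, tol=1e-10, max_iter=100):
    """Align an (n, k, 2) array of landmark configurations.

    Returns the Procrustes coordinates (position, size and rotation removed)
    and the consensus (mean) shape.
    """
    aligned = np.array([_center_and_scale(s) for s in shapes])
    reference = aligned[0]
    for _ in range(max_iter):
        aligned = np.array([_rotate_onto(s, reference) for s in aligned])
        new_reference = _center_and_scale(aligned.mean(axis=0))
        converged = np.linalg.norm(new_reference - reference) < tol
        reference = new_reference
        if converged:
            break
    return aligned, reference
```

Flattening the returned `aligned` array to shape `(n_faces, 2 * n_landmarks)` gives a matrix of Procrustes coordinates in the form that standard multivariate routines, such as principal component analysis, expect.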
In the second step, principal component analysis was applied to the Procrustes coordinates to reduce the dimensionality of the data. Parallel analysis indicated that 14 components should be retained, and this solution explained 90% of the variance in the Procrustes coordinates. In the third step, linear discriminant analysis (LDA) was used to create a function for determining the masculinity index. The dataset of 586 faces was split into a training set used to train the predictive model (2/3 of the data) and a validation set used to evaluate the model's accuracy (1/3 of the data). Scores on the 14 components calculated in the previous step were used as predictors in the LDA to predict the biological sex of the person in the image. A total of five models were estimated to examine whether a model trained on faces of all ethnicities performs similarly to models trained on each ethnicity separately. Evaluated on the validation set, all models reached approximately 80% accuracy. This indicates that the models had sufficiently high accuracy and that the model built on the total sample of faces performed as well as the models built separately on different ethnic samples. This suggests that facial sexual dimorphism features are universal across ethnicities and that the model built on the overall sample is suitable for further analyses, since it does not capture less information than the ethnicity-specific models. Next, the masculinity index was calculated for each face in the CFD by applying the discriminant function from the LDA to the component scores representing facial shape. The masculinity index was validated by calculating its correlation with the subjective measures of facial masculinity and femininity provided in the CFD.
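The dimensionality reduction and discriminant steps can be sketched with scikit-learn, following the order described above: PCA on all faces, component retention by parallel analysis, then a 2/3 versus 1/3 split of the component scores for LDA. Several assumptions are made here. Parallel analysis is implemented as a permutation variant (each column shuffled independently, observed variances compared with the mean null variances), which is only one of several common variants; the study does not state which it used. Sex labels are assumed to be coded 1 = male, so that the LDA decision score grows with male-typical shape and can serve as the masculinity index. The function names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split


def parallel_analysis(X, n_iter=20, seed=0):
    """Number of components whose variance exceeds the mean variance of the
    same component after independently permuting every column of X."""
    rng = np.random.default_rng(seed)
    observed = PCA().fit(X).explained_variance_
    null = np.zeros_like(observed)
    for _ in range(n_iter):
        permuted = np.column_stack([rng.permutation(col) for col in X.T])
        null += PCA().fit(permuted).explained_variance_
    null /= n_iter
    below = observed <= null
    # keep components up to the first one that no longer beats the null
    n_keep = int(np.argmax(below)) if below.any() else len(observed)
    return max(n_keep, 1)


def fit_masculinity_model(X, is_male, seed=0):
    """X: flattened Procrustes coordinates (n_faces, 2 * n_landmarks);
    is_male: 0/1 labels. Returns fitted PCA, fitted LDA, hold-out accuracy."""
    n_comp = parallel_analysis(X, seed=seed)
    pca = PCA(n_components=n_comp).fit(X)
    scores = pca.transform(X)
    s_train, s_test, y_train, y_test = train_test_split(
        scores, is_male, test_size=1 / 3, stratify=is_male, random_state=seed)
    lda = LinearDiscriminantAnalysis().fit(s_train, y_train)
    return pca, lda, lda.score(s_test, y_test)


def masculinity_index(pca, lda, X):
    """Discriminant score; higher = more male-typical shape (label 1 = male)."""
    return lda.decision_function(pca.transform(X))
```

Because `decision_function` returns a continuous score rather than a class label, the same fitted `pca` and `lda` objects can score any face whose Procrustes coordinates were obtained in a comparable way, and the resulting index can be correlated with subjective ratings (for example with `np.corrcoef`).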
The correlation between the masculinity index and the subjective measure of facial masculinity was statistically significant, positive, and high, whereas the correlation between the masculinity index and subjective femininity was statistically significant, negative, and high. Conclusion: The preliminary study showed that a) the facial recognition software produces facial landmarks comparable to those produced by human participants, b) the LDA model predicted the biological sex of the person in an image with high accuracy, and c) the masculinity index calculated from the discriminant function showed a high positive relationship with the subjective measure of masculinity and a high negative relationship with the subjective measure of femininity. Main research: Method: Sample of political elections. Data on political election results were acquired from www.electionguide.org for the period from January 1st, 2009, to December 31st, 2019. The acquired data included the results of the two candidates who won the most votes (election winner vs. loser) in presidential and parliamentary elections held in countries where leaders are chosen via direct elections. Female candidates were excluded since they constituted less than 10% of the total sample. The final sample included the results of 407 elections and 814 candidates. Sample of politicians' facial images. Politicians' face images were obtained via the Google Images search engine, with three images obtained for each candidate. The final sample consisted of 772 politicians who competed in 195 presidential and 191 parliamentary elections across 155 countries. Facial masculinity. An R script was developed for the automated calculation of facial masculinity. First, images of politicians' faces were sent to the Face++ API, and the facial landmark coordinates were downloaded. Next, Generalized Procrustes analysis was applied to the facial landmark coordinates.
Then, the components estimated in the preliminary study were calculated from the Procrustes coordinates. Lastly, the discriminant function was applied to the component scores to calculate the index of facial masculinity. National level of conflict. The level of conflict in the countries in which the elections were held was estimated via the Global Peace Index (GPI). This measure, developed by the Institute for Economics and Peace, is calculated from 23 indicators of the presence of conflict, armament, or military presence in each country. Indices were collected for the year in which each country held an election in the sample. The GPI was collected for a total of 327 elections, providing electoral context for 654 candidates. National level of political ideology. National political ideology was operationalized via the political ideology item of the World Values Survey (WVS). For each election, the closest WVS wave was selected, and raw responses on political ideology were averaged to obtain a national measure. National-level ideology was estimated for a total of 173 elections. Candidates' political ideology. Candidates' political ideology was estimated using data from the Manifesto Project, which systematically analyzes political parties' manifestos and reports party-level political ideology. This information was obtained for 92 parliamentary elections. Control variables. Election type and incumbency were used as control variables in the regression models. Results: All hypotheses were tested using linear regression models with percentage of votes as the continuous dependent variable and binary logistic regression models with election outcome as the binary dependent variable. For each hypothesis, a model testing the hypothesis was first estimated and then re-estimated with control variables: incumbency as a covariate and election type as a moderator.
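The modeling strategy can be sketched with the statsmodels formula API: for one moderator, a linear model for vote share and a logistic model for winning, each fitted first with only the masculinity-by-moderator interaction and then with incumbency as a covariate and election type as an additional moderator, using standard errors clustered by election as the Results report. This is a minimal sketch assuming one row per candidate; every column name (`election_id`, `vote_share`, `won`, `masculinity`, `incumbent`, `election_type`, `conflict`) and both function names are hypothetical, and `simulate_elections` produces purely synthetic data for trying the code, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_moderation_models(df, moderator):
    """Fit base and controlled OLS/logit models for one moderator column
    (e.g. 'conflict' for H2); SEs are clustered on election_id."""
    cluster = {"cov_type": "cluster",
               "cov_kwds": {"groups": df["election_id"].to_numpy()}}
    base = f"masculinity * {moderator}"
    controls = f"masculinity * {moderator} * C(election_type) + incumbent"
    return {
        "ols_base": smf.ols(f"vote_share ~ {base}", data=df).fit(**cluster),
        "ols_controls": smf.ols(f"vote_share ~ {controls}", data=df).fit(**cluster),
        "logit_base": smf.logit(f"won ~ {base}", data=df).fit(disp=0, **cluster),
        "logit_controls": smf.logit(f"won ~ {controls}", data=df).fit(disp=0, **cluster),
    }


def simulate_elections(n_elections=200, seed=0):
    """Purely synthetic two-candidate elections for trying out the models."""
    rng = np.random.default_rng(seed)
    n = 2 * n_elections
    conflict = np.repeat(rng.uniform(1.0, 3.5, n_elections), 2)
    masculinity = rng.normal(size=n)
    incumbent = np.tile([1, 0], n_elections)
    # built-in masculinity x conflict effect, so the interaction is positive
    share = 50 + 3 * incumbent + 2 * masculinity * (conflict - 2) + rng.normal(0, 5, n)
    pair = share.reshape(n_elections, 2)
    won = (pair == pair.max(axis=1, keepdims=True)).astype(int).ravel()
    return pd.DataFrame({
        "election_id": np.repeat(np.arange(n_elections), 2),
        "vote_share": share,
        "won": won,
        "masculinity": masculinity,
        "conflict": conflict,
        "incumbent": incumbent,
        "election_type": np.repeat(
            rng.choice(["presidential", "parliamentary"], n_elections), 2),
    })
```

Passing a different column name as `moderator` (for instance a national or party ideology score) reuses the same specification for the other moderation hypotheses; the coefficient of interest in each case is the `masculinity:<moderator>` term.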
H1 was tested using politicians' facial masculinity as the predictor. H2 was tested using the interaction of facial masculinity and national-level conflict, H3 using the interaction of politicians' facial masculinity and national ideology, and H4 using the interaction of politicians' facial masculinity and their political party's ideology. All models were estimated with robust standard errors clustered at the election level. Voters did not show increased preferences for politicians with masculine faces in general, in either presidential or parliamentary elections, a result that does not support H1. Political candidates with masculine faces had a greater chance of winning parliamentary elections in countries with higher levels of conflict, a result that supports H2. However, this effect was not found for presidential elections, and candidates with feminine faces did not have greater chances of winning elections in countries with low levels of conflict. Furthermore, political candidates with masculine faces had greater chances of winning parliamentary elections in conservative countries, a finding that supports H3. This effect was not present in presidential elections, and feminine-looking candidates did not stand a better chance in liberal countries. Lastly, the candidate's political ideology did not moderate the relationship between facial masculinity and election outcome, a finding that does not support H4. Discussion: The results presented here are mostly in line with expectations derived from evolutionary leadership theory. The finding that facial masculinity is not a universally preferred trait may itself have a theoretical explanation. Leadership among human ancestors is thought to have been contextual and contingent, and facial masculinity does not seem to be a cue to universal leadership ability, such as perceived competence.
Furthermore, selecting a masculine and dominant leader might be a costly decision for followers if the leader uses the position of power to direct group resources toward themselves and their allies. The tests of H2 and H3, however, are in line with evolutionary expectations: masculine-looking politicians are favored in the presence of conflict and among conservative voters. These results give strong and ecologically valid support to prior experimental findings. However, it is unclear at this stage why these hypotheses were confirmed only for parliamentary elections, and further research should address potentially different decision-making mechanisms for different election types. Lastly, congruence between politicians' facial masculinity and their political ideology was not shown to be favored in elections. One explanation might be methodological: the sample for testing this hypothesis was by far the smallest in this study. A second explanation might be the pre-selection of candidates during intra-party elections, which is also a topic for future studies. The presented results contribute to applying evolutionary leadership theory to a real-world context. Voters' reliance on uninformative and shallow cues, such as physical appearance, might seem to threaten modern democracy, which is thought to work best when voters base their decisions on thorough and deliberate analysis of political candidates' traits and prospects. However, research on voting behavior might benefit from a better understanding of human nature and a more complete overview of hypothesized evolved preferences before labeling them as irrational.