
Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada

Zachary Yang [1,3], Anne Imouza, Maximilian Puelma Touzel, Cécile Amadoro, Gabrielle Desrosiers-Brisebois, Kellin Pelrine, Sacha Levy, Jean-François Godbout, Reihaneh Rabbany

[1] School of Computer Science, McGill University

[2] Political Science, McGill University

[3] Montreal Institute for Learning Algorithms

[4] Department of Political Science, University of Montreal

Abstract

Public health measures were among the most polarizing topics debated online during the COVID-19 pandemic. Much of the discussion surrounded specific events, such as when and which particular interventions came into practice. In this work, we develop and apply an approach to measure subnational and event-driven variation of partisan polarization and explore how these dynamics varied both across and within countries. We apply our measure to a dataset of over 50 million tweets posted during late 2020, a salient period of polarizing discourse in the early phase of the pandemic. In particular, we examine regional variations in both the United States and Canada, focusing on three specific health interventions: lockdowns, masks, and vaccines. We find that more politically conservative regions had higher levels of partisan polarization in both countries, especially in the US, where a strong negative correlation exists between regional vaccination rates and the degree of polarization in vaccine-related discussions. We then analyze the timing, context, and profile of spikes in polarization, linking them to specific events discussed on social media across different regions in both countries. These spikes typically last only a few days, suggesting that online discussions reflect and could even drive changes in public opinion, which in the context of pandemic response affects public health outcomes across different regions and over time.

Keywords: Polarization, COVID-19, Social Media, Computational Social Science

1 Introduction

Partisan polarization is increasingly prevalent in democracies around the world [1]. In the United States, the level of opposition between Democrats and Republicans has been steadily growing for decades [2, 3] and reached unprecedented heights during the 2020 presidential election [1, 4]. This polarization even affected how individuals responded to the COVID-19 pandemic by influencing their assessment of the dangers posed by the virus and their response to public health measures [5, 6, 7]. Several studies have now confirmed that supporters of the Democratic party were more likely to follow social distancing measures [8, 9, 10], wear masks [11, 12], and get vaccinated [13, 14, 15] when compared to their Republican counterparts. This polarizing trend among partisans is not limited to the United States [16, 17]. Other countries, like Canada, have also experienced the rapid politicization of pandemic responses, where researchers found that supporters of the Liberal Party were more likely to follow COVID-19 guidelines than supporters of the Conservative Party or the populist People’s Party [18, 19].

While countries like the United States and Canada adopted strategies to coordinate efforts in addressing the pandemic at the national level, there was still significant variation in both the amount and types of public health interventions introduced by subnational governments. For instance, Canadian provinces implemented different policies, ranging from comprehensive lockdowns and school closures to more targeted guidelines focusing on specific populations [20]. Similar variation can be observed in the United States, where some states implemented strict lockdown orders and mask mandates, while others declined to impose social distancing requirements [21]. Much like at the national level, these different regional policies also became rapidly politicized along partisan lines [17], especially on social media platforms, where the politicization of the COVID-19 pandemic largely unfolded [22, 23, 10, 24].

Numerous studies have now confirmed that online discussions surrounding the pandemic [25, 9, 26, 27] exhibited clear regional patterns characterized by the same partisan animosity that impacted the heterogeneous implementation of public health measures [25] and the resulting epidemiological outcomes [9]. Additional research highlights that these partisan divisions also contributed to the increasing polarization observed on social media [25, 27, 24]. The intense reactions from both supporters and opponents of public health measures [28] imply that public opinion could have been significantly influenced by local political dynamics and the geography of the pandemic [29]. This regional heterogeneity provides us with a unique opportunity to study polarization around specific events, topics, and regions, to understand how various factors affected compliance with COVID-19 guidelines. However, reliably measuring the polarization of public discourse at a finer-grained resolution is a challenge, even with the large quantity of human-generated text with extensive metadata available on digital platforms.

In this work, we propose a solution to this measurement problem by introducing a comprehensive approach to better understand the geographic and event-driven variation of online partisan polarization of COVID-19 discussions within American states and Canadian provinces. We examine variations in public discourse as contained on Twitter (now X) to determine how polarization is related to: (1) the ideological leanings of different regions; (2) the amount of conspiracy theory-related messages that users have been exposed to; and (3) vaccination data.

The paper is organized as follows. First, we briefly outline our approach to extracting region- and time-resolved polarization data from Twitter (X). In this section, we also describe our machine learning method to classify users as conservative or liberal and justify our choice of topic-conditioned language dissimilarity as a proxy for partisan polarization. Next, we present the results of applying our approach to a large-scale dataset we collected in 2020, filtered through three prominent pandemic-related public health interventions: lockdowns, masks, and vaccines. Our findings indicate that conservative regions in both countries exhibited higher polarization levels on these topics overall. We also find a strong negative correlation between vaccination rates in different U.S. regions and the level of polarization in their online discussions related to vaccines. We close with a discussion of the limitations of the approach and promising new areas of application.

2 Approach

Figure 1: Overview of the proposed method to estimate partisan polarization over date, region, and topic (top), as well as how to analyze this data by collapsing it over any of those three dimensions (bottom). We studied the topics of lockdowns, masks, and vaccines.

We developed a method to measure geographically-resolved partisan polarization over time from large-scale social media message datasets (see fig. 1). The language of political discussions across socio-demographic groups can vary significantly [30], each group having its own lexicon, so dissimilar language on its own does not imply polarized positions. However, when conditioning on discussion of the same contentious topic, the dissimilarity of the language used by different demographics is more likely to reflect alternative semantic understandings of that specific topic, which we assume are highly correlated with polarization. This correlation is weakened by linguistic differences not captured by the particular definition of semantic separation used, so the latter is only a noisy proxy of polarization strength. That said, we expect that a stronger correlation between semantic separation and polarization exists when language is represented in a more expressive model. Inspired by the demonstrated capacity of modern vector embeddings to represent the semantics of words, our approach focuses on transforming the sentences of social media posts using RoBERTa, a powerful open-source language-embedding model [31]. As an indicator for polarization, we then measure dissimilarity by how far apart the tweets of left- and right-leaning users are in this embedding space. In particular, we use the C-index [32], a robust clustering measure based on the average of pairwise distances of embedded partisan users within a partisan group relative to the average of the largest and smallest distances overall. To label partisans in our data, we developed and validated a machine learning method that identifies users as conservative or liberal on the ideological spectrum by cross-indexing multiple metadata sources. We also developed and validated a method to geolocate users to resolve polarization’s geographic heterogeneity. Details of these components of the approach can be found in the Methods section.
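To make the clustering step concrete, the following is a minimal sketch of the C-index in its standard textbook form, (S_w - S_min) / (S_max - S_min). It is an illustration under stated assumptions, not the implementation used in this work: the function name `c_index`, the Euclidean metric, and the toy two-dimensional vectors (standing in for RoBERTa user embeddings) are all hypothetical, and the exact variant used here is the one described in the Methods section.

```python
import math
from itertools import combinations


def c_index(points, labels):
    """Standard C-index: (S_w - S_min) / (S_max - S_min).

    S_w is the sum of distances over same-group pairs. S_min and S_max are
    the sums of the n_w smallest and n_w largest pairwise distances overall,
    where n_w is the number of same-group pairs.
    """
    dists, within = [], []
    for i, j in combinations(range(len(points)), 2):
        dists.append(math.dist(points[i], points[j]))
        within.append(labels[i] == labels[j])
    n_w = sum(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("need both same-group and cross-group pairs")
    s_w = sum(d for d, w in zip(dists, within) if w)
    ordered = sorted(dists)
    s_min = sum(ordered[:n_w])
    s_max = sum(ordered[-n_w:])
    if s_max == s_min:
        return 0.0  # all distances equal: no structure to measure
    return (s_w - s_min) / (s_max - s_min)


# Toy 2-D vectors standing in for user embeddings (illustrative only).
users = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = c_index(users, ["left", "left", "right", "right"])
mixed = c_index(users, ["left", "right", "left", "right"])
```

In this standard form, values near 0 indicate compact, well-separated groups and values near 1 indicate groups that overlap. How the index is oriented or rescaled into a polarization score is not shown in this sketch.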

2.1 Application to late 2020 pandemic discourse in the United States & Canada

We collected a large-scale dataset of COVID-19 political discussions on Twitter (X) occurring between October 9th, 2020, and January 4th, 2021, comprising about 46.6 million tweets linked to Canada and 12.5 million tweets linked to the United States. We geolocated users based on their provided location and classified them by their declared party affiliation. Specifically, we include identifiers for the two major liberal and conservative ideological divisions in each country: the left (Liberal Party, New Democratic Party, and Green Party) and the right (Conservative Party and People’s Party) for Canada; and the Democratic (left) and Republican (right) parties for the United States. Using official population census and election results, we verify that these data provide a politically balanced set of users in the different regions of these two countries. For each of the users, we then compute a vector representation for the language they used in their social media messages during this period. We condition on region, time, and topic and for each combination compute the value of the C-index as a proxy for polarization. To narrow the content of the messages analyzed, we focus on three specific topics of discussion: lockdowns, masks and vaccines. These topics were chosen because of their salience for polarized discourse around the pandemic [27, 33, 34], and because they span different types of interventions (group behaviour, individual behaviour, and medicine, respectively). Based on these measurements, we then compare the polarization observed in different American states and Canadian provinces over time for each of the three topics. We also look into how polarization is correlated with epidemiological data and conspiracy-related content. We refer the reader to the Methods section for further details.
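The conditioning on region, time, and topic described above can be pictured as a bucketing step that precedes any scoring. The sketch below is illustrative only: the keyword lists in `TOPIC_KEYWORDS`, the record fields, and the helper names are assumptions, since the actual topic filters are specified in the Methods section rather than here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical keyword lists; the actual topic filters are not given here.
TOPIC_KEYWORDS = {
    "lockdowns": ("lockdown", "stay-at-home"),
    "masks": ("mask",),
    "vaccines": ("vaccine", "vaccination"),
}


def topics_of(text):
    """Return the topics whose keywords appear in the tweet text."""
    lowered = text.lower()
    return [topic for topic, words in TOPIC_KEYWORDS.items()
            if any(word in lowered for word in words)]


def bucket_tweets(tweets):
    """Group tweets by (region, ISO year, ISO week, topic).

    Each tweet is a dict with 'region', 'day' (datetime.date), 'party'
    and 'text'. A tweet matching several topics lands in several buckets.
    """
    buckets = defaultdict(list)
    for tweet in tweets:
        iso = tweet["day"].isocalendar()
        for topic in topics_of(tweet["text"]):
            buckets[(tweet["region"], iso[0], iso[1], topic)].append(tweet)
    return dict(buckets)


# Invented sample records, for illustration only.
sample = [
    {"region": "Ohio", "day": date(2020, 11, 14), "party": "left",
     "text": "Wear a mask."},
    {"region": "Ohio", "day": date(2020, 11, 13), "party": "right",
     "text": "No mask mandates, no lockdown!"},
]
cells = bucket_tweets(sample)
```

Each bucket would then be passed to the polarization measure, giving one score per combination of region, week, and topic.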

3 Results

We organize the presentation of results as follows. First, we report the geographical trends of the observed partisan polarization in the United States and Canada and confirm that conservative states and provinces display more polarized online discourse. Next, we highlight the correlation between partisan polarization on the topic of vaccines and the vaccination rates found across U.S. states. We then present our event-based analysis of the temporal patterns of polarization at the national level in both countries and report correlations between polarization and vaccination data, as well as the volume of conspiracy-related content on Twitter (X). Finally, we examine the different peaks in polarization and explain how they relate to various polarizing events.

3.1 Regional Variation in Partisan Polarization

Our analysis begins by visualizing the geography of partisan polarization in fig. 2 and fig. 3 for the United States and Canada. The regional heterogeneity in the amount of polarization observed over different topics is apparent in both countries. We also see heterogeneity in the amount of conspiracy-related tweets shown in fig. 2d and fig. 3d for both countries, respectively.

Next, we analyze how this heterogeneity varies with the partisan leanings found in each region by analyzing election voting patterns (the 2020 presidential election in the case of the United States and the 2019 federal election for Canada). Our first observation is that conservative states and provinces show higher levels of polarization compared to their liberal counterparts. To display results over all covered regions, we show the polarization rankings for American states and Canadian provinces in fig. 4 and fig. 5, respectively. This ranking is applied separately to each of the three topics and is based on their weekly polarization averaged over the 12-week period. An additional fourth ranking labelled overall is shown and gives the average over the three topics. Each region name is colored on a gradient from blue to red based on the vote margin for the Republican party (US) or the conservative party family (Canada) in the most recent election in the corresponding country, such that predominantly conservative/Republican regions appear in red, predominantly liberal/Democratic regions in blue, and mixed or less definitive regions in purple. Referring to fig. 4, in the United States we can see clearly that conservative states are more polarized compared to liberal states overall and specifically on discussions related to masks and vaccines. The ranking is more mixed in the discussions about lockdown measures, with outliers from both liberal and conservative states: Idaho, Alabama, and Arkansas show the least polarization, while Delaware (ranked 1st), Colorado (ranked 17th), and New Jersey (ranked 15th) show higher values. The expected relationship between pandemic response and state partisanship is, however, still present for lockdown discussions, with liberal states such as Vermont (ranked 41st) and Massachusetts (ranked 43rd) displaying less polarization compared to more conservative states such as Mississippi (ranked 2nd), North Dakota (ranked 3rd), and Oklahoma (ranked 4th).
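The ranking procedure described above (weekly scores averaged over the study period for each topic, plus an overall ranking averaged over topics) can be sketched as follows. The function name, the input layout, and the numbers are hypothetical and chosen purely for illustration; they are not values from our data.

```python
from collections import defaultdict
from statistics import mean


def rank_regions(weekly_scores):
    """Rank regions per topic by mean weekly polarization, plus 'overall'.

    weekly_scores maps (region, topic) -> list of weekly scores.
    Returns {topic: [regions ordered from most to least polarized]}.
    """
    topic_means = defaultdict(dict)
    for (region, topic), scores in weekly_scores.items():
        topic_means[topic][region] = mean(scores)

    # 'overall' is the average of each region's per-topic means.
    regions = {region for region, _ in weekly_scores}
    overall = {
        region: mean(vals[region] for vals in topic_means.values()
                     if region in vals)
        for region in regions
    }

    rankings = {topic: sorted(vals, key=vals.get, reverse=True)
                for topic, vals in topic_means.items()}
    rankings["overall"] = sorted(overall, key=overall.get, reverse=True)
    return rankings


# Invented scores for two placeholder regions.
scores = {
    ("Region A", "masks"): [0.6, 0.8],
    ("Region B", "masks"): [0.2, 0.4],
    ("Region A", "vaccines"): [0.1, 0.3],
    ("Region B", "vaccines"): [0.5, 0.5],
}
rankings = rank_regions(scores)
```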

Looking now at Canada (fig. 5), we find that Alberta, a conservative province, shows higher polarization compared to Ontario and British Columbia (among the Canadian provinces with the largest number of social media users). Quebec is overall the highest ranked province. Although the pandemic was highly polarized in Quebec (e.g., with violent protests [35]), we acknowledge a limitation of our study: in Quebec the main language is French, whereas only English tweets were included in our analysis.

Finally, the correlation between polarization and the partisan vote margin is more clearly represented in the scatter plot of the two, shown for both countries in fig. 6(a). We see a strong and significant correlation between Republican vote share in the United States and the polarization index around masks and vaccines discourse. The remaining associations (lockdowns for the US, and all topics for Canada) are not statistically significant (for Canada this is in part due to the relatively small number of regions).
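A correlation of this kind, together with a rough check of its significance, can be sketched as below. This is a hedged illustration: the text does not state which significance test was used, so a permutation test is swapped in here as a simple, dependency-free stand-in, and the vote margins and polarization values are invented.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def permutation_p(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented values: vote margin (percentage points) and polarization score.
margins = [-20.0, -10.0, 0.0, 10.0, 20.0, 30.0]
polarization = [0.30, 0.35, 0.33, 0.42, 0.47, 0.50]
r = pearson_r(margins, polarization)
p = permutation_p(margins, polarization)
```

With only a handful of regions, as for the Canadian provinces, even a sizeable coefficient can fail such a test, which is consistent with the caveat about the small number of regions.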

The strong correlation that we observe between Republican vote share and the polarization score for vaccine discourse parallels the well-known negative correlation between Republican vote share and vaccination rates [36]. Because vaccines were not yet available during our study period, we present a comparison using official vaccination rates measured for different states by the U.S. Centers for Disease Control and Prevention for a similar period one year later, after the vaccines were rolled out (i.e., October 11th, 2021 to January 3rd, 2022). Averaging on a weekly basis, we confirm this correlation in fig. 6(b), where we observe that vaccine polarization is strongly negatively correlated with vaccination rates in the different American states. We did not observe a similar pattern in Canada, which we attribute to the small sample size and the implementation of vaccine mandates. While disentangling the causal relationships among conservative vote margin, polarization score, and vaccination rate is not possible here, the results suggest that polarized discourse may have played a role in shaping the highly heterogeneous vaccination rates across the U.S.

3.2 Temporal Variation in Partisan Polarization

We next focus on the temporal trends of daily partisan polarization at the national level for each topic and overall, as displayed in fig. 7 for the United States (left column) and Canada (right column). In these figures, the value of the metric fluctuates rapidly on the timescale of days. This is on the faster end of the range of timescales found in other topic tracking studies, e.g., [37]. These short timescales are consistent with our assumption that language adapts quickly in anticipation of, or as an immediate response to, specific events. In particular, we considered two kinds of events. First, we preselected political and vaccine-related events (shown in table 1(a) and as vertical lines in fig. 7). These provide the scaffold for the socio-political trajectory of each country related to political discourse and pandemic response. Second, we detected highly polarized events through analysis of the highest two peaks in polarization (shown in table 1(b) and as red circles in fig. 7). We also show in the figures the tweet volume as a relative indicator of day-by-day reliability of the estimation of the polarization score. While we do not have direct causal evidence linking a highlighted peak in the polarization score to a specific event, we do find that many of the largest polarization peaks occur around highly contentious events related to each country’s specific context (table 1(b)). In the following two sections, we summarize these events and discuss how they relate to the topic for which the polarization simultaneously peaks.
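Selecting the two highest peaks of a daily series can be sketched as follows. The exact peak-detection rule is not specified in the text, so the local-maximum criterion below, the function name `top_peaks`, and the example series are assumptions made for illustration.

```python
def top_peaks(series, k=2):
    """Return the k highest local maxima as (index, value), highest first.

    A point counts as a local maximum if it is strictly greater than its
    left neighbour and at least as large as its right neighbour; endpoints
    are compared with their single neighbour only.
    """
    n = len(series)
    peaks = []
    for i, value in enumerate(series):
        left_ok = i == 0 or value > series[i - 1]
        right_ok = i == n - 1 or value >= series[i + 1]
        if left_ok and right_ok:
            peaks.append((i, value))
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    return peaks[:k]


# Invented daily polarization values.
daily = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4, 0.6]
highest_two = top_peaks(daily)
```

The returned indices can then be mapped back to calendar dates and compared against the most retweeted content on those days.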

(a) Major political and pandemic-related events in each country, such as when the FDA (U.S. Food & Drug Administration) and PHAC (Public Health Agency of Canada) approved vaccines.

Date	Event	Country
Nov. 3	US National Election	US
Oct. 24	BC General Election	Canada
Oct. 26	Saskatchewan General Election	Canada
Dec. 8	States resolve controversies	US
Dec. 9	PHAC approves Pfizer vaccine	Canada
Dec. 11	FDA approves vaccines	US
	Electoral votes submitted	US
Dec. 14	Vaccination begins	Canada
Dec. 20	Moderna vaccine distributed	US
	Election votes arrive	US
Dec. 23	PHAC approves Moderna Vaccine	Canada

(b) Polarization peaks and their corresponding events. For each topic, we analyzed the two highest peaks, and inferred the content discussed on those peaks.

	United States		Canada
Topic	Date	Polarizing Event	Date	Polarizing Event
Lockdown	Nov. 1	Viral tweet by Trump	Oct. 17	Toronto Mask Measures Protest
Lockdown	Nov. 21	Unidentified topic	Oct. 29	Calgary Mask Measures Protest
Masks	Oct. 21	Viral tweet by Trump	Oct. 12	Unidentified topic
Masks	Nov. 14	Biden proposes mandates	Nov. 14	PHAC recommends masks
Vaccines	Dec. 20	Moderna vaccine distributed	Dec. 20	Moderna distributed in U.S.
Vaccines	Dec. 22	Biden gets vaccinated	Dec. 23	PHAC approves Moderna

Table 1: Major dates and peaks within the United States and Canada during 2020.

United States Polarization Timecourse

The left column of fig. 7 reports the daily polarization measured for the three key topics of lockdowns, masks, and vaccines. Peaks in polarization on the lockdown topic in fig. 7(a) may correspond to partisan differences in public support (or discontent) and discourse surrounding COVID-19 measures. In the days leading up to the 2020 Presidential Election on November 3rd, a pillar of President Trump’s campaign messaging on the pandemic characterized lockdowns as tyranny and economic repression [38]. For example, on November 1st, 2020, the date of the second-largest peak, Trump made a highly controversial claim by stating that the election was a choice between implementing deadly lockdown measures supported by Biden or an efficient end to the COVID-19 crisis with a safe vaccine [39]. Trump also made other similar claims on Twitter (X) during this period, e.g., when he said (sic): “Biden wants to LOCK DOWN our Country, maybe for years. Crazy! There will be NO LOCKDOWNS. The great American Comeback is underway!!!” [38].

Next, contentious debates related to masks were found coincident with peaks in polarization, as shown in fig. 7(c). For example, while we did not find an event external to social media on October 31st, 2020, the date of the highest peak, we did find that the most retweeted tweet by Democrats on that day was “RT @JoeBiden: Be a patriot. Wear a mask.”. This, in turn, generated strong responses that day from Republicans, with the third most retweeted tweet within this group being: “RT @RealBrysonGray: There’s literally nothing patriotic about being so scared of a virus with a 99.9…”. This is possibly an example of polarization driven by an influencer’s post rather than by a real-world event. Another set of divisive messages was observed on November 14th, 2020, the next highest peak, after President-elect Biden proposed mask mandates and South Dakota Governor Kristi Noem announced her opposition to this measure; nearly half of all the top retweets referred to Noem’s statement.

Finally, fig. 7(e) reports trends in polarization around the topic of vaccines. Here, some of the peaks observed are simultaneous with important events surrounding COVID-19 vaccine efficacy. We see that the second-largest peak occurred on December 22nd, 2020, a day after Biden received his first COVID-19 vaccine shot [40]. The most retweeted tweet for both partisan groups that day was “RT @JoeBiden: Today, I received the COVID-19 vaccine. To the scientists and researchers who worked tirelessly to make this possible - than…”. However, while supporters of Biden congratulated him, by tweeting messages like “RT @YAFBiden: And just like that, @JoeBiden has received the COVID-19 vaccine!”, opponents instead promoted pro-Trump messages, e.g. “RT @TheLeoTerrell: Finally a @JoeBiden confession. He finally gave credit to @realDonaldTrump and #OperationWarpSpeed. It’s about time.”.

Canadian Polarization Timecourse

The right column of fig. 7 shows the daily polarization measured for the three key topics of lockdowns, masks, and vaccines. The pre-selected events in the Canadian timeline (table 1(a)) are marked as vertical lines in the figure.

In fig. 7(b), on the topic of lockdowns, we observe the highest peak on October 17th, 2020, which coincides with the Toronto anti-mask protest, a large demonstration where thousands of protesters rallied against COVID-19 lockdown measures. The second-highest peak is observed on November 29th, 2020, when the national news reported a Calgary Mask Measures protest on the preceding day [41].

The highest polarization peak on mask-related tweets is found on November 14th, 2020, as seen in fig. 7(d), coincident with the peak in the U.S. plot, fig. 7(c), mentioned earlier. This event was discussed by conservative party family users, suggesting that partisan discourse in the U.S. might be driving some polarization in Canada.

Looking at the polarization of discussions about vaccines in fig. 7(f), we also observe that the highest peaks are in response to key vaccine-related events: the two highest peaks in polarization, observed on December 20th and 23rd, 2020, coincided with the distribution of the Moderna COVID-19 vaccines in the U.S. and Health Canada’s approval of the Moderna vaccine [42], respectively. While the former is an event associated with the United States, it led to discussions in Canada about vaccine prioritization and availability [43]. On December 20th, top retweets by liberal party family users focused on news of Republican politicians being first in line for the vaccine, while conservative party family users retweeted more diverse anti-vaccine sentiment. On December 23rd, 2020, the top retweets were strong sentiments in support of and in opposition to the approval.

Aggregate Polarization

To complement the granular analysis presented above, we also evaluated the measure’s overall responsiveness to polarizing events. In particular, we computed an average of the polarization score over topics (shown in fig. 8a,b) and then performed event-triggered averaging around such events, to show how the metric varies in time before and after these particular dates on average. This aggregate result (shown in fig. 8c) confirms a fast (on the order of days) and largely symmetric rise-and-decay profile around these polarization peaks.
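Event-triggered averaging, as used above, aligns a fixed window of the series on each event date and averages across events. The following is a minimal sketch under stated assumptions: the function name, the window half-width, and the toy series are hypothetical, and events whose window would run past either end of the series are simply skipped.

```python
def event_triggered_average(series, event_indices, half_window=3):
    """Average the series in windows centred on each event index.

    Returns a list of length 2 * half_window + 1, in which position
    half_window corresponds to the event day. Events too close to either
    end of the series to fit a full window are skipped.
    """
    width = 2 * half_window + 1
    windows = [
        series[i - half_window:i + half_window + 1]
        for i in event_indices
        if i - half_window >= 0 and i + half_window < len(series)
    ]
    if not windows:
        raise ValueError("no event has a complete window")
    return [sum(w[k] for w in windows) / len(windows) for k in range(width)]


# Toy series with two isolated spikes at indices 1 and 4.
toy = [0, 1, 0, 0, 3, 0, 0]
profile = event_triggered_average(toy, [1, 4], half_window=1)
```

A rise-and-decay profile that is symmetric around the centre of the returned list corresponds to the behaviour reported for fig. 8c.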

3.3 The Relationship between Conspiracy and Polarization

Finally, we explored the relationship between conspiracy discourse and polarization using our time-resolved measurements of the number of conspiracy-related tweets. In particular, we compared it with the time course of the aggregate daily polarization presented in the previous section. The profiles broken down by progressive and conservative partisanship for the U.S. and Canada are shown in fig. 9(c) and fig. 9(a), respectively. For both countries, conservative partisans tweet conspiracy-related content in higher numbers than progressive partisans. The correlation with aggregate daily polarization for the U.S. and Canada (fig. 8(a) and fig. 8(b)) is shown in fig. 9(b) and fig. 9(d), respectively. For the U.S., we find a small but significant negative correlation with polarization.

4 Discussion

This article investigated regional and event-triggered variation in partisan division within social media debates surrounding the introduction of COVID-19 public health measures across American states and Canadian provinces. Our computational analysis was centered around quantifying partisan polarization by analyzing the language used in millions of online messages from users affiliated with different political parties. In particular, we focused on Twitter (X) discussions related to three key public health interventions during the early phases of the COVID-19 pandemic: lockdowns, masks, and vaccines, as well as tracking the volume of conspiracy-related tweets. Our analysis explored the geographic heterogeneity of polarization and identified political events that likely influenced public opinion over time.

Like several other studies before us [e.g., 44, 17, 45, 46], we found that more right-leaning states and provinces exhibited greater partisan divisions around COVID-19 on Twitter (X), in particular concerning topics of mask mandates and vaccine distribution. However, we went beyond these studies to characterize the geographic heterogeneity and time course of polarization, relating features in our polarization metric to real-world events. We looked into the relationship between polarization and public health initiatives in the U.S. and confirmed a strong negative correlation between partisan polarization and future vaccination rates and a moderate negative correlation between the temporal profiles of volume of conspiracy-related tweets and aggregate polarization. We did not observe similar patterns in Canada.

4.1 United States

The early phases of the COVID-19 pandemic in the United States prompted a variety of public discussions online that reflected strong regional variation of partisan support [25, 47, 48, 49, 50]. State-specific polarization obtained from our computational approach could be expected to be uniformly low, with the message content of conservative and liberal states each having internally homogeneous semantics. Instead, our analysis confirmed that polarization was notably higher in conservative states, where we found that Republican vote margins had a significant positive effect on polarization in discussions concerning masks and vaccines, after controlling for other factors.

This main result is consistent with previous studies that suggest conservative states exhibited higher levels of polarization in response to public health interventions compared to their liberal counterparts [10, 48]. Nevertheless, we found that this pattern does not hold across every state. Delaware, a liberal state, exhibited a distinctly high level of polarization, likely due to the strict public health measures implemented by the governor in response to the rapid increase in the number of cases during the first wave of the pandemic [51, 52, 53]. Similarly, low levels of polarization were observed in several conservative states, like Arkansas, Alabama, and Idaho, with a possible explanation coming from more unified opposition to restrictive COVID-19 measures like mask mandate orders [54]. Nevertheless, most conservative states exhibited relatively high levels of polarization (fig. 6(a)).

Our analysis did not identify a single, general cause for this relationship. That said, commonalities in each state’s trajectory in the pandemic offer some clues: e.g., Mississippi, North Dakota, and Oklahoma experienced specific political decisions, such as mask mandates, made in the earlier phase of the pandemic, that led to resistance despite rising COVID-19 cases [55, 56, 47]. In Mississippi, the decision by the governor to lift the statewide mask mandate in late September could also have contributed to heightened levels of polarization [55, 54]. A similar pattern was observed in North Dakota, South Dakota, and Oklahoma, where initial hesitancy to enforce mask mandates also appears to have led to increased partisan divisions [57, 58, 59, 60, 61, 62, 54]. This finding is not surprising, since the party affiliation of governors is the most important predictor of the widespread adoption of mask mandates [63]. One possible explanation for our main result that can be gleaned from these anecdotes is that pandemic severity increasingly strains the more uniform opposition to restrictive health measures in more conservative states, leading them to exhibit higher levels of polarization [64, 65, 54]. This source of polarization is distinct from that in states with a more equal distribution of partisans across competitive districts.

Through our approach, we could also dissect how polarization varies in time over the three topics we considered: lockdowns, masks, and vaccines. These three topics exhibited similar baseline levels of polarization during the period of study, which was between the second and third waves of the pandemic, punctuated by large positive deviations that typically rise and fall quickly. The prevalence of these deviations was smallest for the lockdown topic. Its low correlation with Republican party vote share suggests that it did not act as a meaningful indicator of partisan opposition. The polarization time course for masks and vaccines, however, contained many sharp peaks, many of which we were able to identify with a real-world event. For example, South Dakota initially experienced very high levels of mask polarization, coinciding with efforts by medical authorities to promote mask-wearing, despite Governor Kristi Noem’s opposition [54, 66]. Her public display of opposition to Biden’s suggestion of mask mandates led to one such peak in polarization. Similarly, North Dakota also displayed early signs of increased polarization on conspiracy theories [47], which may have been exacerbated by the posthumous electoral win of a Republican candidate who died from COVID-19 [67, 68, 69].

The strong negative correlation between vaccine polarization and vaccination rates that we observed in the U.S. indicates that states with higher vaccination rates were also less polarized around this issue [70]. The exact origin of this correlation is unclear. However, factors such as education and political ideology, which also have a strong geographic dependence, likely played a role [71]. Indeed, higher education levels are generally associated with greater vaccine acceptance and trust in vaccine safety [72]. Moreover, Democrats tend to trust the COVID-19 vaccines more and have been early adopters, whereas Republicans generally show lower levels of such trust [73].

Overall, the high levels of polarization observed in the United States relative to Canada point to a more divided society. Several studies have confirmed that conservative states and counties were less likely to adopt social distancing measures, impose mask mandates, and get vaccinated in the second and third wave of the pandemic. Our study offers new insights into these trends by demonstrating that they correlate with regional heterogeneity in social media discourse, particularly during salient political events around health measures. We also found that this discourse reflects changes in the pandemic timeline, initially related to stay-at-home lockdown orders, followed by mask mandates, and later transitioning to vaccines as they first became available.

4.2 Canada

The Canadian results also suggest that partisan divisions influenced public responses to COVID-19 measures in this country [74]. Compared to the U.S., we found a similar, albeit much weaker, association between polarization and conservative political leaning, with conservative provinces like Alberta and Saskatchewan experiencing higher levels of polarization during stricter lockdown measures than their more liberal counterparts, such as British Columbia and Ontario [75, 35]. Polarization levels varied over smaller and medium-sized provinces as well, measured as relatively high for New Brunswick and low for Nunavut, where the rates of COVID-19 infections remained relatively low during the pandemic (the sole COVID-free jurisdiction in North America until November 2020) [76, 77]. We also found that polarization surrounding mask mandates and vaccines was not homogeneously distributed across provinces.

Among the Canadian provinces, Quebec is an interesting case for our analysis of polarization. For example, our results confirmed that Quebec had the highest level of partisan division over vaccines, but also the highest reported incidence of COVID-19 in Canada during the first and second waves of the pandemic [78]. Quebec’s unique approach to managing the pandemic with its more restrictive measures relative to other provinces is also somewhat reflected in our results. After a relative hiatus with several restrictions relaxed in the summer of 2020, Quebec once again became the epicenter of the pandemic in the fall [79]. This resurgence led to the reinstatement of strict pandemic control measures and a ban on public demonstrations following significant anti-government protests against lockdowns and mask mandates [80, 81]. These events also coincided with an increase in online conflicts, promoted by Canadian far-right populist rhetoric and conspiracy theories on Twitter (X) [82]. It is important to note, however, that most of these conversations in Canada were heavily influenced by discussions in the US, with Canadians retweeting American vaccine-related content 8 times as often as Canadian content during the period covered by our study [83, 84]. Likewise, vaccine hesitancy was also linked to political affiliation in Canada, with those supporting the Conservative Party more likely to refuse vaccination [85].

As in the U.S. case, the polarization time course computed for Canada also exhibits spikes observed around key events like protests against lockdown measures, mask mandates, and vaccine roll-outs. These findings suggest that public reactions to significant political and social events during the pandemic are reflected in the measure of polarization we use. We did not observe the negative correlation between polarization and volume of conspiracy-related tweets that we saw in the U.S. case. This contrasts with [83], who found a reduction in negative sentiment in Canadian vaccine-related tweets between January and December 2020. The relationship between polarization and sentiment is complex and long-term trends are likely driven by processes besides pure volume of discussion around conspiracy theories [86].

Finally, the relatively lower influence of polarization on vaccine attitudes may be attributed to the country’s more widespread vaccine mandates [87, 88]. This prevalence, along with higher levels of trust in politicians [89] and social capital [90, 91], could have contributed to a broader acceptance of COVID-19 health interventions [84, 85]. Indeed, there was a rare ‘cross-partisan consensus’ among Canadians regarding emergency measures in the early stages of the pandemic [92]. This consensus, however, was not mirrored on social media, where conspiracy theories widely circulated [24, 84]. Overall, our results indicate that online discussions surrounding lockdowns, masks, and vaccines did mirror polarization, and were shaped by regional reactions to events and circumstances specific to Canadian provinces.

4.3 Limitations

While our method offers valuable insights, it comes with certain limitations. First, we viewed partisan polarization only through the proxy of semantic similarity. This choice may in certain cases obscure some signals not captured by the semantic embedding representation. Second, specifically in the Canadian context, we categorized users into liberal (left) and conservative (right) party family groups. During the manual annotation of Twitter (X) profiles, we encountered few users who identified as supporters of the Bloc Quebecois political party; therefore, we opted to exclude them from the analysis. Additionally, our classification of users into liberal and conservative partisan groups is based on self-reported information, which may not be entirely accurate. Third, it is important to note that our analysis is based on Twitter (X) data, which may not fully capture the views and sentiments of the broader American and Canadian public. Fourth, our analysis is restricted to tweets in English. In the context of Canada, this means we are capturing only or primarily the perspectives of either anglophones or bilingual francophones, which could potentially bias our data; for example, the high levels of polarization observed in Quebec on COVID-19 measures may be influenced by this language bias. Finally, while several of our analyses rely on correlations, it is crucial to remember that these results do not imply causation; the relationship between polarization and public health measures is complex and multi-dimensional.

4.4 Conclusion

To conclude, our method has provided valuable insights into the dynamics of partisan polarization during the COVID-19 pandemic. Political ideology, public trust, and key events have emerged as important factors influencing public discussions on pandemic-related issues in the United States and Canada. By combining our polarization measure with other data, researchers and practitioners can better understand how polarization varies across location, time, and specific issues. This knowledge could help in detecting particularly polarizing discussions on social media and in developing communication strategies to mitigate the spread of misinformation, both for the current pandemic and for future health-related crises.

The differences observed between these two countries are somewhat harder to explain. Our analysis, along with insights from recent studies, suggests that Canadian responses to public health measures could explain the lower levels of polarization found in Canada. Indeed, there was a significant consensus on the effectiveness of stay-at-home orders (i.e., lockdowns), mask mandates, and vaccines not only at the federal and provincial levels, but also within the news media. And unlike the U.S., where a substantial number of Republican leaders aligned with Trump’s anti-mask and anti-lockdown positions, the pandemic did not become a salient partisan issue within a political campaign until much later in 2021. Prior to this, the opposition to public health measures in Canada was primarily found in online communities, outside of the mainstream media and political parties, where protesters remained heavily influenced by American sources. Although our results suggest that social networks contributed to the diffusion of these opinions during the COVID-19 pandemic, more work needs to be done to quantify the impact of interactions within online communities on polarization.

5 Methodology

In the following section, we describe in detail our text-based measurement of partisan polarization. We first explain the data collection process. We then show how we classified tweets into respective topics, geo-located users and grouped them by party affiliation. Finally, we describe the equation used to measure partisan polarization as well as our approximation algorithm. Figure 1 provides a visual overview of our process in measuring partisan polarization. For additional details, please refer to Section 8 in the Supplementary Material.

5.1 Data Collection

5.1.1 Twitter (X) Data

We used Twitter’s (X) official API to collect 1% of real-time tweets for Canada and the United States from October 9th, 2020 to January 4th, 2021. This represents 231,841,790 tweets and 4,765,115 users for Canada (a dataset filtered for COVID and politics) and 387,090,097 tweets and 23,758,112 users for the United States (a dataset filtered for election politics). We supplied the following list of keywords to the API to filter relevant tweets:

Canada: ‘trudeau’, ‘legault’, ‘doug ford’, ‘pallister’, ‘horgan’, ‘scott moe’, ‘jason kenney’, ‘dwight ball’, ‘blaine higgs’, ‘stephan mcneil’, ‘cdnpoli’, ‘canpol’, ‘cdnmedia’, ‘mcga’, ‘covidcanada’ and all combinations of ‘covid’ or ‘coronavirus’ as the prefix and the (full and abbreviated) name of each province and territory as the suffix.

United States: ‘JoeBiden’, ‘DonaldTrump’, ‘Biden’, ‘Trump’, ‘vote’, ‘election’, ‘2020Elections’, ‘Elections2020’, ‘PresidentElectJoe’, ‘MAGA’, ‘BidenHaris2020’, ‘Election2020’.

5.2 COVID-19 Vaccination Rate

Similar to the COVID-19 pandemic data, we also used the officially reported vaccination rate of the populations. We used the vaccination rates one year later compared to the Twitter (X) data, as COVID-19 vaccines were created and approved at the very end of our data collection process. Thus, the vaccination rates are for those who obtained at least two doses. For Canada, this is the $numtotal\_fully$ from the government’s vaccine coverage map. We normalize this column by Canada’s 2021 population per province or territory. For the United States, we use the $people\_fully\_vaccinated\_per\_hundred$ reported in the COVID Data Tracker from the CDC.

5.3 Classifying Tweets By Topics

For this study, we looked into three key topics for COVID-19: lockdowns, masks and vaccines. We also looked into conspiracy theories. For each topic, tweets were classified as relevant or irrelevant to the topic based on whether they contained at least one of the topic-specific keywords. For conspiracy-related tweets, relevant means that the content is related to COVID-19 conspiracy theories (either supporting or opposing). A tweet can belong to more than one topic.

We first used a hashtag-based filtering step. We extracted all hashtags within our dataset, ordered them by frequency, and discarded those that appeared fewer than 100 times. This filtered list contained 3,600 hashtags for Canada and 18,000 for the United States. Two political scientists manually annotated this list with topic and relevance labels. The list was narrowed to only those hashtags labeled as relevant, resulting in 631 relevant hashtags. We then merged these with hashtags identified in previous studies for the same topics (i.e., refs. [93, 94, 95]).
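The frequency-based pruning step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `frequent_hashtags`, the regular expression used to detect hashtags, and the case-folding of hashtags are all assumptions made for the example; only the threshold of 100 occurrences comes from the text.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def frequent_hashtags(tweets, min_count=100):
    """Return (hashtag, count) pairs ordered by frequency,
    discarding hashtags seen fewer than `min_count` times."""
    counts = Counter(
        tag.lower()  # assumption: hashtags are compared case-insensitively
        for text in tweets
        for tag in HASHTAG.findall(text)
    )
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    toy_tweets = ["Wear a #Mask please"] * 3 + ["Get the #vaccine"]
    # A low threshold is used here only because the toy corpus is tiny.
    print(frequent_hashtags(toy_tweets, min_count=2))
```

The surviving list is what would then be handed to human annotators for topic and relevance labels.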

For Canada, this process resulted in 46,636,206 tweets and 1,757,675 users that shared content related to COVID-19. For the United States, this represents 12,552,213 tweets and 2,657,355 users. Starting from the RoBERTa-base model [31] from HuggingFace, we further pre-trained it on the respective COVID-19 tweets from each country dataset, i.e., with self-supervised learning that predicts masked words within tweets. This results in two different country-specific pre-trained language models for COVID-19 tweets.

We then randomly sampled 200 relevant and 200 irrelevant tweets per topic from each dataset, for a total of 1,600 tweets. The same two political scientists manually reviewed each tweet separately to determine if the tweet was relevant or irrelevant. We discarded tweets where the annotators could not reach a consensus. We then trained the respective pre-trained RoBERTa-base model on each dataset to classify by topic, i.e., 4 topic language models per country, for a total of 8 language models. We report the support, Cohen's kappa, F1-score and number of tweets we extracted for each topic within each dataset in Table 4. Our classifiers achieved a near-perfect F1-score for each of these topics.

5.4 Classifying Users by Geo-Location

We wanted to quantify users in each province or state represented in our data, as the users retrieved from Twitter (X) could be imbalanced relative to region population size. For this, we geolocated all users with an explicit location provided in the location field, a free-form text, as part of their profile information. We processed this information with Open Street Map and the ArcGIS API. Both of these return a latitude and longitude if a location was found. We considered a geolocation to be clear if both APIs responded with a latitude and longitude that were within one degree of each other. We found this to be more accurate than, and preferable to, using a pre-trained Named Entity Recognition algorithm; most of the user-provided locations can be handled by these Geographic Information System APIs, and the APIs could also provide important details such as the country, state and city. We correlated the geolocated users with the official population census for each country’s region. In total, we geolocated 282,454 users with a strong correlation of 0.92 (n=13, p=6.20e-06, CI=[0.76, 0.98]) for Canada and 757,601 users with a strong correlation of 0.98 (n=52, p=9.27e-35, CI=[0.96, 0.99]) for the United States. This means that each region is well represented in our data. Further information is reported in Table 5.
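The agreement rule between the two geocoders can be illustrated with a short sketch. It assumes that each service's answer has already been reduced to an optional (latitude, longitude) pair, and it interprets "within one degree of each other" as a per-coordinate tolerance; both the helper name `geocoders_agree` and that interpretation are assumptions, since the text does not specify how the difference was computed.

```python
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def geocoders_agree(
    osm: Optional[Coordinate],
    arcgis: Optional[Coordinate],
    tolerance_deg: float = 1.0,
) -> bool:
    """Accept a user location only if both services returned a result
    and the two results differ by at most `tolerance_deg` per coordinate."""
    if osm is None or arcgis is None:
        return False  # at least one service found nothing
    lat_ok = abs(osm[0] - arcgis[0]) <= tolerance_deg
    lon_ok = abs(osm[1] - arcgis[1]) <= tolerance_deg
    return lat_ok and lon_ok


if __name__ == "__main__":
    # Hypothetical responses for a profile location of "Montreal".
    print(geocoders_agree((45.50, -73.57), (45.55, -73.60)))
    # One service resolved the string to a different city.
    print(geocoders_agree((45.50, -73.57), (43.65, -79.38)))
```

Requiring agreement trades recall for precision: ambiguous free-form strings that the two services resolve differently are dropped rather than guessed.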

5.5 Classifying Users By Party Affiliation

We determine a user’s party affiliation using a two-step approach. First, we classify politically explicit users based on their profiles. We then use the predictions from this profile classifier as labels to train a classifier based on the user activity. We report the support, F1-score, and number of users we classified for each party within each dataset in Table 6 for Canada and Table 9 for the United States. We achieve a macro-F1-score of 91% for both Canada and the United States.

Profile Classifier

As a preprocessing step, we filter out users that are not politically explicit. Politically explicit users are those whose profile description contains at least one political keyword defined for any political party. For Canada, we focused on the five main political parties: Conservative, Green, Liberal, New Democratic Party and People’s Party. For the United States, we focused on the Democratic and Republican parties. The following is the set of keywords we have per party:

Canada:

Conservative - ‘erin o’toole’, ‘andrew scheer’, ‘conservative’, ‘conservative party’, ‘cpc’, ‘cpc2021’, ‘cpc2019’, ‘conservative party of canada’

Green - ‘annamie paul’, ‘green party’, ‘gpc’, ‘gpc2019’, ‘gpc2021’, ‘green party of canada’

Liberal - ‘justin trudeau’, ‘liberal’, ‘liberal party’, ‘lpc’, ‘lpc2021’, ‘lpc2019’, ‘liberal party of canada’

New Democratic Party - ‘jagmeeet singh’, ‘new democrat’, ‘new democrats’, ‘new democratic party’, ‘ndp’, ‘ndp2021’, ‘ndp2019’

People’s Party - ‘maxime bernier’, ‘people’s party’, ‘ppc’, ‘ppc2019’, ‘ppc2021’, ‘people’s party of canada’

United States:

Democrat - ‘liberal’, ‘progressive’, ‘democrat’, ‘biden’

Republican - ‘conservative’, ‘gop’, ‘republican’, ‘trump’

We then randomly selected a set of politically explicit users for each party to be manually annotated by two political scientists. We only use party labels that both annotators agreed upon. The Cohen's kappa score for the pair of annotation sets is 0.74 and 0.76 for Canada and the United States, respectively. We then train a RoBERTa-large model with an 80-20 train-test split to determine user party affiliation (see the profile classifier section for more details). Exact numbers can be found in their respective tables in the Supplementary Material.
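The keyword pre-filter that selects politically explicit users can be sketched as below, using the United States keyword lists given above. The function names and the choice of case-insensitive substring matching are assumptions made for illustration; the text states only that a profile description must contain at least one keyword.

```python
# Keyword lists for the United States, copied from the lists above.
PARTY_KEYWORDS = {
    "Democrat": ["liberal", "progressive", "democrat", "biden"],
    "Republican": ["conservative", "gop", "republican", "trump"],
}


def matched_parties(description, party_keywords=PARTY_KEYWORDS):
    """Return the set of parties with at least one keyword in the profile.

    Assumption: matching is a case-insensitive substring test.
    """
    text = description.lower()
    return {
        party
        for party, keywords in party_keywords.items()
        if any(keyword in text for keyword in keywords)
    }


def is_politically_explicit(description, party_keywords=PARTY_KEYWORDS):
    """A user is kept if their profile matches any party's keywords."""
    return bool(matched_parties(description, party_keywords))


if __name__ == "__main__":
    print(is_politically_explicit("Proud Republican, dad, veteran"))
    print(is_politically_explicit("Coffee, hiking and dogs"))
```

A keyword match only marks a profile as a candidate; the party label itself comes from the manual annotation and the trained classifier, since a profile can mention a party without supporting it.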

Activity Classifier

For this second phase, we make use of the respective RoBERTa-base model pre-trained on COVID-19 tweets for each dataset to extract the tweet embeddings (768-dimensional vector). We then generate user embeddings (768-dimensional vector) by pooling together (mean aggregation) all tweet embeddings from that user.
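Mean aggregation of tweet embeddings into a single user embedding amounts to a coordinate-wise average. The sketch below uses plain Python lists and a hypothetical helper name, `user_embedding`; in practice the vectors would be the 768-dimensional outputs of the language model, but short toy vectors are used here for readability.

```python
def user_embedding(tweet_embeddings):
    """Mean-pool a user's tweet embeddings into one vector of the same size."""
    if not tweet_embeddings:
        raise ValueError("a user needs at least one tweet embedding")
    n = len(tweet_embeddings)
    dim = len(tweet_embeddings[0])
    # Average each coordinate over all of the user's tweets.
    return [sum(vec[k] for vec in tweet_embeddings) / n for k in range(dim)]


if __name__ == "__main__":
    tweets_of_one_user = [[1.0, 2.0], [3.0, 4.0]]  # toy 2-dimensional embeddings
    print(user_embedding(tweets_of_one_user))  # [2.0, 3.0]
```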

We then train an MLP consisting of two fully connected layers with the user embeddings as input to predict the party affiliation. Before training, we filtered out users based on their activity $\alpha$ (i.e., number of COVID-19-related tweets in the dataset). We performed a hyperparameter search for $\alpha$ among $\{1,3,5,10,15,20\}$ using 5-fold cross validation. The selected threshold was 5 tweets for Canada and 10 tweets for the United States.

Specifically for Canada, we found that the MLP could not distinguish the parties sufficiently. Hence, we grouped the parties based on their partisan leaning. The liberal (left) party family included the Liberal Party, New Democratic Party and Green Party while the conservative (right) party family included the Conservative Party and People’s Party. We removed supporters of other minor parties and the Bloc Quebecois.

External Evaluation

Following the best practice for evaluating party affiliation predictions [96], we matched Twitter (X) users from the United States with the primary voter registration records available for five states: Ohio, New York, Florida, Arkansas, North Carolina, as well as Washington DC. We describe this procedure and its results in more detail in Section 10 of the Supplementary Material. We achieve an accuracy of 74.35% for the profile classifier and 73.35% for the activity classifier. Both classifiers are binary, but users can also be independent or support a third party, despite an ideology (and behavior/voting) that aligns strongly with one of the two main parties. They can also have an outdated registration that no longer reflects the beliefs they currently hold and express on Twitter (X). Therefore, although accuracy here is lower than in our training model, it still indicates our classification is acceptable and on par with the standard for this type of evaluation [96].

5.6 Measuring Partisan Polarization

Given a set of political parties $\mathcal{P}$ and a set of given user embeddings $\mathcal{U}=\{{u^{(1)},u^{(2)},\dots,u^{(n)}}\}$ where $u^{(i)}\in\mathbb{R}^{768}$ and $n$ is the number of users, we measure polarization as follows.

We first look at the distance and dispersion between each party, $p\in\mathcal{P}$ . We base our measure on the C-index of Hubert and Levin [32] to quantify the extent of clustering and overlap of each political party. This is done by first calculating the sum of intra-cluster distances, i.e., the distances between pairs of users belonging to the same party:

$$S_{w}=\frac{1}{2}\sum_{p\in\mathcal{P}}\sum_{u,v\in p}\|u-v\| \tag{1}$$

We then normalize this value based on its minimum and maximum possible ranges, $S_{min}$ and $S_{max}$ . These correspond to the sum of the $m$ smallest (resp. largest) distances between points in $\mathcal{U}$ ; where $m=\sum_{p\in\mathcal{P}}\frac{|p|(|p|-1)}{2}$ .

Based on these, we define our polarization index $poli$ as:

$$poli=\frac{S_{max}-S_{w}}{S_{max}-S_{min}} \tag{2}$$

The minimum value 0 represents no polarization, whereas the maximum value 1 represents the most extreme possible polarization, i.e., the parties $p\in\mathcal{P}$ are completely isolated from each other.
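Equations 1 and 2 can be computed directly for a small number of users. The sketch below is a minimal pure-Python illustration rather than the authors' implementation: the function name `poli` mirrors the symbol in Equation 2, Euclidean distance is assumed for $\|u-v\|$, and returning 0.0 when $S_{max}=S_{min}$ is an assumption made to avoid a division by zero.

```python
from itertools import combinations
from math import dist


def poli(embeddings, parties):
    """Polarization index of Equation 2 for users with known party labels."""
    pairs = list(combinations(range(len(embeddings)), 2))
    dists = [dist(embeddings[i], embeddings[j]) for i, j in pairs]

    # S_w: sum of distances over unordered same-party pairs (Equation 1;
    # the factor 1/2 there accounts for counting each pair twice).
    within = [d for d, (i, j) in zip(dists, pairs) if parties[i] == parties[j]]
    m = len(within)  # number of same-party pairs
    s_w = sum(within)

    # S_min / S_max: sums of the m smallest / largest distances overall.
    ordered = sorted(dists)
    s_min = sum(ordered[:m])
    s_max = sum(ordered[len(ordered) - m:])

    if s_max == s_min:  # assumption: degenerate input is reported as 0.0
        return 0.0
    return (s_max - s_w) / (s_max - s_min)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(poli(points, ["A", "A", "B", "B"]))  # parties fully separated
    print(poli(points, ["A", "B", "B", "A"]))  # parties fully interleaved
```

In the first call the same-party pairs are exactly the closest pairs, so $S_{w}=S_{min}$ and the index equals 1; in the second they are the farthest pairs, so $S_{w}=S_{max}$ and the index equals 0.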

5.7 Approximation of Polarization

Equation 2 is not scalable to large $n$ , as it is $O(n^{2}\log(n^{2}))$ . We approximate it by sub-sampling a sufficiently large set of users, which enables us to scale to a large number of users. To determine the minimum sample size needed, we use the coefficient of variation, which is defined as $\frac{std}{mean}$ and expressed as a percentage. Generally, a coefficient of variation under 10% gives reasonable results [97].

Algorithm 1 summarizes this procedure. This approximation allows us to scale our measure significantly without compromising accuracy. One loop of the approximation has a time complexity of $O(rf^{2}n^{2}\log(n^{2}))$ , where $r$ is the repeat count and $f$ is the fraction (e.g., 0.01). From our testing, we know that at large values of $n$ , the fraction needed rarely increases, so only one loop is required. Therefore, for large values of $n$ , the approximation takes roughly a fraction $rf^{2}$ of the time needed for the exact computation.

We test the accuracy of the approximation in Algorithm 1 over the daily lockdown and vaccine tweets. In Figure 19(a), we plot the total number of users against the absolute error (and its standard deviation) of the approximated $poli$ compared to the exact value, binned for every 1,000 users. We observe a dramatic drop in the absolute error at around 3,000 users. Once we reach 10,000 users, the absolute error is usually below 0.001. In Figure 19(b), we plot the total number of users against the time saved in running the approximation algorithm compared to running the exact $poli$ , also binned for every 1,000 users. We observe that the time saved grows exponentially with the number of users. We note that at around 50,000 users, the approximation rarely needs to increase the fraction of users sampled.

These findings confirm that we can accurately approximate $poli$ for large-scale data that is impossible to measure exactly because of memory constraints. As $poli$ relies heavily on finding pairwise distances (time and memory intensive), we see from our analysis that a sampling approach can save both time and memory exponentially.

Algorithm 1 Approximating $poli$

Input: $\mathcal{U}$ , $fraction$ , $epsilon$ , $step\_size$ , $repeats$

$cv\_poli\leftarrow\infty$
while $cv\_poli>epsilon$ do
    $poli\_indices\leftarrow[]$
    $num\_of\_runs\leftarrow 0$
    for $i$ in $1,\dots,repeats$ do
        $\mathcal{U}_{s}\leftarrow sample(\mathcal{U},fraction)$
        $poli\_indices.append(poli(\mathcal{U}_{s}))$
    end for
    $mean\_poli\leftarrow mean(poli\_indices)$
    $std\_poli\leftarrow std(poli\_indices)$
    $cv\_poli\leftarrow std\_poli/mean\_poli$
    $fraction\leftarrow fraction+step\_size$
end while
return $mean\_poli$
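A runnable sketch of Algorithm 1 is given below, under several stated assumptions. The exact index is recomputed on each random sub-sample with a compact pure-Python `poli`; the coefficient of variation is used as a ratio (so `epsilon=0.10` corresponds to 10%); sampling is without replacement; and two safeguards not present in Algorithm 1 are added, namely a minimum sample of two users and a stop once the whole population is sampled. The default parameter values are illustrative only.

```python
import random
import statistics
from itertools import combinations
from math import dist, inf


def poli(embeddings, parties):
    """Exact polarization index (Equation 2) for a small set of users."""
    pairs = list(combinations(range(len(embeddings)), 2))
    dists = [dist(embeddings[i], embeddings[j]) for i, j in pairs]
    within = [d for d, (i, j) in zip(dists, pairs) if parties[i] == parties[j]]
    m = len(within)
    ordered = sorted(dists)
    s_min, s_max = sum(ordered[:m]), sum(ordered[len(ordered) - m:])
    if s_max == s_min:  # assumption: degenerate sample is reported as 0.0
        return 0.0
    return (s_max - sum(within)) / (s_max - s_min)


def approximate_poli(embeddings, parties, fraction=0.01, epsilon=0.10,
                     step_size=0.01, repeats=10, seed=0):
    """Average poli over repeated sub-samples, growing the sampled fraction
    until the coefficient of variation drops to `epsilon` or below."""
    rng = random.Random(seed)
    n = len(embeddings)
    cv_poli, mean_poli = inf, float("nan")
    while cv_poli > epsilon:
        size = min(n, max(2, round(fraction * n)))  # safeguard: >= 2 users
        poli_indices = []
        for _ in range(repeats):
            idx = rng.sample(range(n), size)  # sample without replacement
            poli_indices.append(poli([embeddings[i] for i in idx],
                                     [parties[i] for i in idx]))
        mean_poli = statistics.mean(poli_indices)
        std_poli = statistics.pstdev(poli_indices)
        cv_poli = std_poli / mean_poli if mean_poli > 0 else inf
        if size == n:  # safeguard: nothing left to add to the sample
            break
        fraction += step_size
    return mean_poli


if __name__ == "__main__":
    rng = random.Random(1)
    left = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
    right = [[rng.gauss(100, 1) for _ in range(4)] for _ in range(60)]
    labels = ["L"] * 60 + ["R"] * 60
    print(approximate_poli(left + right, labels, fraction=0.5, repeats=5))
```

Each sub-sample costs time and memory proportional to the square of its size, which is why a small fraction makes the measure tractable for large user sets.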

6 Acknowledgements

This research is supported by CIFAR AI Catalyst Grants and Canada CIFAR AI Research Chair funding.

References

[1] Hart, P. S., Chinn, S. & Soroka, S. Politicization and polarization in covid-19 news coverage. Science Communication 42, 679–697 (2020). URL https://doi.org/10.1177/1075547020950735.
[2] Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N. & Westwood, S. J. The origins and consequences of affective polarization in the united states. Annual Review of Political Science 22, 129–146 (2019). URL https://doi.org/10.1146/annurev-polisci-051117-073034.
[3] Pew Research Center, D., Washington. Political polarization in the american public. https://www.pewresearch.org/politics/2014/06/12/political-polarization-in-the-american-public/ (2014).
[4] Amlani, S. & Algara, C. Partisanship & nationalization in american elections: Evidence from presidential, senatorial, & gubernatorial elections in the us counties, 1872–2020. Electoral Studies 73, 102387 (2021).
[5] Stewart, A. J., McCarty, N. & Bryson, J. J. Polarization under rising inequality and economic decline. Science advances 6, eabd4201 (2020).
[6] Chu, H., Yang, J. Z. & Liu, S. Not my pandemic: Solution aversion and the polarized public perception of covid-19. Science Communication 43, 508–528 (2021).
[7] Gadarian, S. K., Goodman, S. W. & Pepinsky, T. B. Partisanship, health behavior, and policy attitudes in the early stages of the covid-19 pandemic. Plos one 16, e0249596 (2021).
[8] Wu, J. D. & Huber, G. A. Partisan differences in social distancing may originate in norms and beliefs: Results from novel data. Social Science Quarterly n/a. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/ssqu.12947.
[9] Gollwitzer, A. et al. Partisan differences in physical distancing are linked to health outcomes during the covid-19 pandemic. Nature human behaviour 4, 1186–1197 (2020).
[10] Allcott, H. et al. Polarization and public health: Partisan differences in social distancing during the coronavirus pandemic. Journal of public economics 191, 104254 (2020).
[11] Kahane, L. H. Politicizing the mask: Political, economic and demographic factors affecting mask wearing behavior in the usa. Eastern Economic Journal 1 – 21 (2021).
[12] Milosh, M., Painter, M., Sonin, K., Van Dijcke, D. & Wright, A. L. Unmasking partisanship: Polarization undermines public response to collective risk. Journal of Public Economics 204, 104538 (2021).
[13] Fridman, A., Gershon, R. & Gneezy, A. Covid-19 and vaccine hesitancy: A longitudinal study. PLOS ONE 16, 1–12 (2021). URL https://doi.org/10.1371/journal.pone.0250123.
[14] Druckman, J. N., Klar, S., Krupnikov, Y., Levendusky, M. & Ryan, J. B. Affective polarization, local contexts and public opinion in america. Nature human behaviour 5, 28–38 (2021).
[15] Ojea Quintana, I., Reimann, R., Cheong, M., Alfano, M. & Klein, C. Polarization and trust in the evolution of vaccine discourse on twitter during covid-19. PLos One 17, e0277292 (2022).
[16] Ward, J. K. et al. The french public’s attitudes to a future covid-19 vaccine: The politicization of a public health issue. Social Science and Medicine 265, 113414 (2020). URL https://www.sciencedirect.com/science/article/pii/S027795362030633X.
[17] Jiang, J., Ren, X., Ferrara, E. et al. Social media polarization and echo chambers in the context of covid-19: Case study. JMIRx med 2, e29570 (2021).
[18] Medeiros, M. & Gravelle, T. B. Pandemic populism: Explaining support for the people’s party of canada in the 2021 federal election. Canadian Journal of Political Science/Revue canadienne de science politique 56, 413–434 (2023).
[19] Pammett, J. H. & Dornan, C. The canadian federal election (2022).
[20] Blouin Genest, G., Burlone, N., Champagne, E., Eastin, C. & Ogaranko, C. Translating covid-19 emergency plans into policy: A comparative analysis of three canadian provinces. Policy Design and Practice 4, 115–132 (2021).
[21] Adolph, C., Amano, K., Bang-Jensen, B., Fullman, N. & Wilkerson, J. Pandemic politics: Timing state-level social distancing responses to covid-19. Journal of Health Politics, Policy and Law 46, 211–233 (2021).
[22] Ashokkumar, A. & Pennebaker, J. W. Social media conversations reveal large psychological shifts caused by covid-19’s onset across us cities. Science advances 7, eabg7843 (2021).
[23] Wu, J. D. & Huber, G. A. Partisan differences in social distancing may originate in norms and beliefs: Results from novel data. Social Science Quarterly (2021).
[24] Bridgman, A. et al. The causes and consequences of COVID-19 misperceptions: Understanding the role of news and social media. HKS Misinfo Review (2020).
[25] Jiang, J., Chen, E., Yan, S., Lerman, K. & Ferrara, E. Political polarization drives online conversations about covid-19 in the united states. Human Behavior and Emerging Technologies 2, 200–211 (2020).
[26] Gallotti, R., Valle, F., Castaldo, N., Sacco, P. & De Domenico, M. Assessing the risks of ‘infodemics’ in response to covid-19 epidemics. Nature Human Behaviour 4, 1285–1293 (2020).
[27] Lang, J., Erickson, W. W. & Jing-Schmidt, Z. # maskon!# maskoff! digital polarization of mask-wearing in the united states during covid-19. PloS one 16, e0250817 (2021).
[28] Rodriguez, C. G., Gadarian, S. K., Goodman, S. W. & Pepinsky, T. B. Morbid polarization: Exposure to covid-19 and partisan disagreement about pandemic response. Political psychology 43, 1169–1189 (2022).
[29] Clinton, J., Cohen, J., Lapinski, J. & Trussler, M. Partisan pandemic: How partisanship and public health concerns affect individuals’ social mobility during covid-19. Science advances 7, eabd7204 (2021).
[30] Diermeier, D., Godbout, J.-F., Yu, B. & Kaufmann, S. Language and ideology in congress. British Journal of Political Science 42, 31–55 (2012).
[31] Liu, Y. et al. Roberta: A robustly optimized bert pretraining approach (2019). 1907.11692.
[32] Hubert, L. & Levin, J. A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin 83, 1072–1080 (1976).
[33] Cascini, F. et al. eClinicalMedicine 48 (2022). URL https://doi.org/10.1016/j.eclinm.2022.101454.
[34] Wicke, P. & Bolognesi, M. M. Framing covid-19: How we conceptualize and discuss the pandemic on twitter. PLOS ONE 15, 1–24 (2020). URL https://doi.org/10.1371/journal.pone.0240010.
[35] Rowe, D. Police hand out hundreds of fines at montreal anti-lockdown demonstration. CTV News (2020). URL https://montreal.ctvnews.ca/police-hand-out-hundreds-of-fines-at-montreal-anti-lockdown-demonstration-1.5239414.
[36] Albrecht, D. Vaccination, politics and covid-19 impacts. BMC Public Health 22, 96 (2022). URL https://doi.org/10.1186/s12889-021-12432-x.
[37] Leskovec, J., Backstrom, L. & Kleinberg, J. Meme-tracking and the dynamics of the news cycle, 497–506 (2009).
[38] Algara, C., Amlani, S., Collitt, S., Hale, I. & Kazemian, S. Nail in the coffin or lifeline? evaluating the electoral impact of covid-19 on president trump in the 2020 election. Political Behavior (2022). URL https://doi.org/10.1007/s11109-022-09826-x.
[39] Bryant, M. Donald trump tries to stoke fears of covid lockdown under joe biden. The Guardian (2020). URL https://www.theguardian.com/us-news/2020/nov/02/trump-biden-coronavirus-covid-lockdown.
[40] Higgings, T. Joe biden receives covid vaccine on live television, encourages americans to get inoculated. cnbc (2020).
[41] Rieger, S. Anti-mask protests show need for better public health messaging, calgary researcher says. cbc (2020).
[42] Canada, H. Health canada authorizes moderna covid-19 vaccine. https://www.canada.ca/en/health-canada/news/2020/12/health-canada-authorizes-moderna-covid-19-vaccine.html (2020).
[43] White, L. B., Joseph & O’Donnell, C. Moderna covid-19 vaccine begins rollout as u.s. races to broaden injection campaign. https://www.reuters.com/business/healthcare-pharmaceuticals/moderna-covid-19-vaccine-begins-rollout-us-races-broaden-injection-campaign-2020-12-19/ (2020).
[44] Jiang, X. et al. Polarization over vaccination: Ideological differences in twitter expression about covid-19 vaccine favorability and specific hesitancy concerns. Social Media+ Society 7, 20563051211048413 (2021).
[45] Rathje, S., He, J. K., Roozenbeek, J., Van Bavel, J. J. & van der Linden, S. Social media behavior is associated with vaccine hesitancy. PNAS nexus 1, pgac207 (2022).
[46] Bollyky, T. J. et al. Assessing covid-19 pandemic policies and behaviours and their economic and educational trade-offs across us states from jan 1, 2020, to july 31, 2022: an observational analysis. The Lancet 401, 1341–1360 (2023).
[47] Rao, A. et al. Political partisanship and anti-science attitudes in online discussions about covid-19. arXiv preprint arXiv:2011.08498 (2020).
[48] Morris, D. S. Polarization, partisanship, and pandemic: The relationship between county-level support for donald trump and the spread of covid-19 during the spring and summer of 2020. Social Science Quarterly 102, 2412–2431 (2021).
[49] Sehgal, N. J., Yue, D., Pope, E., Wang, R. H. & Roby, D. H. The association between covid-19 mortality and the county-level partisan divide in the united states: study examines the association between covid-19 mortality and county-level political party affiliation. Health Affairs 41, 853–863 (2022).
[50] Kaashoek, J. et al. The evolving roles of us political partisanship and social vulnerability in the covid-19 pandemic from february 2020–february 2021. PLOS global public health 2, e0000557 (2022).
[51] Governor Carney, J. Governor carney announces additional covid-19 restrictions. Delaware.gov (2020). URL https://news.delaware.gov/2020/11/17/governor-carney-announces-additional-covid-19-restrictions/.
[52] Neiburg, J. Covid-19 in delaware: Here are the latest restrictions and what you need to know. delaware online (2020).
[53] Goldstein, N. D. & Suder, J. S. Application of state law in the public health emergency response to covid-19: an example from delaware in the united states. Journal of Public Health Policy 42, 167–175 (2021). URL https://doi.org/10.1057/s41271-020-00257-8.
[54] Adolph, C. et al. Governor partisanship explains the adoption of statewide mask mandates in response to covid-19. State Politics & Policy Quarterly 22, 24–49 (2022).
[55] Wilson, R. Why mississippi’s governor revoked a statewide mask mandate. The Hill (2020). URL https://thehill.com/homenews/state-watch/519658-why-mississippis-governor-revoked-a-statewide-mask-mandate/.
[56] Haines, M. Mask-resistant north dakota town battles pandemic spike. VOA (2020). URL https://www.voanews.com/a/covid-19-pandemic_mask-resistant-north-dakota-town-battles-pandemic-spike/6197674.html.
[57] Carter, J. & Anthony, W. Reeves: Mississippi ‘not going to participate’ in nationwide lockdown. wlbt (2020). URL https://www.wlbt.com/2020/11/12/watch-live-reeves-press-conference/.
[58] Lopez, G. Why north and south dakota are suffering the worst covid-19 epidemics in the us. Vox (2020). URL https://www.vox.com/future-perfect/2020/10/27/21534480/north-dakota-south-dakota-covid-coronavirus-pandemic-third-wave.
[59] Allcott, H. et al. Polarization and public health: Partisan differences in social distancing during the coronavirus pandemic. Journal of Public Economics 191, 104254 (2020). URL https://www.sciencedirect.com/science/article/pii/S0047272720301183.
[60] State of Oklahoma. Governor stitt, commissioner frye announce statewide covid-19 mitigation efforts in executive order. Oklahoma, Governor J. Kevin Stitt (2020). URL https://oklahoma.gov/governor/newsroom/newsroom/2020/december/governor-stitt--commissioner-frye-announce-statewide-covid-19-mi.html.
[61] Macpherson, J. North dakota governor changes tack and issues mask mandate. AP (2020). URL https://apnews.com/article/bismarck-north-dakota-coronavirus-pandemic-45788d9dfaae23c1db8125088dfa242b.
[62] Hallas, L., Hatibie, A., Majumdar, S., Pyarali, M. & Hale, T. Variation in us states’ responses to covid-19. University of Oxford (2021).
[63] Mayer, M. K. et al. Politics or need? explaining state protective measures in the coronavirus pandemic. Social Science Quarterly 103, 1140–1154 (2022).
[64] Grossman, G., Kim, S., Rexer, J. M. & Thirumurthy, H. Political partisanship influences behavioral responses to governors’ recommendations for covid-19 prevention in the united states. Proceedings of the National Academy of Sciences 117, 24144–24153 (2020).
[65] Gusmano, M. K., Miller, E. A., Nadash, P. & Simpson, E. J. Partisanship in initial state responses to the covid-19 pandemic. World Medical & Health Policy 12, 380–389 (2020).
[66] AP. South dakota medical groups promote masks, countering noem. Valley News Live (2020). URL https://www.valleynewslive.com/2020/10/28/south-dakota-medical-groups-promote-masks-countering-noem/.
[67] Kaur, H. A north dakota state legislature candidate who died from covid-19 appears to have won his election. CNN (2020). URL https://www.cnn.com/2020/11/04/politics/north-dakota-candidate-died-covid-wins-election-trnd/index.html.
[68] Higgins, T. North dakota republican who died of covid-19 wins seat in state legislature. CNBC (2020). URL https://www.cnbc.com/2020/11/04/north-dakota-man-who-died-of-covid-19-wins-seat-in-state-legislature.html.
[69] Douglas, K. M. et al. Understanding conspiracy theories. Political psychology 40, 3–35 (2019).
[70] Ye, X. Exploring the relationship between political partisanship and covid-19 vaccination rate. Journal of Public Health 45, 91–98 (2023).
[71] Bollyky, T. J. et al. Assessing covid-19 pandemic policies and behaviours and their economic and educational trade-offs across us states from jan 1, 2020, to july 31, 2022: an observational analysis. The Lancet 401, 1341–1360 (2023). URL https://doi.org/10.1016/S0140-6736(23)00461-0.
[72] Miller, J. Education is a bigger factor than race in desire for covid-19 vaccine (2021). URL https://news.usc.edu/182848/education-covid-19-vaccine-safety-risks-usc-study/.
[73] Latkin, C. A., Dayton, L., Yi, G., Konstantopoulos, A. & Boodram, B. Trust in a covid-19 vaccine in the us: A social-ecological perspective. Social science & medicine 270, 113684 (2021).
[74] Pennycook, G., McPhetres, J., Bago, B. & Rand, D. G. Beliefs about covid-19 in canada, the united kingdom, and the united states: A novel test of political polarization and motivated reasoning. Personality and Social Psychology Bulletin 48, 750–765 (2022).
[75] Cheung, C., Lyons, J., Madsen, B., Miller, S. & Sheikh, S. The bank of canada covid-19 stringency index: measuring policy response across provinces. Tech. Rep., Bank of Canada (2021).
[76] Cameron-Blake, E. et al. Variation in the canadian provincial and territorial responses to covid-19. Blavatnik School of Government Working Paper Series 39 (2021).
[77] Akanteva, A., Dick, D. W., Amiraslani, S. & Heffernan, J. M. Canadian covid-19 pandemic public health mitigation measures at the province level. Scientific Data 10, 882 (2023). URL https://doi.org/10.1038/s41597-023-02759-y.
[78] Dubé, È., Dionne, M., Pelletier, C., Hamel, D. & Gadio, S. Covid-19 vaccination attitudes and intention among quebecers during the first and second waves of the pandemic: findings from repeated cross-sectional surveys. Human Vaccines & Immunotherapeutics 17, 3922–3932 (2021).
[79] Shim, E. Regional variability in covid-19 case fatality rate in canada, february–december 2020. International Journal of Environmental Research and Public Health 18, 1839 (2021).
[80] Montpetit, J. Quebec sets new record for covid-19 cases amid pockets of resistance to safety measures. https://www.cbc.ca/news/canada/montreal/anti-mask-demonstration-quebec-covid-19-cases-1.5849499 (2020). Accessed: 2024-03-01.
[81] Montreal Gazette. Thousands of montrealers march to protest against wearing masks. https://montrealgazette.com/news/thousands-of-montrealers-march-to-protest-against-wearing-masks (2020). Accessed: 2024-03-01.
[82] Chaput, M. Figures de l’identité anti-masque et rhétorique de l’organisationnalité. Communication & Organisation 107–120 (2021).
[83] Owen, T. et al. Understanding vaccine hesitancy in canada: attitudes, beliefs, and the information ecosystem (2020).
[84] Boucher, J.-C. et al. Analyzing social media to explore the attitudes and behaviors following the announcement of successful covid-19 vaccine trials: infodemiology study. JMIR infodemiology 1, e28800 (2021).
[85] Burns, K. E., Dubé, È., Nascimento, H. G. & Meyer, S. B. Examining vaccine hesitancy among a diverse sample of canadian adults. Vaccine 42, 129–135 (2024).
[86] Van Bavel, J. J., Rathje, S., Harris, E., Robertson, C. & Sternisko, A. How social media shapes polarization. Trends in Cognitive Sciences 25, 913–916 (2021).
[87] Karaivanov, A., Kim, D., Lu, S. E. & Shigeoka, H. Covid-19 vaccination mandates and vaccine uptake. Nature Human Behaviour 6, 1615–1624 (2022).
[88] Cameron-Blake, E. et al. Variation in the canadian provincial and territorial responses to covid-19. Blavatnik School of Government Working Paper. URL www.bsg.ox.ac.uk/covidtracker.
[89] Mansoor, M. Citizens’ trust in government as a function of good governance and government agency’s provision of quality information on social media during covid-19. Government information quarterly 38, 101597 (2021).
[90] Hetherington, M. J. & Rudolph, T. J. Political trust and polarization (2017).
[91] Makridis, C. A. & Wu, C. How social capital helps communities weather the covid-19 pandemic. PloS one 16, e0245135 (2021).
[92] Merkley, E. et al. A rare moment of cross-partisan consensus: Elite and public response to the COVID-19 pandemic in canada. Can. J. Polit. Sci. 53, 311–318 (2020).
[93] Kouzy, R. et al. Coronavirus goes viral: Quantifying the covid-19 misinformation epidemic on twitter. Cureus 12, e7255 (2020). URL https://europepmc.org/articles/PMC7152572.
[94] Al-Ramahi, M., Elnoshokaty, A., El-Gayar, O., Nasralah, T. & Wahbeh, A. Public discourse against masks in the covid-19 era: Infodemiology study of twitter data. JMIR Public Health Surveill 7, e26780 (2021). URL https://doi.org/10.2196/26780.
[95] Ahmed, W., López Seguí, F., Vidal-Alaball, J. & Katz, M. S. Covid-19 and the “film your hospital” conspiracy theory: Social network analysis of twitter data. J Med Internet Res 22, e22374 (2020). URL http://www.jmir.org/2020/10/e22374/.
[96] Barberá, P. Birds of the same feather tweet together: Bayesian ideal point estimation using twitter data. Political analysis 23, 76–91 (2015).
[97] Reed, G. F., Lynn, F. & Meade, B. D. Use of coefficient of variation in assessing variability of quantitative assays. Clinical and Vaccine Immunology 9, 1235–1239 (2002).

7 Extended Results

7.1 Regional Variations of Partisan Polarization

In Figures 10 and 11, we show the ranking of the regions (American states and Canadian provinces and territories, respectively) by partisan polarization per topic.

In Figure 12, we show the correlation between the polarization scores and the relative vote margins. The margin by which Republican party votes exceeded Democratic party votes in a state correlates significantly with the amount of polarization exhibited by the Twitter (X) discourse in that state when conditioning the discourse on masks and on vaccines, but less clearly when conditioning on lockdowns.

7.2 The Relationship between Regional Vaccine Polarization and Vaccination Rates in Canada

In Figure 13, we remove Nunavut as an outlier because of its very small population. We obtain a strong positive correlation between vaccine polarization and vaccination rate, but over a relatively small number of points. In Canada, vaccines were mandated, with vaccine passports required to be served in public areas. We hypothesize that partisan polarization around vaccines increased because people were unhappy about being compelled to vaccinate, even though most of the population was still vaccinated. With so few points, however, we cannot draw a definite conclusion from this result.

7.3 Specific Regional Partisan Polarization and COVID-19 Deaths

Here, we investigate topic-specific polarization over time and how it relates to the reported number of COVID-19 deaths in specific regions, in Figure 14 for Canada and in Figure 15 for the United States.

7.4 Correlation Matrices

Table 2: Correlation Matrix between Topic Polarization and External Data in Canada. Bold indicates $p<0.001$, italics $p<0.01$, and underline $p<0.05$. A green or red background indicates a positive or negative correlation, respectively, for significant p-values only.

Topic	Measure	Cases	Deaths	Conspiracy (Volume)	Stringency Index
Lockdown	Polarization	-0.338 (CI=[-0.511,-0.138], p=0.001)	-0.280 (CI=[-0.462,-0.075], p=0.008)	0.190 (CI=[-0.020,0.384], p=0.076)	-0.518 (CI=[-0.657,-0.347], p=0.000)
Lockdown	Volume	0.244 (CI=[0.037,0.432], p=0.022)	0.171 (CI=[-0.040,0.367], p=0.112)	-0.084 (CI=[-0.288,0.128], p=0.437)	0.259 (CI=[0.052,0.444], p=0.015)
Lockdown	% Volume	-0.368 (CI=[-0.536,-0.172], p=0.000)	-0.340 (CI=[-0.513,-0.141], p=0.001)	0.149 (CI=[-0.062,0.348], p=0.165)	-0.391 (CI=[-0.555,-0.198], p=0.000)
Lockdown	Weighted Polarization	-0.402 (CI=[-0.564,-0.210], p=0.000)	-0.368 (CI=[-0.536,-0.172], p=0.000)	0.172 (CI=[-0.039,0.368], p=0.110)	-0.443 (CI=[-0.597,-0.258], p=0.000)
Mask	Polarization	0.117 (CI=[-0.095,0.318], p=0.279)	0.102 (CI=[-0.110,0.305], p=0.345)	-0.132 (CI=[-0.332,0.080], p=0.221)	0.011 (CI=[-0.198,0.220], p=0.916)
Mask	Volume	-0.202 (CI=[-0.395,0.008], p=0.059)	-0.306 (CI=[-0.485,-0.104], p=0.004)	0.462 (CI=[0.280,0.612], p=0.000)	-0.267 (CI=[-0.452,-0.061], p=0.012)
Mask	% Volume	-0.688 (CI=[-0.784,-0.559], p=0.000)	-0.689 (CI=[-0.785,-0.561], p=0.000)	0.615 (CI=[0.465,0.730], p=0.000)	-0.804 (CI=[-0.867,-0.715], p=0.000)
Mask	Weighted Polarization	-0.687 (CI=[-0.783,-0.557], p=0.000)	-0.688 (CI=[-0.784,-0.559], p=0.000)	0.610 (CI=[0.459,0.726], p=0.000)	-0.804 (CI=[-0.867,-0.715], p=0.000)
Vaccine	Polarization	0.271 (CI=[0.066,0.455], p=0.011)	0.361 (CI=[0.164,0.530], p=0.001)	-0.202 (CI=[-0.395,0.007], p=0.059)	0.254 (CI=[0.047,0.440], p=0.017)
Vaccine	Volume	0.718 (CI=[0.599,0.806], p=0.000)	0.658 (CI=[0.520,0.762], p=0.000)	-0.364 (CI=[-0.533,-0.168], p=0.000)	0.711 (CI=[0.590,0.801], p=0.000)
Vaccine	% Volume	0.683 (CI=[0.553,0.781], p=0.000)	0.672 (CI=[0.538,0.773], p=0.000)	-0.532 (CI=[-0.667,-0.363], p=0.000)	0.781 (CI=[0.684,0.851], p=0.000)
Vaccine	Weighted Polarization	0.687 (CI=[0.558,0.784], p=0.000)	0.678 (CI=[0.546,0.777], p=0.000)	-0.533 (CI=[-0.668,-0.364], p=0.000)	0.782 (CI=[0.684,0.852], p=0.000)
Aggregated	Sum	-0.121 (CI=[-0.322,0.091], p=0.261)	-0.044 (CI=[-0.251,0.167], p=0.681)	0.023 (CI=[-0.187,0.231], p=0.831)	-0.315 (CI=[-0.492,-0.113], p=0.003)
Aggregated	Weighted Sum	-0.062 (CI=[-0.268,0.149], p=0.566)	0.027 (CI=[-0.183,0.236], p=0.799)	-0.032 (CI=[-0.240,0.178], p=0.764)	-0.256 (CI=[-0.442,-0.049], p=0.016)

Table 3: Correlation Matrix between Topic Polarization and External Data in the United States. Bold indicates $p<0.001$, italics $p<0.01$, and underline $p<0.05$. A green or red background indicates a positive or negative correlation, respectively, for significant p-values only.

Topic	Measure	Cases	Deaths	Conspiracy (Volume)	Stringency Index
Lockdown	Polarization	-0.020 (CI=[-0.244,0.207], p=0.867)	-0.070 (CI=[-0.291,0.158], p=0.548)	-0.146 (CI=[-0.360,0.082], p=0.207)	0.010 (CI=[-0.216,0.235], p=0.931)
Lockdown	Volume	-0.196 (CI=[-0.403,0.031], p=0.090)	-0.319 (CI=[-0.508,-0.100], p=0.005)	0.497 (CI=[0.306,0.650], p=0.000)	-0.262 (CI=[-0.460,-0.038], p=0.022)
Lockdown	% Volume	-0.394 (CI=[-0.569,-0.185], p=0.000)	-0.436 (CI=[-0.602,-0.233], p=0.000)	0.120 (CI=[-0.109,0.336], p=0.303)	-0.356 (CI=[-0.538,-0.142], p=0.002)
Lockdown	Weighted Polarization	-0.395 (CI=[-0.570,-0.186], p=0.000)	-0.442 (CI=[-0.607,-0.240], p=0.000)	0.101 (CI=[-0.128,0.319], p=0.387)	-0.354 (CI=[-0.537,-0.140], p=0.002)
Mask	Polarization	-0.103 (CI=[-0.321,0.125], p=0.376)	-0.108 (CI=[-0.325,0.121], p=0.355)	-0.078 (CI=[-0.298,0.150], p=0.502)	-0.121 (CI=[-0.337,0.107], p=0.298)
Mask	Volume	-0.311 (CI=[-0.501,-0.092], p=0.006)	-0.378 (CI=[-0.556,-0.167], p=0.001)	0.500 (CI=[0.310,0.652], p=0.000)	-0.288 (CI=[-0.482,-0.067], p=0.012)
Mask	% Volume	-0.541 (CI=[-0.683,-0.360], p=0.000)	-0.526 (CI=[-0.672,-0.341], p=0.000)	-0.012 (CI=[-0.237,0.214], p=0.915)	-0.452 (CI=[-0.615,-0.252], p=0.000)
Mask	Weighted Polarization	-0.565 (CI=[-0.701,-0.390], p=0.000)	-0.547 (CI=[-0.688,-0.367], p=0.000)	-0.020 (CI=[-0.245,0.206], p=0.861)	-0.474 (CI=[-0.632,-0.279], p=0.000)
Vaccine	Polarization	-0.246 (CI=[-0.446,-0.021], p=0.033)	-0.216 (CI=[-0.421,0.010], p=0.061)	-0.268 (CI=[-0.466,-0.046], p=0.019)	-0.133 (CI=[-0.348,0.096], p=0.253)
Vaccine	Volume	0.360 (CI=[0.147,0.542], p=0.001)	0.305 (CI=[0.085,0.496], p=0.007)	0.303 (CI=[0.084,0.495], p=0.008)	0.196 (CI=[-0.031,0.404], p=0.089)
Vaccine	% Volume	0.666 (CI=[0.518,0.775], p=0.000)	0.682 (CI=[0.539,0.786], p=0.000)	-0.068 (CI=[-0.289,0.160], p=0.561)	0.573 (CI=[0.400,0.707], p=0.000)
Vaccine	Weighted Polarization	0.633 (CI=[0.475,0.751], p=0.000)	0.655 (CI=[0.505,0.767], p=0.000)	-0.115 (CI=[-0.332,0.113], p=0.322)	0.576 (CI=[0.403,0.710], p=0.000)
Aggregated	Sum	-0.184 (CI=[-0.393,0.044], p=0.112)	-0.196 (CI=[-0.403,0.031], p=0.090)	-0.247 (CI=[-0.448,-0.023], p=0.031)	-0.119 (CI=[-0.335,0.110], p=0.307)
Aggregated	Weighted Sum	-0.217 (CI=[-0.422,0.009], p=0.060)	-0.198 (CI=[-0.405,0.029], p=0.086)	-0.193 (CI=[-0.401,0.034], p=0.095)	-0.091 (CI=[-0.310,0.137], p=0.434)

7.5 The Relationship between National Partisan Polarization and Epidemiological Data

Here, we investigate the aggregated polarization over time for each country and how it relates to the reported numbers of new COVID-19 cases and deaths. In Figure 16, we observe that polarization is not correlated with the severity of the pandemic in either Canada or the United States. To compute the daily aggregate polarization measure, we take a weighted sum of each topic’s polarization, where each weight is the percentage of that topic’s tweets within that day’s volume of COVID-19-related tweets.
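The weighted sum described above can be sketched as follows. This is a minimal illustration rather than the pipeline used for Figure 16: the function name, the dictionary layout, and the numbers are hypothetical, and the weights are assumed to be each topic's tweet count divided by the day's total COVID-19-related tweet count.

```python
def daily_aggregate_polarization(topic_polarization, topic_tweet_counts, total_covid_tweets):
    """Weighted sum of per-topic polarization for a single day.

    Each topic's weight is its share of that day's COVID-19-related tweets
    (assumed weighting; names are illustrative).
    """
    if total_covid_tweets <= 0:
        raise ValueError("total_covid_tweets must be positive")
    return sum(
        topic_polarization[topic] * topic_tweet_counts[topic] / total_covid_tweets
        for topic in topic_polarization
    )


# Hypothetical values for one day (not taken from the paper's data).
polarization = {"lockdown": 0.20, "mask": 0.10, "vaccine": 0.40}
counts = {"lockdown": 100, "mask": 300, "vaccine": 100}
aggregate = daily_aggregate_polarization(polarization, counts, total_covid_tweets=1000)
```

Because the three topics cover only part of a day's COVID-19-related tweets, the weights in this sketch need not sum to one.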

8 Methodology Details

In this section, we report the classification metrics of each module in the pipeline for measuring polarization.

8.1 Classifying Tweets by Topics

Table 4: Tweet Topic Classification Metrics. 200 relevant and 200 irrelevant tweets were sampled for manual annotation. The F1-score is computed against the labels on which both annotators agreed, and is reported over 5 runs with different random seeds.

Canada
Topic	Relevant	Irrelevant	Cohen Kappa	F1-Score (%)	# of Tweets
Lockdown	170	356	0.73	97.13 ± 1.57	1,553,984
Mask	292	282	0.91	98.48 ± 1.24	1,994,293
Vaccine	326	248	0.91	99.84 ± 0.31	2,145,549
Conspiracy	338	171	0.67	97.20 ± 0.71	16,575,934
United States
Topic	Relevant	Irrelevant	Cohen Kappa	F1-Score (%)	# of Tweets
Lockdown	126	199	0.63	100.00 ± 0.00	897,565
Mask	197	200	0.98	99.49 ± 0.62	1,562,706
Vaccine	192	201	0.96	100.00 ± 0.00	1,541,360
Conspiracy	195	155	0.75	95.55 ± 2.39	926,389
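For readers unfamiliar with the agreement statistic in Table 4, the sketch below computes Cohen's kappa for two annotators and keeps only the items on which they agree, which is the subset the F1-score is evaluated on. The label lists are hypothetical and the helper names are illustrative; this is not the annotation code used for the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two annotators' marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def agreed_items(labels_a, labels_b):
    """Indices and labels of the items on which both annotators agree."""
    return [(i, a) for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]


# Hypothetical annotations: 1 = relevant, 0 = irrelevant.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
gold = agreed_items(annotator_1, annotator_2)
```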

8.2 Classifying Users by Geolocation

Table 5: Number of Geolocated Users and Correlation with Population. Correlations are computed between the number of users per region and the official 2021 population census for each country.

Canada
Users	Total	Correlation
Geolocated	282,454	0.92 (n=13, p=6.20e-06, CI=[0.76, 0.98])
w/ Party Affiliation	195,456	0.92 (n=13, p=6.83e-06, CI=[0.76, 0.98])
United States
Users	Total	Correlation
Geolocated	757,601	0.98 (n=52, p=9.27e-35, CI=[0.96, 0.99])
w/ Party Affiliation	242,056	0.97 (n=52, p=2.51e-34, CI=[0.96, 0.99])
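The source does not state which correlation coefficient or interval method underlies the values in Table 5. As a sketch under the assumption of a Pearson correlation with a Fisher z-transform confidence interval, the computation would look like the following; the function names and the regional counts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% confidence interval for r via the Fisher z-transform."""
    if n < 4 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical per-region user counts and populations (not the paper's data).
users = [120, 45, 300, 80, 15, 210]
population = [1000, 500, 2600, 700, 200, 1500]
r = pearson_r(users, population)
low, high = fisher_ci(r, len(users))

# Interval for the rounded Canadian values reported in Table 5 (r = 0.92, n = 13).
table5_low, table5_high = fisher_ci(0.92, 13)
```

Applied to the rounded Canadian values, this gives an interval of roughly [0.75, 0.98], close to the reported one. The small difference at the lower bound is what one would expect from rounding r to two decimals, but this does not confirm that the same method was used.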

8.3 Classifying User Party Affiliation

We report the classification metrics for Canada in Table 6 and for the United States in Table 9. For Canada, our activity-based model also classified users by individual party, but the F1-score was not satisfactory because parties within the liberal (left) party family and within the conservative (right) party family were easily confused, as shown in the confusion matrix in Table 7. For our analysis, we therefore merged the Canadian parties into these two families, and show the confusion matrix after the merge in Table 8.

Table 6: Canadian User Party Affiliation Classification. Cohen Kappa score of 0.74

	Party	Support	F1-Score	# of Users
Profile	CPC	98	92.93 ± 1.12	1,769
Profile	GPC	60	88.50 ± 1.54	97
Profile	LPC	100	90.89 ± 1.34	783
Profile	NDP	124	93.44 ± 0.53	370
Profile	NO_PARTY	105	86.16 ± 1.97	667
Profile	PPC	95	94.09 ± 1.34	402
Activity	RPF	628	93.85 ± 0.47	193,225
Activity	LPF	357	89.10 ± 0.73	299,836
Combined	RPF	–	–	196,338
Combined	LPF	–	–	302,023

Table 7: Canadian User Party Affiliation Activity Classifier Confusion Matrix

	CPC	PPC	GPC	LPC	NDP
CPC	398	24	1	29	6
PPC	59	49	0	1	1
GPC	3	0	6	10	6
LPC	18	3	2	161	26
NDP	5	1	1	19	58

Table 8: Canadian User Party Affiliation Activity Classifier Confusion Matrix - Liberal (left) and Conservative (right) Party Family

	Right Party Family	Left Party Family
Right Party Family	530	38
Left Party Family	27	289
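To illustrate the merge, the sketch below collapses the five-party matrix of Table 7 into two party families by summing the corresponding blocks. The family assignment (CPC and PPC as right; GPC, LPC, and NDP as left) is an assumption here, and the source does not say whether Table 8 was obtained by such a collapse or from a separately trained two-class model.

```python
PARTIES = ["CPC", "PPC", "GPC", "LPC", "NDP"]

# Counts copied from Table 7 (rows and columns in the order of PARTIES).
TABLE_7 = [
    [398, 24, 1, 29, 6],
    [59, 49, 0, 1, 1],
    [3, 0, 6, 10, 6],
    [18, 3, 2, 161, 26],
    [5, 1, 1, 19, 58],
]

# Assumed party-to-family assignment (not stated explicitly in this section).
FAMILY = {"CPC": "Right", "PPC": "Right", "GPC": "Left", "LPC": "Left", "NDP": "Left"}


def collapse(matrix, labels, family_of, families=("Right", "Left")):
    """Sum a party-level confusion matrix into a family-level one."""
    index = {family: i for i, family in enumerate(families)}
    merged = [[0] * len(families) for _ in families]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            merged[index[family_of[row_label]]][index[family_of[col_label]]] += matrix[i][j]
    return merged


merged = collapse(TABLE_7, PARTIES, FAMILY)
```

Under these assumptions the collapse yields 530, 38, and 289 for three of the cells, matching Table 8, and 30 for the remaining left-row, right-column cell, where Table 8 reports 27.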

Table 9: American User Party Affiliation Classification. Cohen Kappa score of 0.76. Total 763,164 users.

	Party	Support	F1-Score	# of Users
Profile	Republican	854	97.21 ± 0.66	86,989
Profile	Democrat	928	97.40 ± 0.63	82,923
Activity	Republican	10,583	92.98 ± 0.18	239,449
Activity	Democrat	10,226	92.99 ± 0.16	145,733
Combined	Republican	–	–	336,231
Combined	Democrat	–	–	426,933

8.4 Distribution Matching with Election Results

We further validate our party affiliation distribution using the results of the 2019 Canadian federal election and the 2020 US election, respectively. For each region, we compute the ratio of the number of users labelled with the liberal party family to the number labelled with the conservative party family, and correlate it with the corresponding ratio in the election results. We obtain strong correlations of 0.815 for Canada (Figure 17) and 0.802 for the United States (Figure 18). We also visualize the ratio for all geolocated users for the respective topics.

8.5 Matching Users to US Voter Registration

Following the best practice for evaluating party affiliation predictions [96], we matched Twitter (X) users from our dataset with the primary voter registration records available for five states (Ohio, New York, Florida, Arkansas, and North Carolina) as well as Washington DC. From these records, we obtain the party affiliation of unique voters in each state by md5-hashing their names and county to construct a key identifier. From our set of geolocated Twitter (X) users, we kept those located in one of the five states or DC and removed those whose location could not be retrieved. We pre-processed each user’s name on Twitter (X) to remove emojis. Finally, we matched the most recent voter party affiliation records from the registration data to the unique Twitter (X) users that matched on both the county and either the first and last name or the first, middle, and last name. After matching, we removed users not affiliated with either of the two major parties and users whose name matched more than one record per county (indicating a non-unique match).
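The matching procedure can be sketched as below. This is an illustration under several assumptions, not the code used for Table 10: the record layout, the normalization (lowercasing and trimming), the key format, and the emoji filter based on Unicode categories are all hypothetical choices, and the selection of the most recent registration record is omitted.

```python
import hashlib
import unicodedata
from collections import defaultdict


def strip_emojis(text):
    """Drop symbol characters (Unicode categories So/Sk) and emoji joiners."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) not in {"So", "Sk"} and ch not in {"\u200d", "\ufe0f"}
    )


def make_key(first, last, county, middle=""):
    """md5 key built from a normalized name and county (assumed key format)."""
    normalized = "|".join(part.strip().lower() for part in (first, middle, last, county))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_voter_index(voters):
    """Index voters by first+last and, when available, first+middle+last."""
    index = defaultdict(set)
    for voter_id, first, middle, last, county, party in voters:
        index[make_key(first, last, county)].add((voter_id, party))
        if middle:
            index[make_key(first, last, county, middle)].add((voter_id, party))
    return index


def match_user(display_name, county, index):
    """Return the registered party for a unique major-party match, else None."""
    tokens = strip_emojis(display_name).split()
    if len(tokens) == 2:
        key = make_key(tokens[0], tokens[1], county)
    elif len(tokens) == 3:
        key = make_key(tokens[0], tokens[2], county, middle=tokens[1])
    else:
        return None
    candidates = index.get(key, set())
    if len(candidates) != 1:  # no match, or a non-unique match within the county
        return None
    (_, party), = candidates
    return party if party in {"Democrat", "Republican"} else None


# Hypothetical voter records: (id, first, middle, last, county, party).
voters = [
    ("v1", "Jane", "Q", "Doe", "Franklin", "Democrat"),
    ("v2", "John", "", "Smith", "Franklin", "Republican"),
    ("v3", "John", "A", "Smith", "Franklin", "Democrat"),
    ("v4", "Ann", "", "Lee", "Franklin", "Other"),
]
voter_index = build_voter_index(voters)
```

With these records, a display name such as "Jane Doe 🌊" resolves to a single Democrat record, while "John Smith" is dropped because two voters in the same county share that first and last name.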

We then compare each user’s party from voter registration with their predicted party, first using the prediction from their profile description and second the prediction from their COVID-19-related tweets. Using our geolocation and voter record matching, we are able to match more than 30,000 users across the five states and DC with their voter records, as shown in Table 10.

Table 10: Users matched to their voter registration. Voters* and Users* respectively correspond to the number of unique voters in the records and unique Twitter (X) users in our data.

State	Voters*	Users*	Matched	Democrat	Republican	Other
Ohio	7,771,590	4,913	1,431	320	193	917
New York	17,718,437	30,927	8,255	4,843	1,631	1,781
Florida	14,477,882	50,541	12,905	5,585	4,508	2,810
Arkansas	1,722,465	4,311	1,280	145	140	995
District of Columbia	510,026	17,661	2,538	1,929	153	456
North Carolina	8,004,814	20,761	6,050	2,450	1,655	1,945
Total			32,456	15,272	8,280	8,904

We obtain an accuracy of 74.35% for the first method and 73.35% for the second. Note that both methods are binary classifiers, whereas users can also be independent or support a third party despite an ideology (and behavior or voting pattern) that aligns strongly with one of the two main parties. Users can also have an outdated registration that no longer reflects the beliefs they currently hold and express on Twitter (X). Therefore, although the accuracy here is lower than the F1-scores reported above, it still indicates that our classification is acceptable and on par with the standard for this type of evaluation [96].

8.6 Approximation of Polarization

We test the accuracy of the approximation in Algorithm 1 on the daily Lockdown and Vaccine tweets.

In Figure 19(a), we plot the total number of users against the absolute error (and its standard deviation) of the approximated $poli$ compared to the exact value, binned in steps of 1,000 users. We observe a dramatic drop in the absolute error at around 3,000 users. At around 10,000 users, the absolute error is usually below 0.001.

In Figure 19(b), we plot the total number of users against the time saved by running the approximation algorithm instead of computing the exact $poli$, again binned in steps of 1,000 users. We observe that the time saved grows exponentially with the number of users. We note that at around 50,000 users, the approximation rarely needs to increase the fraction of users sampled.

These findings confirm that we can accurately approximate $poli$ for large-scale data on which the exact value is impossible to compute because of memory constraints. As $poli$ relies heavily on computing pairwise distances, which is both time and memory intensive, our analysis shows how sampling can save both time and memory exponentially.

We also explore the impact of changing the minimum coefficient of variation. We start with a minimum sample size of 1%, i.e., $fraction=0.01$, and keep $step\_size$ constant at $0.01$. For the first experiment, $repeats$ is set to $10$. For the second experiment, shown in Figures 20 and 21, $epsilon$ is set to $0.05$.
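Algorithm 1 is not reproduced in this section, so the following is only a sketch of how a sampling scheme with these parameters might operate, inferred from the parameter names: repeatedly estimate the statistic on a random fraction of users, and enlarge the fraction by step_size until the coefficient of variation across repeats falls to epsilon or below. The stopping rule is an assumption, and the statistic used here (mean absolute pairwise distance over scalar values) is a stand-in, not the $poli$ measure.

```python
import random
import statistics


def mean_pairwise_distance(values):
    """Stand-in statistic: mean absolute pairwise distance (not the paper's measure)."""
    n = len(values)
    if n < 2:
        return 0.0
    total = sum(abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def approximate(values, statistic, fraction=0.01, step_size=0.01,
                repeats=10, epsilon=0.05, seed=0):
    """Estimate `statistic` on growing random samples until estimates stabilize.

    Returns the mean estimate and the fraction of items sampled at stopping.
    `repeats` must be at least 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)
    while True:
        fraction = min(fraction, 1.0)
        size = min(len(values), max(2, round(fraction * len(values))))
        estimates = [statistic(rng.sample(values, size)) for _ in range(repeats)]
        mean = statistics.fmean(estimates)
        # Coefficient of variation of the repeated estimates.
        cv = statistics.stdev(estimates) / abs(mean) if mean else float("inf")
        if cv <= epsilon or fraction >= 1.0:
            return mean, fraction
        fraction += step_size


# Hypothetical one-dimensional user positions (not the paper's data).
data_rng = random.Random(42)
positions = [data_rng.random() for _ in range(500)]
estimate, used_fraction = approximate(positions, mean_pairwise_distance)
exact = mean_pairwise_distance(positions)
```

Comparing `estimate` with `exact` on data small enough to compute both mirrors the error analysis of Figure 19(a), while `used_fraction` indicates how much of the pairwise computation was avoided.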